Apache Ignite Baseline Topology by Examples
To better understand the baseline topology concept, let’s start at the beginning and try to understand what problems it helps us solve.
Join the DZone community and get the full member experience.
Join For FreeIgnite Baseline Topology, or BLT, represents a set of server nodes in the cluster that persists data on disk.
N1-2 and N5 server nodes are members of the Ignite clusters with native persistence, which enable data to persist on disk. N3-4 and N6 server nodes are members of the Ignite cluster but not a part of the baseline topology.
The nodes from the baseline topology are part of a regular server node that stores data in-memory and on the disk, and also participates in computing tasks. Ignite clusters can have different nodes that are not a part of the baseline topology, such as:
Server nodes that are not used in Ignite native persistence to persist data on disk. Usually, they store data in memory or persist data to a third-party database or NoSQL. In the above equitation, node N3 or N4 might be one of them.
Client nodes that do not store shared data.
To better understand the baseline topology concept, let’s start at the beginning and try to understand what problems it helps us solve.
A database like Ignite is designed to support massive data storage and processing. Ignite databases are highly scalable and fault-tolerant. This high scalability feature of Ignite brings a few challenges for the database administrator, such as: how to manage a cluster? How to add/remove nodes correctly or how to rebalance data after add/remove nodes? Because an Ignite cluster with a multitude of nodes can significantly increase the complexity of the data infrastructure. Let’s look at the example of Apache Ignite.
The Ignite in-memory mode cluster, as a concept, is very simple. There are no master or dedicated node in the cluster, and every node is equal. Each node stores a subset of partitions and can be joined in distributed computing or deployed on any services. In case of any node failures, user requests are served by the other nodes, and the data of the failed nodes will be no longer available. The Ignite cluster management operations are very similar, as follows:
To run a cluster, start all nodes.
To expand the cluster topology, add some nodes.
To reduce the cluster topology, remove some nodes.
Portions of this article were taken from the book The Apache Ignite book. If it got you interested, check out the rest of the book for more helpful information. There is a special 20% discount for the DZone readers, please use the following coupon.
Data redistributes between nodes automatically. Depending on the backup copy configuration of the caches, data partitions move from one node to another.
In the persistence mode, the node keeps their state even after the restart. During any read operation, data is read from the disk and restores the node state. Therefore, unlike in-memory mode, restart of a node in persistence mode does not need to redistribute data from one node to another. The data during node failure will be restored from the disk. This strategy opens up the opportunities to not only preventing moving a massive amount of data during node failure, but also reduces the startup times of the entire cluster after a restart. So, we need to distinguish somehow these nodes that can save their state after restart. The Ignite baseline topology provides this capability.
In a nutshell, Ignite baseline topology is a collection of nodes that have been configured for storing persistence data on disk. Baseline topology tracks the history of the topology changes and prevents data discrepancies in the cluster during recovery. Let’s refresh our memories of the goals of the baseline topology:
Avoid redundant data rebalancing if a node is being rebooted.
Automatically activate a cluster once all the nodes of the baseline topology have joined after a cluster restart.
Prevent data inconsistencies in the case of split-brain.
Apache Ignite provides a command line (CLI) tool that allows you to monitor and manage a cluster baseline topology. In this article, we will review several common scenarios of baseline topology administration with this tool when Ignite persistence is used.
The ./control.sh command line script can be found under /bin folder of an Apache Ignite distribution directory. The primary goal of this script (tool) is to activate/deactivate and management of a set of nodes that represent the baseline topology. However, this tool is a multi-purpose tool and can be actively used for monitoring the cache states or detecting any transaction locks that could occur in the entire cluster.
Preparing the sandbox. As we stated before, the script that runs the tool is located in the {Ignite_home}/bin folder and called control.sh. There are versions of the script for Unix (control.sh) and Windows (control.bat). For demonstration purpose I will use the following configurations:
Name |
Description |
OS |
MacOS, you can use Windows or Linux operating system by your choice. |
Ignite version |
2.6.0 or above. |
The number of Ignite nodes |
3 nodes in a single host. |
JVM |
1.8 |
TCP discovery |
Multicast |
Step 1. We are going to run three Ignite nodes on persistence mode in a single host. By default, Ignite creates a WORK directory under the IGNITR_HOME folder for storing WAL archives and log files. Download the Ignite distribution and unarchive it in 3 different directories on your operating system, for example/usr/ignite/2.6.0-s1, /usr/ignite/2.6.0-s2, /usr/ignite/2.6.0-s3. You should have a similar folder hierarchy as shown in figure 4.
Note that, it is the simplest way to run a few nodes with persistence enable in a single host without any extra configuration. However, you can configure Ignite such a way that allows you to run a few Ignite nodes with different WAL archive folders.
Step 2. To enable the persistence store, we use the Ignite data storage configuration through Spring. Create an XML file with name ignite-book-baseline.xml and copy the following content in it.
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="
http://www.springframework.org/schema/beans
http://www.springframework.org/schema/beans/spring-beans.xsd">
<bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
<property name="cacheConfiguration">
<list>
<bean class="org.apache.ignite.configuration.CacheConfiguration">
<property name="name" value="TestCache"/>
<property name="atomicityMode" value="ATOMIC"/>
<property name="backups" value="1"/>
</bean>
</list>
</property>
<!-- Enabling Apache Ignite Persistent Store. -->
<property name="dataStorageConfiguration">
<bean class="org.apache.ignite.configuration.DataStorageConfiguration">
<property name="defaultDataRegionConfiguration">
<bean class="org.apache.ignite.configuration.DataRegionConfiguration">
<property name="persistenceEnabled" value="true"/>
<property name="metricsEnabled" value="true"/>
</bean>
</property>
</bean>
</property>
<property name="discoverySpi">
<bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
<property name="ipFinder">
<bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.multicast.TcpDiscoveryMulticastIpFinder">
<property name="addresses">
<list>
<value>127.0.0.1:47500..47509</value>
</list>
</property>
</bean>
</property>
</bean>
</property>
</bean>
</beans>
Save the file somewhere in your filesystem.
Step 3. We will be starting each Ignite server node one at a time starting with our first Ignite node. Open a terminal and change the IGNITE_HOME directory to the folder where you unarchive the Ignite distribution for the Ignite node 1.
export IGNITE_HOME=PATH_TO_THE_IGNITE_NODE_ONE/ignite/2.6.0-s1
Now, start the first Ignite node with the following command:
ignite.sh /PATH_TO_THE_SPRING_CONFIG_FILE/ignite/ignite-book-baseline.xml
Your output on the console should look like this:
__________ ________________ / _/ ___/ |/ / _/_ __/ __/
_///(77 /// ///_/ /___/\___/_/|_/___/ /_/ /___/
ver. 2.6.0#20180710-sha1:669feacc
2018 Copyright(C) Apache Software Foundation
Ignite documentation: http://ignite.apache.org Quiet mode.
^-- Logging to file '/usr/ignite/2.6.0-s1/work/log/ignite-f0ef6ecc.0.log'
Topology snapshot [ver=1, servers=1, clients=0, CPUs=8, offheap=3.2GB, heap=1.\
^-- Node [id=F0EF6ECC-D692-4862-9414-709039FE00CD, clusterState=INACTIVE] Data Regions Configured:
^-- default [initSize=256.0 MiB, maxSize=3.2 GiB, persistenceEnabled=true]
Inspect the logs displays on the console. Log messages confirm that our first Ignite server is up and running and the persistence mode is enabled. Now do the same thing again for the second Ignite node.
export IGNITE_HOME=PATH_TO_THE_IGNITE_NODE_ONE/ignite/2.6.0-s2
ignite.sh /PATH_TO_THE_SPRING_CONFIG_FILE/ignite/ignite-book-baseline.xml
At this moment, you can see that the second Ignite node started on persistence mode and joined to the cluster. You should see very similar messages in the terminal as shown below.
[16:13:35] >>> Ignite cluster is not active (limited functionality available). Use contro\ l.(sh|bat) script or IgniteCluster interface to activate.
[16:13:35] Topology snapshot [ver=2, servers=2, clients=0, CPUs=8, offheap=6.4GB, heap=2.\ 0GB]
[16:13:35] ^-- Node [id=6DB02F31-115C-41E4-BECC-FDB6980F8143, clusterState=INACTIVE] [16:13:35] Data Regions Configured:
[16:13:35] ^-- default [initSize=256.0 MiB, maxSize=3.2 GiB, persistenceEnabled=true]
Ignite also warned that the cluster is not activated yet and you have to activate the cluster by using control.sh script. Let’s activate the cluster and creates a few tables for storing data.
Step 4. Before we activate the cluster, let’s consider specific features of the control.sh tool. The control.sh script currently supports the following commands:
Command |
Description |
--activate |
This command switches the cluster into an active state. In this case, if there is no baseline topology exists in the cluster, a new baseline will be created during activation of the cluster. The new baseline topology will include all of the connected nodes in the cluster topology. |
--deactivate |
Deactivate the cluster. Limited functionality will be available in this state. |
--state |
Print the current cluster state. |
--baseline |
This command is designed to manage the baseline topology. When this command used without any parameters, it prints the current cluster baseline topology information. The following parameters can be used with this command: add, remove, set, and version. |
To invoke a specific command, use the following pattern:
UNIX/LINUX/MacOS
$IGNITE_HOME/bin/control.sh <command> <args>
Now, activate the cluster. Run the following command:
$IGNITE_HOME/bin/control.sh
If the command succeeds, you should see the following messages in the console.
Control utility [ver. 2.6.0#20180710-sha1:669feacc] 2018 Copyright(C) Apache Software Foundation
User: shamim
--------------------------------------------------------------------------------
Cluster activated
At this moment, you can also use the --state command to check the current cluster state. The--state command should return a message that the cluster is activated.
Step 5. Now, create a table and populate some data. We use the SQLLINE tool to connect to the cluster. Run the following command to start the SQLLINE tool:
sqlline.sh --color=true --verbose=true -u jdbc:ignite:thin://127.0.0.1/
Create a table named EMP and insert 1000 rows into the table. Use the following DDL script to create the EMP table as follows:
CREATE TABLE IF NOT EXISTS EMP
(
empno LONG, ename VARCHAR, job VARCHAR, mgr INTEGER, hiredate DATE,
sal LONG,
comm LONG,
deptno LONG,
CONSTRAINT pk_emp PRIMARY KEY (empno)
) WITH "template=partitioned,CACHE_NAME=EMPcache";
Next, use the EMP_001.sql script from the GitHub repository to insert 1000 entries into the table.
0: jdbc:ignite:thin://127.0.0.1/> !run /PATH_TO_THE_FILE/the-apache-ignite-book/chapters/\ chapter-10/baseline/EMP_001.sql
The above command inserts 1000 entries into the EMP table or EMPcache. Use the visor CLI tools to see the size of the cache into the entire cluster. Run the command cache -a in the IgniteVisor console. The command should return the following output as shown in figure 5.
Take a look at the column named SIZE. This column clarifies the number of entries stored into each node. In our case, one of our nodes contains 504 entries, and the other one contains 496 entries into the EMPcache cache.
Step 6. So far, we have launched only two Ignite nodes and created a baseline topology in the cluster. Let’s start another Ignite node. Do the same thing again as before for the third Ignite node.
export IGNITE_HOME=PATH_TO_THE_IGNITE_NODE_ONE/ignite/2.6.0-s3
ignite.sh /PATH_TO_THE_SPRING_CONFIG_FILE/ignite/ignite-book-baseline.xml
Logs on the console should confirm for you that the node is started on persistence mode successfully. Moreover, you should get a warning on the console that the local node is not included in Baseline Topology and will not be used for persistent data storage. Now we can play with the --baseline command. Let’s run the command without any parameter as follows:
$IGNITE_HOME/bin/control.sh --baseline
The output might be as follows:
shamim:~ shamim$ control.sh --baseline
Control utility [ver. 2.6.0#20180710-sha1:669feacc] 2018 Copyright(C) Apache Software Foundation
User: shamim --------------------------------------------------------------------------------
Cluster state: active
Current topology version: 6
Baseline nodes:
ConsistentID=1640f655-4065-438c-92ca-478b5df91def, STATE=ONLINE ConsistentID=d8b04bc3-d175-443c-b53f-62512ff9152f, STATE=ONLINE
--------------------------------------------------------------------------------
Number of baseline nodes: 2
Other nodes: ConsistentID=3c2ad09d-c835-4f4b-b47a-43912d04d30e
Number of other nodes: 1
The above baseline information shows the cluster state, topology version, and nodes with their consistent IDs that are part of the baseline topology, as well as those that are not part of the baseline topology. Here, the number of baseline nodes is 2, and the baseline consists of our first and second Ignite node.
Sometime it may happen that, during the first cluster activation, the baseline topology was not created. In such cases, the --baseline command will return a message like “Baseline nodes not found.” In this situation, stop the third node and wait for a few seconds. Then set the baseline topology manually by using the numerical cluster topology version as follows:
control.sh --baseline version topologyVersion
In the above command, replace the topologyVersion with the actual topology version. You can find the topology version in any Ignite node console as shown below:
Topology snapshot [ver=6, servers=3, clients=0, CPUs=8, offheap=9.6GB, heap=3.0GB]
Pick the latest topology snapshot version from the console.
In this stage, our third Ignite node is not the part of our baseline topology. This node will not be used for persistent data storage. It means if we create any new tables and insert data into it, the node will not store any data for the new table. Let’s verify the concept.
Step 7. Create a new table DEPT with the following DDL script:
CREATE TABLE IF NOT EXISTS DEPT (
deptno LONG,
dname VARCHAR,
loc VARCHAR,
CONSTRAINT pk_dept PRIMARY KEY (deptno)
) WITH "template=partitioned,CACHE_NAME=DEPTcache";
Also, insert 100 departments by using the DEPT.SQL. The DEPT.SQL script is available at the GitHub repository.
0: jdbc:ignite:thin://127.0.0.1/> !run /PATH_TO_THE_FILE/github/the-apache-ignite-book/ch\ apters/chapter-10/baseline/DEPT.sql
Now, run the command cache -a in the visor console, which should print a similar output shown in figure 6.
The above figure confirms that the third node does not contain any persistence data. However, the node that is not a part of the baseline topology can participate in any in-memory computing.
Step 8. Next, let’s add the new empty node to the baseline topology to hold persistence data. Invoke the command --baseline add <node's consistentId> to add the new node to the existing baseline.
control.sh --baseline add 3c2ad09d-c835-4f4b-b47a-43912d04d30e
In the above command, replace the consistent id 3c2ad09d-c835-4f4b-b47a-43912d04d30ewith your consistentId of the third Ignite node. After completing the --baseline add command, a message will confirm that the new baseline topology contains 3 nodes.
Cluster state: active
Current topology version: 10
Baseline nodes:
ConsistentID=1640f655-4065-438c-92ca-478b5df91def, STATE=ONLINE
ConsistentID=3c2ad09d-c835-4f4b-b47a-43912d04d30e, STATE=ONLINE
ConsistentID=d8b04bc3-d175-443c-b53f-62512ff9152f, STATE=ONLINE
-------------------------------------------------------------------------------- Number of baseline nodes: 3
Other nodes not found.
After forming the new baseline topology from 3 nodes, a data rebalancing will proceed immediately. The new empty node (in our case it’s the third node) will receive his portion of data from other nodes. If you run the cache -a command in Ignite Visor CLI again, you can confirm the data rebalancing. Figure 7 shows the result of the data rebalancing after adding the 3rd node in the baseline topology.
Now each node stores almost evenly partition of entries (about 300 entries) for cache EMPcache. However, what will happen if one of the baseline topology nodes is restarted? Let’s stop one node and try to insert some data into the table EMP.
Step 9. Stop the second node by hitting the key CRTL+X. Execute the command --baseline without any parameter to print the state of the baseline topology.
control.sh --baseline
The above command will display the current baseline topology status very similar to the next message:
--------------------------------------------------------------------------------
Cluster state: active
Current topology version: 11
Baseline nodes:
ConsistentID=1640f655-4065-438c-92ca-478b5df91def, STATE=OFFLINE
ConsistentID=3c2ad09d-c835-4f4b-b47a-43912d04d30e, STATE=ONLINE
ConsistentID=d8b04bc3-d175-443c-b53f-62512ff9152f, STATE=ONLINE
--------------------------------------------------------------------------------
Number of baseline nodes: 3
Other nodes not found
One of the nodes is offline as expected. Now try to insert some data into the EMP table by SQLLINE tool as follows:
insert into EMP (empno, ename, job, mgr, hiredate, sal, comm, deptno) values (2009, 'Sall\ ie', 'Sales Associate', 96, null, 3619, 34, 78);
insert into EMP (empno, ename, job, mgr, hiredate, sal, comm, deptno) values (2010, 'Cori\ ', 'Human Resources Manager', 65, null, 1291, 86, 57);
insert into EMP (empno, ename, job, mgr, hiredate, sal, comm, deptno) values (2011, 'Myrt\ le', 'VP Quality Control', 88, null, 5103, 21, 48);
insert into EMP (empno, ename, job, mgr, hiredate, sal, comm, deptno) values (2012, 'Ches\ ', 'Desktop Support Technician', 46, null, 6352, 29, 21);
You should notice that a few inserts statement failed with errors which shown in the next snippet.
Caused by: class org.apache.ignite.internal.cluster.ClusterTopologyServerNotFoundExceptio\ n: Failed to map keys for cache (all partition nodes left the grid).
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicSing\ leUpdateFuture.mapSingleUpdate(GridNearAtomicSingleUpdateFuture.java:562)
This error occurred because we have no backup copies for our EMP table. The node that should store the data has been stopped, and Ignite cannot store the data. To avoid such a situation, consider a cache/table with one backup. If one node fails, it will lose no data. For now, we have a few options:
• Reboot the offline node as soon as possible with minimal downtime for preventing data loss.
• Remove the offline node from the baseline topology and rebalance the data.
Step 10. Let’s remove the offline node from the baseline topology. Execute the following command:
control.sh --baseline remove 1640f655-4065-438c-92ca-478b5df91def
After completing the remove command, the baseline topology changed excluding the stopped node. Note that by removing a node from the baseline topology, you acknowledge that you will no longer be able to use the data stored on that node after its restart. At this moment, no error will occur during data manipulation into the cluster. You can insert new entries or update existing entries into the cache successfully.
Note that the node that you want to remove from the baseline topology should be disconnected from the cluster before removing from the baseline. Otherwise, the error “Failed to remove nodes from baseline” occurs, specifying the nodes that you have to stop before deleting from the baseline.
In addition to topology management, control.sh script can also be used for monitoring and control a cluster state which is well documented in Ignite site. So, please refer to the control script section of the Ignite documentation for more information.
Opinions expressed by DZone contributors are their own.
Comments