Understanding the Logical Clusters Within a Junos Space Cluster
You can connect multiple Junos Space appliances (hardware or virtual) together to form a Junos Space cluster. Figure 1 shows the logical clusters (the Apache load-balancer cluster, the JBoss cluster, and the MySQL cluster) that are formed within each Junos Space cluster.
Apache Load-Balancer Cluster
The Apache HTTP server, with the mod_proxy load-balancer module enabled, runs on two nodes in the cluster at any given time. These servers form an active-standby logical cluster. Both listen on TCP port 443 for HTTP requests from GUI and NBI clients. All clients use the virtual IP (VIP) address of the cluster to access its services, and at any time the VIP address is owned by only one node in the cluster. Hence, the Apache HTTP server on the node that owns the VIP address receives all HTTP requests from GUI and NBI clients and acts as the active load-balancer server, whereas the other server acts as the standby. The load balancer uses a round-robin algorithm to distribute requests to the JBoss servers running on the nodes in the cluster. It also employs session stickiness to ensure that all HTTP requests from a user session are sent to the same node in the cluster. To achieve this, the server sets a cookie named JSESSIONID whose value identifies the node that serves requests belonging to this user session. All subsequent requests carry this cookie, and the load balancer forwards each of them to the JBoss server running on the node that the cookie identifies.
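The routing rule described above can be sketched as a round-robin balancer with session stickiness. This is a toy model, not the mod_proxy implementation, and the class and method names are hypothetical. The cookie format `<session-id>.<node>` is an assumption borrowed from the route-suffix convention that Apache mod_proxy_balancer supports for sticky sessions; the text does not say how Junos Space encodes the node in JSESSIONID.

```python
import itertools
import uuid
from typing import Optional


class StickyRoundRobinBalancer:
    """Toy model of the active load balancer: round-robin plus session stickiness.

    Assumption (hypothetical format): a JSESSIONID value looks like
    "<session-id>.<node>", so the serving node can be read from the suffix.
    """

    def __init__(self, nodes: list[str]) -> None:
        self.nodes = list(nodes)
        self._next_node = itertools.cycle(self.nodes)

    def route(self, jsessionid: Optional[str]) -> tuple[str, str]:
        """Return (node that serves the request, JSESSIONID value to send back)."""
        if jsessionid and "." in jsessionid:
            node = jsessionid.rsplit(".", 1)[1]
            if node in self.nodes:
                # Existing session: forward to the node named in the cookie.
                return node, jsessionid
        # New session (or a cookie naming an unknown node): pick the next
        # node in round-robin order and issue a cookie that identifies it.
        node = next(self._next_node)
        return node, f"{uuid.uuid4().hex}.{node}"


lb = StickyRoundRobinBalancer(["node1", "node2", "node3"])
first, cookie = lb.route(None)   # new session, assigned round-robin
again, _ = lb.route(cookie)      # same session, same node as `first`
other, _ = lb.route(None)        # the next new session goes to the next node
print(first, again, other)       # node1 node1 node2
```

In this sketch, a cookie that names a node no longer in the list is simply treated like a new session and reassigned round-robin.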
If the Apache HTTP server on a node goes down, the watchdog service on that node restarts it automatically. If this node owns the VIP address, the GUI and NBI clients might experience a brief service outage until the Apache HTTP server restarts. However, this outage lasts only a few seconds (typically about two seconds) and is hardly noticeable to clients. If the Apache HTTP server goes down on the node that does not currently own the VIP address, no side effects are noticed by clients or by any other components; the watchdog service restarts the server, and the server comes back up in about two seconds.
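The restart behavior can be sketched as a watchdog pass that health-checks each managed service and restarts the ones that are down. The `Service` and `Watchdog` classes are hypothetical stand-ins; the text does not describe how the Junos Space watchdog checks or restarts processes.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """Hypothetical handle for a managed process, e.g. the Apache HTTP server."""
    name: str
    running: bool = True
    restarts: int = 0

    def is_alive(self) -> bool:
        return self.running

    def restart(self) -> None:
        self.running = True
        self.restarts += 1


@dataclass
class Watchdog:
    """Minimal per-node watchdog: one pass checks every service and restarts dead ones."""
    services: list[Service] = field(default_factory=list)

    def check_once(self) -> list[str]:
        restarted = []
        for svc in self.services:
            if not svc.is_alive():
                svc.restart()
                restarted.append(svc.name)
        return restarted


httpd = Service("httpd")
watchdog = Watchdog([httpd])
httpd.running = False            # simulate a crash
print(watchdog.check_once())     # ['httpd']
```

A real watchdog would repeat `check_once` on a timer; in this model, the outage seen by clients of the VIP owner is roughly the time until the next check plus the server's startup time.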
JBoss Cluster
The JBoss application server runs on all nodes except dedicated database nodes in the Junos Space cluster. The nodes form a single all-active logical cluster, and the load-balancer server (described previously) distributes the load across all of them. Even if one or more JBoss servers in the cluster fail, the application logic remains accessible from the surviving nodes. JBoss servers on all nodes are started with the same configuration and use UDP multicast to detect each other and form a single cluster. JBoss also uses UDP multicast for session replication and caching services across all the nodes.
The JBoss server does not run on Fault Monitoring and Performance Monitoring (FMPM) nodes and hosted virtual machines.
When the JBoss server on a node goes down, the other nodes in the JBoss cluster detect this change and automatically reconfigure themselves to remove the failed node from the cluster. How long the other members take to detect the failure depends on whether the JBoss server process crashed abnormally or is unresponsive. In the former case, cluster members detect the failure almost immediately (in around two seconds) because the operating system closes their TCP connections to the crashed JBoss server. In the latter case, cluster members detect the failure in about 52 seconds. If a JBoss server crashes, the watchdog service (jmp-watchdog) running on the node restarts it automatically. When the JBoss server comes back up, the other cluster members discover it automatically and add it to the cluster, and it then synchronizes its cache from the other nodes. The typical restart time for the JBoss server is two to five minutes, but it can take longer depending on the number of applications installed, the number of devices being managed, the number of DMI schema versions installed, and so forth.
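The difference between the two detection times can be illustrated with a minimal failure detector: a closed TCP connection marks a peer as failed at once, while a hung peer is only declared failed after a heartbeat timeout. The 52-second timeout is taken from the figure quoted above; the heartbeat mechanism and all names are assumptions for illustration, since the text does not describe how JBoss implements detection.

```python
from dataclasses import dataclass

# Assumption: mirrors the ~52 s detection time quoted for an unresponsive server.
SUSPECT_TIMEOUT_S = 52.0


@dataclass
class PeerState:
    last_heartbeat: float             # time the last message arrived from the peer
    connection_closed: bool = False   # set when the OS reports the TCP connection closed


def is_failed(peer: PeerState, now: float, timeout: float = SUSPECT_TIMEOUT_S) -> bool:
    """Crashed peers are detected immediately; hung peers only after the timeout."""
    if peer.connection_closed:
        # The crashed process's sockets were closed by the OS: no need to wait.
        return True
    # A hung process keeps its connections open, so only silence reveals it.
    return now - peer.last_heartbeat > timeout


hung = PeerState(last_heartbeat=0.0)
crashed = PeerState(last_heartbeat=0.0, connection_closed=True)
print(is_failed(crashed, now=2.0), is_failed(hung, now=2.0), is_failed(hung, now=53.0))
# True False True
```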
One JBoss server in the cluster always acts as the primary of the cluster. The main purpose of the primary designation is to host services that are deployed as cluster-wide singletons (HA singletons), that is, services that must be deployed on only one server in the cluster at any time. Junos Space uses several services of this type, including the Job Poller service, which provides a single timer for scheduling jobs across the cluster, and the Distributed Resource Manager (DRM) service, which monitors and manages the nodes in the cluster. These services are deployed only on the JBoss server that is designated as the primary.
This does not mean that the primary hosts only singleton services: services that are not cluster-wide singletons are also hosted on the primary server. Junos Space is configured such that the first JBoss server that comes up in the cluster becomes the primary. If the primary server goes down, the other members of the JBoss cluster detect this and elect a new primary.
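The primary rule (the first JBoss server to come up becomes the primary, and a new primary is elected when it fails) can be sketched as an ordered membership view in which the oldest surviving member is the primary. Picking the oldest survivor is an assumption modeled on how JGroups chooses its view coordinator; the text does not specify the election rule, and the class and labels below are hypothetical.

```python
from typing import Optional


class ClusterView:
    """Ordered JBoss cluster membership; index 0 is the oldest member."""

    # Labels for the HA singletons named in the text; deployed only on the primary.
    HA_SINGLETONS = ("Job Poller", "DRM")

    def __init__(self) -> None:
        self.members: list[str] = []

    def join(self, node: str) -> None:
        # A server that starts (or restarts) joins as the youngest member.
        if node not in self.members:
            self.members.append(node)

    def leave(self, node: str) -> None:
        # Called once the failure of `node` has been detected.
        if node in self.members:
            self.members.remove(node)

    @property
    def primary(self) -> Optional[str]:
        # Assumed rule: the oldest surviving member is the primary.
        return self.members[0] if self.members else None

    def services_on(self, node: str) -> list[str]:
        # Every member hosts ordinary services; the primary also hosts the singletons.
        hosted = ["ordinary services"]
        if node == self.primary:
            hosted.extend(self.HA_SINGLETONS)
        return hosted


view = ClusterView()
for name in ("node1", "node2", "node3"):
    view.join(name)
print(view.primary)   # node1: the first server to come up
view.leave("node1")   # the primary fails
print(view.primary)   # node2: the oldest survivor takes over
```

Note that a restarted former primary rejoins as the youngest member in this model, so it does not reclaim the primary role.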
MySQL Cluster
The MySQL server runs on two nodes in the Junos Space cluster at any given time. These nodes form a logical active-standby cluster, and both listen on TCP port 3306 for database requests from JBoss servers. By default, JBoss servers are configured to use the virtual IP (VIP) address of the cluster to access database services. At any time, the VIP address is owned by only one node in the cluster. Thus, the MySQL server on the node that owns the VIP address receives all database requests from the JBoss servers and acts as the active database server, while the other server acts as the standby.
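Because every client addresses the VIP, the active role simply follows VIP ownership. A minimal model of that rule, with hypothetical node names, is shown below; it does not describe how the VIP itself is moved between nodes.

```python
from dataclasses import dataclass


@dataclass
class VipPair:
    """Two-node active-standby pair behind one VIP (node names are hypothetical)."""
    nodes: tuple[str, str]
    vip_owner: str

    @property
    def active(self) -> str:
        # Clients only ever connect to the VIP, so its owner serves every request.
        return self.vip_owner

    @property
    def standby(self) -> str:
        first, second = self.nodes
        return second if self.vip_owner == first else first

    def move_vip(self) -> None:
        # In this model, handing the VIP to the other node is what swaps the roles.
        self.vip_owner = self.standby


pair = VipPair(nodes=("node1", "node2"), vip_owner="node1")
print(pair.active, pair.standby)   # node1 node2
pair.move_vip()
print(pair.active, pair.standby)   # node2 node1
```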
If you want to improve the performance of Junos Space Network Management Platform and Junos Space applications, you can add two Junos Space nodes to run as dedicated database nodes. When you add any two Junos Space nodes as the primary and secondary database nodes, the MySQL server is moved to the two dedicated database nodes and is disabled on the first two nodes of the Junos Space cluster. This frees system resources on the Junos Space active VIP node, improving the performance of the node.
JBoss servers use a separate database virtual IP (VIP) address to access database services on dedicated database nodes. You specify the VIP address for the database when you add nodes as dedicated database nodes to the Junos Space cluster. This VIP address is owned by the node designated as the primary database node. The MySQL server on the primary database node acts as the active database server, and the server on the secondary database node acts as the standby. Figure 2 shows the logical clusters (the Apache load-balancer cluster, the JBoss cluster, and the MySQL cluster) that are formed within a Junos Space cluster when you have dedicated database nodes as part of the Junos Space cluster.
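The choice of database address described above can be sketched as a small selection function. The field names and addresses are hypothetical, and the sketch assumes the dedicated database nodes listen on the same port 3306 that the text gives for the default setup.

```python
from dataclasses import dataclass
from typing import Optional

MYSQL_PORT = 3306  # port the MySQL servers listen on, per the text (assumed unchanged on dedicated nodes)


@dataclass
class ClusterConfig:
    cluster_vip: str                     # VIP of the Junos Space cluster
    database_vip: Optional[str] = None   # set only when dedicated database nodes exist


def database_endpoint(cfg: ClusterConfig) -> tuple[str, int]:
    """Return the (address, port) JBoss would use for database access in this model."""
    if cfg.database_vip is not None:
        # Dedicated database nodes: the primary database node owns this VIP.
        return cfg.database_vip, MYSQL_PORT
    # Default setup: MySQL sits behind the cluster VIP.
    return cfg.cluster_vip, MYSQL_PORT


# Documentation-range addresses, purely illustrative.
print(database_endpoint(ClusterConfig("192.0.2.10")))                # ('192.0.2.10', 3306)
print(database_endpoint(ClusterConfig("192.0.2.10", "192.0.2.20")))  # ('192.0.2.20', 3306)
```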
The MySQL servers on the two nodes are configured with unique server IDs. The primary/backup relationship is also configured symmetrically on the nodes: the server on the first node is configured with the second node as its primary, and the server on the second node is configured with the first node as its primary. Thus, each node can act as a backup to the other, and the server running on the node that owns the VIP address acts as the primary at any time, so the primary/backup relationship switches dynamically as VIP ownership moves from one node to the other. All transactions committed on the active (primary) server are replicated to the standby (backup) server in near real time by means of the asynchronous replication solution [2] provided by MySQL, which is based on the binary logging mechanism. The MySQL server operating as the primary (the source of the database changes) writes updates and changes as “events” to the binary log. The information in the binary log is stored in different logging formats according to the database changes being recorded. The backup server is configured to read the binary log from the primary and to execute all the events in the binary log on the backup's local database.
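The symmetric primary/backup wiring and binary-log replication can be modeled in a few lines. This is a simplification under stated assumptions: events are plain key/value writes rather than any of MySQL's actual binary log formats, the backup pulls events on demand instead of streaming them through MySQL's replication threads, and all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MySQLNode:
    server_id: int                # unique per node, as the text requires (not used further here)
    source: Optional["MySQLNode"] = field(default=None, repr=False, compare=False)
    binlog: list[tuple[str, str]] = field(default_factory=list)  # (key, value) events
    data: dict[str, str] = field(default_factory=dict)
    applied_pos: int = 0          # number of source binlog events already replayed

    def commit(self, key: str, value: str) -> None:
        """Primary path: apply the change locally and log it as a binlog event."""
        self.data[key] = value
        self.binlog.append((key, value))

    def pull(self) -> int:
        """Backup path: replay every event the source logged since the last pull.

        Replication is asynchronous: nothing reaches this node until pull() runs,
        and a node that was down simply replays a longer backlog.
        """
        if self.source is None:
            raise ValueError("no source configured")
        backlog = self.source.binlog[self.applied_pos:]
        for key, value in backlog:
            self.data[key] = value
        self.applied_pos += len(backlog)
        return len(backlog)


# Symmetric configuration: each node names the other as its source.
node1, node2 = MySQLNode(server_id=1), MySQLNode(server_id=2)
node1.source, node2.source = node2, node1

node1.commit("device:1", "up")     # node1 owns the VIP, so it acts as primary
print(node2.pull())                # 1 event replayed on the backup
node2.commit("device:2", "up")     # VIP moved: node2 is now the primary
print(node1.pull())                # 1 event replayed on the new backup
print(node1.data == node2.data)    # True
```

Because `pull` replays everything after `applied_pos`, a backup that was offline catches up by replaying a larger backlog, which mirrors why resynchronization time grows with the number of changes made during an outage.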
If the MySQL server on a node goes down, the watchdog service on that node restarts it automatically. When restarted, the MySQL server should come up within 20 to 60 seconds. If this node owns the VIP address, JBoss might experience a brief database outage lasting these 20 to 60 seconds, and any requests that require database access fail during this period. If the MySQL server goes down on the node that does not currently own the VIP address, JBoss notices no side effects. The watchdog service restarts the server, and the server comes back up in less than one minute. After the server is back up, it resynchronizes with the primary in the background; the resynchronization time depends on the number of changes that occurred during the outage.
Cassandra Cluster
Starting in Release 15.2R2, the Cassandra cluster is an optional logical cluster that you can include within the Junos Space cluster. The Cassandra cluster is formed when the Junos Space fabric contains two or more nodes running the Cassandra service, whether dedicated Cassandra nodes, JBoss nodes with the Cassandra service running, or a combination of both. You can choose to run the Cassandra service on none, some, or all of the nodes in the fabric, except dedicated database nodes and FMPM nodes. The Cassandra service running on Junos Space nodes provides a distributed file system to store device images and files from Junos Space applications (such as Juniper Message Bundle [JMB] generated by Service Now and RRD files generated by Network Director). If there are no Cassandra nodes in the fabric, device image files and Junos Space application files are stored in the MySQL database. Figure 3 shows the logical clusters (the Apache load-balancer cluster, the JBoss cluster, the MySQL cluster, and the Cassandra cluster) that are formed within a Junos Space cluster when you have Cassandra nodes as part of the Junos Space cluster.
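The membership and storage rules above can be sketched as follows. The role labels and names are hypothetical. The text does not say where files go when exactly one node runs Cassandra; this sketch assumes any Cassandra node is used, which is only an assumption.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    role: str                        # hypothetical labels: "jboss", "cassandra", "database", "fmpm"
    cassandra_enabled: bool = False  # meaningful only for "jboss" nodes in this model


def cassandra_members(fabric: list[Node]) -> list[str]:
    """Names of the nodes that run the Cassandra service."""
    # Dedicated database and FMPM nodes are never included, whatever their flag says.
    return [
        n.name
        for n in fabric
        if n.role == "cassandra" or (n.role == "jboss" and n.cassandra_enabled)
    ]


def forms_cassandra_cluster(fabric: list[Node]) -> bool:
    # Two or more Cassandra nodes, of either kind, form the cluster.
    return len(cassandra_members(fabric)) >= 2


def file_store(fabric: list[Node]) -> str:
    # With no Cassandra nodes, device images and application files go to MySQL.
    return "cassandra" if cassandra_members(fabric) else "mysql"


fabric = [
    Node("node1", "jboss", cassandra_enabled=True),
    Node("node2", "jboss"),
    Node("db1", "database"),
    Node("cass1", "cassandra"),
]
print(cassandra_members(fabric), forms_cassandra_cluster(fabric), file_store(fabric))
# ['node1', 'cass1'] True cassandra
```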
The Cassandra service runs on all the Cassandra nodes in an active-active configuration with real-time replication of the Cassandra database: every file uploaded to the Cassandra database is copied to all the nodes in the Cassandra cluster. JBoss servers send requests to the Cassandra nodes in a round-robin manner and reach each node at the IP address of its eth0 interface.
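The access pattern described above can be sketched as a round-robin client over the nodes' eth0 addresses, with every upload copied to every node (equivalent to a replication factor equal to the number of nodes). This is a toy model: real Cassandra replication involves replication factors and consistency levels that are not shown, and the class name and addresses are hypothetical.

```python
import itertools


class CassandraCluster:
    """Toy model: every node holds a full copy, and JBoss rotates through eth0 IPs."""

    def __init__(self, eth0_ips: list[str]) -> None:
        self.replicas: dict[str, dict[str, bytes]] = {ip: {} for ip in eth0_ips}
        self._next_ip = itertools.cycle(eth0_ips)

    def upload(self, name: str, content: bytes) -> str:
        contacted = next(self._next_ip)      # round-robin choice of the node to contact
        for store in self.replicas.values():
            store[name] = content            # the file is copied to every node
        return contacted

    def download(self, name: str) -> tuple[str, bytes]:
        contacted = next(self._next_ip)
        return contacted, self.replicas[contacted][name]  # any node has a full copy


cluster = CassandraCluster(["192.0.2.31", "192.0.2.32"])
print(cluster.upload("image.tgz", b"..."))    # 192.0.2.31
print(cluster.download("image.tgz"))          # ('192.0.2.32', b'...')
```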
If any Cassandra node goes down, Junos Space Platform cannot upload files to or delete files from the Cassandra database until the node that is down is deleted from the fabric. If all existing Cassandra nodes are deleted, the files stored in the Cassandra database are lost.
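The availability rule above can be modeled as a guard on uploads and deletes plus the data-loss case when the last node is removed. This is a hypothetical sketch of the stated behavior, not how Junos Space Platform enforces it.

```python
from dataclasses import dataclass, field


@dataclass
class CassandraFabric:
    """Hypothetical model of the availability rule for Cassandra file storage."""
    nodes: dict[str, bool] = field(default_factory=dict)   # node name -> currently up?
    files: set[str] = field(default_factory=set)

    def _require_writable(self) -> None:
        if not self.nodes:
            raise RuntimeError("no Cassandra nodes in the fabric")
        if not all(self.nodes.values()):
            # Uploads and deletes are refused while any Cassandra node is down.
            raise RuntimeError("a Cassandra node is down; delete it from the fabric first")

    def upload(self, name: str) -> None:
        self._require_writable()
        self.files.add(name)

    def delete_file(self, name: str) -> None:
        self._require_writable()
        self.files.discard(name)

    def delete_node(self, name: str) -> None:
        del self.nodes[name]
        if not self.nodes:
            # Deleting the last Cassandra node loses every stored file.
            self.files.clear()


fabric = CassandraFabric({"cass1": True, "cass2": True})
fabric.upload("image.tgz")
fabric.nodes["cass2"] = False        # cass2 goes down
try:
    fabric.upload("other.tgz")
except RuntimeError as err:
    print(err)                       # a Cassandra node is down; delete it from the fabric first
fabric.delete_node("cass2")          # removing the failed node restores uploads
fabric.upload("other.tgz")
print(sorted(fabric.files))          # ['image.tgz', 'other.tgz']
```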