Configuring the Disaster Recovery Process Between an Active and a Standby Site

25-Mar-24


You configure disaster recovery between an active site and a standby site to ensure geographical redundancy of network management services.

Before you initiate the disaster recovery process between both sites, perform the following tasks:

- Ensure that the connectivity requirements described in the Disaster Recovery Overview topic are met.
- Check that identical cluster configurations exist at both sites. We recommend that both clusters have the same number of nodes so that, even in the case of a disaster, the standby site can operate with the same capacity as the active site.
- Ensure that the same versions of Junos Space Network Management Platform, high-level Junos Space applications, and device adapters are installed at both sites.
- Shut down the disaster recovery process configured on Junos Space Network Management Platform Release 14.1R3 and earlier before upgrading to Junos Space Network Management Platform Release 15.2R1 and configuring the new disaster recovery process. For more information, see Stopping the Disaster Recovery Process on Junos Space Network Management Platform Release 14.1R3 and Earlier.
  
  You cannot configure the new disaster recovery process unless you first stop the disaster recovery process that you set up on Release 14.1R3 or earlier. You do not need to perform this step on a clean installation of Junos Space Network Management Platform Release 15.2R1.
- Ensure that the same SMTP server configuration exists at both sites to receive e-mail alerts related to the disaster recovery process. You can add SMTP servers from the SMTP Servers task group in the Administration workspace. For more information about adding SMTP servers, see Adding an SMTP Server in the Junos Space Network Management Platform Workspaces Feature Guide.
- Copy a file containing the list of arbitrator devices (one IP address per row) in CSV format, or the custom failure-detection scripts, to the VIP node at the active site. You can refer to the sample files at /var/cache/jmp-geo/doc/samples/.
- Decide on the values for the following parameters depending on your network connectivity and disaster recovery requirements:
  - VIP address and password of both the active and standby sites
  - Backup, restoration, and Secure Copy Protocol (SCP) synchronization settings
  - Heartbeat time intervals
  - E-mail address of the administrator and the dampening interval, in seconds, during which the same errors are not reported again (to avoid an e-mail flood)
  - Failure-detection settings such as the failover threshold and the time during which the standby site stays standby if the arbiter devices are unreachable
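
Before you copy the arbiter list to the VIP node, it can help to confirm that the file really has one IP address per row. The following is a minimal sketch, not part of the jmp-dr tooling: the function name validate_arbiters_file is hypothetical, and the check assumes IPv4 addresses only, with no header row and no additional columns. Compare your file against the samples in /var/cache/jmp-geo/doc/samples/ for the authoritative format.

```shell
# Hypothetical helper (not shipped with Junos Space): succeed only if every
# non-blank row of the file is a dotted-quad IPv4 address with octets 0-255.
validate_arbiters_file() {
    file=$1
    [ -r "$file" ] || { echo "cannot read $file"; return 1; }
    awk -F. '
        /^[[:space:]]*$/ { next }              # ignore blank rows
        {
            ok = (NF == 4)                     # exactly four octets
            for (i = 1; ok && i <= 4; i++)
                if ($i !~ /^[0-9]+$/ || $i + 0 > 255) ok = 0
            if (!ok) {
                printf "row %d is not an IPv4 address: %s\n", NR, $0
                bad = 1
            }
        }
        END { exit bad }                       # nonzero if any row failed
    ' "$file"
}
```

For example, `validate_arbiters_file /home/admin/user1` prints each offending row and returns a nonzero status if any row is malformed.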

The following sections explain how to configure disaster recovery at the active and standby sites and initiate the disaster recovery between both sites.

Configuring Disaster Recovery at the Active Site

You use the jmp-dr init -a command to configure disaster recovery at the active site. You need to enter values for the parameters that are displayed. The values you enter here are saved in a configuration file.

To configure disaster recovery at the active site:

Log in to the CLI of the Junos Space node at the active site on which the VIP or the eth0:0 interface is configured.
The Junos Space Settings Menu is displayed.

Enter 7 (if you are using a virtual appliance) at the Junos Space Settings Menu prompt to run shell commands.

The following is a sample output from a virtual appliance:

```
admin@10.206.41.183's password:
Last login: Mon Aug 17 06:17:58 2015 from 10.206.41.42

Welcome to the Junos Space network settings utility.

Initializing, please wait


Junos Space Settings Menu

1> Change Password
2> Change Network Settings
3> Change Time Options
4> Retrieve Logs
5> Security
6> Expand VM Drive Size
7> (Debug) run shell

A> Apply changes
Q> Quit
R> Redraw Menu

Choice [1-7,AQR]: 7
```

You are prompted to enter the administrator password.

Enter the administrator password.
Enter jmp-dr init -a at the shell prompt.
The values you need to input to configure disaster recovery at the active site are displayed.

The Load Balancers part of the disaster recovery configuration file is displayed.
Enter the values for the parameters displayed:
1. Enter the VIP address of the standby site and press Enter.
2. Enter the administrator passwords of the load-balancer nodes at the standby site and press Enter.
  You can enter multiple passwords separated with commas.
  
  If multiple nodes use a common password, you need to enter the password only once.
3. Enter the timeout value to detect a failure in transferring files through SCP from the active site to the standby site, in seconds, and press Enter.
  The minimum and default value is 120.
4. Enter the maximum number of backups to retain at the active site and press Enter.
  The minimum and default value is 3.
5. Enter the times of the day to back up files (in hours) at the active site, separated with commas, and press Enter.
  You can enter any value from 0 through 23. You can also enter * to back up files every hour.
6. Enter the days of the week to back up files at the active site, separated with commas, and press Enter.
  You can enter any value from 0 through 6, where Sunday equals zero. You can also enter * to back up files every day.
7. Enter the times of the day to copy files (in hours) from the active site to the standby site, separated with commas, and press Enter.
  You can enter any value from 0 through 23. You can also enter * to poll files every hour.
8. Enter the days of the week to copy files from the active site to the standby site, separated with commas, and press Enter.
  You can enter any value from 0 through 6, where Sunday equals zero. You can also enter * to poll files every day.
The following is a sample output:
```
#########################
#
# Load Balancers
#
#########################

What's the vip for load balancers at the standby site? 10.206.41.225
What are the unique admin passwords for load balancer nodes at the standby site (separated by comma, no space)? $ABC123
What's the scp timeout value (seconds)? 120

# backup for data in file system instead of DB

What's the max number of backup files to keep? 3
What are the times of the day to run file backup (0-23)? 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23
What are the days of the week to run file backup (0-6)? 0,1,2,3,4,5,6

# restore for data in file system instead of DB

What are the times of the day to poll files from the active site (0-23)? 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23
What are the days of the week to poll files from the active site (0-6)? 0,1,2,3,4,5,6
```
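
The backup and copy schedules above take either * or a comma-separated list of integers within a fixed range (0 through 23 for hours, 0 through 6 for days). The sketch below shows one way to check a value before you type it at the prompt. The function name validate_schedule is hypothetical, and rejecting spaces is an assumption based on the "no space" wording of the password prompt; jmp-dr performs its own validation, which may differ.

```shell
# Hypothetical helper: accept "*" or a comma-separated list of integers
# between min ($2) and max ($3) inclusive, with no spaces.
validate_schedule() {
    value=$1
    min=$2
    max=$3
    [ "$value" = "*" ] && return 0
    # Reject empty input, characters other than digits and commas,
    # and leading, trailing, or doubled commas.
    case $value in
        ''|*[!0-9,]*|,*|*,|*,,*) return 1 ;;
    esac
    old_ifs=$IFS
    IFS=,
    for item in $value; do
        if [ "$item" -lt "$min" ] || [ "$item" -gt "$max" ]; then
            IFS=$old_ifs
            return 1
        fi
    done
    IFS=$old_ifs
    return 0
}
```

For example, `validate_schedule "0,6,12,18" 0 23` succeeds for an hour list, and `validate_schedule "7" 0 6` fails because days of the week run from 0 (Sunday) through 6.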
When you enter the values for all parameters, the DR Watchdog part of the disaster recovery configuration file is displayed.

Enter values for the parameters displayed:

1. Enter the number of times the active site should send heartbeat messages to the standby site through ping after a heartbeat message times out and press Enter.
  The minimum and default value is 4.
2. Enter the timeout value of each heartbeat message, in seconds, and press Enter.
  The minimum and default value is 5.
3. Enter the time interval between two consecutive heartbeat messages to the standby site, in seconds, and press Enter.
  The minimum and default value is 30.
4. Enter the e-mail address of the administrator to whom e-mail messages about disaster recovery service issues must be sent and press Enter.
5. Enter the time interval during which the same issues are not reported through e-mail (dampening interval), in seconds, and press Enter.
  The default value is 3,600. The minimum value is 300.
6. Specify the failure-detection mechanism.
  If you intend to use a custom failure-detection script:
  - Enter Yes in the failureDetection section and press Enter.
  If you intend to use the device arbitration algorithm:
  1. Enter No in the failureDetection section and press Enter.
  2. Enter the threshold percentage to trigger a failover to the standby site by using the device arbitration algorithm and press Enter.
    
    You can enter any value from 0 to 1. The default value is 0.5.
7. Enter the path of the file containing the arbiter devices or the custom failure-detection scripts and press Enter.

The following is a sample output:

```
#########################
#
# DR Watchdog
#
#########################


# heartbeat

What's the number of times to retry heartbeat message? 4
What's the timeout of each heartbeat message (seconds)? 5
What's the heartbeat message interval between sites (seconds)? 30

# notification

What's the contact email address of service issues? user1@example.com
What's the dampening interval between emails of affected services (seconds)? 300

# failureDetection

Do you want to use custom failure detection? No
What's the threshold percentage to trigger failover? 0.5
What's the arbiters list file (note: please refer to example in /var/cache/jmp-geo/doc/samples/arbiters.list)? /home/admin/user1
Check status of DR remote site: up
Prepare /var/cache/jmp-geo/incoming                                                                   [ OK ]
Configure contact email                                                                               [ OK ]
Modify firewall for DR remote IPs                                                                     [ OK ]
Configure NTP                                                                                         [ OK ]
Configure MySQL database                                                                              [ OK ]
Configure PostgreSQL database                                                                         [ OK ]
Copy files to DR slave                                                                                [ OK ]
Command completed.
```

When you have entered values for all parameters, disaster recovery is initialized at the active site.
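
The DR Watchdog prompts have documented minimums (4 heartbeat retries, a 5-second heartbeat timeout, a 30-second heartbeat interval, and a 300-second dampening interval) and a failover threshold between 0 and 1. The following sketch collects those limits in one pre-check you could run on your planned values. The function check_watchdog_values and its argument order are hypothetical, and the function is not part of jmp-dr.

```shell
# Hypothetical pre-check for planned DR Watchdog answers. The limits are the
# minimums stated in this topic; jmp-dr itself may validate differently.
# Arguments: retries timeout interval dampening threshold
check_watchdog_values() {
    retries=$1
    timeout=$2
    interval=$3
    dampening=$4
    threshold=$5
    [ "$retries" -ge 4 ] \
        || { echo "heartbeat retries must be at least 4"; return 1; }
    [ "$timeout" -ge 5 ] \
        || { echo "heartbeat timeout must be at least 5 seconds"; return 1; }
    [ "$interval" -ge 30 ] \
        || { echo "heartbeat interval must be at least 30 seconds"; return 1; }
    [ "$dampening" -ge 300 ] \
        || { echo "dampening interval must be at least 300 seconds"; return 1; }
    # The threshold is a fraction, so compare it with awk instead of test.
    awk -v t="$threshold" 'BEGIN { exit !(t >= 0 && t <= 1) }' \
        || { echo "failover threshold must be between 0 and 1"; return 1; }
}
```

For example, `check_watchdog_values 4 5 30 3600 0.5` returns success for the default values, whereas a dampening interval of 120 is reported as too small.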

Configuring Disaster Recovery at the Standby Site

You use the jmp-dr init -s command to configure disaster recovery at the standby site. You need to enter values for the parameters that are displayed. The values you enter here are saved in a configuration file. If the standby site becomes the active site, it uses by default the failure-detection mechanism that you configured at the active site, together with the values that you entered for file backup and restoration, heartbeat, and notifications.

To configure disaster recovery at the standby site:

Log in to the CLI of the Junos Space node at the standby site on which the VIP or the eth0:0 interface is configured.
The Junos Space Settings Menu is displayed.
Enter 7 (if you are using a virtual appliance) at the Junos Space Settings Menu prompt to run shell commands.
You are prompted to enter the administrator password.
Enter the administrator password.
Enter jmp-dr init -s at the shell prompt.
The values you need to input to configure disaster recovery at the standby site are displayed.

The Load Balancers part of the disaster recovery configuration file is displayed.
The script asks whether you have reinitialized the DR active site, that is, whether you have run jmp-dr init -a --skip-user-config at the DR active site. Select Yes or No accordingly.
Enter the values for the parameters displayed:
1. Enter the VIP address of the active site and press Enter.
2. Enter the administrator passwords of the load-balancer nodes at the active site and press Enter.
  You can enter multiple passwords separated with commas.
  
  If multiple nodes use a common password, you need to enter the password only once.
3. Enter the timeout value to detect a failure in transferring files through SCP from the standby site to the active site, in seconds, and press Enter.
  The minimum and default value is 120.
4. Enter the maximum number of backups to retain at the standby site and press Enter.
  The minimum and default value is 3.
5. Enter the times of the day to back up files (in hours) at the standby site, separated with commas, and press Enter.
  You can enter any value from 0 through 23. You can also enter * to back up files every hour.
6. Enter the days of the week to back up files at the standby site, separated with commas, and press Enter.
  You can enter any value from 0 through 6, where Sunday equals zero. You can also enter * to back up files every day.
7. Enter the times of the day to copy files (in hours) from the standby site to the active site (when failed over to the standby site), separated with commas, and press Enter.
  You can enter any value from 0 through 23. You can also enter * to restore files every hour.
8. Enter the days of the week to copy files from the standby site to the active site (when failed over to the standby site), separated with commas, and press Enter.
  You can enter any value from 0 through 6, where Sunday equals zero. You can also enter * to restore files every day.
The following is a sample output:
```
#########################
#
# Load Balancers
#
#########################

What's the vip for load balancers at the active site? 10.206.41.220
What are the unique admin passwords for load balancer nodes at the active site (separated by comma, no space)? $ABC123
What's the scp timeout value (seconds)? 120

# backup for data in file system instead of DB

What's the max number of backup files to keep? 3
What are the times of the day to run file backup (0-23)? 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23
What are the days of the week to run file backup (0-6)? 0,1,2,3,4,5,6

# restore for data in file system instead of DB

What are the times of the day to poll files from the active site (0-23)? 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23
What are the days of the week to poll files from the active site (0-6)? 0,1,2,3,4,5,6
```
When you enter the values for all parameters, the DR Watchdog part of the disaster recovery configuration file is displayed.

Enter the values for the parameters displayed.

1. Enter the number of times the standby site should send heartbeat messages to the active site through ping after a heartbeat message times out and press Enter.
  The minimum and default value is 4.
2. Enter the timeout value for each heartbeat message, in seconds, and press Enter.
  The minimum and default value is 5.
3. Enter the time interval between two consecutive heartbeat messages to the active site, in seconds, and press Enter.
  The minimum and default value is 30.
4. Enter the e-mail address of the administrator to whom e-mail messages about disaster recovery service issues must be sent and press Enter.
5. Enter the time interval during which the same issues are not reported through e-mail (dampening interval), in seconds, and press Enter.
  The default value is 3,600. The minimum value is 300.

The following is a sample output:

```
#########################
#
# DR Watchdog
#
#########################


# heartbeat

What's the number of times to retry heartbeat message? 4
What's the timeout of each heartbeat message (seconds)? 5
What's the heartbeat message interval between sites (seconds)? 30

# notification

What's the contact email address of service issues? user1@example.com
What's the dampening interval between emails of affected services (seconds)? 300
Check status of DR remote site: up
Load /var/cache/jmp-geo/incoming/init.properties                                                      [ OK ]
Configure contact email                                                                               [ OK ]
Modify firewall for DR remote IPs                                                                     [ OK ]
Configure NTP                                                                                         [ OK ]
Sync jmp-geo group                                                                                    [ OK ]
Configure MySQL database                                                                              [ OK ]
Configure PostgreSQL database                                                                         [ OK ]
Command completed.
```

When you have entered values for all parameters, disaster recovery is initialized at the standby site.

Starting the Disaster Recovery Process

You use the jmp-dr start command to start the disaster recovery process at both sites. You can also use the jmp-dr start -a command to start the disaster recovery process on the active site only and the jmp-dr start -s command to start the disaster recovery process on the standby site only.

Note:

If you start disaster recovery from the active site through the GUI by choosing the both option, and the operation completes without starting disaster recovery at the standby site because of a network or environmental issue, apply the following workaround:

Workaround: Log in to the CLI of the VIP node at the standby site and run the jmp-dr start -s command.
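
The three start variants can be summarized as a small lookup. The sketch below only prints the command that applies to a given choice and does not run it. The function name dr_start_command and the keywords both, active, and standby are hypothetical labels for illustration, and the spaced -a and -s forms follow the jmp-dr start -s usage shown in the workaround.

```shell
# Hypothetical lookup: print (do not run) the jmp-dr command for a site choice.
dr_start_command() {
    case $1 in
        both)    echo "jmp-dr start" ;;       # start DR at both sites
        active)  echo "jmp-dr start -a" ;;    # start DR at the active site only
        standby) echo "jmp-dr start -s" ;;    # start DR at the standby site only
        *)       echo "usage: dr_start_command both|active|standby" >&2
                 return 1 ;;
    esac
}
```

Run the printed command on the VIP node of the corresponding site. For the workaround above, that is `dr_start_command standby` on the standby site VIP node.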

To start the disaster recovery process:

Log in to the CLI of the Junos Space node at the active site on which the VIP or the eth0:0 interface is configured.
The Junos Space Settings Menu is displayed.
Enter 7 (if you are using a virtual appliance) at the Junos Space Settings Menu prompt to run shell commands.
You are prompted to enter the administrator password.
Enter the administrator password.

Enter jmp-dr start at the shell prompt.

The disaster recovery process is initiated on both sites.

The following is a sample output at the active site:

```
[user1@host]# jmp-dr start
Stop dr-watchdog if it's running                                                                      [ OK ]
Check status of DR remote site: up
Check current DR role: active

INFO: => start DR at current site: active 

Add device management IPs of DR remote site to up devices                                             [ OK ]
Setup MySQL replication: master-master                                                                [ OK ]
Start MySQL dump                                                                                      [ OK ]
Setup PostgreSQL replication                                                                          [ OK ]
Start file & RRD replication                                                                          [ OK ]
Open firewall for device traffic                                                                      [ OK ]
Start services(jboss,jboss-dc,etc.)                                                                   [ OK ]
Start dr-watchdog                                                                                     [ OK ]
Copy files to DR slave site                                                                           [ OK ]
Update DR role of current site: active                                                                [ OK ]

INFO: => start DR at DR remote site: standby 

Stop dr-watchdog if it's running                                                                      [ OK ]
Check status of DR remote site: up
Check current DR role: standby
Load /var/cache/jmp-geo/incoming/start.properties                                                     [ OK ]
Stop services(jboss,jboss-dc,etc.)                                                                    [ OK ]
Block firewall for device traffic                                                                     [ OK ]
Reset MySQL init script and stop replication                                                          [ OK ]
Scp backup file from peer site: /var/cache/jmp-geo/data/db.gz                                         [ OK ]
Start MySQL restore                                                                                   [ OK ]
Setup MySQL replication and start replication                                                         [ OK ]
Setup PostgreSQL replication                                                                          [ OK ]
Start files & RRD replication                                                                         [ OK ]
Start dr-watchdog                                                                                     [ OK ]
Clean up /var/cache/jmp-geo/incoming                                                                  [ OK ]
Update DR role of current site: standby                                                               [ OK ]
Command completed.
Command completed.
```

The disaster recovery process is initialized on the active site and the standby site.

Verifying the Status of the Disaster Recovery Process

We recommend that you execute the jmp-dr health command to verify the status (overall health) of the disaster recovery process at both the active and standby sites after you start the disaster recovery process on both sites. For more information about executing the jmp-dr health command, see Checking the Status of the Disaster Recovery Configuration.
