Transitioning a Conductor from Standalone to HA
This guide provides a procedure for transitioning a standalone SSR Conductor to an HA SSR Conductor.
This example starts with a standalone Conductor named Conductor
and one node named T104_DUT1
, along with a single managed Router named Router
, which has two nodes T104_DUT3
and T104_DUT4
:
admin@T104_DUT1.Conductor# show assets
Wed 2019-12-04 18:27:59 UTC
=========== =========== =========== ============== ========= ========
Router Node Asset Id SSR Version Status Errors
=========== =========== =========== ============== ========= ========
Conductor T104_DUT1 T104_DUT1 5.6.9-3.el7 running 0
Router T104_DUT3 T104_DUT3 5.6.9-3.el7 running 0
T104_DUT4 T104_DUT4 5.6.9-3.el7 running 0
Completed in 0.14 seconds
This is the starting basic configuration on the standalone Conductor node:
admin@T104_DUT1.Conductor# configure authority router Conductor
admin@T104_DUT1.Conductor (router[name=Conductor])# show
name Conductor
node T104_DUT1
name T104_DUT1
asset-id T104_DUT1
role conductor
exit
The first step is to configure the second Conductor node. If you start with the same basic configuration for the second node then you will see warnings below when you attempt to validate. The first warning states that a non-forwarding interfaces between the Conductor nodes needs to be configured to establish connectivity. For more information regarding non-forwarding interfaces please refer to this blog: Configuring Non Forwarding Interfaces. The second warning states that SSR needs to be restarted after creating a new node. This will be addressed later.
admin@T104_DUT1.Conductor# configure authority router Conductor
admin@T104_DUT1.Conductor (router[name=Conductor])# show
name Conductor
node T104_DUT1
name T104_DUT1
asset-id T104_DUT1
role conductor
exit
node T104_DUT2
name T104_DUT2
asset-id T104_DUT2
role conductor
exit
admin@T104_DUT1.Conductor# validate
Validating...
Warnings:
1. Router 'Conductor' is missing a non-forwarding fabric or shared interface on each node. The router may not have inter node
connectivity unless the interfaces were set up manually.
reported by router 'Conductor'
config
authority
router Conductor
2. The field 'name' cannot be created while the system is running. To apply the configuration, 'T104_DUT1.Conductor' and
'T104_DUT2.Conductor' must be restarted after committing this change.
reported by router 'Conductor'
config
authority
router Conductor
node T104_DUT2
Candidate configuration is valid
While the conductor is not a router, it shares the same configuration model as the router and therefore is configured as a router
.
Next, configure two non-forwarding interfaces so connectivity can be established between both Conductor nodes. Once the configuration has been created, you should see that validation warning go away:
admin@T104_DUT1.Conductor# configure authority router Conductor
admin@T104_DUT1.Conductor (router[name=Conductor])# show
name Conductor
node T104_DUT1
name T104_DUT1
asset-id T104_DUT1
device-interface mgmt
name mgmt
type ethernet
pci-address 0000:00:04.0
forwarding false
network-interface ha-fabric
name ha-fabric
type fabric
address 172.16.1.1
ip-address 172.16.1.1
prefix-length 24
gateway 172.16.1.201
exit
exit
exit
exit
node T104_DUT2
name T104_DUT2
asset-id T104_DUT2
role conductor
device-interface mgmt
name mgmt
type ethernet
pci-address 0000:00:04.0
forwarding false
network-interface ha-fabric
name ha-fabric
type fabric
address 172.16.1.2
ip-address 172.16.1.2
prefix-length 24
gateway 172.16.1.201
exit
exit
exit
exit
admin@T104_DUT1.Conductor (router[name=Conductor])# validate
Validating...
Warnings:
1. The field 'name' cannot be created while the system is running. To apply the configuration, 'T104_DUT1.Conductor' and
'T104_DUT2.Conductor' must be restarted after committing this change.
reported by router 'Conductor'
config
authority
router Conductor
node T104_DUT2
Candidate configuration is valid
Commit the configuration, then stop SSR on the standalone Conductor node by dropping into the Linux shell and executing systemctl stop 128T
. Wait for SSR to completely stop. The user can monitor the status of the service with systemctl status 128T
.
Once the SSR is fully stopped, the user needs to make changes to the salt minion configuration file. Since the Conductor was installed as a Standalone Conductor, the salt minion config is initialized with the loopback address. The user need to fix it manually to be the IP address of our management interface because when the second Conductor node is installed, it uses the IP address in these files to learn the management IP address of its peer node. Here is what the salt minion config look like before any modification:
[root@t104-dut1 ~]# cat /etc/salt/minion
auth_safemode: True
autoload_dynamic_modules: False
enable_legacy_startup_events: False
include: /usr/lib/128technology/python/salt/file_roots/_beacons/*.conf
log_level_logfile: debug
master:
- 127.0.0.1
master_alive_interval: 30
master_tries: -1
ping_interval: 1
random_reauth_delay: 120
recon_default: 5000
recon_max: 30000
recon_randomize: True
tcp_authentication_retries: -1
tcp_keepalive_cnt: 3
tcp_keepalive_idle: 5
tcp_keepalive_intvl: 10
transport: tcp
Now here is what it looks like after the management IP address has been properly updated:
auth_safemode: True
autoload_dynamic_modules: False
enable_legacy_startup_events: False
include: /usr/lib/128technology/python/salt/file_roots/_beacons/*.conf
log_level_logfile: debug
master:
- 172.16.1.1
master_alive_interval: 30
master_tries: -1
ping_interval: 1
random_reauth_delay: 120
recon_default: 5000
recon_max: 30000
recon_randomize: True
tcp_authentication_retries: -1
tcp_keepalive_cnt: 3
tcp_keepalive_idle: 5
tcp_keepalive_intvl: 10
transport: tcp
At this point, the first Conductor node is still stopped and it is time to install the second Conductor node. The user will launch the interactive installer to install the second node. You need to double check that the node name entered in the interactive installer matches the node name in the configuration. Wait for the interactive installer to finish then pick yes when it prompts the user to reboot. While the second node is rebooting, this is a good time to double check the global.init file and salt minion configuration file on the first Conductor node have been modified and the second Conductor node has been added:
When installing a new HA node (conductor or router), ensure that the same version is being installed on both nodes.
[root@t104-dut1 ~]# cat /etc/128technology/global.init
{
"init" : {
"control" : {},
"conductor" : {
"T104_DUT1" : {
"host" : "172.16.1.1"
},
"T104_DUT2" : {
"host" : "172.16.1.2"
}
},
"routerName" : "Conductor"
}
[root@t104-dut1 ~]# cat /etc/salt/minion
auth_safemode: True
autoload_dynamic_modules: False
enable_legacy_startup_events: False
include: /usr/lib/128technology/python/salt/file_roots/_beacons/*.conf
log_level_logfile: debug
master:
- 172.16.1.1
- 172.16.1.2
master_alive_interval: 30
master_tries: -1
ping_interval: 1
random_reauth_delay: 120
recon_default: 5000
recon_max: 30000
recon_randomize: True
tcp_authentication_retries: -1
tcp_keepalive_cnt: 3
tcp_keepalive_idle: 5
tcp_keepalive_intvl: 10
transport: tcp
Now that the global.init and salt minion config look correct we can restart the salt-minion so it picks up the config changes and then start SSR. From the Linux shell first execute systemctl restart salt-minion
then systemctl start 128T
. Once they are up login and verify they both have connectivity to each other:
admin@T104_DUT1.Conductor# show system connectivity
Wed 2019-12-04 20:35:35 UTC
===================== ===================== ===========
Local Node Remote Node State
===================== ===================== ===========
T104_DUT1.Conductor T104_DUT2.Conductor connected
T104_DUT1.Conductor T104_DUT3.Router connected
T104_DUT1.Conductor T104_DUT4.Router connected
Completed in 0.11 seconds
Note that the second Conductor node will not have connectivity to any managed routers yet, we will address that next:
admin@T104_DUT2.Conductor# show system connectivity
Wed 2019-12-04 20:44:18 UTC
===================== ===================== ==============
Local Node Remote Node State
===================== ===================== ==============
T104_DUT2.Conductor T104_DUT1.Conductor connected
T104_DUT2.Conductor T104_DUT3.Router disconnected
T104_DUT2.Conductor T104_DUT4.Router disconnected
Completed in 0.12 seconds
In order for this procedure to work correctly, the user should have the conductor-address
field configured for their standalone Conductor, which is the public IP address of the Conductor:
admin@T104_DUT1.Conductor# configure authority
admin@T104_DUT1.Conductor (authority)# conductor-address
192.168.1.8
The next step is to configure the second Conductor node as well as a conductor-address
for the second Conductor node. This example shows the second conductor-address
being configured:
admin@T104_DUT1.Conductor (authority)# conductor-address 192.168.1.13
admin@T104_DUT1.Conductor (authority)# conductor-address
192.168.1.8
192.168.1.13
Adding the second conductor address will also produce a new validation warning, which is expected:
admin@T104_DUT1.Conductor (router[name=Conductor])# validate
Validating...
Warnings:
1. Incorrectly modifying this field can result in permanent loss of connectivity to managed routers and will require manual intervention
to fix. Make sure the configured address is valid and reachable before committing this change
reported by router 'Conductor'
config
authority
conductor-address
Candidate configuration is valid
Commit this configuration and you see all managed assets disconnect and reconnect because the new Conductor IP address was added and the minion was restarted:
admin@T104_DUT1.Conductor# show assets
Wed 2019-12-04 19:00:17 UTC
=========== =========== =========== ============== ============== ========
Router Node Asset Id SSR Version Status Errors
=========== =========== =========== ============== ============== ========
Conductor T104_DUT1 T104_DUT1 5.6.9-3.el7 running 0
T104_DUT2 T104_DUT2 5.6.9-3.el7 running 0
Router T104_DUT3 T104_DUT3 None connected 0
T104_DUT4 T104_DUT4 None connected 0
Completed in 0.14 seconds
Wait until all assets transition back into the running
state:
admin@T104_DUT1.Conductor# show assets
Wed 2019-12-04 19:01:59 UTC
=========== =========== =========== ============== ============== ========
Router Node Asset Id SSR Version Status Errors
=========== =========== =========== ============== ============== ========
Conductor T104_DUT1 T104_DUT1 5.6.9-3.el7 running 0
T104_DUT2 T104_DUT2 5.6.9-3.el7 running 0
Router T104_DUT3 T104_DUT3 5.6.9-3.el7 running 0
T104_DUT4 T104_DUT4 5.6.9-3.el7 running 0
Completed in 0.16 seconds
There is no need to restart SSR on any managed routers all of them will properly recognize the new Conductor!
admin@T104_DUT3.Router# show system connectivity
Sun 2019-12-08 14:26:10 UTC
============ ===================== ===========
Local Node Remote Node State
============ ===================== ===========
T104_DUT3 T104_DUT1.Conductor connected
T104_DUT3 T104_DUT2.Conductor connected
T104_DUT3 T104_DUT4.Router connected
Completed in 0.04 seconds