Upgrading Contrail Networking to Release 21.4.L3 Using the Ansible Deployer In-Service Software Upgrade Procedure in an OpenStack Environment
When to Use This Procedure?
Before performing any upgrade procedure, upgrade Docker serially on the hosts that run containers. You can, however, upgrade Docker on the compute nodes in parallel using the script below. After upgrading Docker on each host, verify the status of the Contrail services. Do not proceed with the upgrade on the next hosts until contrail-status reports all services as running properly.
Use the following script to stop the running containers, upgrade Docker, and bring the containers back up:
docker ps --format '{{.Names}}' > running_containers
for CONTAINER in $(cat running_containers); do sudo docker stop $CONTAINER; done
yum install -y docker-ce-20.10.9 docker-ce-cli-20.10.9 docker-ce-rootless-extras-20.10.9
for CONTAINER in $(cat running_containers); do sudo docker start $CONTAINER; done
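After Docker comes back up on each host, a quick verification pass might look like the following (a sketch; it assumes the contrail-status tool is present on the host):
# Inspect the overall status; proceed to the next host only when all services report as running.
contrail-status
# Optionally, surface anything that is not active/running/backup (header lines will also match).
contrail-status | grep -Ev 'active|running|backup|^==|^$'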
We recommend using the Zero Impact Upgrade (ZIU) procedure to upgrade Contrail Networking with minimal network disruption in most environments that use OpenStack orchestration.
To perform a ZIU upgrade, follow the instructions in How to Perform a Zero Impact Contrail Networking Upgrade using the Ansible Deployer. If you are running Red Hat OpenStack 13 or 16.1, see Updating Contrail Networking using the Zero Impact Upgrade Process in an Environment using Red Hat OpenStack 13 or Updating Contrail Networking using the Zero Impact Upgrade Process in an Environment using Red Hat OpenStack 16.1.
The procedure in this document provides another method of upgrading Contrail Networking with minimal network disruption using the Ansible deployer in environments that use OpenStack orchestration.
The procedure in this document has been validated to upgrade Contrail Networking from Release 3.2 or later to Release 5.0 or later. The starting Contrail release for this upgrade can be any Contrail Networking Release after Release 3.2, including all Contrail Networking 4, 5, 19, 20, and 21 releases. The target release for this upgrade can be any Contrail Networking Release after Release 5.0, including all Contrail Networking 5, 19, 20, and 21 releases.
Starting Contrail Networking Release | Target Upgraded Contrail Networking Release
---|---
Release 3.2 or Later, Any Release 4, Any Release 5, Any Release 19, Any Release 20, Any Release 21 | Any Release 5, Any Release 19, Any Release 20, Any Release 21
Contrail In-Service Software Upgrade (ISSU) Overview
If your installed version is Contrail Release 3.2 or later, you can use the Ansible deployer to perform an in-service software upgrade (ISSU). During the ISSU, the Contrail controller cluster is upgraded side by side with a parallel setup, and the compute nodes are upgraded in place.
We recommend that you take snapshots of your current system before you proceed with the upgrade process.
The procedure for performing the ISSU using the Contrail Ansible deployer is similar to previous ISSU upgrade procedures.
This Contrail ansible deployer ISSU procedure does not include steps for upgrading OpenStack. If an OpenStack version upgrade is required, it should be performed using applicable OpenStack procedures.
In summary, the ISSU process consists of the following parts, in sequence:
Deploy the new cluster.
Synchronize the new and old clusters.
Upgrade the compute nodes.
Finalize the synchronization and complete the upgrades.
Prerequisites
The following prerequisites are required to use the Contrail ansible deployer ISSU procedure:
A previous version of Contrail installed, no earlier than Release 3.2.
The deployment has separate OpenStack controller and compute nodes, as well as Contrail nodes.
OpenStack needs to have been installed from packages.
Contrail and OpenStack should be installed on different nodes.
Upgrading compute nodes running Ubuntu 14.04 is not supported; upgrade those compute nodes to Ubuntu 16.04 first.
Preparing the Contrail System for the Ansible Deployer ISSU Procedure
In summary, these are the general steps for the system preparation phase of the Contrail ansible deployer ISSU procedure:
- Deploy the new version of Contrail using the Contrail ansible deployer, but make sure to include only the following Contrail controller services:
  - Config
  - Control
  - Analytics
  - Databases
  - Any additional support services such as rmq, kafka, and zookeeper. (The vrouter service will be deployed later on the old compute nodes.)
Note: You must provide keystone authorization information for setup.
- After deployment is finished, you can log into the Contrail web interface to verify that it works.
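A quick reachability check of the new web interface might look like the following (a sketch; it assumes the default Contrail web UI HTTPS port 8143, and the controller address is a placeholder):
curl -k -s -o /dev/null -w "%{http_code}\n" https://<new-controller-ip>:8143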
The detailed steps for deploying the new controller using the ansible deployer are as follows:
- To deploy the new controller, download contrail-ansible-deployer-release-tag.tgz onto your provisioning host from Juniper Networks.
- The new controller file config/instances.yaml appears as follows, with actual values in place of the variables shown in the example:
provider_config:
  bms:
    domainsuffix: local
    ssh_user: user
    ssh_pwd: password
instances:
  server1:
    ip: <controller 1 ip>
    provider: bms
    roles:
      analytics: null
      analytics_database: null
      config: null
      config_database: null
      control: null
      webui: null
contrail_configuration:
  CONTROLLER_NODES: <controller ip-s from api/mgmt network>
  CONTROL_NODES: <controller ip-s from ctrl/data network>
  AUTH_MODE: keystone
  KEYSTONE_AUTH_ADMIN_TENANT: <old controller's admin tenant>
  KEYSTONE_AUTH_ADMIN_USER: <old controller's admin user name>
  KEYSTONE_AUTH_ADMIN_PASSWORD: <password for admin user>
  KEYSTONE_AUTH_HOST: <keystone host/ip of old controller>
  KEYSTONE_AUTH_URL_VERSION: "/v3"
  KEYSTONE_AUTH_USER_DOMAIN_NAME: <user's domain in case of keystone v3>
  KEYSTONE_AUTH_PROJECT_DOMAIN_NAME: <project's domain in case of keystone v3>
  RABBITMQ_NODE_PORT: 5673
  IPFABRIC_SERVICE_HOST: <metadata service host/ip of old controller>
  AAA_MODE: cloud-admin
  METADATA_PROXY_SECRET: <secret phrase that is used in old controller>
kolla_config:
  kolla_globals:
    kolla_internal_vip_address: <keystone host/ip of old controller>
    kolla_external_vip_address: <keystone host/ip of old controller>
- Finally, run the ansible playbooks to deploy the new controller.
ansible-playbook -v -e orchestrator=none -i inventory/ playbooks/configure_instances.yml
ansible-playbook -v -e orchestrator=openstack -i inventory/ playbooks/install_contrail.yml
After successful completion of these commands, the new controller should be up and running.
Provisioning Control Nodes and Performing Synchronization Steps
In summary, these are the general steps for the node provisioning and synchronization phase of the Contrail ansible deployer ISSU procedure:
Provision new control nodes in the old cluster and old control nodes in the new cluster.
Stop the following containers in the new cluster on all nodes:
contrail-device-manager
contrail-schema-transformer
contrail-svcmonitor
Switch the new controller into maintenance mode to prevent provisioning computes in the new cluster.
Prepare the config file for the ISSU.
Run the pre-sync script from the ISSU package.
Run the run-sync script from the ISSU package in background mode.
The detailed steps to provision the control nodes and perform the synchronization are as follows:
- Pair the old control nodes in the new cluster. We recommend running the provisioning command from any config-api container; first, capture the config-api container ID:
config_api_image=`docker ps | awk '/config-api/{print $1}' | head`
- Run the following command for each old control node, substituting actual values where indicated:
docker exec -it $config_api_image /bin/bash -c "LOG_LEVEL=SYS_NOTICE source /common.sh ; python /opt/contrail/utils/provision_control.py --host_name <hostname of old control node> --host_ip <IP of old control node> --api_server_ip $(hostname -i) --api_server_port 8082 --oper add --router_asn 64512 --ibgp_auto_mesh \$AUTH_PARAMS"
- Pair the new control nodes in the old cluster with similar commands (the specific syntax depends on the deployment method of the old cluster), again substituting actual values where indicated.
python /opt/contrail/utils/provision_control.py --host_name <new controller hostname> --host_ip <new controller IP> --api_server_ip <old api-server IP/VIP> --api_server_port 8082 --oper add --admin_user admin --admin_password <password> --admin_tenant_name admin --router_asn 64512 --ibgp_auto_mesh
- Stop all the containers for contrail-device-manager, contrail-schema-transformer, and contrail-svcmonitor in the new cluster on all controller nodes.
docker stop config_devicemgr_1
docker stop config_schema_1
docker stop config_svcmonitor_1
- Perform the following steps to delete the contrail-device-manager queue from Contrail rabbitmq after the contrail-device-manager container is stopped.
Note: Run the commands listed in this step from only one new controller.
Enter the Contrail rabbitmq container.
docker exec -it config_rabbitmq_rabbitmq_1 bash
- Find the name of the contrail-device-manager queue.
rabbitmqctl list_queues | grep -F device_manager | grep $(hostname) | grep -v ztp
- Delete the contrail-device-manager queue.
rabbitmqctl delete_queue <device_manager.queue>
Perform the next steps from any new controller. First, prepare the configuration for the ISSU run. (For now, only manual preparation is available.)
In various deployments, the old Cassandra may use port 9160 or 9161. You can find the configuration details for the old services on any old controller node, in the file /etc/contrail/contrail-api.conf.
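To confirm which port the old Cassandra actually uses, a quick check on an old controller node might look like this (a sketch; it assumes the grep and ss utilities are available):
grep -i cassandra /etc/contrail/contrail-api.conf
ss -lnt | grep -E ':9160|:9161'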
The configuration appears as follows and can be stored locally:
[DEFAULTS]
# details about old rabbit
old_rabbit_user = contrail
old_rabbit_password = ab86245f4f3640a29b700def9e194f72
old_rabbit_q_name = vnc-config.issu-queue
old_rabbit_vhost = contrail
old_rabbit_port = 5672
old_rabbit_address_list = <ip-addresses>
# details about new rabbit
# new_rabbit_user = rabbitmq
# new_rabbit_password = password
# new_rabbit_ha_mode =
new_rabbit_q_name = vnc-config.issu-queue
new_rabbit_vhost = /
new_rabbit_port = 5673
new_rabbit_address_list = <ip-addresses>
# details about other old/new services
old_cassandra_user = controller
old_cassandra_password = 04dc0540b796492fad6f7cbdcfb18762
old_cassandra_address_list = <ip-address>:9161
old_zookeeper_address_list = <ip-address>:2181
new_cassandra_address_list = <ip-address>:9161 <ip-address>:9161 <ip-address>:9161
new_zookeeper_address_list = <ip-address>:2181
# details about new controller nodes
new_api_info = {"<ip-address>": [("root"), ("password")], "<ip-address>": [("root"), ("password")], "<ip-address>": [("root"), ("password")]}
Detect the config-api image ID.
image_id=`docker images | awk '/config-api/{print $3}' | head -1`
Run the pre-synchronization.
docker run --rm -it --network host -v $(pwd)/contrail-issu.conf:/etc/contrail/contrail-issu.conf --entrypoint /bin/bash -v /root/.ssh:/root/.ssh $image_id -c "/usr/bin/contrail-issu-pre-sync -c /etc/contrail/contrail-issu.conf"
Run the run-synchronization.
docker run --rm --detach -it --network host -v $(pwd)/contrail-issu.conf:/etc/contrail/contrail-issu.conf --entrypoint /bin/bash -v /root/.ssh:/root/.ssh --name issu-run-sync $image_id -c "/usr/bin/contrail-issu-run-sync -c /etc/contrail/contrail-issu.conf"
Check the logs of the run-sync process. To do this, open the run-sync container.
docker exec -it issu-run-sync /bin/bash
cat /var/log/contrail/issu_contrail_run_sync.log
Transferring the Compute Nodes into the New Cluster
In summary, these are the general steps for the node transfer phase of the Contrail ansible deployer ISSU procedure:
Before transferring the compute nodes into the new cluster, make sure that Docker was successfully updated.
- Select the compute node(s) to transfer into the new cluster and identify the virtual machines running on them.
- Migrate the virtual machines (VMs) manually from one compute node to another. The steps are as follows:
Note: This procedure is useful when live migration cannot be done.
- Identify the VM to migrate and the host it runs on. Run the following command to stop the VM:
openstack server stop <vm-uuid>
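If needed, the host on which the VM currently runs can be identified with the standard OpenStack CLI, for example (a sketch; the column name assumes the default Nova extended attributes are enabled):
openstack server show <vm-uuid> -f value -c OS-EXT-SRV-ATTR:host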
- Identify the VM disk image location on the source compute node where the VM instance was launched. Usually, the disk image location is:
/var/lib/docker/volumes/nova_compute/_data/instances/<vm-UUID>
- Copy this directory to the destination compute node.
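For example, with rsync (a sketch; it assumes root SSH access between the compute nodes and that rsync is installed; the destination host is a placeholder):
rsync -avP /var/lib/docker/volumes/nova_compute/_data/instances/<vm-UUID> root@<destination-compute>:/var/lib/docker/volumes/nova_compute/_data/instances/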
- On the destination compute node, run the following command to change the ownership of this directory:
chown -R 42436:42436 /var/lib/docker/volumes/nova_compute/_data/instances/<vm-UUID>
- Update the nova database entry for this instance to point to the new host.
docker exec -it mariadb bash
mysql -u <username> -p<password> nova
update instances set node='<new host fqname>', host='<new hostname>' where uuid='<VM-UUID>';
Example:
(mariadb)[mysql@nodem1 /]$ mysql -u root -pcontrail123 nova
MariaDB [nova]> update instances
    -> set node='nodem2.englab.juniper.net', host='nodem2'
    -> where uuid='b7178be6-d4da-4074-9124-d246fa3a2105'
    -> ;
- Run the following command to start the VM:
openstack server start <vm-uuid>
- For Contrail Release 3.x, remove Contrail from the node(s) as follows:
  - Stop the vrouter-agent service.
  - Remove the vhost0 interface.
  - Switch the physical interface down, then up.
  - Remove the vrouter.ko module from the kernel.
- For Contrail Release 4.x and later, remove Contrail from the node(s) as follows:
  - Stop the agent container.
  - Restore the physical interface.
  - Update Docker:
docker ps --format '{{.Names}}' > running_containers
for CONTAINER in $(cat running_containers); do sudo docker stop $CONTAINER; done
yum install -y docker-ce-20.10.9 docker-ce-cli-20.10.9 docker-ce-rootless-extras-20.10.9
for CONTAINER in $(cat running_containers); do sudo docker start $CONTAINER; done
rm running_containers
  - Remove vrouter_vrouter-agent_1 and vrouter_nodemgr_1:
docker rm -f vrouter_vrouter-agent_1
docker rm -f vrouter_nodemgr_1
  - Stop vhost0:
ifdown vhost0
- Add the required node(s) to instances.yaml with the roles vrouter and openstack_compute_legacy.
- Run the Contrail ansible deployer to deploy the new vrouter and to configure the old compute service.
- All new compute nodes will have:
  - The collector setting pointed to the new Contrail cluster
  - The Control/DNS nodes pointed to the new Contrail cluster
  - The config-api setting in vnc_api_lib.ini pointed to the new Contrail cluster
- (Optional) Run a test workload on the transferred nodes to ensure the new vrouter-agent works correctly.
Follow these steps to roll back a compute node, if needed:
- Move the workload from the compute node.
- Stop the new Contrail containers.
- Ensure the network configuration has been successfully reverted (see the check sketched after this list).
- Deploy the previous version of Contrail using the deployment method for that version.
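A quick check that the network configuration has been reverted might look like the following (a sketch; the physical interface name is a placeholder and depends on your environment):
ip link show vhost0 || echo "vhost0 removed"
ip addr show <physical-interface>
ip route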
The detailed steps for transferring compute nodes into the new cluster are as follows:
After moving the workload from the chosen compute nodes, remove the previous version of contrail-agent. For example, for Ubuntu 16.04 with the vrouter-agent installed directly on the host, these are the steps to remove the previous contrail-agent:
# stop services
systemctl stop contrail-vrouter-nodemgr
systemctl stop contrail-vrouter-agent
# remove packages
apt-get purge -y contrail*
# restore original interfaces definition
cd /etc/network/interfaces.d/
cp 50-cloud-init.cfg.save 50-cloud-init.cfg
rm vrouter.cfg
# restart networking
systemctl restart networking.service
# remove old kernel module
rmmod vrouter
# maybe you need to restore default route
ip route add 0.0.0.0/0 via 10.0.10.1 dev ens3
For other kinds of deployments, remove the vrouter-agent and vrouter-agent-nodemgr containers and disable the vhost0 interface.
- The new instance should be added to instances.yaml with two roles: vrouter and openstack_compute_legacy. To avoid reprovisioning the compute node, set the maintenance mode to TRUE. For example:
instances:
  server10:
    ip: <compute 10 ip>
    provider: bms
    roles:
      vrouter:
        MAINTENANCE_MODE: TRUE
        VROUTER_ENCRYPTION: FALSE
      openstack_compute_legacy: null
Make sure that the instances.yaml node definitions include only the compute nodes you want to upgrade. All other nodes should be commented out.
- Run the ansible playbooks.
ansible-playbook -v -e orchestrator=none -e config_file=/root/contrail-ansible-deployer/instances.yaml playbooks/configure_instances.yml
ansible-playbook -v -e orchestrator=openstack -e config_file=/root/contrail-ansible-deployer/instances.yaml playbooks/install_contrail.yml
- The contrail-status output for the compute node appears as follows:
vrouter kernel module is PRESENT
== Contrail vrouter ==
nodemgr: active
agent: initializing (No Configuration for self)
- Restart contrail-control on all new controller nodes after the upgrade is complete:
docker restart control_control_1
- After upgrading the compute nodes, XMPP goes down due to an SSL handshake issue. Example:
vrouter kernel module is PRESENT
== Contrail vrouter ==
nodemgr: active
agent: initializing (XMPP:control-node:10.10.10.1, XMPP:dns-server:10.10.10.1 connection down, No Configuration for self)
The steps to bring up XMPP are as follows:
- Copy the following two files from the new control node to the upgraded compute node:
/etc/contrail/ssl/private/server-privkey.pem
/etc/contrail/ssl/certs/server.pem
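A copy operation might look like the following (a sketch; it assumes root SSH access from the control node to the compute node and that the destination directories mirror the source paths):
scp /etc/contrail/ssl/private/server-privkey.pem root@<upgraded-compute-node>:/etc/contrail/ssl/private/
scp /etc/contrail/ssl/certs/server.pem root@<upgraded-compute-node>:/etc/contrail/ssl/certs/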
- Restart the vrouter agent on the upgraded compute node.
docker restart vrouter_vrouter-agent_1
- Check the status of the new compute nodes by running contrail-status on them. All components should be active now. You can also check the new compute nodes by creating availability zones/aggregates containing them and running some test workloads to ensure everything operates correctly.
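Creating an aggregate/availability zone with the new compute nodes and scheduling a test instance onto it might look like this (a sketch; the zone, aggregate, image, flavor, and network names are placeholders):
openstack aggregate create --zone <new-az> upgraded-computes
openstack aggregate add host upgraded-computes <new-compute-hostname>
openstack server create --availability-zone <new-az> --image <image> --flavor <flavor> --network <network> test-vm-upgraded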
Finalizing the Contrail Ansible Deployer ISSU Process
Finalize the Contrail ansible deployer ISSU as follows:
- Stop the issu-run-sync container.
docker rm -f issu-run-sync
- Run the post synchronization commands.
docker run --rm -it --network host -v $(pwd)/contrail-issu.conf:/etc/contrail/contrail-issu.conf --entrypoint /bin/bash -v /root/.ssh:/root/.ssh --name issu-run-sync $image_id -c "/usr/bin/contrail-issu-post-sync -c /etc/contrail/contrail-issu.conf"
docker run --rm -it --network host -v $(pwd)/contrail-issu.conf:/etc/contrail/contrail-issu.conf --entrypoint /bin/bash -v /root/.ssh:/root/.ssh --name issu-run-sync $image_id -c "/usr/bin/contrail-issu-zk-sync -c /etc/contrail/contrail-issu.conf"
- Run the following commands on all the new controller nodes.
docker-compose -f /etc/contrail/config/docker-compose.yaml restart api
docker-compose -f /etc/contrail/config/docker-compose.yaml up -d
- Disengage maintenance mode and start all previously stopped containers. To do this, set the entry MAINTENANCE_MODE in instances.yaml to FALSE, then run the following command from the deployment node:
ansible-playbook -v -e orchestrator=openstack -i inventory/ playbooks/install_contrail.yml
During this step, only compute nodes should be included in the instances.yaml, and other nodes should be commented out.
- Clean up and remove the old Contrail controllers. Use the provision_issu.py script, called from the config-api container with the configuration file issu.conf. Replace the credential variables and API server IP with appropriate values as indicated.
[DEFAULTS]
db_host_info={"<ip-address>": "<node-ip-address or hostname>", "<ip-address>": "<node-ip-address or hostname>", "<ip-address>": "<node-ip-address or hostname>"}
config_host_info={"<ip-address>": "<node-ip-address or hostname>", "<ip-address>": "<node-ip-address or hostname>", "<ip-address>": "<node-ip-address or hostname>"}
analytics_host_info={"<ip-address>": "<node-ip-address or hostname>", "<ip-address>": "<node-ip-address or hostname>", "<ip-address>": "<node-ip-address or hostname>"}
control_host_info={"<ip-address>": "<node-ip-address or hostname>", "<ip-address>": "<node-ip-address or hostname>", "<ip-address>": "<node-ip-address or hostname>"}
admin_password = <admin password>
admin_tenant_name = <admin tenant>
admin_user = <admin username>
api_server_ip = <any IP of new config-api controller>
api_server_port = 8082
Note: Currently, the previous step works only with hostnames and not with IP addresses.
- Run the following commands from any controller node.
Note: All *host_info parameters should contain the list of new hosts.
docker cp issu.conf config_api_1:issu.conf
docker exec -it config_api_1 python /opt/contrail/utils/provision_issu.py -c issu.conf
- The old servers can be cleaned up if no other services are present on them.
- Navigate to the following path in the old controller:
[root@nodem1 ~]# cd /etc/kolla/neutron-server/
[root@nodem1 neutron-server]# pwd
/etc/kolla/neutron-server
[root@nodem1 neutron-server]# cat ContrailPlugin.ini
[APISERVER]
api_server_port = 8082
api_server_ip = <old_controller_ip>
multi_tenancy = True
contrail_extensions = ipam:neutron_plugin_contrail.plugins.opencontrail.contrail_plugin_ipam.NeutronPluginContrailIpam,policy:neutron_plugin_contrail.plugins.opencontrail.contrail_plugin_policy.NeutronPluginContrailPolicy,routetable:neutron_plugin_contrail.plugins.opencontrail.contrail_plugin_vpc.NeutronPluginContrailVpc,contrail:None,service-interface:None,vf-binding:None
[COLLECTOR]
analytics_api_ip = <old_controller_ip>
analytics_api_port = 8081
[keystone_authtoken]
auth_host = <old_controller_ip>
auth_port = 5000
auth_protocol = http
admin_user = admin
admin_password = password
admin_tenant_name = admin
insecure = True
region_name = RegionOne
- Modify the api_server_ip and analytics_api_ip entries to use the new controller IP addresses.
[root@nodem1 neutron-server]# pwd
/etc/kolla/neutron-server
[root@nodem1 neutron-server]# cat ContrailPlugin.ini
[APISERVER]
api_server_port = 8082
api_server_ip = <new_controller_ip>
multi_tenancy = True
contrail_extensions = ipam:neutron_plugin_contrail.plugins.opencontrail.contrail_plugin_ipam.NeutronPluginContrailIpam,policy:neutron_plugin_contrail.plugins.opencontrail.contrail_plugin_policy.NeutronPluginContrailPolicy,routetable:neutron_plugin_contrail.plugins.opencontrail.contrail_plugin_vpc.NeutronPluginContrailVpc,contrail:None,service-interface:None,vf-binding:None
[COLLECTOR]
analytics_api_ip = <new_controller_ip>
analytics_api_port = 8081
[keystone_authtoken]
auth_host = <keystone_ip_addr>
auth_port = 5000
auth_protocol = http
admin_user = admin
admin_password = password
admin_tenant_name = admin
insecure = True
region_name = RegionOne
Restart the neutron-server container in the old controller.
[root@nodem1]# docker restart neutron_server
Go to the neutron_server container on the old control node and verify that the ContrailPlugin.ini file contains the new controller IPs.
[root@nodem1 ~]# docker exec -it neutron_server bash
(neutron-server)[neutron@nodem1 /]$ cd /etc/neutron/plugins/opencontrail
(neutron-server)[neutron@nodem1 /etc/neutron/plugins/opencontrail]$ cat ContrailPlugin.ini
[APISERVER]
api_server_port = 8082
api_server_ip = <new_controller_ip>
multi_tenancy = True
contrail_extensions = ipam:neutron_plugin_contrail.plugins.opencontrail.contrail_plugin_ipam.NeutronPluginContrailIpam,policy:neutron_plugin_contrail.plugins.opencontrail.contrail_plugin_policy.NeutronPluginContrailPolicy,routetable:neutron_plugin_contrail.plugins.opencontrail.contrail_plugin_vpc.NeutronPluginContrailVpc,contrail:None,service-interface:None,vf-binding:None
[COLLECTOR]
analytics_api_ip = <new_controller_ip>
analytics_api_port = 8081
[keystone_authtoken]
auth_host = <keystone_ip_addr>
auth_port = 5000
auth_protocol = http
admin_user = admin
admin_password = password
admin_tenant_name = admin
insecure = True
region_name = RegionOne
The heat configuration needs the same changes. Locate the parameter [clients_contrail]/api_server and change it to point to the list of the new config-api IP addresses.
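The change might look like the following (a sketch; the heat.conf location depends on the deployment, and the list format shown here is an assumption to verify against your existing configuration):
[clients_contrail]
# space-separated list of the new config-api node IPs (placeholders)
api_server = <new-controller-ip-1> <new-controller-ip-2> <new-controller-ip-3>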
To resync the database:
Log in to the zookeeper container.
docker exec -it config_database_zookeeper_1 bash
Go to the bin directory.
cd bin
Connect to zookeeper.
zkCli.sh -server <controller-ip>:2181
Delete the lock on zookeeper container.
delete /vnc_api_server_locks/dbe-resync-complete
Quit and exit from the zookeeper container.
Restart the Config API server container and wait for the Contrail status to be up.
docker restart config_api_1
Restart the Control container.
docker restart control_control_1
Troubleshooting link-loop in Release 21.4.L2
The ansible deployer of Contrail Networking Release 21.4.L2 introduces a link loop in the /var/log/contrail directory on the Contrail config nodes. This happens every time the Contrail Networking Release 21.4.L2 ansible deployer is started, and re-running the ansible deployer playbooks fails because of this recursion. The issue is resolved in Contrail Networking Release 21.4.L3; for Contrail Networking Release 21.4.L2, however, you must apply the following workaround manually.
Workaround: Manually remove the incorrect symlink from all contrail config nodes:
sudo unlink /var/log/contrail/config-database-rabbitmq/config-database-rabbitmq
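To confirm that no recursive symlink remains before re-running the deployer playbooks, a quick check might look like this (a sketch; empty output means the link loop is gone):
find /var/log/contrail -maxdepth 2 -type l -name config-database-rabbitmq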