Contrail Cloud Configuration

This chapter covers Contrail Cloud configuration.

Contrail Cloud Configuration File Structure Overview

Contrail Cloud is configured using YAML files. A series of pre-configured YAML file templates are provided as part of a Contrail Cloud installation. These user-configurable YAML files are downloaded onto the jumphost server during the initial phase of the Contrail Cloud installation. The YAML files can be accessed and edited by users from within the jumphost.

Contrail Cloud configuration changes are applied using Ansible playbooks, which are also provided as part of the Contrail Cloud bundle. The Ansible playbooks read the configuration provided in the YAML files and populate parameters from the YAML files into a second set of configuration files that are used by Red Hat OpenStack Director to provision servers and configure the components of Contrail Cloud.

See Deploying Contrail Cloud for additional information on YAML file locations and configuration updating procedures.

Table 1 lists commonly used YAML file parameters in Contrail Cloud and provides a summary of the purpose of each parameter.

Table 1: YAML File Parameters

site.yml

  • global: - DNS, NTP, domain name, time zone, satellite URL, and proxy configuration for the deployment environment.

  • jumphost: - Provision NIC name definition and PXE boot interface for the jumphost.

  • control_hosts: - Control host parameters, including disk mappings for bare metal servers and control plane VM sizing per role for functions like analytics.

  • compute_hosts: - Parameters for SR-IOV, DPDK, and TSN in compute nodes, and root disk configuration per hardware profile.

  • storage_hosts: - Ceph and block storage profile definitions for storage nodes.

  • undercloud: - Nova flavors for roles. Applicable when using additional hardware profiles.

  • overcloud: - Per hardware profile and leaf number: disk mappings and network definitions (names, subnets, VLANs, DHCP pools, and roles for each network). Also other definitions such as TLS certificates, keystone LDAP backend enablement, post-deployment extra actions, and TripleO extra configurations.

  • ceph: - Ceph enablement and disk assignments (pools, OSDs) on storage nodes.

  • ceph_external: - Integration parameters for an externally deployed Ceph cluster.

  • appformix: - HA enablement, VIP IPs, and network device monitoring for AppFormix.

inventory.yml

  • inventory_nodes: - Name, IPMI IP, Ironic driver used for LCM, root disk, and other related settings for all Contrail cluster nodes.

control-host-nodes.yml

  • control_host_nodes: - Internal IP and DNS (per control node) for control hosts and the control plane. Statically added IPs for controllers need to be outside of the DHCP pools for the networks that use them.

  • control_host_nodes_network_config: - Bridges, bonds, DHCP/IP, and MTU for control hosts.

  • control_hosts: - VM interface to bridge on control-host mapping.

overcloud-nics.yml

  • contrail_network_config:, controller_network_config:, appformixController_network_config:, computeKernel_network_config:, compute_dpdk_network_config:, cephStorage_network_config: - Interface to network mapping, routes, DHCP/IP allocation, bonds, VLAN to interface maps, and bond options for control, storage, and compute nodes.

compute-nodes.yml

  • compute_nodes_kernel:, compute_nodes_dpdk:, compute_nodes_sriov: - Mapping of hosts from the inventory to compute roles and profiles for compute nodes.

storage-nodes.yml

  • storage_nodes: - Names of storage nodes.

vault-data.yml

  • global: - Satellite key and contrail user password for the Red Hat OpenStack Vault function.

  • undercloud:, overcloud:, control_hosts: - VM and bare metal server (BMS) passwords for Contrail cluster nodes and the undercloud when using the Red Hat OpenStack Vault function.

  • appformix: - MySQL and RabbitMQ passwords for AppFormix when using the Red Hat OpenStack Vault function.

  • ceph_external: - Client key used by external Ceph with the Red Hat OpenStack Vault function.

  • inventory_nodes: - IPMI credentials for Contrail cluster nodes when using the Red Hat OpenStack Vault function.

The Ansible playbooks initially read variables from the default.yml file. Configuration files are then read in the order presented in Table 1. Variables are stored and updated in the default.yml file as the script runs. If the same variable has different configuration settings in different YAML files, the setting in the YAML file that is read later in the configuration file processing order is implemented.

Sample YAML files should be copied from the /var/lib/contrail_cloud/samples directory to the /var/lib/contrail_cloud/config directory and updated according to the requirements of the current deployment. Parameter values in the files in the /var/lib/contrail_cloud/samples directory can be used as default values in most cases where guidance for setting values is not given in this document. See Contrail Cloud Deployment Guide.

Hardware Profiles

A hardware profile allows administrators to apply the same configuration to a group of servers acting as compute or storage nodes.

As new servers are added to a deployment, each server might have different disk and network hardware, so the way networking and storage are configured may differ between servers.

Servers are associated to hardware profiles in the compute-nodes.yml file. The leaf number is also set in this file. This sample configuration snippet from the compute-nodes.yml file shows a hardware profile configuration:

The hardware profile configurations are applied to servers using the site.yml and overcloud-nics.yml files, in sections whose names concatenate the role, the leaf number, and the hardware profile tag, where:

  • The role is one of the following values: ComputeKernel, ComputeDPDK, ComputeSriov, CephStorage

  • The leaf number is the number of the leaf. For instance, 0.

  • The Hardware profile tag is any alphanumeric string starting with a capital letter. We strongly recommend using the Hw[number] convention.

For example, the following sample names could be used for compute nodes of different types in leaf 0:

  • ComputeKernel0Hw1 for kernel-mode

  • ComputeDpdk0Hw1 for DPDK

  • ComputeSriov0Hw1 for SR-IOV

Two sample hardware profile configurations within the site.yml file are provided below. The examples allocate different SCSI disks for local VM ephemeral storage using the HCTL variable.
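
As an illustration of the idea, a minimal sketch follows; the disk_mapping: hierarchy and the per-disk key names are assumptions based on the HCTL-based disk selection described above, and the profile names follow the [role][leaf][hardware profile tag] convention.

  # Illustrative sketch; hierarchy and key names are assumptions.
  overcloud:
    disk_mapping:
      ComputeKernel0Hw1:
        - name: ephemeral
          hctl: "0:0:0:0"    # first SCSI disk on this hardware type
      ComputeKernel0Hw2:
        - name: ephemeral
          hctl: "0:2:0:0"    # a different SCSI disk on the second hardware type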

Examples of hardware profiles in the overcloud-nics.yml file are provided later in this reference architecture.

Network Configuration

This section describes how the networks are configured in Contrail Cloud. All properties of the networks—with the exceptions of the IPMI and Provision networks—are specified in the network: section within the site.yml file.

IPMI Network

The IPMI network is generally a set of network addresses reserved for hardware management within a data center. The management switches must be configured with access to the default gateway for the IPMI network. Servers in the environment must have IP addresses statically allocated for IPMI or assigned via DHCP using the MAC address as the key for address allocation.

Provision Network

The properties of the provision network are configured in the undercloud: section of the site.yml file.

A sample configuration snippet from the site.yml file:
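
As an illustration, a minimal sketch follows using the provision network from Table 2; except for ip_range, which is named in the notes below, the key names are assumptions.

  # Illustrative sketch; key names other than ip_range are assumptions.
  undercloud:
    network:
      cidr: 192.168.212.0/23            # provision network (Table 2)
      gateway: 192.168.212.1
      dhcp:
        start: 192.168.212.10           # pool sized for ~200 compute nodes
        end: 192.168.212.225            # plus addresses for control and storage nodes
      inspection:
        ip_range: 192.168.213.100,192.168.213.120   # used only during introspection/PXE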

Note:

The control network in the overcloud is the same as the provision network in the undercloud.

Note:

The example configuration provides for up to two hundred compute nodes and fifteen IP addresses for control and storage nodes.

Note:

The inspection block specifies a range of IP addresses that the installer introspection service uses during the PXE boot and provisioning process. Use the ip_range variable to define the start and end values of this range. When batch provisioning is used—which is recommended in this reference architecture—only a small number of these addresses are in use at any given time. The range, therefore, can be much smaller than the DHCP range.

A server PXE boots from the provisioning network and receives an IP address via DHCP when it is provisioned. The boot preference is then changed to disk, the IP address is configured by the Ironic service into the operating system, and the server boots from the disk with the same IP address.

External Network

The External network is used by cloud users to access the public API addresses of the control hosts. A VIP address is specified, as well as a pool of IP addresses that can be allocated by DHCP.

The External network parameters are set in the site.yml file.

A sample External network configuration snippet from the site.yml file:
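
As an illustration, a sketch using the External network addressing from Table 2 follows; the key names and the VLAN ID are assumptions.

  # Illustrative sketch; key names and VLAN ID are assumptions.
  overcloud:
    network:
      external:
        cidr: 10.10.10.0/25
        vip: 10.10.10.50              # public API VIP
        pool:
          start: 10.10.10.60          # small DHCP pool; only a few control hosts
          end: 10.10.10.80
        vlan: 710                     # assumed VLAN ID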

There are only a limited number of control hosts. The external network, therefore, can be a small subnet of IP addresses.

The External network is associated with an interface on the control hosts in the control-host-nodes.yml file.

Management

The properties of the management network are configured in the network: section of the site.yml file.

A sample Management network configuration snippet from the site.yml file:

Internal API, Tenant, Storage, and Storage Management Networks

Red Hat OpenStack Platform 13 (RHOSP 13) supports the capability to use separate subnets per rack. This feature is used for the networks that are connected to the IP Fabric in this reference architecture: the Internal API, Tenant, Storage, and Storage Management networks. Each of these networks is assigned a supernet IP address range (/16), which includes all the rack subnets (/24) for that network.

The concept of spine-leaf networking in TripleO is described in Red Hat OpenStack Platform 13: Spine Leaf Networking.

TripleO uses the term leaf to group servers that have shared connectivity. In this Contrail Cloud reference architecture, leaves in the TripleO context refer to grouped servers in the same rack. A Red Hat leaf is implemented by a management switch and a pair of top-of-rack (ToR) switches.

The names of the networks used by the compute and storage nodes in a rack (a leaf) follow a convention: each name is a concatenation of the base network name and the leaf number. Leaf numbers are assigned to nodes in the compute-nodes.yml file. Base network names include InternalApi, Storage, StorageMgmt, and Tenant.

Compute nodes in Leaf 0 should use networks:

  • InternalApi0

  • Storage0

  • Tenant0

Compute nodes in Leaf 1 should have defined networks:

  • InternalApi1

  • Storage1

  • Tenant1

Compute nodes in Leaf N should have defined networks:

  • InternalApiN

  • StorageN

  • TenantN

For an example of splitting a supernet, see the example below for the Internal API and Tenant networks. The same procedure must be performed for the remaining networks used for compute and storage nodes (Storage, Storage Mgmt).

In the following configuration snippet, these parameters are set for all networks:

  • supernet - supernet definition.

  • cidr - subnet configuration for a leaf with the first subnet used for the controllers.

  • default_route - static route definition pointing to a “supernet” via a given operating system interface, such as bond1, vhost0, or other interfaces.

  • vrouter_gateway - default route definition for vRouter encapsulated traffic in the overlay network. This variable is defined as a gateway parameter in the /etc/contrail/contrail-vrouter-agent.conf file. This gateway IP address is used to reach DC gateways (MX routers and other vRouters) to set up MPLSoUDP or other overlays.

  • role - the role or hardware profile that assigns the subnet. The first subnet is always for controllers and the remaining subnets are assigned to compute and storage nodes based on the leaf identifier.
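
As an illustration, a sketch of such a snippet for the Internal API and Tenant networks follows, using the addressing from Table 2; supernet, cidr, default_route, vrouter_gateway, and role are the parameters listed above, while the surrounding structure is an assumption.

  # Illustrative sketch; the surrounding structure is an assumption.
  internal_api:
    supernet: 172.16.0.0/16
    subnets:
      - cidr: 172.16.0.0/24             # first subnet: controllers
        role: controller
      - cidr: 172.16.1.0/24             # leaf 0 compute and storage nodes
        role: ComputeKernel0Hw1
        default_route: 172.16.1.1       # static route to the supernet via bond1
  tenant:
    supernet: 172.18.0.0/16
    subnets:
      - cidr: 172.18.0.0/24
        role: controller
      - cidr: 172.18.1.0/24
        role: ComputeKernel0Hw1
        default_route: 172.18.1.1       # route to the supernet via vhost0
        vrouter_gateway: 172.18.1.1     # gateway in contrail-vrouter-agent.conf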

When a network is specified to share an interface in the role_network_config.yml file, the network is assigned a VLAN. The VLAN can be identical in all racks when VXLAN is used, which is how the VLANs are labelled in this reference architecture.

Example Networks Used in this Reference Architecture

Table 2 presents a sample addressing scheme in a Contrail Cloud environment with four racks. This addressing scheme is used in the configuration file examples in this reference architecture.

Table 2: Addressing Scheme
Network              Supernet         Subnet

Provision            -                192.168.212.0/23
External             -                10.10.10.0/25
internal_api         172.16.0.0/16    172.16.0.0/24
internal_api[0-3]                     172.16.1-4.0/24
management           -                192.168.0.0/23
Storage              172.19.0.0/16    172.19.0.0/24
storage[0-3]                          172.19.1-4.0/24
storage_mgmt         172.20.0.0/16    172.20.0.0/24
storage_mgmt[0-3]                     172.20.1-4.0/24
Tenant               172.18.0.0/16    172.18.0.0/24
tenant[0-3]                           172.18.1-4.0/24

Note:

In the site.yml file, the provision network is subdivided into an inspection block with addresses that are used during PXE booting. These addresses are configured onto servers during provisioning.

Supernet addresses are specified for networks that contain a rack with a separate subnet. The supernet address is used in a static route on servers to send inter-rack traffic through the correct interface and corresponding VLAN.

Batch Deployment Configuration

Juniper Networks recommends running compute and storage node deployments in batches of 5 to 10 nodes. We make this recommendation based on the potential for timeouts in the TripleO Heat automation process during larger deployments.

Batch deployments are configured in the site.yml file. A sample batch deployment configuration snippet from the site.yml file:
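
As an illustration, a minimal sketch follows; the key names are assumptions based on the per-role counts described below.

  # Illustrative sketch; key names are assumptions.
  overcloud:
    deploy:
      batch_size:
        CephStorage: 5
        ComputeDpdk: 5
        ComputeSriov: 5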

In this configuration, 5 CephStorage, 5 ComputeDPDK, and 5 ComputeSriov nodes are deployed when the openstack-deploy.sh script is run. The script should be run repeatedly until it reports that there are no more nodes to be deployed.

Jumphost

The jumphost is the host from which an administrator initiates provisioning of a Contrail Cloud environment. This section covers jumphost configuration options.

Jumphost Overview

The jumphost:

  • hosts the undercloud. The undercloud is a VM responsible for provisioning and managing all control hosts, storage nodes, and compute nodes in a Contrail Cloud. All Contrail-related setup and configuration is performed through the undercloud in a Contrail Cloud.

  • stores Contrail Cloud configuration-related files. The YAML files that configure Contrail Cloud are stored on the jumphost. The Ansible scripts that apply the configurations made in the YAML files to the Contrail Cloud nodes are also stored on the jumphost.

    The Contrail Cloud scripts are stored in the /var/lib/contrail_cloud directory.

  • hosts the Contrail Command web user interface virtual machine.

  • runs Red Hat Enterprise Linux with only base packages installed.

  • provides SNAT for traffic from the Provisioning network to intranet and external networks.

  • provides access to the servers of a Contrail Cloud environment if a management network is not present.

A jumphost must be operational as a prerequisite for a Contrail Cloud installation. The jumphost should not run any virtual machines besides the undercloud and Contrail Command VMs.

Figure 1 illustrates the jumphost network connections.

Figure 1: Jumphost Network Connections

The Intranet network is the network configured manually before the Contrail Cloud 13 packages are downloaded. Contrail Cloud uses this network to download packages and to provide connectivity for the nodes in a Contrail Cloud environment to reach outside networks. The Intranet network IP address can be an IP address from the External network.

The jumphost is configured during the installation process to use an IP masquerade for SNAT of outbound traffic from hosts connected to the provisioning network. The “public” IP address on the jumphost—the IP address of the “Intranet” port—is used as a source address for traffic exiting the cloud during provisioning. This IP address should be permitted in firewalls to allow access to public repositories, Juniper Satellite servers, and Red Hat subscription managers. If external access is not permitted from the jumphost for security purposes, a Capsule proxy server can be used as described in Capsule Configuration. See Miscellaneous.

You can also configure a proxy, in which case the “Intranet” port should have access to the proxy. Note that the externally accessible VIP addresses for Openstack, Appformix, and Contrail APIs should be excluded from proxying (to avoid issues during provisioning).

A sample proxy configuration snippet from the site.yml file:
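
As an illustration, a sketch follows; the proxy URL is hypothetical and the key names are assumptions. The External network VIPs are excluded from proxying.

  # Illustrative sketch; proxy URL is hypothetical, key names are assumptions.
  global:
    proxy:
      http: http://proxy.example.net:8080
      https: http://proxy.example.net:8080
      no_proxy: "127.0.0.1,localhost,10.10.10.50,10.10.10.51,10.10.10.52"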

Where 10.10.10.50, 10.10.10.51, and 10.10.10.52 are VIPs in the External network.

Adding an IP Address for the Jumphost to the Provision Network

The jumphost should be allocated an IP address in the Provision network to enable SSH to the other hosts in the Contrail Cloud environment. This IP address allocation is a convenience to enable troubleshooting from the jumphost.

This IP address is added to the jumphost in the site.yml file as shown in this configuration snippet:

Contrail Command Configuration

Contrail Command is a standalone solution based on two containers: contrail_command and contrail_psql. Contrail Command does not provide HA capabilities, and only the new UI services are provided. The Contrail Command VM is created on the jumphost.

The Contrail Command web UI can be reached in a Contrail Cloud environment by entering this URL in a web browser: https://[jumphost-IP-address]:9091

Contrail Command can be accessed after a Contrail Cloud deployment without user configuration. Contrail Command configuration parameters are updated in the site.yml file.

A sample site.yml file configuration snippet where the Contrail Command parameters are updated:

Authentication details for Contrail Command are provided in the vault-data.yml file.

A sample vault-data.yml file configuration snippet with modified Contrail Command attributes:

Controller Node Configuration

This section describes the controller node configuration in Contrail Cloud. It also includes sections related to VM resources and networking for controller nodes.

Control Node VM Resources

This reference architecture is designed to support a large-scale Contrail Cloud environment. The following resources should be allocated to the controller VMs to support this architecture.

Table 3: Controller VMs Requirements

Role                           vCPU (Threads)    Memory (GB)    Disk (GB)

Undercloud VM                  28                128            500
OpenStack Controller VM*       8                 48             500
Contrail Analytics DB VM       12                48             500 & 1000
Contrail Analytics VM          12                48             250
Contrail Controller VM         16                64             250
AppFormix VM                   16                32             500
TSN (Contrail Service Node)    4                 8              100
Control Host OS                4                 8              100

* The OpenStack Controller VM size is significantly smaller than Red Hat’s recommended OpenStack Controller VM size. The VM uses fewer resources in Contrail Cloud because several network functions are handled by Contrail Networking and telemetry functions are performed by AppFormix. See Red Hat OpenStack Platform 13: Recommendations for Large Deployments for the recommended VM sizes.

Note:

Operating system resources must be reserved and not consumed by the resources allocated to the controller VMs (assuming no oversubscription on the control hosts). There are no configuration file options for reserving operating system resources.

The following configuration snippet from the control_hosts: vm: section of the site.yml file configures control host options:
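
As an illustration, a sketch follows; the contrail-tsn: key is taken from this document, while the other role names and sizing keys are assumptions. The values mirror Table 3.

  # Illustrative sketch; role names and keys other than contrail-tsn are assumptions.
  control_hosts:
    vm:
      openstack-controller:
        vcpu: 8
        memory_gb: 48
        disk_gb: 500
      contrail-controller:
        vcpu: 16
        memory_gb: 64
        disk_gb: 250
      contrail-analytics:
        vcpu: 12
        memory_gb: 48
        disk_gb: 250
      contrail-analytics-database:
        vcpu: 12
        memory_gb: 48
        disk_gb: 1000            # plus a 500 GB disk per Table 3
      appformix-controller:
        vcpu: 16
        memory_gb: 32
        disk_gb: 500
      contrail-tsn:
        vcpu: 4
        memory_gb: 8
        disk_gb: 100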

Controller Host Network Configuration

Corresponding interfaces of the control hosts are in the same Layer 2 network. The control hosts are preferably deployed in different racks and connected to leaf devices in the EVPN-VXLAN IP Fabric. Figure 2 illustrates these connections.

Figure 2: Control Host Network Connections

The following VMs are running on the control hosts.

  • OS - Openstack Controller

  • CC - Contrail Controller

  • CA - Contrail Analytics

  • CADB - Contrail Analytics DB

  • AFX - Appformix

  • TSN - ToR Service Node (optional)

    A TSN is enabled in the control_hosts: vm: contrail-tsn: hierarchy within the site.yml file. The function has been renamed CSN (Contrail Service Node) in Contrail Networking, but TripleO manifests continue to use the TSN term.

Figure 3 illustrates how the physical and logical interfaces on a control host connect to its management switch and leaf switches.

Figure 3: Control Host—Physical and Logical Interfaces
Note:

Physical NICs typically contain multiple physical interfaces. In Red Hat configuration files, however, the naming convention is to use nicN to indicate the Nth physical interface on a server. For information on finding the order of interfaces using an introspection command, see Miscellaneous.

Figure 4 illustrates how networks and ports are assigned to bridges on a controller node.

Figure 4: Control Host—Physical Interface to Bridge Allocation

Physical NICs typically contain two ports; the two ports of the first such NIC are named nic3 and nic4.

Table 4 shows which files are used for configuring each network.

Table 4: Files for Configuring Control Host Networks
Port/NIC Configuration

IPMI

Address is entered as an input to the inventory.yml file.

br-eno1

1 x 1G/10G NIC - untagged interface (e.g. built-in copper interface) with addresses generated automatically from the provisioning network pool.

br-eno2

1 x 1G/10G NIC - Management network, untagged interface with address defined in the overcloud: network: section of the site.yml file.

br-bond0

2 x 10G/25G/40G NICs made of the first ports from both NICs. Tagged interface carrying the Tenant, Internal API, and External networks.

Bond physical allocation is defined in the control-host-nodes.yml file. Addressing is set in the overcloud: network: section of the site.yml file.

br-bond1

2 x 10G/25G/40G NICs made of the second ports from both NICs. Tagged interface carrying the Storage and Storage Mgmt networks.

Bond physical allocation is defined in the control-host-nodes.yml file. Addressing is set in the overcloud: network: section of the site.yml file.

The configuration for bond interfaces is performed in the control_host_nodes_network_config: hierarchy of the control-host-nodes.yml file. Bond interfaces should be configured with the following parameters:

  • Linux bond - mode 4/LACP (802.3ad)

  • Hash policy - layer3+4

  • Use ovs_bridge with linux_bond. A Linux bridge can be configured instead of an OVS bridge, but this is not recommended by Red Hat.

This configuration snippet from the control-host-nodes.yml file shows various control host configurations, including bond interface configurations.
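
As an illustration, a sketch of one bridge and bond definition follows; the structure mirrors TripleO os-net-config style and is an assumption, while the mode and hash policy are the values listed above.

  # Illustrative sketch; the structure is an assumption.
  control_host_nodes_network_config:
    - type: ovs_bridge
      name: br-bond0
      mtu: 9000                    # assumed jumbo MTU
      members:
        - type: linux_bond
          name: bond0
          bonding_options: "mode=802.3ad xmit_hash_policy=layer3+4"
          members:
            - type: interface
              name: nic3           # first port of the first data NIC
            - type: interface
              name: nic5           # first port of the second data NIC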

Note:

Use validation tools to check the numbered-to-named NIC allocations. See Miscellaneous.

Mapping Controller VM Interfaces to Host Bridges

The mapping of control host VM interfaces to the bridges configured in Controller Host Network Configuration is done in the control_hosts: hierarchy of the control-host-nodes.yml file.

A sample configuration snippet:
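
As an illustration, a sketch follows; the control_hosts: hierarchy is taken from this document, while the per-interface key names are assumptions based on the bridge names described in this section.

  # Illustrative sketch; per-interface key names are assumptions.
  control_hosts:
    vm_interfaces:
      - interface: eth0
        bridge: br-eno1      # provision network; required for PXE boot
      - interface: eth1
        bridge: br-eno2      # management network
      - interface: eth2
        bridge: br-bond0     # Tenant, Internal API, External
      - interface: eth3
        bridge: br-bond1     # Storage, Storage Mgmt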

The first interface—eth0—must connect to the bridge for the provision network to allow the VM to PXE boot. The remaining interface names must be sequential, as shown in the sample configuration snippet. You should configure one interface for each bridge.

Appformix VM Configuration

AppFormix is provided with the Contrail Cloud bundle. AppFormix provides monitoring and troubleshooting for the networking and server infrastructure of Contrail Cloud, as well as for the workloads running in Contrail Cloud. For additional information on AppFormix, see the AppFormix TechLibrary page.

Appformix provides a WebUI as well as a REST API. The WebUI and the REST API are exposed to the Internal API and External networks.

The recommended AppFormix deployment, which is also the default, is a 3-node configuration for high availability.

Appformix node configuration for Contrail Cloud is defined in the site.yml file. A sample configuration snippet:

The network configuration of Appformix is defined in the overcloud-nics.yml file.

A sample configuration snippet:

The following resources are automatically monitored when AppFormix is installed in Contrail Cloud.

  • Openstack API endpoints and processes

  • Contrail API endpoint and processes

  • Openstack MySQL

  • Rabbit cluster status

  • Compute nodes including vRouter, Nova compute, operating system health and metrics

Appformix is not automatically configured to monitor the physical networking infrastructure, but adapters (without any configuration) for monitoring network devices can be installed during a Contrail Cloud deployment. AppFormix must be configured manually to monitor network devices after deployment. See the Network Devices section of the Appformix User Guide.

Network-related adapters can be installed for Appformix during a Contrail Cloud deployment in the site.yml file.

A sample configuration snippet:

Custom AppFormix plugins can also be installed during a Contrail Cloud deployment in the site.yml file.

A sample configuration snippet:

For more information, see Extensibility Using Plug-Ins in the Appformix User Guide.

Contrail Service Node (CSN) - Optional

The Contrail Service Node (CSN) provides DHCP, ARP, and multicast services when Contrail is managing the full lifecycle of bare-metal servers, including provisioning the OS. CSNs are not needed in Contrail Cloud deployments that do not use bare-metal servers.

The TripleO templates in the Contrail Cloud 13 release continue to use the Red Hat term ToR Service Node (TSN) to refer to the CSN function in Contrail Cloud. This reference architecture, therefore, uses both terms.

To enable CSN support in Contrail Cloud, edit the compute_hosts: hierarchy in the site.yml file.

A sample configuration snippet:
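
As an illustration, a sketch follows; the compute_hosts: and control_hosts: vm: contrail-tsn: hierarchies are taken from this document, while the individual keys are assumptions.

  # Illustrative sketch; individual keys are assumptions.
  compute_hosts:
    tsn:
      enabled: true          # enable CSN/TSN support for bare-metal servers
  control_hosts:
    vm:
      contrail-tsn:
        count: 2             # recommended: run TSN VMs on two control hosts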

TSN VMs are created on all controller hosts by default. We recommend running TSNs on two control hosts in Contrail Cloud environments that include at least one bare-metal server (BMS). TSN VMs should not be run in Contrail Cloud environments that are not using a BMS. You can change the number of TSN instances in the control_hosts: hierarchy in the site.yml file.

Compute Node Configuration

Compute nodes in Contrail Cloud are installed in racks and connected to a pair of top-of-rack (ToR) switches and a management switch. The ToR switches are the leaf nodes in the EVPN-VXLAN IP Fabric.

The networking for compute nodes is configured so that each rack forms its own separate Layer 3 subnet for the networks that are used by tenant workloads. Figure 5 illustrates this compute node networking structure.

Figure 5: Compute Node Network Connections

Figure 6 illustrates how the physical and logical interfaces of a compute node connect it to a management switch and the IP Fabric leaf switches.

Figure 6: Compute Node Network Interfaces

Figure 7 illustrates networks and port assignments to bridges on a compute node.

Figure 7: Compute Node to Bridge Connections

Table 5 shows which files are used for configuring each network connection on a compute node.

Table 5: Files for Configuring Networks for a Compute Node
Port/NIC Configuration

IPMI

Address is entered as an input to the inventory.yml file.

br-eno1

1 x 1G/10G NIC - untagged interface (e.g. built-in copper interface) with addresses generated automatically from the provisioning network pool.

br-eno2

1 x 1G/10G NIC - Optional Management network, untagged interface with address defined in the overcloud: network: hierarchy in the site.yml file.

br-bond0

2 x 10G/25G NICs made of first ports from both NICs. Untagged interface with Tenant network. Bond physical allocation is defined in the overcloud-nics.yml file and addressing is set in the overcloud: network: hierarchy within the site.yml file.

br-bond1

2 x 10G/25G NICs made of second ports from both NICs. Tagged interface with Internal-API and Storage networks for the leaf. Bond physical allocation is defined in the overcloud-nics.yml file and addressing is set in the overcloud: network: hierarchy within the site.yml file.

Provisioning and optional Management networks are connected via out-of-band management switches and must be configured as a Layer 2 stretch across the management switches.

Compute Node Networking

The data plane interfaces of compute nodes can be configured to support the following forwarding methods.

Table 6: Compute Node Forwarding Methods

Kernel-mode

The vRouter forwarding function is performed in the Linux kernel by replacing the default Linux bridge code with Contrail Networking code.

DPDK

vRouter runs in user space on a specified number of cores.

SR-IOV

VM or container interface connects directly to the NIC, bypassing the vRouter.

Your traffic forwarding method choice depends on the traffic profile expectations for each individual compute node. A Contrail Cloud environment can have different compute nodes configured with different interface types, and workloads can be placed on the most appropriate compute node using various technologies, such as OpenStack availability zones.

The IP address of the management interface and the addressing and configuration of bond interfaces are configured in the overcloud-nics.yml file.

Kernel-Mode vRouter Configuration

The vRouter vhost0 interface is connected to the br-bond0 bridge on compute nodes that run the vRouter in kernel mode. The br-bond0 bridge is connected via a bond interface to the Tenant network.

Figure 8 illustrates these connections.

Figure 8: Kernel Mode vRouter

Compute Resources

The following guidelines should be followed to optimize vRouter performance.

  • Guarantee a minimum of 4 cores for the host operating system, of which at least 2 can be used by the vRouter kernel module. There is no mechanism to explicitly allocate cores; the assumption is that host OS processes consume no more than 2 cores and that the kernel scheduler allocates the remaining 2 cores to the vRouter.

  • Guarantee a minimum of 8 GB of RAM for the host operating system, of which 4 GB is used by the vRouter.

vRouters running in kernel mode should achieve up to 500 kpps per vRouter (with fewer than 1000 flows in the flow table) when these guidelines are followed.

Bond Interface Configuration

The following options should be set for bond interfaces when kernel mode is used:

  • bond_mode: 4 (IEEE 802.3ad)

  • bond_policy: layer3+4

The bond configuration is defined in profile definitions in the overcloud-nics.yml file:
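
As an illustration, a sketch follows; the profile name follows the [role][leaf][hardware profile tag] convention, bond_mode and bond_policy are the options listed above, and the remaining structure is an assumption.

  # Illustrative sketch; the surrounding structure is an assumption.
  ComputeKernel0Hw1_network_config:
    bond0:
      bond_mode: 4              # IEEE 802.3ad (LACP)
      bond_policy: layer3+4     # transmit hash policy
      members:
        - nic3
        - nic4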

Optimizations

We recommend increasing the maximum number of flows per vRouter from the default value of 500,000 to 2 million flows to increase performance and scaling. We also recommend allocating 4 threads for flow processing.

These configuration parameters are specified in the site.yml file. A configuration snippet:
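
As an illustration, a sketch follows; the key names are assumptions based on the two recommendations above.

  # Illustrative sketch; key names are assumptions.
  compute_hosts:
    vrouter:
      flow_entries: 2000000       # increase from the 500,000 default
      flow_thread_count: 4        # threads for flow processing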

Adding Kernel-Mode Compute Nodes

A compute node operates in kernel mode when it is added into the compute_nodes_kernel: hierarchy in the compute-nodes.yml file.

A sample configuration snippet:

Complete Leaf/Profile Configuration Snippet

This section provides a full network configuration snippet from the overcloud-nics.yml file for a compute node.

Profiles for network configuration in the overcloud-nics.yml file are named by concatenating the role, the leaf number, and the hardware profile tag, where:

  • role is one of the following options:

    • ComputeKernel - for vRouter in kernel mode

    • ComputeDpdk

    • ComputeSriov

    • CephStorage

  • leaf number - number or name of the leaf. For instance, "0" or "leaf0".

  • hardware profile tag - any name to define a profile tag. We strongly recommend using the Hw[number] format for your hardware profile tags.

Examples:

Compute node DPDK in leaf "0" with hardware profile "hw1": ComputeDpdk0Hw1

Compute SR-IOV in leaf "0" with hardware profile "hw1": ComputeSriov0Hw1

Compute kernel mode in leaf "0" with hardware profile "hw1": ComputeKernel0Hw1

The sample compute-nodes.yml file:

Note:

MTU settings are inherited from network settings in the site.yml file. The name is in “heat” notation (camel case) with added MTU value at the end.

Note:

Use validation tools to check the numbered-to-named NIC allocation. See Miscellaneous.

In the example above, the default route is configured using the management network, assuming it provides access for administrators to external resources. If the management network is not present, the default route should be via the provision network, for which SNAT to the Intranet is configured on the jumphost. In this scenario, the default: true statement should be in the provision: section of the compute-nodes.yml file. It is also possible to put the default route on the tenant network, which is useful if SNAT is used in tenant networks.

DPDK-mode vRouter Configuration

A vRouter in DPDK mode runs in user space on the compute node. Network traffic is handled by one or more dedicated DPDK interfaces that also handle VLANs and bonds. A specified number of cores is assigned to perform the vRouter forwarding function.

Figure 9 illustrates a vRouter in DPDK mode.

Figure 9: DPDK vRouter

DPDK vRouters provide higher throughput than kernel vRouters. See Configuring the Data Plane Development Kit (DPDK) Integrated with Contrail vRouter for additional information on DPDK in Contrail networking.

DPDK Bond Interface Configuration

LACP bonding is under DPDK control when a vRouter is in DPDK mode. There is no Linux bond.

LACP bonding is defined in the overcloud-nics.yml file and configured using these options:

  • bond_mode: 4 (IEEE 802.3ad)

  • bond_policy: layer3+4

A sample overcloud-nics.yml file for the bond configuration:
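
As an illustration, a sketch follows; bond_mode and bond_policy are the options listed above, while the DPDK-specific key and the profile name are assumptions.

  # Illustrative sketch; the dpdk key and profile name are assumptions.
  ComputeDpdk0Hw1_network_config:
    bond0:
      dpdk: true                # LACP is handled by DPDK; no Linux bond is created
      bond_mode: 4
      bond_policy: layer3+4
      members:
        - nic3
        - nic4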

With DPDK, the LACP rate is taken from the LACP negotiation with the switches acting as LACP partners; the vRouter applies whatever rate is configured on the switches. Most Juniper switches are configured in fast LACP mode by default, and the vRouter adopts that setting.

If you want to force LACP into fast LACP mode, set the LACP_RATE: field to 1 in the site.yml file:
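
As an illustration, a sketch follows; LACP_RATE: is the field named above, and its position in the hierarchy is an assumption.

  # Illustrative sketch; placement of LACP_RATE is an assumption.
  compute_hosts:
    dpdk:
      LACP_RATE: 1        # 1 = fast LACP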

Performance Tuning for DPDK

To maximize throughput for a vRouter in DPDK mode, set the following parameters:

  • Allocate 4 threads for flow processing in the vRouter agent.

  • Allocate huge pages.

  • Allocate CPUs for DPDK, Host OS, and Nova.

  • Increase the flow table size to 2 million from the 500,000 default setting.

  • Increase the buffer sizes to 2048 to reduce packet drops from microbursts.

  • Double the vrouter memory pool size to 131072.

  • Set the CPU scaling governor into performance mode.

Enabling CPU performance mode

CPU frequency scaling enables the operating system to scale the CPU frequency to save power. CPU frequencies can be scaled automatically depending on the system load, in response to ACPI events, or manually by user space programs. To run the CPU at the maximum frequency, set the scaling_governor parameter to performance.

In the BIOS, all power-saving options should be disabled, including Power Performance Tuning, CPU P-State, CPU C3 Report, and CPU C6 Report. Select Performance as the CPU Power and Performance policy.

The configuration can be defined in the site.yml file as an extra_action (post deployment action) parameter.

A sample configuration snippet from the site.yml file:

Configuring Flow Threads and Huge Pages

Memory that does not need to be used by the operating system should be allocated as huge pages to maximize efficiency. For instance, suppose a server with 256GB of RAM needs 8GB for the OS (including the vRouter) and therefore has 248GB remaining for other functions. This server should allocate the remaining memory as 1GB huge pages to maximize memory usage.

Additionally, some 2MB huge pages should be configured to maximize memory usage for the vRouter.

The number of threads used for flow processing and the huge page allocations are configured in the site.yml file.

A sample configuration snippet from the site.yml file:
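
As an illustration, a sketch follows; the key names are assumptions based on the huge page and flow-thread guidance above.

  # Illustrative sketch; key names are assumptions.
  compute_hosts:
    vrouter:
      flow_thread_count: 4
      huge_pages:
        - size: 1G
          count: 248        # 256 GB server minus ~8 GB reserved for the OS and vRouter
        - size: 2M
          count: 1024       # small pool of 2 MB pages for the vRouter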

CPU Allocation

CPU partitioning must be properly defined and configured to optimize performance. CPU partitioning issues can cause transient packet drops in moderate and high throughput environments.

Consider the following parameters when planning physical CPU core assignments:

  • NUMA topology

  • Usage of hyperthreading (HT)

  • Number of cores assigned to the vRouter DPDK process

  • Number of cores allocated to VMs

  • Number of cores left for system processes (including the vRouter agent)

The following core mapping illustrates a NUMA topology for a CPU with two NUMA nodes. Each NUMA node has 18 physical cores and supports hyper-threading.

Output key:

  • Blue: allocated to DPDK

  • Red: allocated to Nova for VMs

  • Green: should not be allocated.

  • Black: remainder used for operating system

Note:

Cores 0 and 1 with their corresponding HT siblings must not be allocated for either DPDK or Nova.

Six cores, without hyperthreading, are allocated for vRouter and are shown in blue in the example snippet. Our test results found that six cores is the maximum number that should be allocated in this environment, since multi-queue virtio does not handle larger core numbers effectively. The number of cores delivering maximum throughput may vary with different hardware specifications. Use of hyperthreading for DPDK has been shown to cause reduced throughput.

If more cores are used for DPDK, then the vr_mempool_sz parameter should be modified from the value suggested below, according to the formula:

vr_mempool_sz = 2 * (dpdk_rxd_sz + dpdk_txd_sz) * (num_cores) * (num_ports)

where:

  • num_cores is the number of cores allocated for DPDK (including HT siblings, which are not recommended)

  • num_ports is the number of physical ports in the DPDK bond interface

  • dpdk_rxd_sz is the number of receive buffer descriptors

  • dpdk_txd_sz is the number of transmit buffer descriptors

The default values for dpdk_txd_sz and dpdk_rxd_sz are set at 128 descriptors, but Intel recommends setting these values to 2048 descriptors to handle microbursts. We recommend not exceeding the 2048 descriptors setting as larger settings can cause unexpected latencies.

The cores that are used for virtual machine workloads are set using Nova CPU pinning, shown in red in the example snippet.

The operating system is allocated 4 physical cores with hyper-threading in this setup, as shown in black in the example snippet. The first cores on each NUMA node must be allocated for the operating system. These operating system core reservations are not explicitly set by the user; the reservations are implied by the cores that are not allocated to DPDK or Nova.

The DPDK cores are all on NUMA0 in this architecture, which is where the corresponding NICs are located. It is good practice to place as many OS cores on NUMA1 as possible to create an environment where a higher proportion of VM workloads run on NUMA0 where network performance is maximized.

DPDK core allocations are defined in the overcloud-nics.yml file using hardware profiles.

A sample configuration snippet from the overcloud-nics.yml file:

DPDK parameters—including the maximum number of flows and buffer sizes as well as parameters related to Nova pinning and general Nova functions—are defined in the extra-config: hierarchy in the site.yml file.

A sample configuration snippet from the site.yml file:

Note:

The tuning profile cpu-partitioning causes the cpu_affinity value in the tuned.conf file to be set to the set of cores that are not in the IsolCpusList.

Adding DPDK Compute Nodes

Compute nodes that run in DPDK mode are identified in the compute_nodes_dpdk hierarchy of the compute-nodes.yml file.

A sample configuration snippet from the compute-nodes.yml file:

SR-IOV Mode Compute Nodes

A compute node in SR-IOV mode provides direct access from the NIC to a VM. Because network traffic bypasses the vRouter in SR-IOV mode, no network policy or flow management is performed for traffic. See Configuring Single Root I/O Virtualization (SR-IOV) for additional information on SR-IOV in Contrail networking.

Figure 10 illustrates the VM connections in compute nodes using SR-IOV mode.

Figure 10: Compute Node—VM and vRouter Connections in SR-IOV Mode

In SR-IOV terminology, the NIC is represented by a Physical Function (PF), and the virtual instances of the NIC that are presented to VMs are referred to as Virtual Functions (VFs). A VM with multiple interfaces can connect using overlay networks on some interfaces and SR-IOV on other interfaces.

VMs can use the overlay network or go directly to an underlay network through VFs using SR-IOV. This configuration is performed in the compute_nodes_sriov: hierarchy within the compute-nodes.yml file.

A configuration snippet from the compute-nodes.yml file.

Note:

SR-IOV must be enabled in BIOS.

There are two types of vRouter deployments for SR-IOV:

  • vRouter in Kernel mode on top of SR-IOV

  • vRouter in DPDK mode on top of SR-IOV

Figure 11 illustrates the vRouter in both modes.

Figure 11: SR-IOV—Kernel Mode and DPDK Mode

When SR-IOV is enabled on compute nodes, each vRouter is attached to a bond interface configured from the SR-IOV Physical Functions (PF). A VM interface that is in a network that has SR-IOV enabled is connected to a NIC Virtual Function (VF) and exposed to the fabric underlay network by Nova PCI passthrough.

You can select kernel mode or DPDK mode for SR-IOV compute nodes in the site.yml file.

A configuration snippet from the site.yml file:
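
As an illustration, a sketch follows; sriov, num_vf, and the kernel/dpdk mode values are named in this section, while the remaining keys and the second interface name are assumptions.

  # Illustrative sketch; keys other than sriov, num_vf, and mode are assumptions.
  compute_hosts:
    sriov:
      enabled: true
      mode: kernel                   # or dpdk for vRouter in DPDK mode over SR-IOV
      num_vf:
        - interface: ens7f1          # interface name as seen on the server
          vf_count: 7
          physical_network: sriov1
        - interface: ens7f2          # hypothetical second port
          vf_count: 7
          physical_network: sriov2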

Note:

Interface names must be used in the sriov: hierarchy in the site.yml file. See Interface naming conventions for server naming details.

The bond needs to be configured in the overcloud-nics.yml file.

A sample configuration snippet from the overcloud-nics.yml file:

In the compute_hosts: section of the site.yml file, SR-IOV must be enabled and the mode must be set to kernel or dpdk.

In the num_vf section of the site.yml file, set the number of virtual functions (VFs) that the operating system allocates for the NIC interfaces. In this sample configuration snippet, 7 VFs are allocated. VF 0 from ens7f1 and ens7ens2 is allocated for Nova PCI passthrough to the provider networks named sriov1 and sriov2. The interface names are those seen when logging in to a server using IPMI. The network names are arbitrary and are only used inside the SR-IOV configuration.

The following snippet displays the ip link command output from a host when SR-IOV is enabled.

Storage Nodes (Ceph OSD)

The storage nodes run Ceph software in Contrail Cloud. For additional information on using Red Hat Ceph storage software, see Product Documentation for Red Hat Ceph Storage.

We recommend following these guidelines to optimize Ceph software performance in your Contrail Cloud deployment:

  • Storage nodes must be separate from compute nodes in a Contrail Cloud environment. Hyperconverged nodes are not supported.

  • Ceph requires a minimum of three storage nodes to operate.

If your deployment needs storage other than the Ceph storage configured through Contrail Cloud, your deployment design might require additional support and engagement. Send an email to sre@juniper.net before moving forward with non-Ceph storage providers to ensure your deployment remains in compliance with your support contract.

Ceph Configuration

Ceph storage configuration is defined in the site.yml file.

Disk mapping for Ceph storage nodes needs to be defined in the overcloud: disk_mapping: hierarchy for each storage node. The recommended disk layout is 4 OSD disks + 1 SSD journal disk. This configuration can be scaled appropriately; for instance, 8 OSD disks + 2 SSD journal disks for a larger environment. CPU, RAM, and storage traffic need to be considered when using multiple 4 OSD + 1 journal bundles.

The mapping section of the site.yml file allows disks to be referenced by name rather than by device id.

A sample site.yml configuration file snippet to configure disk mappings:
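
As an illustration, a sketch follows; the overcloud: disk_mapping: hierarchy is taken from this section, while the per-disk keys are assumptions.

  # Illustrative sketch; per-disk keys are assumptions.
  overcloud:
    disk_mapping:
      CephStorage0Hw1:
        - name: osd1
          hctl: "0:0:0:0"
        - name: osd2
          hctl: "0:0:1:0"
        - name: osd3
          hctl: "0:0:2:0"
        - name: osd4
          hctl: "0:0:3:0"
        - name: journal1
          hctl: "0:0:4:0"      # SSD journal disk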

This sample site.yml configuration file snippet configures Ceph storage for a 4 OSD disks + 1 SSD journal bundle:
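
As an illustration, a sketch follows; the ceph: hierarchy is taken from Table 1, while the OSD and journal keys are assumptions. The disks are referenced by the names defined in the disk mapping above.

  # Illustrative sketch; OSD and journal keys are assumptions.
  ceph:
    enabled: true
    osds:
      - disk: osd1
        journal: journal1      # one SSD journal shared by the four OSDs
      - disk: osd2
        journal: journal1
      - disk: osd3
        journal: journal1
      - disk: osd4
        journal: journal1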

The disk layouts, notably, are per hardware profile.

By default, the Ceph service configuration hides the complexity of computing placement groups. This configuration can be provided manually in the site.yml file if required, such as in cases where the environment has a small number of OSDs.

Storage Node Network Configuration

In this reference architecture, separate bonds are used on storage nodes for the Storage Mgmt (replication) and Storage (user access) networks.

Figure 12 illustrates these connections.

Figure 12: Storage Node Networks

Figure 13 illustrates how the physical and logical interfaces of a storage node connect to a management switch and leaf switches.

Figure 13: Storage Node—Physical and Logical Interfaces

Figure 14 illustrates how the physical interfaces of a storage host connect to the bridges configured on it.

Figure 14: Storage Node—Interfaces

Table 7 shows which files are used for configuring each network connection on a storage node.

The IP address of the management interface and addressing and configuration of bond interfaces on a storage node are configured according to the configuration files that are used by the Contrail Cloud provisioning system. The ports on each storage host are configured as follows.

Table 7: Storage Node Interface Naming Configuration Files
Port/NIC Configuration

IPMI

Address is entered as an input to the inventory.yml file.

br-eno1

1 x 1G/10G NIC - untagged interface (e.g. built-in copper interface) with addresses generated automatically from the provisioning network pool.

br-eno2

1 x 1G/10G NIC - Optional Management network, untagged interface with address defined in the overcloud: network: section of the site.yml file.

br-bond0

2 x 10G/25G NICs made of the first ports from both NICs. Untagged interface with the Storage network. Bond physical allocation is defined in the overcloud-nics.yml file. Addressing is set in the overcloud: network: section of the site.yml file.

br-bond1

2 x 10G/25G NICs made of the second ports from both NICs. Untagged interface with the Storage Mgmt network. Bond physical allocation is defined in the overcloud-nics.yml file. Addressing is set in the overcloud: network: section of the site.yml file.

Linux bond configuration is defined per storage host in the overcloud-nics.yml file.

A sample configuration snippet from the overcloud-nics.yml file:
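
As an illustration, a sketch follows; the profile name and structure are assumptions, while the bond mode and hash policy match the bond guidance used elsewhere in this reference architecture.

  # Illustrative sketch; profile name and structure are assumptions.
  CephStorage0Hw1_network_config:
    bond0:                       # Storage network, first port of each data NIC
      bond_mode: 4
      bond_policy: layer3+4
      members:
        - nic3
        - nic5
    bond1:                       # Storage Mgmt network, second port of each data NIC
      bond_mode: 4
      bond_policy: layer3+4
      members:
        - nic4
        - nic6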

Where:

  • NIC-0 - onboard copper NIC 1/10G

  • NIC-1 and NIC-2 - 2 ports 10G/25G/40G Intel Fortville family NICs in PCI slot connected to NUMA0