Contrail Cloud Configuration
This chapter covers Contrail Cloud configuration.
Contrail Cloud Configuration File Structure Overview
Contrail Cloud is configured using YAML files. A series of pre-configured YAML file templates are provided as part of a Contrail Cloud installation. These user-configurable YAML files are downloaded onto the jumphost server during the initial phase of the Contrail Cloud installation. The YAML files can be accessed and edited by users from within the jumphost.
Contrail Cloud configuration changes are applied using Ansible playbook scripts, which are also provided as part of the Contrail Cloud bundle. The Ansible playbooks read the configuration provided in the YAML files. The Ansible playbook scripts populate parameters from the YAML files into a second set of configuration files that are used by Red Hat OpenStack Director to provision servers and configure the components of Contrail Cloud.
See Deploying Contrail Cloud for additional information on YAML file locations and configuration updating procedures.
Table 1 lists commonly-used YAML file parameters in Contrail Cloud and provides a summary of the purpose of each parameter.
| YAML File Parameter | Purpose |
|---|---|
| site.yml | |
| global: | DNS, NTP, domain name, time zone, Satellite URL, and proxy configuration for the deployment environment. |
| jumphost: | Provision NIC name definition and PXE boot interface for the jumphost. |
| control_hosts: | Control host parameters. Includes disk mappings for bare metal servers and control plane VM sizing per role for functions like analytics. |
| compute_hosts: | Parameters for SR-IOV, DPDK, and TSN in compute nodes. Root disk configuration per hardware profile. |
| storage_hosts: | Ceph and block storage profile definitions for storage nodes. |
| undercloud: | Nova flavors for roles. Applicable when using additional hardware profiles. |
| overcloud: | Hardware profile and leaf number-based settings, such as per-role disk mappings and per-leaf network definitions. |
| ceph: | Ceph enablement and disk assignments (pools, OSDs) on storage nodes. |
| ceph_external: | Externally deployed Ceph integration parameters. |
| appformix: | Enable HA, VIP IPs, and network device monitoring for Appformix. |
| inventory.yml | |
| inventory_nodes: | Name, IPMI IP, Ironic driver used for LCM, root disk, and other related functions for all Contrail cluster nodes. |
| control-host-nodes.yml | |
| control_host_nodes: | Internal IP and DNS (per control node) for control hosts and the control plane. Statically assigned controller IPs must be outside the DHCP pools of the networks that use them. |
| control_host_nodes_network_config: | Bridges, bonds, DHCP/IP, and MTU for control hosts. |
| control_hosts: | VM interface to bridge on control-host mapping. |
| overcloud-nics.yml | |
| contrail_network_config:, controller_network_config:, appformixController_network_config:, computeKernel_network_config:, compute_dpdk_network_config:, cephStorage_network_config: | Interface to network mapping, routes, DHCP-IP allocation, bonds, VLAN to interface maps, and bond options for control, storage, and compute nodes. |
| compute-nodes.yml | |
| compute_nodes_kernel:, compute_nodes_dpdk:, compute_nodes_sriov: | Mapping hosts from inventory to compute roles and profiles for compute nodes. |
| storage-nodes.yml | |
| storage_nodes: | Names of storage nodes. |
| vault-data.yml | |
| global: | Satellite key and contrail user password for the Red Hat OpenStack Vault function. |
| undercloud:, overcloud:, control_hosts: | VM and bare metal server (BMS) passwords for Contrail cluster nodes and the undercloud when using the Red Hat OpenStack Vault function. |
| appformix: | MySQL and RabbitMQ passwords for Appformix when using the Red Hat OpenStack Vault function. |
| ceph_external: | Client key used by Ceph External with the Red Hat OpenStack Vault function. |
| inventory_nodes: | IPMI credentials for Contrail cluster nodes when using the Red Hat OpenStack Vault function. |
The Ansible playbooks initially read variables from the default.yml file. Configuration files are then read in the order presented in Table 1. Variables are stored and updated in the default.yml file as the script runs. If the same variable has different configuration settings in different YAML files, the setting in the YAML file that is read later in the configuration file processing order is implemented.
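A minimal, purely hypothetical illustration of this precedence (the key and values below are not taken from the shipped templates):

  # Read earlier in the processing order (for example, site.yml)
  example_setting: "value-from-earlier-file"

  # Read later in the processing order (for example, compute-nodes.yml)
  example_setting: "value-from-later-file"   # this value wins and is stored in default.yml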
Sample YAML files should be copied from the /var/lib/contrail_cloud/samples directory to the /var/lib/contrail_cloud/config directory and updated according to the requirements of the current deployment. Parameter values in the files in the /var/lib/contrail_cloud/samples directory can be used as default values in most cases where guidance for setting values is not given in this document. See Contrail Cloud Deployment Guide.
Hardware Profiles
A hardware profile allows administrators to apply the same configuration to a group of servers acting as compute or storage nodes.
As new servers are added to a deployment, each server might have different disk and network hardware, and the way networks and storage are configured may differ between servers.
Servers are associated to hardware profiles in the compute-nodes.yml file. The leaf number is also set in this file. This sample configuration snippet from the compute-nodes.yml file shows a hardware profile configuration:
compute_nodes_kernel:
  - name: compute-1-rack0   #Compute name
    leaf: '0'               #Leaf number
    profile: hw0            #Server Hardware profile tag
  - name: compute-1-rack1
    leaf: '1'
    profile: hw1
  - name: compute-1-rack2
    leaf: '2'
    profile: hw1
  - name: compute-2-rack2
    leaf: '2'
    profile: hw2
The hardware profile configurations are applied to servers using the site.yml and overcloud-nics.yml files, in the sections of those files that follow this naming convention:
[role][leaf number][hardware profile tag]
where:
The role is one of the following values: ComputeKernel, ComputeDpdk, ComputeSriov, CephStorage
The leaf number is the number of the leaf. For instance, 0.
The Hardware profile tag is any alphanumeric string starting with a capital letter. We strongly recommend using the Hw[number] convention.
For example, the following sample names could be used for compute nodes of different types in leaf 0:
ComputeKernel0Hw1 for kernel-mode
ComputeDpdk0Hw1 for DPDK
ComputeSriov0Hw1 for SR-IOV
Two sample hardware profile configurations within the site.yml file are provided below. The example allocates different SCSI disks for local VM ephemeral storage using the hctl variable.
overcloud:
  # Contains a list of label to disk mappings for roles
  disk_mapping:
    ComputeKernel1Hw0:    #compute in leaf 1 with hardware profile label 'hw0'
      - label: ephemeral-0
        hctl: '7:0:0:0'
      - label: ephemeral-1
        hctl: '8:0:0:0'
    ComputeKernel2Hw1:    #compute in leaf 2 with hardware profile label 'hw1'
      - label: ephemeral-0
        hctl: '5:0:0:0'
      - label: ephemeral-1
        hctl: '6:0:0:0'
overcloud:
  [...]
  network:
    [...]
    internal_api0:
      heat_name: InternalApi0
      cidr: "172.16.1.0/24"
      default_route: 172.16.1.1
      role:
        - ComputeDpdk0Hw1
        - ComputeSriov0Hw1
        - ComputeKernel0Hw1
    [...]
Example of hardware profiles in the overcloud-nics.yml file are provided later in this reference architecture.
Network Configuration
This section describes how the networks are configured in Contrail Cloud. All properties of the networks—with the exceptions of the IPMI and Provision networks—are specified in the network: section within the site.yml file.
- IPMI Network
- Provision Network
- External Network
- Management
- Internal API, Tenant, Storage, and Storage Management Networks
IPMI Network
The IPMI network is generally a set of network addresses reserved for hardware management within a data center. The management switches must be configured with access to the default gateway for the IPMI network. Servers in the environment must have IP addresses statically allocated for IPMI or assigned via DHCP using the MAC address as the key for address allocation.
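For reference, IPMI addresses and credentials are entered per node in the inventory.yml file (see Table 1). The following minimal sketch is illustrative only; the field names shown (pm_addr, pm_type, and so on) are assumptions, so consult the sample inventory.yml in /var/lib/contrail_cloud/samples for the exact schema:

  inventory_nodes:
    - name: "control-host-rack0"   # node name (illustrative)
      pm_addr: "10.0.0.11"         # IPMI IP address (field name assumed)
      pm_type: "ipmi"              # Ironic driver used for LCM (value assumed)
    - name: "compute-1-rack0"
      pm_addr: "10.0.0.21"
      pm_type: "ipmi"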
Provision Network
The properties of the provision network are configured in the undercloud: section of the site.yml file.
A sample configuration snippet from the site.yml file:
undercloud:
  vm:
    network:
      provision:
        #undercloud VM ip
        ip: 192.168.212.1
        cidr: "192.168.212.0/23"
        gateway: 192.168.212.1
        #undercloud_dhcp_start (from this range hosts will have IPs in the provision network)
        dhcp:
          start: 192.168.212.20
          end: 192.168.213.200
        inspection:
          #undercloud_inspection_ip_range
          ip_range:
            start: 192.168.213.201
            end: 192.168.213.253
overcloud:
  network:
    control:
      # In Contrail Cloud and recent TripleO the network is called provisioning
      heat_name: ControlPlane
      default_route: 192.168.212.1
      cidr: "192.168.212.0/23"
      mtu: 9100
The control network in the overcloud is the same as the provision network in the undercloud.
The example configuration provides for up to two hundred compute nodes and fifteen IP addresses for control and storage nodes.
The inspection block specifies a range of IP addresses that the installer introspection service uses during the PXE boot and provisioning process. Use the ip_range variable to define the start and end values within this range. When batch provisioning is used—which is recommended in this reference architecture—only a small number of these addresses are in use at any one time. The range, therefore, can be much smaller than the DHCP range.
A server PXE boots from the provisioning network and receives an IP address via DHCP when it is provisioned. The boot preference is then changed to disk, the IP address is configured by the Ironic service into the operating system, and the server boots from the disk with the same IP address.
External Network
The External network is used by cloud users to access the public API addresses of the control hosts. A VIP address is specified, as well as a pool of IP addresses that can be used by DHCP.
The External network parameters are set in the site.yml file.
A sample External network configuration snippet from the site.yml file:
network:
  external:
    cidr: "192.168.176.0/25"
    vlan: 305
    vip: "192.168.176.100"
    pool:
      start: "192.168.176.10"
      end: "192.168.176.99"
    mtu: 9100
There are only a limited number of control hosts. The external network, therefore, can be a small subnet of IP addresses.
The External network is associated with an interface on control hosts in the control-host-nodes.yml file.
Management
The properties of the management network are configured in the network: section of the site.yml file.
A sample Management network configuration snippet from the site.yml file:
overcloud:
  network:
    [...]
    management:
      heat_name: Management # Network name used by TripleO Heat Templates
      cidr: "192.168.0.0/23"
      mtu: 9100
      start: 192.168.0.5    # Range start for the DHCP pool
      end: 192.168.1.220    # Range end for the DHCP pool
    [...]
Internal API, Tenant, Storage, and Storage Management Networks
Red Hat OpenStack Platform 13 (RHOSP 13) supports the capability to use separate subnets per rack. This feature is used for the networks that are connected to the IP Fabric in this reference architecture: the Internal API, Tenant, Storage, and Storage Management networks. Each of these networks is assigned a supernet IP address range (/16), which includes all of the rack subnets (/24) for that network.
The concept of spine-leaf networking in TripleO is described in Red Hat OpenStack Platform 13: Spine Leaf Networking.
TripleO uses the term leaf to group servers that have shared connectivity. In this Contrail Cloud reference architecture, leaves in the TripleO context refer to grouped servers in the same rack. In this reference architecture, a Red Hat leaf is implemented by a management switch and a pair of top-of-rack (ToR) switches.
Network names in the configuration files follow a per-rack (per-leaf) convention for compute and storage nodes. Each name is a concatenation of the base network name and the leaf number. Leaf numbers are assigned to nodes in the compute-nodes.yml file. Base network names include the InternalApi, Storage, StorageMgmt, and Tenant networks.
Compute nodes in Leaf 0 should use networks:
InternalApi0
Storage0
Tenant0
Compute nodes in Leaf 1 should have defined networks:
InternalApi1
Storage1
Tenant1
Compute nodes in Leaf N should have defined networks:
InternalApiN
StorageN
TenantN
The following example shows how a supernet is split for the Internal API and Tenant networks. The same procedure must be performed for the remaining networks used by compute and storage nodes (Storage, Storage Mgmt).
In the following configuration snippet, these parameters are set for all networks:
supernet - supernet definition.
cidr - subnet configuration for a leaf with the first subnet used for the controllers.
default_route - static route definition pointing to a “supernet” via a given operating system interface, such as bond1, vhost0, or other interfaces.
vrouter_gateway - default route definition for vRouter encapsulated traffic in the overlay network. This variable is defined as a gateway parameter in the /etc/contrail/contrail-vrouter-agent.conf file. This gateway IP address is used to reach DC gateways (MX routers and other vRouters to set up MPLSoUDP or other overlays).
role - the role or hardware profile to which the subnet is assigned. The first subnet is always for controllers, and the remaining subnets are assigned to compute and storage nodes based on the leaf identifier.
When a network is specified to share an interface in the role_network_config.yml file, the network is assigned a VLAN. The VLAN ID can be identical in all racks when VXLAN is used, which is how the VLANs are assigned in this reference architecture.
overcloud:
  network:
    internal_api:              # This is the network for control nodes (no suffix)
      heat_name: InternalApi
      supernet: "172.16.0.0/16"
      cidr: "172.16.0.0/24"
      default_route: 172.16.0.1
      vlan: 100
      mtu: 9100
      pool:
        start: 172.16.0.100
        end: 172.16.0.199
      vip: 172.16.0.90
      role:
        - ContrailController
        - ContrailAnalytics
        - ContrailAnalyticsDatabase
        - ContrailTsn
    internal_api0:             # Network for nodes attached to leaf 0
      heat_name: InternalApi0
      cidr: "172.16.1.0/24"    # Leaf subnet
      default_route: 172.16.1.1
      vlan: 100
      mtu: 9100
      vip: false
      pool:
        start: 172.16.1.100
        end: 172.16.1.200
      role:
        - ComputeDpdk0Hw2
        - ComputeSriov0Hw4
        - ComputeKernel0Hw1
        - ComputeKernel0Hw0
    internal_api1:             # Network for nodes attached to leaf 1
      heat_name: InternalApi1
      cidr: "172.16.2.0/24"
      default_route: 172.16.2.1
      vlan: 100
      mtu: 9100
      vip: false
      pool:
        start: 172.16.2.100
        end: 172.16.2.199
      role:
        - ComputeDpdk1Hw3      # Different hardware profiles for this leaf
        - ComputeSriov1Hw5
        - ComputeKernel1Hw1
        - ComputeKernel1Hw0
    internal_api2:             # Network for nodes attached to leaf 2
      heat_name: InternalApi2
      cidr: "172.16.3.0/24"
      default_route: 172.16.3.1
      vlan: 100
      mtu: 9100
      vip: false
      pool:
        start: 172.16.3.100
        end: 172.16.3.199
      role:
        - ComputeDpdk2Hw3
        - ComputeSriov2Hw5
        - ComputeKernel2Hw1
        - ComputeKernel2Hw0
    internal_api3:
      [...]
    tenant:                    # This is the network for control nodes (no suffix)
      heat_name: Tenant
      supernet: "172.18.0.0/16"
      cidr: "172.18.0.0/24"
      default_route: 172.18.0.1       # passed to host OS
      vrouter_gateway: 172.18.0.1     # passed to vRouter for encapsulated traffic
      vlan: 200
      mtu: 9100
      vip: false
      pool:
        start: 172.18.0.100
        end: 172.18.0.199
      role:
        - ContrailController
        - ContrailAnalytics
        - ContrailAnalyticsDatabase
        - ContrailTsn
    tenant0:
      heat_name: Tenant0
      cidr: "172.18.1.0/24"
      default_route: 172.18.1.1       # passed to host OS
      vrouter_gateway: 172.18.1.1     # passed to vRouter for encapsulated traffic
      vlan: 200
      mtu: 9100
      vip: false
      pool:
        start: 172.18.1.100
        end: 172.18.1.199
      role:
        - ComputeDpdk0Hw2
        - ComputeSriov0Hw4
        - ComputeKernel0Hw0
        - ComputeKernel0Hw1
    tenant1:
      heat_name: Tenant1
      cidr: "172.18.2.0/24"
      default_route: 172.18.2.1
      vrouter_gateway: 172.18.2.1
      [...]
    tenant2:
      heat_name: Tenant2
      cidr: "172.18.3.0/24"
      default_route: 172.18.3.1
      vrouter_gateway: 172.18.3.1
      [...]
    tenant3:
      heat_name: Tenant3
      cidr: "172.18.4.0/24"
      default_route: 172.18.4.1
      vrouter_gateway: 172.18.4.1
      [...]
    storage:        # Storage and storage_mgmt have the same format
      [...]
    storage_mgmt:
      [...]
Example Networks Used in this Reference Architecture
Table 2 presents a sample addressing scheme in a Contrail Cloud environment with four racks. This addressing scheme is used in the configuration file examples in this reference architecture.
| Network | Supernet | Subnet |
|---|---|---|
| Provision | | 192.168.212.0/23 |
| External | | 10.10.10.0/25 |
| internal_api | 172.16.0.0/16 | 172.16.0.0/24 |
| internal_api[0-3] | | 172.16.1-4.0/24 |
| management | | 192.168.0.0/23 |
| Storage | 172.19.0.0/16 | 172.19.0.0/24 |
| storage[0-3] | | 172.19.1-4.0/24 |
| storage_mgmt | 172.20.0.0/16 | 172.20.0.0/24 |
| storage_mgmt[0-3] | | 172.20.1-4.0/24 |
| Tenant | 172.18.0.0/16 | 172.18.0.0/24 |
| tenant[0-3] | | 172.18.1-4.0/24 |
In the site.yml file, the provision network includes an inspection block with addresses that are used during PXE booting; addresses from the provision DHCP pool are then configured onto servers during provisioning.
Supernet addresses are specified for the networks that have a separate subnet per rack. The supernet address is used in a static route on the servers to send inter-rack traffic through the correct interface and corresponding VLAN. For example, a compute node in a leaf 1 rack reaches the Internal API subnets of other racks through a static route to the 172.16.0.0/16 supernet via its local gateway 172.16.2.1 on the Internal API VLAN.
Batch Deployment Configuration
Juniper Networks recommends running compute and storage node deployments in batches of 5 to 10 nodes. We make this recommendation because of the potential for timeouts in the TripleO Heat automation process during larger deployments.
Batch deployments are configured in the site.yml file. A sample batch deployment configuration snippet from the site.yml file:
# To use batch deployment, you will need to run the openstack-deploy.sh script
# multiple times (each run will deploy new nodes).
overcloud:
  batch_deployment:
    CephStorage: 5
    ComputeDpdk: 5
    ComputeKernel: 5
In this configuration, 5 CephStorage, 5 ComputeDpdk, and 5 ComputeKernel nodes are deployed each time the openstack-deploy.sh script is run. The script should be run repeatedly until it reports that there are no more nodes to be deployed.
Jumphost
The jumphost is the host from which an administrator initiates provisioning of a Contrail Cloud environment. This section covers jumphost configuration options.
- Jumphost Overview
- Adding an IP Address for the Jumphost to the Provision Network
- Contrail Command Configuration
Jumphost Overview
The jumphost:
hosts the undercloud. The undercloud is a VM responsible for provisioning and managing all control hosts, storage nodes, and compute nodes in a Contrail Cloud. All Contrail-related setup and configuration is performed through the undercloud in a Contrail Cloud.
stores Contrail Cloud configuration-related files. The YAML files that configure Contrail Cloud are stored on the jumphost. The Ansible scripts that apply the configurations made in the YAML files to the Contrail Cloud nodes are also stored on the jumphost.
The Contrail Cloud scripts are stored in the /var/lib/contrail_cloud directory.
hosts the Contrail Command web user interface virtual machine.
runs Red Hat Enterprise Linux with only base packages installed.
provides SNAT for traffic from the Provisioning network to intranet and external networks.
provides access to the servers of a Contrail Cloud environment if a management network is not present.
A jumphost must be operational as a prerequisite for a Contrail Cloud installation. The jumphost should not run any virtual machines besides the undercloud and Contrail Command VMs.
Figure 1 illustrates the jumphost network connections.
The Intranet network is configured manually before the Contrail Cloud 13 packages are downloaded. Contrail Cloud uses this network to download packages and to provide outside connectivity for the nodes in a Contrail Cloud environment. The Intranet network IP address can be an IP address from the External network.
The jumphost is configured during the installation process to use an IP masquerade for SNAT of outbound traffic from hosts connected to the provisioning network. The “public” IP address on the jumphost—the IP address of the “Intranet” port—is used as a source address for traffic exiting the cloud during provisioning. This IP address should be permitted in firewalls to allow access to public repositories, Juniper Satellite servers, and Red Hat subscription managers. If external access is not permitted from the jumphost for security purposes, a Capsule proxy server can be used as described in Capsule Configuration. See Miscellaneous.
You can also configure a proxy, in which case the "Intranet" port should have access to the proxy. Note that the externally accessible VIP addresses for the OpenStack, Appformix, and Contrail APIs should be excluded from proxying to avoid issues during provisioning.
A sample proxy configuration snippet from the site.yml file:
global:
  proxy:
    enabled: true
    port: 443
    host: 10.10.10.10
    exclude: '127.0.0.1,localhost, 192.168.212.1, 192.168.212.2, 10.10.10.50,10.10.10.51, 10.10.10.52'
where 10.10.10.50, 10.10.10.51, and 10.10.10.52 are VIPs in the External network.
Adding an IP Address for the Jumphost to the Provision Network
The jumphost should be allocated an IP address in the Provision network to enable SSH to the other hosts in the Contrail Cloud environment. This IP address allocation is a convenience to enable troubleshooting from the jumphost.
This IP address is added to the jumphost in the site.yml file as shown in this configuration snippet:
jumphost:
  network:
    provision:
      # jumphost NIC IP to be used for provisioning (PXE booting) servers
      ip: "192.168.212.2"
      prefix: 23
Contrail Command Configuration
Contrail Command is a standalone solution based on two containers: contrail_command and contrail_psql. Contrail Command has no HA capabilities, and only the new UI services are provided. The Contrail Command VM is created on the jumphost.
The Contrail Command web UI can be reached in a Contrail Cloud environment by entering this URL in a web browser: https://[jumphost-IP-address]:9091
Contrail Command can be accessed after a Contrail Cloud deployment without additional user configuration. Contrail Command configuration parameters are updated in the site.yml file.
A sample site.yml file configuration snippet where the Contrail Command parameters are updated:
command:
  vm:
    # user and password are taken from the credentials provided in vault-data.yml
    #command_vm_cpu_count
    cpu: 16
    #command_vm_memory_size
    memory: 32
    #command_vm_disk_size
    disk: 100
    network:
      provision:
        #command_ip
        ip: 192.168.213.3
        #command_prefix
        cidr: "192.168.213.0/24"
        #command_gateway
        gateway: 192.168.213.1
Authentication details for Contrail Command are provided in the vault-data.yml file.
A sample vault-data.yml file configuration snippet with modified Contrail Command attributes:
command:
  vm:
    # command user name
    user: "contrail"
    # password for the command vm user
    password: "c0ntrail123"
    # root password for the command VM
    root_password: "c0ntrail123"
    # backend database
    database_password: "c0ntrail123"
    # keystone admin password
    admin_password: "c0ntrail123"
    # Passphrase used to encrypt ssh key of command user.
    # If not defined ssh private key will not be encrypted.
    # ssh_key_passphrase: "c0ntrail123"
    vnc:
      # VNC console password for the command VM
      password: "contrail123"
Controller Node Configuration
This section describes the controller node configuration in Contrail Cloud. It also includes sections related to VM resources and networking for controller nodes.
- Control Node VM Resources
- Controller Host Network Configuration
- Mapping Controller VM Interfaces to Host Bridges
- Appformix VM Configuration
- Contrail Service Node (CSN) - Optional
- Compute Node Configuration
- Compute Node Networking
Control Node VM Resources
This reference architecture is designed to support a large-scale Contrail Cloud environment. The following resources should be allocated to the controller VMs to support this architecture.
| Role | vCPU (Threads) | Memory (GB) | Disk (GB) |
|---|---|---|---|
| Undercloud VM | 28 | 128 | 500 |
| OpenStack Controller VM* | 8 | 48 | 500 |
| Contrail Analytics DB VM | 12 | 48 | 500 & 1000 |
| Contrail Analytics VM | 12 | 48 | 250 |
| Contrail Controller VM | 16 | 64 | 250 |
| AppFormix VM | 16 | 32 | 500 |
| TSN (Contrail Service Node) | 4 | 8 | 100 |
| Control Host OS | 4 | 8 | 100 |
* The OpenStack Controller VM size is significantly smaller than Red Hat's recommended OpenStack Controller VM size. The VM uses fewer resources in Contrail Cloud because several network functions are handled by Contrail Networking and telemetry functions are performed by Appformix. See Red Hat OpenStack Platform 13: Recommendations for Large Deployments for the recommended VM sizes.
Operating system resources must be reserved and not consumed by the resources allocated to controller VMs (that is, do not oversubscribe the control hosts). There are no configuration file options for reserving operating system resources.
The following configuration snippet from the control_hosts: vm: section of the site.yml file configures control host options:
control_hosts:
  vm:
    control:
      cpu: 8
      memory: 48
      disk:
        vda:
          size: 500
          pool: ssd_storage
      hv:
        - rack0-node1
        - rack1-node1
        - rack2-node1
    contrail-controller:
      cpu: 16
      memory: 64
      disk:
        vda:
          size: 250
          pool: ssd_storage
      hv:
        - rack0-node1
        - rack1-node1
        - rack2-node1
    contrail-analytics:
      cpu: 12
      memory: 48
      disk:
        vda:
          size: 250
          pool: ssd_storage
      hv:
        - rack0-node1
        - rack1-node1
        - rack2-node1
    contrail-analytics-database:
      cpu: 12
      memory: 48
      disk:
        vda:
          size: 500
          pool: ssd_storage
        vdb:
          size: 1000
          pool: spinning_storage
      hv:
        - rack0-node1
        - rack1-node1
        - rack2-node1
    appformix-controller:
      cpu: 16
      memory: 32
      disk:
        vda:
          size: 500
          pool: ssd_storage
      hv:
        - rack0-node1
        - rack1-node1
        - rack2-node1
    contrail-tsn:
      cpu: 4
      memory: 8
      disk:
        vda:
          size: 100
          pool: default_dir_pool
      hv:
        - rack0-node1
        - rack1-node1
Controller Host Network Configuration
Corresponding interfaces of the controller hosts are in the same Layer 2 network. The controller hosts are preferably deployed in different racks and connected to leaf devices in the EVPN-VXLAN IP Fabric. Figure 2 illustrates these connections.
The following VMs are running on the control hosts.
OS - OpenStack Controller
CC - Contrail Controller
CA - Contrail Analytics
CADB - Contrail Analytics DB
AFX - Appformix
TSN - ToR Service Node (optional)
A TSN is enabled in the control_hosts: vm: contrail-tsn: hierarchy within the site.yml file. The function has been renamed CSN (Contrail Service Node) in Contrail Networking, but TripleO manifests continue to use the TSN term.
Figure 3 illustrates how the physical and logical interfaces on a control host connect to its management switch and leaf switches.
Physical NICs typically contain multiple physical interfaces. In Red Hat configuration files, however, the naming convention is to use nicN to indicate the Nth physical interface on a server. For information on finding the order of interfaces using an introspection command, see Miscellaneous.
Figure 4 illustrates how networks and ports are assigned to bridges on a controller node.
Physical NICs typically contain two ports, which are named nic3 and nic4 on the first physical NIC.
Table 4 shows which files are used for configuring each network.
| Port/NIC | Configuration |
|---|---|
| IPMI | Address is entered as an input to the inventory.yml file. |
| br-eno1 | 1 x 1G/10G NIC - untagged interface (e.g. built-in copper interface) with addresses generated automatically from the provisioning network pool. |
| br-eno2 | 1 x 1G/10G NIC - Management network, untagged interface with address defined in the overcloud: network: section of the site.yml file. |
| br-bond0 | 2 x 10G/25G/40G NICs made of first ports from both NICs. Tagged interface with the Tenant, Internal API, and External networks. Bond physical allocation is defined in the control-host-nodes.yml file. Addressing is set in the overcloud: network: section of the site.yml file. |
| br-bond1 | 2 x 10G/25G/40G NICs made of second ports from both NICs. Tagged interface with the Storage and Storage Mgmt networks. Bond physical allocation is defined in the control-host-nodes.yml file. Addressing is set in the overcloud: network: section of the site.yml file. |
The configuration for bond interfaces is performed in the control_host_nodes_network_config: hierarchy of the control-host-nodes.yml file. Bond interfaces should be configured with the following parameters:
Linux bond - mode 4/LACP (802.3ad)
Hash policy - layer3+4
Use ovs_bridge with linux_bond. An OVS bond or a Linux bridge is configurable, but is not recommended by Red Hat.
This configuration snippet from the control-host-nodes.yml file shows various control host configurations, including bond interface configurations.
control_host_nodes_network_config:
  - type: ovs_bridge
    name: br-eno1
    addresses:
      - ip_netmask: "{{ host.control_ip_netmask }}"
    dns_servers:
      - "{{ host.dns_server1 }}"
      - "{{ host.dns_server2 }}"
    routes:
      - ip_netmask: "{{ host.control_ip_netmask }}"
        next_hop: "{{ host.control_gateway }}"
        default: true
    use_dhcp: false
    mtu: "{{ overcloud['network']['control']['mtu'] }}"
    members:
      - type: interface
        name: nic1
        mtu: "{{ overcloud['network']['control']['mtu'] }}"
  - type: ovs_bridge
    name: br-eno2
    use_dhcp: false
    mtu: "{{ overcloud['network']['management']['mtu'] }}"
    members:
      - type: interface
        name: nic2
  - type: ovs_bridge
    name: br-bond0
    use_dhcp: false
    mtu: 9100
    members:
      - type: linux_bond
        name: bond0
        use_dhcp: false
        mtu: 9100
        bonding_options: "mode=802.3ad xmit_hash_policy=layer3+4 lacp_rate=fast miimon=100"
        members:
          - type: interface
            name: nic3
            primary: true
            mtu: 9100
          - type: interface
            name: nic4
            mtu: 9100
  - type: ovs_bridge
    name: br-bond1
    use_dhcp: false
    mtu: 9100
    members:
      - type: linux_bond
        name: bond1
        use_dhcp: false
        mtu: 9100
        bonding_options: "mode=802.3ad xmit_hash_policy=layer3+4 lacp_rate=fast miimon=100"
        members:
          - type: interface
            name: nic5
            primary: true
            mtu: 9100
          - type: interface
            name: nic6
            mtu: 9100
Use validation tools to check numbered to named NICs allocations. See Miscellaneous.
Mapping Controller VM Interfaces to Host Bridges
The control host VM interfaces are connected to the bridges configured in Controller Host Network Configuration in the control_hosts: hierarchy of the control-host-nodes.yml file.
A sample configuration snippet:
control_hosts:
  vm_interfaces:
    - interface: eth0
      bridge: br-eno1
    - interface: eth1
      bridge: br-eno2
    - interface: eth2
      bridge: br-bond0
    - interface: eth3
      bridge: br-bond1
The first interface—eth0—must connect to the bridge for the provision network to allow the VM to PXE boot. The other interface names must be sequential, as shown in the sample configuration snippet. You should configure one interface for each bridge.
Appformix VM Configuration
AppFormix is provided with the Contrail Cloud bundle. AppFormix provides monitoring and troubleshooting for the networking and server infrastructure of Contrail Cloud. Appformix provides the same services for the workloads running in Contrail Cloud. For additional information on Appformix, see the Appformix TechLibrary page.
Appformix provides a WebUI as well as a REST API. The WebUI and the REST API are exposed to the Internal API and External networks.
The recommended Appformix deployment, which is also the default, is a 3-node configuration for high availability.
Appformix node configuration for Contrail Cloud is defined in the site.yml file. A sample configuration snippet:
appformix:
  # Set to true if you have multiple control hosts, which allows Appformix to run in HA mode
  enable_ha: true
  # Floating virtual IP for the Appformix APIs on the external network, used and required by HA mode.
  vip: "192.168.176.101"
  secondary_vip: "172.16.0.176"
  keepalived:
    # Set which interface will be used for vrrp
    vrrp_interface: "vlan305"
    # vrrp interface for secondary vip
    secondary_vrrp_interface: "vlan100"
The network configuration of Appformix is defined in the overcloud-nics.yml file.
A sample configuration snippet:
AppformixController_network_config:
  - type: interface
    name: nic1
    dns_servers:
      get_param: DnsServers
    use_dhcp: false
    mtu:
      get_param: ControlPlaneNetworkMtu
    addresses:
      - ip_netmask:
          list_join:
            - '/'
            - - get_param: ControlPlaneIp
              - get_param: ControlPlaneSubnetCidr
    routes:
      - ip_netmask: 169.254.169.254/32
        next_hop:
          get_param: EC2MetadataIp
      - next_hop:
          get_param: ControlPlaneDefaultRoute
  - type: vlan
    device: nic2
    vlan_id:
      get_param: InternalApiNetworkVlanID
    mtu:
      get_param: InternalApiNetworkMtu
    addresses:
      - ip_netmask:
          get_param: InternalApiIpSubnet
  - type: interface
    name: nic2
    mtu:
      get_param: ExternalNetworkMtu
    addresses:
      - ip_netmask:
          get_param: ExternalIpSubnet
    routes:
      - default: True
        next_hop:
          get_param: ExternalInterfaceDefaultRoute
The following resources are automatically monitored when AppFormix is installed in Contrail Cloud.
Openstack API endpoints and processes
Contrail API endpoint and processes
Openstack MySQL
Rabbit cluster status
Compute nodes including vRouter, Nova compute, operating system health and metrics
Appformix is not automatically configured to monitor the physical networking infrastructure, but adapters (without any configuration) for monitoring network devices can be installed during a Contrail Cloud deployment. AppFormix must be configured manually to monitor network devices after deployment. See the Network Devices section of the Appformix User Guide.
Network-related adapters can be installed for Appformix during a Contrail Cloud deployment in the site.yml file.
A sample configuration snippet:
appformix:
  network_device_monitoring:
    appformix_install_snmp_dependencies: true
    appformix_install_jti_dependencies: true
    network_device_discovery_enabled: true
    appformix_install_ipmi_dependencies: true
Custom Appformix plugins can also be installed during a Contrail Cloud deployment in the site.yml file.
A sample configuration snippet:
appformix:
  enable_copy_user_defined_plugins: true
  user_defined_plugins_config: |
    - { plugin_info: 'user_defined_plugins/plugin_1.json', plugin_file: 'user_defined_plugins/check_1.py'}
    - { plugin_info: 'user_defined_plugins/plugin_2.json', plugin_file: 'user_defined_plugins/check_2.py'}
    - { plugin_info: 'user_defined_plugins/plugin_3.json', plugin_file: 'user_defined_plugins/check_3.py'}
    - { plugin_info: 'user_defined_plugins/plugin_4.json', plugin_file: 'user_defined_plugins/check_4.py'}
For more information, see Extensibility Using Plug-Ins in the Appformix User Guide.
Contrail Service Node (CSN) - Optional
The Contrail Service Node (CSN) provides DHCP, ARP, and multicast services when Contrail is managing the full lifecycle of bare-metal servers, including provisioning the OS. CSNs are not needed in Contrail Cloud deployments that aren’t using bare-metal servers.
The TripleO templates in the Contrail Cloud 13 release continue to use the Red Hat term ToR Service Node (TSN) to refer to the CSN function in Contrail Cloud. This reference architecture, therefore, uses both terms.
To enable CSN support in Contrail Cloud, edit the compute_hosts: hierarchy in the site.yml file.
A sample configuration snippet:
# to enable tsn support you need to set
compute_hosts:
  tsn:
    enabled: true
control_hosts:
  vm:
    contrail-tsn:
      hv:
        - control-host2
        - control-host3
TSN VMs are created on all controller hosts by default. We recommend running TSNs on two control hosts in Contrail Cloud environments that include at least one bare-metal server (BMS). TSN VMs should not be run in Contrail Cloud environments that are not using a BMS. You can change the number of TSN instances in the control_hosts: hierarchy in the site.yml file.
Compute Node Configuration
Compute nodes in Contrail Cloud are installed in racks and connected to a pair of top-of-rack (ToR) switches and a management switch. The ToR switches are the leaf nodes in the EVPN-VXLAN IP Fabric.
Compute node networking is configured so that the compute nodes in each rack are placed in separate Layer 3 subnets. Each rack is its own Layer 3 domain for the networks that are used by tenant workloads. Figure 5 illustrates this compute node networking structure.
Figure 6 illustrates how the physical and logical interfaces of a compute node connect it to a management switch and the IP Fabric leaf switches.
Figure 7 illustrates networks and port assignments to bridges on a compute node.
Table 5 shows which files are used for configuring each network connection on a compute node.
| Port/NIC | Configuration |
|---|---|
| IPMI | Address is entered as an input to the inventory.yml file. |
| br-eno1 | 1 x 1G/10G NIC - untagged interface (e.g. built-in copper interface) with addresses generated automatically from the provisioning network pool. |
| br-eno2 | 1 x 1G/10G NIC - Optional Management network, untagged interface with address defined in the overcloud: network: hierarchy in the site.yml file. |
| br-bond0 | 2 x 10G/25G NICs made of first ports from both NICs. Untagged interface with the Tenant network. Bond physical allocation is defined in the overcloud-nics.yml file and addressing is set in the overcloud: network: hierarchy within the site.yml file. |
| br-bond1 | 2 x 10G/25G NICs made of second ports from both NICs. Tagged interface with the Internal-API and Storage networks for the leaf. Bond physical allocation is defined in the overcloud-nics.yml file and addressing is set in the overcloud: network: hierarchy within the site.yml file. |
Provisioning and optional Management networks are connected via out-of-band management switches and must be configured as a Layer 2 stretch across the management switches.
Compute Node Networking
The data plane interfaces of compute nodes can be configured to support the following forwarding methods.
| Forwarding method | Description |
|---|---|
| Kernel-mode | The vRouter forwarding function is performed in the Linux kernel by replacing the default Linux bridge code with Contrail Networking code. |
| DPDK | The vRouter runs in user space on a specified number of cores. |
| SR-IOV | The VM or container interface connects directly to the NIC, bypassing the vRouter. |
Your traffic forwarding method choice depends on the traffic profile expectations for each individual compute node. A Contrail Cloud environment can have different compute nodes configured with different interface types, and workloads can be placed on the most appropriate compute node using various technologies, such as OpenStack availability zones.
The IP address of the management interface and the addressing and configuration of bond interfaces are configured in the overcloud-nics.yml file.
Kernel-Mode vRouter Configuration
The vRouter vhost0 interface is connected to the br-bond0 bridge on compute nodes that run the vRouter in kernel mode. The br-bond0 bridge is connected via a bond interface to the Tenant network.
Figure 8 illustrates these connections.
Compute Resources
The following guidelines should be followed to optimize vRouter performance.
Guarantee a minimum of 4 cores for the host operating system, of which a minimum of 2 can be used by the vRouter kernel module. There is no mechanism to explicitly allocate these cores; the assumption is that host OS processes consume no more than 2 cores and that the kernel scheduler allocates the remaining 2 cores to the vRouter.
Guarantee a minimum of 8 GB of RAM for the host operating system, of which 4 GB is used by the vRouter.
vRouters running in kernel mode should achieve up to 500kpps per vRouter (<1000 flows in a table) when these guidelines are followed.
Bond Interface Configuration
The following options should be set for bond interfaces when kernel mode is used:
bond_mode: 4 (IEEE 802.3ad)
bond_policy: layer3+4
The bond configuration is defined in profile definitions in the overcloud-nics.yml file:
ComputeKernel0Hw0_network_config:
  [...]  #name, DHCP, MTU etc settings
  - type: linux_bond
    name: bond0
    use_dhcp: false
    bonding_options: "mode=802.3ad xmit_hash_policy=layer3+4 lacp_rate=fast updelay=1000 miimon=100"
Optimizations
We recommend increasing the maximum number of flows per vRouter from the default value of 500,000 to 2 million flows to increase performance and scaling. We also recommend allocating 4 threads for flow processing.
These configuration parameters are specified in the site.yml file. A configuration snippet:
overcloud:
  extra_config:
    ContrailVrouterModuleOptions: "vr_flow_entries=2000000"
  contrail:
    vrouter:
      contrail_settings:
        default:
          VROUTER_AGENT__FLOWS__thread_count: "4"
Adding Kernel-Mode Compute Nodes
A compute node operates in kernel mode when it is added into the compute_nodes_kernel: hierarchy in the compute-nodes.yml file.
A sample configuration snippet:
compute_nodes_kernel:
  - name: compute1
    leaf: '0'
    profile: hw0
  - name: compute2
    leaf: '1'
    profile: hw1
Complete Leaf/Profile Configuration Snippet
This section provides a full network configuration snippet from the overcloud-nics.yml file for a compute node.
Profiles for network configuration in the overcloud-nics.yml file are written in this format:
[role][leaf number][hardware profile tag]_network_config
where:
role is one of the following options:
ComputeKernel - for vRouter in kernel mode
ComputeDpdk
ComputeSriov
CephStorage
leaf number - number or name of the leaf device. For instance, "0" or "leaf0"
hardware profile tag - any name to define a profile tag. We strongly recommend using the Hw[number] format for your hardware profile tags.
Examples:
Compute node DPDK in leaf “0” with hardware profile “hw1”
ComputeDpdk0hw1_network_config
Compute SR-IOV in leaf “0” with hardware profile “hw1”
ComputeSriov0hw1_network_config
Compute kernel mode in leaf “0” with hardware profile “hw1”
ComputeKernel0hw1_network_config
A sample profile from the overcloud-nics.yml file:
#[role][leaf number][hardware profile tag]_network_config
# e.g. ComputeDpdk0hw1_network_config
# Provisioning interface definition
- type: interface
  name: nic1  # br-eno1
  dns_servers:
    get_param: DnsServers
  use_dhcp: false
  mtu:
    get_param: ControlPlaneNetworkMtu
  addresses:
    - ip_netmask:
        list_join:
          - '/'
          - - get_param: ControlPlaneIp
            - get_param: ControlPlaneSubnetCidr
  routes:
    - ip_netmask: 169.254.169.254/32
      next_hop:
        get_param: EC2MetadataIp
    - default: True  #Default route via provisioning, e.g. to access Satellite
      next_hop:
        get_param: ControlPlaneDefaultRoute
# Management interface definition
- type: interface
  name: nic2  # br-eno2
  mtu:
    get_param: ManagementNetworkMtu
  addresses:
    - ip_netmask:
        get_param: ManagementIpSubnet
  routes:
    - ip_netmask: 10.0.0.0/8  #Address pool of corporate network that has access to
                              #servers via management network
      next_hop: 192.168.0.1
# br-bond0 interface definition (for vRouter overlay)
- type: linux_bond
  name: bond0  # br-bond0
  use_dhcp: false
  bonding_options: "mode=802.3ad xmit_hash_policy=layer3+4 lacp_rate=fast updelay=1000 miimon=100"
  members:
    - type: interface
      name: nic3
      mtu:
        get_param: Tenant0NetworkMtu
      primary: true
    - type: interface
      name: nic4
      mtu:
        get_param: Tenant0NetworkMtu
- type: vlan
  vlan_id:
    get_param: Tenant0NetworkVlanID
  device: bond0
- type: contrail_vrouter
  name: vhost0
  use_dhcp: false
  members:
    - type: interface
      name:
        str_replace:
          template: vlanVLANID
          params:
            VLANID: {get_param: Tenant0NetworkVlanID}
      use_dhcp: false
  addresses:
    - ip_netmask:
        get_param: Tenant0IpSubnet
  mtu:
    get_param: Tenant0NetworkMtu
  routes:
    - ip_netmask:
        get_param: TenantSupernet
      next_hop:
        get_param: Tenant0InterfaceDefaultRoute
# br-bond1 interface definition (for Storage and Internal API networks)
- type: linux_bond
  name: bond1  # br-bond1
  use_dhcp: false
  bonding_options: "mode=802.3ad xmit_hash_policy=layer3+4 lacp_rate=fast updelay=1000 miimon=100"
  members:
    - type: interface
      name: nic5
      primary: true
    - type: interface
      name: nic6
- type: vlan
  device: bond1
  vlan_id:
    get_param: Storage0NetworkVlanID
  mtu:
    get_param: Storage0NetworkMtu
  addresses:
    - ip_netmask:
        get_param: Storage0IpSubnet
  routes:
    - ip_netmask:
        get_param: StorageSupernet
      next_hop:
        get_param: Storage0InterfaceDefaultRoute
- type: vlan
  device: bond1
  vlan_id:
    get_param: InternalApi0NetworkVlanID
  mtu:
    get_param: InternalApi0NetworkMtu
  addresses:
    - ip_netmask:
        get_param: InternalApi0IpSubnet
  routes:
    - ip_netmask:
        get_param: InternalApiSupernet
      next_hop:
        get_param: InternalApi0InterfaceDefaultRoute
MTU settings are inherited from the network settings in the site.yml file. The parameter name is the heat name of the network (camel case) with the MTU suffix appended.
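For example, the internal_api0 network defined with heat_name: InternalApi0 and mtu: 9100 in site.yml is referenced in overcloud-nics.yml as get_param: InternalApi0NetworkMtu, as shown in the snippet above.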
Use validation tools to check numbered to named NICs allocation. See Miscellaneous.
In the example above, the default route is placed on the provisioning interface (default: true), for which SNAT to the Intranet is configured on the jumphost, while the management network carries a static route to the corporate address pool. If the management network provides administrators with access to external resources, the default route can instead be configured on the management interface in the overcloud-nics.yml file. It is also possible to put the default route on the tenant network, which is useful if SNAT is used in tenant networks.
DPDK-mode vRouter Configuration
A vRouter in DPDK mode runs in a user space on the compute node. Network traffic is handled by a special DPDK dedicated interface or interfaces that handle VLANs and bonds. A specified number of cores is assigned to perform the vRouter forwarding function.
Figure 9 illustrates a vRouter in DPDK mode.
DPDK vRouters provide higher throughput than kernel vRouters. See Configuring the Data Plane Development Kit (DPDK) Integrated with Contrail vRouter for additional information on DPDK in Contrail networking.
- DPDK Bond Interface Configuration
- Performance Tuning for DPDK
- Enabling CPU performance mode
- Configuring Flow Threads and Huge Pages
DPDK Bond Interface Configuration
LACP bonding is under DPDK control when a vRouter is in DPDK mode. There is no Linux bond.
LACP bonding is defined in the overcloud-nics.yml file and configured using these options:
bond_mode: 4 (IEEE 802.3ad)
bond_policy: layer3+4
A sample overcloud-nics.yml file for the bond configuration:
ComputeDpdk0hw2_network_config:
  [...]  #name, DHCP, MTU etc settings
  - type: contrail_vrouter_dpdk
    name: vhost0
    driver: vfio-pci
    bond_mode: 4
    bond_policy: layer3+4
With DPDK, the LACP rate is taken from the LACP negotiation process with the switches acting as LACP partners; the vRouter follows whatever rate is configured on the switches. Most Juniper switches are configured in fast LACP mode by default, and the vRouter applies that setting.
If you want to force LACP into fast LACP mode, set the LACP_RATE: field to 1 in the site.yml file:
contrail:
  vrouter:
    contrail_settings:
      default:
        LACP_RATE: 1
Performance Tuning for DPDK
To maximize throughput for a vRouter in DPDK mode, set the following parameters:
Allocate 4 threads for flow processing in the vRouter agent.
Allocate huge pages.
Allocate CPUs for DPDK, Host OS, and Nova.
Increase the flow table size to 2 million from the 500,000 default setting.
Increase the buffer sizes to 2048 to reduce packet drops from microbursts.
Double the vrouter memory pool size to 131072.
Set the CPU scaling governor into performance mode.
Enabling CPU performance mode
CPU frequency scaling enables the operating system to scale the CPU frequency to save power. CPU frequencies can be scaled automatically depending on the system load, in response to ACPI events, or manually by user space programs. To run the CPU at the maximum frequency, set the scaling_governor parameter to performance.
In the BIOS all power saving options should be disabled, including power performance tuning, CPU P-State, CPU C3 Report and CPU C6 Report. Select Performance as the CPU Power and Performance policy.
The configuration can be defined in the site.yml file as a post-deployment action (post_deployment) parameter.
A sample configuration snippet from the site.yml file:
post_deployment:
  shell:
    CPUperf: |
      if [[ "$role" =~ "ComputeDpdk" ]]; then
        sudo cat > /usr/bin/after_reboot.sh <<'CPUPERF_EOF'
      #!/bin/bash
      for f in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor ; do echo performance > ${f} ; done
      CPUPERF_EOF
        chmod +x /usr/bin/after_reboot.sh
        echo -e "$(sudo crontab -u root -l)\n#contrail cloud: setting performance\n@reboot /usr/bin/after_reboot.sh" | sudo crontab -u root -
      fi
Configuring Flow Threads and Huge Pages
Memory that does not need to be used by the operating system should be segmented into huge pages to maximize efficiency. For instance, suppose a server with 256 GB of RAM needs 8 GB for the OS (including the vRouter) and therefore has 248 GB remaining for other functions. This server should allocate most of the remaining memory as 1 GB huge pages to maximize memory usage.
Additionally, some 2MB huge pages should be configured to maximize memory usage for the vRouter.
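As a point of reference, the sample configuration below allocates 224 one-gigabyte pages (224 GB) and 9196 two-megabyte pages (roughly 18 GB), which together with the memory reserved for the host operating system consumes most of the 256 GB in the example above. These numbers are illustrative and must be adjusted to the actual memory size and VM requirements of each compute node.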
The number of threads used for flow processing and huge page allocations are configured in the site.yml file.
A sample configuration snippet from the site.yml file:
overcloud:
  contrail:
    vrouter:
      contrail_settings:
        default:
          VROUTER_AGENT__FLOWS__thread_count: "4"
      dpdk:
        driver: vfio-pci  # must be vfio-pci for Intel Fortville NICs
        huge_pages:
          two_mb: 9196
          one_gb: 224  # depends on how many VMs (amount of memory) are needed, including 2G for vRouter
CPU Allocation
CPU partitioning must be properly defined and configured to optimize performance. CPU partitioning issues can cause transient packet drops in moderate and high throughput environments.
Consider the following parameters when planning physical CPU core assignments:
NUMA topology
Usage of hyperthreading (HT)
Number of cores assigned to the vRouter DPDK forwarding
Number of cores allocated to VMs
Number of cores left for system processes (including the vRouter agent)
The following core mapping illustrates the NUMA topology of a server with 2 NUMA nodes. Each NUMA node has 18 physical cores and supports hyper-threading.
NUMA node0:
  Physical cores:       0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34
  Hyperthreading cores: 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70
NUMA node1:
  Physical cores:       1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35
  Hyperthreading cores: 37 39 41 43 45 47 49 51 53 55 57 59 61 63 65 67 69 71
Output key:
Blue: allocated to DPDK
Red: allocated to Nova for VMs
Green: should not be allocated.
Black: remainder used for operating system
Cores 0 and 1 with their corresponding HT siblings must not be allocated for either DPDK or Nova.
Six cores, without hyperthreading, are allocated for vRouter and are shown in blue in the example snippet. Our test results found that six cores is the maximum number that should be allocated in this environment, since multi-queue virtio does not handle larger core numbers effectively. The number of cores delivering maximum throughput may vary with different hardware specifications. Use of hyperthreading for DPDK has been shown to cause reduced throughput.
If more cores are used for DPDK, then the vr_mempool_sz parameter should be modified from the value suggested below, according to the formula:
vr_mempool_sz = 2 * (dpdk_rxd_sz + dpdk_txd_sz) * (num_cores) * (num_ports)
where:
num_cores is the number of cores allocated for DPDK (including HT siblings, which are not recommended)
num_ports is the number of physical ports in the DPDK bond interface
dpdk_rxd_sz is the number of receive buffer descriptors
dpdk_txd_sz is the number of transmit buffer descriptors
The default values for dpdk_txd_sz and dpdk_rxd_sz are set at 128 descriptors, but Intel recommends setting these values to 2048 descriptors to handle microbursts. We recommend not exceeding the 2048 descriptors setting as larger settings can cause unexpected latencies.
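As a worked example using the values recommended in this section: with 6 cores allocated to DPDK, 2 physical ports in the bond, and 2048 receive and transmit descriptors, vr_mempool_sz = 2 * (2048 + 2048) * 6 * 2 = 98304, which is the --vr_mempool_sz value used in the site.yml snippet later in this section.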
The cores that are used for virtual machine workloads are set using Nova CPU pinning, shown in red in the example snippet.
The operating system is allocated 4 physical cores with hyper-threading in this setup, as shown in black in the example snippet. The first cores on each NUMA node must be allocated for the operating system. These operating system core reservations are not explicitly set by the user; they are implied by the cores that are not allocated to DPDK or Nova.
The DPDK cores are all on NUMA0 in this architecture, which is where the corresponding NICs are located. It is good practice to place as many OS cores on NUMA1 as possible to create an environment where a higher proportion of VM workloads run on NUMA0 where network performance is maximized.
DPDK core allocations are defined in the overcloud-nics.yml file using hardware profiles.
A sample configuration snippet from the overcloud-nics.yml file:
ComputeDpdk0hw2_network_config:
  [...]  #name, DHCP, MTU etc settings
  - type: contrail_vrouter_dpdk
    name: vhost0
    [...]  #members, address, driver etc
    cpu_list: "2,4,6,8,10,12"
DPDK parameters—including the maximum number of flows and buffer sizes as well as parameters related to Nova pinning and general Nova functions—are defined in the extra-config: hierarchy in the site.yml file.
A sample configuration snippet from the site.yml file:
overcloud:
  extra_config:
    ComputeDpdkOptions: "--vr_flow_entries=2000000 --vr_mempool_sz 98304 --dpdk_txd_sz 2048 --dpdk_rxd_sz 2048"
    ComputeDpdkParameters:
      TunedProfileName: "cpu-partitioning"
      IsolCpusList: "2,4,6-35,38,40,42-45,47-71"
      NovaVcpuPinSet: ['7','9','11','13-35','43','45','47-71']
      NovaSchedulerDefaultFilters:
        - RetryFilter
        - AvailabilityZoneFilter
        - RamFilter
        - DiskFilter
        - ComputeFilter
        - ComputeCapabilitiesFilter
        - ImagePropertiesFilter
        - ServerGroupAntiAffinityFilter
        - ServerGroupAffinityFilter
        - AggregateInstanceExtraSpecsFilter
        - NUMATopologyFilter
      NovaComputeExtraConfig:
        nova::cpu_allocation_ratio: 1.0
        nova::ram_allocation_ratio: 1.0
        nova::disk_allocation_ratio: 1.0
    ControllerExtraConfig:
      nova::config::nova_config:
        filter_scheduler/build_failure_weight_multiplier:
          value: 100.0
The tuning profile cpu-partitioning causes the cpu_affinity value in the tuned.conf file to be set to the set of cores that are not in the IsolCpusList.
Adding DPDK Compute Nodes
Compute nodes that run in DPDK mode are identified in the compute_nodes_dpdk hierarchy of the compute-nodes.yml file.
A sample configuration snippet from the compute-nodes.yml file:
compute_nodes_dpdk:
  - name: ComputeDpdk1
    leaf: '0'
    profile: hw2
  - name: ComputeDpdk2
    leaf: '1'
    profile: hw3
SR-IOV Mode Compute Nodes
A compute node in SR-IOV mode provides direct access from the NIC to a VM. Because network traffic bypasses the vRouter in SR-IOV mode, no network policy or flow management is performed for traffic. See Configuring Single Root I/O Virtualization (SR-IOV) for additional information on SR-IOV in Contrail networking.
Figure 10 illustrates the VM connections in compute nodes using SR-IOV mode.
In SR-IOV terminology, a NIC is represented by a Physical Function (PF), and the virtual instances of the NIC that are presented to VMs are referred to as Virtual Functions (VFs). A VM with multiple interfaces can connect using overlay networks on some interfaces and SR-IOV on other interfaces.
VMs can use the overlay network or go directly to an underlay network through VFs using SR-IOV. This configuration is performed in the compute_nodes_sriov: hierarchy within the compute-nodes.yml file.
A configuration snippet from the compute-nodes.yml file.
compute_nodes_sriov:
  - name: ComputeSriov1
    leaf: '0'
    profile: hw4
  - name: ComputeSriov2
    leaf: '1'
    profile: hw5
SR-IOV must be enabled in BIOS.
The two types of vRouter deployments for SR-IOV are:
vRouter in Kernel mode on top of SR-IOV
vRouter in DPDK mode on top of SR-IOV
Figure 11 illustrates the vRouter in both modes.
When SR-IOV is enabled on compute nodes, each vRouter is attached to a bond interface configured from the SR-IOV Physical Functions (PF). A VM interface that is in a network that has SR-IOV enabled is connected to a NIC Virtual Function (VF) and exposed to the fabric underlay network by Nova PCI passthrough.
You can select kernel or DPDK mode for SR-IOV in the site.yml file.
A configuration snippet from the site.yml file:
compute_hosts:
  sriov:
    enabled: true
    mode: kernel|dpdk
    #Sriov NumVFs separated by comma
    num_vf:
      - "ens7f1:7"
      - "ens7f2:7"
    #NovaPCIPassthrough settings
    pci_passthrough:
      - devname: "ens7f1"
        physical_network: "sriov1"
      - devname: "ens7f2"
        physical_network: "sriov2"
Interface names must be used in the sriov: hierarchy in the site.yml file. See Interface naming conventions for server naming details.
The bond needs to be configured in the overcloud-nics.yml file.
A sample configuration snippet from the overcloud-nics.yml file:
[...]
- type: linux_bond
  name: bond0  # br-bond0
  use_dhcp: false
  bonding_options: "mode=802.3ad xmit_hash_policy=layer3+4 lacp_rate=fast updelay=1000 miimon=100"
  members:
    - type: interface
      name: nic3  #ens7f1
      mtu:
        get_param: Tenant0NetworkMtu
      primary: true
    - type: interface
      name: nic4  #ens7f2
      mtu:
        get_param: Tenant0NetworkMtu
- type: vlan
  vlan_id:
    get_param: Tenant0NetworkVlanID
  device: bond0
- type: contrail_vrouter
  name: vhost0
  use_dhcp: false
  members:
    - type: interface
      name:
        str_replace:
          template: vlanVLANID
          params:
            VLANID: {get_param: Tenant0NetworkVlanID}
      use_dhcp: false
  addresses:
    - ip_netmask:
        get_param: Tenant0IpSubnet
  mtu:
    get_param: Tenant0NetworkMtu
  routes:
    - ip_netmask:
        get_param: TenantSupernet
      next_hop:
        get_param: Tenant0InterfaceDefaultRoute
In the compute_hosts: section of the site.yml file, SR-IOV must be enabled and the mode must be set to kernel or dpdk.
In the num_vf section of the site.yml file, set the number of virtual functions (VFs) that the operating system allocates for each NIC interface. In this sample configuration snippet, 7 VFs are allocated per interface. VF0 from ens7f1 and ens7f2 is allocated for Nova PCI passthrough to the provider networks named sriov1 and sriov2. The interface names are those seen when logging in to a server using IPMI. The network names are arbitrary and are used only inside the SR-IOV configuration.
The following snippet displays the ip link command output from a host when SR-IOV is enabled.
ip link
[...]
3: ens7f0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9100 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
    link/ether 0c:c4:7a:b7:2d:6a brd ff:ff:ff:ff:ff:ff
    vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 2 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 3 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 4 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 5 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 6 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
5: ens7f1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9100 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
    link/ether 0c:c4:7a:b7:2d:6a brd ff:ff:ff:ff:ff:ff
    vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 2 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 3 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 4 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 5 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
    vf 6 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off, query_rss off
Storage Nodes (Ceph OSD)
The storage nodes run Ceph software in Contrail Cloud. For additional information on using Red Hat Ceph storage software, see Product Documentation for Red Hat Ceph Storage.
We recommend following these guidelines to optimize Ceph software performance in your Contrail Cloud deployment:
Storage nodes must be separate from compute nodes in a Contrail Cloud environment. Hyperconverged nodes are not supported.
Ceph requires a minimum of three storage nodes to operate (see the sketch after this list).
If your deployment needs storage other than the Ceph storage configured through Contrail Cloud, your deployment design might require additional support and engagement. Send an email to sre@juniper.net before moving forward with non-Ceph storage providers to ensure that your deployment remains in compliance with your support contract.
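As a hypothetical illustration of the three-node minimum, a storage-nodes.yml file could list the storage nodes as follows. The node names are placeholders and the exact field set is an assumption; use the template provided in your Contrail Cloud bundle as the authoritative format:

storage_nodes:
  - name: CephStorage1   # placeholder node name
  - name: CephStorage2
  - name: CephStorage3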
Ceph Configuration
Ceph storage configuration is defined in the site.yml file.
Disk mapping for Ceph storage nodes needs to be defined in the overcloud: disk_mapping: hierarchy for each storage node. The recommended disk layout is 4 OSD disks + 1 SSD journal disk. This layout can be scaled appropriately; for instance, 8 OSD disks + 2 SSD journal disks for a larger environment. CPU, RAM, and storage traffic need to be considered when using multiple 4 OSD + 1 journal bundles.
The mapping section of the site.yml file allows disks to be referenced by name rather than by device id.
A sample site.yml configuration file snippet to configure disk mappings:
overcloud:
  # Contains a list of label to disk mappings for roles
  disk_mapping:
    CephStorage0:
      # Mapping of labels to disk devices. The label is assigned to the disk
      # device so that the disk can be referenced by the alias in other
      # configurations, for example /dev/disk/by-alias/(label)
      # Each list element contains:
      #   label: label to assign
      #   hctl: disk device path H:C:T:L. see lsscsi
      - label: osd-0
        hctl: '4:0:0:0'
      - label: osd-1
        hctl: '5:0:0:0'
      - label: osd-2
        hctl: '6:0:0:0'
      - label: osd-3
        hctl: '7:0:0:0'
      - label: osd-4
        hctl: '8:0:0:0'
      - label: journal-0
        hctl: '9:0:0:0'
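The hctl values correspond to the SCSI addresses reported by the lsscsi command on the storage node. The output below is an illustration only; the vendor, model, and device name columns are placeholders and will differ on your hardware:

lsscsi
[4:0:0:0]    disk    <vendor>  <model>  <rev>  /dev/sdX
[5:0:0:0]    disk    <vendor>  <model>  <rev>  /dev/sdX
[...]
[9:0:0:0]    disk    <vendor>  <model>  <rev>  /dev/sdX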
This sample site.yml configuration file snippet configures Ceph storage for a 4 OSD disks + 1 SSD journal bundle:
ceph:
  # Choice to enable Ceph storage in the overcloud.
  # "true" means that Ceph will be deployed as the backend for Cinder and Glance services.
  # "false" means that Ceph will not be deployed.
  enabled: true
  # Ceph OSD disk configuration
  osd:
    # Update the Ceph crush map when OSDs are started
    crush_update_on_start: true
    # Size for OSD journal files.
    journal_size: 2048
    # Ceph OSD disk assignments. The named disks will be exclusively used by Ceph for persistence.
    # For each disk, a "journal" can be configured. Journals can be shared between OSDs.
    disk:
      default:
        '/dev/sdb':
          journal: '/dev/disk/by-alias/journal-0'
        '/dev/sdc':
          journal: '/dev/disk/by-alias/journal-0'
        '/dev/sdd':
          journal: '/dev/disk/by-alias/journal-0'
        '/dev/sde':
          journal: '/dev/disk/by-alias/journal-0'
        '/dev/sdf':
          journal: '/dev/disk/by-alias/journal-0'
      CephStorageHw8:
        '/dev/sdb':
          journal: '/dev/disk/by-alias/journal-0'
        '/dev/sdc':
          journal: '/dev/disk/by-alias/journal-0'
        '/dev/sdd':
          journal: '/dev/disk/by-alias/journal-0'
        '/dev/sdf':
          journal: '/dev/disk/by-alias/journal-0'
      CephStorageHw7:
        '/dev/sdb':
          journal: '/dev/disk/by-alias/journal-0'
        '/dev/sdc':
          journal: '/dev/disk/by-alias/journal-0'
        '/dev/sdd':
          journal: '/dev/disk/by-alias/journal-0'
Note that the disk layouts are defined per hardware profile.
By default, the Ceph service configuration hides the complexity of computing placement groups. You can provide this configuration manually in the site.yml file if required, for example when the environment has a small number of OSDs.
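If you do set placement group counts manually, a commonly cited Ceph starting point (a general Ceph guideline, not a Contrail Cloud requirement) is total placement groups ≈ (number of OSDs × 100) / replica count, rounded up to the next power of two. For example, with 12 OSDs and a replica count of 3, (12 × 100) / 3 = 400, which rounds up to 512 placement groups to be divided across the pools.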
Storage Node Network Configuration
In this reference architecture, separate bonds are used on storage nodes for the Storage Mgmt (replication) and Storage (user access) networks.
Figure 12 illustrates these connections.
Figure 13 illustrates how the physical and logical interfaces of a storage node connect to a management switch and leaf switches.
Figure 14 illustrates how the physical interfaces of a storage host connect to the bridges configured on it.
Table 7 shows which files are used for configuring each network connection on a storage node.
The IP address of the management interface and the addressing and configuration of the bond interfaces on a storage node are set by the configuration files used by the Contrail Cloud provisioning system. The ports on each storage host are configured as follows.
Port/NIC | Configuration |
IPMI |
Address is entered as an input to the inventory.yml file. |
br-eno1 |
1 x 1G/10G NIC - untagged interface (e.g. built-in copper interface) with addresses generated automatically from the provisioning network pool. |
br-eno2 |
1 x 1G/10G NIC - Optional management network; untagged interface with the address defined in the overcloud: network: section of the site.yml file. |
br-bond0 |
2 x 10G/25G NICs made up of the first ports from both NICs. Untagged interface carrying the Storage network. The physical allocation of the bond is defined in the overcloud-nics.yml file. Addressing is set in the overcloud: network: section of the site.yml file. |
br-bond1 |
2 x 10G/25G NICs made up of the second ports from both NICs. Untagged interface carrying the Storage Mgmt network. The physical allocation of the bond is defined in the overcloud-nics.yml file. Addressing is set in the overcloud: network: section of the site.yml file. |
Linux bond configuration is defined per storage host in the overcloud-nics.yml file.
A sample configuration snippet from the overcloud-nics.yml file:
CephStorage0Hw1_network_config:  #0 means leaf number
  [...]  #name, DHCP, MTU etc settings
  - type: ovs_bond
    name: bond0
    use_dhcp: false
    bonding_options: "mode=802.3ad xmit_hash_policy=layer3+4 lacp_rate=fast updelay=1000 miimon=100"
Where:
NIC-0 - onboard 1G/10G copper NIC
NIC-1 and NIC-2 - two-port 10G/25G/40G Intel Fortville family NICs in a PCI slot connected to NUMA0