Pod Scheduling

Release: CN2 23.3
20-Oct-23

SUMMARY Juniper Cloud-Native Contrail Networking (CN2) release 23.1 supports network-aware pod scheduling using contrail-scheduler. This feature enhances the Kubernetes pod scheduler with plugins that analyze the network metrics of a node before scheduling pods. This article provides overview, implementation, and deployment information about network-aware pod scheduling.

Pod Scheduling in Kubernetes

In Kubernetes, scheduling refers to the process of matching pods to nodes so that the kubelet is able to run them. A scheduler monitors requests for pod creation and attempts to assign these pods to suitable nodes using a series of extension points during a scheduling and binding cycle. Potential nodes are filtered based on attributes like the resource requirements of a pod. If a node doesn't have the available resources for a pod, that node is filtered out. If more than one node passes the filtering phase, Kubernetes scores and ranks the remaining nodes based on their suitability for a given pod. The scheduler assigns a pod to the node with the highest ranking. If two nodes have the same score, the scheduler picks a node at random.
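
For example, the following pod spec declares CPU and memory requests. During the filtering phase, any node that does not have at least this much unallocated CPU and memory is removed from consideration (the pod and image names are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: filter-example
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: 500m
        memory: 256Mi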

Pod Scheduling in CN2

CN2 release 22.4 enhanced the default Kubernetes pod scheduler to schedule pods based on the Virtual Machine Interface (VMI) considerations of DPDK nodes. This enhanced scheduler, called contrail-scheduler, supports custom plugins that enable the scheduling of pods based on current active VMIs in a DPDK node.

CN2 release 23.1 improves on this feature by supporting two additional plugins. As a result of these plugins, contrail-scheduler schedules pods based on the following network metrics:

  • Number of active ingress/egress traffic flows

  • Bandwidth utilization

  • Number of virtual machine interfaces (VMIs)

Network-Aware Pod Scheduling Overview

Many high-performance applications have bandwidth or network interface requirements as well as the typical CPU or VMI requirements. If contrail-scheduler assigns a pod to a node with low bandwidth availability, that application cannot run optimally. CN2 release 23.1 addresses this issue with the introduction of a metrics collector, a central collector, and custom scheduler plugins. These components collect, store, and process network metrics so that the contrail-scheduler schedules pods based on these metrics.

Network-Aware Pod Scheduling Components

The following main components comprise CN2's network-aware pod scheduling solution:

  • Metrics collector: This component runs in a container alongside the vRouter agent in the vRouter pod on each node in the cluster. The vRouter agent sends metrics data to the metrics collector over localhost:6700, the address specified in the agent.default.collectors field of the vRouter CR. The metrics collector then forwards the requested data to the sinks specified in its configuration. The central collector is one of these configured sinks and receives this data from the metrics collector.

  • Central collector: This component acts as an aggregator and stores the data received, via the metrics collectors, from all of the nodes in a cluster. The central collector exposes gRPC endpoints that consumers use to request this data for nodes in a cluster. For example, the contrail-scheduler uses these gRPC endpoints to retrieve and process network metrics and schedule pods accordingly.

  • Contrail scheduler: This custom scheduler introduces the following three custom plugins:

    • VMICapacity plugin (available from release 22.4 onwards): Implements Filter, Score, and NormalizeScore extension points in the scheduler framework. The contrail-scheduler uses these extension points to determine the best node to assign a pod to based on active VMIs.

    • FlowsCapacity plugin: Determines the best node on which to schedule a pod based on the number of active flows on a node. Too many traffic flows on a node means more competition for new pod traffic. Nodes with a lower flow count are ranked higher by the scheduler.

    • BandwidthUsage plugin: Determines the best node to assign a pod to based on the bandwidth usage of a node. The node with the least bandwidth usage (incoming and outgoing traffic) per second is ranked highest.

      Note:

      Each configured plugin sends its scores to the scheduler. The scheduler takes the weighted scores from all of the plugins and finds the best node on which to schedule a pod.

Deploy Network-Aware Pod Scheduling Components

See the following sections for information about deploying the components for network-aware pod scheduling:

Metrics Collector Deployment

Central Collector Deployment

Contrail Scheduler Deployment

Metrics Collector Deployment

CN2 includes the metrics collector in vRouter pod deployments by default. The agent.default section of the vRouter spec contains a collectors field that is configured with the metrics collector receiver address. The example below shows the value collectors: - localhost:6700. Because the metrics collector runs in the same pod as the vRouter agent, the two can communicate over a localhost port. Note that port 6700 is fixed as the metrics collector receiver address and cannot be changed. The vRouter agent sends metrics data to this address.

The following is a section of a default vRouter deployment with the collector enabled:

apiVersion: dataplane.juniper.net/v1
kind: Vrouter 
metadata: 
  name: contrail-vrouter-nodes 
  namespace: contrail 
spec: 
  agent: 
    default: 
      collectors: 
      - localhost:6700 
      xmppAuthEnable: true 
    sandesh: 
      introspectSslEnable: true 
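
To confirm the configured collector address on a running cluster, you can query the vRouter custom resource (assuming the default name and namespace shown above):

kubectl get vrouter contrail-vrouter-nodes -n contrail -o jsonpath='{.spec.agent.default.collectors}'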

Central Collector Deployment

The central collector Deployment object must always have a replica count set to 1. The following Deployment section shows an example:

spec: 
  selector: 
    matchLabels: 
      component: central-collector
  replicas: 1 
  template: 
    metadata: 
      labels: 
        component: central-collector 

A configMap provides key-value configuration data to the pods in your cluster. Create a configMap for the central collector configuration. This configuration is mounted in the container.

The following is an example of a central collector config file:

http_port: 9090 
tls_config:
  key_file: /etc/config/server.key 
  cert_file: /etc/config/server.crt 
  ca_file: /etc/config/ca.crt 
service_name: central-collector.contrail 
metric_configmap: 
  name: mc_configmap 
  namespace: contrail 
  key: config.yaml 

This config file contains the following fields:

  • http_port: Specifies the port that the central collector gRPC service runs on.

  • tls_config: Specifies the TLS key file, certificate file, and CA file associated with the central collector service. This field contains upstream (northbound API) server information.

  • service_name: Specifies the name of the service the central collector exposes. In this case, central-collector.contrail is exposed as a service on top of the central collector Deployment. Consumers within the cluster can interact with the central collector using this service name.

  • metric_configmap: The fields in this section designate the details of the metrics collector configMap. The central collector uses this information to configure a metrics collector sink with the metrics that the sink wants to receive. The following is a sample command to create a configMap:

    kubectl create cm -n contrail central-collector-config --from-file=config.yaml=<path-to-config-file>

The following is an example of a central collector Deployment:

apiVersion: apps/v1 
kind: Deployment 
metadata: 
  name: central-collector 
  namespace: contrail 
  labels:
    component: central-collector
spec:
  replicas: 1
  selector:
    matchLabels:
      component: central-collector
  template:
    metadata:
      labels:
        component: central-collector
    spec: 
      securityContext: 
        fsGroup: 2000 
        runAsGroup: 3000 
        runAsNonRoot: true 
        runAsUser: 1000 
      containers: 
      - name: central-collector
        image: enterprise-hub.juniper.net/contrail-container-prod/central-collector:latest 
        command: 
        - /central-collector 
        - --kubeconfig=/tmp/config/kubeconfig 
        - --config=/etc/central-collector/config.yaml 
        imagePullPolicy: Always 
        volumeMounts: 
        - mountPath: /tmp/config 
          name: kubeconfig 
          readOnly: true 
        - mountPath: /etc/central-collector 
          name: central-collector-config 
          readOnly: true 
        - mountPath: /etc/config/tls 
          name: tls 
          readOnly: true 
      volumes: 
      - name: kubeconfig 
        secret: 
          secretName: cc-kubeconfig 
      - name: tls 
        secret: 
          secretName: central-collector-tls 
      - name: central-collector-config 
        configMap: 
          name: central-collector-config 

Note:

Verify the volume and volumeMounts fields before deploying.
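
After you verify the manifest, apply it and wait for the rollout to complete (the file name here is illustrative):

kubectl apply -f central-collector-deployment.yaml
kubectl rollout status deployment/central-collector -n contrail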

The central collector service is exposed on top of the Deployment object. The following YAML file is an example of a central collector service file:

apiVersion: v1
kind: Service
metadata:
  name: central-collector
  namespace: contrail
spec:
  selector:
    component: central-collector
  ports:
    - name: grpc
      port: <port-as-per-config>
    - name: json
      protocol: TCP
      port: 10000

Note:

The name field must match the service name specified in the central collector configuration. The namespace must match the namespace of the central collector Deployment. For example, namespace: contrail.
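
After you apply the Service, you can confirm that it exists and that its endpoints resolve to the central collector pod (assuming the names above):

kubectl get svc central-collector -n contrail
kubectl get endpoints central-collector -n contrail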

Contrail Scheduler Deployment

Perform the following steps to deploy the contrail-scheduler:

  • Create a namespace for the contrail-scheduler.
    kubectl create ns contrail-scheduler
  • Create a ServiceAccount object (required) and configure the cluster roles for the ServiceAccount. A ServiceAccount assigns a role to a pod or component within a cluster. In this case, the fields kind: ClusterRole and name: system:kube-scheduler grant the contrail-scheduler ServiceAccount the same permissions as the default Kubernetes scheduler (kube-scheduler).
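
    The following is a minimal sketch of such a ServiceAccount and ClusterRoleBinding. The binding name is illustrative, and your deployment may require additional role bindings:

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: contrail-scheduler
      namespace: contrail-scheduler
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: contrail-scheduler-as-kube-scheduler
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: system:kube-scheduler
    subjects:
    - kind: ServiceAccount
      name: contrail-scheduler
      namespace: contrail-scheduler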

  • Create a configMap for the VMI plugin configuration. You must create the configMap within the same namespace as the contrail-scheduler Deployment.

    kubectl create configmap vmi-config -n contrail-scheduler --from-file=vmi-config=<path-to-vmi-config>

    The following is an example of a VMI plugin config:

    nodeLabels:
      "test-agent-mode": "dpdk"
    maxVMICount: 64
    address: "central-collector.contrail:9090"
  • Create a Secret for the kubeconfig file. This file is then mounted in the contrail-scheduler Deployment. Secrets store confidential data as files in a mounted volume or as a container environment variable.

    kubectl create secret generic kubeconfig -n contrail-scheduler --from-file=kubeconfig=<path-to-kubeconfig-file>
  • Create a configMap for the contrail-scheduler config.

    kubectl create configmap scheduler-config -n contrail-scheduler --from-file=scheduler-config=<path-to-scheduler-config>

    The following is an example of a scheduler config:

    apiVersion: kubescheduler.config.k8s.io/v1beta3
    clientConnection:
      acceptContentTypes: ""
      burst: 100
      contentType: application/vnd.kubernetes.protobuf
      kubeconfig: /tmp/config/kubeconfig
      qps: 50
    enableContentionProfiling: true
    enableProfiling: true
    kind: KubeSchedulerConfiguration
    leaderElection:
      leaderElect: false
    profiles:
      - schedulerName: contrail-scheduler
        pluginConfig:
          - args:
              apiVersion: kubescheduler.config.k8s.io/v1beta3
              kind: VMICapacityArgs
              config: /tmp/vmi/config.yaml
            name: VMICapacity
          - args:
              apiVersion: kubescheduler.config.k8s.io/v1beta3
              kind: FlowsCapacityArgs
              address: central-collector.contrail:9090
            name: FlowsCapacity
          - args:
              apiVersion: kubescheduler.config.k8s.io/v1beta3
              kind: BandwidthUsageArgs
              address: central-collector.contrail:9090
            name: BandwidthUsage
        plugins:
          multiPoint:
            enabled:
              - name: VMICapacity
                weight: 50
              - name: FlowsCapacity
                weight: 1
              - name: BandwidthUsage
                weight: 20

    Note the following fields:

    • schedulerName: The name of the scheduler you want to deploy.

    • pluginConfig: Contains information about the plugins included in the contrail-scheduler deployment. The deployment includes the following plugins:

      • VMICapacity

      • FlowsCapacity

      • BandwidthUsage

    • config: This field contains the filepath where the VMI plugin config is mounted.

    • multiPoint: You can enable extension points for each of the included plugins. Instead of having to enable specific extension points for a plugin, the multiPoint field lets you enable or disable all of the extension points that are developed for a given plugin. The weight of a plugin determines the priority of its score: at the end of scoring, each plugin sends out a weighted score, and a pod is scheduled on the node with the highest aggregated score. For example, with the weights above, a node that scores 80 from VMICapacity, 60 from FlowsCapacity, and 90 from BandwidthUsage aggregates (80 × 50) + (60 × 1) + (90 × 20) = 5860.

  • Create a contrail-scheduler Deployment. The following is an example of a Deployment:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: contrail-scheduler
      namespace: contrail-scheduler
      labels:
        app: scheduler
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: scheduler
      template:
        metadata:
          labels:
            app: scheduler
        spec:
          serviceAccountName: contrail-scheduler
          securityContext:
            fsGroup: 2000
            runAsGroup: 3000
            runAsNonRoot: true
            runAsUser: 1000
          containers:
          - name: contrail-scheduler
            image: <registry>/contrail-scheduler:<tag>
            command:
            - /contrail-scheduler
            - --authentication-kubeconfig=/tmp/config/kubeconfig
            - --authorization-kubeconfig=/tmp/config/kubeconfig
            - --config=/tmp/scheduler/scheduler-config
            - --secure-port=10271
            imagePullPolicy: Always
            livenessProbe:
              failureThreshold: 8
              httpGet:
                path: /healthz
                port: 10271
                scheme: HTTPS
              initialDelaySeconds: 30
              periodSeconds: 10
              timeoutSeconds: 30
            resources:
              requests:
                cpu: 100m
            startupProbe:
              failureThreshold: 24
              httpGet:
                path: /healthz
                port: 10271
                scheme: HTTPS
              initialDelaySeconds: 30
              periodSeconds: 10
              timeoutSeconds: 30
            volumeMounts:
            - mountPath: /tmp/config
              name: kubeconfig
              readOnly: true
            - mountPath: /tmp/scheduler
              name: scheduler-config
              readOnly: true
            - mountPath: /tmp/vmi
              name: vmi-config
              readOnly: true
          hostPID: false
          volumes:
          - name: kubeconfig
            secret:
              secretName: kubeconfig
          - name: scheduler-config
            configMap:
              name: scheduler-config
          - name: vmi-config
            configMap:
              name: vmi-config

After you apply this Deployment, the new contrail-scheduler is active.
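
You can verify that the scheduler pod is running by using a label selector that matches the Deployment above:

kubectl get pods -n contrail-scheduler -l app=scheduler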

Use the Contrail Scheduler to Deploy Pods

Specify the name of your contrail-scheduler in the schedulerName field of a pod spec to have the contrail-scheduler schedule (deploy) new pods. The following is an example of a pod manifest with the schedulerName defined:

apiVersion: v1
kind: Pod
metadata:
  name: my-app
  labels:
    app: web
spec:
  schedulerName: contrail-scheduler
  containers:
  - name: app
    image: busybox
    command:
    - sh
    - -c
    - sleep 500
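
After the pod is running, you can confirm which scheduler placed it: the schedulerName is recorded in the pod spec, and the scheduling event names the scheduler that assigned the pod.

kubectl get pod my-app -o jsonpath='{.spec.schedulerName}'
kubectl get events --field-selector involvedObject.name=my-app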