Contrail Networking Alert List
Alert Name | Severity | Description |
VRouterConnectionDown | major | VRouter <name> <connection_type> connection to <connection_id> is down. |
VRouterNonFunctional | major | VRouter <name> is non-functional. |
ControllerNonFunctional | major | Controller <name> is non-functional. |
ControllerConnectionDown | major | Controller <name> <connection_type> connection to <connection_id> is down. |
ControllerDBConnectionDown | major | Controller <name> connection to database is down. |
AlertmanagerFailedReload | critical | Reloading an Alertmanager configuration has failed. |
AlertmanagerMembersInconsistent | critical | A member of an Alertmanager cluster has not found all other cluster members. |
AlertmanagerFailedToSendAlerts | warning | An Alertmanager instance failed to send notifications. |
AlertmanagerClusterFailedToSendAlerts | critical | All Alertmanager instances in a cluster failed to send notifications to a critical integration. |
AlertmanagerClusterFailedToSendAlerts | warning | All Alertmanager instances in a cluster failed to send notifications to a non-critical integration. |
AlertmanagerConfigInconsistent | critical | Alertmanager instances within the same cluster have different configurations. |
AlertmanagerClusterDown | critical | Half or more of the Alertmanager instances within the same cluster are down. |
AlertmanagerClusterCrashlooping | critical | Half or more of the Alertmanager instances within the same cluster are crashlooping. |
ConfigReloaderSidecarErrors | warning | config-reloader sidecar has not had a successful reload for 10m. |
etcdInsufficientMembers | critical | etcd cluster "<name>": insufficient members (<value>). |
etcdNoLeader | critical | etcd cluster "<name>": member <instance> has no leader. |
etcdHighNumberOfLeaderChanges | warning | etcd cluster "<name>": instance <instance> has seen <value> leader changes within the last hour. |
etcdHighNumberOfFailedGRPCRequests | warning | etcd cluster "<name>": <value>% of requests for <grpc_method> failed on etcd instance <instance>. |
etcdHighNumberOfFailedGRPCRequests | critical | etcd cluster "<name>": <value>% of requests for <grpc_method> failed on etcd instance <instance>. |
etcdGRPCRequestsSlow | critical | etcd cluster "<name>": gRPC requests to <grpc_method> are taking <value>s on etcd instance <instance>. |
etcdMemberCommunicationSlow | warning | etcd cluster "<name>": member communication with <name> is taking <value>s on etcd instance <instance>. |
etcdHighNumberOfFailedProposals | warning | etcd cluster "<name>": <value> proposal failures within the last hour on etcd instance <instance>. |
etcdHighFsyncDurations | warning | etcd cluster "<name>": 99th percentile fsync durations are <value>s on etcd instance <instance>. |
etcdHighCommitDurations | warning | etcd cluster "<name>": 99th percentile commit durations are <value>s on etcd instance <instance>. |
etcdHighNumberOfFailedHTTPRequests | warning | <value>% of requests for <method> failed on etcd instance <instance>. |
etcdHighNumberOfFailedHTTPRequests | critical | <value>% of requests for <method> failed on etcd instance <instance>. |
etcdHTTPRequestsSlow | warning | etcd instance <instance> HTTP requests to <method> are slow. |
TargetDown | warning | One or more targets are unreachable. |
KubeAPIErrorBudgetBurn | critical | The API server is burning too much error budget. |
KubeAPIErrorBudgetBurn | warning | The API server is burning too much error budget. |
KubeStateMetricsListErrors | critical | kube-state-metrics is experiencing errors in list operations. |
KubeStateMetricsWatchErrors | critical | kube-state-metrics is experiencing errors in watch operations. |
KubeStateMetricsShardingMismatch | critical | kube-state-metrics sharding is misconfigured. |
KubeStateMetricsShardsMissing | critical | kube-state-metrics shards are missing. |
KubePodCrashLooping | warning | Pod is crash looping. |
KubePodNotReady | warning | Pod has been in a non-ready state for more than 15 minutes. |
KubeDeploymentGenerationMismatch | warning | Deployment generation mismatch due to possible roll-back. |
KubeDeploymentReplicasMismatch | warning | Deployment has not matched the expected number of replicas. |
KubeStatefulSetReplicasMismatch | warning | StatefulSet has not matched the expected number of replicas. |
KubeStatefulSetGenerationMismatch | warning | StatefulSet generation mismatch due to possible roll-back. |
KubeStatefulSetUpdateNotRolledOut | warning | StatefulSet update has not been rolled out. |
KubeDaemonSetRolloutStuck | warning | DaemonSet rollout is stuck. |
KubeContainerWaiting | warning | Pod container waiting longer than 1 hour. |
KubeDaemonSetNotScheduled | warning | DaemonSet pods are not scheduled. |
KubeDaemonSetMisScheduled | warning | DaemonSet pods are misscheduled. |
KubeJobCompletion | warning | Job did not complete in time. |
KubeJobFailed | warning | Job failed to complete. |
KubeHpaReplicasMismatch | warning | HPA has not matched desired number of replicas. |
KubeHpaMaxedOut | warning | HPA is running at max replicas. |
KubeCPUOvercommit | warning | Cluster has overcommitted CPU resource requests. |
KubeMemoryOvercommit | warning | Cluster has overcommitted memory resource requests. |
KubeCPUQuotaOvercommit | warning | Cluster has overcommitted CPU resource requests. |
KubeMemoryQuotaOvercommit | warning | Cluster has overcommitted memory resource requests. |
KubeQuotaAlmostFull | info | Namespace quota is going to be full. |
KubeQuotaFullyUsed | info | Namespace quota is fully used. |
KubeQuotaExceeded | warning | Namespace quota has exceeded the limits. |
CPUThrottlingHigh | info | Processes experience elevated CPU throttling. |
KubePersistentVolumeFillingUp | critical | PersistentVolume is filling up. |
KubePersistentVolumeFillingUp | warning | PersistentVolume is filling up. |
KubePersistentVolumeErrors | critical | PersistentVolume is having issues with provisioning. |
KubeVersionMismatch | warning | Different semantic versions of Kubernetes components running. |
KubeClientErrors | warning | Kubernetes API server client is experiencing errors. |
KubeClientCertificateExpiration | warning | Client certificate is about to expire. |
KubeClientCertificateExpiration | critical | Client certificate is about to expire. |
KubeAggregatedAPIErrors | warning | Kubernetes aggregated API has reported errors. |
KubeAggregatedAPIDown | warning | Kubernetes aggregated API is down. |
KubeAPIDown | critical | Target disappeared from Prometheus target discovery. |
KubeAPITerminatedRequests | warning | The Kubernetes apiserver has terminated <value> of its incoming requests. |
KubeControllerManagerDown | critical | Target disappeared from Prometheus target discovery. |
KubeProxyDown | critical | Target disappeared from Prometheus target discovery. |
KubeNodeNotReady | warning | Node is not ready. |
KubeNodeUnreachable | warning | Node is unreachable. |
KubeletTooManyPods | info | Kubelet is running at capacity. |
KubeNodeReadinessFlapping | warning | Node readiness status is flapping. |
KubeletPlegDurationHigh | warning | Kubelet Pod Lifecycle Event Generator is taking too long to relist. |
KubeletPodStartUpLatencyHigh | warning | Kubelet Pod startup latency is too high. |
KubeletClientCertificateExpiration | warning | Kubelet client certificate is about to expire. |
KubeletClientCertificateExpiration | critical | Kubelet client certificate is about to expire. |
KubeletServerCertificateExpiration | warning | Kubelet server certificate is about to expire. |
KubeletServerCertificateExpiration | critical | Kubelet server certificate is about to expire. |
KubeletClientCertificateRenewalErrors | warning | Kubelet has failed to renew its client certificate. |
KubeletServerCertificateRenewalErrors | warning | Kubelet has failed to renew its server certificate. |
KubeletDown | critical | Target disappeared from Prometheus target discovery. |
KubeSchedulerDown | critical | Target disappeared from Prometheus target discovery. |
NodeFilesystemSpaceFillingUp | warning | Filesystem is predicted to run out of space within the next 24 hours. |
NodeFilesystemSpaceFillingUp | critical | Filesystem is predicted to run out of space within the next 4 hours. |
NodeFilesystemAlmostOutOfSpace | warning | Filesystem has less than 5% space left. |
NodeFilesystemAlmostOutOfSpace | critical | Filesystem has less than 3% space left. |
NodeFilesystemFilesFillingUp | warning | Filesystem is predicted to run out of inodes within the next 24 hours. |
NodeFilesystemFilesFillingUp | critical | Filesystem is predicted to run out of inodes within the next 4 hours. |
NodeFilesystemAlmostOutOfFiles | warning | Filesystem has less than 5% inodes left. |
NodeFilesystemAlmostOutOfFiles | critical | Filesystem has less than 3% inodes left. |
NodeNetworkReceiveErrs | warning | Network interface is reporting many receive errors. |
NodeNetworkTransmitErrs | warning | Network interface is reporting many transmit errors. |
NodeHighNumberConntrackEntriesUsed | warning | Number of conntrack entries is getting close to the limit. |
NodeTextFileCollectorScrapeError | warning | Node Exporter text file collector failed to scrape. |
NodeClockSkewDetected | warning | Clock skew detected. |
NodeClockNotSynchronising | warning | Clock not synchronising. |
NodeRAIDDegraded | critical | RAID Array is degraded. |
NodeRAIDDiskFailure | warning | Failed device in RAID array. |
NodeFileDescriptorLimit | warning | Kernel is predicted to exhaust file descriptors limit soon. |
NodeFileDescriptorLimit | critical | Kernel is predicted to exhaust file descriptors limit soon. |
NodeNetworkInterfaceFlapping | warning | Network interface is often changing its status. |
PrometheusBadConfig | critical | Failed Prometheus configuration reload. |
PrometheusNotificationQueueRunningFull | warning | Prometheus alert notification queue predicted to run full in less than 30m. |
PrometheusErrorSendingAlertsToSomeAlertmanagers | warning | Prometheus has encountered more than 1% errors sending alerts to a specific Alertmanager. |
PrometheusNotConnectedToAlertmanagers | warning | Prometheus is not connected to any Alertmanagers. |
PrometheusTSDBReloadsFailing | warning | Prometheus has issues reloading blocks from disk. |
PrometheusTSDBCompactionsFailing | warning | Prometheus has issues compacting blocks. |
PrometheusNotIngestingSamples | warning | Prometheus is not ingesting samples. |
PrometheusDuplicateTimestamps | warning | Prometheus is dropping samples with duplicate timestamps. |
PrometheusOutOfOrderTimestamps | warning | Prometheus drops samples with out-of-order timestamps. |
PrometheusRemoteStorageFailures | critical | Prometheus fails to send samples to remote storage. |
PrometheusRemoteWriteBehind | critical | Prometheus remote write is behind. |
PrometheusRemoteWriteDesiredShards | warning | Prometheus remote write desired shards calculation wants to run more than configured max shards. |
PrometheusRuleFailures | critical | Prometheus is failing rule evaluations. |
PrometheusMissingRuleEvaluations | warning | Prometheus is missing rule evaluations due to slow rule group evaluation. |
PrometheusTargetLimitHit | warning | Prometheus has dropped targets because some scrape configs have exceeded the targets limit. |
PrometheusLabelLimitHit | warning | Prometheus has dropped targets because some scrape configs have exceeded the labels limit. |
PrometheusTargetSyncFailure | critical | Prometheus has failed to sync targets. |
PrometheusErrorSendingAlertsToAnyAlertmanager | critical | Prometheus encounters more than 3% errors sending alerts to any Alertmanager. |
PrometheusOperatorListErrors | warning | Errors while performing list operations in controller. |
PrometheusOperatorWatchErrors | warning | Errors while performing watch operations in controller. |
PrometheusOperatorSyncFailed | warning | Last controller reconciliation failed. |
PrometheusOperatorReconcileErrors | warning | Errors while reconciling controller. |
PrometheusOperatorNodeLookupErrors | warning | Errors while reconciling Prometheus. |
PrometheusOperatorNotReady | warning | Prometheus operator not ready. |
PrometheusOperatorRejectedResources | warning | Resources rejected by Prometheus operator. |
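
Several alert names in the table above appear more than once with different severities (for example, KubePersistentVolumeFillingUp is listed as both critical and warning), so an alert name alone does not identify a row. The following is a minimal Python sketch, not part of the product, that indexes rows by alert name and collects the severities listed for each. The function names parse_rows and severities_by_alert and the SAMPLE_ROWS constant are hypothetical, and the sketch assumes each row sits on one line in the "Alert Name | Severity | Description |" layout.

```python
from collections import defaultdict

# A few rows copied from the alert table, one row per line.
# SAMPLE_ROWS is a hypothetical name used only for this illustration.
SAMPLE_ROWS = """\
VRouterNonFunctional | major | VRouter <name> is non-functional. |
KubePersistentVolumeFillingUp | critical | PersistentVolume is filling up. |
KubePersistentVolumeFillingUp | warning | PersistentVolume is filling up. |
KubeQuotaFullyUsed | info | Namespace quota is fully used. |
"""


def parse_rows(text):
    """Return (name, severity, description) tuples from pipe-delimited rows."""
    rows = []
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # Skip blank or incomplete lines; a usable row needs a name and a severity.
        if len(cells) >= 3 and cells[0] and cells[1]:
            rows.append((cells[0], cells[1], cells[2]))
    return rows


def severities_by_alert(rows):
    """Map each alert name to the list of severities it is listed with."""
    index = defaultdict(list)
    for name, severity, _description in rows:
        index[name].append(severity)
    return dict(index)


rows = parse_rows(SAMPLE_ROWS)
index = severities_by_alert(rows)

# Print only the alerts that are listed at more than one severity.
for name, severities in index.items():
    if len(severities) > 1:
        print(name, severities)
```

An index like this makes it clear that any lookup or routing rule built on this list should match on the combination of alert name and severity.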