Contrail Networking Alert List

Table 1: Contrail Networking Alert List
Alert Name	Severity	Description
`VRouterConnectionDown`	major	VRouter `<name><connection_type>` connection to `<connection_id>` is down.
`VRouterNonFunctional`	major	VRouter `<name>` is non-functional.
`ControllerNonFunctional`	major	Controller `<name>` is non-functional.
`ControllerConnectionDown`	major	Controller `<name><connection_type>` connection to `<connection_id>` is down.
`ControllerDBConnectionDown`	major	Controller `<name>` connection to database is down.
`AlertmanagerFailedReload`	critical	Reloading an Alertmanager configuration has failed.
`AlertmanagerMembersInconsistent`	critical	A member of an Alertmanager cluster has not found all other cluster members.
`AlertmanagerFailedToSendAlerts`	warning	An Alertmanager instance failed to send notifications.
`AlertmanagerClusterFailedToSendAlerts`	critical	All Alertmanager instances in a cluster failed to send notifications to a critical integration.
`AlertmanagerClusterFailedToSendAlerts`	warning	All Alertmanager instances in a cluster failed to send notifications to a non-critical integration.
`AlertmanagerConfigInconsistent`	critical	Alertmanager instances within the same cluster have different configurations.
`AlertmanagerClusterDown`	critical	Half or more of the Alertmanager instances within the same cluster are down.
`AlertmanagerClusterCrashlooping`	critical	Half or more of the Alertmanager instances within the same cluster are crashlooping.
`ConfigReloaderSidecarErrors`	warning	`config-reloader` sidecar has not had a successful reload for 10m.
`etcdInsufficientMembers`	critical	`etcd` cluster "`<name>`": insufficient members (`<value>`).
`etcdNoLeader`	critical	`etcd` cluster "`<name>`": member `<instance>` has no leader.
`etcdHighNumberOfLeaderChanges`	warning	`etcd` cluster "`<name>`": instance `<instance>` has seen `<value>` leader changes within the last hour.
`etcdHighNumberOfFailedGRPCRequests`	warning	`etcd` cluster "`<name>`": `<value>`% of requests for `<grpc_method>` failed on etcd instance `<instance>`.
`etcdHighNumberOfFailedGRPCRequests`	critical	`etcd` cluster "`<name>`": `<value>`% of requests for `<grpc_method>` failed on etcd instance `<instance>`.
`etcdGRPCRequestsSlow`	critical	`etcd` cluster "`<name>`": gRPC requests to `<grpc_method>` are taking `<value>`s on etcd instance `<instance>`.
`etcdMemberCommunicationSlow`	warning	`etcd` cluster "`<name>`": member communication with `<name>` is taking `<value>`s on etcd instance `<instance>`.
`etcdHighNumberOfFailedProposals`	warning	`etcd` cluster "`<name>`": `<value>` proposal failures within the last hour on etcd instance `<instance>`.
`etcdHighFsyncDurations`	warning	`etcd` cluster "`<name>`": 99th percentile fsync durations are `<value>`s on etcd instance `<instance>`.
`etcdHighCommitDurations`	warning	`etcd` cluster "`<name>`": 99th percentile commit durations are `<value>`s on etcd instance `<instance>`.
`etcdHighNumberOfFailedHTTPRequests`	warning	`<value>`% of requests for `<method>` failed on etcd instance `<instance>`.
`etcdHighNumberOfFailedHTTPRequests`	critical	`<value>`% of requests for `<method>` failed on etcd instance `<instance>`.
`etcdHTTPRequestsSlow`	warning	`etcd` instance `<instance>` HTTP requests to `<method>` are slow.
`TargetDown`	warning	One or more targets are unreachable.
`KubeAPIErrorBudgetBurn`	critical	The API server is burning too much error budget.
`KubeAPIErrorBudgetBurn`	warning	The API server is burning too much error budget.
`KubeStateMetricsListErrors`	critical	`kube-state-metrics` is experiencing errors in list operations.
`KubeStateMetricsWatchErrors`	critical	`kube-state-metrics` is experiencing errors in watch operations.
`KubeStateMetricsShardingMismatch`	critical	`kube-state-metrics` sharding is misconfigured.
`KubeStateMetricsShardsMissing`	critical	`kube-state-metrics` shards are missing.
`KubePodCrashLooping`	warning	Pod is crash looping.
`KubePodNotReady`	warning	Pod has been in a non-ready state for more than 15 minutes.
`KubeDeploymentGenerationMismatch`	warning	Deployment generation mismatch due to possible roll-back.
`KubeDeploymentReplicasMismatch`	warning	Deployment has not matched the expected number of replicas.
`KubeStatefulSetReplicasMismatch`	warning	`StatefulSet` has not matched the expected number of replicas.
`KubeStatefulSetGenerationMismatch`	warning	`StatefulSet` generation mismatch due to possible roll-back.
`KubeStatefulSetUpdateNotRolledOut`	warning	`StatefulSet` update has not been rolled out.
`KubeDaemonSetRolloutStuck`	warning	`DaemonSet` rollout is stuck.
`KubeContainerWaiting`	warning	Pod container waiting longer than 1 hour.
`KubeDaemonSetNotScheduled`	warning	`DaemonSet` pods are not scheduled.
`KubeDaemonSetMisScheduled`	warning	`DaemonSet` pods are misscheduled.
`KubeJobCompletion`	warning	Job did not complete in time.
`KubeJobFailed`	warning	Job failed to complete.
`KubeHpaReplicasMismatch`	warning	HPA has not matched desired number of replicas.
`KubeHpaMaxedOut`	warning	HPA is running at max replicas.
`KubeCPUOvercommit`	warning	Cluster has overcommitted CPU resource requests.
`KubeMemoryOvercommit`	warning	Cluster has overcommitted memory resource requests.
`KubeCPUQuotaOvercommit`	warning	Cluster has overcommitted CPU resource requests.
`KubeMemoryQuotaOvercommit`	warning	Cluster has overcommitted memory resource requests.
`KubeQuotaAlmostFull`	info	Namespace quota is going to be full.
`KubeQuotaFullyUsed`	info	Namespace quota is fully used.
`KubeQuotaExceeded`	warning	Namespace quota has exceeded the limits.
`CPUThrottlingHigh`	info	Processes experience elevated CPU throttling.
`KubePersistentVolumeFillingUp`	critical	`PersistentVolume` is filling up.
`KubePersistentVolumeFillingUp`	warning	`PersistentVolume` is filling up.
`KubePersistentVolumeErrors`	critical	`PersistentVolume` is having issues with provisioning.
`KubeVersionMismatch`	warning	Different semantic versions of Kubernetes components running.
`KubeClientErrors`	warning	Kubernetes API server client is experiencing errors.
`KubeClientCertificateExpiration`	warning	Client certificate is about to expire.
`KubeClientCertificateExpiration`	critical	Client certificate is about to expire.
`KubeAggregatedAPIErrors`	warning	Kubernetes aggregated API has reported errors.
`KubeAggregatedAPIDown`	warning	Kubernetes aggregated API is down.
`KubeAPIDown`	critical	Target disappeared from Prometheus target discovery.
`KubeAPITerminatedRequests`	warning	The Kubernetes `apiserver` has terminated `<value>` of its incoming requests.
`KubeControllerManagerDown`	critical	Target disappeared from Prometheus target discovery.
`KubeProxyDown`	critical	Target disappeared from Prometheus target discovery.
`KubeNodeNotReady`	warning	Node is not ready.
`KubeNodeUnreachable`	warning	Node is unreachable.
`KubeletTooManyPods`	info	Kubelet is running at capacity.
`KubeNodeReadinessFlapping`	warning	Node readiness status is flapping.
`KubeletPlegDurationHigh`	warning	Kubelet Pod Lifecycle Event Generator is taking too long to relist.
`KubeletPodStartUpLatencyHigh`	warning	Kubelet Pod startup latency is too high.
`KubeletClientCertificateExpiration`	warning	Kubelet client certificate is about to expire.
`KubeletClientCertificateExpiration`	critical	Kubelet client certificate is about to expire.
`KubeletServerCertificateExpiration`	warning	Kubelet server certificate is about to expire.
`KubeletServerCertificateExpiration`	critical	Kubelet server certificate is about to expire.
`KubeletClientCertificateRenewalErrors`	warning	Kubelet has failed to renew its client certificate.
`KubeletServerCertificateRenewalErrors`	warning	Kubelet has failed to renew its server certificate.
`KubeletDown`	critical	Target disappeared from Prometheus target discovery.
`KubeSchedulerDown`	critical	Target disappeared from Prometheus target discovery.
`NodeFilesystemSpaceFillingUp`	warning	Filesystem is predicted to run out of space within the next 24 hours.
`NodeFilesystemSpaceFillingUp`	critical	Filesystem is predicted to run out of space within the next 4 hours.
`NodeFilesystemAlmostOutOfSpace`	warning	Filesystem has less than 5% space left.
`NodeFilesystemAlmostOutOfSpace`	critical	Filesystem has less than 3% space left.
`NodeFilesystemFilesFillingUp`	warning	Filesystem is predicted to run out of inodes within the next 24 hours.
`NodeFilesystemFilesFillingUp`	critical	Filesystem is predicted to run out of inodes within the next 4 hours.
`NodeFilesystemAlmostOutOfFiles`	warning	Filesystem has less than 5% inodes left.
`NodeFilesystemAlmostOutOfFiles`	critical	Filesystem has less than 3% inodes left.
`NodeNetworkReceiveErrs`	warning	Network interface is reporting many receive errors.
`NodeNetworkTransmitErrs`	warning	Network interface is reporting many transmit errors.
`NodeHighNumberConntrackEntriesUsed`	warning	Number of `conntrack` entries is getting close to the limit.
`NodeTextFileCollectorScrapeError`	warning	Node Exporter text file collector failed to scrape.
`NodeClockSkewDetected`	warning	Clock skew detected.
`NodeClockNotSynchronising`	warning	Clock not synchronising.
`NodeRAIDDegraded`	critical	RAID array is degraded.
`NodeRAIDDiskFailure`	warning	Failed device in RAID array.
`NodeFileDescriptorLimit`	warning	Kernel is predicted to exhaust file descriptors limit soon.
`NodeFileDescriptorLimit`	critical	Kernel is predicted to exhaust file descriptors limit soon.
`NodeNetworkInterfaceFlapping`	warning	Network interface is often changing its status.
`PrometheusBadConfig`	critical	Failed Prometheus configuration reload.
`PrometheusNotificationQueueRunningFull`	warning	Prometheus alert notification queue predicted to run full in less than 30m.
`PrometheusErrorSendingAlertsToSomeAlertmanagers`	warning	Prometheus has encountered more than 1% errors sending alerts to a specific Alertmanager.
`PrometheusNotConnectedToAlertmanagers`	warning	Prometheus is not connected to any Alertmanagers.
`PrometheusTSDBReloadsFailing`	warning	Prometheus has issues reloading blocks from disk.
`PrometheusTSDBCompactionsFailing`	warning	Prometheus has issues compacting blocks.
`PrometheusNotIngestingSamples`	warning	Prometheus is not ingesting samples.
`PrometheusDuplicateTimestamps`	warning	Prometheus is dropping samples with duplicate timestamps.
`PrometheusOutOfOrderTimestamps`	warning	Prometheus drops samples with out-of-order timestamps.
`PrometheusRemoteStorageFailures`	critical	Prometheus fails to send samples to remote storage.
`PrometheusRemoteWriteBehind`	critical	Prometheus remote write is behind.
`PrometheusRemoteWriteDesiredShards`	warning	Prometheus remote write desired shards calculation wants to run more than configured max shards.
`PrometheusRuleFailures`	critical	Prometheus is failing rule evaluations.
`PrometheusMissingRuleEvaluations`	warning	Prometheus is missing rule evaluations due to slow rule group evaluation.
`PrometheusTargetLimitHit`	warning	Prometheus has dropped targets because some scrape configs have exceeded the targets limit.
`PrometheusLabelLimitHit`	warning	Prometheus has dropped targets because some scrape configs have exceeded the labels limit.
`PrometheusTargetSyncFailure`	critical	Prometheus has failed to sync targets.
`PrometheusErrorSendingAlertsToAnyAlertmanager`	critical	Prometheus encounters more than 3% errors sending alerts to any Alertmanager.
`PrometheusOperatorListErrors`	warning	Errors while performing list operations in controller.
`PrometheusOperatorWatchErrors`	warning	Errors while performing watch operations in controller.
`PrometheusOperatorSyncFailed`	warning	Last controller reconciliation failed.
`PrometheusOperatorReconcileErrors`	warning	Errors while reconciling controller.
`PrometheusOperatorNodeLookupErrors`	warning	Errors while reconciling Prometheus.
`PrometheusOperatorNotReady`	warning	Prometheus operator not ready.
`PrometheusOperatorRejectedResources`	warning	Resources rejected by Prometheus operator.
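
Several alert names in Table 1 appear more than once with different severities (for example, `NodeFilesystemAlmostOutOfSpace` is listed as both warning and critical), so an alert name alone does not identify a row. The following is a minimal sketch, not part of Contrail Networking, of one way to load rows from the table and group them by severity. It assumes the rows are available as tab-separated text in the same column order as Table 1; the names `SAMPLE_ROWS`, `parse_alert_rows`, and `group_by_severity` are hypothetical and used only for illustration.

```python
from collections import defaultdict

# A few rows copied from Table 1, tab-separated: name, severity, description.
# Assumption: the table has been exported as plain tab-separated text.
SAMPLE_ROWS = """\
NodeFilesystemAlmostOutOfSpace\twarning\tFilesystem has less than 5% space left.
NodeFilesystemAlmostOutOfSpace\tcritical\tFilesystem has less than 3% space left.
KubeQuotaAlmostFull\tinfo\tNamespace quota is going to be full.
VRouterNonFunctional\tmajor\tVRouter <name> is non-functional.
"""


def parse_alert_rows(text):
    """Return a dict mapping (alert_name, severity) to the description.

    The key is a pair because the same alert name can appear at more
    than one severity.
    """
    alerts = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, severity, description = line.split("\t", 2)
        # Drop surrounding backticks if the name was copied with them.
        alerts[(name.strip("`"), severity)] = description
    return alerts


def group_by_severity(alerts):
    """Return a dict mapping each severity to the list of alert names."""
    grouped = defaultdict(list)
    for name, severity in alerts:
        grouped[severity].append(name)
    return dict(grouped)


alerts = parse_alert_rows(SAMPLE_ROWS)
by_severity = group_by_severity(alerts)
```

Keying on the (name, severity) pair keeps both rows of a repeated alert; keying on the name alone would silently overwrite the warning row with the critical one.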
