Traffic Black Hole Caused by Fabric Degradation

A traffic black hole occurs when a device drops packets without notification. Other connected devices continue to forward traffic to the affected device, which degrades network performance. A severely degraded fabric plane is one possible cause of a traffic black hole.

Devices can limit the black-hole time by detecting unreachable destination Packet Forwarding Engines and signaling connected devices when they cannot carry traffic because of a severely degraded fabric.

Packet Forwarding Engine destinations can become unreachable for the following reasons:

  • The fabric Switch Interface Boards (SIBs) go offline as a result of a CLI command or a pressed physical button.
  • The Switch Processor Mezzanine Board (SPMB) takes the fabric SIBs offline because of high temperature.
  • The SPMB detects voltage or polled I/O errors in the SIBs.
  • On T640, T1600, and TX Matrix routers:
    • All Packet Forwarding Engines receive destination errors on all planes from remote Packet Forwarding Engines, even when the SIBs are online.
    • Complete fabric loss is caused by destination timeouts, even when the SIBs are online.
  • On PTX Series systems:
    • Link errors occur on all connected planes.
    • Two Packet Forwarding Engines can reach the fabric but not each other.
    • Link errors leave two Packet Forwarding Engines with connectivity to the fabric but not through a common plane.
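
The "no common plane" condition in the PTX Series list can be illustrated with a small sketch. This is not the device's detection logic; it only assumes that each Packet Forwarding Engine can be described by the set of fabric planes it still has working links to. The function name `unreachable_pairs` and the sample identifiers are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def unreachable_pairs(planes_by_pfe: Dict[str, Set[int]]) -> List[Tuple[str, str]]:
    """Return the PFE pairs that share no usable fabric plane.

    planes_by_pfe maps a (hypothetical) PFE name to the set of plane
    numbers on which that PFE currently has error-free links.
    """
    names = sorted(planes_by_pfe)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            # An empty intersection means no single plane connects a and b,
            # even if each PFE can still reach the fabric on its own planes.
            if not (planes_by_pfe[a] & planes_by_pfe[b]):
                pairs.append((a, b))
    return pairs


# Illustrative data only: pfe0 and pfe1 both reach the fabric,
# but through disjoint planes, so they cannot reach each other.
example = {"pfe0": {0, 1}, "pfe1": {2, 3}, "pfe2": {1, 2}}
result = unreachable_pairs(example)
```

A Packet Forwarding Engine with an empty plane set (link errors on all connected planes) appears in a pair with every other engine, which corresponds to the first PTX Series condition.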

When the system detects any unreachable Packet Forwarding Engine destinations, it attempts to heal the traffic black hole. If healing fails, the system turns off the interfaces, which stops the traffic black hole.

The recovery process consists of the following phases:

  1. (On T640, T1600, and TX Matrix routers) Fabric plane restart phase: Healing is attempted by restarting the fabric planes one by one. This phase does not start if the fabric plane is functioning properly and a single Flexible PIC Concentrator (FPC) is bad.

    (On PTX Series systems) SIB restart phase: Healing is attempted by restarting the SIBs one by one. This phase does not start if the SIBs are functioning properly and a single Flexible PIC Concentrator (FPC) is bad.

  2. (On T640, T1600, and TX Matrix routers) Fabric plane and FPC restart phase: Healing is attempted by restarting both the fabric planes and the FPCs. If any bad FPCs are unable to initiate high-speed links to the fabric after reboot, the traffic black hole is limited because no interfaces are created for these FPCs.

    (On PTX Series systems) SIB and FPC restart phase: Healing is attempted by restarting both the SIBs and the FPCs. If any bad FPCs are unable to initiate high-speed links to the fabric after reboot, the traffic black hole is limited because no interfaces are created for these FPCs.

  3. FPC offline phase: The traffic black hole is limited by turning the FPCs offline and turning off interfaces, because the previous recovery attempts have failed.
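
The escalation through the three phases can be sketched as a simple loop. This is a minimal model under stated assumptions, not the router's implementation: the names `Phase`, `run_recovery`, and `healed_after` are hypothetical, and the sketch assumes that when the first phase does not start (fabric healthy, single bad FPC) recovery proceeds directly to the second phase, which the text above does not state explicitly.

```python
from enum import Enum
from typing import Callable, List


class Phase(Enum):
    FABRIC_RESTART = 1          # fabric planes (T Series) or SIBs (PTX Series)
    FABRIC_AND_FPC_RESTART = 2  # fabric planes or SIBs, plus the FPCs
    FPC_OFFLINE = 3             # last resort: FPCs offline, interfaces off


def run_recovery(healed_after: Callable[[Phase], bool],
                 skip_fabric_restart: bool = False) -> List[Phase]:
    """Return the phases that were run, in order.

    healed_after(phase) is a hypothetical callback that performs the phase
    and reports whether all destinations are reachable afterwards.
    skip_fabric_restart models the case where the fabric is healthy and a
    single FPC is bad (assumption: recovery then begins at phase 2).
    """
    attempted = []
    for phase in (Phase.FABRIC_RESTART, Phase.FABRIC_AND_FPC_RESTART):
        if phase is Phase.FABRIC_RESTART and skip_fabric_restart:
            continue
        attempted.append(phase)
        if healed_after(phase):
            return attempted  # healing succeeded, no further escalation
    # Both healing phases failed, so the black hole is limited instead.
    attempted.append(Phase.FPC_OFFLINE)
    return attempted


# Example: healing succeeds only once the FPCs are restarted as well.
phases_run = run_recovery(lambda p: p is Phase.FABRIC_AND_FPC_RESTART)
```

If `healed_after` never returns true, the list ends with `Phase.FPC_OFFLINE`, matching the final phase described above.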

By default, the system limits traffic black-hole time by detecting severely degraded fabric. No user interaction is necessary.

Published: 2014-02-18