Traffic Black Hole Caused by Fabric Degradation

A traffic black hole occurs when a device drops packets without notification. Because other connected devices continue to forward traffic to the affected device, network performance suffers. A severely degraded fabric plane is one possible cause of a traffic black hole.

Devices can limit the black-hole time by detecting unreachable destination Packet Forwarding Engines and signaling connected devices when they cannot carry traffic because of a severely degraded fabric.

Packet Forwarding Engine destinations can become unreachable for the following reasons:

  • The fabric Switch Interface Boards (SIBs) go offline as a result of a CLI command or a pressed physical button.
  • The Switch Processor Mezzanine Board (SPMB) takes the fabric SIBs offline because of high temperature.
  • The SPMB detects voltage or polled I/O errors in the SIBs.
  • On T640 and T1600 routers:
    • All Packet Forwarding Engines receive destination errors on all planes from remote Packet Forwarding Engines, even when the SIBs are online.
    • Destination timeouts cause complete fabric loss, even when the SIBs are online.
  • On PTX Series systems:
    • Link errors occur on all connected planes.
    • Two Packet Forwarding Engines can reach the fabric but not each other.
    • Link errors leave two Packet Forwarding Engines with connectivity to the fabric but not through a common plane.
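
The common-plane condition in the PTX Series cases can be illustrated with a minimal sketch. It assumes each Packet Forwarding Engine is described by the set of fabric planes on which its links are error-free. The function name `unreachable_pairs`, the `pfeN` labels, and the data layout are hypothetical and do not reflect how the system stores or evaluates this information.

```python
from itertools import combinations


def unreachable_pairs(plane_links):
    """Return pairs of Packet Forwarding Engines that share no usable plane.

    plane_links maps a (hypothetical) PFE name to the set of fabric planes
    on which that PFE currently has error-free links.
    """
    pairs = []
    for a, b in combinations(sorted(plane_links), 2):
        # Two PFEs can exchange traffic only through a plane both can use.
        if not (plane_links[a] & plane_links[b]):
            pairs.append((a, b))
    return pairs


# Illustrative data only: pfe0 and pfe1 both reach the fabric,
# but not through a common plane; pfe3 has link errors on all planes.
links = {
    "pfe0": {0, 1},
    "pfe1": {2, 3},
    "pfe2": {1, 2},
    "pfe3": set(),
}
print(unreachable_pairs(links))
```

In this example, `pfe0` and `pfe1` each have working fabric links yet cannot reach each other, while `pfe3` is unreachable from every other Packet Forwarding Engine.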

When the system detects any unreachable Packet Forwarding Engine destinations, it attempts to heal the traffic black hole. If healing fails, the system turns off the interfaces, which stops the traffic black hole.

The recovery process consists of the following phases:

  1. On T640 and T1600 routers: Fabric plane restart phase: The system attempts healing by restarting the fabric planes one by one. This phase does not start if the fabric planes are functioning properly and a single Flexible PIC Concentrator (FPC) is bad.

    On PTX Series systems: SIB restart phase: The system attempts healing by restarting the SIBs one by one. This phase does not start if the SIBs are functioning properly and a single Flexible PIC Concentrator (FPC) is bad.

  2. On T640 and T1600 routers: Fabric plane and FPC restart phase: The system attempts healing by restarting both the fabric planes and the FPCs. If bad FPCs are unable to initiate high-speed links to the fabric after the reboot, the traffic black hole is limited because no interfaces are created for these FPCs.

    On PTX Series systems: SIB and FPC restart phase: The system attempts healing by restarting both the SIBs and the FPCs. If bad FPCs are unable to initiate high-speed links to the fabric after the reboot, the traffic black hole is limited because no interfaces are created for these FPCs.

  3. FPC offline phase: The system limits the traffic black hole by turning the FPCs offline and turning off interfaces, because the earlier recovery attempts have failed.
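
The escalation through these phases can be sketched as a small control loop. This is an illustration under stated assumptions, not the system's implementation: the `Phase` names, the `recover` function, and the `attempt` callback are hypothetical, and the sketch assumes that when phase 1 is skipped (fabric healthy, single bad FPC) recovery proceeds directly to phase 2, which the text above does not state explicitly.

```python
from enum import Enum
from typing import Callable


class Phase(Enum):
    FABRIC_RESTART = 1          # fabric planes (T640/T1600) or SIBs (PTX)
    FABRIC_AND_FPC_RESTART = 2  # fabric planes or SIBs, plus FPCs
    FPC_OFFLINE = 3             # last resort: FPCs offline, interfaces off


def recover(fabric_healthy: bool, attempt: Callable[[Phase], bool]) -> Phase:
    """Escalate through the recovery phases and return the phase that ended it.

    attempt(phase) is a hypothetical callback that performs the phase and
    returns True if all destinations are reachable afterwards.
    """
    phases = [Phase.FABRIC_AND_FPC_RESTART]
    if not fabric_healthy:
        # Phase 1 runs only when the fabric itself is suspected.
        phases.insert(0, Phase.FABRIC_RESTART)
    for phase in phases:
        if attempt(phase):
            return phase
    # Healing failed: limit the black hole instead of healing it.
    attempt(Phase.FPC_OFFLINE)
    return Phase.FPC_OFFLINE


# Example: healing never succeeds, so the sketch ends in the offline phase.
calls = []
final = recover(fabric_healthy=False, attempt=lambda p: calls.append(p) or False)
print(final.name, [p.name for p in calls])
```

The ordering reflects the description above: each phase is more disruptive than the previous one, and the final phase contains the problem by removing the affected interfaces.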

By default, the system limits black-hole time by detecting severely degraded fabric. No user interaction is necessary.

Published: 2013-03-07