
Understanding High Availability Features on Juniper Networks Routers

For Juniper Networks routing platforms running the Junos operating system (Junos OS), high availability refers to the hardware and software components that provide redundancy and reliability for packet-based communications. This topic provides brief overviews of the high availability features described in the following sections.

Routing Engine Redundancy

Redundant Routing Engines are two Routing Engines that are installed in the same routing platform. One functions as the primary, while the other stands by as a backup should the primary Routing Engine fail. On routing platforms with dual Routing Engines, network reconvergence takes place more quickly than on routing platforms with a single Routing Engine.

Graceful Routing Engine Switchover

Graceful Routing Engine switchover (GRES) enables a routing platform with redundant Routing Engines to continue forwarding packets, even if one Routing Engine fails. Graceful Routing Engine switchover preserves interface and kernel information. Traffic is not interrupted. However, graceful Routing Engine switchover does not preserve the control plane. Neighboring routers detect that the router has experienced a restart and react to the event in a manner prescribed by individual routing protocol specifications.

Note:

To preserve routing during a switchover, graceful Routing Engine switchover must be combined with either graceful restart protocol extensions or nonstop active routing. For more information, see Understanding Graceful Routing Engine Switchover and Nonstop Active Routing Concepts.

Note:

On T Series, TX Matrix, and TX Matrix Plus routers, GRES combined with NSR preserves the control plane, and traffic amounting to 75 percent of line rate on each Packet Forwarding Engine remains uninterrupted during the switchover.

Nonstop Bridging

Nonstop bridging enables an MX Series 5G Universal Routing Platform with redundant Routing Engines to switch from a primary Routing Engine to a backup Routing Engine without losing Layer 2 Control Protocol (L2CP) information. Nonstop bridging uses the same infrastructure as graceful Routing Engine switchover to preserve interface and kernel information. However, nonstop bridging also saves L2CP information by running the Layer 2 Control Protocol process (l2cpd) on the backup Routing Engine.

Note:

To use nonstop bridging, you must first enable graceful Routing Engine switchover.

Nonstop bridging is supported for the following Layer 2 control protocols:

  • Spanning Tree Protocol (STP)

  • Rapid Spanning Tree Protocol (RSTP)

  • Multiple Spanning Tree Protocol (MSTP)

  • VLAN Spanning Tree Protocol (VSTP)

For more information, see Nonstop Bridging Concepts.

Nonstop Active Routing

Nonstop active routing (NSR) enables a routing platform with redundant Routing Engines to switch from a primary Routing Engine to a backup Routing Engine without alerting peer nodes that a change has occurred. Nonstop active routing uses the same infrastructure as graceful Routing Engine switchover to preserve interface and kernel information. However, nonstop active routing also preserves routing information and protocol sessions by running the routing protocol process (rpd) on both Routing Engines. In addition, nonstop active routing preserves TCP connections maintained in the kernel.

Note:

To use nonstop active routing, you must also configure graceful Routing Engine switchover.

For a list of protocols and features supported by nonstop active routing, see Nonstop Active Routing Protocol and Feature Support.

For more information about nonstop active routing, see Nonstop Active Routing Concepts.
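The state that each feature preserves across a switchover, as described in the GRES, nonstop bridging, and NSR sections, can be summarized in a small Python sketch. It is a hypothetical model built only from this topic's descriptions, not a Junos API; the function name `preserved_state` and the state labels are illustrative assumptions.

```python
from __future__ import annotations

# Hypothetical model built from this topic's descriptions; not a Junos API.
GRES_STATE = frozenset({"interface information", "kernel information"})
NSB_STATE = frozenset({"L2CP information"})
NSR_STATE = frozenset({"routing information", "protocol sessions", "kernel TCP connections"})


def preserved_state(gres: bool, nsb: bool = False, nsr: bool = False) -> set[str]:
    """Return the state preserved across a Routing Engine switchover."""
    # Nonstop bridging and NSR both build on the GRES infrastructure.
    if (nsb or nsr) and not gres:
        raise ValueError("nonstop bridging and NSR require GRES")
    state: set[str] = set()
    if gres:
        state |= GRES_STATE  # forwarding continues; control plane is not preserved
    if nsb:
        state |= NSB_STATE   # l2cpd also runs on the backup Routing Engine
    if nsr:
        state |= NSR_STATE   # rpd runs on both Routing Engines
    return state
```

With GRES alone, only interface and kernel information survive, which is why neighbors still detect the restart unless GR or NSR is added.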

Graceful Restart

With routing protocols, any service interruption requires an affected router to recalculate adjacencies with neighboring routers, restore routing table entries, and update other protocol-specific information. An unprotected restart of a router can result in forwarding delays, route flapping, wait times stemming from protocol reconvergence, and even dropped packets. To alleviate this situation, graceful restart provides extensions to routing protocols. These protocol extensions define two roles for a router—restarting and helper. The extensions signal neighboring routers about a router undergoing a restart and prevent the neighbors from propagating the change in state to the network during a graceful restart wait interval. The main benefits of graceful restart are uninterrupted packet forwarding and temporary suppression of all routing protocol updates. Graceful restart enables a router to pass through intermediate convergence states that are hidden from the rest of the network.

When a router running graceful restart stops sending and replying to protocol liveness messages (hellos), its neighbors assume that it is undergoing a graceful restart and start a timer to monitor the restarting router. During this interval, the helper routers do not process an adjacency change for the router that they assume is restarting, but continue active routing with the rest of the network. The helper routers assume that the restarting router can continue stateful forwarding based on the last preserved routing state during the restart.

If the router was actually restarting and is back up before the graceful timer period expires in all of the helper routers, the helper routers provide the router with the routing table, topology table, or label table (depending on the protocol), exit the graceful period, and return to normal network routing.

If the router does not complete its negotiation with helper routers before the graceful timer period expires in all of the helper routers, the helper routers process the router's change in state and send routing updates, so that convergence occurs across the network. If a helper router detects a link failure from the router, the topology change causes the helper router to exit the graceful wait period and to send routing updates, so that network convergence occurs.

To enable a router to undergo a graceful restart, you must include the graceful-restart statement at the global [edit routing-options] or [edit routing-instances instance-name routing-options] hierarchy level. You can, optionally, modify the global settings at the individual protocol level. When a routing session is started, a router that is configured with graceful restart must negotiate with its neighbors to support it when it undergoes a graceful restart. A neighboring router will accept the negotiation and support helper mode without requiring graceful restart to be configured on the neighboring router.

Note:

A Routing Engine switchover event on a helper router that is in graceful wait state causes the router to drop the wait state and to propagate the adjacency’s state change to the network.
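The helper-side behavior described above can be sketched as a small state machine for one helper router tracking one restarting neighbor. This is a hypothetical Python model, not Junos code: the class and method names and the grace-period value are assumptions, and real timers are protocol-specific.

```python
from dataclasses import dataclass
from enum import Enum, auto


class HelperState(Enum):
    NORMAL = auto()       # adjacency up, normal routing
    GRACE_WAIT = auto()   # neighbor assumed restarting; change suppressed
    PROPAGATED = auto()   # change processed and advertised to the network


@dataclass
class GracefulRestartHelper:
    """Hypothetical model of one helper router watching one restarting neighbor."""

    grace_period: float  # assumed timer length in seconds
    state: HelperState = HelperState.NORMAL
    deadline: float = 0.0

    def hellos_lost(self, now: float) -> None:
        # The neighbor stopped sending hellos: assume a graceful restart and
        # start the timer instead of announcing an adjacency change.
        if self.state is HelperState.NORMAL:
            self.state = HelperState.GRACE_WAIT
            self.deadline = now + self.grace_period

    def neighbor_resynced(self, now: float) -> None:
        # The neighbor came back and finished negotiating before the timer
        # expired: hand it routing state and resume normal routing.
        if self.state is HelperState.GRACE_WAIT and now < self.deadline:
            self.state = HelperState.NORMAL

    def tick(self, now: float) -> None:
        # The timer expired first: process the change so the network converges.
        if self.state is HelperState.GRACE_WAIT and now >= self.deadline:
            self.state = HelperState.PROPAGATED

    def topology_event(self) -> None:
        # A link failure to the neighbor, or a Routing Engine switchover on
        # this helper, ends the wait state immediately.
        if self.state is HelperState.GRACE_WAIT:
            self.state = HelperState.PROPAGATED
```

For example, a helper created with an arbitrary `grace_period=120.0` that loses hellos at time 0 and sees the neighbor resynchronize at time 30 returns to `NORMAL` without propagating a change; if `tick(121.0)` runs first, it moves to `PROPAGATED`.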

Graceful restart is supported for the following protocols and applications:

  • BGP

  • ES-IS

  • IS-IS

  • OSPF/OSPFv3

  • PIM sparse mode

  • RIP/RIPng

  • MPLS-related protocols, including:

    • Label Distribution Protocol (LDP)

    • Resource Reservation Protocol (RSVP)

    • Circuit cross-connect (CCC)

    • Translational cross-connect (TCC)

  • Layer 2 and Layer 3 virtual private networks (VPNs)

For more information, see Graceful Restart Concepts.

Nonstop Active Routing Versus Graceful Restart

Nonstop active routing and graceful restart are two different methods of maintaining high availability. Graceful restart requires a router restart. A router undergoing a graceful restart relies on its neighbors (or helpers) to restore its routing protocol information. The restart is the mechanism by which helpers are signaled to exit the wait interval and start providing routing information to the restarting router. For more information, see Graceful Restart Concepts.

In contrast, nonstop active routing does not involve a router restart. Both the primary and backup Routing Engines run the routing protocol process (rpd) and exchange updates with neighbors. When one Routing Engine fails, the router switches over to the other Routing Engine, which is already running rpd and takes over exchanging routing information with neighbors. Because of these differences, nonstop active routing and graceful restart are mutually exclusive. Nonstop active routing cannot be enabled when the router is configured as a graceful restarting router. If you include the graceful-restart statement at any hierarchy level and the nonstop-routing statement at the [edit routing-options] hierarchy level and try to commit the configuration, the commit request fails. For more information, see Nonstop Active Routing Concepts.
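The configuration rules stated so far (NSR and nonstop bridging each require GRES, and NSR cannot be committed alongside graceful-restart) can be expressed as a commit-time check. This Python sketch represents a configuration as a set of statement paths; that representation and the error strings are illustrative assumptions, not actual Junos commit output.

```python
from __future__ import annotations

# Statement paths as plain strings; this representation is an assumption.
GRES = "chassis redundancy graceful-switchover"
NSB = "protocols layer2-control nonstop-bridging"
NSR = "routing-options nonstop-routing"
GR = "graceful-restart"  # stands for the statement at any hierarchy level


def commit_check(configured: set[str]) -> list[str]:
    """Return illustrative errors for the HA rules described in this topic."""
    errors: list[str] = []
    if NSR in configured and GR in configured:
        errors.append("nonstop-routing and graceful-restart are mutually exclusive")
    if NSR in configured and GRES not in configured:
        errors.append("nonstop-routing requires graceful-switchover")
    if NSB in configured and GRES not in configured:
        errors.append("nonstop-bridging requires graceful-switchover")
    return errors
```

An empty list means the combination is consistent with the rules in this topic, for example `{GRES, NSR}` or `{GRES, GR}`.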

Effects of a Routing Engine Switchover

The topic Effects of a Routing Engine Switchover describes what happens during a Routing Engine switchover when no high availability features are enabled and when graceful Routing Engine switchover, graceful restart, or nonstop active routing is enabled.

VRRP

The Virtual Router Redundancy Protocol (VRRP) enables hosts on a LAN to make use of redundant routing platforms (primary and backup pairs) on the LAN, requiring only the static configuration of a single default route on the hosts.

The VRRP routing platform pairs share the IP address corresponding to the default route configured on the hosts. At any time, one of the VRRP routing platforms is the primary (active) and the others are backups. If the primary fails, one of the backup routers or switches becomes the new primary router.

VRRP improves ease of administration, network throughput, and reliability:

  • It provides a virtual default routing platform.

  • It enables traffic on the LAN to be routed without a single point of failure.

  • A virtual backup router can take over a failed default router:

    • Within a few seconds.

    • With a minimum of VRRP traffic.

    • Without any interaction with the hosts.

Devices running VRRP dynamically elect primary and backup routers. You can also force assignment of primary and backup routers using priorities from 1 through 255, with 255 being the highest priority.

In VRRP operation, the default primary router sends advertisements to backup routers at regular intervals (default 1 second). If a backup router does not receive an advertisement for a set period, the backup router with the next highest priority takes over as primary and begins forwarding packets.
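The election and failover rules can be sketched in Python. The priority comparison follows this topic; the tie-break on the higher primary IP address and the `master_down_interval` timer come from the VRRPv2 specification (RFC 3768) rather than from this topic, and whether a given Junos release uses exactly these values is an assumption. The router names and addresses are hypothetical.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class VrrpRouter:
    name: str
    priority: int   # 1 through 255 per this topic; higher wins
    address: str    # primary IP address on the LAN (hypothetical values)
    alive: bool = True


def elect_primary(routers: list[VrrpRouter]) -> Optional[VrrpRouter]:
    """Pick the live router with the highest priority.

    Ties are broken by the higher primary IP address, as in RFC 3768.
    """
    live = [r for r in routers if r.alive]
    if not live:
        return None
    return max(live, key=lambda r: (r.priority, ipaddress.ip_address(r.address)))


def master_down_interval(advert_interval: float, priority: int) -> float:
    """VRRPv2 (RFC 3768): 3 * Advertisement_Interval + Skew_Time seconds,
    where Skew_Time = (256 - Priority) / 256."""
    skew_time = (256 - priority) / 256
    return 3 * advert_interval + skew_time
```

With the default 1-second advertisement interval, `master_down_interval(1.0, 200)` is 3.21875 seconds and `master_down_interval(1.0, 100)` is 3.609375 seconds, so the higher-priority backup declares the primary down first and takes over.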

Starting with Junos OS Release 13.2, VRRP nonstop active routing (NSR) is enabled only when you configure the nonstop-routing statement at the [edit routing-options] or [edit logical-systems logical-system-name routing-options] hierarchy level.

For more information, see Understanding VRRP.

Unified ISSU

A unified in-service software upgrade (unified ISSU) enables you to upgrade between two different Junos OS releases with no disruption on the control plane and with minimal disruption of traffic. Unified ISSU is supported only on dual Routing Engine platforms. In addition, graceful Routing Engine switchover (GRES) and nonstop active routing (NSR) must be enabled.

With a unified ISSU, you can eliminate network downtime, reduce operating costs, and deliver higher service levels. For more information, see Getting Started with Unified In-Service Software Upgrade.

Interchassis Redundancy for MX Series Routers Using Virtual Chassis

Interchassis redundancy is a high availability feature that can span equipment located across multiple geographies to prevent network outages and protect routers against access link failures, uplink failures, and wholesale chassis failures without visibly disrupting the attached subscribers or increasing the network management burden for service providers. As more high-priority voice and video traffic is carried on the network, interchassis redundancy has become a requirement for providing stateful redundancy on broadband subscriber management equipment such as broadband services routers, broadband network gateways, and broadband remote access servers. Interchassis redundancy support enables service providers to fulfill strict service-level agreements (SLAs) and avoid unplanned network outages to better meet the needs of their customers.

To provide a stateful interchassis redundancy solution for MX Series 5G Universal Routing Platforms, you can configure a Virtual Chassis. A Virtual Chassis configuration interconnects two MX Series routers into a logical system that you can manage as a single network element. The member routers in a Virtual Chassis are designated as the primary router (also known as the protocol primary) and the backup router (also known as the protocol backup). The member routers are interconnected by means of dedicated Virtual Chassis ports that you configure on Trio Modular Port Concentrator/Modular Interface Card (MPC/MIC) interfaces.

An MX Series Virtual Chassis is managed by the Virtual Chassis Control Protocol (VCCP), which is a dedicated control protocol based on IS-IS. VCCP runs on the Virtual Chassis port interfaces and is responsible for building the Virtual Chassis topology, electing the Virtual Chassis primary router, and establishing the interchassis routing table to route traffic within the Virtual Chassis.

Starting with Junos OS Release 11.2, Virtual Chassis configurations are supported on MX240, MX480, and MX960 Universal Routing Platforms with Trio MPC/MIC interfaces and dual Routing Engines. In addition, graceful Routing Engine switchover (GRES) and nonstop active routing (NSR) must be enabled on both member routers in the Virtual Chassis.
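The Virtual Chassis prerequisites listed above can be checked against an inventory before configuration. In this Python sketch the `Member` record, its field names, and the representation of a release as a `(major, minor)` tuple are hypothetical; it checks only the requirements stated in this topic.

```python
from __future__ import annotations

from dataclasses import dataclass

VC_PLATFORMS = {"MX240", "MX480", "MX960"}
MIN_RELEASE = (11, 2)  # Junos OS Release 11.2


@dataclass
class Member:
    """Hypothetical inventory record for one prospective member router."""

    model: str
    release: tuple[int, int]  # (major, minor), e.g. (11, 2)
    trio_mpc_mic: bool        # Virtual Chassis ports need Trio MPC/MIC interfaces
    routing_engines: int
    gres: bool
    nsr: bool


def vc_problems(members: list[Member]) -> list[str]:
    """List the unmet Virtual Chassis requirements stated in this topic."""
    problems: list[str] = []
    if len(members) != 2:
        problems.append("a Virtual Chassis interconnects two MX Series routers")
    for m in members:
        if m.model not in VC_PLATFORMS:
            problems.append(f"{m.model}: platform not listed as supported")
        if m.release < MIN_RELEASE:
            problems.append(f"{m.model}: release {m.release[0]}.{m.release[1]} predates 11.2")
        if not m.trio_mpc_mic:
            problems.append(f"{m.model}: needs Trio MPC/MIC interfaces")
        if m.routing_engines != 2:
            problems.append(f"{m.model}: needs dual Routing Engines")
        if not (m.gres and m.nsr):
            problems.append(f"{m.model}: GRES and NSR must both be enabled")
    return problems
```

Tuple comparison makes the release check straightforward: `(11, 1) < (11, 2)` is true, while `(12, 3)` passes.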

Platform-Specific High Availability Behavior on ACX7000 Series

The hardware architecture of ACX7000 Series devices differs from that of PTX Series and MX Series devices. On PTX and MX Series devices, each FPC hosts both the data path (the Packet Forwarding Engine) and the WAN-facing ports (PICs or MICs), and each FPC also includes its own CPU compute resources to manage the FPC components.

On ACX7000 Series devices, the Forwarding Engine Board (FEB) FRU contains only the Packet Forwarding Engine complex, and the Routing Engine contains the CPU compute complex. The Routing Engine FRU runs both the Routing Engine applications and the line-card applications.

The following table shows the high availability attributes and features supported on ACX7000 series of devices:

Table 1: High availability attributes and features on ACX7000 series

| High availability attribute or feature | ACX7509 | ACX7348 |
| -------------------------------------- | ------- | ------- |
| Control plane (RE) redundancy          | Yes     | Yes     |
| Data plane (PFE) redundancy            | Yes     | No      |
| GRES+GR                                | Yes     | Yes     |
| GRES+NSR                               | Yes     | Yes     |

Note:

On the ACX7348, if you alter an existing flow or introduce a new flow during a Routing Engine switchover, convergence does not take place until the switchover completes. Topology changes that occur during the switchover are applied only after it completes. Traffic loss and minor statistics loss are expected during the switchover.

GRES is enabled by default on Junos OS Evolved and cannot be disabled.

To preserve routing during a switchover, GRES must be combined with either:

  • Graceful restart (GR) protocol extensions

  • Nonstop active routing (NSR) and nonstop bridging (NSB)

On the ACX7348, if the configuration includes features such as broadband network gateway (BNG), VXLAN, sFlow, J-Flow, or port mirroring when a Routing Engine switchover occurs, the data path is reset and traffic reconverges.

Before issuing a switchover command from the primary Routing Engine, check the status of the backup Routing Engine by running the show system switchover command on the backup Routing Engine. Issue the switchover command only if the switchover status is Ready.

The switchover command can be issued even if the backup Routing Engine is not ready. In that case, the system still switches over to the backup Routing Engine, and the resulting system behavior is indeterminate.

Routing Engine switchover results in statistics accounting loss for the duration of switchover time.
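The pre-switchover checks in these notes can be collected into one helper. This Python sketch assumes that the captured show system switchover output contains a line of the form `Switchover Status: Ready`; verify the exact format on your platform and release. The feature labels such as `vxlan` and the function names are hypothetical, not configuration statements or Junos APIs.

```python
from __future__ import annotations

import re

# Assumed format of the status line in "show system switchover" output.
_STATUS_RE = re.compile(r"^[ \t]*Switchover Status:[ \t]*(.+?)[ \t]*$", re.MULTILINE | re.IGNORECASE)

# Features that, per this topic, reset the ACX7348 data path on switchover.
ACX7348_RESET_FEATURES = {"bng", "vxlan", "sflow", "jflow", "port-mirroring"}


def backup_ready(show_switchover_output: str) -> bool:
    """Return True only if the captured output reports a Ready status."""
    match = _STATUS_RE.search(show_switchover_output)
    return match is not None and match.group(1).lower() == "ready"


def switchover_warnings(model: str, show_switchover_output: str, features: set[str]) -> list[str]:
    """Collect the expected side effects of issuing a switchover now."""
    warnings: list[str] = []
    if not backup_ready(show_switchover_output):
        warnings.append("backup RE not ready: behavior after switchover is indeterminate")
    if model == "ACX7348":
        hits = sorted(features & ACX7348_RESET_FEATURES)
        if hits:
            warnings.append("data path resets and traffic reconverges (" + ", ".join(hits) + ")")
    # Statistics accounting is always lost for the switchover duration.
    warnings.append("statistics accounting is lost for the switchover duration")
    return warnings
```

A result containing only the statistics warning corresponds to the case in which the notes above recommend proceeding with the switchover.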

ACX7509 supports Routing Engine redundancy as mentioned in the following table:

Table 2: ACX7509 Routing Engine Redundancy

| System configuration   | Redundancy                                             |
| ---------------------- | ------------------------------------------------------ |
| Single RE / single FEB | Not applicable. System works in non-redundant mode     |
| Dual RE / dual FEB     | Supported                                              |
| Dual RE / single FEB   | Not supported. System works in non-redundant mode      |
| Single RE / dual FEB   | Not supported. System works in non-redundant mode      |
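Table 2 reduces to a simple rule: only the dual RE / dual FEB configuration runs redundantly. A minimal Python sketch of that lookup follows; the function name `acx7509_mode` is hypothetical.

```python
def acx7509_mode(routing_engines: int, febs: int) -> str:
    """Return the redundancy mode that Table 2 lists for an ACX7509 configuration."""
    if routing_engines not in (1, 2) or febs not in (1, 2):
        raise ValueError("Table 2 covers only one or two REs and one or two FEBs")
    # Only the fully dual configuration runs in redundant mode.
    return "redundant" if (routing_engines, febs) == (2, 2) else "non-redundant"
```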

Timing protocols do not support high availability. Timing applications therefore run only on the primary Routing Engine, not on the backup, and they restart on a Routing Engine switchover. During any Routing Engine switchover, graceful or not, PTP, grandmaster (GM), and Synchronous Ethernet (SyncE) functions lose lock, and the device enters the FREERUN state. The PTP packet path within the hardware is broken. All downstream devices switch to an alternate primary device in the network; if no alternate primary is present, they enter the HOLDOVER state.
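The timing behavior during a switchover can be sketched as a function of whether the network offers an alternate primary clock. This Python model is illustrative only; the state names mirror the terms used above, and `LOCKED` is an assumed label meaning that downstream devices have relocked to an alternate primary.

```python
from __future__ import annotations

from enum import Enum


class ClockState(Enum):
    LOCKED = "LOCKED"      # assumed label: relocked to an alternate primary
    FREERUN = "FREERUN"
    HOLDOVER = "HOLDOVER"


def timing_after_switchover(alternate_primary_available: bool) -> tuple[ClockState, ClockState]:
    """Return (local clock state, downstream clock state) after an RE switchover.

    Graceful or not, the timing applications restart, so the local device
    loses lock and free-runs.
    """
    local = ClockState.FREERUN
    # Downstream devices relock to another primary clock if one exists;
    # otherwise they hold over on their last good reference.
    downstream = ClockState.LOCKED if alternate_primary_available else ClockState.HOLDOVER
    return local, downstream
```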

On the ACX7348, pressing the Online/Offline button (not the reset button) on the primary Routing Engine triggers a graceful switchover to the backup Routing Engine. You can safely remove the Routing Engine card after its LEDs turn off. Pressing the Online/Offline button on the backup Routing Engine has no effect on the primary Routing Engine.