Understanding High Availability Features on Juniper Networks Routers
For Juniper Networks routing platforms running the Junos operating system (Junos OS), high availability refers to the hardware and software components that provide redundancy and reliability for packet-based communications. This topic provides brief overviews of the following high availability features:
Routing Engine Redundancy
Redundant Routing Engines are two Routing Engines that are installed in the same routing platform. One functions as the primary, while the other stands by as a backup should the primary Routing Engine fail. On routing platforms with dual Routing Engines, network reconvergence takes place more quickly than on routing platforms with a single Routing Engine.
Graceful Routing Engine Switchover
Graceful Routing Engine switchover (GRES) enables a routing platform with redundant Routing Engines to continue forwarding packets, even if one Routing Engine fails. Graceful Routing Engine switchover preserves interface and kernel information. Traffic is not interrupted. However, graceful Routing Engine switchover does not preserve the control plane. Neighboring routers detect that the router has experienced a restart and react to the event in a manner prescribed by individual routing protocol specifications.
To preserve routing during a switchover, graceful Routing Engine switchover must be combined with either graceful restart protocol extensions or nonstop active routing. For more information, see Understanding Graceful Routing Engine Switchover and Nonstop Active Routing Concepts.
In T Series routers, TX Matrix routers, and TX Matrix Plus routers, graceful Routing Engine switchover combined with nonstop active routing preserves the control plane, and traffic equal to 75 percent of line rate per Packet Forwarding Engine remains uninterrupted during the switchover.
Nonstop Bridging
Nonstop bridging enables an MX Series 5G Universal Routing Platform with redundant Routing Engines to switch from a primary Routing Engine to a backup Routing Engine without losing Layer 2 Control Protocol (L2CP) information. Nonstop bridging uses the same infrastructure as graceful Routing Engine switchover to preserve interface and kernel information. However, nonstop bridging also saves L2CP information by running the Layer 2 Control Protocol process (l2cpd) on the backup Routing Engine.
To use nonstop bridging, you must first enable graceful Routing Engine switchover.
Nonstop bridging is supported for the following Layer 2 control protocols:
Spanning Tree Protocol (STP)
Rapid Spanning Tree Protocol (RSTP)
Multiple Spanning Tree Protocol (MSTP)
VLAN Spanning Tree Protocol (VSTP)
For more information, see Nonstop Bridging Concepts.
Nonstop Active Routing
Nonstop active routing (NSR) enables a routing platform with redundant Routing Engines to switch from a primary Routing Engine to a backup Routing Engine without alerting peer nodes that a change has occurred. Nonstop active routing uses the same infrastructure as graceful Routing Engine switchover to preserve interface and kernel information. However, nonstop active routing also preserves routing information and protocol sessions by running the routing protocol process (rpd) on both Routing Engines. In addition, nonstop active routing preserves TCP connections maintained in the kernel.
To use nonstop active routing, you must also configure graceful Routing Engine switchover.
For a list of protocols and features supported by nonstop active routing, see Nonstop Active Routing Protocol and Feature Support.
For more information about nonstop active routing, see Nonstop Active Routing Concepts.
Graceful Restart
With routing protocols, any service interruption requires an affected router to recalculate adjacencies with neighboring routers, restore routing table entries, and update other protocol-specific information. An unprotected restart of a router can result in forwarding delays, route flapping, wait times stemming from protocol reconvergence, and even dropped packets. To alleviate this situation, graceful restart provides extensions to routing protocols. These protocol extensions define two roles for a router—restarting and helper. The extensions signal neighboring routers about a router undergoing a restart and prevent the neighbors from propagating the change in state to the network during a graceful restart wait interval. The main benefits of graceful restart are uninterrupted packet forwarding and temporary suppression of all routing protocol updates. Graceful restart enables a router to pass through intermediate convergence states that are hidden from the rest of the network.
When a router is running graceful restart and stops sending and replying to protocol liveness messages (hellos), its neighbors assume a graceful restart and start a timer to monitor the restarting router. During this interval, the neighbors act as helper routers: they do not process an adjacency change for the router that they assume is restarting, but they continue active routing with the rest of the network. The helper routers assume that the router can continue stateful forwarding based on the last preserved routing state during the restart.
If the router was actually restarting and is back up before the graceful timer period expires in all of the helper routers, the helper routers provide the router with the routing table, topology table, or label table (depending on the protocol), exit the graceful period, and return to normal network routing.
If the router does not complete its negotiation with helper routers before the graceful timer period expires in all of the helper routers, the helper routers process the router's change in state and send routing updates, so that convergence occurs across the network. If a helper router detects a link failure from the router, the topology change causes the helper router to exit the graceful wait period and to send routing updates, so that network convergence occurs.
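The helper behavior described above can be sketched as a small decision function. This is an illustrative model of a single helper router, not Junos OS code: the names helper_outcome, wait_interval, restart_completed_at, and link_failed_at are hypothetical, and times are measured in seconds from the moment the helper starts its graceful wait timer.

```python
from enum import Enum


class HelperOutcome(Enum):
    WAITING = "waiting"      # adjacency change is still suppressed
    RESYNCED = "resynced"    # router returned in time; helper supplies routing state
    CONVERGED = "converged"  # helper processed the change and sent routing updates


def helper_outcome(elapsed, wait_interval,
                   restart_completed_at=None, link_failed_at=None):
    """Return the state of one helper `elapsed` seconds after hellos stop.

    Hypothetical model: the first of three events ends the wait period.
    """
    # Timer expiry always ends the wait and triggers convergence.
    events = [(wait_interval, HelperOutcome.CONVERGED)]
    # A detected link failure is a topology change and also forces convergence.
    if link_failed_at is not None:
        events.append((link_failed_at, HelperOutcome.CONVERGED))
    # A restart that completes before the timer expires lets the helper resync.
    if restart_completed_at is not None and restart_completed_at < wait_interval:
        events.append((restart_completed_at, HelperOutcome.RESYNCED))

    first_time, outcome = min(events, key=lambda event: event[0])
    return outcome if elapsed >= first_time else HelperOutcome.WAITING


if __name__ == "__main__":
    print(helper_outcome(10, 120))                           # still waiting
    print(helper_outcome(60, 120, restart_completed_at=45))  # router came back in time
    print(helper_outcome(130, 120))                          # timer expired
```

The model covers only one helper. In the behavior described above, the restarting router must be back before the timer expires in all of its helper routers.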
To enable a router to undergo a graceful restart, you must include the graceful-restart statement at the global [edit routing-options] or [edit routing-instances instance-name routing-options] hierarchy level. You can, optionally, modify the global settings at the individual protocol level. When a routing session is started, a router that is configured with graceful restart must negotiate with its neighbors to support it when it undergoes a graceful restart. A neighboring router will accept the negotiation and support helper mode without requiring graceful restart to be configured on the neighboring router.
A Routing Engine switchover event on a helper router that is in graceful wait state causes the router to drop the wait state and to propagate the adjacency’s state change to the network.
Graceful restart is supported for the following protocols and applications:
BGP
ES-IS
IS-IS
OSPF/OSPFv3
PIM sparse mode
RIP/RIPng
MPLS-related protocols, including:
Label Distribution Protocol (LDP)
Resource Reservation Protocol (RSVP)
Circuit cross-connect (CCC)
Translational cross-connect (TCC)
Layer 2 and Layer 3 virtual private networks (VPNs)
For more information, see Graceful Restart Concepts.
Nonstop Active Routing Versus Graceful Restart
Nonstop active routing and graceful restart are two different methods of maintaining high availability. Graceful restart requires a router restart. A router undergoing a graceful restart relies on its neighbors (or helpers) to restore its routing protocol information. The restart is the mechanism by which helpers are signaled to exit the wait interval and start providing routing information to the restarting router. For more information, see Graceful Restart Concepts.
In contrast, nonstop active routing does not involve a router restart. Both the primary and backup Routing Engines are running the routing protocol process (rpd) and exchanging updates with neighbors. When one Routing Engine fails, the router simply switches to the active Routing Engine to exchange routing information with neighbors. Because of these feature differences, nonstop active routing and graceful restart are mutually exclusive. Nonstop active routing cannot be enabled when the router is configured as a graceful restarting router. If you include the graceful-restart statement at any hierarchy level and the nonstop-routing statement at the [edit routing-options] hierarchy level and try to commit the configuration, the commit request fails. For more information, see Nonstop Active Routing Concepts.
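The mutual-exclusion rule can be expressed as a simple check over a list of configured statements. The sketch below is a hypothetical model, not the Junos OS commit logic: the function name commit_check, the (hierarchy, statement) tuple representation, and the error text are all assumptions made for illustration.

```python
def commit_check(statements):
    """Return (ok, message) for a list of (hierarchy, statement) pairs.

    Hypothetical representation: ("routing-options", "nonstop-routing")
    stands for the nonstop-routing statement at [edit routing-options].
    """
    statements = list(statements)
    # graceful-restart counts no matter which hierarchy level it appears at.
    has_graceful_restart = any(stmt == "graceful-restart" for _, stmt in statements)
    # nonstop-routing is checked at [edit routing-options].
    has_nonstop_routing = ("routing-options", "nonstop-routing") in statements

    if has_graceful_restart and has_nonstop_routing:
        # Illustrative wording only; the real commit error text may differ.
        return False, "graceful-restart and nonstop-routing are mutually exclusive"
    return True, ""


if __name__ == "__main__":
    candidate = [
        ("routing-options", "nonstop-routing"),
        ("protocols bgp", "graceful-restart"),
    ]
    print(commit_check(candidate))
```

Either statement alone passes the check. The commit is rejected only when both appear in the same candidate configuration.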
Effects of a Routing Engine Switchover
Effects of a Routing Engine Switchover describes the effects of a Routing Engine switchover when no high availability features are enabled and when graceful Routing Engine switchover, graceful restart, and nonstop active routing features are enabled.
VRRP
The Virtual Router Redundancy Protocol (VRRP) enables hosts on a LAN to make use of redundant routing platforms (primary and backup pairs) on the LAN, requiring only the static configuration of a single default route on the hosts.
The VRRP routing platform pairs share the IP address corresponding to the default route configured on the hosts. At any time, one of the VRRP routing platforms is the primary (active) and the others are backups. If the primary fails, one of the backup routers or switches becomes the new primary router.
VRRP has advantages in ease of administration and network throughput and reliability:
It provides a virtual default routing platform.
It enables traffic on the LAN to be routed without a single point of failure.
A virtual backup router can take over a failed default router:
Within a few seconds.
With a minimum of VRRP traffic.
Without any interaction with the hosts.
Devices running VRRP dynamically elect primary and backup routers. You can also force assignment of primary and backup routers using priorities from 1 through 255, with 255 being the highest priority.
In VRRP operation, the default primary router sends advertisements to backup routers at regular intervals (default 1 second). If a backup router does not receive an advertisement for a set period, the backup router with the next highest priority takes over as primary and begins forwarding packets.
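The priority-based takeover can be illustrated with a short model. It is a sketch under stated assumptions, not the Junos OS implementation: elect_primary and primary_down_interval are hypothetical names, ties between equal priorities are not modeled, and the timeout formula is taken from RFC 3768 (three advertisement intervals plus a priority-dependent skew time) rather than from this topic, which refers only to "a set period".

```python
def elect_primary(priorities, alive):
    """Pick the reachable router with the highest priority.

    priorities: dict of router name -> priority (1 through 255).
    alive: set of router names that are currently up.
    """
    for name, priority in priorities.items():
        if not 1 <= priority <= 255:
            raise ValueError(f"priority for {name} must be 1 through 255")
    candidates = {name: prio for name, prio in priorities.items() if name in alive}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


def primary_down_interval(priority, advertise_interval=1.0):
    """Seconds a backup waits without advertisements before taking over.

    Assumption: RFC 3768 formula. Skew time shrinks as priority grows, so
    the highest-priority backup times out first.
    """
    skew_time = (256 - priority) / 256
    return 3 * advertise_interval + skew_time


if __name__ == "__main__":
    group = {"r1": 200, "r2": 100, "r3": 150}
    print(elect_primary(group, {"r1", "r2", "r3"}))  # r1 has the highest priority
    print(elect_primary(group, {"r2", "r3"}))        # r3 takes over when r1 fails
    print(primary_down_interval(150) < primary_down_interval(100))
```

Because the skew time is smaller for higher priorities, the backup with the next highest priority is the first to notice the missing advertisements, which matches the takeover order described above.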
As of Junos OS Release 13.2, VRRP nonstop active routing (NSR) is enabled only when you configure the nonstop-routing statement at the [edit routing-options] or [edit logical-systems logical-system-name routing-options] hierarchy level.
For more information, see Understanding VRRP.
Unified ISSU
A unified in-service software upgrade (unified ISSU) enables you to upgrade between two different Junos OS releases with no disruption on the control plane and with minimal disruption of traffic. Unified ISSU is supported only on dual Routing Engine platforms. In addition, graceful Routing Engine switchover (GRES) and nonstop active routing (NSR) must be enabled.
With a unified ISSU, you can eliminate network downtime, reduce operating costs, and deliver higher service levels. For more information, see Getting Started with Unified In-Service Software Upgrade.
Interchassis Redundancy for MX Series Routers Using Virtual Chassis
Interchassis redundancy is a high availability feature that can span equipment located across multiple geographies to prevent network outages and protect routers against access link failures, uplink failures, and wholesale chassis failures without visibly disrupting the attached subscribers or increasing the network management burden for service providers. As more high-priority voice and video traffic is carried on the network, interchassis redundancy has become a requirement for providing stateful redundancy on broadband subscriber management equipment such as broadband services routers, broadband network gateways, and broadband remote access servers. Interchassis redundancy support enables service providers to fulfill strict service-level agreements (SLAs) and avoid unplanned network outages to better meet the needs of their customers.
To provide a stateful interchassis redundancy solution for MX Series 5G Universal Routing Platforms, you can configure a Virtual Chassis. A Virtual Chassis configuration interconnects two MX Series routers into a logical system that you can manage as a single network element. The member routers in a Virtual Chassis are designated as the primary router (also known as the protocol primary) and the backup router (also known as the protocol backup). The member routers are interconnected by means of dedicated Virtual Chassis ports that you configure on Trio Modular Port Concentrator/Modular Interface Card (MPC/MIC) interfaces.
An MX Series Virtual Chassis is managed by the Virtual Chassis Control Protocol (VCCP), which is a dedicated control protocol based on IS-IS. VCCP runs on the Virtual Chassis port interfaces and is responsible for building the Virtual Chassis topology, electing the Virtual Chassis primary router, and establishing the interchassis routing table to route traffic within the Virtual Chassis.
Starting with Junos OS Release 11.2, Virtual Chassis configurations are supported on MX240, MX480, and MX960 Universal Routing Platforms with Trio MPC/MIC interfaces and dual Routing Engines. In addition, graceful Routing Engine switchover (GRES) and nonstop active routing (NSR) must be enabled on both member routers in the Virtual Chassis.
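The prerequisites stated throughout this topic can be collected into a lookup table. The sketch below is a hypothetical summary for illustration: the feature labels and the function missing_prerequisites are invented names, and the table lists only the direct requirements named in this topic, not every platform or release restriction.

```python
# Direct prerequisites as stated in this topic (labels are hypothetical).
PREREQUISITES = {
    "gres": {"dual-routing-engines"},
    "nonstop-bridging": {"gres"},
    "nonstop-active-routing": {"gres"},
    "unified-issu": {"dual-routing-engines", "gres", "nonstop-active-routing"},
    "virtual-chassis": {
        "dual-routing-engines",
        "trio-mpc-mic-interfaces",
        "gres",
        "nonstop-active-routing",
    },
}


def missing_prerequisites(feature, enabled):
    """Return the sorted prerequisites of `feature` not present in `enabled`.

    Raises KeyError for a feature that is not in the table.
    """
    return sorted(PREREQUISITES[feature] - set(enabled))


if __name__ == "__main__":
    router = {"dual-routing-engines", "gres"}
    print(missing_prerequisites("unified-issu", router))
    print(missing_prerequisites("nonstop-active-routing", router))
```

The check is not transitive: it reports only what each feature requires directly, so a caller would run it once per feature that it intends to enable.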