Inter-Chassis Services Redundancy Overview for Next Gen Services
Introduction to Inter-Chassis Services Redundancy
Inter-chassis redundancy for services is controlled by the services redundancy daemon (SRD). The SRD lets you specify events that trigger a switchover between the primary and standby services PICs, which reside on two different MX Series chassis. The SRD monitors these events and performs a switchover when one of them occurs. Inter-chassis services redundancy is a primary-secondary model, not an active-active cluster: only one services PIC in a redundancy pair, the current primary, receives traffic to be serviced.
You can configure redundancy based on the following monitored events:
Link-down events.
FPC and PIC reboots.
Routing protocol daemon (rpd) terminations and restarts.
Peer gateway events, including requests to acquire or release the primary role or to broadcast warnings.
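To make the primary-secondary model concrete, here is a minimal sketch. It assumes a toy pair of chassis where a monitored event simply swaps the primary and standby roles. The class names, chassis names, and the `monitored` set are hypothetical. This is not how SRD is implemented or configured.

```python
from enum import Enum, auto


class MonitoredEvent(Enum):
    """Event categories that the text lists as possible switchover triggers."""
    LINK_DOWN = auto()
    FPC_OR_PIC_REBOOT = auto()
    RPD_RESTART = auto()
    PEER_GATEWAY = auto()


class RedundancyPair:
    """Toy primary-secondary pair of services PICs on two chassis.

    Hypothetical model: names and the `monitored` set are illustrative only.
    """

    def __init__(self, primary: str, standby: str, monitored: set[MonitoredEvent]):
        self.primary = primary
        self.standby = standby
        self.monitored = monitored

    def receives_traffic(self, chassis: str) -> bool:
        # Only the current primary services PIC receives traffic to service.
        return chassis == self.primary

    def on_event(self, event: MonitoredEvent) -> bool:
        """Swap roles if the event is monitored; return True on switchover."""
        if event not in self.monitored:
            return False
        self.primary, self.standby = self.standby, self.primary
        return True
```

For example, a pair created with only `LINK_DOWN` monitored ignores an rpd restart but switches roles on a link-down event, after which only the former standby receives traffic.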
Benefits
Inter-chassis services redundancy provides automatic switchovers from a services PIC on one chassis to a services PIC on another chassis when a monitored event occurs.
Services Redundancy Components
The following configurable components control services redundancy processing:
Redundancy Event—A monitored critical event that triggers the redundancy peers to acquire or release the primary role or to create a warning, and to add or delete signal routes.
One monitored interface can be part of only one redundancy event, but one redundancy event can have multiple monitored interfaces.
Redundancy Policy—A policy that defines the set of actions taken when a redundancy event occurs. Available actions include acquisition or release of the primary role, creation of a warning, and addition or deletion of signal routes. You can configure a maximum of 256 redundancy policies. A redundancy policy can have a maximum of 256 interface-down events.
One redundancy event can be part of only one redundancy policy, but one redundancy policy can have multiple redundancy events. For example, redundancy policy RP1 can include redundancy events RE1 and RE2. Redundancy events RE1 and RE2 cannot be included in redundancy policies other than RP1.
Redundancy Set—A collection of one or more redundancy policies, together with the redundancy group associated with the set, that is assigned to one or more service sets on each MX Series chassis of the redundant pair. At any given time, a particular redundancy set can be active on only one gateway, but not all redundancy sets have to be active on the same gateway. For example, redundancy set A can be active on gateway 1 while redundancy set B is active on gateway 2. You can configure a maximum of 128 redundancy sets.
One service set can be assigned only one redundancy set, but multiple service sets can be assigned the same redundancy set.
One redundancy policy can be part of only one redundancy set, but one redundancy set can have multiple redundancy policies. For example, redundancy set RS1 can include redundancy policies RP1 and RP2. Redundancy policies RP1 and RP2 cannot be included in redundancy sets other than RS1. A redundancy set can have a maximum of 16 redundancy policies.
Redundancy Group—The redundancy group identifies the associated ICCP (Inter-Chassis Control Protocol) redundancy group. Each redundancy set can be part of only one redundancy group, but a redundancy group can be associated with a maximum of 16 redundancy sets. You can configure a maximum of 16 redundancy groups.
Signal routes—Static routes that are added or deleted by services redundancy processing, based on primary role state changes.
Routing Policies—Policies that advertise routes based on the existence or non-existence of signal routes.
VRRP (Virtual Router Redundancy Protocol) route tracking—Tracks whether a reachable signal route exists in the routing table of the routing instance specified in the configuration. Based on the reachability of the tracked route, VRRP route tracking dynamically changes the priority of the VRRP group.
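The cardinality rules and limits listed above can be checked mechanically. The sketch below is a hypothetical validator over a plain-dictionary model. It is not a Junos data structure, and the interface names in the usage are made up. Two rules are enforced by the shape of the data itself. A service set maps to a single redundancy set, and each redundancy set carries a single group id. As an assumption, every event in this model counts toward the per-policy limit of 256 interface-down events.

```python
from collections import Counter

# Limits as stated in the text.
MAX_POLICIES = 256
MAX_EVENTS_PER_POLICY = 256  # interface-down events per redundancy policy
MAX_SETS = 128
MAX_POLICIES_PER_SET = 16
MAX_GROUPS = 16
MAX_SETS_PER_GROUP = 16


def _shared(mapping: dict[str, list[str]]) -> list[str]:
    """Return members that appear under more than one key."""
    counts = Counter(m for members in mapping.values() for m in set(members))
    return sorted(m for m, n in counts.items() if n > 1)


def validate(events, policies, sets):
    """Check a hypothetical redundancy model against the stated rules.

    events:   {event name: [monitored interface, ...]}
    policies: {policy name: [event name, ...]}
    sets:     {set name: (group id, [policy name, ...])}
    Returns a list of violations (empty if the model is valid).
    """
    errors = []
    # Exclusive membership: interface -> one event, event -> one policy,
    # policy -> one set.
    for iface in _shared(events):
        errors.append(f"interface {iface} is in more than one redundancy event")
    for ev in _shared(policies):
        errors.append(f"event {ev} is in more than one redundancy policy")
    for pol in _shared({name: pols for name, (_, pols) in sets.items()}):
        errors.append(f"policy {pol} is in more than one redundancy set")

    # Numeric limits.
    if len(policies) > MAX_POLICIES:
        errors.append("too many redundancy policies")
    for name, evs in policies.items():
        if len(evs) > MAX_EVENTS_PER_POLICY:
            errors.append(f"policy {name} has too many interface-down events")
    if len(sets) > MAX_SETS:
        errors.append("too many redundancy sets")
    for name, (_, pols) in sets.items():
        if len(pols) > MAX_POLICIES_PER_SET:
            errors.append(f"set {name} has too many redundancy policies")

    sets_per_group = Counter(group for group, _ in sets.values())
    if len(sets_per_group) > MAX_GROUPS:
        errors.append("too many redundancy groups")
    for group, n in sets_per_group.items():
        if n > MAX_SETS_PER_GROUP:
            errors.append(f"group {group} has more than {MAX_SETS_PER_GROUP} sets")
    return errors
```

With the example from the text (RP1 containing RE1 and RE2, and RS1 containing RP1), `validate` returns an empty list. Also placing RE1 in a second policy RP2 produces one membership violation.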
Services Redundancy Operation
Services redundancy operates as follows:
The services redundancy daemon runs on the Routing Engine. It continuously monitors configured redundancy events.
When a redundancy event is detected, the services redundancy daemon:
Adds or removes signal routes specified in the redundancy policy.
Switches services to the standby.
Updates stateful synchronization roles as needed.
The resulting route changes cause:
The routing policy that references the signal route to advertise routes differently.
VRRP to change advertised priorities.
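Both reactions depend only on whether the signal route is present. The sketch below assumes that the gateway holding the signal route should be preferred. It uses hypothetical values throughout: the priority, cost, and metric numbers and the documentation prefix `192.0.2.1/32` are illustrative, not Junos defaults. Lowering a metric is only one way a routing policy could "advertise routes differently".

```python
def vrrp_priority(routes: set[str], tracked: str, base: int, cost: int) -> int:
    """VRRP route tracking: keep the base priority while the tracked route
    is present; reduce it by `cost` when the route is absent."""
    return base if tracked in routes else base - cost


def advertised_metric(routes: set[str], signal: str, preferred: int, fallback: int) -> int:
    """Routing policy sketch: advertise with a better (lower) metric while
    the signal route exists, and a worse one otherwise."""
    return preferred if signal in routes else fallback


SIGNAL = "192.0.2.1/32"  # hypothetical signal route
```

When SRD adds `SIGNAL` to a gateway's routing table, that gateway's VRRP priority rises to the base value and its advertised metric improves. Removing the route reverses both, with no further action needed from the daemon.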
To summarize the switchover process:
A critical event occurs.
The services redundancy daemon adds or removes a signal route.
A routing policy advertises routes differently. VRRP changes advertised priorities.
Services switch over to the standby.
Stateful synchronization is updated accordingly.
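The summarized switchover steps can be traced with a small simulation. The sketch assumes the following policy actions: the releasing gateway deletes the signal route and the acquiring gateway adds it. The gateway names, the prefix, and the priority values are hypothetical. The function only records the order of effects described above. It does not reproduce SRD's actual ICCP signaling.

```python
from dataclasses import dataclass, field

SIGNAL = "192.0.2.1/32"  # hypothetical signal route prefix


@dataclass
class Gateway:
    name: str
    services_primary: bool
    routes: set[str] = field(default_factory=set)

    def vrrp_priority(self) -> int:
        # Illustrative route-tracking priorities, not Junos defaults.
        return 200 if SIGNAL in self.routes else 100


def switchover(releasing: Gateway, acquiring: Gateway) -> list[str]:
    """Apply the summarized switchover steps and return them in order."""
    steps = [f"critical event on {releasing.name}"]
    # Assumed redundancy policy actions: delete the signal route on the
    # releasing gateway and add it on the acquiring gateway.
    releasing.routes.discard(SIGNAL)
    acquiring.routes.add(SIGNAL)
    steps.append(f"signal route moved to {acquiring.name}")
    # Route tracking reacts: the gateway holding the signal route wins VRRP.
    preferred = max((releasing, acquiring), key=Gateway.vrrp_priority)
    steps.append(f"VRRP prefers {preferred.name}")
    # Services, then stateful synchronization roles, follow the routing change.
    releasing.services_primary, acquiring.services_primary = False, True
    steps.append(f"services primary is {acquiring.name}")
    steps.append(f"sync roles: {acquiring.name} primary, {releasing.name} standby")
    return steps
```

In this ordering, routing converges toward the acquiring gateway in the same step in which services move to it. This is why the routing and services roles must stay aligned.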
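The order of routing priorities must match the order of services primary role: the chassis whose services PIC holds the primary role must also be the chassis that routing prefers, so that traffic reaches the services PIC that is servicing it.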
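A consistency check for this rule can be sketched as follows. The helper and its inputs are hypothetical. The sketch treats a tie in routing priority as a mismatch, because a tie means routing does not clearly prefer the services primary.

```python
def priorities_match_primary_role(priority: dict[str, int], services_primary: str) -> bool:
    """True only if the services primary is the single gateway with the
    highest routing (for example VRRP) priority."""
    top = max(priority.values())
    winners = [gw for gw, p in priority.items() if p == top]
    return winners == [services_primary]
```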
If a redundancy policy action releases the primary role and the redundancy peer's state is wait, the release fails. If the action is the force variant of the release action, the release succeeds even if the redundancy peer's state is warned.
Similarly, if a redundancy policy action on the standby acquires the primary role and the local state is wait, the acquisition fails. If the action is the force variant of the acquire action, the acquisition succeeds even if the standby state is wait.
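The two rules differ mainly in whose state is checked: the peer's for a release, the standby's own for an acquisition. The sketch makes this explicit, but it rests on assumptions. The state labels are hypothetical. Only wait is treated as blocking, because that is the only blocking state the text documents. The force variant is assumed to skip the state check entirely.

```python
from enum import Enum


class State(Enum):
    """Hypothetical redundancy state labels; only wait and warned appear in the text."""
    PRIMARY = "primary"
    STANDBY = "standby"
    WAIT = "wait"
    WARNED = "warned"


# Only the wait state is documented as blocking a non-forced action.
BLOCKING = {State.WAIT}


def release_succeeds(peer_state: State, force: bool) -> bool:
    """Releasing the primary role is checked against the peer's state."""
    # Assumption: the force variant bypasses the state check.
    return force or peer_state not in BLOCKING


def acquire_succeeds(local_state: State, force: bool) -> bool:
    """Acquiring the primary role on the standby is checked against its own state."""
    return force or local_state not in BLOCKING
```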
You can also use a manual command to trigger a redundancy policy that releases or acquires the primary role.
If gateway 1, the chassis that is configured with the lower IP address, is the primary chassis and you deactivate the services redundancy daemon on it, a switchover to gateway 2 occurs. If gateway 2, the chassis that is configured with the higher IP address, is the primary chassis and you deactivate the services redundancy daemon on it, a switchover does not occur.
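The asymmetry can be expressed as a comparison of the two configured addresses. This is a minimal sketch. The helper name is hypothetical, and the addresses in the usage come from the documentation range. The text does not say which configured address is compared, so the function simply takes two addresses of the same IP version.

```python
import ipaddress


def switchover_on_srd_deactivation(primary_ip: str, peer_ip: str) -> bool:
    """Whether deactivating SRD on the primary moves the primary role to the peer.

    Per the text, a primary with the lower configured IP address (gateway 1)
    fails over; a primary with the higher address (gateway 2) does not.
    Comparing an IPv4 address with an IPv6 address raises TypeError.
    """
    return ipaddress.ip_address(primary_ip) < ipaddress.ip_address(peer_ip)
```

Comparing `ipaddress` objects rather than strings matters here. As strings, `"192.0.2.10"` sorts before `"192.0.2.9"`. As addresses, it compares as numerically higher.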