Interoperability
As you know from the Connectivity chapter, New York has transitioned to using SR for all intra-region traffic. Traffic to and from other regions still relies on LDP, which continues to run on all New York routers. This chapter reduces our dependence on LDP while, just as before, keeping service continuity paramount.
As confidence is gained in the SR implementation, LDP can be eliminated entirely. This chapter focuses on eventually arriving at a state where LDP is no longer advertising unicast IPv4 or IPv6 FECs.
SR and LDP Interworking
LDP continues to distribute labels for all routers, even though the SR prefix SIDs are preferred in New York. Giving SR a higher preference only comes into effect when there is a competing route learned via LDP; because the routers outside New York only have labels advertised via LDP, traffic to those destinations remains non-segment-routed. If the plan is to turn LDP off, you must ensure you maintain a contiguous labeled path throughout, as LDP provides for now (Figure 1). This matters in both the SR-to-LDP and the LDP-to-SR directions.
SR-to-LDP
SR requires every destination router to be reachable via a SID, and most of the network is lacking on that front. There are no prefix SIDs for the remote routers because they are not yet running SR at all. Eliminate LDP in New York, and there would be no contiguous LSP to destinations outside the region. Clearly, then, you need a mechanism that indirectly associates prefix SIDs with non-SR-capable routers.
The SR mapping server (SRMS) is just such a control plane function. It advertises prefix SIDs on behalf of the non-SR-capable. As you would expect by now, this results in additional IS-IS TLVs that SR speakers will consume, and non-speakers ignore. SR routers will install prefix SIDs learned from mapping servers, much as they install native prefix SIDs, into inet.3.
Like all proxies, the SRMS must be highly available and its disseminated information must remain consistent – high availability is achieved by configuring more than one router as an SRMS. Mapping consistency is the aspirational result of following modern operational best practices, including templated configuration generation, version-controlled configuration repositories, and the ability to swiftly roll back unintended changes.
Complex tie-breaking rules exist if mapping inconsistencies occur. Each SRMS mapping carries a preference. Mappings from the SRMS with the highest preference are selected. If multiple SRMS have equal preference but advertise conflicting information, SR label collision rules come into effect.
The SRMS exists purely to advertise prefix to SID mappings. The prefixes themselves may or may not be reachable via the SRMS; in fact, the prefixes may be completely unreachable. Of course, advertising SID mappings for fictitious prefixes would be considered a misconfiguration, so standard change review process controls must be in effect to ensure incorrect mappings don’t come to life.
The analog to a server is a client. For a mapping server to hold any sway, there must exist a mapping client (SRMC). Unlike most protocols, SR mapping clients and servers don’t form a distinct control plane relationship with each other. Instead, clients simply use the advertised mappings. An SRMS can simultaneously act as both a server and a client.
Routers bordering both domains are responsible for stitching SR segments to LDP-signaled LSPs. The border router speaks both SR and LDP; on receiving a mapping, it creates a label swap entry in the mpls.0 table so that an incoming node SID is switched to the LDP-learned label for the same FEC.
In this book’s network, both p1.nyc and p2.nyc will act as border routers. While it may be more intuitive to first configure an SRMS, that would disrupt connectivity. Because the New York routers prefer SR over LDP, learning SR mapping entries for the Washington FECs would immediately supplant the existing LDP-learned labels in the PEs’ inet.3 tables. Without explicitly enabling SR-to-LDP stitching behavior, when pe1.nyc and pe2.nyc attempt to use SR for long-haul traffic, that traffic would be discarded at p1.nyc and p2.nyc, whose mpls.0 tables would lack a corresponding action for the inbound SR label.
Enabling SR-to-LDP Stitching
Let’s configure both p1.nyc and p2.nyc as border routers, acting as SR/LDP translators:
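A minimal sketch of what this could look like on the border routers, assuming the ldp-stitching statement under the IS-IS source-packet-routing hierarchy in your release (the same statement is applied on both p1.nyc and p2.nyc):

set protocols isis source-packet-routing ldp-stitching

With this in place, the border routers are prepared to install swap entries for any mapping-server-derived labels they later learn.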
There should be no change to forwarding behavior yet.
Connectivity verification: cross-country traffic still uses LDP labels
user@ce1.nyc> traceroute wait 1 198.51.100.54 routing-instance svc-inet traceroute to 198.51.100.54 (198.51.100.54), 30 hops max, 40 byte packets 1 pe2.nyc-ge-0-0-10.1 (198.51.100.3) 4.390 ms 2.219 ms 2.002 ms 2 p1.nyc-ge-0-0-3.0 (192.0.2.9) 43.247 ms 115.612 ms 94.366 ms MPLS Label=28 CoS=0 TTL=1 S=1 3 p2.ewr-ge-0-0-2.0 (192.0.2.17) 11.741 ms p1.phl-ge-0-0-2.0 (192.0.2.21) 44.496 ms 75.404 ms MPLS Label=18 CoS=0 TTL=1 S=0 MPLS Label=17 CoS=0 TTL=1 S=1 4 p2.iad-ge-0-0-6.0 (192.0.2.28) 7.315 ms p1.iad-ge-0-0-5.0 (192.0.2.30) 6.626 ms 6.255 ms MPLS Label=17 CoS=0 TTL=1 S=1 5 pe1.iad-ge-0-0-1.0 (192.0.2.36) 129.904 ms 190.350 ms 100.185 ms 6 ce1.iad-ge-0-0-4.0 (198.51.100.54) 9.483 ms 10.870 ms 8.107 ms
Here, both p1.nyc and p2.nyc are eagerly awaiting mapping entries from an SRMS. To start, p1.nyc will be configured as our first SRMS. It will advertise mappings for the PEs in Washington. Remember that any SR-capable router can act as an SRMS; its topological placement is irrelevant. A loose (if not strictly accurate) comparison can be made with BGP route reflectors, where, in large topologies, judicious route reflector placement or the use of optimal route reflection becomes necessary:
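A sketch of that configuration on p1.nyc follows; the policy and range names (wdc-pes, wdc-pe-range) are illustrative, and the hierarchy shown assumes the srms-policy and mapping-server statements:

set routing-options source-packet-routing srms-policy wdc-pes prefix-segment-range wdc-pe-range start-prefix 128.49.106.10/32
set routing-options source-packet-routing srms-policy wdc-pes prefix-segment-range wdc-pe-range start-index 10
set routing-options source-packet-routing srms-policy wdc-pes prefix-segment-range wdc-pe-range size 4
set protocols isis source-packet-routing mapping-server wdc-pes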
The policy is created in a protocol-independent section of the configuration. That policy is then referenced under the protocol-specific hierarchy to allow p1.nyc to act as an SRMS using IS-IS.
Remember that SR is agnostic to the choice of IGPs – it would work just as well, had our network been OSPF-based.
Let’s tease this configuration apart. The policy name is a local matter. Within that, we are advertising a mapping entry that encompasses a range of four IPv4 prefixes, starting with 128.49.106.10 (pe2.iad’s router-id).
This is efficiently encoded as a single label-binding TLV in p1.nyc’s IS-IS LSP. In the next verification, the Range corresponds to the size keyword, the IPv4 prefix is our starting prefix, and what we configured as the start-index is represented as Value.
Control plane: verify p1.nyc is acting as an SRMS
user@pe1.nyc> show isis database p1.nyc extensive level 1 ... Label binding: 128.49.106.10/32, Flags: 0x00(F:0,M:0,S:0,D:0,A:0), Range 4 Node SID, Flags: 0x40(R:0,N:1,P:0,E:0,V:0,L:0), Algo: SPF(0), Value: 10 ...
The resulting entries in the inet.3 tables of the SRMCs should be those in Table 1.
Table 1: p1.nyc SRMS Mapping Entries
Prefix | Label (SRGB + index) | Reachable? |
---|---|---|
128.49.106.10 | 1010 (1000 + 10) | Yes |
128.49.106.11 | 1011 (1000 + 11) | Yes |
128.49.106.12 | 1012 (1000 + 12) | No |
128.49.106.13 | 1013 (1000 + 13) | Yes |
Forwarding plane: verify pe2.nyc prefers SR to LDP
user@pe2.nyc> show route table inet.3 match-prefix “128.49.106.1?/32” inet.3: 8 destinations, 14 routes (8 active, 0 holddown, 0 hidden) + = Active Route, - = Last Active, * = Both 128.49.106.10/32 *[L-ISIS/8] 00:11:32, metric 40 > to 192.0.2.9 via ge-0/0/1.0, Push 1010 to 192.0.2.11 via ge-0/0/2.0, Push 1010 [LDP/9] 00:12:08, metric 40, tag 1111 > to 192.0.2.9 via ge-0/0/1.0, Push 27 to 192.0.2.11 via ge-0/0/2.0, Push 27 128.49.106.11/32 *[L-ISIS/8] 00:11:32, metric 40 > to 192.0.2.9 via ge-0/0/1.0, Push 1011 to 192.0.2.11 via ge-0/0/2.0, Push 1011 [LDP/9] 00:12:08, metric 40, tag 1111 > to 192.0.2.9 via ge-0/0/1.0, Push 28 to 192.0.2.11 via ge-0/0/2.0, Push 28 128.49.106.13/32 *[L-ISIS/8] 00:11:32, metric 40 > to 192.0.2.9 via ge-0/0/1.0, Push 1013 to 192.0.2.11 via ge-0/0/2.0, Push 1013 [LDP/9] 00:12:08, metric 40, tag 1111 > to 192.0.2.9 via ge-0/0/1.0, Push 29 to 192.0.2.11 via ge-0/0/2.0, Push 29
As you can see, the LDP routes have been ignored in favor of the newly learned SR prefix SIDs from p1.nyc acting as the SRMS. Where pe2.nyc would previously push the LDP label learned from p1.nyc or p2.nyc, it now uses the familiar SR labels from the SRGB beginning at 1000.
The real crux, however, is what p1.nyc and p2.nyc do when they receive these inbound labels. Since they are border routers, they should swap the mapping-server-derived label for the LDP-learned label. More specifically, the border routers should swap the inbound SR label for the outgoing label stack associated with the same FEC. Without that crucial step, you would have non-contiguous label-switched paths.
Let’s first make note of p1.nyc’s outbound label stack for reaching pe1.iad and how it is learned via LDP.
Forwarding plane: verify p1.nyc’s outgoing label stack for pe1.iad’s FEC
user@p1.nyc> show route 128.49.106.11/32 table inet.3 detail inet.3: 8 destinations, 13 routes (8 active, 0 holddown, 0 hidden) 128.49.106.11 /32 (1 entry, 1 announced) State: <FlashAll> *LDP Preference: 9 Next hop type: Router, Next hop index: 0 Address: 0xcc3ad10 Next-hop reference count: 4 Next hop: 192.0.2.21 via ge-0/0/5.0 weight 0x1, selected Label-switched-path p1.iad:0 Label operation: Push 17, Push 18(top) Label TTL action: prop-ttl, prop-ttl(top) Load balance label: Label 17: None; Label 18: None; Label element ptr: 0xccdbaa0 Label parent element ptr: 0xccdb2c0 Label element references: 2 Label element child references: 1 Label element lsp id: 0 Session Id: 0x0 Next hop: 192.0.2.15 via ge-0/0/4.0 weight 0x8001 Label-switched-path p1.iad:0 Label operation: Push 17, Push 20(top) Label TTL action: prop-ttl, prop-ttl(top) Load balance label: Label 17: None; Label 20: None; ...
Since MPLS LSPs can be recursive, this is clearly a case of tunneling. Label 17 is learned from p1.nyc’s LDP neighbor, which is its chosen next hop towards this FEC. That neighbor is p1.iad, reachable via the RSVP LSP over which the LDP session is tunneled.
Control plane: verify how p1.nyc builds the label stack to reach pe1.iad
user@p1.nyc> show rsvp session name p1.iad:0 ingress Ingress RSVP: 4 sessions To From State Rt Style Labelin Labelout LSPname 128.49.106.9 128.49.106.3 Up 0 1 SE - 18 p1.iad:0 Total 1 displayed, Up 1, Down 0 user@p1.nyc> show ldp database session 128.49.106.9 Input label database, 128.49.106.3:0--128.49.106.9:0 Labels received: 9 Label Prefix 20 128.49.106.0/32 21 128.49.106.1/32 22 128.49.106.2/32 23 128.49.106.3/32 16 128.49.106.8/32 3 128.49.106.9/32 19 128.49.106.10/32 17 128.49.106.11/32 18 128.49.106.13/32 ...
Based on this output, we should expect p1.nyc to swap incoming label 1011 with 17 (SR-to-LDP stitching), before pushing 18 on top (the label for the RSVP-TE LSP over which the LDP session is tunneled).
Forwarding plane: confirm p1.nyc swaps the SR label for the LDP/RSVP combination
user@p1.nyc> show route label 1011 detail mpls.0: 41 destinations, 41 routes (41 active, 0 holddown, 0 hidden) 1011 (1 entry, 1 announced) *L-ISIS Preference: 14 Level: 2 Next hop type: Router, Next hop index: 0 Address: 0xcc3b290 Next-hop reference count: 1 Next hop: 192.0.2.21 via ge-0/0/5.0 weight 0x1, selected Label-switched-path p1.iad:0 Label operation: Swap 17, Push 18(top) Label TTL action: prop-ttl, prop-ttl(top) Load balance label: Label 17: None; Label 18: None; Label element ptr: 0xccdd6c0 Label parent element ptr: 0x0 Label element references: 2 Label element child references: 0 Label element lsp id: 0 Session Id: 0x0 Next hop: 192.0.2.15 via ge-0/0/4.0 weight 0x8001 Label-switched-path p1.iad:0 Label operation: Swap 17, Push 20(top) Label TTL action: prop-ttl, prop-ttl(top) Load balance label: Label 17: None; Label 20: None; ...
Voilà! The label operations for the FEC in inet.3 and for the corresponding label in mpls.0 are identical. When pe1.nyc pushes label 1011 on traffic destined for pe1.iad, p1.nyc (and p2.nyc) swap that for the LDP label (that an RSVP label is pushed next is incidental).
The key takeaway is how well SR coexists with the classic label distribution protocols, both the namesake LDP and the powerful RSVP-TE.
Let’s run another long-distance traceroute. It should confirm that pe1.nyc and pe2.nyc now use the SRMS advertised mapping for pe1.iad. Of course, pe1.iad, as with other non-New York routers, remains SR-oblivious.
Connectivity verification: pe2.nyc now pushes SR label 1011, instead of the earlier LDP-learned label
user@ce1.nyc> traceroute wait 1 198.51.100.54 routing-instance svc-inet traceroute to 198.51.100.54 (198.51.100.54), 30 hops max, 40 byte packets 1 pe2.nyc-ge-0-0-10.1 (198.51.100.3) 3.418 ms 2.418 ms 3.593 ms 2 p1.nyc-ge-0-0-3.0 (192.0.2.9) 12.553 ms 6.212 ms p2.nyc-ge-0-0-3.0 (192.0.2.11) 6.776 ms MPLS Label=1011 CoS=0 TTL=1 S=1 3 p1.phl-ge-0-0-2.0 (192.0.2.21) 9.093 ms 7.572 ms p2.ewr-ge-0-0-2.0 (192.0.2.17) 8.630 ms MPLS Label=16 CoS=0 TTL=1 S=0 MPLS Label=16 CoS=0 TTL=1 S=1 4 p2.iad-ge-0-0-6.0 (192.0.2.28) 13.157 ms 7.112 ms 12.586 ms MPLS Label=16 CoS=0 TTL=1 S=1 5 pe1.iad-ge-0-0-1.0 (192.0.2.36) 11.872 ms pe1.iad-ge-0-0-2.0 (192.0.2.38) 8.175 ms 5.957 ms 6 ce1.iad-ge-0-0-4.0 (198.51.100.54) 8.596 ms 16.851 ms 7.024 ms
We would be remiss if we didn’t validate that intra-New York traffic remains unchanged, as expected.
Connectivity verification: traffic within New York continues to use SR
user@ce1.nyc> traceroute wait 1 198.51.100.0 routing-instance svc-inet traceroute to 198.51.100.0 (198.51.100.0), 30 hops max, 40 byte packets 1 pe2.nyc-ge-0-0-10.1 (198.51.100.3) 9.106 ms 1.801 ms 1.862 ms 2 p1.nyc-ge-0-0-3.0 (192.0.2.9) 6.196 ms p2.nyc-ge-0-0-3.0 (192.0.2.11) 94.158 ms 91.205 ms MPLS Label=1001 CoS=0 TTL=1 S=1 3 pe1.nyc-ge-0-0-2.0 (192.0.2.6) 42.152 ms pe1.nyc-ge-0-0-1.0 (192.0.2.4) 70.788 ms 67.294 ms 4 ce1.nyc-ge-0-0-1.1 (198.51.100.0) 129.316 ms 135.363 ms 146.201 ms
Excellent. There is no change from the earlier traceroutes.
Before proceeding, let’s address the lack of SRMS redundancy by configuring p2.nyc as our second mapping server. You can simply configure it identically to p1.nyc:
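Assuming the illustrative names from the p1.nyc sketch, p2.nyc receives the identical statements:

set routing-options source-packet-routing srms-policy wdc-pes prefix-segment-range wdc-pe-range start-prefix 128.49.106.10/32
set routing-options source-packet-routing srms-policy wdc-pes prefix-segment-range wdc-pe-range start-index 10
set routing-options source-packet-routing srms-policy wdc-pes prefix-segment-range wdc-pe-range size 4
set protocols isis source-packet-routing mapping-server wdc-pes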
Control plane: p2.nyc also originates a label-binding SID with the same mapping as p1.nyc
user@p2.nyc> show isis database p2.nyc extensive level 1 ... Label binding: 128.49.106.10/32, Flags: 0x00(F:0,M:0,S:0,D:0,A:0), Range 4 Node SID, Flags: 0x40(R:0,N:1,P:0,E:0,V:0,L:0), Algo: SPF(0), Value: 10
There is no change to forwarding behavior. If p1.nyc becomes unavailable, the mappings from p2.nyc continue to appear as surrogate node SIDs.
For the sake of understanding a different approach to configuring SRMS entries, let’s remove this configuration from p2.nyc and instead add individual prefix entries. We can then investigate how they are encoded differently in the IS-IS LSP, yet lead to the same forwarding behavior:
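A sketch of the per-prefix form on p2.nyc, again using illustrative range names; the earlier range is removed and each prefix now gets its own prefix-segment-range with a size of 1:

delete routing-options source-packet-routing srms-policy wdc-pes prefix-segment-range wdc-pe-range
set routing-options source-packet-routing srms-policy wdc-pes prefix-segment-range range-10 start-prefix 128.49.106.10/32
set routing-options source-packet-routing srms-policy wdc-pes prefix-segment-range range-10 start-index 10
set routing-options source-packet-routing srms-policy wdc-pes prefix-segment-range range-10 size 1
set routing-options source-packet-routing srms-policy wdc-pes prefix-segment-range range-11 start-prefix 128.49.106.11/32
set routing-options source-packet-routing srms-policy wdc-pes prefix-segment-range range-11 start-index 11
set routing-options source-packet-routing srms-policy wdc-pes prefix-segment-range range-11 size 1
set routing-options source-packet-routing srms-policy wdc-pes prefix-segment-range range-13 start-prefix 128.49.106.13/32
set routing-options source-packet-routing srms-policy wdc-pes prefix-segment-range range-13 start-index 13
set routing-options source-packet-routing srms-policy wdc-pes prefix-segment-range range-13 size 1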
We have created mappings for individual prefixes, instead of a range. Unlike the range that covered a prefix that isn’t currently part of our network (128.49.106.12), this form allows non-contiguous mappings. As you would expect, it requires separate TLVs in p2.nyc’s IS-IS PDU.
Control plane: p2.nyc now has multiple label binding TLVs
user@pe1.nyc> show isis database p2.nyc extensive ... TLVs: Area address: 49.0001 (3) LSP Buffer Size: 1492 Speaks: IP Speaks: IPV6 IP router id: 128.49.106.2 IP address: 128.49.106.2 Label binding: 128.49.106.10/32, Flags: 0x00(F:0,M:0,S:0,D:0,A:0), Range 1 Node SID, Flags: 0x40(R:0,N:1,P:0,E:0,V:0,L:0), Algo: SPF(0), Value: 10 Label binding: 128.49.106.11/32, Flags: 0x00(F:0,M:0,S:0,D:0,A:0), Range 1 Node SID, Flags: 0x40(R:0,N:1,P:0,E:0,V:0,L:0), Algo: SPF(0), Value: 11 Label binding: 128.49.106.13/32, Flags: 0x00(F:0,M:0,S:0,D:0,A:0), Range 1 Node SID, Flags: 0x40(R:0,N:1,P:0,E:0,V:0,L:0), Algo: SPF(0), Value: 13
Where before we saw a single TLV provide mappings for all of Washington’s PE routers (as well as an extraneous entry), we now find three TLVs, one for each PE. If the prefixes are not contiguous, this form will be needed. Keep in mind that both approaches have their own considerations:
Prefix ranges are the most compact way to distribute this information, but their scope may be too broad, covering prefixes that do not need (or should not receive) SRMS services.
Multiple label-binding TLVs are more precise but lead to inflated IGP PDUs, whose flooding and fragmentation may make distribution less efficient.
Of course, both approaches can be used simultaneously: a range where prefixes are conveniently addressed, and separate entries where they are not.
Conflict Resolution
Let’s explore this hybrid approach with a novel idea: advertising mappings for SR-capable nodes from p2.nyc. The p2.nyc SRMS will continue to advertise individual entries for the Washington routers, and add a range that covers the New York routers. The intent is simply to understand whether this causes any confusion or harm:
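Continuing with the illustrative names, the additional range on p2.nyc could look like the following sketch; the start prefix, index, and size are chosen to match the Range and Value seen in the verification that follows:

set routing-options source-packet-routing srms-policy wdc-pes prefix-segment-range nyc-routers start-prefix 128.49.106.0/32
set routing-options source-packet-routing srms-policy wdc-pes prefix-segment-range nyc-routers start-index 0
set routing-options source-packet-routing srms-policy wdc-pes prefix-segment-range nyc-routers size 4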
You can see our existing entries augmented by a range starting with pe2.nyc’s router-id (128.49.106.0). Since pe2.nyc is already advertising its node index, and the index matches what the SRMS purports to originate, there is no change to the forwarding behavior.
Control plane: p2.nyc is advertising ranges for both SR-capable and incapable routers
user@pe2.nyc> show isis database p2.nyc extensive ... Label binding: 128.49.106.10/32, Flags: 0x00(F:0,M:0,S:0,D:0,A:0), Range 1 Node SID, Flags: 0x40(R:0,N:1,P:0,E:0,V:0,L:0), Algo: SPF(0), Value: 10 Label binding: 128.49.106.11/32, Flags: 0x00(F:0,M:0,S:0,D:0,A:0), Range 1 Node SID, Flags: 0x40(R:0,N:1,P:0,E:0,V:0,L:0), Algo: SPF(0), Value: 11 Label binding: 128.49.106.13/32, Flags: 0x00(F:0,M:0,S:0,D:0,A:0), Range 1 Node SID, Flags: 0x40(R:0,N:1,P:0,E:0,V:0,L:0), Algo: SPF(0), Value: 13 Label binding: 128.49.106.0/32, Flags: 0x00(F:0,M:0,S:0,D:0,A:0), Range 4 Node SID, Flags: 0x40(R:0,N:1,P:0,E:0,V:0,L:0), Algo: SPF(0), Value: 0 ...
The exercise demonstrates that individual entries and prefix ranges can be used together. While not damaging, this configuration does not represent an operational best practice. Muddying our intent with gratuitous advertisements is, at best, confusing and, at worst, a landmine for future software behavior or standards changes.
Remove the unneeded prefix-segment-range.
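With the illustrative names used above, that is a single statement to delete on p2.nyc:

delete routing-options source-packet-routing srms-policy wdc-pes prefix-segment-range nyc-routers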
LDP-to-SR
Have we now eliminated the need for LDP on all New York PE routers? They have reachability to each other via SR node SIDs; they have additionally learned synthetic node SIDs for the Washington PE routers. Is LDP still necessary? After one last step, we will be through with it on the PEs.
Remember that the Washington routers learn about the New York FECs via LDP, tunneled over the RSVP LSPs originating at p1.nyc and p2.nyc and terminating at p1.iad and p2.iad. LDP can’t be turned off on these P routers without breaking the contiguous LSP.
We can, however, turn LDP off on pe1.nyc and pe2.nyc. Let’s first think about how labels for them can continue to be advertised via LDP to the Washington routers. Since the LDP sessions between the P and PE routers would be removed, the P routers need to be configured to originate an LDP label on behalf of the PEs.
This is the same idea as SRMS, turned the other way around, through the proverbial looking glass. Instead of advertising prefix SIDs on behalf of the SR-incapable, we need to advertise proxy FEC mappings for the LDP-averse.
Before making changes, let’s verify the LDP label that p2.nyc advertises for reaching pe2.nyc.
Control plane: p2.nyc is advertising label 56 to its LDP neighbors
user@p2.nyc> show ldp database | match “put|128.49.106.0” Input label database, 128.49.106.2:0--128.49.106.0:0 3 128.49.106.0/32 Output label database, 128.49.106.2:0--128.49.106.0:0 56 128.49.106.0/32 Input label database, 128.49.106.2:0--128.49.106.1:0 32 128.49.106.0/32 Output label database, 128.49.106.2:0--128.49.106.1:0 56 128.49.106.0/32 Input label database, 128.49.106.2:0--128.49.106.3:0 34 128.49.106.0/32 Output label database, 128.49.106.2:0--128.49.106.3:0 56 128.49.106.0/32 Input label database, 128.49.106.2:0--128.49.106.8:0 51 128.49.106.0/32 Output label database, 128.49.106.2:0--128.49.106.8:0 56 128.49.106.0/32 Input label database, 128.49.106.2:0--128.49.106.9:0 47 128.49.106.0/32 Output label database, 128.49.106.2:0--128.49.106.9:0 56 128.49.106.0/32
Forwarding plane: p2.nyc pops label 56 as it performs PHP towards pe2.nyc
user@p2.nyc> show route label 56 mpls.0: 40 destinations, 40 routes (40 active, 0 holddown, 0 hidden) + = Active Route, - = Last Active, * = Both 56 *[LDP/9] 00:04:53, metric 1, tag 1111 > to 192.0.2.10 via ge-0/0/3.0, Pop 56(S=0) *[LDP/9] 00:04:53, metric 1, tag 1111 > to 192.0.2.10 via ge-0/0/3.0, Pop
Connectivity verification: traffic from Washington to pe2.nyc uses label 56
user@ce1.iad> traceroute wait 1 198.51.100.2 routing-instance svc-inet traceroute to 198.51.100.2 (198.51.100.2), 30 hops max, 40 byte packets 1 pe1.iad-ge-0-0-11.0 (198.51.100.55) 38.571 ms 2.340 ms 1.981 ms 2 p1.iad-ge-0-0-2.0 (192.0.2.37) 7.628 ms p2.iad-ge-0-0-2.0 (192.0.2.39) 9.786 ms p1.iad-ge-0-0-2.0 (192.0.2.37) 6.620 ms MPLS Label=47 CoS=0 TTL=1 S=1 3 p1.ewr-ge-0-0-3.0 (192.0.2.27) 99.309 ms 93.491 ms 94.582 ms MPLS Label=31 CoS=0 TTL=1 S=0 MPLS Label=34 CoS=0 TTL=1 S=1 4 p2.nyc-ge-0-0-4.0 (192.0.2.16) 6.603 ms 7.013 ms 7.230 ms MPLS Label=56 CoS=0 TTL=1 S=1 5 pe2.nyc-ge-0-0-2.0 (192.0.2.10) 10.157 ms pe2.nyc-ge-0-0-1.0 (192.0.2.8) 10.194 ms 9.562 ms 6 ce1.nyc-ge-0-0-2.1 (198.51.100.2) 14.194 ms 42.463 ms 7.111 ms
Enabling LDP-to-SR Stitching
Now let’s enable the LDP to SR stitching functionality at p2.nyc and observe the change in the advertised labels:
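A one-line sketch, assuming the sr-mapping-client statement under the LDP hierarchy; its effect is visible in the show ldp overview output shown shortly:

set protocols ldp sr-mapping-client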
Control plane: p2.nyc allocates new LDP label 59, withdrawing label 56
user@p2.nyc> show ldp database | match “put|128.49.106.0” Input label database, 128.49.106.2:0--128.49.106.0:0 3 128.49.106.0/32 Output label database, 128.49.106.2:0--128.49.106.0:0 59 128.49.106.0/32 Input label database, 128.49.106.2:0--128.49.106.1:0 32 128.49.106.0/32 Output label database, 128.49.106.2:0--128.49.106.1:0 59 128.49.106.0/32 Input label database, 128.49.106.2:0--128.49.106.3:0 34 128.49.106.0/32 Output label database, 128.49.106.2:0--128.49.106.3:0 59 128.49.106.0/32 Input label database, 128.49.106.2:0--128.49.106.8:0 54 128.49.106.0/32 Output label database, 128.49.106.2:0--128.49.106.8:0 59 128.49.106.0/32 Input label database, 128.49.106.2:0--128.49.106.9:0 47 128.49.106.0/32 Output label database, 128.49.106.2:0--128.49.106.9:0 59 128.49.106.0/32
Control plane: LDP-to-SR stitching functionality is enabled
user@p2.nyc> show ldp overview Instance: master Reference count: 6 Router ID: 128.49.106.2 LDP inet: enabled ... LDP SR Mapping Client: enabled ...
Forwarding plane: label 59 is swapped with explicit-null (label 0), as pe2.nyc requests UHP treatment for SR
user@p2.nyc> show route label 59 mpls.0: 40 destinations, 40 routes (40 active, 0 holddown, 0 hidden) + = Active Route, - = Last Active, * = Both 59 *[LDP/9] 00:01:20, metric 1, tag 1111 > to 192.0.2.10 via ge-0/0/3.0, Swap 0 59(S=0) *[LDP/9] 00:01:20, metric 1, tag 1111 > to 192.0.2.10 via ge-0/0/3.0, Pop
Connectivity verification: label 59 is now in use by p2.nyc’s LDP neighbors to reach pe2.nyc
user@ce1.iad> traceroute wait 1 198.51.100.2 routing-instance svc-inet traceroute to 198.51.100.2 (198.51.100.2), 30 hops max, 40 byte packets 1 pe1.iad-ge-0-0-11.0 (198.51.100.55) 55.677 ms 96.798 ms 83.824 ms 2 p1.iad-ge-0-0-2.0 (192.0.2.37) 91.130 ms 81.951 ms p2.iad-ge-0-0-2.0 (192.0.2.39) 42.568 ms MPLS Label=54 CoS=0 TTL=1 S=1 3 p1.ewr-ge-0-0-3.0 (192.0.2.27) 132.320 ms p2.ewr-ge-0-0-3.0 (192.0.2.29) 7.874 ms 6.459 ms MPLS Label=16 CoS=0 TTL=1 S=0 MPLS Label=59 CoS=0 TTL=1 S=1 4 p2.nyc-ge-0-0-4.0 (192.0.2.16) 7.673 ms p1.nyc-ge-0-0-4.0 (192.0.2.14) 123.033 ms p2.nyc-ge-0-0-4.0 (192.0.2.16) 7.327 ms MPLS Label=59 CoS=0 TTL=1 S=1 5 pe2.nyc-ge-0-0-1.0 (192.0.2.8) 521.158 ms pe2.nyc-ge-0-0-2.0 (192.0.2.10) 64.296 ms pe2.nyc-ge-0-0-1.0 (192.0.2.8) 183.317 ms 6 ce1.nyc-ge-0-0-2.1 (198.51.100.2) 9.445 ms 10.869 ms 11.214 ms
You can enable the LDP SR mapping client (what has been referred to here as LDP-to-SR stitching) on p1.nyc as well, to ensure there are multiple paths into New York. Then you are free to turn LDP off on both pe1.nyc and pe2.nyc. They will no longer advertise an LDP label for themselves, but p1.nyc and p2.nyc will continue to do so on their behalf, as shown in Figure 2.
You can persist in this state for as long as needed. This can be a legitimate mode of operation if the routers outside New York are not going to be SR-capable for the foreseeable future. Inadequate hardware or software capabilities in one part of the network should not prevent an SR deployment. Well planned and executed though the change needs to be, it does not need to be a big-bang event that requires all platforms to convert to segment routing simultaneously.
SR and RSVP-TE
It was briefly touched upon that a full mesh of LDP-tunneling RSVP-TE LSPs exists between the New York and the Washington P routers. At the end of the previous section, LDP was eliminated from pe1.nyc and pe2.nyc. However, LDP remained in use at p1.nyc, p2.nyc, and their Washington counterparts.
Interworking: SR over RSVP-TE
Let’s plow forward by mirroring the changes we have wrought in New York:
Enable SR on all Washington routers.
Prefer SR to LDP at the Washington PEs.
Once all the ingress routers have switched over to SR, we can finally eradicate LDP. That includes removing the SRMS and the LDP SR mapping client configured on p1.nyc and p2.nyc, as illustrated in Figure 3.
The importance of prefix SID planning cannot be overstated. The New York routers currently learn the Washington SIDs via the pair of SRMS; the native node SID configuration should retain identical values. In fact, SRGB sizing, offset, and SID planning are fundamental to the success of any segment routing deployment. Do not rush into a live deployment without careful consideration of use cases, as well as the ability of existing equipment to use an identical, consistently sized SRGB.
You also need to be careful about the order in which you enable SR functionality in Washington. If it is first enabled on the P routers, connectivity will break. This is because p1.nyc and p2.nyc will determine that their Washington neighbors are suddenly SR-capable. They will stop swapping the SRMS label (which they advertise for pe1.nyc and pe2.nyc to push) for the LDP label learned across the RSVP-TE LSP. The SR-to-LDP stitching functionality will be deemed unnecessary.
Instead, SR is first enabled on all the Washington PEs:
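On each Washington PE this is the familiar node-segment configuration. A sketch for pe1.iad, assuming an IPv4 index of 11 (to match the SRMS mapping, per the guidance above) and the same SRGB observed in New York:

set protocols isis source-packet-routing srgb start-label 1000 index-range 128
set protocols isis source-packet-routing node-segment ipv4-index 11

pe2.iad and pe3.iad would be configured analogously, with indexes 10 and 13 again matching their SRMS mappings.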
Let’s check to ensure there is no disruption.
Connectivity verification: traffic forwarding within New York is unchanged
user@ce1.nyc> traceroute wait 1 198.51.100.0 routing-instance svc-inet traceroute to 198.51.100.0 (198.51.100.0), 30 hops max, 40 byte packets 1 pe2.nyc-ge-0-0-10.1 (198.51.100.3) 4.094 ms 2.933 ms 2.475 ms 2 p2.nyc-ge-0-0-3.0 (192.0.2.11) 6.052 ms 5.527 ms p1.nyc-ge-0-0-3.0 (192.0.2.9) 5.449 ms MPLS Label=1001 CoS=0 TTL=1 S=1 3 pe1.nyc-ge-0-0-2.0 (192.0.2.6) 9.073 ms 5.860 ms 4.729 ms 4 ce1.nyc-ge-0-0-1.1 (198.51.100.0) 6.433 ms 7.107 ms 5.952 ms
Connectivity verification: traffic forwarding cross-country is unchanged
user@ce1.nyc> traceroute wait 1 198.51.100.54 routing-instance svc-inet traceroute to 198.51.100.54 (198.51.100.54), 30 hops max, 40 byte packets 1 pe2.nyc-ge-0-0-10.1 (198.51.100.3) 181.515 ms 211.310 ms 123.021 ms 2 p2.nyc-ge-0-0-3.0 (192.0.2.11) 7.206 ms p1.nyc-ge-0-0-3.0 (192.0.2.9) 6.855 ms p2.nyc-ge-0-0-3.0 (192.0.2.11) 6.535 ms MPLS Label=1011 CoS=0 TTL=1 S=1 3 p2.phl-ge-0-0-2.0 (192.0.2.23) 6.944 ms p1.ewr-ge-0-0-2.0 (192.0.2.15) 7.656 ms 10.006 ms MPLS Label=45 CoS=0 TTL=1 S=0 MPLS Label=17 CoS=0 TTL=1 S=1 4 p2.iad-ge-0-0-5.0 (192.0.2.32) 7.512 ms p1.iad-ge-0-0-6.0 (192.0.2.26) 12.778 ms p2.iad-ge-0-0-5.0 (192.0.2.32) 6.871 ms MPLS Label=22 CoS=0 TTL=1 S=1 5 pe1.iad-ge-0-0-1.0 (192.0.2.36) 6.567 ms pe1.iad-ge-0-0-2.0 (192.0.2.38) 6.934 ms pe1.iad-ge-0-0-1.0 (192.0.2.36) 6.115 ms 6 ce1.iad-ge-0-0-4.0 (198.51.100.54) 8.500 ms 7.762 ms 7.320 ms
No change is good (in this case). Washington is a separate Level 1 area, and the L1/L2 P routers are not yet SR-enabled. SR-to-LDP stitching continues unabated. The moment SR is enabled on p1.iad and p2.iad, the need for stitching will become moot.
This may seem gratuitous, but it’s worth thinking about how the need for stitching is determined. If a neighbor doesn’t advertise SR capability, p1.nyc and p2.nyc will merrily swap an SR label for an LDP label. In our network, they will additionally push an RSVP-TE LSP label on top.
If p1.iad (or p2.iad) is then configured with a node SID, both p1.nyc and p2.nyc will see its refreshed IS-IS Level 2 LSP. Correctly, they will no longer consider it for SR-to-LDP stitching as it is evident that p1.iad (or p2.iad) will be able to forward SR-native labels.
This is exactly how to proceed, with one wrinkle. We will first cost out the Washington P router that is not being enabled for SR. This ensures that all cross-region traffic will be segment routed; not only that, it will be carried over RSVP-TE LSPs, demonstrating SR over RSVP-TE.
Arbitrarily, let’s choose p2.iad as the P router to be SR-enabled, and begin by overloading p1.iad:
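Costing out p1.iad uses the standard IS-IS overload knob, shown here as a minimal sketch:

set protocols isis overload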
This forces all Washington-bound traffic towards p2.iad.
Connectivity verification: traffic forwarding to Washington is no longer multi-pathed
user@ce1.nyc> traceroute wait 1 198.51.100.54 routing-instance svc-inet traceroute to 198.51.100.54 (198.51.100.54), 30 hops max, 40 byte packets 1 pe2.nyc-ge-0-0-10.1 (198.51.100.3) 24.153 ms 42.585 ms 85.331 ms 2 p2.nyc-ge-0-0-3.0 (192.0.2.11) 7.771 ms 7.338 ms 7.904 ms MPLS Label=1011 CoS=0 TTL=1 S=1 3 p2.phl-ge-0-0-2.0 (192.0.2.23) 6.619 ms 8.563 ms 6.279 ms MPLS Label=41 CoS=0 TTL=1 S=0 MPLS Label=22 CoS=0 TTL=1 S=1 4 p2.iad-ge-0-0-5.0 (192.0.2.32) 45.022 ms 66.523 ms 67.692 ms MPLS Label=22 CoS=0 TTL=1 S=1 5 pe1.iad-ge-0-0-2.0 (192.0.2.38) 9.016 ms 7.956 ms 5.477 ms 6 ce1.iad-ge-0-0-4.0 (198.51.100.54) 63.975 ms 67.474 ms 65.506 ms
Connectivity verification: traffic forwarding within New York remains unaffected
user@ce1.nyc> traceroute wait 1 198.51.100.0 routing-instance svc-inet traceroute to 198.51.100.0 (198.51.100.0), 30 hops max, 40 byte packets 1 pe2.nyc-ge-0-0-10.1 (198.51.100.3) 2.712 ms 2.005 ms 2.196 ms 2 p2.nyc-ge-0-0-3.0 (192.0.2.11) 5.053 ms p1.nyc-ge-0-0-3.0 (192.0.2.9) 5.036 ms p2.nyc-ge-0-0-3.0 (192.0.2.11) 6.494 ms MPLS Label=1001 CoS=0 TTL=1 S=1 3 pe1.nyc-ge-0-0-1.0 (192.0.2.4) 4.992 ms pe1.nyc-ge-0-0-2.0 (192.0.2.6) 5.301 ms pe1.nyc-ge-0-0-1.0 (192.0.2.4) 5.028 ms 4 ce1.nyc-ge-0-0-1.1 (198.51.100.0) 6.404 ms 10.624 ms 5.638 ms
Forwarding plane: both New York P routers swap the inbound SR label for LDP-o-RSVP
user@p1.nyc> show route label 1011 extensive mpls.0: 44 destinations, 44 routes (44 active, 0 holddown, 0 hidden) ... 1011 /52 -> {list:Swap 22, Push 48(top), Swap 22, Push 43, Push 91(top)} user@p2.nyc> show route label 1011 extensive mpls.0: 48 destinations, 48 routes (48 active, 0 holddown, 0 hidden) ... 1011 /52 -> {list:Swap 22, Push 41(top), Swap 22, Push 39(top)}
This output shows both routers swap 1011 for 22, the LDP label advertised by p2.iad over the LDP session tunneled over RSVP-TE. The RSVP-TE LSP label is then pushed. There are two next hops of unequal weight, the latter representing the RSVP-TE bypass.
SR is then enabled on p2.iad:
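A sketch for p2.iad, assuming its node index follows the established pattern of matching the loopback’s last octet (8) and that the SRGB mirrors the rest of the network:

set protocols isis source-packet-routing srgb start-label 1000 index-range 128
set protocols isis source-packet-routing node-segment ipv4-index 8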
The New York P routers continue to avoid p1.iad. But rather than swapping 1011 for an LDP label, they now simply swap it for the next SR instruction. That happens to be idempotent, a continue action, resulting in 1011 being swapped for 1011.
Forwarding plane: both New York P routers now swap the inbound SR label for another SR label
user@p1.nyc> show route label 1011 extensive mpls.0: 43 destinations, 43 routes (43 active, 0 holddown, 0 hidden) ... 1011 /52 -> {list:Swap 1011, Push 48(top), Swap 1011, Push 43, Push 91(top)} user@p2.nyc> show route label 1011 extensive mpls.0: 46 destinations, 46 routes (46 active, 0 holddown, 0 hidden) ... 1011 /52 -> {list:Swap 1011, Push 41(top), Swap 1011, Push 39(top)}
Note that the pushed RSVP-TE LSP labels don’t change. It is the bottom label that no longer stitches between SR and LDP. Let’s verify that connectivity remains, and that SR is now correctly carried across the RSVP-TE core.
Connectivity verification: no change within New York as expected
user@ce1.nyc> traceroute wait 1 198.51.100.0 routing-instance svc-inet traceroute to 198.51.100.0 (198.51.100.0), 30 hops max, 40 byte packets 1 pe2.nyc-ge-0-0-10.1 (198.51.100.3) 3.562 ms 2.242 ms 2.280 ms 2 p1.nyc-ge-0-0-3.0 (192.0.2.9) 5.827 ms p2.nyc-ge-0-0-3.0 (192.0.2.11) 5.176 ms p1.nyc-ge-0-0-3.0 (192.0.2.9) 4.909 ms MPLS Label=1001 CoS=0 TTL=1 S=1 3 pe1.nyc-ge-0-0-1.0 (192.0.2.4) 5.346 ms 4.597 ms 6.729 ms 4 ce1.nyc-ge-0-0-1.1 (198.51.100.0) 6.242 ms 7.008 ms 6.032 ms
Connectivity verification: outside New York, SR labels are carried all the way to pe1.iad, not just to the L1/L2 border
user@ce1.nyc> traceroute wait 1 198.51.100.54 routing-instance svc-inet traceroute to 198.51.100.54 (198.51.100.54), 30 hops max, 40 byte packets 1 pe2.nyc-ge-0-0-10.1 (198.51.100.3) 2.707 ms 2.235 ms 1.859 ms 2 p2.nyc-ge-0-0-3.0 (192.0.2.11) 7.356 ms 6.876 ms 6.823 ms MPLS Label=1011 CoS=0 TTL=1 S=1 3 p2.phl-ge-0-0-2.0 (192.0.2.23) 79.374 ms 99.533 ms 100.238 ms MPLS Label=41 CoS=0 TTL=1 S=0 MPLS Label=1011 CoS=0 TTL=1 S=1 4 p2.iad-ge-0-0-5.0 (192.0.2.32) 8.332 ms 5.620 ms 6.191 ms MPLS Label=1011 CoS=0 TTL=1 S=1 5 pe1.iad-ge-0-0-2.0 (192.0.2.38) 6.701 ms 6.149 ms 5.633 ms 6 ce1.iad-ge-0-0-4.0 (198.51.100.54) 50.015 ms 80.652 ms 16.288 ms
This is geographically farther than we have gotten so far with SR. Until now, the complex interplay of stitching and LDP tunneling has allowed SR traffic to be carried across the RSVP-TE LSPs without additional explicit configuration.
As we have learned, making our intent explicit is both a noble goal and an operational best practice. Platform defaults age poorly, dating themselves. The less reliant an operator is on orthodox behavior (once desirable, later antiquated), the more robust the network. Being explicit is both self-documenting and a way of enforcing consistency across platforms, software releases, and the people responsible for them.
Let’s rectify this by configuring SR to be explicitly carried over RSVP-TE LSPs. This should cause no immediate change but, once LDP is removed, it will ensure we maintain connectivity. On both P routers in New York and Washington, let’s enable SR shortcuts:
Connectivity verification: no change in either traffic flow
user@ce1.nyc> traceroute wait 1 198.51.100.54 routing-instance svc-inet traceroute to 198.51.100.54 (198.51.100.54), 30 hops max, 40 byte packets 1 pe2.nyc-ge-0-0-10.1 (198.51.100.3) 200.634 ms 131.682 ms 348.936 ms 2 p2.nyc-ge-0-0-3.0 (192.0.2.11) 44.074 ms 61.865 ms 89.987 ms MPLS Label=1011 CoS=0 TTL=1 S=1 3 p2.ewr-ge-0-0-2.0 (192.0.2.17) 69.335 ms 84.690 ms 85.139 ms MPLS Label=74 CoS=0 TTL=1 S=0 MPLS Label=1011 CoS=0 TTL=1 S=1 4 p2.iad-ge-0-0-6.0 (192.0.2.28) 7.209 ms 8.750 ms 7.218 ms MPLS Label=1011 CoS=0 TTL=1 S=1 5 pe1.iad-ge-0-0-2.0 (192.0.2.38) 6.394 ms 6.114 ms 5.724 ms 6 ce1.iad-ge-0-0-4.0 (198.51.100.54) 7.612 ms 7.749 ms 7.764 ms user@ce1.nyc> traceroute wait 1 198.51.100.0 routing-instance svc-inet traceroute to 198.51.100.0 (198.51.100.0), 30 hops max, 40 byte packets 1 pe2.nyc-ge-0-0-10.1 (198.51.100.3) 2.493 ms 1.882 ms 2.250 ms 2 p2.nyc-ge-0-0-3.0 (192.0.2.11) 4.085 ms p1.nyc-ge-0-0-3.0 (192.0.2.9) 7.766 ms 5.468 ms MPLS Label=1001 CoS=0 TTL=1 S=1 3 pe1.nyc-ge-0-0-2.0 (192.0.2.6) 6.799 ms 4.908 ms pe1.nyc-ge-0-0-1.0 (192.0.2.4) 4.566 ms 4 ce1.nyc-ge-0-0-1.1 (198.51.100.0) 6.375 ms 5.636 ms 6.442 ms
All that remains now is to remove LDP (including SRMS and stitching in both directions on the New York P routers, as well as LDP tunneling on the RSVP-TE LSPs), enable SR on p1.iad, and cost it back into operation.
These steps have been previously discussed, dissected, and detailed. They don’t need to be belabored. When they are completed, we will have made good on the promise to replace LDP with SR. Throughout, we have seen how well the segment routing architecture fits in with the MPLS label distribution protocols.
Connectivity verification: complete reachability intra- and inter-region using SR
user@ce1.nyc> traceroute wait 1 198.51.100.0 routing-instance svc-inet traceroute to 198.51.100.0 (198.51.100.0), 30 hops max, 40 byte packets 1 pe2.nyc-ge-0-0-10.1 (198.51.100.3) 2.517 ms 2.329 ms 2.028 ms 2 p1.nyc-ge-0-0-3.0 (192.0.2.9) 5.231 ms p2.nyc-ge-0-0-3.0 (192.0.2.11) 39.534 ms 62.640 ms MPLS Label=1001 CoS=0 TTL=1 S=1 3 pe1.nyc-ge-0-0-2.0 (192.0.2.6) 7.699 ms pe1.nyc-ge-0-0-1.0 (192.0.2.4) 6.293 ms pe1.nyc-ge-0-0-2.0 (192.0.2.6) 4.781 ms 4 ce1.nyc-ge-0-0-1.1 (198.51.100.0) 7.089 ms 6.082 ms 6.172 ms user@ce1.nyc> traceroute wait 1 198.51.100.54 routing-instance svc-inet traceroute to 198.51.100.54 (198.51.100.54), 30 hops max, 40 byte packets 1 pe2.nyc-ge-0-0-10.1 (198.51.100.3) 102.157 ms 209.904 ms 134.732 ms 2 p2.nyc-ge-0-0-3.0 (192.0.2.11) 8.376 ms 7.249 ms 7.538 ms MPLS Label=1011 CoS=0 TTL=1 S=1 3 p2.ewr-ge-0-0-2.0 (192.0.2.17) 9.670 ms 7.303 ms 7.385 ms MPLS Label=74 CoS=0 TTL=1 S=0 MPLS Label=1011 CoS=0 TTL=1 S=1 4 p2.iad-ge-0-0-6.0 (192.0.2.28) 7.269 ms p1.iad-ge-0-0-6.0 (192.0.2.26) 8.827 ms p2.iad-ge-0-0-6.0 (192.0.2.28) 11.046 ms MPLS Label=1011 CoS=0 TTL=1 S=1 5 pe1.iad-ge-0-0-1.0 (192.0.2.36) 7.451 ms 6.142 ms pe1.iad-ge-0-0-2.0 (192.0.2.38) 6.463 ms 6 ce1.iad-ge-0-0-4.0 (198.51.100.54) 65.857 ms 68.233 ms 63.493 ms user@ce1.iad> traceroute routing-instance svc-inet wait 1 198.51.100.0 traceroute to 198.51.100.0 (198.51.100.0), 30 hops max, 40 byte packets 1 pe1.iad-ge-0-0-11.0 (198.51.100.55) 55.924 ms 64.637 ms 70.194 ms 2 p1.iad-ge-0-0-2.0 (192.0.2.37) 110.243 ms 61.692 ms 57.564 ms MPLS Label=1001 CoS=0 TTL=1 S=1 3 p1.phl-ge-0-0-3.0 (192.0.2.31) 60.380 ms p2.ewr-ge-0-0-3.0 (192.0.2.29) 35.849 ms 7.796 ms MPLS Label=75 CoS=0 TTL=1 S=0 MPLS Label=1001 CoS=0 TTL=1 S=1 4 p1.nyc-ge-0-0-5.0 (192.0.2.20) 9.929 ms 6.796 ms 7.237 ms MPLS Label=1001 CoS=0 TTL=1 S=1 5 pe1.nyc-ge-0-0-2.0 (192.0.2.6) 6.479 ms pe1.nyc-ge-0-0-1.0 (192.0.2.4) 6.491 ms 12.843 ms 6 ce1.nyc-ge-0-0-1.1 (198.51.100.0) 8.924 ms 7.647 ms 9.697 ms user@ce1.iad> traceroute routing-instance svc-inet wait 1 198.51.100.2 traceroute to 198.51.100.2 (198.51.100.2), 30 hops max, 40 byte packets 1 pe1.iad-ge-0-0-11.0 (198.51.100.55) 2.657 ms 2.126 ms 2.304 ms 2 p1.iad-ge-0-0-2.0 (192.0.2.37) 6.891 ms 6.503 ms p2.iad-ge-0-0-2.0 (192.0.2.39) 7.041 ms MPLS Label=1000 CoS=0 TTL=1 S=1 3 p2.ewr-ge-0-0-3.0 (192.0.2.29) 49.937 ms 60.684 ms p1.phl-ge-0-0-3.0 (192.0.2.31) 82.859 ms MPLS Label=73 CoS=0 TTL=1 S=0 MPLS Label=1000 CoS=0 TTL=1 S=1 4 p2.nyc-ge-0-0-4.0 (192.0.2.16) 7.331 ms p1.nyc-ge-0-0-5.0 (192.0.2.20) 85.649 ms p2.nyc-ge-0-0-4.0 (192.0.2.16) 7.287 ms MPLS Label=1000 CoS=0 TTL=1 S=1 5 pe2.nyc-ge-0-0-1.0 (192.0.2.8) 84.588 ms 43.379 ms 7.002 ms 6 ce1.nyc-ge-0-0-2.1 (198.51.100.2) 8.398 ms 7.586 ms 8.608 ms
With SR successfully transported over the traffic-engineered core, let’s make note of the fact that, once again, this could remain a perpetual mode of operation. The likelihood of multi-pathing is greater in-region, where SR has been deployed. Where multipathing is poorer, such as across regions, RSVP-TE is exercised. The two can coexist perennially.
For our purposes, though, nothing holds us back from enabling SR in the entire L2 subdomain. The configuration is similar to previous examples and does not need to be repeated. Since our objective of demonstrating interworking has been met, let’s also decommission the RSVP-TE LSPs between p1 and p2 in New York and Washington, so that all traffic within and between the two regions is transported entirely using SR.
Coexistence: SR-aware Bandwidth Reservation with RSVP-TE
RSVP-TE has a staggering feature set, accumulated over decades of real-world operational experience. One particularly well-known and widely used feature is auto-bandwidth: resizing and potentially rerouting LSPs to match the offered traffic load. Auto-bandwidth can appear almost magical: with next to no manual input, save the initial configuration parameters, an optimal path can be cleaved.
Broadly, auto-bandwidth relies on two values: the LSP reservation size, and the available bandwidth on a given interface in the network. The former is derived by the LSP ingress router using periodic rate measurement, smoothed over an interval. The latter is disseminated throughout the IGP area by all routers as bandwidth is consumed or released. The two form a feedback loop that drives the distributed auto-bandwidth calculation.
Implicit in this Swiss clockwork affair is the assumption that RSVP-TE is exclusively used to transport traffic. The available interface bandwidth is stored in the traffic engineering database (TED), whose utilization snapshot is populated only by RSVP-TE. If the interface also carries non-RSVP-TE traffic, one unappetizing stopgap is to cap RSVP-TE reservable bandwidth to a fraction of an interface’s capacity. This will prevent RSVP-TE LSPs from competing with other users of that bandwidth, such as non-MPLS traffic, at the expense of possible link under-utilization.
RFC 8426 (https://datatracker.ietf.org/doc/rfc8426/) details this dark bandwidth problem, as well as other approaches to tackle it. Partitioning bandwidth is no panacea: there is no guarantee that non-RSVP-TE users will naturally limit themselves to their apportioned capacity.
In our network, SR-MPLS traffic would be a consumer of bandwidth, alongside RSVP-TE. In order for the two to coexist, at least until a complete migration occurs, SR traffic utilization must be reflected in the TED’s maximum reservable interface bandwidth. Thus RSVP-TE LSPs are indirectly made aware that they may not have dedicated access to capacity, without needing to statically partition bandwidth.
While we have decommissioned the RSVP-TE LSPs encountered so far, there remain RSVP-TE LSPs that carry traffic between non-SR-capable devices outside Washington and New York. For brevity’s sake, let’s not focus too deeply on these routers or LSPs; indeed, they aren’t even represented in the diagram. Instead, let’s simply ensure the TEDs of all routers portray an accurate view of the network’s available bandwidth.
While there are no more RSVP-TE LSPs originating and terminating between New York and Washington, others persist between other regions. Some of the same interfaces used by RSVP-TE LSPs also switch SR traffic, now natively. So let’s configure p1 and p2 in New York and Washington to measure SR traffic and deduct it from the remaining interface bandwidth (available to RSVP-TE):
The collection-interval represents how frequently statistics are culled. The adjust interval is the duration of the window across which counter samples are collected. The average of those samples (in our case, three samples) represents the current SR traffic load per link. If the delta between the current and previously computed averages exceeds the adjust threshold, the IGP floods the revised maximum reservable bandwidth, which refreshes the TED.
The result may involve RSVP-TE LSPs being preempted or rerouted in response to the revised maximum reservable bandwidth for the TE link. With no SR traffic in flight, let’s see how much bandwidth is available for reservation on the link between p1.nyc and p1.phl.
Control plane: 3Kbps in use, 97Kbps available to RSVP-TE
user@p1.nyc> show rsvp interface ge-0/0/5.0 Active Subscr- Static Available Reserved Interface State resv iption BW BW BW ge-0/0/5.0 Up 9 100% 100kbps 97kbps 3kbps
Now let’s crank up traffic that is segment routed and observe the reduction in reported available bandwidth.
Control plane: ~67Kbps used by SR
user@p1.nyc> show auto-bandwidth traffic detail ge-0/0/5 Name: ge-0/0/5.0 Collection Interval: 10, Adjust Interval: 30, Adjust Threshold: 1% Adjust Subscription: 100% Pkt Recv: 154.098k, Byte Recv: 13.5604M, Query Count: 238, Average: 66.926kbps Last Base Bytes: 83.658k, Last Report Time: Tue Jan 15 17:50:44 Last Query Time: Tue Jan 15 17:50:44 Last Resp Time: Tue Jan 15 17:50:44 Byte Bucket(Chronological order, first entry is latest): 96.536k 109.032k 45.408k Packet Bucket(Chronological order, first entry is latest): 1.097k 1.239k 516
Control plane: 31Kbps available to RSVP-TE, reduced by ~67Kbps
user@p1.nyc> show rsvp interface ge-0/0/5.0 Active Subscr- Static Available Reserved Interface State resv iption BW BW BW ge-0/0/5.0 Up 9 100% 100kbps 31kbps 3kbps
Control plane: LSDB reflects updated bandwidth
user@p1.nyc> show isis database p1.nyc extensive level 2 IS-IS level 2 link-state database: ... IS extended neighbor: p1.phl.00, Metric: default 10 SubTLV len: 81 IP address: 192.0.2.20 Neighbor’s IP address: 192.0.2.21 Local interface index: 337, Remote interface index: 334 Current reservable bandwidth: Priority 0 : 31kbps Priority 1 : 31kbps Priority 2 : 31kbps Priority 3 : 31kbps Priority 4 : 31kbps Priority 5 : 31kbps Priority 6 : 31kbps Priority 7 : 31kbps Maximum reservable bandwidth: 34kbps Maximum bandwidth: 100kbps ...
This reflection works in reverse, too. Let’s staunch the SR traffic and observe how the available bandwidth once again increases.
Control plane: SR utilization drops to ~40Kbps
user@p1.nyc> show auto-bandwidth traffic detail ge-0/0/5 Name: ge-0/0/5.0 Collection Interval: 10, Adjust Interval: 30, Adjust Threshold: 1% Adjust Subscription: 100% Pkt Recv: 235.006k, Byte Recv: 20.6803M, Query Count: 317, Average: 39.846kbps Last Base Bytes: 54.325k, Last Report Time: Tue Jan 15 18:03:44 Last Query Time: Tue Jan 15 18:03:54 Last Resp Time: Tue Jan 15 18:03:54 Byte Bucket(Chronological order, first entry is latest): 0 63.184k 86.24k Packet Bucket(Chronological order, first entry is latest): 0 718 980
Control plane: RSVP-TE now indicates 54Kbps available
user@p1.nyc> show rsvp interface ge-0/0/5.0 Active Subscr- Static Available Reserved Interface State resv iption BW BW BW ge-0/0/5.0 Up 9 100% 100kbps 54kbps 3kbps
Control plane: The LSDB values are flooded to neighbors & match RSVP-TE’s reported values
user@p1.nyc> show isis database p1.nyc extensive level 2 IS-IS level 2 link-state database: ... IS extended neighbor: p1.phl.00, Metric: default 10 SubTLV len: 81 IP address: 192.0.2.20 Neighbor’s IP address: 192.0.2.21 Local interface index: 337, Remote interface index: 334 Current reservable bandwidth: Priority 0 : 54kbps Priority 1 : 54kbps Priority 2 : 54kbps Priority 3 : 54kbps Priority 4 : 54kbps Priority 5 : 54kbps Priority 6 : 54kbps Priority 7 : 54kbps Maximum reservable bandwidth: 57kbps Maximum bandwidth: 100kbps ...
This graceful coexistence of SR and RSVP-TE anticipates operator needs. For those with demanding deployments, RSVP-TE is unlikely to be quickly displaced. Ensuring the available bandwidth reflects more than one consumer allows accurate reservations.
SR and IPv6
Whatever your opinion on IPv6, you’re probably right. If you fail to see a strong driver, one likely doesn’t exist for the business you happen to be in; if you worship at the altar of Happy Eyeballs, it’s no less than an imperative and technological differentiator.
Happy Eyeballs is a set of algorithms defined by the IETF. It aims for a superior user experience when using dual-stack hosts. IPv6 is preferred, where available. Check https://tools.ietf.org/html/rfc8305 for details.
IPv6 support for the existing MPLS label distribution protocols varies: LDPv6 extends the base protocol; RSVP-TE, to date, doesn’t offer a generally available IPv6 implementation; BGP remains address-family agnostic, but of course it isn’t an IGP.
The good news is that segment routing treats IPv6 reachability as a first-class citizen. SR won’t convince you to deploy IPv6 anew, but then, it also won’t become a barrier to entry. Node, adjacency, and anycast SIDs (of which we’ve used the first two so far) come in both IPv4 and IPv6 flavors.
Segment routing support for IPv6 is most commonly understood to mean associating SIDs with IPv6 prefixes. The data plane remains MPLS-switched. In contrast, SRv6 is a radical reimagining of native IPv6 forwarding. It does not make use of MPLS at all. SRv6 may be a topic for a separate book.
Anyway, the configuration is nearly identical, so let’s dive right in and create an IPv6 MPLS-encapsulated underlay. The SID allocation is detailed in Table 2 to avoid repeating the mostly identical configuration, and Figure 4 illustrates the setup. The syntax is shown below for pe1.nyc.
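What follows is a sketch rather than an exact listing: the IPv6 index of 61 matches Table 2, and the IPv6 loopback address and IS-IS IPv6 support are assumed to already be in place:

set protocols isis source-packet-routing node-segment ipv6-index 61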
Table 2: IPv6 Node SID Allocation
Router | IPv6 lo0 address | IPv6 node SID label (index + SRGB base) |
---|---|---|
pe1.nyc | 2001:db8::128:49:106:1 | 1061 (61 + 1000) |
pe2.nyc | 2001:db8::128:49:106:0 | 1060 (60 + 1000) |
p1.nyc | 2001:db8::128:49:106:3 | 1063 (63 + 1000) |
p2.nyc | 2001:db8::128:49:106:2 | 1062 (62 + 1000) |
pe1.iad | 2001:db8::128:49:106:11 | 1071 (71 + 1000) |
pe2.iad | 2001:db8::128:49:106:10 | 1070 (70 + 1000) |
pe3.iad | 2001:db8::128:49:106:13 | 1073 (73 + 1000) |
p1.iad | 2001:db8::128:49:106:9 | 1069 (69 + 1000) |
p2.iad | 2001:db8::128:49:106:8 | 1068 (68 + 1000) |
As soon as this configuration becomes effective, each router additionally starts to advertise an IPv6 node index (alongside the IPv4 node index), as well as IPv6 adjacency SIDs.
Control plane: Additional IPv6 node index and adjacency SIDs
user@pe1.nyc> show isis database pe1.nyc extensive IS-IS level 1 link-state database: pe1.nyc.00-00 Sequence: 0x2a7, Checksum: 0x6dc8, Lifetime: 803 secs IPV4 Index: 1, IPV6 Index: 61 Node Segment Blocks Advertised: Start Index : 0, Size : 128, Label-Range: [ 1000, 1127 ] IS neighbor: p2.nyc.00 Metric: 10 Two-way fragment: p2.nyc.00-00, Two-way first fragment: p2.nyc.00-00 P2P IPv4 Adj-SID: 17, Weight: 0, Flags: --VL-- P2P IPv6 Adj-SID: 36, Weight: 0, Flags: F-VL-- IS neighbor: p1.nyc.00 Metric: 10 Two-way fragment: p1.nyc.00-00, Two-way first fragment: p1.nyc.00-00 P2P IPv4 Adj-SID: 25, Weight: 0, Flags: --VL-- P2P IPv6 Adj-SID: 35, Weight: 0, Flags: F-VL-- IP prefix: 128.49.106.1/32 Metric: 0 Internal Up V6 prefix: 2001:db8::128:49:106:1/128 Metric: 0 Internal Up
The ‘F’ flag in the adjacency SID indicates an IPv6-capable adjacency. Unsurprisingly, you’ll see new entries in the recently populated inet6.3 routing table.
Control plane: FECs in inet6.3 use native addresses, not mapped IPv4 addresses
user@pe1.nyc> show route table inet6.3 inet6.3: 8 destinations, 8 routes (8 active, 0 holddown, 0 hidden) + = Active Route, - = Last Active, * = Both 2001:db8::128:49:106:0/128 *[L-ISIS/8] 00:06:40, metric 20 > to fe80::5668:a3ff:fe1e:4af5 via ge-0/0/1.0, Push 1060 to fe80::5668:a3ff:fe1e:4ab6 via ge-0/0/2.0, Push 1060 2001:db8::128:49:106:2/128 *[L-ISIS/8] 00:06:40, metric 10 > to fe80::5668:a3ff:fe1e:4ab6 via ge-0/0/2.0 2001:db8::128:49:106:3/128 *[L-ISIS/8] 00:06:40, metric 10 > to fe80::5668:a3ff:fe1e:4af5 via ge-0/0/1.0 2001:db8::128:49:106:8/128 *[L-ISIS/8] 00:06:40, metric 30 > to fe80::5668:a3ff:fe1e:4ab6 via ge-0/0/2.0, Push 1068 2001:db8::128:49:106:9/128 *[L-ISIS/8] 00:06:40, metric 30 > to fe80::5668:a3ff:fe1e:4af5 via ge-0/0/1.0, Push 1069 2001:db8::128:49:106:10/128 *[L-ISIS/8] 00:06:40, metric 40 > to fe80::5668:a3ff:fe1e:4af5 via ge-0/0/1.0, Push 1070 to fe80::5668:a3ff:fe1e:4ab6 via ge-0/0/2.0, Push 1070 2001:db8::128:49:106:11/128 *[L-ISIS/8] 00:06:40, metric 40 > to fe80::5668:a3ff:fe1e:4af5 via ge-0/0/1.0, Push 1071 to fe80::5668:a3ff:fe1e:4ab6 via ge-0/0/2.0, Push 1071 2001:db8::128:49:106:13/128 *[L-ISIS/8] 00:06:40, metric 40 > to fe80::5668:a3ff:fe1e:4af5 via ge-0/0/1.0, Push 1073 to fe80::5668:a3ff:fe1e:4ab6 via ge-0/0/2.0, Push 1073
And let’s verify connectivity to these new service routes, both intra-region and inter-region. You can see next that IPv6 service prefixes are being carried by IPv6 transport. 6PE’s mapped IPv4 addresses and use of IPv6 explicit null can be laid to rest.
Connectivity verification: Intra-region using the new IPv6 node SIDs
user@ce1.nyc> traceroute 2001:db8::198:51:100:2 traceroute6 to 2001:db8::198:51:100:2 (2001:db8::198:51:100:2) from 2001:db8::198:51:100:0, 64 hops max, 12 byte packets 1 2001:db8::198:51:100:1 (2001:db8::198:51:100:1) 33.108 ms 11.671 ms 3.032 ms 2 p1.nyc-lo0.0 (2001:db8::128:49:106:3) 60.297 ms 73.248 ms p2.nyc-lo0.0 (2001:db8::128:49:106:2) 6.085 ms MPLS Label=1060 CoS=0 TTL=1 S=1 3 pe2.nyc-lo0.0 (2001:db8::128:49:106:0) 8.127 ms 5.868 ms 6.411 ms 4 2001:db8::198:51:100:2 (2001:db8::198:51:100:2) 7.223 ms 7.043 ms 6.865 ms
Connectivity verification: inter-region using the new IPv6 node SIDs
user@ce1.iad> traceroute 2001:db8::198:51:100:0 traceroute6 to 2001:db8::198:51:100:0 (2001:db8::198:51:100:0) from 2001:db8::198:51:100:54, 64 hops max, 12 byte packets 1 2001:db8::198:51:100:55 (2001:db8::198:51:100:55) 3.297 ms 2.640 ms 2.752 ms 2 p2.iad-lo0.0 (2001:db8::128:49:106:8) 8.982 ms p1.iad-lo0.0 (2001:db8::128:49:106:9) 8.719 ms 8.548 ms MPLS Label=1061 CoS=0 TTL=1 S=1 3 p1.ewr-lo0.0 (2001:db8::128:49:106:5) 65.592 ms p2.phl-lo0.0 (2001:db8::128:49:106:6) 8.300 ms 7.831 ms MPLS Label=1061 CoS=0 TTL=1 S=1 4 p1.nyc-lo0.0 (2001:db8::128:49:106:3) 8.081 ms p2.nyc-lo0.0 (2001:db8::128:49:106:2) 8.512 ms 8.452 ms MPLS Label=1061 CoS=0 TTL=1 S=1 5 pe1.nyc-lo0.0 (2001:db8::128:49:106:1) 8.251 ms 8.593 ms 8.803 ms 6 2001:db8::198:51:100:0 (2001:db8::198:51:100:0) 9.266 ms 8.870 ms 8.603 ms
SR and Multicast
A common myth is that multicast needs special treatment in a segment routed network. Like most myths, this stems from a lack of understanding.
Multicast forwarding is orthogonal to unicast. Existing approaches (IPv4 multicast, IPv6 multicast, MPLS multicast) remain usable even as SR is enabled. For MPLS multicast, this may indeed mean that LDP and RSVP-TE, two protocols that predate SR, are left intact for multicast forwarding.
mLDP offers both point-to-multipoint (P2MP) and multipoint-to-multipoint (MP2MP) replication trees, while RSVP-TE offers point-to-point (P2P) ingress replication as well as P2MP network replication. Both support delivery of non-VPN and multicast VPN traffic.
SR-native approaches such as Spray and Tree-SID aspire to offer ingress and network replication, respectively. BIER is technically not segment routing but follows a similar ideal of eliminating state from transit nodes. Until these technologies mature, operators can rely on existing multicast mechanisms.