Optimizability
The previous chapters have looked at various use cases for introducing segment routing. This chapter looks specifically at the traffic engineering (TE) aspects of SR. In particular it describes:
The TE process model
The role of explicit path definition
Routing constraints and several salient properties of SR SID types
Distributed and centralized path computation
And finally, an end-to-end SR-TE solution for the sample network leveraging a centralized controller.
The TE Process Model
In the TE process model provided in Figure 1, the operator, or a suitable automaton, acts as a “controller” in an adaptive feedback control system. This system includes:
A set of interconnected network elements (IP/MPLS network)
A network state/topology monitoring element
A network performance monitoring element
A network configuration management element
The operator/automaton formulates a control policy, observes the state of the network through the monitoring system, characterizes the traffic, and applies control actions to drive the network to an optimal state. This can be done reactively, based on the current state of the network, or proactively, based on a predicted future state.
The adaptive feedback loop may be implemented in each node in the network (distributed TE) or in a centralized manner, for example using a Path Computation Element (PCE). Furthermore, various hybrid models are possible.
Explicit Routing: A Tool for TE
Explicit routing may be desired to optimize network resources or to provide very strict service guarantees, but it is not by itself a TE solution. Rather, the output of the TE process model, the important part of a traffic engineering solution, typically requires the programming of an explicit route to realize the computed result. It is therefore desirable to be able to define a strict path across a network for one or more LSPs. Both RSVP-TE and SR provide a means to explicitly define such paths (strict hops, loose hops, and/or abstract/anycast hops) across a network.
Explicit Routing with RSVP
Network operators request, typically through configuration, LSPs that meet specific constraints. For example, a network operator could request an LSP that originates at Node R1, terminates at Node R6, reserves 100 megabits per second, and traverses blue interfaces only. A path computation module, located on a central controller (such as the PCE) or on the ingress router, computes a path that satisfies all of the constraints. Figure 3 illustrates the resulting RSVP Path and Resv messaging used to set up the LSP, along with the resulting MPLS forwarding table label operations.
The following example illustrates the required configuration. It's worth noting that no routing constraints are defined for the LSP; the only configuration shown is the definition of the explicit path. In many (or most) RSVP-TE deployments the paths are not explicitly defined. Instead, a routing constraint such as bandwidth or link affinity is defined, and the ingress node or an external CSPF computes the explicit path dynamically to meet that constraint.
R1 configuration
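A minimal sketch of what that configuration might look like, assuming illustrative LSP and path names and using the sample topology's link addressing for the lower R1-R3-R4-R5-R6 route:

    protocols {
        mpls {
            label-switched-path r1-to-r6 {
                to 1.1.1.6;            # R6 loopback
                primary via-lower;     # bind the explicit path below
            }
            path via-lower {
                # each strict hop must be directly connected to the previous hop
                10.1.13.2 strict;      # R3
                10.1.34.2 strict;      # R4
                10.1.45.2 strict;      # R5
                10.1.56.2 strict;      # R6
            }
        }
    }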
Explicit Path Definition with SR
Much like RSVP-TE explicit LSPs, a network operator requests, typically through configuration, LSPs that meet specific constraints. The main difference is not the specific configuration syntax, which is quite similar, but how the 'routing constraint' is described using the same CLI constructs: to provide complex TE constraints, an ingress SR node may need to rely on an external controller or PCE.
Using the same example as in the RSVP-TE case, a network operator could request an LSP that originates at Node R1, terminates at Node R6, reserves 100 megabits per second, and traverses blue interfaces only, but this would require an external controller/PCE to compute the path. An external PCE is required only because of the specified bandwidth constraint, as SR does not signal its reservations. If bandwidth were not required, a distributed path computation performed by R1 would suffice. Figure 4 illustrates the resulting SR path and MPLS forwarding table label operations. Note that there is a fairly significant difference in the label forwarding operations for an SR-TE LSP compared to the SPF SR LSP and the previous RSVP-TE LSPs.
SR adjacency SIDs are dynamically allocated by default. Because the next configuration example uses traditional CLI techniques to describe the explicit path, where each hop is specified by its label/SID, it is recommended that labels be pre-planned and statically assigned so that each link has a unique label/SID, much like how IP addressing is handled. It's worth noting that an SR-TE path can also be described, via the CLI, using traditional IP addresses as the specified hops, allowing the ingress router to resolve the SID and label stack.
R1: defining static adjacency SIDs
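A plausible sketch, assuming an operator-chosen per-link SID scheme that mirrors the link addressing (the label values and interface name are illustrative, not from the original example):

    protocols {
        mpls {
            # carve out a static label range for manually assigned SIDs
            label-range static-label-range 1000000 1048575;
        }
        isis {
            interface ge-0/0/1.0 {
                level 2 ipv4-adjacency-segment {
                    unprotected {
                        label 1000013;   # R1-R3 link, mirroring 10.1.13.x
                    }
                }
            }
        }
    }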
R1: defining explicit SR tunnels
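A sketch of the tunnel definition, reusing the illustrative SID values above, with one statically assigned adjacency SID per link along the R1-R3-R4-R5-R6 route:

    protocols {
        source-packet-routing {
            segment-list r1-r6-via-lower {
                # hops are listed ingress to egress
                hop-r3 label 1000013;
                hop-r4 label 1000034;
                hop-r5 label 1000045;
                hop-r6 label 1000056;
            }
            source-routing-path sr-te-lsp-to-r6 {
                to 1.1.1.6;
                primary {
                    r1-r6-via-lower;
                }
            }
        }
    }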
As mentioned above, you may use the auto-translate option and describe the path as a series of IP hops, like an RSVP-TE path, and Junos will translate them to SIDs. This has the added advantage of removing the need to statically configure adjacency SIDs, and it ensures that a controller that may have computed the path is not required to change it in the event of a link going down.
Loose Hops, Prefix-SIDs, and Anycast-SIDs
Segment Routing SIDs
A segment identifier (SID) identifies each segment. Network operators allocate SIDs using procedures similar to those used to allocate private IP (i.e., RFC 1918) addresses. With SR-MPLS, every SID maps to an MPLS label. SR-capable routers advertise the SIDs via the IGP, which floods this data, along with the TE link attributes described above, throughout the IGP domain. Therefore, each node within the IGP domain maintains an identical copy of the link-state database (LSDB) and traffic engineering database (TED). The following segment types are the most common and will be used in this chapter to describe several relevant use cases.
Adjacency: Adjacency segments represent an IGP adjacency between two routers. Junos allocates adjacency SIDs dynamically, but they can also be statically configured, which, as previously mentioned, may be useful in some scenarios.
Prefix: Prefix segments represent the IGP least-cost path between any router and a specified prefix. Prefix segments may contain one or more router hops. A node SID is a type of prefix SID.
Anycast: Anycast segments are like prefix segments in that they represent the IGP least-cost path between any router and a specified prefix. However, the specified prefix can be advertised from multiple points in the network. Note that in the example, the CLI shows only a single node announcing the anycast-SID. In an actual deployment all nodes that are part of the anycast group would advertise the same anycast-SID.
Binding: Binding SIDs represent tunnels in the SR domain. The tunnel can be another SR path, an LDP-signaled LSP, an RSVP-TE signaled LSP, or any other encapsulation.
The Critical Role of Maximum SID Depth
Maximum SID depth (MSD) is a generic concept defining the number of SIDs that a given node's LSR hardware and software are capable of imposing. It is defined in various IETF OSPF/IS-IS/BGP-LS/PCEP drafts. When SR paths are computed, it is critical that the computing entity learn the MSD that can be imposed at each node or link along a given SR path. This ensures that the SID stack depth of a computed path does not exceed the number of SIDs the node is capable of imposing.
Setting, Reporting, and Advertising MSD
When using PCEP to communicate with a PCE for LSP state reporting, control, and provisioning, the MSD is reported. The following CLI can be used to increase the reported MSD value to 5:
While this reports the MSD to a controller via a control plane protocol, Junos also requires that ingress interfaces that will impose labels have their label imposition limit increased from the default value of 3:
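For example (assuming ge-0/0/1 is one of the imposing interfaces):

    interfaces {
        ge-0/0/1 {
            unit 0 {
                family mpls {
                    maximum-labels 5;   # raise label imposition from the default of 3
                }
            }
        }
    }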
SID Depth Reduction
Because MSD is such a critical concern in many SR scenarios, the idea that the end-to-end path need not be completely specified has become very popular. In the previous examples, the CLI specified the individual hops of the SR-TE LSP (and the RSVP-TE LSP); such hops are referred to as strict. A strict hop means that the specified LSR must be directly connected to the previous hop. A loose hop, on the other hand, means that the path must pass through the specified LSR, but the LSR does not have to be directly connected to the previous hop; any valid route between the two can be used. Let's look at an example using the topology illustrated in Figure 5.
Let's say you want an LSP to go from R1 to R6 via the lower route of R3, R4, and R5. To achieve this you merely need to specify R3 as the first hop in the path and then let normal routing take over from there, since the path from R3 to R6 via R5 has an IGP metric of 30 while the path back through R1 has an IGP metric of 40. The resulting label stack for the SR-TE LSP is only two labels deep (node-SID for R3 and node-SID for R6) instead of four as previously seen:
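A sketch of such a two-label segment list, assuming node-SID label values of 103 for R3 and 100 for R6 (100 is consistent with the CSPF output shown later; 103 is illustrative):

    protocols {
        source-packet-routing {
            segment-list via-r3-loose {
                hop-r3 label 103;   # node SID for R3; traffic is IGP-routed to R3
                hop-r6 label 100;   # node SID for R6; IGP routing takes over at R3
            }
            source-routing-path sr-te-lsp-to-r6 {
                to 1.1.1.6;
                primary {
                    via-r3-loose;
                }
            }
        }
    }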
You can equate this, and the previous examples, to an MPLS version of a static route. Like static routes, manually configured paths are useful when you want explicit control, but also like static routes, they pose an administrative burden when you need to establish many paths and/or want the network to react dynamically to events such as a link coming up or going down.
Special care must be taken when explicitly configuring loose hops for a path, whether RSVP or SR, because link failures can result in unexpected forwarding behavior. For example, let's say that the link between R1 and R3 fails, as illustrated in Figure 6. Since the SR-TE LSP is a static instruction set and the node SID specified in the path is still reachable on the network, the resulting LSP path will be suboptimal, yet valid.
Binding SIDs represent another option for reducing the label stack depth when configuring explicit paths, as shown in Figure 7. Using the same simple example topology, an SR-TE LSP could be created from R3 to R5 and advertised with a binding SID, such that the ingress LSP from R1 to R6 references the binding SID in its path and the resulting label stack is only two labels deep.
R3 to R5 SR-TE LSP with binding-SID = 2000
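A sketch of the R3 configuration, assuming illustrative adjacency-SID labels for the R3-R4 and R4-R5 links; the binding SID value must come from the static label range:

    protocols {
        source-packet-routing {
            segment-list r3-r5-lower {
                hop-r4 label 1000034;   # R3-R4 adjacency SID (illustrative)
                hop-r5 label 1000045;   # R4-R5 adjacency SID (illustrative)
            }
            source-routing-path bsid-r3-to-r5 {
                to 1.1.1.5;
                binding-sid 2000;       # advertise this LSP as binding SID 2000
                primary {
                    r3-r5-lower;
                }
            }
        }
    }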
And then the SR-TE LSP from R1 to R6 would look like:
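Something like the following sketch, where the first label carries traffic to R3 and the binding SID then expands to the R3-R5 segment list (one plausible encoding; label values again illustrative):

    protocols {
        source-packet-routing {
            segment-list via-bsid {
                hop-r3 label 103;      # node SID for R3
                hop-bsid label 2000;   # binding SID advertised by R3; beyond R5
                                       # the packet follows the IGP path to R6
            }
            source-routing-path sr-te-lsp-to-r6 {
                to 1.1.1.6;
                primary {
                    via-bsid;
                }
            }
        }
    }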
Anycast SIDs represent another SID type that can be used to create some interesting forwarding behaviors in SR networks. Since the multiple nodes comprising the anycast group announce the same prefix SID, an ingress or transit node forwards toward the closest node (from an IGP metric perspective) announcing that prefix SID. This enables an operator to introduce load balancing and high availability scenarios that are somewhat unique to SR networks. Again, using our simple network topology shown in Figure 8, let's assume that R4 and R5 announce the same anycast SID, 405. You can now create an ingress LSP from R1 to R6 that results in equal-cost load balancing between the R1-R2-R5-R6 path and the R1-R3-R4-R5-R6 path, as shown.
Again, while not the primary goal of using anycast SIDs, the label stack has been reduced by specifying the anycast SID as a hop in our SR-TE LSP's path:
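A sketch, using the anycast SID 405 from Figure 8 and an assumed node-SID label of 100 for R6:

    protocols {
        source-packet-routing {
            segment-list via-anycast {
                hop-anycast label 405;   # anycast SID shared by R4 and R5; traffic
                                         # load-balances toward the nearest member
                hop-r6 label 100;        # node SID for R6
            }
            source-routing-path sr-te-lsp-to-r6 {
                to 1.1.1.6;
                primary {
                    via-anycast;
                }
            }
        }
    }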
Another way for the ingress LSR to determine the path is to dynamically calculate it. Just as OSPF and IS-IS use a Shortest Path First (SPF) algorithm to calculate a route, the ingress LSR can use a modification of the SPF algorithm called Constrained Shortest Path First (CSPF) to calculate a path. Let’s explore dynamic path calculation next.
Dynamic Path Calculation and Routing Constraints
Thus far we have looked at various ways of defining a non-shortest, or explicit, path using the various SID types that SR offers, along with a few special considerations. But defining and managing tens, hundreds, thousands, or tens of thousands of static SR paths does not scale. We must therefore look back to the TE process model and explore how to define 'simple' routing constraints that describe how SR-TE LSP paths should be computed. In other words, the explicit paths are not TE themselves but rather a means to enable TE. First, let's look at information distribution, or state/topology monitoring.
Link and Node Attributes and Routing Constraints
The most important requirement for TE is the dissemination of link and node characteristics. Just as SIDs are reliably flooded throughout a routing domain, link and node characteristics, along with TE-oriented resource availability, are flooded throughout a TE domain using extensions to the IGPs. These extensions enable the link-state routing protocols to efficiently propagate resource availability information in their routing updates: the protocols flood updates not only upon link-state or metric changes, but also upon changes in bandwidth availability from a TE perspective. The routers in the network flood these resource attributes to make them available to head-end routers for use in TE tunnel LSP path computation (dynamic tunnels).
Link-state announcements carry information that describes a given router's neighbors, attached networks, network resources, and other relevant data pertaining to actual resource availability that might later be required to perform a constraint-based SPF calculation. OSPF and IS-IS have been extended to propagate this resource availability information and to support dynamic LSP path selection in an MPLS TE environment. These link and node attributes take the form of link colors, shared risk link group (SRLG) associations, available bandwidth, and metric types (TE or IGP), to name a few, and they are available in the TED for use by the computing entity. Again, using our simple example topology shown in Figure 9, the following has been added to R5:
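A sketch of the R5 additions, assuming illustrative interface names for the R5-R2 and R5-R4 links (red's admin-group bit 23 matches the 0x800000 color seen in the TED output below; blue's bit is assumed):

    protocols {
        mpls {
            admin-groups {
                blue 22;
                red 23;
            }
            interface ge-0/0/2.0 {      # R5-R2 link (illustrative name)
                admin-group blue;
                srlg common-254;
            }
            interface ge-0/0/4.0 {      # R5-R4 link (illustrative name)
                admin-group red;
                srlg common-254;
            }
        }
        isis {
            interface ge-0/0/4.0 {
                level 2 te-metric 100;  # TE metric for the R5-R4 link
            }
        }
    }
    routing-options {
        srlg {
            common-254 {
                srlg-value 254;
            }
        }
    }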
This results in a TE topology that looks like Figure 9, where the R5-to-R2 link has been colored blue and added to the SRLG named common-254, and the link from R5 to R4 has been colored red, given a te-metric of 100, and also added to the same SRLG.
Let's observe the contents of the TED for R5's link to R4 and see how the link TE information has been flooded to R1, using IS-IS TE extensions, so that it can be used for path computation, specifically for constraint inclusion or exclusion:
user@R1> show ted database R5.00 extensive
  To: R4.00(1.1.1.4), Local: 10.1.45.2, Remote: 10.1.45.1
    Local interface index: 335, Remote interface index: 334
    Color: 0x800000 red
    Metric: 100
    IGP metric: 10
    Static BW: 1000Mbps
    Reservable BW: 1000Mbps
    Available BW [priority] bps:
        [0] 1000Mbps    [1] 1000Mbps    [2] 1000Mbps    [3] 1000Mbps
        [4] 1000Mbps    [5] 1000Mbps    [6] 1000Mbps    [7] 1000Mbps
    Interface Switching Capability Descriptor(1):
      Switching type: Packet
      Encoding type: Packet
      Maximum LSP BW [priority] bps:
        [0] 1000Mbps    [1] 1000Mbps    [2] 1000Mbps    [3] 1000Mbps
        [4] 1000Mbps    [5] 1000Mbps    [6] 1000Mbps    [7] 1000Mbps
    SRLGs: common-254
    P2P Adjacency-SID: IPV4, SID: 299840, Flags: 0x30, Weight: 0
<output truncated>
Distributed SR Constraint-based SPF
In the normal SPF calculation process, a router places itself at the head of the tree and calculates the shortest path to each destination, taking only the least-metric (cost) route into account. A key point is that this calculation gives no consideration to the bandwidth of the links along the paths. If the attributes required for a given path include parameters beyond simply the IGP cost or metric, such as link color, the topology can be constrained to eliminate the links that do not satisfy those requirements, so that the SPF algorithm returns a path meeting both the link cost and the link inclusion/exclusion requirements.
With CSPF, you use more than the link cost to identify the probable paths that can be used for TE LSP paths. The decision of which path is chosen to set up a TE LSP path is made at the computing entity, after ruling out all links that do not meet certain criteria, such as link colors, in addition to the te-cost of the link. The result of the CSPF calculation is an ordered set of SIDs that map to the next-hop addresses of the routers that form the TE LSP. Therefore, multiple TE LSPs can be instantiated by using CSPF to identify probable links in the network that meet the criteria.
Constraint-based SPF can use either administrative weights or TE metrics during the constraint-based computation. In the event of a tie, the path with the highest minimum bandwidth takes precedence, followed by the path with the least number of hops. If all else is equal, CSPF picks one of the remaining paths at random as the preferred TE LSP path.
As previously mentioned, SR paths have a few salient attributes, mainly MSD and ECMP behavior, that call for slightly different CSPF results than those of traditional RSVP-TE paths. As a result, the Junos distributed CSPF (we'll talk about external, centralized CSPFs in a moment) has been enhanced not only to provide an ordered set of adjacency SIDs for a path, but also to minimize the label stack, or at least fit within the ingress router's MSD, and to leverage any available ECMP along the resulting set of candidate paths by offering node SIDs within the segment list.
The SR-TE candidate paths are locally computed such that they satisfy the configured routing constraints. When label stack compression is disabled, the multipath CSPF computation results in an ordered set of adjacency SIDs. When label stack compression is enabled, the result is a set of compressed label stacks (composed of adjacency SIDs and node SIDs) that provide IP-like ECMP forwarding behavior wherever possible.
For all computation results, an event-driven approach provides updated results that are consistent with the current state of the network in a timely manner. However, care must be taken that the computations do not become overwhelmed during periods with large numbers of network events. The algorithm therefore has the following properties:
Very fast reaction for a single event (e.g., link failure)
Fast-paced reaction to multiple IGP events that are temporally close, as long as the computation load and the ability to consume results remain acceptable
Delayed reaction when the computation load or the ability to consume results becomes problematic
Furthermore, reaction to certain network events varies depending on whether label stack compression is enabled or not. For the following events, there is no immediate recomputation of SID-lists of candidate paths when compression is off:
Change in TE-metric of links
Link Down events where link is not traversed by candidate path
Link Up event
When label stack compression is on, the above events are acted upon to determine whether the computation results are impacted.
R1: CSPF
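A sketch of the distributed CSPF configuration, reusing the admin-group names from the R5 example; the exact hierarchy for attaching a compute profile to a primary path may vary by Junos release:

    protocols {
        source-packet-routing {
            compute-profile red-lsp {
                admin-group include-all red;    # only links colored red
            }
            compute-profile blue-lsp {
                admin-group include-all blue;   # only links colored blue
            }
            source-routing-path sr-te-lsp-to-r6 {
                to 1.1.1.6;
                primary {
                    1st-seg-to-r6 {
                        compute red-lsp;
                    }
                    2nd-seg-to-r6 {
                        compute blue-lsp;
                    }
                }
            }
        }
    }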
This configuration results in the SR-TE LSPs being created as shown in Figure 11.
Verifying the SR-TE LSPs computed by the Distributed SR CSPF
user@R1# run show spring-traffic-engineering lsp detail
Name: sr-te-lsp-to-r6
  Tunnel-source: Static configuration
  To: 1.1.1.6
  State: Up
    Path: 1st-seg-to-r6
      Outgoing interface: NA
      Auto-translate status: Disabled
      Auto-translate result: N/A
      Compute Status:Enabled , Compute Result:success , Compute-Profile Name:red-lsp
      Total number of computed paths: 1
      Computed-path-index: 1
        BFD status: N/A
        BFD name: N/A
        computed segments count: 2
          computed segment : 1 (computed-node-segment):
            node segment label: 104
            router-id: 1.1.1.4
          computed segment : 2 (computed-node-segment):
            node segment label: 100
            router-id: 1.1.1.6
    Path: 2nd-seg-to-r6
      Outgoing interface: NA
      Auto-translate status: Disabled
      Auto-translate result: N/A
      Compute Status:Enabled , Compute Result:success , Compute-Profile Name:blue-lsp
      Total number of computed paths: 1
      Computed-path-index: 1
        BFD status: N/A
        BFD name: N/A
        computed segments count: 1
          computed segment : 1 (computed-node-segment):
            node segment label: 100
            router-id: 1.1.1.6
As you can see in the output, Junos computes a path consisting of only the node SID for R4 to meet the constraints for the red path, and it determines that the blue path is simply the shortest path and thus uses only the node SID for R6, instead of computing a fully qualified path of adjacency SIDs. In contrast, if you add the no-label-stack-compression keyword you can see how a fully qualified SID list, composed of adjacency SIDs, is computed.
Configuration example for R1
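A sketch showing the knob added to the compute profiles:

    protocols {
        source-packet-routing {
            compute-profile red-lsp {
                admin-group include-all red;
                no-label-stack-compression;   # return a full adjacency-SID list
            }
            compute-profile blue-lsp {
                admin-group include-all blue;
                no-label-stack-compression;
            }
        }
    }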
Verifying the SR-TE LSPs computed by the Distributed SR CSPF
user@R1# run show spring-traffic-engineering lsp detail
Name: sr-te-lsp-to-r6
  Tunnel-source: Static configuration
  To: 1.1.1.6
  State: Up
    Path: 1st-seg-to-r6
      Outgoing interface: NA
      Auto-translate status: Disabled
      Auto-translate result: N/A
      Compute Status:Enabled , Compute Result:success , Compute-Profile Name:red-lsp
      Total number of computed paths: 1
      Computed-path-index: 1
        BFD status: N/A
        BFD name: N/A
        computed segments count: 4
          computed segment : 1 (computed-adjacency-segment):
            label: 16
            source router-id: 1.1.1.1, destination router-id: 1.1.1.3
            source interface-address: 10.1.13.1, destination interface-address: 10.1.13.2
          computed segment : 2 (computed-adjacency-segment):
            label: 18
            source router-id: 1.1.1.3, destination router-id: 1.1.1.4
            source interface-address: 10.1.34.1, destination interface-address: 10.1.34.2
          computed segment : 3 (computed-adjacency-segment):
            label: 20
            source router-id: 1.1.1.4, destination router-id: 1.1.1.5
            source interface-address: 10.1.45.1, destination interface-address: 10.1.45.2
          computed segment : 4 (computed-adjacency-segment):
            label: 24
            source router-id: 1.1.1.5, destination router-id: 1.1.1.6
            source interface-address: 10.1.56.1, destination interface-address: 10.1.56.2
    Path: 2nd-seg-to-r6
      Outgoing interface: NA
      Auto-translate status: Disabled
      Auto-translate result: N/A
      Compute Status:Enabled , Compute Result:success , Compute-Profile Name:blue-lsp
      Total number of computed paths: 1
      Computed-path-index: 1
        BFD status: N/A
        BFD name: N/A
        computed segments count: 3
          computed segment : 1 (computed-adjacency-segment):
            label: 19
            source router-id: 1.1.1.1, destination router-id: 1.1.1.2
            source interface-address: 10.1.12.1, destination interface-address: 10.1.12.2
          computed segment : 2 (computed-adjacency-segment):
            label: 21
            source router-id: 1.1.1.2, destination router-id: 1.1.1.5
            source interface-address: 10.1.25.1, destination interface-address: 10.1.25.2
          computed segment : 3 (computed-adjacency-segment):
            label: 24
            source router-id: 1.1.1.5, destination router-id: 1.1.1.6
            source interface-address: 10.1.56.1, destination interface-address: 10.1.56.2
Lastly, as discussed above, maximum SID depth continues to play an important role during dynamic CSPF. As in the previous PCEP example, a maximum SID depth can be set for each compute-profile using the maximum-segment-list-depth <value> keyword.
Configuration example for R1
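A sketch, bounding the computed SID list at an assumed depth of 4:

    protocols {
        source-packet-routing {
            compute-profile red-lsp {
                admin-group include-all red;
                maximum-segment-list-depth 4;   # reject paths needing more SIDs
            }
        }
    }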
A Centralized Controller or PCE for External CSPF
When discussing SR there is oftentimes an assumption that a controller or centralized PCE is present, so let’s briefly explore how the Northstar TE Controller can be leveraged as an external path computation source for SR paths.
BGP Link-state for Topology Discovery
In order for the controller (PCE) to perform any kind of path computation, it must be synchronized with the network "topology." A network topology can take the form of a traffic engineering database (TED), much like what RSVP-TE uses for path computation, a link-state database (LSDB), or even more physical representations. The following example shows how BGP-LS conveys a TED to Juniper's Northstar Controller.
Please refer to the Visualization section in the Observability chapter for a BGP-LS configuration example.
PCEP for SR-TE LSP Creation and Control
The next relevant piece of information a controller requires is the ability to learn or create SR-TE LSP state. The next configuration example shows a Path Computation Element Protocol (PCEP) session between a Path Computation Client (PCC), or ingress router, and the controller. Figure 13 shows the PCEP session parameters.
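A minimal PCC-side sketch; the controller name NS1 matches the controller seen in later output, while its address here is an assumption:

    protocols {
        pcep {
            pce NS1 {
                destination-ipv4-address 10.0.0.100;   # Northstar address (assumed)
                destination-port 4189;                 # standard PCEP port
                pce-type active stateful;              # allow the PCE to update LSPs
                lsp-provisioning;                      # allow PCE-initiated LSPs
                spring-capability;                     # advertise SR support
            }
        }
    }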
Now let’s revisit some of the SR-specific SID types and see how they ‘look’ on the controller and how to announce them.
Anycast SIDs are advertised by the IGP SR extensions during LSDB flooding; they are created by configuring a second set of lo0.0 addresses and announcing a prefix-SID for them. The next example announces the anycast SID 123 from nodes p1.nyc and p2.nyc.
Announcing anycast SIDs from p1.nyc and p2.nyc
Verifying on pe1.nyc
user@pe1.nyc> show isis database p2.nyc.00-00 extensive | match "1.1.2.3|123"
    IP prefix: 1.1.2.3/32  Metric: 0  Internal  Up
    IP prefix: 1.1.2.3/32, Internal, Metric: default 0, Up
    IP extended prefix: 1.1.2.3/32 metric 0 up
      Prefix SID, Flags: 0x00(R:0,N:0,P:0,E:0,V:0,L:0), Algo: SPF(0), Value: 123
  IP address: 1.1.2.3

user@pe1.nyc> show isis database p1.nyc.00-00 extensive | match "1.1.2.3|123"
    IP prefix: 1.1.2.3/32  Metric: 0  Internal  Up
    IP prefix: 1.1.2.3/32, Internal, Metric: default 0, Up
    IP extended prefix: 1.1.2.3/32 metric 0 up
      Prefix SID, Flags: 0x00(R:0,N:0,P:0,E:0,V:0,L:0), Algo: SPF(0), Value: 123
  IP address: 1.1.2.3
The anycast SID appears as a node property in the Northstar GUI and can later be chosen as a strict or loose hop (more likely a loose hop) when creating the SR-TE LSP. Binding SIDs are added to an SR-TE LSP, as shown in Figure 14, and then advertised to Northstar via PCEP.
Creating a Core SR-TE LSP with a binding SID
Verifying the core LSP on P1.NYC
user@p1.nyc# run show spring-traffic-engineering lsp detail
Name: P1NYC-2-P1IAD
  Tunnel-source: Static configuration
  To: 128.49.106.9
  State: Up
  Telemetry statistics:
    Sensor-name: ingress-P1NYC-2-P1IAD, Id: 3758096386
    Sensor-name: transit-P1NYC-2-P1IAD, Id: 3758096387
    Path: P1NYC-2-P1IAD
      Outgoing interface: ge-0/0/4.0
      Auto-translate status: Enabled
      Auto-translate result: Success
      BFD status: N/A
      BFD name: N/A
      SR-ERO hop count: 2
        Hop 1 (Strict):
          NAI: IPv4 Adjacency ID, 0.0.0.0 -> 192.0.2.15
          SID type: 20-bit label, Value: 94
        Hop 2 (Strict):
          NAI: IPv4 Adjacency ID, 0.0.0.0 -> 192.0.2.26
          SID type: 20-bit label, Value: 51
Total displayed LSPs: 1 (Up: 1, Down: 0)
Verifying the routing table on p1.nyc
user@p1.nyc# run show route protocol spring-te

inet.0: 49 destinations, 60 routes (46 active, 0 holddown, 3 hidden)

inet.3: 11 destinations, 15 routes (11 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

128.49.106.9/32    *[SPRING-TE/8] 00:02:45, metric 1, metric2 20
                   >  to 192.0.2.21 via ge-0/0/5.0, Push 1009
                      to 192.0.2.13 via ge-0/0/1.0, Push 1009, Push 1008(top)

iso.0: 1 destinations, 1 routes (1 active, 0 holddown, 0 hidden)

mpls.0: 62 destinations, 62 routes (62 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

1000001            *[SPRING-TE/8] 00:02:45, metric 1, metric2 20
                   >  to 192.0.2.21 via ge-0/0/5.0, Swap 1009
                      to 192.0.2.13 via ge-0/0/1.0, Swap 1009, Push 1008(top)
Verifying binding SIDs on the Northstar TE Controller
Binding SIDs can be verified using LSP properties or by adding a binding SID column to the Tunnels tab.
Now that you have explored various aspects of TE, several of the SR-specific SID types, and how basic information is synchronized with a controller, let’s revisit the sample network from previous chapters and bring an entire solution together!
End-to-End TE Solution
One of the key goals of transitioning the sample network to segment routing is to replicate a form of "bandwidth optimization" (TE) in the core, Level 2 IS-IS domain, where the Auto Bandwidth RSVP-TE LSPs currently provide relatively granular per-path bandwidth optimization. Because SR does not maintain per-LSP state, and thus per-LSP statistics, a bandwidth optimization solution for an SR network requires that a controller acquire data from another source, such as the streaming telemetry sources described in the Observability chapter, and arrive at a solution in a different form than RSVP-TE would provide.
First, let's start by creating a mesh of SR-TE LSPs (see Table 1) between pe1.nyc and all the PEs in the IAD PoP. These LSPs will be created ephemerally, using PCEP, such that the Northstar Controller has explicit control of each of their paths.
Table 1: SR-TE LSP Mesh
| LSP Name        | Ingress router | Egress router |
| --------------- | -------------- | ------------- |
| pe1.nyc-pe1.iad | pe1.nyc        | pe1.iad       |
| pe1.nyc-pe2.iad | pe1.nyc        | pe2.iad       |
| pe1.nyc-pe3.iad | pe1.nyc        | pe3.iad       |
| pe1.iad-pe1.nyc | pe1.iad        | pe1.nyc       |
| pe2.iad-pe1.nyc | pe2.iad        | pe1.nyc       |
| pe3.iad-pe1.nyc | pe3.iad        | pe1.nyc       |
To create an SR-TE LSP on the Northstar Controller, use the Applications > Provision LSP drop-down option or the Add button on the Tunnel tab. A pop-up window appears for entering the attributes of the LSP, as shown in Figure 16. A key step when creating SR-TE LSPs is to provide them with some nominal 'Planned Bandwidth' value. This ensures that during a periodic or triggered reoptimization, link congestion awareness can be accounted for by Northstar's CSPF, as you will see later.
At the time of this writing, per-SR-policy ingress statistics were not available to the Northstar Controller. In the future, the static 10k bandwidth value assigned to each SR-TE LSP can be replaced with dynamic, real-time data plane statistics so that more granular bandwidth optimization is realizable.
The resulting SR-TE LSP mesh is shown in Figure 17, where you can see which links are traversed by the mesh.
From the ingress PCC perspective, you can see that three SR-TE LSPs have been signaled via the Controller (PCE):
user@pe1.nyc> show path-computation-client lsp
Name               Status        PLSP-Id  LSP-Type      Controller  Path-Setup-Type
pe1.nyc-pe1.iad    Primary(Act)  10       ext-provised  NS1         spring-te
pe1.nyc-pe2.iad    Primary(Act)  11       ext-provised  NS1         spring-te
pe1.nyc-pe3.iad    Primary(Act)  12       ext-provised  NS1         spring-te
And that each has been installed in the inet.3 RIB of pe1.nyc:
user@pe1.nyc# run show route protocol spring-te table inet.3

inet.3: 8 destinations, 12 routes (8 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

128.49.106.10/32   *[SPRING-TE/8] 00:06:23, metric 1, metric2 0
                   >  to 192.0.2.5 via ge-0/0/1.0, Push 18, Push 18, Push 22(top)
128.49.106.11/32    [SPRING-TE/8] 1d 00:38:38, metric 1, metric2 0
                   >  to 192.0.2.7 via ge-0/0/2.0, Push 28, Push 20, Push 18(top)
128.49.106.13/32   *[SPRING-TE/8] 00:05:30, metric 1, metric2 0
                   >  to 192.0.2.5 via ge-0/0/1.0, Push 20, Push 20, Push 26(top)
You can verify the label stack for each SR-TE LSP via the Northstar Controller GUI, shown in Figure 18: the Record Route and ERO columns display the adjacency SIDs for the topology.
As you can see from the Junos ingress router's detailed SR-TE LSP output, the label stack matches. It's worth noting that the first label (17, in the case of SR-TE LSP pe1.nyc-pe1.iad) is not actually imposed on the packet, since the IP address from the PCEP NAI (node or adjacency identifier) field is used for output interface selection. This is illustrated in the next output.
Detailed SR-TE LSP output:
user@pe1.nyc> show spring-traffic-engineering lsp detail
Name: pe1.nyc-pe1.iad
  Tunnel-source: Path computation element protocol(PCEP)
  To: 128.49.106.11
  State: Up
  Telemetry statistics:
    Sensor-name: ingress-pe1.nyc-pe1.iad, Id: 3758096391
    Outgoing interface: NA
    Auto-translate status: Disabled
    Auto-translate result: N/A
    BFD status: N/A
    BFD name: N/A
    SR-ERO hop count: 4
      Hop 1 (Strict):
        NAI: IPv4 Adjacency ID, 192.0.2.6 -> 192.0.2.7
        SID type: 20-bit label, Value: 17
      Hop 2 (Strict):
        NAI: IPv4 Adjacency ID, 192.0.2.22 -> 192.0.2.23
        SID type: 20-bit label, Value: 18
      Hop 3 (Strict):
        NAI: IPv4 Adjacency ID, 192.0.2.33 -> 192.0.2.32
        SID type: 20-bit label, Value: 20
      Hop 4 (Strict):
        NAI: IPv4 Adjacency ID, 192.0.2.39 -> 192.0.2.38
        SID type: 20-bit label, Value: 28
JUNOS inet.3 RIB entry for the SR-TE LSPs:
user@pe1.nyc# run show route protocol spring-te table inet.3

inet.3: 8 destinations, 12 routes (8 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

128.49.106.10/32   *[SPRING-TE/8] 00:06:23, metric 1, metric2 0
                   >  to 192.0.2.5 via ge-0/0/1.0, Push 18, Push 18, Push 22(top)
128.49.106.11/32    [SPRING-TE/8] 1d 00:38:38, metric 1, metric2 0
                   >  to 192.0.2.7 via ge-0/0/2.0, Push 28, Push 20, Push 18(top)
128.49.106.13/32   *[SPRING-TE/8] 00:05:30, metric 1, metric2 0
                   >  to 192.0.2.5 via ge-0/0/1.0, Push 20, Push 20, Push 26(top)
Our sample network is providing several services to attached CE routers. From ce1.nyc you can see the resulting label stack of the transport SR-TE LSPs between pe1.nyc and pe2.iad:
user@ce1.nyc> traceroute 198.51.100.60
traceroute to 198.51.100.60 (198.51.100.60), 30 hops max, 40 byte packets
 1  pe1.nyc-ge-0-0-10.1 (198.51.100.1)  3.008 ms  1.918 ms  1.972 ms
 2  p1.nyc-ge-0-0-2.0 (192.0.2.5)  12.785 ms  9.685 ms  24.882 ms
     MPLS Label=22 CoS=0 TTL=1 S=0
     MPLS Label=18 CoS=0 TTL=1 S=0
     MPLS Label=18 CoS=0 TTL=1 S=1
 3  p1.ewr-ge-0-0-2.0 (192.0.2.15)  8.927 ms  14.411 ms  8.648 ms
     MPLS Label=18 CoS=0 TTL=1 S=0
     MPLS Label=18 CoS=0 TTL=2 S=1
 4  p1.iad-ge-0-0-6.0 (192.0.2.26)  9.039 ms  8.564 ms  10.233 ms
     MPLS Label=18 CoS=0 TTL=1 S=1
 5  pe2.iad-ge-0-0-1.0 (192.0.2.40)  7.857 ms  7.481 ms  17.570 ms
 6  ce1.iad-ge-0-0-7.0 (198.51.100.60)  8.767 ms  13.242 ms  15.533 ms

user@ce1.nyc> ping 198.51.100.54 rapid count 100000000 size 500
PING 198.51.100.54 (198.51.100.54): 500 data bytes
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[output truncated]
Now let's get back to how to provide a bandwidth optimization service for the IS-IS Level 2 domain using the Northstar Controller. The original core network was built on RSVP-TE Auto Bandwidth LSPs, which dynamically adapt to increasing and decreasing traffic rates. Segment routing currently has no equivalent capability, primarily due to the lack of transit LSP state. Instead, we will enable a Northstar Controller feature that reacts to interface congestion, based solely on ingress statistics collection, by triggering SR-TE LSP re-optimization. To enable this feature, go to Administration > Analytics and toggle the Reroute feature On, as shown in Figure 20.
To ensure the topology also displays real-time interface statistics, use the drop-down box on the left-hand side of the GUI: select Performance and enable Interface Utilization, as shown in Figure 21.
To simulate traffic on the network, in order to illustrate the controller’s ability to reroute SR-TE LSPs away from congested links, let’s start some extended pings with large packet sizes between pe1.nyc and p1.iad:
user@pe1.nyc> traceroute 192.0.2.26
traceroute to 192.0.2.26 (192.0.2.26), 30 hops max, 40 byte packets
 1  p1.nyc-ge-0-0-2.0 (192.0.2.5)  3.569 ms  2.412 ms  3.261 ms
 2  p1.ewr-ge-0-0-2.0 (192.0.2.15)  6.769 ms  3.793 ms  3.152 ms
 3  p1.iad-ge-0-0-6.0 (192.0.2.26)  6.040 ms  5.814 ms  5.074 ms

user@pe1.nyc> ping rapid 192.0.2.26 count 100000000 size 500
PING 192.0.2.26 (192.0.2.26): 500 data bytes
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[output truncated]
And likewise in the other direction…
user@p1.iad> traceroute 128.49.106.1
traceroute to 128.49.106.1 (128.49.106.1), 30 hops max, 40 byte packets
 1  p1.phl-ge-0-0-3.0 (192.0.2.31)  2.637 ms  2.077 ms  2.322 ms
 2  p1.nyc-ge-0-0-5.0 (192.0.2.20)  3.493 ms  5.646 ms  3.116 ms
 3  pe1.nyc-lo0.0 (128.49.106.1)  7.056 ms  5.973 ms  7.435 ms

user@p1.iad> ping 128.49.106.1 rapid count 10000000 size 1000
PING 128.49.106.1 (128.49.106.1): 1000 data bytes
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[output truncated]
As you can see in Figure 22, Northstar detects the link congestion, which triggers a path reoptimization and reroutes the SR-TE LSPs away from the congested link(s).
By selecting Timeline in the left panel, you can see in Figure 23 that link congestion, based on the interface congestion threshold, has been detected and the LSPs that were traversing the congested link have been scheduled for rerouting.
Going back to Northstar's Topology > Tunnel View, you can see the SR-TE LSP between pe1.nyc and pe2.iad in Figure 24; the LSP is now on a new path avoiding the congested link(s).
And the ingress router’s SR-TE LSP has been updated as well:
user@pe1.nyc> show route table inet.3 protocol spring-te

inet.3: 8 destinations, 12 routes (8 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

128.49.106.10/32   *[SPRING-TE/8] 00:00:07, metric 5, metric2 0
                   >  to 192.0.2.5 via ge-0/0/1.0, Push 18, Push 20, Push 26(top)
128.49.106.11/32    [SPRING-TE/8] 00:00:07, metric 5, metric2 0
                   >  to 192.0.2.5 via ge-0/0/1.0, Push 28, Push 18, Push 22(top)
128.49.106.13/32   *[SPRING-TE/8] 00:00:07, metric 5, metric2 0
                   >  to 192.0.2.5 via ge-0/0/1.0, Push 20, Push 18, Push 22(top)
Traffic engineering is an indispensable function in most backbone wide area networks, and a key objective of modern TE is the optimization of resource utilization. RSVP-TE has a long history and a large toolbox to leverage, while SR needs newer tools, such as streaming telemetry and controllers, to achieve comparable resource utilization. This chapter covered a number of SR-specific options for creating explicit paths, various forms of distributed and centralized path computation, and finally, how our sample network can be transitioned to SR-TE. While the bandwidth optimization solution differs quite a bit from traditional RSVP-TE, it provides a means of reacting to interface congestion to approximate resource optimization.