Scaling, Troubleshooting, and Monitoring Considerations

10-Apr-20

This chapter discusses monitoring route server scale, Junos BGP components, and client-specific BGP sessions. Monitoring a route server closely resembles normal BGP session management, which is covered widely in other publications, so we assume the reader is already familiar with it.

Unlike normal BGP speakers, Junos route servers have two special considerations:

  • Configuration database size

  • rpd memory utilization for route copies between routing instances

Monitoring the Configuration Database Size

To support large configurations (for example, a thousand route server clients resulting in more than two million lines of output), the default configuration database size must be extended and compression enabled. The following configuration stanza enables the extended configuration database size, which requires a Junos reboot to take effect:

system {
    configuration-database {
        extend-size;
    }
    compress-configuration-files;
}

Initial Junos Configuration Database

The Junos configuration database can be monitored with the show system configuration database usage command, here showing the size with a basic configuration:

root@rs1> show system configuration database usage
Maximum size of the database: 1305.99 MB
Current database size on disk: 1.50 MB
Actual database usage: 1.47 MB
Available database space: 1304.52 MB

Now let’s look at the same output with 1,000 route server clients configured, each with its own routing instance, import policy, instance-export policy, and instance-import policy:

root@rs1> show system configuration database usage
Maximum size of the database: 1305.99 MB
Current database size on disk: 1221.50 MB
Actual database usage: 1221.48 MB
Available database space: 84.51 MB

Database usage has grown from 1.47 MB to 1221.48 MB, leaving only 84.51 MB of the 1305.99 MB maximum available.
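
As a rough illustration of how this output could feed an alert, the sketch below parses the text of show system configuration database usage and computes the percentage of the maximum size in use. It assumes the output has already been collected as a string; the function names and the 90 percent threshold are hypothetical and not part of Junos.

```python
import re

# Sample text copied from the 1,000-client output above.
SAMPLE = """\
Maximum size of the database: 1305.99 MB
Current database size on disk: 1221.50 MB
Actual database usage: 1221.48 MB
Available database space: 84.51 MB
"""


def parse_db_usage(text):
    """Return {label: megabytes} for every 'label: N MB' line."""
    values = {}
    for line in text.splitlines():
        m = re.match(r"\s*(.+?):\s*([\d.]+)\s*MB\s*$", line)
        if m:
            values[m.group(1)] = float(m.group(2))
    return values


def percent_used(values):
    """Actual usage as a percentage of the maximum database size."""
    return (100.0 * values["Actual database usage"]
            / values["Maximum size of the database"])


def usage_alert(text, threshold_pct=90.0):
    """True when usage reaches the (hypothetical) alert threshold."""
    return percent_used(parse_db_usage(text)) >= threshold_pct


if __name__ == "__main__":
    usage = parse_db_usage(SAMPLE)
    print(f"{percent_used(usage):.1f}% used, alert={usage_alert(SAMPLE)}")
```

A check like this could run after each large configuration change so that operators notice when the database approaches its limit.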

Monitoring Route Table Size

The next sample output shows summary statistics about the entries in the routing table (show route summary command) and the memory usage breakdown (show task memory detail command) for the rpd. The two commands provide a comprehensive picture of the memory utilization of the routing protocol process.

The show route summary command shows the number of routes in the various routing tables for each route server client. Within each routing table, all of the active, holddown, and hidden destinations and routes are summarized. Routes are in the holddown state prior to being declared inactive, and hidden routes are not used because of routing policy. Note that routes in the holddown and hidden states are still using memory because they appear in the routing table:

root@rs1> show route summary table C1.inet.0
Autonomous system number: 123
Router ID: 192.0.2.254
C1.inet.0: 31 destinations, 31 routes (31 active, 0 holddown, 0 hidden)
BGP: 31 routes, 31 active
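
Because holddown and hidden routes still consume memory, it can be useful to extract those counters per table. The following is a minimal sketch that assumes the summary text has already been captured as a string; the regular expression is modeled only on the table summary lines shown in this chapter, and the function names are hypothetical.

```python
import re

# Matches lines such as:
# "C1.inet.0: 31 destinations, 31 routes (31 active, 0 holddown, 0 hidden)"
SUMMARY_RE = re.compile(
    r"^(?P<table>\S+): (?P<destinations>\d+) destinations, "
    r"(?P<routes>\d+) routes "
    r"\((?P<active>\d+) active, (?P<holddown>\d+) holddown, "
    r"(?P<hidden>\d+) hidden\)"
)

SAMPLE = """\
Autonomous system number: 123
Router ID: 192.0.2.254
C1.inet.0: 31 destinations, 31 routes (31 active, 0 holddown, 0 hidden)
BGP: 31 routes, 31 active
inet.0: 41 destinations, 41 routes (40 active, 0 holddown, 1 hidden)
"""


def parse_route_summary(text):
    """Return {table: {counter: int}} for each table summary line."""
    tables = {}
    for line in text.splitlines():
        m = SUMMARY_RE.match(line.strip())
        if m:
            counts = {k: int(v) for k, v in m.groupdict().items()
                      if k != "table"}
            tables[m.group("table")] = counts
    return tables


def non_active_routes(tables):
    """Return {table: holddown + hidden} for tables where that sum is > 0."""
    result = {}
    for name, counts in tables.items():
        extra = counts["holddown"] + counts["hidden"]
        if extra > 0:
            result[name] = extra
    return result


if __name__ == "__main__":
    print(non_active_routes(parse_route_summary(SAMPLE)))
```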

Monitoring RPD Memory Utilization

The show task memory detail command lists the data structures within the tasks run by rpd. Tasks are enabled depending on the router’s configuration. The Alloc Bytes field indicates the memory currently allocated to a data structure, while the MaxAlloc blocks and bytes are high-water marks for that data structure. The example below appears to come from a healthy route server, because very little memory is allocated:

root@rs1> show task memory detail | match "Total bytes| |bgp|Allocator|Name|Malloc" | except je_
task | except "Size TXP" | except Overall |except Page
 ------------------------------------------------------------------------- 
 ------------------------ Allocator Memory Report -------------------------
 Name                 Size Alloc DTXP    Alloc     Alloc MaxAlloc  MaxAlloc
 bgp-th-wrstage-iov  10240 12288             5     61440        7     86016
 rt_table_name_node     20    24            11       264       12       288
 bgpconf              1776  1792            12     21504       17     30464
 bgp_uio_trace_conf     20    24            36       864       36       864
 bgp_uio_group_info    820   896             4      3584        4      3584
 bgp_uio_peer_info     168   192             7      1344        7      1344
 bgp_orf_mark_t         24    28             7       196        7       196
 bgp_riblist_entry_t    12    16            12       192       12       192
 bgp_rg_list_obj         8    12             1        12        1        12
 bgp_msgbld             76    80            11       880       11       880
 bgp_io_oper           228   256             7      1792        7      1792
 bgp_nlri_sync_t       372   384             7      2688        7      2688
 bgp-rib-to-group       12    16             4        64        4        64
 bgp_mrto_hash        4096  8192             2     16384        4     32768
 bgp-rib-grp          1896  2048             2      4096        4      8192
 bgp-rib-globals        28    32             7       224        7       224
 bgp-rib-peer-group    496   512             2      1024        4      2048
 bgp-rib-peer-counter  116   128             7       896        7       896
 bgp-rib-peer          484   512             7      3584        7      3584
 bgp-rib               136   140             7       980        7       980
 bgp ifachg notify      12    16             3        48        3        48
 bgp ifachg reg         32    36             3       108        3       108
 bgp_act_node          700   768             7      5376        7      5376
 bgp_adv_entry          24    28             6       168      136      3808
 bgp_checksum_stats_t   12    16             7       112        7       112
 bgp_tsi_t              16    20             6       120      136      2720
 bgp_addpath_params_t   12    16             7       112        7       112
 bgpb_sbits_01          20    24             4        96       10       240
 bgp_metrics_node       84    96            13      1248       19      1824
 bgpg_rtinfo_entry      16    20             6       120       65      1300
 bgp_peeras_t           16    20             1        20        3        60
 bgp_rtentry            24    28            43      1204      105      2940
 bgp_bmp_common_peer_    4     8            16       128       22       176
 bgpPeerGroup        19088 20480             4     81920        4     81920
 bgpPeer              9288 12288             7     86016        7     86016
 bgp_buffer           4100  8192             5     40960        7     57344
 bgp_cluster_t          16    20             1        20        1        20
 bgp_instance_t         88    96             5       480        6       576
 bgp_evpn_metrics       48    52            18       936       34      1768
  -------------------------------------------------------------------------
  -------------------------- Malloc Usage Report ---------------------------
  Name                      Allocs     Bytes  MaxAllocs   MaxBytes FuncCalls
  BGP_3                          3       448          3        448         3
  BGP_Group_C3                   4       816          4        816         4
  BGP_2                          3       448          3        448         3
  BGP_Group_C2                   4       816          4        816         4
  BGP_1                          3       448          3        448         3
  BGP_Group_C1                   4       816          4        816         4
  BGP_100_100                    3       448          3        448         3
  BGP_100_100                    3       448          3        448         3
  BGP_100_100                    3       448          3        448         3
  BGP_100_100                    3       448          3        448         3
  BGP_Group_evpn                 4       816          5        828        94
  BGP_RT_Background             29    278000         32     278136       303
  bgp-thrio                      1      8192          1       8192         1
  bgp-thrio-ctx                 41     19552         60      27420    575053
  BGP addpath task               2      3584          2       3584         2
  -------------------------------------------------------------------------
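
To compare current allocation against the high-water marks in the Allocator Memory Report, the rows can be parsed into records. The sketch below assumes that each data row ends with six integers (Size, Alloc size, Alloc blocks, Alloc bytes, MaxAlloc blocks, MaxAlloc bytes) and that the DTXP column is empty, as in the sample output; rows that do not fit this shape, including the Malloc Usage Report rows, are skipped. All function names are hypothetical.

```python
# A few rows copied from the Allocator Memory Report above.
SAMPLE = """\
 Name                 Size Alloc DTXP    Alloc     Alloc MaxAlloc  MaxAlloc
 bgp-th-wrstage-iov  10240 12288             5     61440        7     86016
 bgp ifachg notify      12    16             3        48        3        48
 bgp_adv_entry          24    28             6       168      136      3808
 bgpPeer              9288 12288             7     86016        7     86016
"""

FIELDS = ("size", "alloc_size", "alloc_blocks", "alloc_bytes",
          "max_blocks", "max_bytes")


def parse_allocator_rows(text):
    """Parse rows whose last six whitespace-separated fields are integers."""
    rows = []
    for line in text.splitlines():
        # Split from the right so names containing spaces stay intact.
        parts = line.rsplit(None, 6)
        if len(parts) != 7 or not all(p.isdigit() for p in parts[1:]):
            continue  # header, separator, or a row in another format
        row = {"name": parts[0].strip()}
        row.update(zip(FIELDS, (int(p) for p in parts[1:])))
        rows.append(row)
    return rows


def total_alloc_bytes(rows):
    """Sum of currently allocated bytes across all parsed structures."""
    return sum(r["alloc_bytes"] for r in rows)


def below_high_water(rows):
    """Structures whose current bytes are below their high-water mark,
    sorted by the size of the gap (largest first)."""
    gaps = [(r["name"], r["max_bytes"] - r["alloc_bytes"]) for r in rows
            if r["max_bytes"] > r["alloc_bytes"]]
    return sorted(gaps, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    parsed = parse_allocator_rows(SAMPLE)
    print(total_alloc_bytes(parsed))
    print(below_high_water(parsed))
```

Tracking the total over time, rather than a single snapshot, is what would reveal growth as clients are added.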

Monitoring Client EBGP Sessions

Route server client EBGP sessions can be viewed either as a summarized list, using show bgp summary, or individually, using show bgp neighbor. Both commands are shown here on the sample topology:

 root@rs1> show bgp summary 
 Threading mode: BGP I/O
 Groups: 4 Peers: 7 Down peers: 0
 Table        Tot Paths  Act Paths Suppressed History Damp State     Pending
 bgp.evpn.0
                     12         12          0       0          0           0
 Peer          AS      InPkt     OutPkt OutQ   Flaps Last Up/Dwn State|#Active/
Received/Accepted/Damped...
 192.0.2.1      1       6195       6093     0     1  1d 22:26:37  Establ
    C1.inet.0: 31/31/31/0
 192.0.2.2      2        332        325     0     7      2:27:38  Establ
    C2.inet.0: 31/31/31/0
 192.0.2.3      3      53788      52946     0     0  2w2d 19:23:41 Establ
    C3.inet.0: 31/31/31/0
 
root@rs1> show bgp neighbor 192.0.2.1
Peer: 192.0.2.1+56947 AS 1  Local: 192.0.2.254+179 AS 123
  Group: C1                  Routing-Instance: C1
  Forwarding routing-instance: master
  Type: External    State: Established    Flags: <Sync>
  Last State: OpenConfirm   Last Event: RecvKeepAlive
  Last Error: Hold Timer Expired Error
  Options: <Preference AddressFamily PeerAS Refresh>
  Options: <MtuDiscovery>
  Options: <RouteServerClient>
  Address families configured: inet-unicast
  Holdtime: 90 Preference: 170
  Number of flaps: 1
  Last flap event: HoldTime
  Error: ‘Hold Timer Expired Error’ Sent: 1 Recv: 0
  Peer ID: 192.0.2.1        Local ID: 192.0.2.254        Active Holdtime: 90
[...Output truncated...]
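
For a quick health check across many clients, the peer lines of show bgp summary can be parsed and filtered. The following is a minimal sketch, assuming IPv4 peers and the layout shown in the sample (one peer line followed by indented per-table counts); the function names and the flap limit of 5 are hypothetical.

```python
import re

# Peer line: address, AS, InPkt, OutPkt, OutQ, Flaps, Last Up/Dwn, State.
PEER_RE = re.compile(
    r"^\s*(?P<peer>\d{1,3}(?:\.\d{1,3}){3})\s+(?P<asn>\d+)\s+"
    r"(?P<inpkt>\d+)\s+(?P<outpkt>\d+)\s+(?P<outq>\d+)\s+(?P<flaps>\d+)\s+"
    r"(?P<updown>.+?)\s+(?P<state>\S+)\s*$"
)
# Per-table line: "C1.inet.0: 31/31/31/0" (active/received/accepted/damped).
RIB_RE = re.compile(
    r"^\s+(?P<table>\S+):\s+(?P<active>\d+)/(?P<received>\d+)/"
    r"(?P<accepted>\d+)/(?P<damped>\d+)"
)

SAMPLE = """\
 Groups: 4 Peers: 7 Down peers: 0
 192.0.2.1      1       6195       6093     0     1  1d 22:26:37  Establ
    C1.inet.0: 31/31/31/0
 192.0.2.2      2        332        325     0     7      2:27:38  Establ
    C2.inet.0: 31/31/31/0
 192.0.2.3      3      53788      52946     0     0  2w2d 19:23:41 Establ
    C3.inet.0: 31/31/31/0
"""


def parse_bgp_summary(text):
    """Return a list of peer dicts, each with a 'tables' sub-dict."""
    peers = []
    for line in text.splitlines():
        m = PEER_RE.match(line)
        if m:
            peer = m.groupdict()
            for key in ("asn", "inpkt", "outpkt", "outq", "flaps"):
                peer[key] = int(peer[key])
            peer["tables"] = {}
            peers.append(peer)
            continue
        m = RIB_RE.match(line)
        if m and peers:
            counts = {k: int(v) for k, v in m.groupdict().items()
                      if k != "table"}
            peers[-1]["tables"][m.group("table")] = counts
    return peers


def needs_attention(peers, max_flaps=5):
    """Peers that are not established or exceed the flap limit."""
    return [p for p in peers
            if p["state"] != "Establ" or p["flaps"] > max_flaps]


if __name__ == "__main__":
    for p in needs_attention(parse_bgp_summary(SAMPLE)):
        print(p["peer"], p["state"], p["flaps"])
```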

Monitoring Route Distribution

The following show command displays the total number of routes present in the RIB of route server client C3, along with the RIBs into which they are imported:

root@rs1> show route export C3.inet.0 detail
C3.inet.0                         Routes:     31
Import: [ C1.inet.0 C2.inet.0 C3.inet.0 ]

To see specific routes in C3’s RIB that will be exported to C1, based on the IXP global policy, view the source RIB and filter by using the target community:

root@rs1> show route protocol bgp community 64498:1 table C3.inet.0

C3.inet.0: 31 destinations, 31 routes (31 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

198.51.100.0/24     *[BGP/170] 2w0d 20:13:27, localpref 100
                      AS path: 3 I, validation-state: unverified
                    > to 192.0.2.3 via ge-0/0/1.0
203.0.113.0/24      *[BGP/170] 2w0d 20:13:27, localpref 100
                      AS path: 3 I, validation-state: unverified
                    > to 192.0.2.3 via ge-0/0/1.0
[...Output truncated...]

A slightly different view, or rather a validation of the previous command, is to look at the RIB contents of all client RIBs from the perspective of what routes have been received from a specific route server client. In the next example, 192.0.2.3 is the BGP peer associated with the routing-instance C3:

root@rs1> show route receive-protocol bgp 192.0.2.3 | except inet.0

inet.3: 4 destinations, 4 routes (4 active, 0 holddown, 0 hidden)

C1.inet.0: 31 destinations, 31 routes (31 active, 0 holddown, 0 hidden)
  Prefix        Nexthop       MED     Lclpref    AS path
* 198.51.100.0/24          192.0.2.3                              3 I
* 203.0.113.0/24           192.0.2.3                              3 I
  [...Output truncated...]

C3.inet.0: 31 destinations, 31 routes (31 active, 0 holddown, 0 hidden)
   Prefix       Nexthop       MED     Lclpref    AS path
* 198.51.100.0/24           192.0.2.3                             3 I
* 203.0.113.0/24            192.0.2.3                             3 I
   [...Output truncated...]

Specific prefixes can also be searched for to aid in troubleshooting:

regress@RS1> show route receive-protocol bgp 192.0.2.3 198.51.100.0 |except inet.0

C1.inet.0: 31 destinations, 31 routes (31 active, 0 holddown, 0 hidden)
  Prefix       Nexthop        MED     Lclpref    AS path
* 198.51.100.0/24           192.0.2.3                             3 I

C1.inet.0: 31 destinations, 31 routes (31 active, 0 holddown, 0 hidden)
  Prefix       Nexthop        MED     Lclpref    AS path
* 198.51.100.0/24           192.0.2.3                             3 I

Routes may also be searched by community value or name. The search returns every client RIB that contains a match, so route propagation between client RIBs can be tracked:

regress@RS1> show route protocol bgp community-name as64498_comms 198.51.100.0 

inet.0: 41 destinations, 41 routes (40 active, 0 holddown, 1 hidden)

C1.inet.0: 93 destinations, 93 routes (93 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

198.51.100.0/24     *[BGP/170] 2w2d 19:32:26, localpref 100
                      AS path: 3 I, validation-state: unverified
                    > to 192.0.2.3 via ge-0/0/1.0

C3.inet.0: 62 destinations, 62 routes (62 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

198.51.100.0/24     *[BGP/170] 2w2d 19:32:26, localpref 100
                      AS path: 3 I, validation-state: unverified
                    > to 192.0.2.3 via ge-0/0/1.0
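
To track propagation programmatically, the multi-table output can be reduced to a map from each prefix to the tables that contain it. The sketch below is an assumption-laden illustration: it handles IPv4 prefixes only, relies on the table header and route line formats shown above, and uses hypothetical function names.

```python
import re

# Table header: "C1.inet.0: 93 destinations, 93 routes (...)"
TABLE_RE = re.compile(r"^(?P<table>\S+): \d+ destinations")
# Route line starts at column 0 with an IPv4 prefix.
ROUTE_RE = re.compile(r"^(?P<prefix>\d{1,3}(?:\.\d{1,3}){3}/\d{1,2})\s")

SAMPLE = """\
inet.0: 41 destinations, 41 routes (40 active, 0 holddown, 1 hidden)

C1.inet.0: 93 destinations, 93 routes (93 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

198.51.100.0/24     *[BGP/170] 2w2d 19:32:26, localpref 100
                      AS path: 3 I, validation-state: unverified
                    > to 192.0.2.3 via ge-0/0/1.0

C3.inet.0: 62 destinations, 62 routes (62 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

198.51.100.0/24     *[BGP/170] 2w2d 19:32:26, localpref 100
                      AS path: 3 I, validation-state: unverified
                    > to 192.0.2.3 via ge-0/0/1.0
"""


def tables_by_prefix(text):
    """Return {prefix: [tables...]} in the order the tables appear."""
    result = {}
    current_table = None
    for line in text.splitlines():
        m = TABLE_RE.match(line)
        if m:
            current_table = m.group("table")
            continue
        m = ROUTE_RE.match(line)
        if m and current_table is not None:
            result.setdefault(m.group("prefix"), []).append(current_table)
    return result


def missing_from(text, prefix, expected_tables):
    """Expected tables in which the prefix does not appear."""
    present = tables_by_prefix(text).get(prefix, [])
    return [t for t in expected_tables if t not in present]


if __name__ == "__main__":
    print(tables_by_prefix(SAMPLE))
    print(missing_from(SAMPLE, "198.51.100.0/24",
                       ["C1.inet.0", "C2.inet.0", "C3.inet.0"]))
```

Whether a prefix is expected in a given client RIB depends on the export policy in force, so the expected table list is something the operator would supply.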

Monitoring Tools: HealthBot

HealthBot is a highly automated and programmable device-level diagnostics and network analytics tool that provides consistent and coherent operational intelligence across network deployments. Integrated with multiple data collection methods (such as Junos Telemetry Interface, NETCONF, and SNMP), HealthBot aggregates and correlates large volumes of time-sensitive telemetry data, providing a multidimensional and predictive view of the network. Additionally, HealthBot translates troubleshooting, maintenance, and real-time analytics into an intuitive user experience to give network operators actionable insights into the health of an individual device and the overall network.

HealthBot BGP KPIs, located at https://github.com/Juniper/healthbot-rules/tree/master/juniper_official/Protocols/Bgp, contain readily consumable HealthBot playbooks and rules, which are specific to BGP neighbor key performance indicators (KPIs).

Figure 1: HealthBot Dashboard for Route Server System KPI Monitoring

BGP KPI rules collect statistics from network devices, then analyze the data and act on it. A BGP KPI playbook is a set of rules, and each rule is defined with a set of KPIs. The playbooks contain rules for BGP session state, neighbor flap detection, received routes with a static threshold, and received routes with a dynamic threshold.

Rules are defined with default variable values that can be changed when deploying playbooks.
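
The difference between a static and a dynamic threshold can be illustrated with a short, generic sketch. This is not HealthBot rule syntax and does not claim to reproduce how HealthBot computes its thresholds; it assumes, purely for illustration, that a dynamic threshold is a band of k standard deviations around the mean of recent samples. The function names and default values are hypothetical.

```python
from statistics import mean, stdev


def static_breach(value, low, high):
    """True when the value falls outside a fixed [low, high] range."""
    return not (low <= value <= high)


def dynamic_breach(value, history, k=3.0):
    """True when the value deviates from the mean of recent samples
    by more than k sample standard deviations.

    With fewer than two samples there is no baseline, so no breach
    is reported.
    """
    if len(history) < 2:
        return False
    baseline = mean(history)
    spread = stdev(history)
    return abs(value - baseline) > k * spread


if __name__ == "__main__":
    # Hypothetical received-route counts from one client over time.
    recent = [31, 31, 30, 32, 31]
    for sample in (31, 33, 93):
        print(sample,
              static_breach(sample, low=1, high=50),
              dynamic_breach(sample, recent))
```

A static range suits clients with a known, stable prefix count, while a baseline-derived band adapts to clients whose announcements vary.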

HealthBot RIB KPIs, located at https://github.com/Juniper/healthbot-rules/tree/master/juniper_official/Protocols/Rib, contain readily consumable HealthBot playbooks and rules that are specific to RIB route summary KPIs. RIB route summary KPI rules collect statistics from network devices, then analyze the data and act on it. The RIB route summary KPI playbook is a set of rules, and each rule is defined with a set of KPIs. The playbooks contain route table summary rules and protocol route summary rules with dynamic thresholds. Rules are defined with default variable values that can be changed while deploying the playbook.

HealthBot Systems KPIs, located at https://github.com/Juniper/healthbot-rules/tree/master/juniper_official/System, contain readily consumable HealthBot playbooks and rules that are specific to system KPIs. System KPI rules collect the statistics from network devices, then analyze the data and act. The system KPI playbook is a set of rules, where each rule is defined with a set of KPIs. Playbooks contain routing engine CPU, routing engine memory, Junos processes CPU, memory leak detection, and system storage rules.

Rules are defined with default variable values, which can be changed while deploying the playbook.
