NorthStar Controller Log Files
Empty Topology
NTAD Version
Incorrect Topology
Missing LSPs
LSP Controller Statuses
PCC That is Not PCEP-Enabled
LSP Stuck in PENDING or PCC_PENDING State
LSP That is Not Active
PCS Out of Sync with Toposerver
Disappearing Changes
Investigating Client Side Issues
Incomplete Results of the Bandwidth Sizing Scheduled Task
Troubleshooting NorthStar Integration with HealthBot
Collecting NorthStar Controller Debug Files
Remote Syslog
Increasing the Scale of SNMP Collection

NorthStar Controller Troubleshooting Guide

This document includes strategies for identifying whether an apparent problem stems from the NorthStar Controller or from the router, and provides troubleshooting techniques for those problems that are identified as stemming from the NorthStar Controller.

Before you begin any troubleshooting investigation, confirm that all system processes are up and running. A sample list of processes is shown below. Your actual list of processes might differ.

Restart any processes that display as STOPPED instead of RUNNING.
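From the CLI, the process list is typically obtained with `supervisorctl status`. The following Python sketch flags every program that is not in the RUNNING state. It assumes that each output line begins with the program name followed by its state; the program names in the sample are hypothetical.

```python
from typing import List, Tuple


def non_running(status_text: str) -> List[Tuple[str, str]]:
    """Return (program, state) pairs for every program that is not RUNNING."""
    problems = []
    for line in status_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or truncated lines
        program, state = fields[0], fields[1]
        if state != "RUNNING":
            problems.append((program, state))
    return problems


# Hypothetical sample output; real program names may differ.
sample = """\
northstar:pcs                    RUNNING   pid 2101, uptime 3 days, 1:02:03
northstar:toposerver             STOPPED   Not started
infra:cassandra                  RUNNING   pid 1877, uptime 3 days, 1:04:10
"""

for program, state in non_running(sample):
    print(f"{program} is {state}; restart it")
```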

Note:

To stop, start, or restart all processes, use the `service northstar stop`, `service northstar start`, and `service northstar restart` commands.

To access system process status information from the NorthStar Controller Web UI, navigate to More Options > Administration and select System Health.

The current CPU %, memory usage, virtual memory usage, and other statistics for each system process are displayed. Figure 1 shows an example.

Note:

Only processes that are running are included in this display.

Figure 1: Process Status Display

Table 1 describes each field displayed in the Process Status table.

Table 1: Descriptions of Process Status Fields
Field	Description
Process	The name of the NorthStar Controller process.
PID	The process ID number.
User	The NorthStar Controller user permissions required to access information about this process.
Group	The NorthStar Controller user group permissions required to access information about this process.
CPU%	The current percentage of CPU in use by this process.
Memory	The current percentage of memory in use by this process.
Virtual Memory	The current virtual memory in use by this process.
CPU Time	The amount of CPU time spent processing instructions for this process.
CMD	The specific command options for the system process.

The troubleshooting information is presented in the following sections:

NorthStar Controller Log Files

Throughout your troubleshooting efforts, it can be helpful to view various NorthStar Controller log files. To access log files:

Log in to the NorthStar Controller Web UI.
Navigate to More Options > Administration and select Logs.

A list of NorthStar system log and message files is displayed, a truncated example of which is shown in Figure 2.

Figure 2: Sample of System Log and Message Files
Click the log file or message file that you want to view.

The log file contents are displayed in a pop-up window.
To open the file in a separate browser window or tab, click View Raw Log in the pop-up window.
To close the pop-up window and return to the list of log and message files, click X in the upper right corner of the pop-up window.

Table 2 lists the NorthStar Controller log files most commonly used to identify and troubleshoot issues with the PCS and PCE.

Table 2: Top NorthStar Controller Troubleshooting Log Files
Log File	Description	Location
pcep_server.log	Log entries related to the PCEP server. The PCEP server maintains the PCEP session. The log contains information about communication between the PCC and the PCE in both directions. To configure verbose PCEP server logging: From the NorthStar Controller CLI, run `pcep_cli`. Type `set log-level all`. Press CTRL-C to exit.	/var/log/jnc
pcs.log	Log entries related to the PCS. The PCS is responsible for path computation. This log includes events received by the PCS from the Toposerver, including provisioning orders. It also contains notification of communication errors and issues that prevent the PCS from starting up properly.	/opt/northstar/logs
toposerver.log	Log entries related to the topology server. The topology server is responsible for maintaining the topology. These logs contain the record of the events between the PCS and the Toposerver, the Toposerver and NTAD, and the Toposerver and the PCE server.	/opt/northstar/logs

Table 3 lists additional log files that can also be helpful for troubleshooting. All of the log files in Table 3 are located under the /opt/northstar/logs directory.

Table 3: Additional Log Files for Troubleshooting NorthStar Controller
Log Files	Description
cassandra.msg	Log events related to the Cassandra database.
ha_agent.msg	HA coordinator log.
mlAdaptor.log	Log for the interface to the transport controller.
net_setup.log	Configuration script log.
nodejs.msg	Log events related to Node.js.
pcep_server.log	Log entries related to communication between the PCC and the PCE in both directions.
pcs.log	Log entries related to the PCS, including any event that the PCS receives from the Toposerver, such as provisioning orders. This log also contains any communication errors and any issues that prevent the PCS from starting up properly.
rest_api.log	Log entries for REST API requests.
toposerver.log	Log entries related to the topology server. Contains the record of the events between the PCS and the topology server, the topology server and NTAD, and the topology server and the PCE server. Note: Any message forwarded to the pcshandler.log file is also forwarded to the pcs.log file.
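If you are working from the NorthStar Controller CLI instead of the Web UI, a small helper like the following Python sketch can show the most recent lines of a log from Table 2 or Table 3. `LOG_DIR` is the directory named in the tables; pass a full path for logs stored elsewhere, such as /var/log/jnc. The keyword filter is a plain, case-sensitive substring match, and which keywords are worth searching for is your assumption, not a documented log format.

```python
from collections import deque
from pathlib import Path
from typing import List, Optional

LOG_DIR = Path("/opt/northstar/logs")  # directory listed in Tables 2 and 3


def tail(path, lines: int = 50, keyword: Optional[str] = None) -> List[str]:
    """Return the last `lines` lines of a log, optionally only those containing `keyword`."""
    recent = deque(maxlen=lines)  # keeps only the newest matching lines in memory
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if keyword is None or keyword in line:
                recent.append(line.rstrip("\n"))
    return list(recent)
```

For example, `tail(LOG_DIR / "pcs.log", lines=20, keyword="error")` returns up to 20 of the newest pcs.log lines that contain the text `error`. Using a bounded `deque` means even a very large log is read once without being held in memory.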

To see logs related to the Junos VM, you must establish a telnet session to the Junos VM. The default IP address for the Junos VM is 172.16.16.2. The Junos VM is responsible for maintaining the necessary BGP, ISIS, or OSPF sessions.

Empty Topology

Figure 3 illustrates the flow of information from the router to the Toposerver that results in the topology display in the NorthStar Controller UI. When the topology display is empty, it is likely this flow has been interrupted. Finding out where the flow was interrupted can guide your problem resolution process.

Figure 3: Topology Information Flow

The topology originates at the routers. For NorthStar Controller to receive the topology, there must be a BGP-LS, ISIS, or OSPF session from one of the routers in the network to the Junos VM. There must also be an established Network Topology Abstractor Daemon (NTAD) session between the Junos VM and the Toposerver.

To check these connections:

Using the NorthStar Controller CLI, verify that the NTAD connection between the Toposerver and the Junos VM was successfully established as shown in this example:
Note:
Port 450 is the port used for Junos VM to Toposerver connections.

In the following example, the NTAD connection has not been established:

Log in to the Junos VM to confirm whether NTAD is configured to enable topology export. The grep command below gives you the IP address of the Junos VM.

If the topology-export statement is missing, the Junos VM cannot export data to the Toposerver.

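Use Junos OS show commands to confirm whether the BGP, ISIS, or OSPF relationship between the Junos VM and the router is up (for example, a BGP session in the Established state, an IS-IS adjacency in the Up state, or an OSPF neighbor in the Full state). If the session is not up, the topology information cannot be sent to the Junos VM.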
On the Junos VM, verify whether the lsdist.0 routing table has any entries:
If you see only zeros in the lsdist.0 routing table, there is no topology that can be sent. Review the NorthStar Controller Getting Started Guide sections on configuring topology acquisition.
Ensure that there is at least one link in the lsdist.0 routing table. The Toposerver can generate an initial topology only if it receives at least one NTAD link event. A network that consists of a single node with no IGP adjacency to other nodes (as is possible in a lab environment, for example) does not allow the Toposerver to generate a topology. Figure 4 illustrates the Toposerver’s logic process for creating the initial topology.
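The NTAD session and lsdist.0 checks can be scripted once you have captured the command output as text. The Python sketch below is illustrative only. It assumes `netstat -an`-style lines captured on the NorthStar server (protocol, two queue columns, local address, foreign address, state), and Junos `show route table lsdist.0` output whose header reads `lsdist.0: N destinations, N routes (N active, ...)` and whose link entries begin with the keyword `LINK`.

```python
import re
from typing import Dict

SUMMARY = re.compile(r"lsdist\.0: (\d+) destinations, (\d+) routes \((\d+) active")


def session_established(netstat_text: str, port: int = 450) -> bool:
    """True if any TCP connection to or from `port` is ESTABLISHED.

    Assumes netstat -an columns: proto, recv-q, send-q, local, foreign, state.
    """
    suffix = f":{port}"
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skip headers, UDP, and UNIX socket lines
        local, foreign, state = fields[3], fields[4], fields[5]
        if state == "ESTABLISHED" and (local.endswith(suffix) or foreign.endswith(suffix)):
            return True
    return False


def lsdist_summary(show_route_text: str) -> Dict[str, int]:
    """Count destinations, routes, and LINK entries in lsdist.0 output."""
    match = SUMMARY.search(show_route_text)
    counts = [int(g) for g in match.groups()] if match else [0, 0, 0]
    destinations, routes, active = counts
    links = sum(1 for line in show_route_text.splitlines() if line.lstrip().startswith("LINK"))
    return {"destinations": destinations, "routes": routes, "active": active, "links": links}
```

Under these assumptions, a Toposerver topology can be expected only when `session_established(...)` is true for port 450 and `lsdist_summary(...)["links"]` is at least 1, mirroring the requirement for at least one NTAD link event.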

Figure 4: Logic Process for Initial Topology Creation

If an initial topology cannot be created for this reason, toposerver.log contains an entry similar to the following example:

NTAD Version

If you see that SR LSPs have not been provisioned and the pcs.log shows messages similar to this example:

The NTAD version might be incorrect. See Installing the NorthStar Controller for information on NTAD versions.

Incorrect Topology

One important function of the Toposerver is to correlate the unidirectional link (interface) information from the routers into bidirectional links by matching source and destination IPv4 Link_Identifiers from NTAD link events. When the topology displayed in the NorthStar UI does not appear to be correct, it can be helpful to understand how the Toposerver handles the generation and maintenance of the bidirectional links.

Generation and maintenance of bidirectional links is a complex process, but here are some key points:

For the two nodes constituting each bidirectional link, the Node ID that was assigned first (and therefore has the lower Node ID number) is given the Node A designation, and the other node is given the Node Z designation.

Note:
The Node ID is assigned when the Toposerver first receives the Node event from NTAD.
Whenever a Node ID is cleared and reassigned (such as during a Toposerver restart or network model reset), the Node IDs and therefore, the A and Z designations, can change.
The Toposerver receives a Link Update message when a link in the network is added or modified.
The Toposerver receives a Link Withdraw message when a link is removed from the network.
The Link Update and Link Withdraw messages affect the operational status of the nodes.
The node operational status, together with the protocol (IGP versus IGP plus MPLS), determines whether a link can be used to route LSPs. For a link to be used to route LSPs, it must have both an operational status of UP and the MPLS protocol active.
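To make the A/Z rule concrete, here is a simplified Python model that pairs unidirectional links into bidirectional ones by matching source and destination IPv4 addresses. It is not the Toposerver's actual algorithm. The `HalfLink` and `BiLink` types are hypothetical, and treating a link as usable only when both directions are UP with MPLS active is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class HalfLink:
    """One direction of a link as reported by a router (hypothetical shape)."""
    node_id: int
    local_ip: str
    remote_ip: str
    oper_up: bool = True
    mpls: bool = True


@dataclass(frozen=True)
class BiLink:
    node_a: int  # lower (earlier-assigned) Node ID
    node_z: int
    usable_for_lsps: bool


def correlate(halves: List[HalfLink]) -> List[BiLink]:
    """Pair half-links whose local/remote IPv4 addresses mirror each other."""
    by_ends = {(h.local_ip, h.remote_ip): h for h in halves}
    paired = set()
    links = []
    for half in halves:
        key = (half.local_ip, half.remote_ip)
        peer = by_ends.get((half.remote_ip, half.local_ip))
        if key in paired or peer is None:
            continue  # already paired, or the reverse direction has not been reported
        paired.update({key, (peer.local_ip, peer.remote_ip)})
        a, z = sorted((half, peer), key=lambda h: h.node_id)
        usable = all(h.oper_up and h.mpls for h in (a, z))
        links.append(BiLink(a.node_id, z.node_id, usable))
    return links
```

Because the A designation follows the lower Node ID, renumbering the nodes (as after a Toposerver restart) can swap A and Z for the same physical link, which is one reason the topology can look different after a reset.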

Missing LSPs

When your topology is displaying correctly but you have missing LSPs, examine the flow of information from the PCC to the Toposerver that results in tunnels being added to the NorthStar Controller UI, as illustrated in Figure 5. The flow begins with the configuration at the PCC, from which an LSP Update message is passed to the PCEP server by way of a PCEP session and then to the Toposerver by way of an Advanced Message Queuing Protocol (AMQP) connection.

Figure 5: LSP Information Flow

To check these connections:

Look at the toposerver.log. The log prints a message every 15 seconds when it detects that its connection with the PCEP server has been lost or was never successfully established. Note that in the following example, the connection between the Toposerver and the PCEP server is marked as down.

Using the NorthStar Controller CLI, verify that the PCEP session between the PCC and the PCEP server was successfully established as shown in this example:
Note:
Port 4189 is the port used for PCC to PCEP server connections.

Knowing that the session has been established is useful, but it does not necessarily mean that any data was transferred.

Verify whether the PCEP server learned about any LSPs from the PCC.

In the far right column of the output, you see the number of LSPs that were learned. If this number is 0, no LSP information was sent to the PCEP server. In that case, check the configuration on the PCC side, as described in the NorthStar Controller Getting Started Guide.
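The checks in this section form a simple decision sequence for each PCC. The following Python sketch encodes that order; the helper name and the per-PCC observations are hypothetical values that you would read from the CLI output described above.

```python
from typing import Dict, Tuple


def diagnose_pcc(session_up: bool, lsps_learned: int) -> str:
    """Apply the Missing LSPs checks, in order, for one PCC."""
    if not session_up:
        return "no PCEP session on port 4189: check the PCEP session from the PCC"
    if lsps_learned == 0:
        return "session up but 0 LSPs learned: check the PCC-side configuration"
    return "ok"


# Hypothetical observations: PCC address -> (session established?, LSPs learned)
observed: Dict[str, Tuple[bool, int]] = {
    "10.0.0.1": (True, 12),
    "10.0.0.2": (True, 0),
    "10.0.0.3": (False, 0),
}

for pcc, (up, count) in sorted(observed.items()):
    print(pcc, diagnose_pcc(up, count))
```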

LSP Controller Statuses

You can view the controller status of LSPs in the Controller Status column in the Tunnels tab of the Network Information table (in the NorthStar Controller GUI).

Table 4 lists the various controller statuses and their descriptions.

Table 4: LSP Controller Statuses
Controller Status	Indicates That
FAILED	The NorthStar Controller has failed to provision the LSP.
PENDING	The PCS has sent an LSP provisioning order to the PCEP server. The PCS is awaiting a response from the PCEP server.
PCC_PENDING	The PCEP server has sent an LSP provisioning order to the PCC. The PCS is awaiting a response from the PCC.
NETCONF_PENDING	The PCS has sent an LSP provisioning order to netconfd. The PCS is awaiting a response from netconfd.
PRPD_PENDING	The PCS has sent an LSP provisioning order to the PRPD client to provision a BGP route. The PCS is awaiting a response from the PRPD client.
SCHEDULED_DELETE	The PCS has scheduled the LSP to be deleted; the PCS will send the deletion provisioning order to the PCC.
SCHEDULED_DISCONNECT	The PCS has scheduled the LSP to be disconnected. The LSP will be moved to Shutdown status; the LSP is retained in the NorthStar datastore with a Persist state associated with it and is not used in CSPF calculations.
NoRoute_Rescheduled	The PCS has not found a path for the LSP. The PCS periodically scans for LSPs that have not been routed, tries to find a path for each one, and then schedules its reprovisioning.
FRR_DETOUR_Rescheduled	The PCS has detoured the LSP and rescheduled the LSP’s re-provisioning.
Provision_Rescheduled	The PCS has scheduled the LSP to be provisioned.
Maint_NotHandled	The LSP is not part of the ongoing maintenance event as the LSP is not controlled by NorthStar.
Maint_Rerouted	The PCS has rerouted the LSP due to maintenance.
Callsetup_Scheduled	The PCS must provision the LSP when the event starts.
Disconnect_Scheduled	The PCS must disconnect the LSP when the event ends.
No path found	The PCS was unable to find a path for the LSP.
Path found on down LSP	The PCEP server has reported that the LSP is Down but the PCS has found a path for the LSP.
Path include loops	The SR-LSP has one or more loops.
Maint_NotReroute_DivPathUp	The LSP is not rerouted due to the maintenance event as there’s a standby path already up and running.
Maint_NotReroute_NodeDown	The LSP is not rerouted as the maintenance event is for the endpoints of the LSP.
PLANNED_LSP	The LSP must be provisioned but is not in the provisioning queue yet.
PLANNED_DISCONNECT	The LSP must be disconnected but is not in the provisioning queue yet.
PLANNED_DELETE	The LSP must be deleted but is not in the provisioning queue yet.
Candidate_ReOptimization	The PCS has selected the LSP as a candidate for reoptimization.
Activated(used_by_primary)	Secondary path for the LSP is activated.
Time_Expired	Scheduled window for the LSP has expired.
PCEP_Capability_not_supported	PCEP might not be supported on the device or, if it is supported, PCEP might not be configured, might be disabled, or might be misconfigured on the device.
De-activated	NorthStar Controller has deactivated the secondary LSP.
NS_ERR_NCC_NOT_FOUND	The NorthStar Controller is unable to use the Netconf Connection Client (NCC) to establish a Netconf connection to the device. Workaround: Restart Netconf on the NorthStar server by running `supervisorctl restart netconf`. The command reports netconf:netconf: stopped, followed by netconf:netconf: started.
SR LSP provisioning requires LSP statefull SR capability	To provision the SR LSP, configure the following statement on the Junos device from the CLI: `set protocols pcep pce <name> spring-capability`
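The four *_PENDING statuses in Table 4 each name the component that has not yet responded, which tells you where to start looking. The Python sketch below captures that mapping. The helper name is hypothetical, and treating an empty status as "check Op Status" follows the description of a cleared controller status in LSP Stuck in PENDING or PCC_PENDING State.

```python
from typing import Dict

# Pending statuses from Table 4 and the component each one is waiting on.
AWAITING: Dict[str, str] = {
    "PENDING": "PCEP server",
    "PCC_PENDING": "PCC",
    "NETCONF_PENDING": "netconfd",
    "PRPD_PENDING": "PRPD client",
}


def where_to_look(controller_status: str) -> str:
    """Suggest a starting point based on an LSP's Controller Status value."""
    if not controller_status:
        # A cleared status means provisioning completed; Op Status is now the indicator.
        return "controller status cleared: check the Op Status column"
    component = AWAITING.get(controller_status)
    if component is not None:
        return f"waiting on the {component}: start troubleshooting there"
    return "not a pending status: see its description in Table 4"
```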

PCC That is Not PCEP-Enabled

The Toposerver associates the PCEP sessions with the nodes in the topology from the TED in order to make a node PCEP-enabled. This Toposerver function is hindered if the IP address used by the PCC to establish the PCEP session was not the one automatically learned by the Toposerver from the TED. For example, if a PCEP session is established using the management IP address, the Toposerver will not receive that IP address from the TED.

When the PCC successfully establishes a PCEP session, it sends a PCC_SYNC_COMPLETE message to the Toposerver. This message indicates to NorthStar that synchronization is complete. The following is a sample of the corresponding toposerver log entries, showing both the PCC_SYNC_COMPLETE message and the PCEP IP address that NorthStar might or might not recognize:

Some options for correcting the problem of an unrecognized IP address are:

Manually input the unrecognized IP address in the device profile in the NorthStar Web UI by navigating to More Options > Administration > Device Profile.
Ensure there is at least one LSP originating on the router, which will allow Toposerver to associate the PCEP session with the node in the TED database.
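To check whether a PCEP session address is one that the Toposerver could have learned, you can compare it against the addresses known for each node in the TED. The following Python sketch performs that comparison; the `ted_addresses` mapping is a hypothetical input that you would assemble yourself, not a NorthStar data structure.

```python
import ipaddress
from typing import Dict, Iterable, Optional


def node_for_pcep_address(pcep_ip: str, ted_addresses: Dict[str, Iterable[str]]) -> Optional[str]:
    """Return the TED node that owns `pcep_ip`, or None if no node advertises it."""
    target = ipaddress.ip_address(pcep_ip)  # parses and normalizes the address text
    for node, addresses in ted_addresses.items():
        if any(ipaddress.ip_address(a) == target for a in addresses):
            return node
    return None
```

With a hypothetical `ted = {"PE1": {"10.0.0.1", "192.168.10.1"}, "PE2": {"10.0.0.2"}}`, `node_for_pcep_address("10.0.0.2", ted)` returns `"PE2"`, while a management address such as `172.25.1.10` returns `None`, which is the situation described at the start of this section.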

Once the IP address problem is resolved, and the Toposerver is able to successfully associate the PCEP session with the node in the topology, it adds the PCEP IP address to the node attributes as can be seen in the PCS log:

LSP Stuck in PENDING or PCC_PENDING State

Once nodes are correctly established as PCEP-enabled, you can start provisioning LSPs. The LSP controller status might indicate PENDING or PCC_PENDING, as seen in the Controller Status column of the Tunnels tab in the Web UI network information table. This section explains how to interpret those statuses.

When an LSP is being provisioned, the PCS server computes a path that satisfies all the requirements for the LSP, and then sends a provisioning order to the PCEP server. Log messages similar to the following example appear in the PCS log while this process is taking place:

The LSP controller status is PENDING at this point, meaning that the provisioning order has been sent to the PCEP server, but an acknowledgement has not yet been received. If an LSP is stuck at PENDING, the problem likely lies with the PCEP server. You can log in to the PCEP server and configure verbose log messages, which can provide additional troubleshooting information:

There are also a variety of show commands on the PCEP server that can display useful information. Just as with Junos OS syntax, you can enter show ? to see the show command options.

If the PCEP server successfully receives the provisioning order, it performs two actions:

It forwards the order to the PCC.
It sends an acknowledgement back to the PCS.

The PCEP server log would show an entry similar to the following example:

The LSP controller status changes to PCC_PENDING, indicating that the PCEP server received the provisioning order and forwarded it on to the PCC, but the PCC has not yet responded. If an LSP is stuck at PCC_PENDING, it suggests that the problem lies with the PCC.

If the PCC receives the provisioning order successfully, it sends a response to the PCEP server, which, in turn, forwards the response to the PCS. When the PCS receives this response, it clears the LSP controller status completely, indicating that the LSP is fully provisioned and is not waiting for action from the PCEP server or PCC. The operational status (Op Status column) then becomes the indicator for the condition of the tunnel.

The PCS log would show an entry similar to the following example:

LSP That is Not Active

Even if an LSP provisioning order is successfully sent and acknowledged, and the controller status is cleared, the LSP might still not be up and running. An operational status of DOWN means that the PCC could not signal the LSP. This section explores some of the possible reasons for the LSP operational status to be DOWN.

Utilization is a key concept related to LSPs that are stuck in DOWN. There are two types of utilization, and they can be different from each other at any specific time:

Live utilization—This type is used by the routers in the network to signal an LSP path. This type of utilization is learned from the TED by way of NTAD. You might see PCS log entries such as those in the following example. In particular, note the reservable bandwidth (reservable_bw) entries that advertise the RSVP utilization on the link:

Planned utilization—This type is used within NorthStar Controller for path computation. This utilization is learned from PCEP when the router advertises the LSP and communicates to NorthStar the LSP bandwidth and the path the LSP is to use. You might see PCS log entries such as those in the following example. In particular, note the bandwidth (bw) and record route object (RRO) entries that advertise the RSVP utilization on the link:

The two utilizations can differ enough to interfere with successful computation or signaling of the path. For example, if the planned utilization is higher than the live utilization, the PCS might be unable to compute the path because it calculates that there is no room for it, even though the lower live utilization shows that there may well be room.

It’s also possible for the planned utilization to be lower than the live utilization. In that case, the PCC does not signal the path because it thinks there is no room for it.
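The comparison between planned and live utilization can be sketched as follows. This Python example assumes that you have per-link values for both utilizations in the same unit (here, bits per second); the link names, the helper name, and the optional tolerance are hypothetical.

```python
from typing import Dict


def utilization_mismatches(planned: Dict[str, float], live: Dict[str, float],
                           tolerance: float = 0.0) -> Dict[str, str]:
    """Flag links where planned and live RSVP utilization differ by more than `tolerance`."""
    findings = {}
    for link in sorted(planned.keys() & live.keys()):
        diff = planned[link] - live[link]
        if diff > tolerance:
            findings[link] = "planned > live: PCS may find no room for a path that would fit"
        elif diff < -tolerance:
            findings[link] = "planned < live: PCC may be unable to signal the computed path"
    return findings


planned_bps = {"L1": 800e6, "L2": 100e6, "L3": 500e6}
live_bps = {"L1": 500e6, "L2": 300e6, "L3": 500e6}
for link, finding in utilization_mismatches(planned_bps, live_bps).items():
    print(link, finding)
```

A nonzero tolerance keeps small, transient differences from being reported; choosing its value is up to you.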

To view utilization in the Web UI topology map, navigate to Options in the left pane of the Topology view. If you select RSVP Live Utilization, the topology map reflects the live utilization that comes from the routers. If you select RSVP Utilization, the topology map reflects the planned utilization which is computed by the NorthStar Controller based on planned properties.

A better troubleshooting tool in the Web UI is the Network Model Audit widget in the Dashboard view. The Link RSVP Utilization line item reflects whether there are any mismatches between the live and the planned utilizations. If there are, you can try executing Sync Network Model from the Web UI by navigating to Administration > System Settings, and then clicking Advanced Settings in the upper right corner of the resulting window.

Note:

The upper right corner button toggles between General Settings and Advanced Settings.

PCS Out of Sync with Toposerver

If the PCS becomes out of sync with Toposerver such that they do not agree on the state of LSPs, you must deactivate and reactivate the PCEP protocol in order to restore synchronization. Perform the following steps on the NorthStar server.

CAUTION:

Be aware that following this procedure:

Kills the PCEP sessions for all PCCs, not just the one with which there is a problem.
Results in the loss of all user data, which must then be repopulated.
Has an impact on a production system due to the resynchronization.

Stop the PCE server and wait 10 seconds to allow the PCC to remove all lingering LSPs.
Restart the PCE server.
Restart Toposerver.
Note:
An alternative way to restart Toposerver is to perform a Reset Network Model from the NorthStar Controller web UI (Administration > System Settings, Advanced). See the Disappearing Changes section for more information about the Sync Network Model and Reset Network Model operations.
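If you script this procedure, the order of the steps and the 10-second wait matter more than the tooling. The Python sketch below drives `supervisorctl` through `subprocess`. The program names `northstar:pceserver` and `northstar:toposerver` are assumptions, so confirm the actual names with `supervisorctl status` before use. The runner and sleep functions are injectable so that the sequence can be checked without touching a live system.

```python
import subprocess
import time
from typing import Callable

# Assumed supervisord program names; confirm them with `supervisorctl status`.
PCE_PROGRAM = "northstar:pceserver"
TOPO_PROGRAM = "northstar:toposerver"


def supervisorctl(action: str, program: str) -> None:
    """Run one supervisorctl action, raising CalledProcessError on failure."""
    subprocess.run(["supervisorctl", action, program], check=True)


def resync_pcs_and_toposerver(
    run: Callable[[str, str], None] = supervisorctl,
    sleep: Callable[[float], None] = time.sleep,
    settle_seconds: float = 10.0,
) -> None:
    """Stop the PCE server, wait, start it again, then restart the Toposerver."""
    run("stop", PCE_PROGRAM)      # drops the PCEP sessions for ALL PCCs
    sleep(settle_seconds)         # gives the PCCs time to remove lingering LSPs
    run("start", PCE_PROGRAM)
    run("restart", TOPO_PROGRAM)
```

The caution above still applies: running this sequence affects every PCC, not only the one with the problem.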

Disappearing Changes

Two options are available in the Web UI for synchronizing the topology with the live network. These options are only available to the system administrator, and can be accessed by first navigating to Administration > System Settings, and then clicking Advanced Settings in the upper right corner of the resulting window.

Note:

The upper right corner button toggles between General Settings and Advanced Settings.

Figure 6 shows the two options that are displayed.

Figure 6: Synchronization Operations

It is important to be aware that if you execute Reset Network Model in the Web UI, you will lose changes that you’ve made to the database. In a multi-user environment, one user might reset the network model without the knowledge of the other users. When a reset is requested, the request goes from the PCS server to the Toposerver, and the PCS log reflects:

The Toposerver log then reflects that database elements are being removed:

The Toposerver then requests a synchronization with both the Junos VM to retrieve the topology nodes and links, and with the PCEP server to retrieve the LSPs. In this way, the Toposerver relearns the topology, but any user updates are missing. Figure 7 illustrates the flow from the topology reset request to the request for synchronization with the Junos VM and the PCEP Server.

Figure 7: Reset Model Request

Upon receipt of the synchronization requests, Junos VM and the PCEP server return topology updates that reflect the current live network. The PCS log shows this information being added to the database:

Figure 8 illustrates the return of topology updates from the Junos VM and the PCEP Server to the Toposerver and the PCS.

Figure 8: Model Updates Using Reset Network Model

You should use the Reset Network Model when you want to start over from scratch with your topology, but if you don’t want to lose user planning data when synchronizing with the live network, execute the Sync Network Model operation instead. With this operation, the PCS still requests a topology synchronization, but the Toposerver does not delete the existing elements. Figure 9 illustrates the flow from the PCS to the Junos VM and PCEP server, and the updates coming back to the Toposerver.

Figure 9: Synchronization Request and Model Updates Using Sync Network Model
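The difference between the two operations can be summarized with a toy model in which the network model is a dictionary of elements and their attributes. This Python sketch only illustrates the behavior described in this section and is not Toposerver code: Reset replaces the model with what the live network reports, while Sync updates and adds live elements without deleting existing elements or attributes. The element and attribute names are hypothetical.

```python
import copy
from typing import Dict

Model = Dict[str, Dict[str, object]]  # element name -> attributes


def reset_model(model: Model, live: Model) -> Model:
    """Reset: discard the existing model (including user edits) and keep only live data."""
    return copy.deepcopy(live)


def sync_model(model: Model, live: Model) -> Model:
    """Sync: update or add live elements without deleting existing elements or attributes."""
    merged = copy.deepcopy(model)
    for name, attributes in live.items():
        merged.setdefault(name, {}).update(attributes)
    return merged


model = {"L1": {"bw": 10, "user_note": "planned"}, "L9": {"bw": 1}}
live = {"L1": {"bw": 20}, "L2": {"bw": 5}}
print(reset_model(model, live))  # the user_note attribute and element L9 are gone
print(sync_model(model, live))   # both survive; L1 bandwidth is updated, L2 is added
```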

Investigating Client Side Issues

If you are looking for the source of a problem and cannot find it on the server side of the system, a debugging flag can help you find it on the client side. The flag enables detailed messages in the web browser console about what has been exchanged between the server and the client. For example, if you notice that an update is not reflected in the Web UI, these detailed messages can help you identify a possible miscommunication between the server and the client, such as the server not actually sending the update.

To enable this debug flag, append ?debug=true to the URL you use to launch the Web UI.

Note:

If you are already in the Web UI, it is not necessary to log out; simply add ?debug=true to the URL and press Enter. The UI reloads.
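If you open the Web UI from a bookmark or a script, the following Python sketch adds `debug=true` to a URL while preserving any existing query parameters and fragment. The host names and paths in the examples are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_debug(url: str) -> str:
    """Return `url` with debug=true in its query string, keeping other parameters and any fragment."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "debug"]
    query.append(("debug", "true"))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, `with_debug("https://ns.example.net:8443/")` returns `https://ns.example.net:8443/?debug=true`. Splitting the URL first, rather than appending text to the end, keeps the parameter ahead of any `#` fragment.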

Figure 10 shows an example of the web browser console with detailed debugging messages.

Figure 10: Web Browser Console with Debugging Messages

Accessing the console varies by browser. Figure 11 shows an example: accessing the console on Google Chrome.

Figure 11: Accessing the Google Chrome Console

Incomplete Results of the Bandwidth Sizing Scheduled Task

If execution of the bandwidth sizing scheduled task does not publish statistics for all the bandwidth sizing-enabled LSPs, check whether traffic statistics are being collected for all of those LSPs for the scheduled duration. If traffic statistics are not available, those LSPs cannot be resized.

You can use the NorthStar Controller web UI to determine whether traffic statistics are being collected:

Open the Tunnels tab in the network information table.
Select the LSPs that have not been resized.
Right-click and select View LSP Traffic.
Click custom in the upper left corner, enter the scheduled duration, and click Submit.
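Before resizing, you can check which LSPs lack samples in the scheduled window. The Python sketch below assumes that you have exported, for each LSP, the timestamps of its collected traffic samples; that input format and the helper name are hypothetical.

```python
from datetime import datetime
from typing import Dict, Iterable, List


def lsps_missing_stats(samples: Dict[str, Iterable[datetime]],
                       start: datetime, end: datetime) -> List[str]:
    """Return the names of LSPs that have no traffic sample within [start, end]."""
    return sorted(
        lsp for lsp, times in samples.items()
        if not any(start <= t <= end for t in times)
    )
```

Any LSP this returns cannot be resized for that window, which matches the symptom of incomplete bandwidth sizing results.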

Troubleshooting NorthStar Integration with HealthBot

If updating a device to HealthBot fails in NorthStar, first check for errors in the NorthStar web application server logs:

The HealthBot API server logs might also provide helpful information if updating a device to HealthBot fails:

To determine whether RPM probe data and LDP demand statistics collection is working, access the IAgent container logs. IAgent is used for RPM data (link latency) and LDP demand statistics collection.

To determine whether JTI LSP and interface statistics data collection is working, access the fluentd container logs. Native GPB is used for JTI data collection.

To determine if statistics data is being notified from the HealthBot server to the PCS, access the PCS logs to see live statistics notification information:

Collecting NorthStar Controller Debug Files

If you are unable to resolve a problem with the NorthStar Controller, we recommend that you forward the debug files generated by the NorthStar Controller debugging utility to JTAC for evaluation. Currently all debug files are located in subdirectories under the u/wandl/tmp directory.

To collect debug files, log in to the NorthStar Controller CLI and run `u/wandl/bin/system-diagnostic.sh filename`.

The output is saved as the filename.tbz2 debug file in the /tmp directory.
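Before forwarding the bundle to JTAC, you might confirm that it is a readable bzip2-compressed tar archive. This Python sketch uses the standard `tarfile` module; the helper name is hypothetical, and it only lists member names without extracting anything.

```python
import tarfile
from pathlib import Path
from typing import List


def bundle_members(name: str, directory: str = "/tmp") -> List[str]:
    """List the members of <directory>/<name>.tbz2; raises an error if the archive is unreadable."""
    path = Path(directory) / f"{name}.tbz2"
    with tarfile.open(path, mode="r:bz2") as bundle:
        return bundle.getnames()
```

Calling `bundle_members("filename")` reads /tmp/filename.tbz2, the file produced by the command above.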

Remote Syslog

Most NorthStar processes use rsyslog, which is configured in /etc/rsyslog.conf. For detailed information about using rsyslog, refer to http://www.rsyslog.com/doc for the specific rsyslog version running on your Linux system.
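To see whether any logs are already forwarded to a remote host, you can look for forwarding actions in /etc/rsyslog.conf. This Python sketch recognizes only the traditional selector syntax, in which `@host` forwards over UDP and `@@host` forwards over TCP. RainerScript `action(type="omfwd" ...)` statements are not handled, and files pulled in through include directives are not read.

```python
import re
from typing import List, Tuple

# A selector, whitespace, then @ (UDP) or @@ (TCP) followed by the target host[:port].
FORWARD_RULE = re.compile(r"^\s*(?P<selector>\S+)\s+(?P<proto>@@?)(?P<target>[^\s;]+)")


def forwarding_rules(conf_text: str) -> List[Tuple[str, str, str]]:
    """Return (selector, protocol, target) for each legacy forwarding rule in rsyslog.conf text."""
    rules = []
    for line in conf_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # commented-out rule
        match = FORWARD_RULE.match(line)
        if match:
            protocol = "tcp" if match["proto"] == "@@" else "udp"
            rules.append((match["selector"], protocol, match["target"]))
    return rules
```

For example, a line such as `*.* @@logs.example.net:514` (hypothetical host) is reported as `("*.*", "tcp", "logs.example.net:514")`.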

Increasing the Scale of SNMP Collection

To increase the scale of SNMP collection within a polling interval of 5 minutes, perform the following tasks:

By using a text editing tool like vi, open the supervisord_snmp_slave.conf file for editing.

The configuration file opens.

Add the following command to increase the number of threads from 100 to 200:

Add more workers (for example, worker3) by duplicating the preceding worker:

Add the workers in the group statement:

Best Practice:
The number of workers should not exceed the number of cores in the CPU.
Restart the collector:* group in supervisord:
View the supervisorctl status of worker1, worker2, and worker3 to confirm that they are up and running:
Ensure that you see a few worker1 processes in the output but only one parent process each for worker2 and worker3:
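The best-practice limit on workers can be expressed as a simple cap. This Python sketch, with a hypothetical helper name, returns worker names up to the smaller of the requested count and the number of CPU cores reported by `os.cpu_count()`.

```python
import os
from typing import List, Optional


def plan_workers(requested: int, cores: Optional[int] = None) -> List[str]:
    """Return names worker1..workerN, where N never exceeds the CPU core count."""
    available = cores if cores is not None else (os.cpu_count() or 1)
    count = max(1, min(requested, available))
    return [f"worker{i}" for i in range(1, count + 1)]
```

On a 4-core server, `plan_workers(6)` yields only `worker1` through `worker4`.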

To increase the number of threads for higher scalability, perform the following tasks:
1. By using a text editing tool like vi, open the data_gateway.py file for editing.
  
  The configuration file opens.
2. Increase the number of threads in the pool from 10 to 20:
3. Stop the collector_main:data_gateway process and restart the process:
To increase the throughput for higher scalability, perform the following tasks:
1. By using a text editing tool like vi, open the es_publisher.cfg file for editing.
  
  The configuration file opens.
2. Configure the following parameters:
  Note:
  The maximum number of records to be sent in a single operation to the ElasticSearch database (batch_size) is 5000, while the maximum number of threads (in a thread pool) that can be run to collect SNMP statistics (pool_size) is 20.
To collect data from more router interfaces per poll, perform the following tasks:
1. Navigate to the Device Profile (Administration > Device Profile) page in the NorthStar Controller GUI.
2. In the Device List, select a router and click Modify.
  
  The Modify Device(s) page appears.
3. In the Name column of the User-defined Properties tab, specify the name of the property as bulk_size. In the Value column, configure the bulk size as 100.
  
  Bulk size indicates the number of interfaces collected each time the network is polled.
4. Click Modify.
  
  You are redirected to the Device Profile page, where a confirmation message appears, indicating that the changes are saved.
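If bulk_size is read as the number of interfaces fetched per request, you can estimate how many requests a router needs per polling cycle. This Python sketch is a back-of-the-envelope calculation under that assumption; the helper name is hypothetical.

```python
import math


def requests_per_poll(interface_count: int, bulk_size: int = 100) -> int:
    """Estimate how many requests are needed to collect all interfaces of one router."""
    if bulk_size <= 0:
        raise ValueError("bulk_size must be a positive integer")
    return math.ceil(interface_count / bulk_size)
```

For a router with 250 interfaces, a bulk size of 100 needs 3 requests, while a bulk size of 25 would need 10.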
