Using Custom Telemetry Data in an IBA Probe

SUMMARY This topic describes how to create an IBA probe that ingests your custom telemetry data, detects anomalies, and stores them in a historical database for reference.

So far in our walkthrough, we've created a custom telemetry collector service that defines the data you want to collect from your devices. Now let's ingest this data into IBA probes in your blueprint so that Apstra can visualize and analyze the data.

Create a Probe

First, we'll create a new probe in your deployed blueprint so that Apstra can ingest data from your custom telemetry collector. In this example, we'll focus on a minimal set of configurations for the simple use case of visualizing BFD session data and generating anomalies (alerts) when sessions are down.
Note:

Both Data Center and Freeform blueprints support IBA probes that use custom telemetry collection.

  1. From your blueprint, navigate to Analytics > Probes, and then click Create Probe > New Probe.
  2. Enter a name (in this example, BFD-Example-Probe) and an optional description, then click Add Processor.
  3. Select a processor type. For our example, we selected the Extensible Service Data Collector processor.
  4. Click Add to add the processor to the probe. See the Juniper Apstra User Guide for information about the different processors.
  5. Click Create to create the probe and return to the table view.
  6. To the right of the Graph Query field, click the Select a predefined graph query button, then select DC – All managed devices (any role) from the Predefined Query drop-down.
    This query determines the scope within the blueprint in which the telemetry collection runs. If a device in your blueprint is not matched by the graph query, the telemetry collection service does not start for that device.

    The graph query matches all system nodes in the graph database of your blueprint. Each managed device, such as a leaf switch or spine switch, appears as a system node in the graph.

    The predefined query we selected above matches all nodes of type system that are in deploy mode and have a role of leaf, access, spine, or superspine. (A sketch of this query appears after these steps.)

  7. Click Update to return to the table view.
  8. In the System ID field, enter system.system_id. This entry tells the probe that the graph query matches your managed devices under the name system (name='system').
    The system_id attribute on each system node is the system ID of the device. Apstra uses this attribute to uniquely identify each device.
  9. Select BFD from the Service name drop-down list.
  10. Select the Data Type.
    • Select Dynamic Text if your telemetry service collects string as the value type.

    • Select Dynamic Number if the service collects integer as the value type.

    In our example, we chose Dynamic Text because the BFD session state contains the string values Up and Down.

  11. Click Create Probe.
  12. Navigate to the output stage of the data collector processor to verify that the probe is correctly ingesting data from your custom telemetry collector.
    Congratulations! You successfully created a probe!
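
For reference, the predefined query selected in step 6 corresponds to a graph query along the lines of the following sketch. The exact query text can differ between Apstra versions; the role list comes from the description above, and name='system' is the capture name that the System ID entry system.system_id refers to.

    # Sketch of a graph query that matches all managed devices (any role).
    # The matched node is captured under the name 'system', which is why the
    # System ID field can reference it as system.system_id.
    node('system', name='system', deploy_mode='deploy',
         role=is_in(['leaf', 'access', 'spine', 'superspine']))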

Customize a Probe

We created a working probe that collects the BFD state for every device in your network. Now let’s explore a couple of useful customization options to fine-tune your probe.

Service Interval

The service interval determines how often your telemetry collection service fetches data from devices and ingests it into the probe. This interval is an important parameter to be aware of because an overly aggressive interval can cause excessive load on your devices. The optimal interval depends on the data you are collecting. For example, a collector fetching the contents of a large routing table with thousands of entries causes a higher load than one collecting the status of a handful of BFD sessions.
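
If you manage probes through the Apstra REST API instead of the UI, the interval travels with the collector processor's properties. The property names below are illustrative placeholders, not an exact schema; the point is that the interval is set per collector and should reflect how expensive the collection is.

    # Illustrative only: property names are placeholders, not exact API field names.
    collector_properties = {
        "service_name": "bfd",       # the custom telemetry collector created earlier
        "service_interval": 120,     # seconds between collections; increase for heavy collectors
    }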

Query Tag Filter

Another useful customization option is the Query Tag Filter. Let's say you tagged some switches in your blueprint as storage for a specific monitoring use case. You can configure this filter to run the telemetry collection only on devices that carry the matching tag.
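
The effect is the same as adding a tag constraint to the graph query. The following sketch is an assumption-based illustration (it presumes Apstra's graph model, where a tag node connects to the system node it tags, and reuses the storage tag from this example) rather than the literal Query Tag Filter syntax:

    # Illustrative only: match systems that carry the 'storage' tag.
    # Assumes tag nodes link to the system nodes they tag in the graph model.
    match(
        node('tag', label='storage')
            .out()
            .node('system', name='system', deploy_mode='deploy')
    )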

Displaying the raw data from your custom telemetry collector shows only the raw values, so it can be difficult to tell whether they represent a normal or anomalous state of your network. With Apstra, you are proactively notified when an anomaly is detected.

Performing Analytics

An IBA probe functions as an analytics pipeline. All IBA probes have at least one source processor at the start of their pipeline. In our example, we added an Extensible Service Data Collector processor that ingests data from your custom telemetry collector.

You can chain additional processors in the probe to perform additional analytics on the data to provide more meaningful insight into your network’s health. These processors are referred to as analytics processors.

Analytics processors enable you to aggregate and apply logic to your data and define an intended state (or a reference state) to raise anomalies. For instance, you might not be interested in instantaneous values of raw telemetry data, but rather in an aggregation or trends.

Analytics processors aggregate information by calculating averages, minimums/maximums, standard deviations, and so on. You can then compare the aggregated data against expectations to identify whether the data is inside or outside a specified range; if it is outside the range, an anomaly is raised. You might also want to check whether an anomalous condition is sustained for a period of time that exceeds a specific threshold, so that anomalies are not flagged for transient or temporary conditions. You can achieve this by configuring a Time_In_State processor.
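
As a rough, non-Apstra illustration of that last point, the following Python sketch shows the kind of check that a range check combined with a Time_In_State processor performs: a value outside the expected range is flagged only after it stays out of range for a minimum duration.

    # Conceptual sketch only; inside Apstra this logic runs in the probe pipeline.
    def sustained_anomaly(samples, threshold, min_duration):
        """Return True if the value stays at or above `threshold` for `min_duration` seconds.

        `samples` is a list of (timestamp_seconds, value) tuples in time order.
        """
        breach_start = None
        for ts, value in samples:
            if value >= threshold:
                if breach_start is None:
                    breach_start = ts            # breach begins: start the timer
                if ts - breach_start >= min_duration:
                    return True                  # sustained long enough: raise an anomaly
            else:
                breach_start = None              # condition cleared: reset the timer
        return False

    # Example: one BFD session down (count >= 1) for 90 seconds, with a 60-second hold time.
    print(sustained_anomaly([(0, 0), (30, 1), (60, 1), (120, 1)], threshold=1, min_duration=60))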

Table 1 describes the different types of analytics processors.

Table 1: Analytics Processors

Range processors

  Processor names: Range, State, Time_In_State, Match_String

  Range processors define a reference state and generate anomalies.

Grouping processors

  Processor names: Match_Count, Match_perc, Set_Count, Sum, Avg, Min, Max, and Std_Dev

  Grouping processors aggregate and process data before feeding it into the range processors. These processors can:

  • Produce a per-device count of protocol states.

  • Produce a sum of counters from multiple devices to represent a total over the fabric.

Multi-input processors

  Processor names: Match_Count, Match_perc, Set_Count, Sum, Avg, Min, Max, and Std_Dev

  Multi-input processors take input from multiple stages. These processors can:

  • Produce a single output data set that is a union of input from multiple stages.

  • Perform a logical comparison between input from multiple stages.

For detailed descriptions of all analytics processors, see Probe Processor (Analytics) in the Juniper Apstra User Guide.

Note:

Multi-input processors are not supported for dynamic data types (dynamic text or dynamic number).

In the next section, we'll configure our BFD example probe to detect and raise anomalies.

Raising Anomalies and Storing Historical Data

Now we'll configure our example probe to detect and raise anomalies if a BFD session goes down and store the anomalies in a historical database for reference.
  1. Open the probe you created in Create a Probe, then click Add Processor to add a second processor.
  2. Select the Match Count processor and give the processor a descriptive name, such as Down sessions count.
    The Match Count processor counts the number of BFD sessions in the Down state and groups the count by device.
  3. Configure the second processor and enter Down in the Reference State field.
    This configuration connects the probe pipeline so that the output of the previous processor feeds into this one.
    When you update the probe, the output shows the number of BFD sessions in the Down state for each device.
  4. Add the third and final processor. This processor raises anomalies to alert you when one or more BFD sessions are in the Down state.
  5. Click Add Processor, then select the Range processor.
    Give the processor a descriptive name (in this example, BFD anomaly (down > 0)), then click Add.
  6. Configure the third processor.
    1. Under Input Stage, enter the Stage Name, then select value for the Column name. In our example, we defined the stage name as Down sessions count.

    2. Set the Anomalous Range to More than or equal to, and enter 1.

    3. Click Raise Anomaly.

  7. While still in the probe configuration interface, click Enable Metric Logging, then select the output stage for your second processor. This action enables historical logging of data.
  8. Click Update the Probe.
    If you have any BFD sessions in the Down state, the probe generates anomalies for the BFD sessions.
  9. Check Enable Streaming in the probe configuration.
  10. Finally, select the Data source: Time Series view to see the history of changes in the data value monitored by this stage.
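
If you prefer to automate this instead of using the UI, the same three-stage pipeline can be outlined as a probe payload for the Apstra REST API. The sketch below is a simplified assumption: processor type identifiers, property names, and the exact endpoint vary between Apstra versions, so treat every key as a placeholder rather than an exact schema.

    # Simplified outline of the probe built above; all field names are illustrative.
    probe = {
        "label": "BFD-Example-Probe",
        "processors": [
            {   # 1. Source processor: ingests data from the custom 'bfd' collector
                "name": "bfd_sessions",
                "type": "extensible_data_collector",
                "properties": {"service_name": "bfd", "data_type": "dynamic_text"},
                "outputs": {"out": "bfd_session_state"},
            },
            {   # 2. Counts sessions in the Down state, grouped by device
                "name": "Down sessions count",
                "type": "match_count",
                "inputs": {"in": {"stage": "bfd_session_state", "column": "value"}},
                "properties": {"reference_state": "Down"},
                "outputs": {"out": "down_count"},
            },
            {   # 3. Raises an anomaly when the count is 1 or more
                "name": "BFD anomaly (down > 0)",
                "type": "range_check",
                "inputs": {"in": {"stage": "down_count", "column": "value"}},
                "properties": {"range": {"min": 1}, "raise_anomaly": True},
                "outputs": {"out": "bfd_down_anomaly"},
            },
        ],
    }
    # Posted, for example, with: POST /api/blueprints/<blueprint_id>/probes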