Guidelines for Aggregating Junos Telemetry Interface Data
One important feature of the Junos telemetry interface is that data processing occurs at the collector that receives the streamed data, rather than on the device. Data is not automatically aggregated, but it can be aggregated for analysis.
Data aggregation is useful in the following scenarios:
- Data for the same metric over fixed spans of time, such as the average number of physical interface ingress errors over a 30-second interval.
- Data from different sources (such as multiple line cards) for the same metric, such as label-switched path (LSP) statistics or filter counter statistics.
- Data from multiple sources, such as input and output statistics for aggregated Ethernet interfaces.
The following sections describe how to perform data aggregation for various scenarios. The examples in these sections use the InfluxDB time-series database to accept queries on telemetry data. InfluxDB is an open source database written in Go specifically to handle time-series data.
Aggregating Data Over Fixed Time Spans
Aggregating data for the same metric over fixed spans of time is a common and useful way to detect trends. Metrics can include gauges, that is, single values, or cumulative counters. You might also want to aggregate data continuously.
Example: Aggregating Data for Gauge Metrics
In this example, data for JuniperNetworksSensors.jnpr_interface_ext.interface_stats.egress_queue_info.current_buffer_occupancy from port.proto is written to the InfluxDB database in a measurement called current_buffer_occupancy, with tags that identify the host name, the interface name, and the corresponding queue number. See Table 1 for the specific values used in this example.
Time Stamp (seconds) | Value | Tags |
---|---|---|
1458704133 | 1547 | queue_number=0,interface_name='xe-1/0/0',host='sjc-a' |
1458704143 | 3221 | queue_number=0,interface_name='xe-1/0/0',host='sjc-a' |
1458704155 | 4860 | queue_number=0,interface_name='xe-1/0/0',host='sjc-a' |
1458704166 | 6550 | queue_number=0,interface_name='xe-1/0/0',host='sjc-a' |
Each measurement data point has a timestamp and a recorded value. In this example, the tag queue_number is the numerical identifier of the interface queue.
To aggregate this data over 30-second intervals, use the following InfluxDB query:
select mean(value) from current_buffer_occupancy where time >= $time_start and time <= $time_end and queue_number='0' and interface_name='xe-1/0/0' and host='sjc-a' group by time(30s)
For $time_start and $time_end, specify the actual range of time.
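To make the windowing concrete, the following Python sketch reproduces the 30-second mean on the sample values from Table 1 without a database. It is an illustration only: the function name mean_per_window is hypothetical, and the sketch assumes windows aligned to multiples of 30 seconds since the epoch.

```python
from collections import defaultdict

# Sample points from Table 1: (timestamp in seconds, current_buffer_occupancy).
POINTS = [
    (1458704133, 1547),
    (1458704143, 3221),
    (1458704155, 4860),
    (1458704166, 6550),
]


def mean_per_window(points, window=30):
    """Return {window_start: mean value} for fixed, epoch-aligned windows.

    Hypothetical helper for illustration; not part of any Junos or InfluxDB API.
    """
    buckets = defaultdict(list)
    for timestamp, value in points:
        # Align each point to the start of the window that contains it.
        buckets[timestamp - timestamp % window].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


if __name__ == "__main__":
    for start, mean in mean_per_window(POINTS).items():
        print(start, round(mean, 2))
```

With this alignment, the first three points in Table 1 fall into the window that starts at 1458704130, and the fourth point falls into the window that starts at 1458704160.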
Example: Aggregating Data for Cumulative Statistics
Some Junos telemetry interface sensors report cumulative counter values, such as the number of ingress packets, defined as JuniperNetworksSensors.jnpr_interface_ext.interface_stats.ingress_stats.packets.
It is common to derive traffic rates from packet or byte counters. Unlike with gauge metrics, the initial data point in the series for cumulative counters is used only to set the baseline.
Use the following guidelines to create a database query for cumulative statistics:
- Calculate the cumulative value for a specific time interval. You can calculate either an average among several data points recorded during the time interval, or you can interpolate a value. All data points should belong to the same series. If a counter reset has occurred between the two data points reported at different times, do not use both data points.
- Determine the appropriate value for the previous time interval. If a counter has been reset since the last update, declare that value as unavailable.
- If the value for the previous interval is available, calculate the difference between the two data points and derive the traffic rate from it.
These guidelines are summarized in the following InfluxDB query. This query assumes that data is stored in the measurement ingress_packets. The query uses the same tags as the gauge metric example as well as the tag for counter initialization time, init_time. The query uses average values over a 30-second time interval. It calculates the rate for the metrics that have the same counter initialization time.
select non_negative_derivative(mean(value)) from ingress_packets where time >= $time_start and time <= $time_end and interface_name='xe-1/0/0' and host='sjc-a' group by time(30s), init_time
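The guidelines above can also be sketched in Python, which may help when checking query results by hand. This is a minimal sketch under stated assumptions: the function name rates_by_series and the sample values are hypothetical, each sample carries its init_time, and the rate is expressed per second.

```python
from collections import defaultdict


def rates_by_series(points, window=30):
    """Derive per-second rates from cumulative counter samples.

    points: iterable of (timestamp_s, counter_value, init_time).
    Returns {init_time: [(window_start, rate_per_second), ...]}.
    Hypothetical helper for illustration only.
    """
    # Guideline 1: average the samples per window, one series per init_time,
    # so values from before and after a counter reset are never mixed.
    buckets = defaultdict(lambda: defaultdict(list))
    for ts, value, init_time in points:
        buckets[init_time][ts - ts % window].append(value)

    rates = {}
    for init_time, windows in buckets.items():
        series = []
        previous = None  # (window_start, mean) of the prior window in this series
        for start in sorted(windows):
            mean = sum(windows[start]) / len(windows[start])
            # Guidelines 2 and 3: a rate needs a previous window from the same
            # series; the first window of a series only sets the baseline.
            if previous is not None and mean >= previous[1]:
                elapsed = start - previous[0]
                series.append((start, (mean - previous[1]) / elapsed))
            previous = (start, mean)
        rates[init_time] = series
    return rates


# Hypothetical samples: the counter is reset between t=50 and t=65,
# which is reflected by init_time changing from 1 to 2.
SAMPLES = [
    (5, 100, 1), (20, 200, 1), (35, 400, 1), (50, 500, 1),
    (65, 30, 2), (80, 90, 2), (95, 300, 2), (110, 420, 2),
]

if __name__ == "__main__":
    print(rates_by_series(SAMPLES))
```

Because each init_time forms its own series, the drop in the counter after the reset never produces a rate; the first window after the reset is treated as a new baseline.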
Use the following query to calculate the number of packets received over an interval of time, without deriving the rate.
select difference(mean(value)) from ingress_packets where time >= $time_start and time <= $time_end and interface_name='xe-1/0/0' and host='sjc-a' group by time(30s), init_time
In some cases, more than one aggregated data point is returned by the query for a particular time interval. For example, four data points are available for a time interval. Two data points have init_time t0, and the other two have init_time t1. You can run a query that uses the last change timestamp tag, last_change, instead of init_time, to calculate the difference and to derive the rate between the two data points with the same last change timestamp.
select difference(mean(value)) from ingress_packets where time >= $time_start and time <= $time_end and interface_name='xe-1/0/0' and host='sjc-a' group by time(30s), last_change
These queries can all be run as continuous queries and can periodically populate new time-series measurements.
Aggregating Data From Multiple Sources
Certain metrics are reported from multiple line cards or packet forwarding engines. It is useful to aggregate data derived from different sources in the following scenarios:
- Packet and byte counts for label-switched paths (LSPs) are reported separately by each line card. However, path computation element controllers require a view of LSPs for the entire device.
- For Juniper Networks devices that support virtual output queues, the tail drop or random early detection drop statistics for each queue are reported separately by each line card for every physical interface. It is useful to be able to aggregate the statistics for all the line cards for an interface.
- Filter counters for a firewall filter attached to a forwarding table or to an aggregated Ethernet interface are reported separately by each line card. It is useful to aggregate the statistics for all the line cards.
To aggregate data from multiple sources, perform the following steps:
1. Aggregate data for a specific period of time for each source, for example, each line card.
2. Aggregate the data you derive for each source in step 1.
For data stored in an InfluxDB database, you can complete step 1 in the procedure by running a continuous query and populating a new measurement. We strongly recommend that you group the data points according to each source. For example, for LSP statistics, the component_id in the gpb message identifies the line card sending the data. Group the data points based on each unique component_id.
Example: Aggregating Data from Multiple Sources
In this example, you run two queries to derive the LSP packet rate for data from all line cards.
First, you run the following continuous query on the measurement named lsp_packet_count, grouping the data by the component_id tag and the counter_name tag. Each unique component_id tag corresponds to a different line card. This query populates a new measurement, lsp_packet_rate.
select non_negative_derivative(mean(value)) as value into lsp_packet_rate from lsp_packet_count group by time(30s), component_id, counter_name, host
The LSP statistics sensor does not report counter initialization time.
Use the new measurement derived from this continuous query, lsp_packet_rate, to run the following query, which aggregates data from all line cards for packet rates for an LSP named lsp-sjc-den-1.
select sum(value) from lsp_packet_rate where counter_name='lsp-sjc-den-1' and host='sjc-a'
Because this query does not group data according to the component_id tag, or line card, the LSP packet rates from all components, or line cards, are returned.
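The two-step procedure can be mirrored in a short Python sketch for a single LSP. The function names and sample values are hypothetical. The sketch also sums per 30-second window, which is an assumption about the intended result and would correspond to adding group by time(30s) to the second query.

```python
from collections import defaultdict


def per_source_window_means(samples, window=30):
    """Step 1a: mean counter value per (component_id, window_start).

    samples: iterable of (timestamp_s, component_id, counter_value).
    """
    buckets = defaultdict(list)
    for ts, component_id, value in samples:
        buckets[(component_id, ts - ts % window)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


def per_source_rates(means, window=30):
    """Step 1b: non-negative per-second rate between consecutive windows, per source."""
    rates = {}
    for (component_id, start), mean in means.items():
        prev = means.get((component_id, start - window))
        if prev is not None and mean >= prev:
            rates[(component_id, start)] = (mean - prev) / window
    return rates


def device_rates(rates):
    """Step 2: sum the per-source rates into one device-wide value per window."""
    totals = defaultdict(float)
    for (_component_id, start), rate in rates.items():
        totals[start] += rate
    return dict(sorted(totals.items()))


# Hypothetical packet counters for one LSP reported by two line cards (0 and 1).
SAMPLES = [
    (10, 0, 100), (40, 0, 400), (70, 0, 1000),
    (12, 1, 50), (42, 1, 200), (72, 1, 500),
]

if __name__ == "__main__":
    means = per_source_window_means(SAMPLES)
    print(device_rates(per_source_rates(means)))
```

Computing the rate per line card before summing keeps a counter restart on one line card from distorting the values derived for the others.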
Aggregating Data for Multiple Metrics
It can be useful to aggregate metrics for multiple values. For example, for aggregated Ethernet interfaces, you would typically want to track packet and byte rates for each interface member as well as interface utilization for the aggregated link.
Example: Aggregating Multiple Metric Values
In this example, you run the following two queries:
- Continuous query to derive ingress packet counts for each member link in an aggregated Ethernet interface
- Query to aggregate packet count data for all the member links that belong to the same aggregated Ethernet interface
The following continuous query derives a new measurement, ingress_packets_difference, from the ingress_packets measurement for each member link in an aggregated Ethernet interface. The interface_name tag identifies each member interface. You also use the parent-ae-name tag to identify membership in a specific aggregated Ethernet interface. Grouping each member link with the parent-ae-name tag ensures that data is collected only for current member links. For example, an interface might change its membership during the reporting interval. Grouping member interfaces with the specific aggregated Ethernet interface means that data for the member link will not be transferred to the new aggregated Ethernet interface of which it is now a member.
select difference(mean(value)) as value into ingress_packets_difference from ingress_packets group by time(30s), component_id, interface_name, host, "parent-ae-name"
The following query aggregates data for the ingress packets for the aggregated Ethernet interface, that is, for all member links.
select sum(value) from ingress_packets_difference where "parent-ae-name"='ae0' and host='sjc-a'
This query aggregates data for aggregated Ethernet interface ae0. The parent-ae-name tag does not verify the actual member links.
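As a rough illustration of why grouping by the parent-ae-name tag matters, the following Python sketch sums per-member packet differences for one aggregated Ethernet interface. The function name ae_totals and the sample records are hypothetical, and the sketch assumes one difference value per member link and 30-second window.

```python
from collections import defaultdict


def ae_totals(differences, parent_ae, host):
    """Sum per-member packet differences for one aggregated Ethernet interface.

    differences: iterable of dicts with the keys time, value, interface_name,
    host, and parent-ae-name. Hypothetical helper for illustration only.
    """
    totals = defaultdict(float)
    for point in differences:
        # Only points tagged with the requested parent interface are counted.
        if point["parent-ae-name"] == parent_ae and point["host"] == host:
            totals[point["time"]] += point["value"]
    return dict(sorted(totals.items()))


# Hypothetical data: xe-1/0/1 moves from ae0 to ae1 before the window at t=60.
DIFFERENCES = [
    {"time": 30, "value": 100, "interface_name": "xe-1/0/0", "host": "sjc-a", "parent-ae-name": "ae0"},
    {"time": 30, "value": 50, "interface_name": "xe-1/0/1", "host": "sjc-a", "parent-ae-name": "ae0"},
    {"time": 60, "value": 120, "interface_name": "xe-1/0/0", "host": "sjc-a", "parent-ae-name": "ae0"},
    {"time": 60, "value": 70, "interface_name": "xe-1/0/1", "host": "sjc-a", "parent-ae-name": "ae1"},
]

if __name__ == "__main__":
    print(ae_totals(DIFFERENCES, "ae0", "sjc-a"))
    print(ae_totals(DIFFERENCES, "ae1", "sjc-a"))
```

In this sample, the packets counted on xe-1/0/1 while it belonged to ae0 stay with ae0, and only the data recorded after the move contributes to ae1.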