NorthStar Analytics Raw and Aggregated Data Retention

Raw data logs are retained in Elasticsearch for a user-configurable number of days. Data is also rolled up (aggregated) every hour, and the aggregated data is retained for a separate user-configurable number of days. The purpose of aggregation is to make longer retention of data more feasible given limited disk space. When you modify these retention parameters, keep in mind that longer retention periods consume more of your storage resources.

Stored hourly aggregated data filenames use the following format: rollups-northstar-yyyy-mm-dd.
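The naming format above can be built and parsed with a few lines of Python. This is a minimal sketch for illustration only: the helper names `rollup_name` and `rollup_day` are hypothetical and are not part of NorthStar, and the sketch assumes that yyyy-mm-dd is a zero-padded calendar date.

```python
from datetime import date

# Prefix taken from the documented format rollups-northstar-yyyy-mm-dd.
ROLLUP_PREFIX = "rollups-northstar-"


def rollup_name(day: date) -> str:
    """Build the rollup name for one calendar day (hypothetical helper)."""
    return f"{ROLLUP_PREFIX}{day:%Y-%m-%d}"


def rollup_day(name: str) -> date:
    """Recover the calendar day from a rollup name.

    Raises ValueError if the prefix or the date portion is malformed.
    """
    if not name.startswith(ROLLUP_PREFIX):
        raise ValueError(f"not a rollup name: {name!r}")
    return date.fromisoformat(name[len(ROLLUP_PREFIX):])
```

Parsing the date back out of the name is handy when you want to compare the age of stored rollup data against a retention value.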

The parameters described in Table 1 work together to control data retention and aggregation behaviors. You can modify these settings using the NorthStar CLI as described in Configuring NorthStar Settings Using the NorthStar CLI in the NorthStar Controller/Planner Getting Started Guide. Use the `set northstar system scheduler tasks` command hierarchy to access all of these parameters.

Table 1: Data Retention and Aggregation Parameters
Parameter	Description
interval (collection-cleanup)	To modify, use the `set northstar system scheduler tasks collection-cleanup interval` command. Controls how often the collection-cleanup system task runs, in number of days. The units can be entered as “d” or “days” (for example, 1days or 4d). The default is one day (1d). To disable collection cleanup, set the value to 0d. The collection-cleanup task is called from the NorthStar server and executes the collector-utils.py script at approximately 1:00 AM, NorthStar server time. The script uses the Elasticsearch APIs to purge “old” data: logs of raw data older than the value of the raw-data-retention-duration parameter, and logs of hourly aggregated data older than the value of the rollup-data-retention-duration parameter. You can view (but not modify) the cleanup task by navigating to `Administration` > `Task Scheduler`.
raw-data-retention-duration	To modify, use the `set northstar system scheduler tasks collection-cleanup raw-data-retention-duration` command. Defines what is considered an “old” log of raw data in number of days. The units can be entered as “d” or “days”. The default is 14 days (14d or 14days), meaning that raw data logs are retained in Elasticsearch for 14 days. To disable the retention of raw data logs, set the value to 0d.
rollup-data-retention-duration	To modify, use the `set northstar system scheduler tasks collection-cleanup rollup-data-retention-duration` command. Defines what is considered “old” aggregated data in number of days. The units can be entered as “d” or “days”. The default is 180 days (180d or 180days), meaning that hourly aggregated data is retained in Elasticsearch for 180 days. To disable retention of aggregated data, set the value to 0d.
interval (rollup)	To modify, use the `set northstar system scheduler tasks rollup interval` command. Sets how often the ESRollup system task runs, in number of hours. The units can be entered as “h” or “hours”. The default is 1 hour (1h or 1hours). The ESRollup task is called from the NorthStar server and executes the esrollup.py script, which uses the Elasticsearch APIs to aggregate the previous interval’s data. Note: We recommend that you do not change the default value except to disable aggregation. To disable data aggregation, set the value to 0h. You can view (but not modify) the rollup task by navigating to `Administration` > `Task Scheduler`.
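The value formats in Table 1 (a number followed by “d”, “days”, “h”, or “hours”) and the notion of “old” data can be illustrated with a short Python sketch. This is not the logic of collector-utils.py or esrollup.py, which is not shown in this document; `parse_duration` and `is_old` are hypothetical helpers, and the strict "older than" comparison is an assumption.

```python
import re
from datetime import datetime, timedelta

# Accepts the unit spellings listed in Table 1: d, days, h, hours.
_DURATION = re.compile(r"^(\d+)(days|d|hours|h)$")


def parse_duration(value: str) -> timedelta:
    """Parse values such as '14d', '1days', '1h', or '1hours'.

    A zero value (for example '0d' or '0h') parses to timedelta(0).
    Table 1 describes zero as "disabled", so callers should check for
    it explicitly rather than treat it as a real time window.
    """
    match = _DURATION.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {value!r}")
    amount, unit = int(match.group(1)), match.group(2)
    if unit.startswith("d"):
        return timedelta(days=amount)
    return timedelta(hours=amount)


def is_old(timestamp: datetime, retention: timedelta, now: datetime) -> bool:
    """Return True if data stamped `timestamp` is older than `retention`.

    Assumption: "older than" is a strict comparison against `now`.
    """
    return now - timestamp > retention
```

For example, with the default raw-data-retention-duration of 14d, `is_old(stamp, parse_duration("14d"), now)` is true only for data stamped more than 14 days before `now`.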

The NorthStar REST API supports telemetry data aggregation with the additional parameters described in Table 2. See the NorthStar REST API documentation for more information.

Table 2: Additional Aggregation Parameters Used for API Queries
Parameter	Description
disable-rollup-query	If set, disables queries against the hourly aggregated (rollup) data.
rollup-query-cutoff-interval	If set, and the requested time range is greater than rollup-query-cutoff-interval from now, the query searches the rollup index.
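One way to read the two parameters in Table 2 together is sketched below in Python. This is an interpretation, not the REST API implementation: the function `uses_rollup_index` is hypothetical, "greater than rollup-query-cutoff-interval from now" is assumed to mean that the query start time lies further back from the current time than the cutoff, and an unset cutoff is assumed to mean the rollup index is not used. Consult the NorthStar REST API documentation for the actual behavior.

```python
from datetime import datetime, timedelta
from typing import Optional


def uses_rollup_index(
    query_start: datetime,
    now: datetime,
    cutoff: Optional[timedelta],
    rollup_query_disabled: bool = False,
) -> bool:
    """Decide whether a query would search the rollup index.

    cutoff models rollup-query-cutoff-interval (None means "not set").
    rollup_query_disabled models disable-rollup-query.
    """
    if rollup_query_disabled:
        # disable-rollup-query turns off rollup queries entirely.
        return False
    if cutoff is None:
        # Assumption: with no cutoff set, the rollup index is not used.
        return False
    # Assumption: compare how far back the query reaches against the cutoff.
    return now - query_start > cutoff
```

Under these assumptions, a query reaching 10 days back with a 7-day cutoff would search the rollup index, while a query for the last 3 hours would not.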

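To give you an example of how the retention parameters work together, suppose you set the following values under the `set northstar system scheduler tasks collection-cleanup` hierarchy: interval to 7d, raw-data-retention-duration to 30d, and rollup-data-retention-duration to 800d.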

In this example, raw data logs older than 30 days and hourly aggregated data logs older than 800 days are set to be purged every seven days.
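The arithmetic behind this example can be worked through in Python. This sketch only illustrates the date calculations implied by the three values; it does not reproduce the cleanup script, and the names `purge_cutoffs` and `worst_case_age` are hypothetical.

```python
from datetime import date, timedelta

# Values from the example: cleanup every 7 days, raw data kept 30 days,
# hourly aggregated data kept 800 days.
CLEANUP_INTERVAL = timedelta(days=7)
RAW_RETENTION = timedelta(days=30)
ROLLUP_RETENTION = timedelta(days=800)


def purge_cutoffs(run_day: date) -> dict:
    """Return the dates before which data counts as "old" on a cleanup run."""
    return {
        "raw": run_day - RAW_RETENTION,
        "rollup": run_day - ROLLUP_RETENTION,
    }


def worst_case_age(retention: timedelta,
                   interval: timedelta = CLEANUP_INTERVAL) -> timedelta:
    """Estimate the oldest data that can still be present just before a run.

    Data that falls just inside the retention window on one run is not
    examined again until the next run, one interval later.
    """
    return retention + interval
```

Because cleanup runs only every seven days in this example, data that just misses one run stays on disk until the next one. Under that reasoning, raw data could be roughly 37 days old and aggregated data roughly 807 days old before it is purged, which is worth factoring into disk space planning.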

The data included in the rollup tasks (aggregation types, fields, and counters) is defined in the view-only esrollup_config.json file located in the /opt/northstar/utils directory.

To view the system tasks that launch the esrollup.py and collector-utils.py scripts, navigate to Administration > Task Scheduler in the NorthStar web UI. In the Task list, the Name column indicates CollectionCleanup or ESRollup Task. In the Type column, they are designated as ExecuteScript. An example is shown in Figure 1.

Figure 1: Task List Showing System Tasks

There is an optional column in the task list that indicates whether each task is a system task. Hover over any column heading, click the down arrow that appears, and highlight Columns to display a list of available columns. Click the check box for System Task to select the System Task column (true/false) for inclusion in the display.

When you select a system task, Summary, Status, and History tabs are available at the bottom of the window.
