NorthStar Analytics Raw and Aggregated Data Retention
Raw data logs are retained in Elasticsearch for a user-configurable number of days. Data is also rolled up (aggregated) every hour and retained for a user-configurable number of days. The purpose of aggregation is to make longer retention of data more feasible given limited disk space. When you modify these retention parameters, keep in mind that longer retention periods require more of your storage resources.
Stored hourly aggregated data filenames use the following format: rollups-northstar-yyyy-mm-dd.
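
As a small illustration of the naming format above, the following Python sketch builds the name for a given date. The function name rollup_name is hypothetical and is not part of NorthStar; only the rollups-northstar-yyyy-mm-dd pattern comes from the text.

```python
from datetime import date


def rollup_name(day: date) -> str:
    """Return the rollup name for the given day (hypothetical helper)."""
    # %Y-%m-%d yields a zero-padded yyyy-mm-dd date, matching the documented format.
    return day.strftime("rollups-northstar-%Y-%m-%d")


print(rollup_name(date(2021, 3, 7)))  # rollups-northstar-2021-03-07
```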
The parameters described in Table 1 work together to control data retention and aggregation behaviors. You can modify these settings using the NorthStar CLI as described in Configuring NorthStar Settings Using the NorthStar CLI. Use the set northstar system scheduler tasks command hierarchy to access all of these parameters.
| Parameter | Description |
|---|---|
| interval (collection-cleanup) | To modify, use the set northstar system scheduler tasks collection-cleanup interval command. Controls how often the collection-cleanup system task runs, in number of days. The units can be entered as “d” or “days” (for example, 1days or 4d). The default is one day (1d). To disable collection cleanup, set the value to 0d. This task executes the collector-utils.py script, which runs at approximately 1:00 AM, NorthStar server time, and uses the Elasticsearch APIs to clean up “old” data, as defined by the raw-data-retention-duration and rollup-data-retention-duration parameters. The collection-cleanup task is called from the NorthStar server. You can view (but not modify) the cleanup task by navigating to Administration > Task Scheduler. |
| raw-data-retention-duration | To modify, use the set northstar system scheduler tasks collection-cleanup raw-data-retention-duration command. Defines what is considered an “old” log of raw data, in number of days. The units can be entered as “d” or “days”. The default is 14 days (14d or 14days), meaning that raw data logs are retained in Elasticsearch for 14 days. To disable the retention of raw data logs, set the value to 0d. |
| rollup-data-retention-duration | To modify, use the set northstar system scheduler tasks collection-cleanup rollup-data-retention-duration command. Defines what is considered “old” aggregated data, in number of days. The units can be entered as “d” or “days”. The default is 180 days (180d or 180days), meaning that hourly aggregated data is retained in Elasticsearch for 180 days. To disable retention of aggregated data, set the value to 0d. |
| interval (rollup) | To modify, use the set northstar system scheduler tasks rollup interval command. Sets how often the ESRollup system task runs, in number of hours. The units can be entered as “h” or “hours”. The default is 1 hour (1h or 1hours). The ESRollup system task executes the esrollup.py script to aggregate the previous interval’s data. The esrollup.py script uses the Elasticsearch APIs to perform the data aggregation. The ESRollup task is called from the NorthStar server. You can view (but not modify) the rollup task by navigating to Administration > Task Scheduler. Note: We recommend that you do not change this default value except to disable aggregation. To disable data aggregation, set the value to 0h. |
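
The value formats in Table 1 (a number followed by “d”, “days”, “h”, or “hours”, with zero meaning disabled) can be illustrated with a short Python sketch. This is only an assumption-based illustration of the syntax described above; how NorthStar itself validates these values is not documented here, and the names parse_duration and is_disabled are hypothetical.

```python
import re
from typing import Tuple

# Unit spellings taken from Table 1: "d"/"days" for days, "h"/"hours" for hours.
_UNITS = {"d": "days", "days": "days", "h": "hours", "hours": "hours"}
_PATTERN = re.compile(r"^(\d+)(d|days|h|hours)$")


def parse_duration(value: str) -> Tuple[int, str]:
    """Split a value such as '4d' or '1hours' into (count, normalized unit)."""
    match = _PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {value!r}")
    return int(match.group(1)), _UNITS[match.group(2)]


def is_disabled(value: str) -> bool:
    """Per Table 1, a zero value (0d or 0h) disables the task or retention."""
    count, _unit = parse_duration(value)
    return count == 0


for example in ("1days", "4d", "1h", "0d"):
    print(example, parse_duration(example), is_disabled(example))
```

The sketch accepts both unit families for any parameter; a stricter version could reject hours for the day-based parameters and days for the rollup interval.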
The NorthStar REST API supports telemetry data aggregation with the additional parameters described in Table 2. See the NorthStar REST API documentation for more information.
| Parameter | Description |
|---|---|
| disable-rollup-query | If set, disables queries against the hourly aggregated (rollup) data. |
| rollup-query-cutoff-interval | If set, and the requested time range is greater than rollup-query-cutoff-interval from now, the query searches the rollup index. |
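
One way to read the two parameters in Table 2 is sketched below in Python. This is not the NorthStar implementation: the function name uses_rollup_index is hypothetical, the sketch assumes that “greater than rollup-query-cutoff-interval from now” means the start of the requested range lies further in the past than the cutoff, and it assumes that an unset cutoff means the rollup index is not used.

```python
from datetime import datetime, timedelta
from typing import Optional


def uses_rollup_index(
    range_start: datetime,
    now: datetime,
    cutoff: Optional[timedelta],
    disable_rollup_query: bool = False,
) -> bool:
    """Decide whether a query would be served from the rollup index (sketch)."""
    # disable-rollup-query turns rollup queries off entirely.
    if disable_rollup_query:
        return False
    # Assumption: with no cutoff configured, queries go to the raw data.
    if cutoff is None:
        return False
    # Assumption: compare how far back the requested range starts with the cutoff.
    return now - range_start > cutoff


now = datetime(2021, 1, 10, 12, 0)
print(uses_rollup_index(now - timedelta(days=5), now, timedelta(days=2)))
print(uses_rollup_index(now - timedelta(hours=6), now, timedelta(days=2)))
```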
To give you an example of how aggregation parameters work together, suppose you set the following:
collection-cleanup interval=7d raw-data-retention-duration=30d rollup-data-retention-duration=800d
In this example, the cleanup task runs every seven days and, each time it runs, purges raw data logs older than 30 days and hourly aggregated data older than 800 days.
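
The arithmetic behind this example can be sketched in Python. The helper names purge_cutoffs and is_purged are hypothetical, the run date is made up for illustration, and the sketch assumes that “older than” is a strict comparison on whole days, which the text does not specify.

```python
from datetime import date, timedelta
from typing import Dict


def purge_cutoffs(run_day: date, raw_days: int, rollup_days: int) -> Dict[str, date]:
    """Return the oldest day still retained for each data type on a cleanup run."""
    return {
        "raw": run_day - timedelta(days=raw_days),
        "rollup": run_day - timedelta(days=rollup_days),
    }


def is_purged(data_day: date, cutoff: date) -> bool:
    """Assumption: data strictly older than the cutoff day is purged."""
    return data_day < cutoff


# Values from the example: raw data kept 30 days, aggregated data kept 800 days.
cutoffs = purge_cutoffs(date(2021, 6, 1), raw_days=30, rollup_days=800)
print(cutoffs["raw"])  # 2021-05-02
print(is_purged(date(2021, 4, 15), cutoffs["raw"]))     # True: older than 30 days
print(is_purged(date(2021, 4, 15), cutoffs["rollup"]))  # False: within 800 days
```

Because the cleanup task in this example runs only every seven days, data can sit on disk for up to about a week past its retention duration before the next run removes it.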
The data included in the rollup tasks (aggregation types, fields, and counters) is defined in the view-only esrollup_config.json file located in the /opt/northstar/utils directory.
To view the system tasks that launch the esrollup.py and collector-utils.py scripts, navigate to Administration > Task Scheduler in the NorthStar web UI. In the Task list, the Name column indicates CollectionCleanup or ESRollup Task. In the Type column, they are designated as ExecuteScript. An example is shown in Figure 1.
There is an optional column in the task list that indicates whether each task is a system task. Hover over any column heading, click the down arrow that appears, and highlight Columns to display a list of available columns. Click the check box for System Task to select the System Task column (true/false) for inclusion in the display.
When you select a system task, Summary, Status, and History tabs are available at the bottom of the window.