RCI Fault Model (API)
You can access complete Apstra API documentation from the web interface in the Platform > Developers section.
- A blueprint is associated with zero or more Root Cause Identification instances.
- Root Cause Identification instances are enabled (created) and disabled (deleted) via the CRUD API for the Root Cause Identification sub-resource under the blueprint.
- The instances that can be created depend on the reference design of the blueprint. In this first phase of Root Cause Identification, only two_stage_l3clos has Root Cause Identification support, and it allows only one Root Cause Identification instance per blueprint.
Create Root Cause Identification Instance
POST /api/blueprints/<blueprint_id>/arca
Request payload schema:
    {
        "model_name": s.String(),            # Name of ARCA instance's system fault model (ref design specific)
        "trigger_period": s.Float(min=10.0)  # ARCA instance runs every <trigger_period> seconds.
    }
Example for blueprints for ref design two_stage_l3clos:
    {
        "model_name": "default",
        "trigger_period": 10.0
    }
Return values:
    201 - Successfully created the RCI instance. Response payload: {"id": <RCI instance ID>}
          The ID is used in GET, PUT, and DELETE requests.
    404 - Blueprint does not exist or is not deployed.
    422 - Validation error. Response payload: {"error": <message>}
          Possible error messages:
          - Model name is not found for the reference design
          - An ARCA instance already exists for given model name
          - trigger_period is too small
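The create call above can be sketched in Python using only the standard library. This is a minimal illustration, not the product's client: the helper name build_create_request, the base_url value, and the extra_headers parameter are assumptions. Authentication is not described in this section, so any required headers are left to the caller.

```python
import json
import urllib.request

MIN_TRIGGER_PERIOD = 10.0  # lower bound from the schema: s.Float(min=10.0)


def build_create_request(base_url, blueprint_id, model_name="default",
                         trigger_period=10.0, extra_headers=None):
    """Build (but do not send) the POST request that creates an RCI instance.

    base_url and extra_headers are hypothetical parameters; authentication
    headers are not covered here and must be supplied by the caller.
    """
    if trigger_period < MIN_TRIGGER_PERIOD:
        # Local check that mirrors the documented 422 "trigger_period is too small".
        raise ValueError("trigger_period must be at least 10.0 seconds")
    payload = {"model_name": model_name, "trigger_period": float(trigger_period)}
    headers = {"Content-Type": "application/json"}
    headers.update(extra_headers or {})
    return urllib.request.Request(
        url=f"{base_url}/api/blueprints/{blueprint_id}/arca",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    # "apstra.example.com" and "bp-123" are placeholder values.
    req = build_create_request("https://apstra.example.com", "bp-123")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen would send it. Note that urlopen raises urllib.error.HTTPError for the 404 and 422 responses rather than returning them.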
Update Root Cause Identification Instance
Using the PUT API, you can tweak the execution frequency of the Root Cause Identification instance.
PUT /api/blueprints/<blueprint_id>/arca/<arca_id>
Request payload schema:
    {
        "trigger_period": s.Float(min=10.0)
    }
Return values:
    200 - Update succeeded.
    404 - ARCA instance not found.
    422 - Validation error. Response payload: {"error": <message>}
          Possible error messages:
          - trigger_period is too small
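One way to handle the three documented return values of the update call is to translate them into Python results and exceptions. The sketch below assumes a hypothetical helper update_trigger_period; the opener parameter (defaulting to urllib.request.urlopen) exists only so the call can be exercised without a live server, and authentication headers are again left to the caller.

```python
import json
import urllib.error
import urllib.request


def update_trigger_period(base_url, blueprint_id, arca_id, trigger_period,
                          extra_headers=None, opener=urllib.request.urlopen):
    """PUT a new trigger_period and map the documented status codes.

    Returns True on 200. Raises LookupError on 404 and ValueError (carrying
    the server's "error" message) on 422. Other HTTP errors are re-raised.
    """
    headers = {"Content-Type": "application/json"}
    headers.update(extra_headers or {})  # e.g. authentication (not covered here)
    req = urllib.request.Request(
        url=f"{base_url}/api/blueprints/{blueprint_id}/arca/{arca_id}",
        data=json.dumps({"trigger_period": float(trigger_period)}).encode("utf-8"),
        headers=headers,
        method="PUT",
    )
    try:
        with opener(req) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            raise LookupError("ARCA instance not found") from err
        if err.code == 422:
            body = json.loads(err.read() or b"{}")
            raise ValueError(body.get("error", "validation error")) from err
        raise
```

Raising distinct exception types keeps the calling code free of status-code checks; a caller that prefers return values could map the same codes to an enum instead.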
Get Root Cause Identification Instance Status
Using the GET API, you can obtain the current status (set of root causes) of the Root Cause Identification instance.
GET /api/blueprints/<blueprint_id>/arca/<arca_id>
Return values:
    200 - See response schema below.
    404 - ARCA instance not found.
Response payload schema:
    {
        "id": String,                             # ARCA instance ID
        "model_name": String,                     # see POST payload
        "trigger_period": Float,                  # see POST payload
        "state": Enum("created", "operational"),
        "config_updated_at": Timestamp,           # of last update to instance via POST/PUT
        "status_updated_at": Timestamp,           # of last update to ARCA results
        "root_cause_count": Integer(min=0),       # Number of root causes identified
        "root_causes": List(ROOT_CAUSE_OBJ)       # Actual root causes
    }
Timestamps are in ISO 8601 format in the UTC timezone, e.g. "2018-10-16T22:12:34+0000".
If state == "created", then:
- status_updated_at == UNIX epoch
- root_cause_count == 0
- the "root_causes" key is not returned
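A client reading this payload has to parse the timestamp format shown above and tolerate the missing "root_causes" key while the instance is still in the "created" state. The following Python sketch shows one way to do both; the helper names parse_timestamp and read_status are illustrative, not part of the API.

```python
from datetime import datetime, timezone

# Matches the documented example "2018-10-16T22:12:34+0000".
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S%z"
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_timestamp(text):
    """Parse a timestamp of the documented form into a timezone-aware datetime."""
    return datetime.strptime(text, TIMESTAMP_FORMAT)


def read_status(instance):
    """Return (has_results, status_updated_at, root_causes) for a GET payload.

    In the "created" state the "root_causes" key is absent, so it defaults
    to an empty list and has_results is False.
    """
    updated = parse_timestamp(instance["status_updated_at"])
    root_causes = instance.get("root_causes", [])
    has_results = instance["state"] == "operational"
    return has_results, updated, root_causes
```

Checking the state field, rather than comparing status_updated_at with the epoch, keeps the logic independent of how the epoch value is serialized.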
Each ROOT_CAUSE_OBJ has the following schema:
    {
        "id": String,                   # Unique ID for the root cause in the ARCA instance
        "context": String,              # Encoded context such as references to graph nodes
        "description": String,          # Human-readable text, e.g. "link <blah> broken"
        "timestamp": Timestamp,         # of when RC is detected (ISO8601 format)
        "symptoms": List(SYMPTOM_OBJ),  # List of symptoms; always non-empty
    }
Notes on root cause detection and IDs:
- A root cause may be detected multiple times over the blueprint’s lifetime. For instance, a root cause is defined for a broken cable between spine1 and leaf1. This root cause can appear at any time, and it may disappear once the problem is fixed.
- A root cause has a unique ID scoped to the ARCA instance. This means that the ID may appear and disappear according to whether the problem occurs or gets fixed, e.g. the cable gets broken or reconnected.
- What to expect as the root cause ID: in two_stage_l3clos, the root cause ID is a composition of graph node and relationship IDs and an immutable but readable name of the root cause. Example: <graph link node id>/broken
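Because a root cause ID is stable within the ARCA instance and appears or disappears with the underlying problem, a polling client can detect changes by comparing the IDs in two consecutive "root_causes" lists. The sketch below is one possible approach; diff_root_causes is a hypothetical helper, and the IDs in the usage example are made up.

```python
def diff_root_causes(previous, current):
    """Compare the "root_causes" lists from two consecutive GET responses.

    Returns (appeared, cleared): root causes whose ID is new in `current`,
    and root causes whose ID was present in `previous` but is now gone.
    """
    prev_by_id = {rc["id"]: rc for rc in previous}
    curr_by_id = {rc["id"]: rc for rc in current}
    appeared = [rc for rc_id, rc in curr_by_id.items() if rc_id not in prev_by_id]
    cleared = [rc for rc_id, rc in prev_by_id.items() if rc_id not in curr_by_id]
    return appeared, cleared


if __name__ == "__main__":
    # Hypothetical IDs following the "<graph link node id>/broken" pattern.
    before = [{"id": "link-1/broken"}, {"id": "link-2/broken"}]
    after = [{"id": "link-2/broken"}, {"id": "link-3/broken"}]
    appeared, cleared = diff_root_causes(before, after)
    print("appeared:", [rc["id"] for rc in appeared])
    print("cleared:", [rc["id"] for rc in cleared])
```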
Each SYMPTOM_OBJ has the following schema:
    {
        "id": String,           # Unique ID for the symptom in the ARCA instance
        "context": String,      # Encoded context such as system ID, service name
        "description": String,  # Readable, e.g. "interface swp1 on leaf1 is down"
    }
Given the same ARCA system fault model, the set of symptom IDs is always the same for a given root cause. However, the context may differ. For instance, the symptom "interface swp1 on leaf1 is down" is the same, while the context of different instances of this symptom may carry different system IDs, depending on which system ID is assigned to leaf1 when the root cause for this symptom is detected.
Example symptom ID: <graph interface node id>/down
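To illustrate the distinction between a stable symptom ID and a varying context, the sketch below compares two detections of the same root cause. The helper names and the sample IDs and context strings are hypothetical; the context is treated as an opaque string because its encoding is not specified here.

```python
def symptom_ids(root_cause):
    """Return the set of symptom IDs carried by one ROOT_CAUSE_OBJ."""
    return {symptom["id"] for symptom in root_cause["symptoms"]}


def changed_contexts(earlier, later):
    """Map symptom ID -> (old context, new context) where the context differs.

    Both arguments are ROOT_CAUSE_OBJ dicts for the same root cause ID,
    taken from two separate detections.
    """
    old = {s["id"]: s["context"] for s in earlier["symptoms"]}
    return {
        s["id"]: (old[s["id"]], s["context"])
        for s in later["symptoms"]
        if s["id"] in old and old[s["id"]] != s["context"]
    }
```

For two detections of one root cause, symptom_ids should return equal sets, while changed_contexts reports, for example, a leaf that was assigned a different system ID in between.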
List Root Cause Identification Instances
GET /api/blueprints/<blueprint_id>/arca
Return values:
    200 - See response schema below.
    404 - Blueprint not found or blueprint not deployed.
Response schema:
    {
        "items": List(ARCA_INSTANCE_DIGEST),  # list may be empty
    }
ARCA_INSTANCE_DIGEST has the same schema as the response payload of GET individual ARCA instance, except that it does not contain the “root_causes” key.
In this phase, for two_stage_l3clos blueprints, there is at most 1 element in the list, because only 1 ARCA instance is allowed per blueprint.
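Since the list may be empty or hold a single digest, a client typically checks it before deciding whether an instance still needs to be created. A minimal sketch, assuming the list response has already been decoded from JSON into a dict; pick_instance is a hypothetical helper.

```python
def pick_instance(list_response, model_name="default"):
    """Return the instance digest for model_name, or None if none exists.

    list_response is the decoded body of the list call, i.e. a dict with
    an "items" list that may be empty.
    """
    for digest in list_response.get("items", []):
        if digest.get("model_name") == model_name:
            return digest
    return None
```

A None result means no instance exists for that model name. The returned digest carries the "id" needed for the per-instance GET and PUT calls, but never the "root_causes" key.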