Validation Framework
Platforms / Devices Under Test (DUT)
Table 25: Platforms / Devices Under Test (DUT)
Component | Frontend | Storage Backend | GPU Backend (Cluster 1 and 2) |
---|---|---|---|
Architecture | 3-stage Clos | 3-stage Clos | 3-stage Clos, rail-optimized |
Spine nodes | QFX5130-32CD x 2 | QFX5220-32CD x 2 | QFX5230-64CD x 2 (cluster 1); PTX-10008 JNP10K-LC1201 (cluster 1); QFX5240-64OD x 2 (cluster 2) |
Leaf nodes | QFX5130-32CD x 1 (frontend-gpu-leaf); QFX5130-32CD x 1 (frontend-weka-leaf) | QFX5220-32CD x 2 (storage-backend-gpu-leaf); QFX5220-32CD x 2 (storage-backend-weka-leaf) | QFX5220-64CD x 8 (cluster 1 – stripe 1); QFX5230-64CD x 8 (cluster 1 – stripe 2); QFX5240-64CD x 8 (cluster 2 – stripes 1-2) |
Leaf node <=> spine node links | 2 x 400GE (per frontend-leaf <=> frontend-spine link) | 2 x 400GE (per storage-backend-weka-leaf <=> storage-backend-spine link); 3 x 400GE (per storage-backend-gpu-leaf <=> storage-backend-spine link) | 2 x 400GE (per gpu-backend-spine <=> gpu-backend-leaf link) |
Number of NVIDIA DGX H100 GPU servers | 2 (Cluster 2 - stripe 1); 2 (Cluster 2 - stripe 2) | | |
Number of NVIDIA HGX A100 GPU servers | 4 (Cluster 1 - stripe 1); 4 (Cluster 1 - stripe 2) | | |
NVIDIA DGX H100 GPU servers <=> GPU leaf node links | 1 x 100GE (per GPU server <=> frontend-gpu-leaf link) | 1 x 200GE (per GPU server <=> storage-backend-gpu-leaf link) | 1 x 400GE (Cluster 2) (per GPU server <=> gpu-backend-leaf link) |
NVIDIA HGX A100 GPU servers <=> GPU leaf node links | 1 x 100GE (per GPU server <=> frontend-gpu-leaf link) | 1 x 100GE (per GPU server <=> storage-backend-gpu-leaf link) | 1 x 200GE (Cluster 1) (per GPU server <=> gpu-backend-leaf link) |
Total number of GPUs | 96: 32 per stripe in cluster 1; 16 per stripe in cluster 2 | | |
WEKA storage servers | 8 | ||
WEKA storage servers <=> WEKA storage leaf nodes links | 1 x 100GE (per weka server <=> frontend-weka-leaf link) | 1 x 200GE (per weka server <=> storage-backend-weka-leaf link) | N/A |
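
The GPU total in Table 25 can be cross-checked from the server counts. The sketch below is illustrative only. It assumes 8 GPUs per server, which the table does not state but which matches its per-stripe figures (4 servers for 32 GPUs, 2 servers for 16 GPUs), and it assumes each cluster has two stripes, as the leaf-node row indicates. The names `GPUS_PER_SERVER`, `servers_per_stripe`, and `gpus_per_stripe` are hypothetical.

```python
# Cross-check of the "Total number of GPUs" row in Table 25.
# Assumption (not stated in the table): every GPU server holds 8 GPUs.
GPUS_PER_SERVER = 8

# Server counts per (cluster, stripe), taken from Table 25.
servers_per_stripe = {
    ("cluster 1", "stripe 1"): 4,  # NVIDIA HGX A100
    ("cluster 1", "stripe 2"): 4,  # NVIDIA HGX A100
    ("cluster 2", "stripe 1"): 2,  # NVIDIA DGX H100
    ("cluster 2", "stripe 2"): 2,  # NVIDIA DGX H100
}


def gpus_per_stripe(servers, gpus_per_server=GPUS_PER_SERVER):
    """Return the GPU count for each (cluster, stripe) key."""
    return {key: count * gpus_per_server for key, count in servers.items()}


per_stripe = gpus_per_stripe(servers_per_stripe)
total_gpus = sum(per_stripe.values())

for (cluster, stripe), gpus in per_stripe.items():
    print(f"{cluster}, {stripe}: {gpus} GPUs")
print(f"total: {total_gpus} GPUs")
```

Under these assumptions the per-stripe counts come out to 32 (cluster 1) and 16 (cluster 2), and the sum agrees with the 96 GPUs listed in the table.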