AI Infrastructure Field Day 2
Demonstration of Day 2 AI network operations, monitoring and anomaly detection with Aviz
33m
Aviz Networks' AI Infrastructure Field Day demonstration focused on Day 2 operations, monitoring, and anomaly detection for AI workloads. The core challenge addressed is AI's specialized networking requirements: multiple networks, differentiated QoS, and the need to manage compute as part of the end-to-end network topology. Aviz presented solutions for orchestrating AI fabrics based on SONiC and NVIDIA's Spectrum-X reference architecture, showcasing a customer workflow that spans network design, Day 0 infrastructure deployment, Day 1 tenant onboarding and traffic isolation, and Day 2 operations such as adding pods, handling alerts, and troubleshooting.
The presentation demonstrated Aviz's orchestration capabilities for SONiC-based and NVIDIA reference architecture (RA)-based AI fabrics. For SONiC, the presenter showed how to orchestrate the fabric using YAML-based intent, validate configurations, and perform operational checks. The demonstration emphasized the ease of use of an industry-standard CLI, built-in validation, and the ability to compare configurations to identify drift. For the NVIDIA Spectrum-X platform, the presentation highlighted agentless orchestration, the use of NVIDIA AIR for simulating deployments, and configuration comparison.
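The idea of comparing configurations to identify drift can be illustrated with a minimal Python sketch. It is not Aviz's implementation or API: the function names `flatten` and `find_drift`, the `<absent>` marker, and the sample interface settings are all hypothetical, and the intent is assumed to have already been loaded from YAML into nested dictionaries.

```python
from typing import Any

ABSENT = "<absent>"  # hypothetical marker for a setting present on only one side


def flatten(config: dict, prefix: str = "") -> dict[str, Any]:
    """Turn a nested config dict into {"a.b.c": value} pairs."""
    flat: dict[str, Any] = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def find_drift(intended: dict, running: dict) -> dict[str, tuple]:
    """Return {path: (intended_value, running_value)} for every mismatch."""
    want, have = flatten(intended), flatten(running)
    drift = {}
    for path in sorted(want.keys() | have.keys()):
        a, b = want.get(path, ABSENT), have.get(path, ABSENT)
        if a != b:
            drift[path] = (a, b)
    return drift


# Illustrative data only: the intended state versus what a switch reports.
intended = {
    "Ethernet0": {"mtu": 9100, "admin_status": "up"},
    "Ethernet4": {"mtu": 9100, "admin_status": "up"},
}
running = {
    "Ethernet0": {"mtu": 9100, "admin_status": "up"},
    "Ethernet4": {"mtu": 1500, "admin_status": "up"},
    "Ethernet8": {"admin_status": "down"},
}

for path, (want, have) in find_drift(intended, running).items():
    print(f"{path}: intended={want} running={have}")
```

Flattening to dotted paths keeps the report readable and makes settings that exist on only one side show up as drift alongside changed values.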
Finally, the presentation detailed Aviz's monitoring and anomaly detection features. The tool takes a bottom-up approach to monitoring networks, servers, and GPUs. The demo showed how to view telemetry data such as traffic, queue drops, and GPU health metrics. The presentation also covered Aviz's built-in anomaly detection system, which lets users create custom rules and receive notifications through tools like Slack and Zendesk. The system includes curated rules, role-based access control, and configuration comparison capabilities to streamline operations and reduce errors.
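A custom anomaly rule of the kind described above can be thought of as a metric, a comparison, and a threshold applied to incoming telemetry. The sketch below is a simplified illustration under that assumption, not Aviz's rule engine: the `Rule` class, the `evaluate` function, the metric names, and the sample values are all hypothetical, and delivery to Slack or Zendesk is left out.

```python
import operator
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    """A hypothetical threshold rule: fire when compare(value, threshold) is true."""
    name: str
    metric: str
    compare: Callable[[float, float], bool]
    threshold: float


def evaluate(rules: list[Rule], samples: list[dict]) -> list[str]:
    """Check every telemetry sample against every rule and collect alert text."""
    alerts = []
    for sample in samples:
        for rule in rules:
            value = sample.get(rule.metric)
            # Skip samples that do not report this metric at all.
            if value is not None and rule.compare(value, rule.threshold):
                alerts.append(
                    f"{rule.name}: {sample['device']} "
                    f"{rule.metric}={value} (threshold {rule.threshold})"
                )
    return alerts


# Illustrative rules and telemetry; names and numbers are made up.
rules = [
    Rule("queue-drops", "queue_drops", operator.gt, 100),
    Rule("gpu-hot", "gpu_temp_c", operator.ge, 90),
]
samples = [
    {"device": "leaf1", "queue_drops": 0},
    {"device": "leaf2", "queue_drops": 120},
    {"device": "gpu-node3", "gpu_temp_c": 92},
]

alerts = evaluate(rules, samples)
for alert in alerts:
    print(alert)
```

In a real deployment the alert strings would be handed to a notifier for the chosen channel; keeping evaluation separate from delivery makes the rules easy to test on recorded telemetry.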
Presented by Ravi Kumar, Solutions Architect, Aviz Networks. Recorded live in Santa Clara, California, on April 25, 2025, as part of AI Infrastructure Field Day 2. Watch the entire presentation at https://techfieldday.com/appearance/aviz-networks-presents-at-ai-infrastructure-field-day-2/ or visit https://techfieldday.com/event/aiifd2/ for more information.