Arista CloudVision 360° Observability for AI
Networking Field Day 40
Monitoring and managing complex AI infrastructure requires moving beyond traditional networking tools that treat the environment as a black box. Praful Bhaidasna explains that the industry has long suffered from a "mean time to truth" problem: network operators are blamed for issues they cannot properly diagnose because they lack visibility into what is connected to the network. Arista aims to change this "Stone Age" approach by evolving from simple monitoring to 360-degree observability. This strategy is centered on CloudVision, a NetOps platform that uses a common network data lake called NetDL to aggregate high-fidelity streaming telemetry from every Arista device across the data center, campus, and WAN.
The architecture relies on the fact that Arista's EOS provides consistent, reliable state data, ranging from MAC address tables and routing updates to microburst signals and configuration changes. This information is stored in a time-series database, allowing operators to travel back in time to compare network states before and after an incident. To manage the resulting deluge of data, Arista employs an AI/ML engine known as AVA, or Autonomous Virtual Assist. AVA identifies patterns and anomalies, filtering out the noise to show only the relevant signals. This allows human operators to focus on making informed decisions rather than spending hours manually correlating events across different silos.
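The before-and-after comparison described above can be illustrated with a minimal Python sketch. It is a toy model of a time-series state store, not CloudVision's or NetDL's actual API: the `StateHistory` class, the `diff_states` helper, and the key names are all hypothetical and exist only to show the idea of replaying state to a point in time and diffing two snapshots.

```python
from bisect import bisect_right


class StateHistory:
    """Toy time-series store (hypothetical): timestamped updates per state key."""

    def __init__(self):
        # key -> (sorted list of timestamps, list of values at those times)
        self._updates = {}

    def record(self, ts, key, value):
        """Record a state update. Assumes updates for a key arrive in time order.

        A value of None marks the key as deleted from that time onward.
        """
        times, values = self._updates.setdefault(key, ([], []))
        times.append(ts)
        values.append(value)

    def snapshot(self, ts):
        """Reconstruct the full state as it was at time ts."""
        snap = {}
        for key, (times, values) in self._updates.items():
            i = bisect_right(times, ts)  # number of updates at or before ts
            if i and values[i - 1] is not None:
                snap[key] = values[i - 1]
        return snap


def diff_states(before, after):
    """Return {key: (old, new)} for every key whose value differs."""
    changes = {}
    for key in before.keys() | after.keys():
        old, new = before.get(key), after.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes


# Illustrative data: a route moves and a MAC entry disappears around t=205..210.
history = StateHistory()
history.record(100, "route:10.0.0.0/24", "Ethernet1")
history.record(100, "mac:aa:bb:cc:dd:ee:ff", "Ethernet7")
history.record(205, "route:10.0.0.0/24", "Ethernet2")
history.record(210, "mac:aa:bb:cc:dd:ee:ff", None)

changes = diff_states(history.snapshot(200), history.snapshot(220))
```

Here `changes` holds only the two keys that differ between the snapshots, which is the kind of narrowed view an operator would want when comparing the network just before and just after an incident.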
Furthermore, CloudVision has opened its ecosystem to ingest data from third-party systems, AI job orchestrators, and compute and storage metrics via Prometheus. This integration is critical for AI environments where a job stall could be caused by anything from a GPU failure to a NIC issue. Arista has introduced a dedicated AI jobs dashboard that correlates specific training jobs with the underlying flows, servers, and switches. To simplify interactions with this massive dataset, a digital virtual assistant allows users to query their infrastructure using natural language. This integrated approach ensures that expensive GPU resources do not sit idle and that the resolution of complex performance bottlenecks can happen in minutes rather than days.
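As a rough illustration of the job-to-infrastructure correlation described above, the following Python sketch narrows a list of alerts down to those that touch a given training job. The data model is an assumption made for this example: `Alert`, `suspects_for_job`, the server names, and the port names are hypothetical and do not reflect how the AI jobs dashboard is actually implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    component: str  # e.g. "gpu", "nic", "switch-port"
    location: str   # server name, or "switch:port" for a switch interface
    detail: str


def suspects_for_job(job_servers, server_uplinks, alerts):
    """Return the alerts located on servers or switch ports used by the job.

    job_servers:    servers the orchestrator assigned to the job
    server_uplinks: server name -> list of "switch:port" it connects to
    alerts:         alerts collected from compute and network sources
    """
    locations = set(job_servers)
    for server in job_servers:
        locations.update(server_uplinks.get(server, []))
    return [alert for alert in alerts if alert.location in locations]


# Illustrative inputs (all names are made up).
job_servers = ["gpu-node-01", "gpu-node-02"]
server_uplinks = {
    "gpu-node-01": ["leaf1:Ethernet5"],
    "gpu-node-02": ["leaf1:Ethernet6"],
    "gpu-node-09": ["leaf2:Ethernet1"],
}
alerts = [
    Alert("nic", "gpu-node-02", "receive errors rising"),
    Alert("switch-port", "leaf2:Ethernet1", "microburst detected"),
    Alert("switch-port", "leaf1:Ethernet5", "link flap"),
]

suspects = suspects_for_job(job_servers, server_uplinks, alerts)
```

The alert on `leaf2:Ethernet1` is dropped because no server in the job connects to that port, leaving the NIC and link-flap alerts as the candidates to investigate for a stalled job.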
Presented by Praful Bhaidasna, Head of Products. Recorded live at Networking Field Day 40 in San Jose on April 9, 2026. Watch the entire presentation at https://techfieldday.com/appearance/arista-presents-at-networking-field-day-40/ or visit https://TechFieldDay.com/event/nfd40 or https://Arista.com/ for more information.