Arista Networking for AI: The Ethernet Backplane
Networking Field Day 40 • 33m
We go deep into the fabric of the AI cluster, discussing why Ethernet has become the definitive backplane for AI workloads. We explore hardware innovations in power efficiency and the protocol optimizations, such as Dynamic Load Balancing (DLB) and advanced congestion control, that keep data moving at the speed of thought. This section covers the different networks used in AI, from scale-up to scale-out and scale-across, and discusses optimizations and enhancements to Ethernet for AI applications from efforts such as the Ultra Ethernet Consortium (UEC) and ESUN.
Tom Emmons emphasizes that as AI networks become business-critical, quality and power efficiency are the primary drivers of architectural decisions. Every problem in an AI network escalates immediately because of the massive financial investments involved, making a reliable network essential. Since power is the fundamental limiting factor for GPU density in a data center, Arista focuses on reducing the network power footprint, ideally to less than 10% of total facility power, through liquid cooling, low-power optics, and high-radix switches that minimize the number of tiers. By reducing tiers, operators save on optics, which are the largest contributors to network power consumption, while also simplifying load balancing and reducing potential congestion points.
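The tier-reduction argument above can be made concrete with a rough sizing sketch. The following is an illustrative back-of-the-envelope model (not Arista's sizing methodology): it estimates how many tiers a non-blocking folded Clos fabric needs for a given endpoint count, and how the optical link count scales with tiers, showing why higher-radix switches reduce both.

```python
# Illustrative sketch only: estimate tiers in a non-blocking folded Clos
# fabric for a given endpoint count and switch radix, and the rough number
# of inter-tier (optical) links each design implies. Higher radix means
# fewer tiers, and fewer tiers means fewer optics -- the dominant
# contributor to network power.

def clos_tiers(num_endpoints: int, radix: int) -> int:
    """Minimum tiers for a folded Clos built from switches with `radix` ports.

    Simplified model: a single switch (one tier) serves `radix` endpoints;
    each additional tier multiplies capacity by radix/2, since switches
    below the top split their ports evenly between downlinks and uplinks.
    """
    tiers = 1
    capacity = radix
    while capacity < num_endpoints:
        tiers += 1
        capacity = radix * (radix // 2) ** (tiers - 1)
    return tiers

def inter_tier_links(num_endpoints: int, tiers: int) -> int:
    """Each tier boundary carries roughly one uplink per endpoint in a
    non-blocking design, so optics scale with the number of boundaries."""
    return num_endpoints * (tiers - 1)

# Compare designs for a hypothetical 32,768-GPU cluster.
for radix in (64, 256, 512):
    t = clos_tiers(32_768, radix)
    print(f"radix {radix:>3}: {t} tiers, ~{inter_tier_links(32_768, t):,} inter-tier links")
```

Under this toy model, moving from radix-64 to radix-256 switches drops the 32K-GPU fabric from three tiers to two, roughly halving the inter-tier optics, which tracks the power argument made in the presentation.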
The presentation identifies four distinct AI fabrics: front-end, scale-out, scale-across, and scale-up. While scale-out provides the essential east-west connectivity for GPU training, scale-across is becoming increasingly vital for customers who must link geographically dispersed buildings to overcome local power and space constraints. Scale-across networking leverages Arista's extensive experience in WAN and routing, utilizing deep buffers, encryption, and traffic engineering to manage latency and protect data. Meanwhile, the front-end network mirrors traditional data center designs but demands higher reliability and security, since it connects billions of dollars' worth of hardware to the outside world and to local storage resources.
Arista is a vocal advocate for Ethernet as the universal backplane, specifically for the emerging scale-up market where GPU-to-GPU memory copies occur. Through leadership in efforts like the Ultra Ethernet Consortium (UEC) and the Ethernet for Scale-Up Networking (ESUN) workgroup, Arista is refining Ethernet to handle 256-byte cache-line transactions and packet spraying more efficiently. Emmons posits that the dominance of Ethernet is driven by the industry's desire for multi-vendor ecosystems and a unified management model. By running a single EOS image across all four fabric types, Arista provides a mature, tested software stack that allows operators to use the same BGP stack and telemetry tools regardless of whether they are managing a local scale-up cluster or a global scale-across network.
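To see why packet spraying matters for AI traffic, compare it with classic per-flow ECMP hashing. The sketch below is a toy simulation (not any vendor's implementation, and it ignores the packet-reordering problem spraying introduces): AI collectives produce a few very large flows, and per-flow hashing pins each one to a single uplink, while spraying distributes every packet independently.

```python
# Toy comparison: per-flow ECMP hashing vs. per-packet spraying across
# four uplinks. Illustrative only; real fabrics handle reordering and use
# hardware hash functions, not Python's hash().
import random
from collections import Counter

UPLINKS = 4

def ecmp_pick(flow_id: int) -> int:
    # Per-flow ECMP: every packet of a flow hashes to the same uplink,
    # so one elephant flow (typical of AI collectives) loads one link.
    return hash(flow_id) % UPLINKS

def spray_pick(rng: random.Random) -> int:
    # Packet spraying: each packet chooses an uplink independently,
    # spreading even a single huge flow across all available paths.
    return rng.randrange(UPLINKS)

rng = random.Random(0)
ecmp = Counter(ecmp_pick(7) for _ in range(10_000))   # one elephant flow
spray = Counter(spray_pick(rng) for _ in range(10_000))
print("ECMP :", dict(ecmp))    # all 10,000 packets land on one uplink
print("spray:", dict(spray))   # packets spread across all four uplinks
```

The congestion-control and DLB work mentioned in the talk addresses the flip side: spraying only pays off if the receiver can tolerate out-of-order delivery and the fabric can steer around hot spots, which is part of what the UEC transport enhancements target.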
Presented by Tom Emmons, Software Engineer. Recorded live at Networking Field Day 40 in San Jose on April 9, 2026. Watch the entire presentation at https://techfieldday.com/appearance/arista-presents-at-networking-field-day-40/ or visit https://TechFieldDay.com/event/nfd40 or https://Arista.com/ for more information.