From Neural Networks to GPU Fabrics - Networking for Modern AI/ML Infrastructure
Runtime: 40 minutes
Michael Witte, a principal architect at World Wide Technology, discusses the fundamental shift in data center networking required to support large-scale AI and machine learning workloads. The presentation moves from the basic biological inspiration behind neural networks to the physical and electrical realities that necessitate high-bandwidth GPU fabrics. Witte explains that because neural networks have grown too large to fit within the memory of a single GPU, they must be distributed across thousands of nodes, creating a massive synchronization challenge. This global synchronization behaves like a recurring batch job in which GPUs must constantly exchange data to align weights and biases, producing the 400 Gbps and 800 Gbps elephant flows that characterize modern AI traffic.
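The scale of that synchronization traffic can be sketched with back-of-the-envelope arithmetic. The figures below are illustrative assumptions, not numbers from the talk: a hypothetical 70B-parameter model with fp16 gradients, synchronized with a ring all-reduce, which requires each GPU to send roughly 2 * (N - 1) / N times the model size per step.

```python
# Back-of-the-envelope sizing for data-parallel gradient synchronization.
# All inputs are hypothetical (70B parameters, 2 bytes/param, 1024 GPUs);
# the formula is the standard ring all-reduce communication volume.

def ring_allreduce_bytes(param_count: int, bytes_per_param: int, n_gpus: int) -> int:
    """Bytes each GPU sends per sync step: 2 * (N - 1) / N * model_size."""
    model_bytes = param_count * bytes_per_param
    return 2 * (n_gpus - 1) * model_bytes // n_gpus

params = 70_000_000_000                       # hypothetical 70B-parameter model
traffic = ring_allreduce_bytes(params, 2, n_gpus=1024)
print(f"{traffic / 1e9:.1f} GB sent per GPU per step")

# At 400 Gbps (~50 GB/s) of usable fabric bandwidth, that one sync step
# occupies the link for several seconds unless overlapped with compute,
# which is why these flows dominate the fabric as sustained elephant flows.
```

The exact constant depends on the collective algorithm, but any variant leaves every GPU moving on the order of the full model size across the fabric at each step.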
The technical core of the session focuses on why traditional networking fails under these conditions and how RDMA (Remote Direct Memory Access) over Converged Ethernet version 2 (RoCEv2) provides a solution by bypassing the CPU to allow direct memory-to-memory transfers. Witte highlights the necessity of non-blocking, 1:1-subscribed fabrics, since oversubscription leads to packet drops that are catastrophic for UDP-based RoCEv2 traffic. To manage congestion without dropping packets, he details the use of Data Center Quantized Congestion Notification (DCQCN), which combines Explicit Congestion Notification (ECN) to throttle senders and Priority Flow Control (PFC) as a "hammer" that pauses traffic when buffers reach critical levels.
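The two-tier behavior DCQCN gives a switch queue can be sketched as a toy model. The thresholds and the rate-cut factor below are illustrative placeholders, not vendor defaults or values from the talk:

```python
# Toy model of DCQCN's two pressure valves on a RoCEv2 switch queue.
# Thresholds are hypothetical; real deployments tune them per buffer size.

ECN_MIN = 100    # queue depth (packets) where ECN marking begins (hypothetical)
PFC_XOFF = 400   # queue depth that triggers a PFC pause frame (hypothetical)

def switch_action(queue_depth: int) -> str:
    """What the queue does to arriving traffic at a given depth."""
    if queue_depth >= PFC_XOFF:
        return "pfc-pause"   # the hammer: pause the upstream port entirely
    if queue_depth >= ECN_MIN:
        return "ecn-mark"    # gentle: set the CE bit; receiver returns a CNP
    return "forward"

def sender_rate_after_cnp(rate_gbps: float, alpha: float = 0.5) -> float:
    """DCQCN-style multiplicative decrease when a CNP reaches the sender."""
    return rate_gbps * (1 - alpha / 2)

print(switch_action(50), switch_action(200), switch_action(450))
print(sender_rate_after_cnp(400.0))  # a 400 Gbps sender backs off to 300
```

The design intent is that ECN marking absorbs most congestion by slowing senders early, so the lossless-but-blunt PFC pause fires only as a last resort.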
Looking toward the future, the session addresses the evolution of these fabrics into scheduled fabrics and the upcoming standards from the Ultra Ethernet Consortium (UEC). Witte explains that scheduled fabrics improve efficiency by breaking massive elephant flows into smaller flowlets and spraying them across all available paths, though this requires advanced NICs or DPUs to reassemble packets in the correct order. He concludes by emphasizing that as GPU capabilities continue to outpace traditional networking, the industry is moving toward more deterministic, flow-aware scheduling and co-packaged optics to minimize latency and maximize tokens per watt in the race for AI compute efficiency.
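The spray-and-reassemble idea can be illustrated with a minimal sketch. The splitting scheme and names below are hypothetical simplifications; real scheduled fabrics use hardware sequencing and richer path selection:

```python
# Sketch of flowlet spraying: one elephant flow is cut into sequenced
# flowlets, each mapped to a different fabric path, and the receiving
# NIC/DPU restores order by sequence number. Purely illustrative.

def spray(payload: bytes, flowlet_size: int, n_paths: int):
    """Split a flow into (seq, path, chunk) flowlets, round-robin on paths."""
    for seq, offset in enumerate(range(0, len(payload), flowlet_size)):
        yield (seq, seq % n_paths, payload[offset:offset + flowlet_size])

def reassemble(flowlets):
    """Receiver side: reorder by sequence number regardless of arrival order."""
    return b"".join(chunk for _seq, _path, chunk in sorted(flowlets))

flowlets = list(spray(b"one big elephant flow", flowlet_size=4, n_paths=8))
flowlets.reverse()                       # simulate out-of-order arrival
assert reassemble(flowlets) == b"one big elephant flow"
```

The reordering step is exactly why Witte notes that spraying demands capable NICs or DPUs: with flowlets arriving on many paths, in-order delivery is no longer guaranteed by the network itself.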
Presented by Michael Witte, Principal Architect, WWT. Recorded live at Networking Field Day 40 in San Jose on April 10, 2026. Watch the entire presentation at https://techfieldday.com/appearance/networking-field-day-40-community-sessions/ or visit https://TechFieldDay.com/event/nfd40 or https://wwt.com for more information.