Upscale AI's Point of View with Aravind Srikumar
Networking Field Day 40 • 15m
Upscale AI distinguishes between two critical domains: scale-up networking, which creates a large compute environment within a rack where multiple GPUs share a flat, unified memory space, and scale-out networking, which connects these domains through memory-copy operations. The presentation highlights that the network has become the backplane of a distributed ecosystem, shifting from a standard client-server model to a highly synchronized all-to-all communication pattern. Upscale AI aims to solve the challenges of this new era with purpose-built hardware and software that prioritize predictable, ultra-low latency and zero-oversubscription bandwidth to prevent computational stalls.
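The synchronized all-to-all pattern described above can be sketched numerically. In the toy Python model below, the GPU count, message times, and delay values are all illustrative assumptions, not figures from the presentation; it shows why a single congested link stalls the entire collective, which is what motivates the emphasis on predictable latency:

```python
# Illustrative sketch (numbers assumed, not from the talk): in a
# synchronized all-to-all collective, a step finishes only when the
# SLOWEST pairwise transfer lands, so one jittery link stalls every GPU.

NUM_GPUS = 8
BASE_US = 10.0  # nominal per-message transfer time, microseconds

def all_to_all_step(straggler_delay_us: float) -> float:
    """Return the completion time of one all-to-all step: the max over
    all pairwise transfers, with a single congested link delayed."""
    n_msgs = NUM_GPUS * (NUM_GPUS - 1)   # every GPU sends to every other
    times = [BASE_US] * n_msgs
    times[0] += straggler_delay_us       # one slow link in the fabric
    return max(times)

print(all_to_all_step(0.0))  # 10.0 -- clean fabric, step takes nominal time
print(all_to_all_step(5.0))  # 15.0 -- one slow message gates all 8 GPUs
```

The point of the sketch is that average latency is irrelevant here: a 50% delay on one of 56 messages delays the whole step by 50%, because every GPU waits at the synchronization barrier.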
In the scale-up domain, the architecture must support load-store operations with latencies under one microsecond. Aravind Srikumar introduces the Skyhammer architecture, a clean-sheet design built specifically for the scale-up environment that emphasizes "performance, performance, performance." Unlike traditional networks, these systems use lightweight, optimized headers and offload congestion handling, such as Link Layer Retry (LLR) and Priority Flow Control (PFC), directly to the switch to minimize jitter. For scale-out needs, Upscale AI has partnered with NVIDIA to use the Spectrum-X substrate, building open, Ethernet-based systems around it that feature AI-optimized operating systems, hitless upgrades, and specialized circuitry for real-time power management and telemetry.
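A back-of-envelope budget makes the sub-microsecond target concrete. Every component value below is an assumption chosen for illustration, not an Upscale AI figure; the sketch simply shows why lightweight headers and switch-offloaded congestion handling matter when the entire round trip must fit under 1,000 nanoseconds:

```python
# Hypothetical latency budget for one scale-up load/store round trip.
# All component values are illustrative assumptions.

BUDGET_NS = 1000  # scale-up target: under one microsecond

budget_ns = {
    "endpoint serialization (both ends)": 200,
    "switch hops (cut-through, x2)":      300,
    "wire propagation (x2)":              100,
    "lightweight header parse/strip":      50,
    "protocol/ordering overhead":         150,
}

total = sum(budget_ns.values())
for stage, ns in budget_ns.items():
    print(f"{stage:36s} {ns:5d} ns")
print(f"{'total':36s} {total:5d} ns  (budget: {BUDGET_NS} ns)")
# total comes to 800 ns, leaving only ~200 ns of headroom under 1 us
```

With so little headroom, any stage that can be trimmed or offloaded, such as moving retry and flow-control logic into the switch, directly protects the budget.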
The company's overarching vision is to enable a future of heterogeneous compute in which customers can mix and match various processing units, such as GPUs, LPUs, and DPUs, without being locked into a single proprietary ecosystem. By adopting open standards such as SONiC, ESON, and UALink, Upscale AI ensures that its fabric remains technology-agnostic and interoperable. This approach is designed to protect customer investments over a five-to-seven-year lifecycle, allowing the network to adapt as new AI workloads and specialized chips emerge. Ultimately, the goal is to transform the data center into an efficient token factory where every ounce of power and compute is maximized through a networking stack that is architected, rather than merely tuned.
Presented by Aravind Srikumar, SVP Product and Marketing, and Deepti Chandra, VP Product and Marketing. Recorded live at Networking Field Day 40 in San Jose on April 9, 2026. Watch the entire presentation at https://techfieldday.com/appearance/upscale-ai-presents-at-networking-field-day-40/ or visit https://TechFieldDay.com/event/nfd40 or https://upscale.ai for more information.