Ishan Sharma, Group Product Manager on the Google Kubernetes Engine team, presented on GKE and AI Hypercomputer, focusing on industry-leading infrastructure, training quickly at mega scale, serving with lower cost and latency, economical access to GPUs and TPUs, and faster time to value. He emphasized that Google Cloud is committed to making new accelerators available on GKE on day one. AI Hypercomputer, the entire stack and reference architecture, is the same stack that Google uses internally for Vertex AI.
The presentation highlighted Cluster Director for GKE, which enables the deployment, scaling, and management of AI-optimized GKE clusters in which physically co-located accelerators function as a single unit, delivering high performance and ultra-low latency. Key benefits include densely co-located accelerators, mega-scale training jobs, topology-aware scheduling, ease of use, 360-degree observability, and resiliency. Cluster Director for GKE uses standard Kubernetes APIs and the existing Kubernetes ecosystem, allowing users to orchestrate these capabilities with familiar tooling.
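The topology-aware scheduling idea above can be sketched as follows. This is an illustrative Python sketch, not the Cluster Director API: the function, node names, and block labels are all hypothetical. It shows the core placement rule, keeping all replicas of a job within a single physically co-located block of accelerator nodes.

```python
# Hypothetical sketch of topology-aware placement: prefer putting every
# replica of a training job inside one co-located block so accelerator
# interconnect traffic never crosses block boundaries.
from collections import defaultdict

def place_job(nodes, replicas_needed):
    """nodes: list of (node_name, topology_block) pairs.
    Returns node names drawn from a single block, or None if no
    block has enough free nodes for the whole job."""
    blocks = defaultdict(list)
    for name, block in nodes:
        blocks[block].append(name)
    for members in blocks.values():
        if len(members) >= replicas_needed:
            return members[:replicas_needed]
    return None

nodes = [("n1", "rack-a"), ("n2", "rack-a"), ("n3", "rack-b"),
         ("n4", "rack-a"), ("n5", "rack-b")]
print(place_job(nodes, 3))  # ['n1', 'n2', 'n4'] -- all in rack-a
print(place_job(nodes, 4))  # None -- no single block is large enough
```

A real scheduler would also weigh fragmentation and preemption, but the all-or-nothing, single-block constraint is what distinguishes topology-aware placement from ordinary bin-packing.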
Sharma also demonstrated the GKE Inference Gateway, which improves LLM inference by routing requests based on model server metrics such as KV cache utilization and queue length, reducing variability and improving time-to-first-token latency. Additionally, he showcased GKE Inference Quickstart, a feature on the GKE homepage within the Google Cloud console that recommends optimized infrastructure configurations for different models, such as the NVIDIA L4 for the Gemma 2 2B instruction-tuned model. This simplifies model deployment and optimizes performance.
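The metric-driven routing described above can be illustrated with a minimal Python sketch. This is not the GKE Inference Gateway implementation; the replica fields, scoring weights, and function name are assumptions chosen to show the idea: instead of round-robin, pick the model-server replica with the shortest queue and the most free KV cache.

```python
# Hypothetical sketch: route each request to the replica whose queue depth
# and KV cache pressure suggest the fastest time to first token.
from dataclasses import dataclass

@dataclass
class Replica:
    name: str
    kv_cache_utilization: float  # 0.0 (empty) .. 1.0 (full)
    queue_length: int            # requests waiting on this server

def pick_replica(replicas):
    # Lower score = better candidate; the weight of 10 on cache pressure
    # is an arbitrary illustrative choice, not a GKE default.
    def score(r):
        return r.queue_length + 10 * r.kv_cache_utilization
    return min(replicas, key=score)

replicas = [
    Replica("pod-a", kv_cache_utilization=0.9, queue_length=2),
    Replica("pod-b", kv_cache_utilization=0.3, queue_length=1),
    Replica("pod-c", kv_cache_utilization=0.5, queue_length=6),
]
print(pick_replica(replicas).name)  # pod-b
```

Because the decision uses live model-server metrics rather than connection counts, a replica that is "idle" at the TCP level but has a saturated KV cache is avoided, which is what reduces tail-latency variability.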
Presented by Ishan Sharma, Group Product Manager, Google Kubernetes Engine, Google Cloud. Recorded live in Santa Clara, California, on April 22, 2025, as part of AI Infrastructure Field Day. Watch the entire presentation at https://techfieldday.com/appearance/google-cloud-presents-at-ai-infrastructure-field-day-2/ or https://techfieldday.com/event/aiifd2/ for more information.