Dennis Liu, a Product Manager at Google Cloud specializing in GPUs, presented on AI Hypercomputer and GPU acceleration with Google Cloud. Liu covered Google Cloud's AI Hypercomputer, from consumption models to purpose-built hardware, with particular focus on Cluster Director, Google's offering for managing GPU fleets.
Liu then turned to the hardware side of Google Cloud's AI infrastructure, discussing current and upcoming GPU systems. Available systems include A3 Ultra (H200 GPUs), A4 (B200 GPUs), and A4X (GB200 systems), which are built on RoCE networking over ConnectX-7 (CX-7) NICs. Liu also discussed two systems coming in 2025, the NVIDIA RTX Pro 6000 and a GB300 system, which offer advancements in memory and networking.
The presentation also featured performance projections for LLM training, with A4 projected to offer approximately 2x the performance of H100s. Liu described the A4 as a Goldilocks solution because of its balance of price and performance. The session also addressed whether Hopper-generation GPUs would drop in price as newer generations of hardware arrive.
Presented by Dennis Liu, Product Manager, Google Cloud. Recorded live in Santa Clara, California, on April 22, 2025, as part of AI Infrastructure Field Day. Watch the entire presentation at https://techfieldday.com/appearance/google-cloud-presents-at-ai-infrastructure-field-day-2/ or https://techfieldday.com/event/aiifd2/ for more information.