Rose Zhu, a Product Manager at Google Cloud TPU, presented on TPUs for large-scale training and inference, emphasizing the rapid growth of AI models and the corresponding demands for compute, memory, and networking. Zhu highlighted the specialization of Google's TPU chips and systems, purpose-built ASICs for machine learning applications, coupled with innovations in power efficiency, networking using Jupiter optical networks and ICI, and liquid cooling. A key focus was on co-designing TPUs with software, enabling them to function as a supercomputer, supported by frameworks like JAX and PyTorch, and a low-level compiler (XLA) to maximize performance.
Showcasing real-world TPU usage, powering Google's internal applications like Gmail and YouTube, and serving external cloud customers across various segments like Anthropic, Salesforce, Mercedes, and Kakao. The adoption of Cloud TPUs has seen significant growth, with an eightfold increase in chip-per-hour consumption within 12 months. A major announcement was the upcoming 7th generation TPU, Ironwood, slated for general availability in Q4 2025, featuring two configurations, TPU7 and TPU7X, to address diverse data center requirements and customer needs for locality and low latency.
Zhu detailed the specifications of Ironwood, including its BF16 and FP8 support, teraflops performance, and high bandwidth memory. Ironwood boasts significant performance and power efficiency improvements compared to previous TPU generations. Rose also touched on optimizing TPU performance through techniques like flash attention, host DRAM offload, mixed precision training, and an inference stack for TPU. GKE manages TPU for orchestration, focusing on scheduling goodput and runtime goodput. Zhu highlighted GKE's capabilities in managing large-scale training and inference, emphasizing scheduling and runtime efficiency improvements.
Presented by Rose Zhu, Product Manager, Google Cloud TPU, Google Cloud. Recorded live in Santa Clara, California, on April 22, 2025, as part of AI Infrastructure Field Day. Watch the entire presentation at https://techfieldday.com/appearance/google-cloud-presents-at-ai-infrastructure-field-day-2/ or https://techfieldday.com/event/aiifd2/ for more information.
Up Next in AI Infrastructure Field Day 2
-
Cloud WAN Connecting networks for the...
This presentation by Aniruddha Agharkar, Product Manager at Google Cloud Networking, centers on Cloud WAN, Google's fully managed backbone solution designed for the enterprise era and powered by Google's planet-scale network. Customers have historically relied on bespoke networks using leased lin...
-
Secure and optimize AI and ML workloa...
Vaibhav Katkade, a Product Manager at Google Cloud Networking, presented on infrastructure enhancements in cloud networking for secure, optimized AI/ML workloads. Focusing on the lifecycle of AI/ML, encompassing training, fine-tuning, and serving/inference, and the corresponding network imperativ...
-
AI Unbound, Your Data Center Your Way...
Praful Lalchandani, VP of Product, Data Center Platforms and AI Solutions at Juniper Networks, opened the presentation by highlighting the rapid growth of the AI data center space and its unique challenges. He noted that Juniper Networks, with its 25 years of experience in networking and security...