AI Infrastructure Field Day 2
Analytics Storage and AI, Data Prep and Data Lakes with Google Cloud
25m
Vivek Sarswat, Group Product Manager at Google Cloud Storage, presented on analytics storage and AI, focusing on data preparation and data lakes. He emphasized the close ties between analytics and AI workloads and highlighted key innovations built to address related challenges. The presentation showed that analytics plays a crucial role in the AI data pipeline, particularly in ingestion, data preparation, and cleaning.
Sarswat explained how customers increasingly build unified data lakehouses using open metadata table formats like Apache Iceberg. This approach supports both analytics and AI workloads, including running analytics on AI data. He cited Snap as a customer example: the company processes trillions of user events weekly, using Spark for data preparation and cleaning on top of Google Cloud Storage. Google Cloud Storage offers optimizations such as the Cloud Storage Connector, Anywhere Cache, and Hierarchical Namespace (HNS) to enhance data preparation.
Sarswat covered the concept of a data lakehouse, which combines structured and unstructured data in a unified platform, with a separation layer built on open table formats. Examples from Snowflake, Databricks, Uber, and Google Cloud's BigQuery tables for Apache Iceberg illustrated the diverse architectures employed. He also addressed common customer challenges such as data fragmentation, performance bottlenecks, and optimization for resilience, security, and cost. The solutions he offered included Storage Intelligence, Anywhere Cache, and Bucket Relocate, with reference to customer case studies such as Spotify and Two Sigma.
Presented by Vivek Sarswat, Group Product Manager, Google Cloud. Recorded live in Santa Clara, California, on April 22, 2025, as part of AI Infrastructure Field Day. Watch the entire presentation at https://techfieldday.com/appearance/google-cloud-presents-at-ai-infrastructure-field-day-2/ or visit https://techfieldday.com/event/aiifd2/ for more information.