Jesse Gonzales, Staff Solution Architect, offers sizing guidance for AI inferencing based on real-world experience. The presentation focuses on the critical task of appropriately sizing AI infrastructure, particularly for inferencing workloads. Gonzales emphasizes the need to understand model requirements, GPU device types, and the role of inference engines. He walks the audience through considerations such as the CPU and memory requirements of the selected inference engine and how these directly affect the resources needed on Kubernetes worker nodes. The discussion also touches on the importance of accounting for administrative overhead and high availability when deploying LLM endpoints, offering a practical guide to managing resources within a Kubernetes cluster.
The presentation highlights the value of Nutanix Enterprise AI's pre-validated models, which come with recommendations on the specific resources needed to run a model in a production-ready environment. Gonzales discusses the shift in customer focus from proof-of-concept deployments to centralized systems that allow large models to be shared. He also underscores the importance of accounting for planned maintenance and reserving sufficient capacity for pod migration. Gonzales explains the sizing process: start with model selection, identify the GPU device type, determine the GPU count, then calculate CPU and memory needs.
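The sizing steps Gonzales describes can be sketched as a back-of-the-envelope calculation. The function below is an illustrative heuristic, not Nutanix's actual sizing formula: the overhead factor, the per-GPU CPU count, and the host-memory ratio are assumptions that vary by inference engine and deployment.

```python
import math

def size_llm_inference(params_b: float, bytes_per_param: int,
                       gpu_mem_gib: int, overhead: float = 1.2) -> dict:
    """Rough LLM inference sizing sketch: model weights plus runtime
    headroom, spread across GPUs. Illustrative only; real inference
    engines publish their own sizing guidance."""
    weights_gib = params_b * bytes_per_param  # treats GB ~ GiB for simplicity
    total_gib = weights_gib * overhead        # headroom for KV cache, activations
    gpus = math.ceil(total_gib / gpu_mem_gib)
    # Hypothetical host-resource ratios per GPU; actual values depend
    # on the chosen inference engine and concurrency targets.
    return {
        "gpu_memory_needed_gib": round(total_gib, 1),
        "gpus": gpus,
        "cpu_cores": gpus * 8,         # assumed 8 vCPU per GPU
        "host_memory_gib": gpus * 32,  # assumed 32 GiB RAM per GPU
    }

# Example: a 70B-parameter model at FP16 (2 bytes/param) on 80 GiB GPUs
print(size_llm_inference(70, 2, 80))
# → needs ~168 GiB of GPU memory, i.e. 3 GPUs
```

Capacity for high availability comes on top of this: as the presentation notes, the cluster also needs spare room so pods can migrate during planned maintenance.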
Throughout the presentation, Gonzales addresses FinOps and cost management, highlighting the forthcoming integration of metrics for request counts, latency, and eventually token-based consumption. He also fields questions about deployment and licensing options for Nutanix Enterprise AI (NAI), outlining scenarios for on-premises, bare-metal, and cloud deployments depending on the customer's existing infrastructure. Nutanix's approach centers on flexibility, supporting a range of choices in infrastructure, virtualization, and Kubernetes distributions. The presentation demonstrates how the company streamlines AI deployment and management, making it easier for customers to navigate the complexities of AI infrastructure and scale as needed.
Presented by Jesse Gonzales, Staff Solution Architect, Nutanix. Recorded live in Santa Clara, California, on April 24, 2025, as part of AI Infrastructure Field Day. Watch the entire presentation at https://techfieldday.com/appearance/nutanix-presents-at-ai-infrastructure-field-day-2/ or https://techfieldday.com/event/aiifd2/ for more information.