Standard benchmarks often fall short and can be misleading. Leaderboards rarely address specific, real-world needs, which can erode trust in the model claims built on them. In this talk, Demetrios Brinkmann will detail how MLOps engineers and developers can build, and continuously update, their own evaluation systems to create a strong competitive advantage. He’ll cover how to build a reliable “golden dataset,” optimize data collection and labeling, and choose the right tools so that evaluations truly reflect the intended use case.
Part of JFrog swampUP 2025.