Root Cause Analysis with Nokia AI Operations Automation
36m
Clayton Wagar introduced Nokia's AI-driven approach to root cause analysis, focusing on difficult day-two operational challenges. The presentation highlighted the chronic problem of hidden impairments, or gray failures, in which traditional monitoring misses the fault because physical links appear up while the protocols or services running over them are down. Nokia's deep RCA tool aims to move beyond simple port-up/port-down alarming by correlating end-to-end application connectivity (VM to VM) with every layer of the network, including the underlay, overlay, and control plane, and thereby cut troubleshooting time dramatically.
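To make the gray-failure idea concrete, here is a minimal sketch of why port-level alarming alone is not enough: a link can report "up" while the protocols above it are down. The `LinkState` record, its fields, and the link names are hypothetical illustrations, not Nokia's or EDA's data model, and real correlation across underlay, overlay, and control plane involves far more signals than three booleans.

```python
from dataclasses import dataclass


@dataclass
class LinkState:
    # Hypothetical per-link snapshot; field names are illustrative, not an EDA schema.
    name: str
    oper_up: bool          # interface operational status (what port-up/down alarming sees)
    bfd_up: bool           # BFD session state on the link
    bgp_established: bool  # BGP session carried over the link


def find_gray_failures(links: list[LinkState]) -> dict[str, list[str]]:
    """Map each link that is 'up' at the port level but broken above it to its failed protocols."""
    suspects: dict[str, list[str]] = {}
    for link in links:
        if not link.oper_up:
            continue  # hard failure: port-down alarming already catches this
        failed = [proto for proto, ok in (("BFD", link.bfd_up), ("BGP", link.bgp_established)) if not ok]
        if failed:
            suspects[link.name] = failed
    return suspects


links = [
    LinkState("leaf1-spine1", oper_up=True, bfd_up=True, bgp_established=True),
    LinkState("leaf1-spine2", oper_up=True, bfd_up=False, bgp_established=False),  # gray failure
    LinkState("leaf2-spine1", oper_up=False, bfd_up=False, bgp_established=False),  # hard failure
]
print(find_gray_failures(links))  # {'leaf1-spine2': ['BFD', 'BGP']}
```

The hard-down link is deliberately skipped: conventional alarming already reports it, so the interesting output is the link that looks healthy at the port level.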
The team gave a live demonstration on the EDA SaaS platform, running against a real hardware spine-leaf network. They introduced a gray failure by impairing a fiber link so that the interface stayed "up" while the BFD and BGP sessions over it failed. Wagar explained how the AI's multi-agent workflow arrived at the correct diagnosis. Rather than relying on one large, monolithic model, the workflow starts with a planning model that decides which tool-calling agents to deploy. These agents gather specific, relevant data from logs, topology, and configuration, which is then filtered and passed to a reasoning model. Nokia presents this agent-based curation of data as its key to reducing cost and avoiding the hallucinations that would otherwise pose a risk in mission-critical networks.
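The plan, gather, curate, and reason flow described above can be sketched as a small pipeline. This is only an illustration of the architecture under stated assumptions: the agent functions, the `plan` rule, and the `reason` rule are hypothetical stand-ins (plain Python, no model calls), and nothing here reflects Nokia's actual prompts, agents, or APIs. The point it shows is that curation shrinks what reaches the reasoning step.

```python
from typing import Callable

Evidence = dict[str, str]  # hypothetical evidence record: source, item, detail


# Hypothetical tool-calling agents: each queries one data source for the target.
def topology_agent(target: str) -> list[Evidence]:
    return [{"source": "topology", "item": target, "detail": "leaf1 uplink to spine2"}]


def log_agent(target: str) -> list[Evidence]:
    return [
        {"source": "logs", "item": target, "detail": "BFD session down"},
        {"source": "logs", "item": "leaf3", "detail": "NTP resync"},  # unrelated noise
    ]


def config_agent(target: str) -> list[Evidence]:
    return [{"source": "config", "item": target, "detail": "BFD enabled on BGP peer"}]


AGENTS: dict[str, Callable[[str], list[Evidence]]] = {
    "topology": topology_agent,
    "logs": log_agent,
    "config": config_agent,
}


def plan(symptom: str) -> list[str]:
    # Stand-in for the planning model: pick which agents to run for this symptom.
    if "unreachable" in symptom:
        return ["topology", "logs", "config"]
    return ["logs"]


def curate(evidence: list[Evidence], target: str) -> list[Evidence]:
    # Keep only evidence about the item under investigation, shrinking the reasoning context.
    return [e for e in evidence if e["item"] == target]


def reason(evidence: list[Evidence]) -> str:
    # Stand-in for the reasoning model: a fixed rule instead of an LLM call.
    details = " ".join(e["detail"] for e in evidence)
    if "BFD session down" in details:
        return "Likely gray failure: link up but BFD (and BGP over it) down"
    return "Inconclusive"


def run_rca(symptom: str, target: str) -> tuple[str, list[Evidence]]:
    evidence: list[Evidence] = []
    for name in plan(symptom):
        evidence.extend(AGENTS[name](target))
    curated = curate(evidence, target)
    return reason(curated), curated


conclusion, curated = run_rca("vm1 to vm2 unreachable", "leaf1-spine2")
print(conclusion)
print(len(curated))  # 3: the unrelated NTP log line was filtered out
```

Splitting the work this way keeps each agent narrow and lets the final step see only filtered evidence, which is the cost and accuracy argument made in the talk; the real system's selection and filtering logic is not described in the source.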
The tool also correctly identified a classic, hard-to-find MTU mismatch. Another highlighted feature was the "Time Machine," which lets an operator select a past point in time, such as thirty minutes before an event, and run the same AI-driven root cause analysis against the network data from that moment. Each analysis ends with a report containing a human-readable summary, a confidence score, and the specific evidence the agents gathered: in effect, the solution to a complex logic puzzle that could take an engineer hours or days to work out manually.
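An MTU mismatch is hard to spot because the link stays up and small packets pass; only larger frames are dropped. The sketch below combines the two ideas from this paragraph: a per-link MTU comparison and a "Time Machine"-style lookup that evaluates the network as it was at an earlier moment. The snapshot format, link names, timestamps, and MTU values are all hypothetical; how Nokia stores and replays historical state is not described in the source.

```python
from bisect import bisect_right
from datetime import datetime, timedelta
from typing import Optional

Snapshot = dict[str, tuple[int, int]]  # link -> (MTU at end A, MTU at end B)

# Hypothetical stored history, sorted by capture time (values are illustrative).
HISTORY: list[tuple[datetime, Snapshot]] = [
    (datetime(2025, 1, 1, 9, 0), {"leaf1-spine1": (9000, 9000), "leaf2-spine1": (9000, 9000)}),
    (datetime(2025, 1, 1, 9, 40), {"leaf1-spine1": (9000, 9000), "leaf2-spine1": (9000, 1500)}),
]


def snapshot_at(when: datetime) -> Optional[Snapshot]:
    """'Time machine' lookup: latest snapshot captured at or before `when`."""
    times = [t for t, _ in HISTORY]
    idx = bisect_right(times, when)
    return HISTORY[idx - 1][1] if idx else None


def find_mtu_mismatches(snapshot: Optional[Snapshot]) -> Snapshot:
    """Return links whose two ends disagree on MTU."""
    if snapshot is None:
        return {}
    return {link: mtus for link, mtus in snapshot.items() if mtus[0] != mtus[1]}


event = datetime(2025, 1, 1, 10, 0)
now_view = find_mtu_mismatches(snapshot_at(event))
past_view = find_mtu_mismatches(snapshot_at(event - timedelta(minutes=30)))
print(now_view)   # {'leaf2-spine1': (9000, 1500)}
print(past_view)  # {}
```

Comparing the two views narrows the change to the window between the 9:00 and 9:40 snapshots, which is the kind of question a historical re-run is meant to answer.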
Presented by Clayton Wagar, Principal Consulting Engineer. Recorded live at Networking Field Day 39 in Silicon Valley on November 5, 2025. Watch the entire presentation at https://techfieldday.com/appearance/nokia-presents-at-networking-field-day-39/ or visit https://techfieldday.com/event/nfd39/ or https://Nokia.com for more information.