Agentic Smart Router

Tagged As
nvidia
nims
agents
router
rag

In an agentic AI application, not all prompts are alike. Prompts differ in complexity and therefore in what they require: some need reasoning, some need access to a RAG system, and some need tool-calling capabilities. This custom-built Agentic Smart Router demonstrates how to handle that variety by combining the NVIDIA Agent Intelligence Toolkit, NVIDIA NIMs, and the NVIDIA LLM Router blueprint into a single application.

Features

This application, built on the NVIDIA NeMo Agent Toolkit (NAT, formerly the Agent Intelligence Toolkit), introduces the first integration of the NVIDIA LLM Router within a multi-framework, agent-oriented architecture. The supervisory agent and routing control plane are implemented using LangChain, while the retrieval-augmented generation (RAG) subsystem is built on LlamaIndex. Together, these components form an end-to-end agent workflow that accepts a user prompt, retrieves supporting context, and uses the router to select and invoke the most appropriate model for the request.
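The flow described above can be sketched in plain Python. This is a simplified illustration, not the application's actual code: it runs retrieval and routing in a fixed order, whereas the supervisory agent decides at run time which tools to call. The names `run_workflow`, `AgentResult`, and the three stub functions are hypothetical, and the stubs stand in for the LlamaIndex retriever, the LLM Router, and a model endpoint.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AgentResult:
    model: str          # model chosen by the routing step
    context: List[str]  # passages returned by the retrieval step
    answer: str         # final response text


def run_workflow(
    prompt: str,
    retrieve: Callable[[str], List[str]],
    route: Callable[[str], str],
    call_model: Callable[[str, str, List[str]], str],
) -> AgentResult:
    """Retrieve context, pick a model, then call that model."""
    context = retrieve(prompt)
    model = route(prompt)
    answer = call_model(model, prompt, context)
    return AgentResult(model=model, context=context, answer=answer)


# Hypothetical stubs so the sketch runs without any external service.
def stub_retrieve(prompt: str) -> List[str]:
    return [f"passage about: {prompt}"]


def stub_route(prompt: str) -> str:
    # Assumption: prompt length is a crude stand-in for a real classifier.
    return "model-a" if len(prompt.split()) < 8 else "model-b"


def stub_call_model(model: str, prompt: str, context: List[str]) -> str:
    return f"[{model}] answered using {len(context)} passage(s)"


result = run_workflow("What is a NIM?", stub_retrieve, stub_route, stub_call_model)
```

Passing the three steps in as callables keeps the orchestration independent of any one framework, which mirrors the multi-framework split described above.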

Core components

  1. Retrieve Tool: This component is backed by a comprehensive knowledge base specific to the workload. It enriches the agent’s reasoning by retrieving relevant contextual information to support accurate and grounded responses.

  2. LLM Router tool (NVIDIA LLM Router blueprint): The routing layer follows NVIDIA’s LLM Router blueprint design and includes a Router Server and a Router Controller. The router maps each request to the most suitable model. At its core is a classifier model that evaluates the incoming prompt to determine whether it is:

    • A general conversational (chit-chat) query (model used: meta/llama-3.3-70b-instruct).

    • A more complex task that requires deeper reasoning, brainstorming, or code generation (model used: nvidia/llama-3.3-nemotron-super-49b-v1).

  3. Observability and Monitoring: The observability layer is implemented using the open-source tool Arize Phoenix. This component provides detailed visibility into the agent’s execution path, including action traces, decision flows, and end-to-end latency metrics, enabling effective debugging, performance analysis, and optimization.
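The routing decision and the kind of record an observability layer might capture can be sketched as follows. This is a minimal illustration under stated assumptions: the keyword check in `classify_prompt` is a hypothetical stand-in for the router's classifier model, the category labels and trace fields are invented for the example, and the model identifiers are written in catalog-style form. It does not use the LLM Router or Arize Phoenix APIs.

```python
import time
from typing import Dict, Tuple

# Category-to-model mapping for the two prompt classes listed above.
MODEL_BY_CATEGORY: Dict[str, str] = {
    "chit_chat": "meta/llama-3.3-70b-instruct",
    "complex": "nvidia/llama-3.3-nemotron-super-49b-v1",
}

# Assumption: a handful of keywords approximates "needs deeper reasoning".
COMPLEX_HINTS = ("why", "explain", "code", "brainstorm", "prove", "design")


def classify_prompt(prompt: str) -> str:
    """Keyword-based stand-in for the classifier model."""
    text = prompt.lower()
    return "complex" if any(hint in text for hint in COMPLEX_HINTS) else "chit_chat"


def route_with_trace(prompt: str) -> Tuple[str, Dict[str, object]]:
    """Return the chosen model plus a small trace record of the decision."""
    start = time.perf_counter()
    category = classify_prompt(prompt)
    model = MODEL_BY_CATEGORY[category]
    trace = {
        "category": category,
        "model": model,
        "latency_s": time.perf_counter() - start,  # time spent on routing only
    }
    return model, trace


chat_model, chat_trace = route_with_trace("Hi, how are you today?")
task_model, task_trace = route_with_trace("Write code that reverses a linked list")
```

In a real deployment the trace would be emitted by the tracing integration rather than returned to the caller; returning it here simply makes the decision path easy to inspect.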