Pick a model from our selection, or deploy a previously fine-tuned model.
New NVIDIA-optimized NVFP4 release of GLM 5.1, Z.ai's 754B-parameter MoE model for agentic engineering and long-horizon coding.
New DeepSeek V4 Pro is a 1.6T-parameter MoE model with 49B activated parameters and a one-million-token context window for advanced long-context reasoning and agentic tasks.
New DeepSeek V4 Flash is a 284B-parameter MoE model with 13B activated parameters and a one-million-token context window, built for efficient long-context intelligence.
New GLM 5.1 FP8 is the FP8 checkpoint of Z.ai's 754B-parameter flagship MoE model for agentic engineering and long-horizon coding.
New GLM 5.1 is Z.ai's flagship 754B-parameter MoE model for agentic engineering, long-horizon coding, repository generation, and terminal-based tasks.
New Kimi K2.6 is a native multimodal 1T-parameter MoE model with 32B activated parameters, aimed at long-horizon coding, agentic workflows, and autonomous task orchestration.
New MiniMax M2.7 is a 229B-parameter MoE model for software engineering, agentic tool use, productivity workflows, and long-context reasoning.
New NVIDIA-optimized NVFP4 release of MiniMax M2.7, a 230B-parameter sparse MoE model for complex software engineering, agentic tool use, and productivity workflows.
New NVIDIA-optimized NVFP4 release of Moonshot AI Kimi K2.6, a 1T-parameter native multimodal MoE model for agentic coding and autonomous workflows.
New Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16 is a multimodal large language model that unifies video, audio, image, and text understanding to support enterprise-grade Q&A, summarization, transcription, and document intelligence workflows.
New Nemotron-3-Nano-Omni-30B-A3B-Reasoning-FP8 is a multimodal large language model that unifies video, audio, image, and text understanding to support enterprise-grade Q&A, summarization, transcription, and document intelligence workflows.
New NVIDIA-optimized NVFP4 release of Moonshot AI Kimi K2.5: a large multimodal model (text, image, video) with up to 256k context, quantized with NVIDIA Model Optimizer for efficient inference.
New Mistral Small 4 is a frontier-class multimodal model in the Mistral Small line, with strong multilingual, vision, and long-context performance.
New NVFP4-quantized Mistral Small 4: a frontier-class multimodal model with strong multilingual, vision, and long-context performance at lower memory footprint.
New Cohere speech transcription model for automatic speech recognition (ASR), including long-form audio.
New Reasoning-optimized sparse MoE model (~398B total, ~13B active per token) with native extended chain-of-thought and strong agentic / tool-calling performance.
New Gemma 4 is Google's open multimodal family built on the same research stack as Gemini. This instruction-tuned 31B variant supports image-text input and conversational use.
New Gemma 4 26B A4B is a mixture-of-experts instruction-tuned model in the Gemma 4 line, with multimodal (image-text) conversational capabilities.
New Compact Gemma 4 E4B instruction-tuned variant with multimodal (image-text and broader modality) support for efficient deployment.
New Smallest Gemma 4 E2B instruction-tuned variant with multimodal (image-text and broader modality) support for edge-friendly deployment.
New Qwen3.5 397B A17B is the largest model in the next-generation Qwen3.5 multimodal family. It combines unified vision-language learning, an efficient hybrid architecture, large-scale reinforcement learning, and broad multilingual support.
New Qwen3.5-27B is a mid-sized version of Qwen3.5 that retains its advanced multimodal capabilities, efficient architecture, strong RL-driven generalization, and broad multilingual support while offering a more balanced performance-efficiency tradeoff.
New Qwen3.5-9B is a smaller, efficient variant of Qwen3.5 that preserves its multimodal intelligence, scalable reasoning, and broad multilingual capabilities while optimizing for lower resource usage and faster deployment.
New Qwen3-Coder-Next is an open-weight coding-focused model that delivers high performance with minimal active parameters, excels in agentic reasoning and tool use, and integrates seamlessly into real-world IDEs with long context support.
New Nemotron-3-Super-120B-A12B-BF16 is a large language model (LLM) trained by NVIDIA, designed to deliver strong agentic, reasoning, and conversational capabilities.
New Nemotron-3-Nano-30B-A3B-BF16 is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning tasks.
New The largest model in the Ministral 3 family, Ministral 3 14B offers frontier capabilities and performance comparable to its larger Mistral Small 3.2 24B counterpart. A powerful and efficient language model with vision capabilities.
New Mistral Large 3 is a state-of-the-art, general-purpose, multimodal granular Mixture-of-Experts model with 41B active parameters and 675B total parameters, trained from the ground up on 3,000 H200 GPUs.
New Mistral Large 3 is a state-of-the-art, general-purpose, multimodal granular Mixture-of-Experts model with 41B active parameters and 675B total parameters, trained from the ground up on 3,000 H200 GPUs.
New GPT OSS is OpenAI's latest open-weight model family, designed for powerful reasoning, agentic tasks, and versatile developer use cases. GPT OSS 120B is intended for production, general-purpose, and high-reasoning use cases.
New GPT OSS is OpenAI's latest open-weight model family, designed for powerful reasoning, agentic tasks, and versatile developer use cases. GPT OSS 20B is intended for lower-latency, local, or specialized use cases.
Deprecated Granite-4.0-H-Micro is a 3B parameter long-context instruct model finetuned from Granite-4.0-H-Micro-Base. Granite 4.0 instruct models feature improved instruction following (IF) and tool-calling capabilities, making them more effective in enterprise applications.
Deprecated Granite-4.0-H-Tiny is a 7B parameter long-context instruct model finetuned from Granite-4.0-H-Tiny-Base. Granite 4.0 instruct models feature improved instruction following (IF) and tool-calling capabilities, making them more effective in enterprise applications.
Deprecated Granite-4.0-H-Small is a 32B parameter long-context instruct model finetuned from Granite-4.0-H-Small-Base. Granite 4.0 instruct models feature improved instruction following (IF) and tool-calling capabilities, making them more effective in enterprise applications.
Deprecated Built on Mistral Small 3.1 (2503) with added reasoning capabilities, this small, efficient 24B-parameter reasoning model was trained with SFT on Magistral Medium traces followed by RL.
Deprecated Grok 2.5 is a model trained and used at xAI in 2024, recently released. Grok 2.5 was xAI's best model in 2024.
Deprecated The Llama 4 collection of models are natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding.
Deprecated The Llama 4 collection of models are natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding.
Deprecated Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.
Deprecated Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.
Deprecated Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.
Deprecated Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.
Deprecated DeepSeek-R1 is an advanced reasoning system using reinforcement learning with cold-start initialization, achieving performance comparable to OpenAI-o1 across math, code, and reasoning tasks.
Deprecated DeepSeek-R1 is an advanced reasoning system using reinforcement learning with cold-start initialization, achieving performance comparable to leading proprietary models. The release includes distilled variants such as DeepSeek-R1-Distill-Qwen-32B, which outperforms OpenAI-o1-mini across mathematical reasoning (92.3% MATH) and coding benchmarks while maintaining commercial usability. This 32B-parameter model sets new state-of-the-art results for dense models in its class through optimized knowledge distillation from the R1 framework.
Deprecated DeepSeek-R1 is an advanced reasoning model using reinforcement learning with cold-start initialization, matching top-tier systems in math and coding performance. The release includes distilled variants like DeepSeek-R1-Distill-Llama-70B, which preserves 94.5% MATH accuracy in the efficient Llama-70B architecture while supporting commercial applications.
Deprecated The Llama-3.2-11B-Vision-Instruct is a multimodal large language model optimized for visual recognition, image reasoning, captioning, and answering questions about images, trained on 6 billion image-text pairs, with 11 billion parameters, and supported for commercial and research use in English.
Deprecated The Llama-3.2-90B-Vision-Instruct is a multimodal large language model optimized for visual recognition, image reasoning, captioning, and answering questions about images, trained on 6 billion image-text pairs, with 90 billion parameters, and supported for commercial and research use in English.
Deprecated The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out).
Deprecated The Meta Llama 3.1 collection of multilingual large language models (LLMs) comprises pretrained and instruction-tuned generative models in 8B, 70B, and 405B sizes (text in/text out).
Deprecated The Meta Llama 3.1 collection of multilingual large language models (LLMs) comprises pretrained and instruction-tuned generative models in 8B, 70B, and 405B sizes (text in/text out).
Deprecated The Mixtral-8x22B-Instruct-v0.1 LLM, a fine-tuned version of the high-efficiency, sparse Mixture-of-Experts Mixtral-8x22B model, excels in multilingual fluency, advanced math, coding skills, and scalable tech application development.
Deprecated Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labelled data, Whisper models demonstrate a strong ability to generalise to many datasets and domains without the need for fine-tuning.
Deprecated Phi-3.5-mini is a lightweight, state-of-the-art open model built upon the datasets used for Phi-3 (synthetic data and filtered publicly available websites), with a focus on very high-quality, reasoning-dense data. The model belongs to the Phi-3 model family and supports a 128K-token context length.
Deprecated Nomic Embed Text v1.5 is the first open source, open data, and open training code text embedding model with 8192 token context length.
Deprecated OpenAI CLIP is a multimodal encoder model designed to understand both visual and textual data. It excels at tasks such as zero-shot image classification, image-text similarity, and cross-modal retrieval by leveraging a shared embedding space.
Deprecated Meta Llama 3 is a family of instruction-tuned, auto-regressive LLMs with optimized transformer architecture, offered in 8B and 70B sizes, designed for superior performance in dialogue applications and enhanced safety and helpfulness.
Deprecated The Meta Llama 3.1 collection of multilingual large language models (LLMs) comprises pretrained and instruction-tuned generative models in 8B, 70B, and 405B sizes (text in/text out).
Deprecated The Meta Llama 3.1 collection of multilingual large language models (LLMs) comprises pretrained and instruction-tuned generative models in 8B, 70B, and 405B sizes (text in/text out).
Deprecated The Meta Llama 3.1 collection of multilingual large language models (LLMs) comprises pretrained and instruction-tuned generative models in 8B, 70B, and 405B sizes (text in/text out).
Deprecated Meta Llama 3 is a family of instruction-tuned, auto-regressive LLMs with optimized transformer architecture, offered in 8B and 70B sizes, designed for superior performance in dialogue applications and enhanced safety and helpfulness.
Deprecated Mixtral 8x7B is an open-source sparse Mixture-of-Experts (SMoE) model released under Apache 2.0. It excels in multilingual communication and code generation, offers a 32k-token context window, and delivers a strong cost-performance ratio, surpassing Llama 2 70B and GPT-3.5 on most benchmarks.
Deprecated The Mistral-7B-v0.1 Large Language Model (LLM) is a pretrained generative text model with 7 billion parameters. Mistral-7B-v0.1 outperforms Llama 2 13B on all benchmarks tested by Mistral AI.
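When comparing the precisions (BF16, FP8, NVFP4) and MoE sizes listed above, a rough back-of-the-envelope estimate can help narrow the choice. The sketch below is illustrative only. It assumes about 2 bytes per parameter for BF16, 1 for FP8, and 0.5 for NVFP4, and it ignores quantization scale factors, KV cache, and activation memory, so real deployments need more. The function names are hypothetical and not part of any platform API, and the BF16 figure for GLM 5.1 assumes the base checkpoint is stored in BF16, which the listing does not state.

```python
# Approximate bytes per parameter for each precision (assumed values;
# NVFP4 block scale factors add a small overhead that is ignored here).
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "nvfp4": 0.5}


def weight_memory_gb(total_params_b: float, precision: str) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    total_params_b is the total parameter count in billions.
    """
    return total_params_b * BYTES_PER_PARAM[precision]


def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of an MoE model's parameters that are activated per token."""
    return active_params_b / total_params_b


if __name__ == "__main__":
    # Parameter counts are taken from the listings above.
    for precision in ("bf16", "fp8", "nvfp4"):
        gb = weight_memory_gb(754, precision)  # GLM 5.1, 754B total
        print(f"GLM 5.1 weights at {precision}: ~{gb:.0f} GB")

    # DeepSeek V4 Pro: 1.6T total, 49B activated
    print(f"DeepSeek V4 Pro active share: {active_fraction(1600, 49):.1%}")
    # DeepSeek V4 Flash: 284B total, 13B activated
    print(f"DeepSeek V4 Flash active share: {active_fraction(284, 13):.1%}")
```

Total parameters drive the memory needed to hold the weights, while activated parameters drive per-token compute, which is why a large MoE model can be costly to host yet comparatively cheap per token.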