Model Catalog

Pick a model from our selection, or deploy a previously fine-tuned model.

Models for Deployment
57 items

GLM 5.1 NVFP4

Deployable
754B
NVIDIA Open Model License
1 Platform

New NVIDIA-optimized NVFP4 release of GLM 5.1, Z.ai's 754B-parameter MoE model for agentic engineering and long-horizon coding.

DeepSeek V4 Pro

Deployable
1.6T
MIT
2 Platforms

New DeepSeek V4 Pro is a 1.6T-parameter MoE model with 49B activated parameters and a one-million-token context window for advanced long-context reasoning and agentic tasks.

DeepSeek V4 Flash

Deployable
284B
MIT
4 Platforms

New DeepSeek V4 Flash is a 284B-parameter MoE model with 13B activated parameters and a one-million-token context window, built for efficient long-context intelligence.

GLM 5.1 FP8

Deployable
754B
MIT
3 Platforms

New GLM 5.1 FP8 is the FP8 checkpoint of Z.ai's 754B-parameter flagship MoE model for agentic engineering and long-horizon coding.

GLM 5.1

Deployable
754B
MIT
1 Platform

New GLM 5.1 is Z.ai's flagship 754B-parameter MoE model for agentic engineering, long-horizon coding, repository generation, and terminal-based tasks.

Kimi K2.6

Deployable
1T
Modified MIT
4 Platforms

New Kimi K2.6 is a native multimodal 1T-parameter MoE model with 32B activated parameters, aimed at long-horizon coding, agentic workflows, and autonomous task orchestration.

MiniMax M2.7

Deployable
229B
Other
6 Platforms

New MiniMax M2.7 is a 229B-parameter MoE model for software engineering, agentic tool use, productivity workflows, and long-context reasoning.

MiniMax M2.7 NVFP4

Deployable
230B
NVIDIA Software and Model Evaluation License
3 Platforms

New NVIDIA-optimized NVFP4 release of MiniMax M2.7, a 230B-parameter sparse MoE model for complex software engineering, agentic tool use, and productivity workflows.

Kimi K2.6 NVFP4

Deployable
1T
NVIDIA Open Model License
2 Platforms

New NVIDIA-optimized NVFP4 release of Moonshot AI Kimi K2.6, a 1T-parameter native multimodal MoE model for agentic coding and autonomous workflows.

Nemotron 3 Nano Omni 30B A3B Reasoning BF16

Deployable
33B
NVIDIA Open Model Agreement
5 Platforms

New Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16 is a multimodal large language model that unifies video, audio, image, and text understanding to support enterprise-grade Q&A, summarization, transcription, and document intelligence workflows.

Nemotron 3 Nano Omni 30B A3B Reasoning FP8

Deployable
33B
NVIDIA Open Model Agreement
6 Platforms

New Nemotron-3-Nano-Omni-30B-A3B-Reasoning-FP8 is a multimodal large language model that unifies video, audio, image, and text understanding to support enterprise-grade Q&A, summarization, transcription, and document intelligence workflows.

Kimi K2.5 NVFP4

Deployable
1.1T
NVIDIA Open Model License
2 Platforms

New NVIDIA-optimized NVFP4 release of Moonshot AI Kimi K2.5: a large multimodal model (text, image, video) with up to 256k context, quantized with NVIDIA Model Optimizer for efficient inference.

Mistral Small 4 119B 2603

Deployable
119.4B
Apache 2.0
4 Platforms

New Mistral Small 4 is a frontier-class multimodal model in the Mistral Small line, with strong multilingual, vision, and long-context performance.

Mistral Small 4 119B 2603 NVFP4

Deployable
119.4B
Apache 2.0
1 Platform

New NVFP4-quantized Mistral Small 4: a frontier-class multimodal model with strong multilingual, vision, and long-context performance at lower memory footprint.

Cohere Transcribe 03-2026

Deployable
5B
Apache 2.0
2 Platforms

New Cohere speech transcription model with strong performance on automatic speech recognition and long-form audio.

Trinity Large Thinking

Deployable
398B
Apache 2.0
1 Platform

New Reasoning-optimized sparse MoE model (~398B total, ~13B active per token) with native extended chain-of-thought and strong agentic / tool-calling performance.

Gemma 4 31B IT

Deployable
31.3B
Gemma
6 Platforms

New Gemma 4 is Google's open multimodal family built on the same research stack as Gemini. This instruction-tuned 31B variant supports image-text input and conversational use.

Gemma 4 26B A4B IT

Deployable
26.5B
Gemma
5 Platforms

New Gemma 4 26B A4B is a mixture-of-experts instruction-tuned model in the Gemma 4 line, with multimodal (image-text) conversational capabilities.

Gemma 4 E4B IT

Deployable
8B
Gemma
5 Platforms

New Compact Gemma 4 E4B instruction-tuned variant with multimodal (image-text and broader modality) support for efficient deployment.

Gemma 4 E2B IT

Deployable
5.1B
Gemma
5 Platforms

New Smallest Gemma 4 E2B instruction-tuned variant with multimodal (image-text and broader modality) support for edge-friendly deployment.

Qwen3.5 397B A17B

Deployable
397B
Apache 2.0
1 Platform

New Qwen3.5 397B A17B is the largest of the next-generation Qwen3.5 multimodal models. It combines unified vision-language learning, an efficient hybrid architecture, large-scale reinforcement learning, and broad multilingual support to deliver capable, scalable, and globally accessible AI.

Qwen3.5 27B

Deployable
27B
Apache 2.0
4 Platforms

New Qwen3.5-27B is a mid-sized version of Qwen3.5 that retains its advanced multimodal capabilities, efficient architecture, strong RL-driven generalization, and broad multilingual support while offering a more balanced performance-efficiency tradeoff.

Qwen3.5 9B

Deployable
9B
Apache 2.0
3 Platforms

New Qwen3.5-9B is a smaller, efficient variant of Qwen3.5 that preserves its multimodal intelligence, scalable reasoning, and broad multilingual capabilities while optimizing for lower resource usage and faster deployment.

Qwen3 Coder Next

Deployable
80B
Apache 2.0
4 Platforms

New Qwen3-Coder-Next is an open-weight coding-focused model that delivers high performance with minimal active parameters, excels in agentic reasoning and tool use, and integrates seamlessly into real-world IDEs with long context support.

NVIDIA Nemotron 3 Super 120B A12B BF16

Deployable
120B
NVIDIA Open Model License
4 Platforms

New Nemotron-3-Super-120B-A12B-BF16 is a large language model (LLM) trained by NVIDIA, designed to deliver strong agentic, reasoning, and conversational capabilities.

NVIDIA Nemotron 3 Nano 30B A3B BF16

Deployable
32B
NVIDIA Open Model License
5 Platforms

New Nemotron-3-Nano-30B-A3B-BF16 is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning tasks.

Ministral 3 Small 14B Reasoning 2512

Deployable
13.9B
Apache 2.0
5 Platforms

New The largest model in the Ministral 3 family, Ministral 3 14B offers frontier capabilities and performance comparable to its larger Mistral Small 3.2 24B counterpart. It is a powerful and efficient language model with vision capabilities.

Mistral Large 3 675B Instruct 2512

Deployable
675B
Apache 2.0
1 Platform

New Mistral Large 3 is a state-of-the-art, general-purpose, multimodal granular Mixture-of-Experts model with 41B active parameters and 675B total parameters, trained from scratch on 3,000 H200 GPUs.

Mistral Large 3 675B Instruct 2512 NVFP4

Deployable
675B
Apache 2.0
1 Platform

New Mistral Large 3 is a state-of-the-art, general-purpose, multimodal granular Mixture-of-Experts model with 41B active parameters and 675B total parameters, trained from scratch on 3,000 H200 GPUs.

GPT OSS 120B

Deployable
117B
Apache 2.0
7 Platforms

New GPT OSS is OpenAI's latest open model, designed for powerful reasoning, agentic tasks, and versatile developer use cases. GPT OSS 120B is intended for production, general-purpose, and high-reasoning use cases.

GPT OSS 20B

Deployable
21B
Apache 2.0
8 Platforms

New GPT OSS is OpenAI's latest open model, designed for powerful reasoning, agentic tasks, and versatile developer use cases. GPT OSS 20B is intended for lower-latency, local, or specialized use cases.

IBM Granite 4.0 H Micro

Deployable
3.2B
Apache 2.0
5 Platforms

Deprecated Granite-4.0-H-Micro is a 3B parameter long-context instruct model finetuned from Granite-4.0-H-Micro-Base. Granite 4.0 instruct models feature improved instruction following (IF) and tool-calling capabilities, making them more effective in enterprise applications.

IBM Granite 4.0 H Tiny

Deployable
6.9B
Apache 2.0
5 Platforms

Deprecated Granite-4.0-H-Tiny is a 7B parameter long-context instruct model finetuned from Granite-4.0-H-Tiny-Base. Granite 4.0 instruct models feature improved instruction following (IF) and tool-calling capabilities, making them more effective in enterprise applications.

IBM Granite 4.0 H Small

Deployable
32.2B
Apache 2.0
5 Platforms

Deprecated Granite-4.0-H-Small is a 32B parameter long-context instruct model finetuned from Granite-4.0-H-Small-Base. Granite 4.0 instruct models feature improved instruction following (IF) and tool-calling capabilities, making them more effective in enterprise applications.

Magistral Small 2507

Deployable
23.6B
Apache 2.0
5 Platforms

Deprecated Magistral Small builds on Mistral Small 3.1 (2503) and adds reasoning capabilities through SFT on Magistral Medium traces followed by RL. It is a small, efficient reasoning model with 24B parameters.

Grok 2.5

Deployable
269.5B
Grok 2 Community License
1 Platform

Deprecated Grok 2.5 was trained and used at xAI in 2024, when it was xAI's best model, and was released publicly afterward.

Llama 4 Maverick 17B 128E Instruct

Deployable
402B
Llama 4
2 Platforms

Deprecated The Llama 4 models are natively multimodal AI models that enable text and multimodal experiences. They use a mixture-of-experts architecture to offer industry-leading performance in text and image understanding.

Llama 4 Scout 17B 16E Instruct

Deployable
109B
Llama 4
5 Platforms

Deprecated The Llama 4 models are natively multimodal AI models that enable text and multimodal experiences. They use a mixture-of-experts architecture to offer industry-leading performance in text and image understanding.

Gemma 3 27B IT

Deployable
27.4B
Gemma
3 Platforms

Deprecated Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.

Gemma 3 12B IT

Deployable
12.2B
Gemma
4 Platforms

Deprecated Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.

Gemma 3 4B IT

Deployable
4.3B
Gemma
4 Platforms

Deprecated Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.

Gemma 3 1B IT

Deployable
1B
Gemma
4 Platforms

Deprecated Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.

DeepSeek R1

Deployable
685B
MIT
2 Platforms

Deprecated DeepSeek-R1 is an advanced reasoning system using reinforcement learning with cold-start initialization, achieving performance comparable to OpenAI-o1 across math, code, and reasoning tasks.

DeepSeek R1 Distill Qwen 32B

Deployable
32.8B
MIT
6 Platforms

Deprecated DeepSeek-R1 is an advanced reasoning system using reinforcement learning with cold-start initialization, achieving performance comparable to leading proprietary models. The release includes distilled variants like DeepSeek-R1-Distill-Qwen-32B, which outperforms OpenAI-o1-mini across mathematical reasoning (92.3% MATH) and coding benchmarks while maintaining commercial usability. This 32B-parameter model sets new state-of-the-art results for dense models in its class through optimized knowledge distillation from the R1 framework.

DeepSeek R1 Distill Llama 70B

Deployable
70.6B
MIT
5 Platforms

Deprecated DeepSeek-R1 is an advanced reasoning model using reinforcement learning with cold-start initialization, matching top-tier systems in math and coding performance. The release includes distilled variants like DeepSeek-R1-Distill-Llama-70B, which preserves 94.5% MATH accuracy in the efficient Llama-70B architecture while supporting commercial applications.

Llama 3.2 11B Vision Instruct

Deployable
10.7B
Llama 3.2
5 Platforms

Deprecated The Llama-3.2-11B-Vision-Instruct is a multimodal large language model optimized for visual recognition, image reasoning, captioning, and answering questions about images, trained on 6 billion image-text pairs, with 11 billion parameters, and supported for commercial and research use in English.

Llama 3.2 90B Vision Instruct

Deployable
88.6B
Llama 3.2
3 Platforms

Deprecated The Llama-3.2-90B-Vision-Instruct is a multimodal large language model optimized for visual recognition, image reasoning, captioning, and answering questions about images, trained on 6 billion image-text pairs, with 90 billion parameters, and supported for commercial and research use in English.

Llama 3.3 70B Instruct

Deployable
70.6B
Llama 3.3
4 Platforms

Deprecated Meta Llama 3.3 is a multilingual large language model (LLM): a pretrained and instruction-tuned generative model with 70B parameters (text in, text out).

Meta Llama 3.1 70B Instruct

Deployable
70.6B
Llama 3.1
4 Platforms

Deprecated The Meta Llama 3.1 collection of multilingual large language models (LLMs) comprises pretrained and instruction-tuned generative models in 8B, 70B, and 405B sizes (text in, text out).

Meta Llama 3.1 8B Instruct

Deployable
8B
Llama 3.1
6 Platforms

Deprecated The Meta Llama 3.1 collection of multilingual large language models (LLMs) comprises pretrained and instruction-tuned generative models in 8B, 70B, and 405B sizes (text in, text out).

Mixtral 8x22B Instruct v0.1

Deployable
141B
Apache 2.0
3 Platforms

Deprecated The Mixtral-8x22B-Instruct-v0.1 LLM, a fine-tuned version of the high-efficiency, sparse Mixture-of-Experts Mixtral-8x22B model, excels in multilingual fluency, advanced math, coding skills, and scalable tech application development.

whisper small.en

Deployable
242M
Apache 2.0
0 Platforms

Deprecated Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labelled data, Whisper models demonstrate a strong ability to generalise to many datasets and domains without the need for fine-tuning.

Phi 3.5 mini instruct

Deployable
3.8B
MIT
0 Platforms

Deprecated Phi-3.5-mini is a lightweight, state-of-the-art open model built upon the datasets used for Phi-3 (synthetic data and filtered publicly available websites), with a focus on very high-quality, reasoning-dense data. The model belongs to the Phi-3 model family and supports a 128K-token context length.

nomic embed text v1.5

Deployable
137M
Apache 2.0
0 Platforms

Deprecated Nomic Embed Text v1.5 is the first open source, open data, and open training code text embedding model with 8192 token context length.

clip vit base patch32

Deployable
151M
MIT
0 Platforms

Deprecated OpenAI CLIP is a multimodal encoder model designed to understand both visual and textual data. It excels at tasks such as zero-shot image classification, image-text similarity, and cross-modal retrieval by leveraging a shared embedding space.

Meta Llama 3 8B Instruct

Deployable
8B
Llama 3
4 Platforms

Deprecated Meta Llama 3 is a family of instruction-tuned, auto-regressive LLMs with optimized transformer architecture, offered in 8B and 70B sizes, designed for superior performance in dialogue applications and enhanced safety and helpfulness.

Meta Llama 3.1 405B Instruct FP8

Deployable
410B
Llama 3.1
1 Platform

Deprecated The Meta Llama 3.1 collection of multilingual large language models (LLMs) comprises pretrained and instruction-tuned generative models in 8B, 70B, and 405B sizes (text in, text out).

Models for Training
5 items

Meta Llama 3.1 70B

Trainable
70.6B
Llama 3.1
1 Platform

Deprecated The Meta Llama 3.1 collection of multilingual large language models (LLMs) comprises pretrained and instruction-tuned generative models in 8B, 70B, and 405B sizes (text in, text out).

Meta Llama 3.1 8B

Trainable
8B
Llama 3.1
4 Platforms

Deprecated The Meta Llama 3.1 collection of multilingual large language models (LLMs) comprises pretrained and instruction-tuned generative models in 8B, 70B, and 405B sizes (text in, text out).

Meta Llama 3 8B

Trainable
8B
Llama 3
4 Platforms

Deprecated Meta Llama 3 is a family of instruction-tuned, auto-regressive LLMs with optimized transformer architecture, offered in 8B and 70B sizes, designed for superior performance in dialogue applications and enhanced safety and helpfulness.

Mixtral 8x7B v0.1

Trainable
46.7B
Apache 2.0
3 Platforms

Deprecated Mixtral 8x7B is an open-source sparse Mixture-of-Experts (SMoE) model released under Apache 2.0. It excels in multilingual communication and code generation, offers a 32k-token context window, and outperforms Llama 2 70B and GPT-3.5 on most benchmarks.

Mistral 7B v0.1

Trainable
7.2B
Apache 2.0
4 Platforms

Deprecated The Mistral-7B-v0.1 Large Language Model (LLM) is a pretrained generative text model with 7 billion parameters. Mistral-7B-v0.1 outperforms Llama 2 13B on all benchmarks we tested.