Model Catalog | Dell Enterprise Hub by Hugging Face

Gemma 4 31B IT FP8 Dynamic

Deployable

33B

Apache 2.0

7 Platforms

New RedHatAI's gemma-4-31B-it-FP8-dynamic was obtained by quantizing the weights and activations of google/gemma-4-31B-it to FP8 data type using dynamic per-token quantization.
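As a rough illustration of what "dynamic per-token quantization" means in the description above, the sketch below computes one scale per token (row) at run time from that row's largest absolute value, then rounds the scaled values onto an FP8-like grid. It is not RedHatAI's actual pipeline. The function names are hypothetical, the FP8 format is assumed to be E4M3 (largest finite value 448), and the rounding is emulated in pure Python.

```python
import math

E4M3_MAX = 448.0                  # assumed FP8 format: E4M3, largest finite value
E4M3_MIN_NORMAL = 2.0 ** -6       # smallest normal magnitude in E4M3
E4M3_SUBNORMAL_STEP = 2.0 ** -9   # spacing of E4M3 subnormals


def round_to_e4m3(x: float) -> float:
    """Emulate rounding to an E4M3 value (hypothetical helper, saturating at 448)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), E4M3_MAX)
    if a < E4M3_MIN_NORMAL:
        # Subnormal range: values sit on a fixed grid.
        return sign * round(a / E4M3_SUBNORMAL_STEP) * E4M3_SUBNORMAL_STEP
    m, e = math.frexp(a)          # a == m * 2**e with 0.5 <= m < 1
    # Keep 4 significant bits (1 implicit + 3 mantissa bits).
    return sign * math.ldexp(round(m * 16) / 16, e)


def quantize_per_token(tokens):
    """Quantize each token (row) with its own scale, computed at run time."""
    quantized, scales = [], []
    for row in tokens:
        amax = max(abs(v) for v in row)
        scale = amax / E4M3_MAX if amax > 0 else 1.0
        quantized.append([round_to_e4m3(v / scale) for v in row])
        scales.append(scale)
    return quantized, scales


def dequantize(quantized, scales):
    """Map quantized values back to the original range."""
    return [[q * s for q in row] for row, s in zip(quantized, scales)]


if __name__ == "__main__":
    activations = [[0.1, -0.2, 0.4], [10.0, -30.0, 5.0]]
    q, s = quantize_per_token(activations)
    print("scales:", s)
    print("dequantized:", dequantize(q, s))
```

Because each row gets its own scale, a token with small activations is not forced onto the coarse grid that a token with large activations would need.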

Qwen AgentWorld 35B A3B

Deployable

35B

Apache 2.0

4 Platforms

New Qwen-AgentWorld-35B-A3B is a 35B parameter Qwen model with 3B active parameters, optimized for agentic workloads and long-context reasoning.

North Mini Code 1.0 GGUF

Deployable

30B

Apache 2.0

1 Platform

New Unsloth GGUF-quantized North Mini Code 1.0 release for llama.cpp deployments, optimized for code generation, agentic software engineering, and terminal tasks on GB10 systems.

Step 3.7 Flash GGUF

Deployable

201.4B

Apache 2.0

1 Platform

New GGUF-quantized Step 3.7 Flash release for llama.cpp deployments, supporting reasoning, tool-calling, and multimodal agent workflows on GB10 systems.

Qwen3.6 27B MTP GGUF

Deployable

27.8B

Apache 2.0

1 Platform

New Unsloth GGUF-quantized Qwen3.6-27B-MTP release for llama.cpp deployments, optimized for efficient local inference on GB10 systems.

Qwen3.6 35B A3B MTP GGUF

Deployable

36B

Apache 2.0

1 Platform

New Unsloth GGUF-quantized Qwen3.6-35B-A3B-MTP release for llama.cpp deployments, using a sparse MoE architecture for efficient local inference on GB10 systems.

Gemma 4 26B A4B IT GGUF

Deployable

26.5B

Apache 2.0

1 Platform

New Unsloth GGUF-quantized Gemma 4 26B A4B instruction-tuned release for llama.cpp deployments on GB10 systems.

Gemma 4 E4B IT GGUF

Deployable

8B

Apache 2.0

1 Platform

New Unsloth GGUF-quantized Gemma 4 E4B instruction-tuned release for llama.cpp deployments on GB10 systems.

GPT OSS 20B GGUF

Deployable

21.5B

Apache 2.0

1 Platform

New Unsloth GGUF-quantized GPT OSS 20B release for llama.cpp deployments, designed for efficient local reasoning and agentic workloads on GB10 systems.

GLM 5.2 FP8

Deployable

753B

MIT

3 Platforms

New GLM 5.2 FP8 is the FP8 checkpoint of Z.ai's 753B-parameter flagship MoE model for long-horizon tasks, with a 1M-token context window.

MiniMax M3

Deployable

428B

MiniMax Community License

1 Platform

New MiniMax-M3 is a natively multimodal model with a 1M-token context window, ~428B total parameters, and ~23B activated parameters.

NVIDIA Nemotron 3 Ultra 550B A55B BF16

Deployable

550B

NVIDIA Open Model License

2 Platforms

New Nemotron 3 Ultra 550B A55B BF16 is an NVIDIA large language model for advanced reasoning, agentic tool use, and long-context conversational workloads.

NVIDIA Nemotron 3 Ultra 550B A55B NVFP4

Deployable

550B

NVIDIA Open Model License

5 Platforms

New Nemotron 3 Ultra 550B A55B NVFP4 is an NVIDIA-optimized release for advanced reasoning, agentic tool use, and long-context conversational workloads.

GLM 5.1 NVFP4

Deployable

754B

NVIDIA Open Model License

1 Platform

New NVIDIA-optimized NVFP4 release of GLM 5.1, Z.ai's 754B-parameter MoE model for agentic engineering and long-horizon coding.

DeepSeek V4 Pro

Deployable

1.6T

MIT

2 Platforms

New DeepSeek V4 Pro is a 1.6T-parameter MoE model with 49B activated parameters and a one-million-token context window for advanced long-context reasoning and agentic tasks.

DeepSeek V4 Flash

Deployable

284B

MIT

4 Platforms

New DeepSeek V4 Flash is a 284B-parameter MoE model with 13B activated parameters and a one-million-token context window, built for efficient long-context intelligence.

GLM 5.1 FP8

Deployable

754B

MIT

3 Platforms

New GLM 5.1 FP8 is the FP8 checkpoint of Z.ai's 754B-parameter flagship MoE model for agentic engineering and long-horizon coding.

GLM 5.1

Deployable

754B

MIT

1 Platform

New GLM 5.1 is Z.ai's flagship 754B-parameter MoE model for agentic engineering, long-horizon coding, repository generation, and terminal-based tasks.

Kimi K2.6

Deployable

1T

Modified MIT

4 Platforms

New Kimi K2.6 is a native multimodal 1T-parameter MoE model with 32B activated parameters, aimed at long-horizon coding, agentic workflows, and autonomous task orchestration.

MiniMax M2.7

Deployable

229B

Other

6 Platforms

New MiniMax M2.7 is a 229B-parameter MoE model for software engineering, agentic tool use, productivity workflows, and long-context reasoning.

MiniMax M2.7 NVFP4

Deployable

230B

NVIDIA Software and Model Evaluation License

3 Platforms

New NVIDIA-optimized NVFP4 release of MiniMax M2.7, a 230B-parameter sparse MoE model for complex software engineering, agentic tool use, and productivity workflows.

Kimi K2.6 NVFP4

Deployable

1T

NVIDIA Open Model License

2 Platforms

New NVIDIA-optimized NVFP4 release of Moonshot AI Kimi K2.6, a 1T-parameter native multimodal MoE model for agentic coding and autonomous workflows.

Nemotron 3 Nano Omni 30B A3B Reasoning BF16

Deployable

33B

NVIDIA Open Model Agreement

5 Platforms

New Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16 is a multimodal large language model that unifies video, audio, image, and text understanding to support enterprise-grade Q&A, summarization, transcription, and document intelligence workflows.

Nemotron 3 Nano Omni 30B A3B Reasoning FP8

Deployable

33B

NVIDIA Open Model Agreement

6 Platforms

New Nemotron-3-Nano-Omni-30B-A3B-Reasoning-FP8 is a multimodal large language model that unifies video, audio, image, and text understanding to support enterprise-grade Q&A, summarization, transcription, and document intelligence workflows.

Kimi K2.5 NVFP4

Deployable

1.1T

NVIDIA Open Model License

2 Platforms

New NVIDIA-optimized NVFP4 release of Moonshot AI Kimi K2.5: a large multimodal model (text, image, video) with up to 256k context, quantized with NVIDIA Model Optimizer for efficient inference.

Mistral Small 4 119B 2603

Deployable

119.4B

Apache 2.0

4 Platforms

New Mistral Small 4 is a frontier-class multimodal model in the Mistral Small line, with strong multilingual, vision, and long-context performance.

Mistral Small 4 119B 2603 NVFP4

Deployable

119.4B

Apache 2.0

1 Platform

New NVFP4-quantized Mistral Small 4: a frontier-class multimodal model with strong multilingual, vision, and long-context performance at lower memory footprint.
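The "lower memory footprint" of a quantized release can be estimated from the parameter count and the bits stored per weight. The sketch below is a back-of-the-envelope calculation only: the function name and the bit widths table are assumptions for illustration, it counts weights alone (no KV cache or activations), and it ignores the scaling-factor metadata that block-scaled formats such as NVFP4 carry.

```python
# Nominal bits per stored weight; an illustrative table, not vendor data.
BITS_PER_WEIGHT = {"BF16": 16, "FP8": 8, "NVFP4": 4}


def weight_memory_gib(num_params: float, fmt: str) -> float:
    """Approximate weight storage in GiB for a given parameter count and format."""
    return num_params * BITS_PER_WEIGHT[fmt] / 8 / 2**30


if __name__ == "__main__":
    params = 119.4e9  # parameter count listed for Mistral Small 4 in this catalog
    for fmt in ("BF16", "FP8", "NVFP4"):
        print(f"{fmt}: ~{weight_memory_gib(params, fmt):.1f} GiB of weights")
```

Under these assumptions a 4-bit checkpoint needs about a quarter of the weight memory of the BF16 one. Real deployments need additional headroom on top of this figure.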

Cohere Transcribe 03-2026

Deployable

5B

Apache 2.0

2 Platforms

New Cohere speech transcription model for automatic speech recognition, including long-form audio.

Trinity Large Thinking

Deployable

398B

Apache 2.0

1 Platform

New Reasoning-optimized sparse MoE model (~398B total, ~13B active per token) with native extended chain-of-thought and strong agentic / tool-calling performance.

Gemma 4 31B IT

Deployable

31.3B

Gemma

6 Platforms

New Gemma 4 is Google's open multimodal family built on the same research stack as Gemini. This instruction-tuned 31B variant supports image-text input and conversational use.

Gemma 4 26B A4B IT

Deployable

26.5B

Gemma

5 Platforms

New Gemma 4 26B A4B is a mixture-of-experts instruction-tuned model in the Gemma 4 line, with multimodal (image-text) conversational capabilities.

Gemma 4 E4B IT

Deployable

8B

Gemma

5 Platforms

New Compact Gemma 4 E4B instruction-tuned variant with multimodal (image-text and broader modality) support for efficient deployment.

Gemma 4 E2B IT

Deployable

5.1B

Gemma

5 Platforms

New Smallest Gemma 4 E2B instruction-tuned variant with multimodal (image-text and broader modality) support for edge-friendly deployment.

Qwen3.5 397B A17B

Deployable

397B

Apache 2.0

1 Platform

New Qwen3.5 397B A17B is the largest model in the next-generation Qwen3.5 multimodal family, combining unified vision-language learning, an efficient hybrid architecture, large-scale reinforcement learning, and broad multilingual support.

Qwen3.5 27B

Deployable

27B

Apache 2.0

4 Platforms

New Qwen3.5-27B is a mid-sized version of Qwen3.5 that retains its advanced multimodal capabilities, efficient architecture, strong RL-driven generalization, and broad multilingual support while offering a more balanced performance-efficiency tradeoff.

Qwen3.5 9B

Deployable

9B

Apache 2.0

3 Platforms

New Qwen3.5-9B is a smaller, efficient variant of Qwen3.5 that preserves its multimodal intelligence, scalable reasoning, and broad multilingual capabilities while optimizing for lower resource usage and faster deployment.

Qwen3 Coder Next

Deployable

80B

Apache 2.0

4 Platforms

New Qwen3-Coder-Next is an open-weight coding-focused model that delivers high performance with minimal active parameters, excels in agentic reasoning and tool use, and integrates seamlessly into real-world IDEs with long context support.

NVIDIA Nemotron 3 Super 120B A12B BF16

Deployable

120B

NVIDIA Open Model License

4 Platforms

New Nemotron-3-Super-120B-A12B-BF16 is a large language model (LLM) trained by NVIDIA, designed to deliver strong agentic, reasoning, and conversational capabilities.

NVIDIA Nemotron 3 Nano 30B A3B BF16

Deployable

32B

NVIDIA Open Model License

5 Platforms

New Nemotron-3-Nano-30B-A3B-BF16 is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning tasks.

Ministral 3 Small 14B Reasoning 2512

Deployable

13.9B

Apache 2.0

5 Platforms

New Ministral 3 14B, the largest model in the Ministral 3 family, is a powerful and efficient language model with vision capabilities, offering frontier capabilities and performance comparable to the larger Mistral Small 3.2 24B.

Mistral Large 3 675B Instruct 2512

Deployable

675B

Apache 2.0

1 Platform

New Mistral Large 3 is a state-of-the-art, general-purpose multimodal granular Mixture-of-Experts model with 41B active parameters and 675B total parameters, trained from scratch on 3000 H200 GPUs.

Mistral Large 3 675B Instruct 2512 NVFP4

Deployable

675B

Apache 2.0

1 Platform

New Mistral Large 3 is a state-of-the-art, general-purpose multimodal granular Mixture-of-Experts model with 41B active parameters and 675B total parameters, trained from scratch on 3000 H200 GPUs.

GPT OSS 120B

Deployable

117B

Apache 2.0

7 Platforms

New GPT OSS is the latest OpenAI open model, designed for powerful reasoning, agentic tasks, and versatile developer use cases. GPT OSS 120B is intended for production, general-purpose, and high-reasoning use cases.

GPT OSS 20B

Deployable

21B

Apache 2.0

8 Platforms

New GPT OSS is the latest OpenAI open model, designed for powerful reasoning, agentic tasks, and versatile developer use cases. GPT OSS 20B is intended for lower-latency, local, or specialized use cases.

IBM Granite 4.0 H Micro

Deployable

3.2B

Apache 2.0

5 Platforms

Deprecated Granite-4.0-H-Micro is a 3B parameter long-context instruct model finetuned from Granite-4.0-H-Micro-Base. Granite 4.0 instruct models feature improved instruction following (IF) and tool-calling capabilities, making them more effective in enterprise applications.

IBM Granite 4.0 H Tiny

Deployable

6.9B

Apache 2.0

5 Platforms

Deprecated Granite-4.0-H-Tiny is a 7B parameter long-context instruct model finetuned from Granite-4.0-H-Tiny-Base. Granite 4.0 instruct models feature improved instruction following (IF) and tool-calling capabilities, making them more effective in enterprise applications.

IBM Granite 4.0 H Small

Deployable

32.2B

Apache 2.0

5 Platforms

Deprecated Granite-4.0-H-Small is a 32B parameter long-context instruct model finetuned from Granite-4.0-H-Small-Base. Granite 4.0 instruct models feature improved instruction following (IF) and tool-calling capabilities, making them more effective in enterprise applications.

Magistral Small 2507

Deployable

23.6B

Apache 2.0

5 Platforms

Deprecated Magistral Small builds on Mistral Small 3.1 (2503) with added reasoning capabilities, obtained through SFT on Magistral Medium traces followed by RL. It is a small, efficient reasoning model with 24B parameters.

Grok 2.5

Deployable

269.5B

Grok 2 Community License

1 Platform

Deprecated Grok 2.5, xAI's best model in 2024, was trained and used at xAI that year and only recently released.

Llama 4 Maverick 17B 128E Instruct

Deployable

402B

Llama 4

2 Platforms

Deprecated The Llama 4 models are natively multimodal AI models that enable text and multimodal experiences. They use a mixture-of-experts architecture to offer industry-leading performance in text and image understanding.

Llama 4 Scout 17B 16E Instruct

Deployable

109B

Llama 4

5 Platforms

Deprecated The Llama 4 models are natively multimodal AI models that enable text and multimodal experiences. They use a mixture-of-experts architecture to offer industry-leading performance in text and image understanding.

Gemma 3 27B IT

Deployable

27.4B

Gemma

3 Platforms

Deprecated Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.

Gemma 3 12B IT

Deployable

12.2B

Gemma

4 Platforms

Deprecated Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.

Gemma 3 4B IT

Deployable

4.3B

Gemma

4 Platforms

Deprecated Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.

Gemma 3 1B IT

Deployable

1B

Gemma

4 Platforms

Deprecated Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.

DeepSeek R1

Deployable

685B

MIT

2 Platforms

Deprecated DeepSeek-R1 is an advanced reasoning system using reinforcement learning with cold-start initialization, achieving performance comparable to OpenAI-o1 across math, code, and reasoning tasks.

DeepSeek R1 Distill Qwen 32B

Deployable

32.8B

MIT

6 Platforms

Deprecated DeepSeek-R1 is an advanced reasoning system using reinforcement learning with cold-start initialization, achieving performance comparable to leading proprietary models. The release includes distilled variants like DeepSeek-R1-Distill-Qwen-32B, which outperforms OpenAI's GPT-4-mini across mathematical reasoning (92.3% MATH) and coding benchmarks while maintaining commercial usability. This 32B-parameter model sets new state-of-the-art results for dense models in its class through optimized knowledge distillation from the R1 framework.

DeepSeek R1 Distill Llama 70B

Deployable

70.6B

MIT

5 Platforms

Deprecated DeepSeek-R1 is an advanced reasoning model using reinforcement learning with cold-start initialization, matching top-tier systems in math and coding performance. The release includes distilled variants like DeepSeek-R1-Distill-Llama-70B, which preserves 94.5% MATH accuracy in the efficient Llama-70B architecture while supporting commercial applications.

Llama 3.2 11B Vision Instruct

Deployable

10.7B

Llama 3.2

5 Platforms

Deprecated Llama-3.2-11B-Vision-Instruct is an 11-billion-parameter multimodal large language model optimized for visual recognition, image reasoning, captioning, and answering questions about images. It was trained on 6 billion image-text pairs and is supported for commercial and research use in English.

Llama 3.2 90B Vision Instruct

Deployable

88.6B

Llama 3.2

3 Platforms

Deprecated Llama-3.2-90B-Vision-Instruct is a 90-billion-parameter multimodal large language model optimized for visual recognition, image reasoning, captioning, and answering questions about images. It was trained on 6 billion image-text pairs and is supported for commercial and research use in English.

Llama 3.3 70B Instruct

Deployable

70.6B

Llama 3.3

4 Platforms

Deprecated The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction-tuned generative model in a 70B size (text in/text out).

Meta Llama 3.1 70B Instruct

Deployable

70.6B

Llama 3.1

4 Platforms

Deprecated The Meta Llama 3.1 collection of multilingual large language models (LLMs) comprises pretrained and instruction-tuned generative models in 8B, 70B, and 405B sizes (text in/text out).

Meta Llama 3.1 8B Instruct

Deployable

8B

Llama 3.1

6 Platforms

Deprecated The Meta Llama 3.1 collection of multilingual large language models (LLMs) comprises pretrained and instruction-tuned generative models in 8B, 70B, and 405B sizes (text in/text out).

Mixtral 8x22B Instruct v0.1

Deployable

141B

Apache 2.0

3 Platforms

Deprecated The Mixtral-8x22B-Instruct-v0.1 LLM, a fine-tuned version of the high-efficiency, sparse Mixture-of-Experts Mixtral-8x22B model, excels in multilingual fluency, advanced math, coding skills, and scalable tech application development.

Whisper small.en

Deployable

242M

Apache 2.0

0 Platforms

Deprecated Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labelled data, Whisper models demonstrate a strong ability to generalise to many datasets and domains without the need for fine-tuning.

Phi 3.5 Mini Instruct

Deployable

3.8B

MIT

0 Platforms

Deprecated Phi-3.5-mini is a lightweight, state-of-the-art open model built on the datasets used for Phi-3 (synthetic data and filtered publicly available websites), with a focus on very high-quality, reasoning-dense data. The model belongs to the Phi-3 model family and supports a 128K-token context length.

Nomic Embed Text v1.5

Deployable

137M

Apache 2.0

0 Platforms

Deprecated Nomic Embed Text v1.5 is the first text embedding model with an 8192-token context length that is open source, with open data and open training code.

CLIP ViT Base Patch32

Deployable

151M

MIT

0 Platforms

Deprecated OpenAI CLIP is a multimodal encoder model designed to understand both visual and textual data. It excels at tasks such as zero-shot image classification, image-text similarity, and cross-modal retrieval by leveraging a shared embedding space.

Meta Llama 3 8B Instruct

Deployable

8B

Llama 3

4 Platforms

Deprecated Meta Llama 3 is a family of instruction-tuned, auto-regressive LLMs with an optimized transformer architecture, offered in 8B and 70B sizes and designed for dialogue applications with improved safety and helpfulness.

Meta Llama 3.1 405B Instruct FP8

Deployable

410B

Llama 3.1

1 Platform

Deprecated The Meta Llama 3.1 collection of multilingual large language models (LLMs) comprises pretrained and instruction-tuned generative models in 8B, 70B, and 405B sizes (text in/text out).