Learn more about the Dell Enterprise Hub
The Dell Enterprise Hub is an online portal that makes it easy to train and deploy the latest open AI models on-premises using Dell platforms, and to securely build Generative AI applications. It is the result of a deep engineering collaboration between Dell Technologies and Hugging Face.
The Dell Enterprise Hub provides a secure, streamlined experience for Dell customers to build Generative AI applications with confidence, taking full advantage of the computing power of the Dell Platforms at their disposal.
Deploying a model on a Dell Platform is a simple four-step process:
If you want to deploy a fine-tuned model instead of the curated models above, refer to How can I deploy a fine-tuned model?
The Dell Enterprise Hub inference containers leverage Hugging Face ML production technologies, including Text Generation Inference for Large Language Models. The predefined configurations can be easily adjusted to fit your needs by changing the default values for:

- `NUM_SHARD`: The number of shards, or degree of tensor parallelism, used for the model.
- `MAX_INPUT_LENGTH`: The maximum input length, in tokens, that the model can handle.
- `MAX_TOTAL_TOKENS`: The maximum total tokens per request, input plus generated output.
- `MAX_BATCH_PREFILL_TOKENS`: The maximum number of tokens to prefill in a batch, used for continuous batching.

More information can be found in the Hugging Face Text Generation Inference documentation, and in the Text Generation Inference v3 release notes.
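For illustration, here is a minimal sketch of overriding these values when launching an inference container with the Docker Python SDK. The image name, port mapping, and paths are placeholders, not actual Dell Enterprise Hub container references; use the exact deployment command shown in the portal for your model and platform.

```python
import docker
from docker.types import DeviceRequest

client = docker.from_env()

# Hypothetical image name -- copy the real one from the Dell Enterprise Hub portal.
IMAGE = "registry.example.com/dell-enterprise-hub/llm-inference:latest"

container = client.containers.run(
    IMAGE,
    detach=True,
    # Expose all available GPUs to the container.
    device_requests=[DeviceRequest(count=-1, capabilities=[["gpu"]])],
    # Override the predefined defaults described above.
    environment={
        "NUM_SHARD": "2",                    # tensor parallelism across 2 GPUs
        "MAX_INPUT_LENGTH": "4000",          # longest accepted prompt, in tokens
        "MAX_TOTAL_TOKENS": "4096",          # prompt + generated tokens per request
        "MAX_BATCH_PREFILL_TOKENS": "8192",  # prefill budget for continuous batching
    },
    ports={"80/tcp": 8080},  # Text Generation Inference listens on port 80 inside the container
)
print(container.id)
```

The values above mirror the r760xa-nvidia-l40s configuration with two shards in the tables later in this section.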
To start training one of the models available in the Dell Model Catalog, follow these steps:
Training containers leverage Hugging Face `autotrain`, a powerful tool that simplifies the process of model training. `autotrain` supports a variety of configurations to customize training jobs, including:

- `lr`: Initial learning rate for the training.
- `epochs`: The number of training epochs.
- `batch_size`: Size of the batches used during training.

More details on these configurations can be found in the Autotrain CLI documentation.
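As a rough sketch, these configuration names map to flags of the `autotrain` CLI. The model, data path, and project name below are placeholders, and flag spellings can vary between `autotrain` versions, so verify them against the Autotrain CLI documentation:

```python
import subprocess

# Placeholder values -- substitute your own model, data path, and project name.
cmd = [
    "autotrain", "llm", "--train",
    "--model", "meta-llama/Meta-Llama-3.1-8B",  # base model to fine-tune
    "--data-path", "./data",                    # folder containing train.csv or train.jsonl
    "--text-column", "text",                    # column holding the formatted samples
    "--lr", "2e-4",                             # initial learning rate
    "--epochs", "3",                            # number of training epochs
    "--batch-size", "4",                        # per-device batch size
    "--project-name", "my-finetune",
]
subprocess.run(cmd, check=True)
```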
To fine-tune LLMs, your dataset should have a column with the formatted training samples. The column used for training is defined through the `text-column` argument when starting your training; in the example below it is `text`.
Example format:

| text |
|---|
| human: hello \n bot: hi nice to meet you |
| human: how are you \n bot: I am fine |
| human: What is your name? \n bot: My name is Mary |
| human: Which is the best programming language? \n bot: Python |
You can use both CSV and JSONL files. For more details, refer to the original documentation.
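A minimal sketch of producing such a file in Python; the file names and samples are illustrative:

```python
import csv
import json

# Formatted training samples, matching the example format above
# (the literal "\n" separator is written as shown in the table).
samples = [
    "human: hello \\n bot: hi nice to meet you",
    "human: how are you \\n bot: I am fine",
    "human: What is your name? \\n bot: My name is Mary",
    "human: Which is the best programming language? \\n bot: Python",
]

# JSONL: one {"text": ...} object per line.
with open("train.jsonl", "w", encoding="utf-8") as f:
    for s in samples:
        f.write(json.dumps({"text": s}) + "\n")

# CSV alternative: a single "text" column.
with open("train.csv", "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["text"])
    writer.writerows([s] for s in samples)
```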
To deploy a fine-tuned model on your Dell Platform, you can use the special "Bring Your Own Model" (BYOM) Dell inference container available in the Dell Enterprise Hub. This makes it easy to integrate fine-tuned models seamlessly into your Dell environment.
Unlike direct deployment of models from the Dell Model Catalog, deploying a fine-tuned model mounts the model into the BYOM Dell inference container. Make sure the mounted directory contains the fine-tuned model and that the provided path is correct; a quick sanity check is sketched below.
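As a minimal sketch (the directory path is a placeholder), you can verify the directory before mounting it; a model saved with `save_pretrained` normally contains a `config.json` plus weight files such as `*.safetensors` or `pytorch_model*.bin`:

```python
from pathlib import Path

# Placeholder path -- point this at the directory you plan to mount.
model_dir = Path("/models/my-finetuned-model")

assert model_dir.is_dir(), f"{model_dir} does not exist"
assert (model_dir / "config.json").is_file(), "missing config.json"

weights = list(model_dir.glob("*.safetensors")) + list(model_dir.glob("pytorch_model*.bin"))
assert weights, "no weight files (*.safetensors or pytorch_model*.bin) found"

print(f"OK: {model_dir} looks like a valid model directory")
```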
For models fine-tuned from the Gemma base model, the following hardware configurations are recommended for deployment:
| Dell Platforms | Number of Shards (GPUs) | Max Input Tokens | Max Total Tokens | Max Batch Prefill Tokens |
|---|---|---|---|---|
| xe9680-nvidia-h100 | 1 | 4000 | 4096 | 16182 |
| xe9680-amd-mi300x | 1 | 4000 | 4096 | 16182 |
| xe8640-nvidia-h100 | 1 | 4000 | 4096 | 16182 |
| r760xa-nvidia-h100 | 1 | 4000 | 4096 | 16182 |
| r760xa-nvidia-l40s | 2 | 4000 | 4096 | 8192 |
| r760xa-nvidia-l40s | 4 | 4000 | 4096 | 16182 |
For models fine-tuned from the Llama 3.1 8B base model, the following SKUs are suitable:
| Dell Platforms | Number of Shards (GPUs) | Max Input Tokens | Max Total Tokens | Max Batch Prefill Tokens |
|---|---|---|---|---|
| xe9680-nvidia-h100 | 1 | 8000 | 8192 | 32768 |
| xe9680-amd-mi300x | 1 | 8000 | 8192 | 32768 |
| xe8640-nvidia-h100 | 1 | 8000 | 8192 | 32768 |
| r760xa-nvidia-h100 | 1 | 4000 | 4096 | 16182 |
| r760xa-nvidia-l40s | 2 | 8000 | 8192 | 16182 |
| r760xa-nvidia-l40s | 4 | 8000 | 8192 | 32768 |
For models fine-tuned from the Llama 3.1 70B base model, use these configurations for deployment:
| Dell Platforms | Number of Shards (GPUs) | Max Input Tokens | Max Total Tokens | Max Batch Prefill Tokens |
|---|---|---|---|---|
| xe9680-nvidia-h100 | 4 | 8000 | 8192 | 16182 |
| xe9680-nvidia-h100 | 8 | 8000 | 8192 | 16182 |
| xe9680-amd-mi300x | 4 | 8000 | 8192 | 16182 |
| xe9680-amd-mi300x | 8 | 8000 | 8192 | 16182 |
| xe8640-nvidia-h100 | 4 | 8000 | 8192 | 8192 |
Hardware configurations for models fine-tuned from the Mistral 7B are as follows:
| Dell Platforms | Number of Shards (GPUs) | Max Input Tokens | Max Total Tokens | Max Batch Prefill Tokens |
|---|---|---|---|---|
| xe9680-nvidia-h100 | 1 | 8000 | 8192 | 32768 |
| xe9680-amd-mi300x | 1 | 8000 | 8192 | 32768 |
| xe8640-nvidia-h100 | 1 | 8000 | 8192 | 32768 |
| r760xa-nvidia-h100 | 1 | 4000 | 4096 | 16182 |
| r760xa-nvidia-l40s | 2 | 8000 | 8192 | 16182 |
| r760xa-nvidia-l40s | 4 | 8000 | 8192 | 32768 |
For models fine-tuned from the Mixtral base model, the deployment configurations are:
| Dell Platforms | Number of Shards (GPUs) | Max Input Tokens | Max Total Tokens | Max Batch Prefill Tokens |
|---|---|---|---|---|
| xe9680-nvidia-h100 | 4 | 8000 | 8192 | 16182 |
| xe9680-nvidia-h100 | 8 | 8000 | 8192 | 16182 |
| xe9680-amd-mi300x | 4 | 8000 | 8192 | 16182 |
| xe9680-amd-mi300x | 8 | 8000 | 8192 | 16182 |
| xe8640-nvidia-h100 | 4 | 8000 | 8192 | 8192 |
| r760xa-nvidia-h100 | 4 | 8000 | 8192 | 16182 |
A deprecated model status indicates that the model is no longer actively maintained. It remains fully functional for inference and fine-tuning (if applicable), but it will no longer receive updates or regular maintenance.
Deprecation typically occurs due to low usage. Customers can continue using deprecated models, but we recommend migrating to actively maintained alternatives: for example, Meta Llama 3.1 may be deprecated while Meta Llama 3.3 is available.