Fine-tune and deploy a model

Fine-tuning allows you to customize pre-trained language models for your specific use cases and datasets, improving their performance on domain-specific tasks. The Dell Enterprise Hub provides you with guided steps, pre-configured settings and optimized training containers for fine-tuning models from the Dell Model Catalog and deploying them on your Dell infrastructure.

To start training one of the models available in the Dell Model Catalog, follow these steps:

  1. Select base model: Start by choosing a trainable model in the Model Catalog. Currently, the following models are available for training:
  2. Configure training settings: From the Model Card, click Train, then select the Dell Platform you want to use. Next, set the local path to the CSV training dataset file and the path where the fine-tuned model will be stored. See Formatting your dataset below to learn how to prepare your dataset. Finally, adjust the default training configuration settings to match your requirements.
  3. Deploy training container: With Dell Enterprise Hub, model training jobs are configured within ready-to-use, optimized training containers. To run your training job, execute the provided command in your Dell environment to deploy the container.
  4. Monitor training job: Track the progress of your training job to confirm that it runs as expected and to evaluate the results.
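Before running the training command, it can help to confirm that the dataset path you entered points at a usable file. The sketch below is a hypothetical pre-flight helper (`check_dataset` is not part of the Dell Enterprise Hub or AutoTrain); it assumes the training column is named `text`, as in the dataset example in this guide, and only checks that the file exists, is CSV or JSONL, and has a non-empty value in that column for every sample.

```python
import csv
import json
from pathlib import Path


def check_dataset(path: str, text_column: str = "text") -> int:
    """Hypothetical pre-flight check: return the sample count or raise ValueError."""
    p = Path(path)
    if not p.is_file():
        raise ValueError(f"dataset file not found: {p}")

    suffix = p.suffix.lower()
    if suffix == ".csv":
        with p.open(newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            # The header row must contain the column passed as text-column.
            if text_column not in (reader.fieldnames or []):
                raise ValueError(f"missing column {text_column!r}")
            rows = [row[text_column] for row in reader]
    elif suffix == ".jsonl":
        with p.open(encoding="utf-8") as f:
            # One JSON object per non-blank line; a missing key yields None.
            rows = [json.loads(line).get(text_column) for line in f if line.strip()]
    else:
        raise ValueError(f"unsupported extension: {suffix}")

    # Reject empty files and samples with a blank or missing text value.
    if not rows or any(not r for r in rows):
        raise ValueError("dataset is empty or has blank samples")
    return len(rows)
```

For example, `check_dataset("/data/train.csv")` (a hypothetical path) returns the number of samples if the file looks well-formed, so problems surface before the training container is started.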

The training containers use Hugging Face AutoTrain, a tool that simplifies the process of model training. AutoTrain supports a variety of configuration options for customizing training jobs, including:

  • lr: Initial learning rate used for training.
  • epochs: Number of training epochs.
  • batch_size: Number of samples in each training batch.

More details on these configurations can be found in the AutoTrain CLI documentation.
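The three settings above interact: together with the dataset size, `batch_size` and `epochs` determine how many optimizer steps the job runs. The following is a minimal sketch under stated assumptions: `TrainingConfig` is a hypothetical helper, the default values are illustrative rather than Dell or AutoTrain defaults, and the `--lr`, `--epochs` and `--batch-size` flag spellings are assumed to match the AutoTrain CLI (check the AutoTrain CLI documentation before relying on them).

```python
import math
from dataclasses import dataclass


@dataclass
class TrainingConfig:
    """Hypothetical container for the lr, epochs and batch_size settings."""
    lr: float = 2e-4      # illustrative initial learning rate
    epochs: int = 3       # illustrative number of passes over the dataset
    batch_size: int = 2   # illustrative number of samples per batch

    def validate(self) -> None:
        if self.lr <= 0:
            raise ValueError("lr must be positive")
        if self.epochs < 1:
            raise ValueError("epochs must be at least 1")
        if self.batch_size < 1:
            raise ValueError("batch_size must be at least 1")

    def to_cli_args(self) -> list[str]:
        # Assumed AutoTrain-style flag names; verify against the AutoTrain CLI docs.
        self.validate()
        return [
            "--lr", str(self.lr),
            "--epochs", str(self.epochs),
            "--batch-size", str(self.batch_size),
        ]

    def optimizer_steps(self, num_samples: int) -> int:
        # One step per batch (the last batch may be partial), repeated every epoch.
        # Gradient accumulation, if enabled, would reduce this number.
        self.validate()
        return math.ceil(num_samples / self.batch_size) * self.epochs
```

For example, with 10 samples, `batch_size=4` and `epochs=2`, each epoch has ceil(10 / 4) = 3 batches, so the job runs 6 optimizer steps.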

Formatting your dataset

To fine-tune LLMs, your dataset must have a column containing the formatted training samples. The column used for training is set with the text-column argument when you start training; in the example below, it is text.

Example format:

text
human: hello \n bot: hi nice to meet you
human: how are you \n bot: I am fine
human: What is your name? \n bot: My name is Mary
human: Which is the best programming language? \n bot: Python

Both CSV and JSONL files are supported. For more details, refer to the AutoTrain documentation.
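If your conversations are stored as prompt/response pairs, a short script can produce the single-column format shown in the example. This is a sketch, not the Dell Enterprise Hub tooling: `format_sample`, `to_csv` and `to_jsonl` are hypothetical helpers, the pairs are the ones from the example above, and the turns are joined with the literal two-character sequence `\n` exactly as the example displays it (whether your prompt template needs a real line break instead is an assumption to verify).

```python
import csv
import io
import json

# Prompt/response pairs taken from the example format above.
PAIRS = [
    ("hello", "hi nice to meet you"),
    ("how are you", "I am fine"),
    ("What is your name?", "My name is Mary"),
    ("Which is the best programming language?", "Python"),
]


def format_sample(human: str, bot: str) -> str:
    # Reproduce the example literally: a backslash followed by "n" separates the turns.
    return f"human: {human} \\n bot: {bot}"


def to_csv(pairs, text_column: str = "text") -> str:
    """Return CSV text with a header row and one formatted sample per row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([text_column])
    for human, bot in pairs:
        writer.writerow([format_sample(human, bot)])
    return buf.getvalue()


def to_jsonl(pairs, text_column: str = "text") -> str:
    """Return JSONL text with one {"text": ...} object per line."""
    return "".join(
        json.dumps({text_column: format_sample(h, b)}) + "\n" for h, b in pairs
    )
```

Write the result of `to_csv(PAIRS)` or `to_jsonl(PAIRS)` to a `.csv` or `.jsonl` file and pass `text` as the text-column argument. Using the `csv` and `json` modules rather than string concatenation ensures that samples containing commas or quotes are escaped correctly.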

Deploying a fine-tuned model

To deploy a fine-tuned model on your Dell Platform, use the "Bring Your Own Model" (BYOM) Dell inference container available in the Dell Enterprise Hub. It makes it easy to integrate fine-tuned models into your Dell environment.

  1. Select base model: In the Model Catalog, open the Model Card for the base model used for fine-tuning, then click "Deploy Fine-Tuned Model" to access the BYOM feature.
  2. Configure inference settings: Select the Dell Platform you want to use and the configuration options. Make sure the path to the local directory where your fine-tuned model is stored is set correctly.
  3. Run deployment command: Copy the generated command and run it inside your Dell environment.
  4. Test your model: Once the BYOM container is running and its endpoint is up, test your model with the provided sample code snippets.
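The Dell Enterprise Hub provides its own sample snippets for step 4; the sketch below is only a hedged, standard-library alternative. It assumes that the BYOM container is built on Hugging Face Text Generation Inference (TGI), whose launcher options match the columns in the hardware tables in this guide, and that it therefore exposes TGI's `POST /generate` route returning `generated_text`. The base URL `http://localhost:80` and the helper names are hypothetical; use the host and port from your deployment command.

```python
import json
import urllib.request

# Assumption: host and port where the BYOM container is reachable.
ENDPOINT = "http://localhost:80"


def build_generate_request(
    prompt: str, max_new_tokens: int = 64, base_url: str = ENDPOINT
) -> urllib.request.Request:
    """Build a POST request in the shape of TGI's /generate API (assumed)."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, **kwargs) -> str:
    """Send the request and return the generated text (needs a running container)."""
    req = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())["generated_text"]


# Usage, once the container is up:
# print(generate("human: hello \\n bot:"))
```

Formatting the prompt the same way as the training samples (here, `human: ... \n bot:`) makes it more likely that the fine-tuned model responds in the style it was trained on.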

Unlike direct deployment of models from the Dell Model Catalog, deploying a fine-tuned model mounts the model into the BYOM Dell inference container. Make sure the mounted directory contains the fine-tuned model and that the provided path is correct.
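Because the container only sees what is mounted, a quick check of the model directory before deployment can save a failed start. This hypothetical helper assumes the fine-tuned model was saved with full (merged) weights in the standard Hugging Face layout: a `config.json`, weight files ending in `.safetensors` or `.bin`, and tokenizer files. If your training output uses a different layout, adjust the checks accordingly.

```python
from pathlib import Path

# Assumed weight-file patterns for a model saved in Hugging Face format.
WEIGHT_PATTERNS = ("*.safetensors", "*.bin")
TOKENIZER_FILES = ("tokenizer.json", "tokenizer_config.json")


def check_model_dir(path: str) -> list[str]:
    """Return a list of problems found; an empty list means the basic checks passed."""
    p = Path(path)
    if not p.is_dir():
        return [f"not a directory: {p}"]

    problems = []
    if not (p / "config.json").is_file():
        problems.append("config.json not found")
    # At least one weight file must match one of the patterns.
    if not any(any(p.glob(pattern)) for pattern in WEIGHT_PATTERNS):
        problems.append("no *.safetensors or *.bin weight files found")
    if not any((p / name).is_file() for name in TOKENIZER_FILES):
        problems.append("no tokenizer files found")
    return problems
```

Run it against the same path you enter in the BYOM configuration, for example `check_model_dir("/models/my-fine-tuned-model")` (a hypothetical path), and fix any reported problems before copying the deployment command.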

Hardware requirements

Gemma

For models fine-tuned from the Gemma base model, the following hardware configurations are recommended for deployment:

Dell Platform      | Number of Shards (GPUs) | Max Input Tokens | Max Total Tokens | Max Batch Prefill Tokens
xe9680-nvidia-h100 | 1                       | 4000             | 4096             | 16182
xe9680-amd-mi300x  | 1                       | 4000             | 4096             | 16182
xe8640-nvidia-h100 | 1                       | 4000             | 4096             | 16182
r760xa-nvidia-h100 | 1                       | 4000             | 4096             | 16182
r760xa-nvidia-l40s | 2                       | 4000             | 4096             | 8192
r760xa-nvidia-l40s | 4                       | 4000             | 4096             | 16182
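The token columns are related. Assuming the BYOM container is based on Hugging Face Text Generation Inference (TGI), whose launcher options these columns match, Max Total Tokens counts prompt plus generated tokens, Max Input Tokens must be smaller than it, and Max Batch Prefill Tokens must be at least Max Input Tokens. The sketch below encodes the Gemma rows and checks those rules; `ServingConfig`, `max_new_tokens` and `check` are hypothetical names.

```python
from typing import NamedTuple


class ServingConfig(NamedTuple):
    platform: str
    num_shard: int
    max_input_tokens: int
    max_total_tokens: int
    max_batch_prefill_tokens: int


# Rows copied from the Gemma table above.
GEMMA = [
    ServingConfig("xe9680-nvidia-h100", 1, 4000, 4096, 16182),
    ServingConfig("xe9680-amd-mi300x", 1, 4000, 4096, 16182),
    ServingConfig("xe8640-nvidia-h100", 1, 4000, 4096, 16182),
    ServingConfig("r760xa-nvidia-h100", 1, 4000, 4096, 16182),
    ServingConfig("r760xa-nvidia-l40s", 2, 4000, 4096, 8192),
    ServingConfig("r760xa-nvidia-l40s", 4, 4000, 4096, 16182),
]


def max_new_tokens(cfg: ServingConfig) -> int:
    # Tokens left for generation when the prompt uses the full input budget.
    return cfg.max_total_tokens - cfg.max_input_tokens


def check(cfg: ServingConfig) -> list[str]:
    """Return the rule violations for one row (empty list if none)."""
    problems = []
    if cfg.max_input_tokens >= cfg.max_total_tokens:
        problems.append("Max Input Tokens must be smaller than Max Total Tokens")
    if cfg.max_batch_prefill_tokens < cfg.max_input_tokens:
        problems.append("Max Batch Prefill Tokens must be at least Max Input Tokens")
    return problems
```

For every Gemma row, a prompt of the full 4000 input tokens leaves 4096 − 4000 = 96 tokens for the response; shorter prompts leave proportionally more room.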

Llama 3.1 8B

For models fine-tuned from the Llama 3.1 8B base model, the following SKUs are suitable:

Dell Platform      | Number of Shards (GPUs) | Max Input Tokens | Max Total Tokens | Max Batch Prefill Tokens
xe9680-nvidia-h100 | 1                       | 8000             | 8192             | 32768
xe9680-amd-mi300x  | 1                       | 8000             | 8192             | 32768
xe8640-nvidia-h100 | 1                       | 8000             | 8192             | 32768
r760xa-nvidia-h100 | 1                       | 4000             | 4096             | 16182
r760xa-nvidia-l40s | 2                       | 8000             | 8192             | 16182
r760xa-nvidia-l40s | 4                       | 8000             | 8192             | 32768

Llama 3.1 70B

For models fine-tuned from the Llama 3.1 70B base model, use these configurations for deployment:

Dell Platform      | Number of Shards (GPUs) | Max Input Tokens | Max Total Tokens | Max Batch Prefill Tokens
xe9680-nvidia-h100 | 4                       | 8000             | 8192             | 16182
xe9680-nvidia-h100 | 8                       | 8000             | 8192             | 16182
xe9680-amd-mi300x  | 4                       | 8000             | 8192             | 16182
xe9680-amd-mi300x  | 8                       | 8000             | 8192             | 16182
xe8640-nvidia-h100 | 4                       | 8000             | 8192             | 8192

Mistral 7B

Hardware configurations for models fine-tuned from the Mistral 7B base model are as follows:

Dell Platform      | Number of Shards (GPUs) | Max Input Tokens | Max Total Tokens | Max Batch Prefill Tokens
xe9680-nvidia-h100 | 1                       | 8000             | 8192             | 32768
xe9680-amd-mi300x  | 1                       | 8000             | 8192             | 32768
xe8640-nvidia-h100 | 1                       | 8000             | 8192             | 32768
r760xa-nvidia-h100 | 1                       | 4000             | 4096             | 16182
r760xa-nvidia-l40s | 2                       | 8000             | 8192             | 16182
r760xa-nvidia-l40s | 4                       | 8000             | 8192             | 32768

Mixtral 8x7B

For models fine-tuned from the Mixtral 8x7B base model, the deployment configurations are:

Dell Platform      | Number of Shards (GPUs) | Max Input Tokens | Max Total Tokens | Max Batch Prefill Tokens
xe9680-nvidia-h100 | 4                       | 8000             | 8192             | 16182
xe9680-nvidia-h100 | 8                       | 8000             | 8192             | 16182
xe9680-amd-mi300x  | 4                       | 8000             | 8192             | 16182
xe9680-amd-mi300x  | 8                       | 8000             | 8192             | 16182
xe8640-nvidia-h100 | 4                       | 8000             | 8192             | 8192
r760xa-nvidia-h100 | 4                       | 8000             | 8192             | 16182
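The tables above can also be kept as data, so that a deployment script picks a recommended row for a given base model and platform. This is a sketch under stated assumptions: the model keys and helper names are hypothetical, and `launcher_args` emits the flag names of Hugging Face TGI's `text-generation-launcher` in recent releases, on the assumption that the BYOM container is TGI-based. The deployment command generated by the Dell Enterprise Hub may pass these values differently, so treat it as the reference.

```python
# Each row: (num_shard, max_input_tokens, max_total_tokens, max_batch_prefill_tokens),
# copied from the hardware tables above.
Row = tuple[int, int, int, int]

# The Llama 3.1 8B and Mistral 7B tables contain identical rows.
_SMALL: dict[str, list[Row]] = {
    "xe9680-nvidia-h100": [(1, 8000, 8192, 32768)],
    "xe9680-amd-mi300x": [(1, 8000, 8192, 32768)],
    "xe8640-nvidia-h100": [(1, 8000, 8192, 32768)],
    "r760xa-nvidia-h100": [(1, 4000, 4096, 16182)],
    "r760xa-nvidia-l40s": [(2, 8000, 8192, 16182), (4, 8000, 8192, 32768)],
}

# Rows shared by the Llama 3.1 70B and Mixtral 8x7B tables.
_LARGE: dict[str, list[Row]] = {
    "xe9680-nvidia-h100": [(4, 8000, 8192, 16182), (8, 8000, 8192, 16182)],
    "xe9680-amd-mi300x": [(4, 8000, 8192, 16182), (8, 8000, 8192, 16182)],
    "xe8640-nvidia-h100": [(4, 8000, 8192, 8192)],
}

# Hypothetical model keys mapped to their tables.
HARDWARE_TABLE: dict[str, dict[str, list[Row]]] = {
    "gemma": {
        "xe9680-nvidia-h100": [(1, 4000, 4096, 16182)],
        "xe9680-amd-mi300x": [(1, 4000, 4096, 16182)],
        "xe8640-nvidia-h100": [(1, 4000, 4096, 16182)],
        "r760xa-nvidia-h100": [(1, 4000, 4096, 16182)],
        "r760xa-nvidia-l40s": [(2, 4000, 4096, 8192), (4, 4000, 4096, 16182)],
    },
    "llama-3.1-8b": _SMALL,
    "llama-3.1-70b": _LARGE,
    "mistral-7b": _SMALL,
    # Mixtral adds one r760xa-nvidia-h100 row to the shared large-model rows.
    "mixtral-8x7b": {**_LARGE, "r760xa-nvidia-h100": [(4, 8000, 8192, 16182)]},
}


def options(base_model: str, platform: str) -> list[Row]:
    """Return the recommended rows, or raise KeyError if the table has none."""
    try:
        return HARDWARE_TABLE[base_model][platform]
    except KeyError:
        raise KeyError(
            f"no recommended configuration for {base_model!r} on {platform!r}"
        ) from None


def launcher_args(row: Row) -> list[str]:
    # Assumed TGI text-generation-launcher flag names.
    num_shard, max_input, max_total, max_prefill = row
    return [
        "--num-shard", str(num_shard),
        "--max-input-tokens", str(max_input),
        "--max-total-tokens", str(max_total),
        "--max-batch-prefill-tokens", str(max_prefill),
    ]
```

For example, `options("mixtral-8x7b", "xe9680-nvidia-h100")` returns both the 4-GPU and 8-GPU rows, while asking for `llama-3.1-70b` on `r760xa-nvidia-l40s` raises `KeyError`, because that combination does not appear in the tables.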