Quickstart

To deploy a model from the Dell Model Catalog, you can follow these simple steps:

Browse the Model Catalog: Navigate to the Model Catalog and select the model you want to deploy.
Open the Model Card: Click on the model to view its details, specifications, and deployment options.
Configure Deployment Settings:
- Select your Dell Platform (e.g., xe9680-nvidia-h100, r760xa-nvidia-l40s)
- Choose the number of GPUs/shards needed
- Configure parameters like max tokens and batch size
Copy the Deployment Command: The platform will generate a Docker command optimized for your selected hardware configuration.
Run the Container: Execute the command in your Dell environment to start the model inference server.
Test Your Model: Once the container is running, you can test the model using the provided sample code snippets or API endpoints.

For models without pre-downloaded weights, make sure to include both HF_TOKEN and MODEL_ID environment variables when running your Docker container to authenticate with Hugging Face Hub and download the model weights.

Managing deployments via CLI and Python SDK

Dell provides both a Command Line Interface (CLI) and a Python SDK to support various developer workflows. These tools enable you to:

Browse available models and applications.
Auto-generate deployment snippets for Dell hardware.
Run system checks for software/hardware compatibility.
Use APIs for greater control and automation.

For more information you can visit the Dell AI CLI and SDK documentation: