Registry Image Library
Overview of the Image Library
This article covers the fundamentals of the Harbor Registry image library and how to use it, walking through the pull, tag, and push workflows.
Hosted or Developed Images
Custom Docker images optimized for AI development and deployment, featuring CUDA, Jupyter, vLLM models, and Triton Server, all readily available in our Harbor Registry.
Name | Image Name | Description | Exposed Port |
---|---|---|---|
CUDA Development Kit | cuda-12.4.0-devel-ubuntu22.04 | CUDA 12.4.0 development image based on Ubuntu 22.04 | N/A |
Jupyter Notebook Base | jupyter-base-notebook | Base Jupyter Notebook image | 8888 |
NVIDIA VS Code Launcher | nvidia-vscode-rs-launcher | VS Code launcher with NVIDIA and tools support | 8080 |
vLLM LLaMA 3 8B Instruct | vllm-llama-3-8b-instruct | vLLM LLaMA 3 8B model for instruction tasks | 8000 |
vLLM Mistral-7B Instruct | vllm-mistral-7b-v0.3-instruct | vLLM Mistral-7B v0.3 model for instruction tasks | 8000 |
vLLM Mistral-NeMo 12B | vllm-mistral-nemo-instruct-2407 | The Mistral-Nemo-Instruct-2407 Large Language Model (LLM) | 8080 |
vLLM LLaVA 1.5 7B | vllm-llava-1.5-7b-hf | LLaVA, an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data | 8080 |
vLLM OpenAI Compatible | vllm-openai | vLLM image compatible with the OpenAI API | 8000 |
PostgreSQL | pgvector:latest | PostgreSQL with the pgvector extension | 5432 |
To deploy the listed images, follow the step-by-step instructions in our Run:AI Quickstart guide, replacing the Harbor Registry address with your own. These images are updated continuously, so you can always pull the most recent version with the 'latest' tag, for example:
docker pull <HARBOR_ADDRESS>/{repo-public,rax-public}/vllm-llama-3-8b-instruct:latest
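The article also mentions tag and push. Assuming you have pulled the image above, re-tagging it into a project of your own and pushing it back to Harbor looks like the sketch below; my-project is a placeholder for a project you have push access to.

```bash
# Re-tag the pulled image for your own Harbor project
# ("my-project" is a placeholder).
docker tag <HARBOR_ADDRESS>/repo-public/vllm-llama-3-8b-instruct:latest \
  <HARBOR_ADDRESS>/my-project/vllm-llama-3-8b-instruct:latest

# Authenticate against the registry, then push.
docker login <HARBOR_ADDRESS>
docker push <HARBOR_ADDRESS>/my-project/vllm-llama-3-8b-instruct:latest
```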
vLLM High-Performance Language Model Serving Image
This image is designed for efficient, GPU-accelerated serving of large language models. It leverages vLLM to provide an OpenAI-compatible API server, making it ideal for deploying and scaling language model applications.
Key Features:
- Flexible Configuration: Supports various environment variables for easy customization of model serving parameters.
- OpenAI API Compatibility: Provides an API server compatible with OpenAI's format for seamless integration (see the example after this list).
- High-Performance Serving: Uses vLLM for efficient serving of large language models.
- Quantization Support: Allows for model quantization to optimize memory usage and inference speed.
- Customizable Chat Templates: Supports setting custom chat templates for varied use cases.
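As a quick illustration of the OpenAI-compatible API, the sketch below queries a running container with curl. It assumes the container is reachable on localhost at port 8000 (the port listed for the vLLM images above) and that the model is served under the name mistral-7b-instruct; adjust both to match your deployment.

```bash
# Send a chat completion request to the OpenAI-compatible endpoint.
# Host, port, and served model name are assumptions.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "mistral-7b-instruct",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64
      }'
```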
Advanced Features
- GPU Optimization: Automatically detects and utilizes available NVIDIA GPUs.
- NCCL Control: NVIDIA Collective Communications Library (NCCL) peer-to-peer (P2P) communication control.
- Ray Integration: Ray framework integration for distributed computing capabilities.
- Eager Execution: Eager execution mode for immediate operation processing.
- Tensor Parallelism: Employs tensor parallelism for improved performance on multi-GPU setups.
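As a concrete multi-GPU example, the sketch below starts the serving image across two GPUs using the NUM_GPU variable documented in the next section. The image path and port mapping are assumptions; substitute your own registry address.

```bash
# Run the serving image across two GPUs; NUM_GPU controls how many
# GPUs the server uses (see the environment variables below).
docker run --gpus all \
  -e NUM_GPU=2 \
  -p 8080:8080 \
  <HARBOR_ADDRESS>/repo-public/vllm-mistral-7b-v0.3-instruct:latest
```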
Available Environment Variables
- MODEL: Sets the path to the model. Default: "/models/Mistral-7B-Instruct-v0.3"
- SERVED_MODEL_NAME: Specifies the name under which the model will be served. Default: Same as MODEL if not set
- EXTRA_ARGS: Allows passing additional arguments to the server startup command.
- QUANTIZATION: Enables quantization for the model.
  - Requires DTYPE to be set when used
- DTYPE: Specifies the data type to use when QUANTIZATION is set.
- GPU_MEMORY_UTILIZATION: Sets the GPU memory utilization. Default: 1
- MAX_MODEL_LEN: Specifies the maximum model length.
- CHAT_TEMPLATE: Sets a custom chat template.
- NUM_GPU: Number of GPUs to use for serving. Filled automatically, but you can override it with a higher or lower value. Default: auto
- PORT: Port the API server listens on; defaults to 8080 if not set.
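Putting the variables together, a fuller run might look like the following sketch. The host model path, served model name, and image path are assumptions; adjust them to your registry and model layout.

```bash
# Serve a locally stored model with explicit resource and naming settings.
docker run --gpus all \
  -v /data/models/Mistral-7B-Instruct-v0.3:/models/Mistral-7B-Instruct-v0.3 \
  -e MODEL=/models/Mistral-7B-Instruct-v0.3 \
  -e SERVED_MODEL_NAME=mistral-7b-instruct \
  -e GPU_MEMORY_UTILIZATION=0.9 \
  -e MAX_MODEL_LEN=8192 \
  -e PORT=8080 \
  -p 8080:8080 \
  <HARBOR_ADDRESS>/repo-public/vllm-mistral-7b-v0.3-instruct:latest
```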