Registry Image Library

Overview of the Image Library

This article covers the fundamentals of the Harbor Registry and how to use it, including the pull, tag, and push workflows for the images listed below.

Hosted or Developed Images

Our Harbor Registry hosts custom Docker images optimized for AI development and deployment, featuring CUDA, Jupyter, vLLM models, and Triton Server.

| Name | Image Name | Description | Exposed Port |
| --- | --- | --- | --- |
| CUDA Development Kit | cuda-12.4.0-devel-ubuntu22.04 | CUDA 12.4.0 development image based on Ubuntu 22.04 | N/A |
| Jupyter Notebook Base | jupyter-base-notebook | Base Jupyter Notebook image | 8888 |
| NVIDIA VS Code Launcher | nvidia-vscode-rs-launcher | VS Code launcher with NVIDIA and tools support | 8080 |
| vLLM LLaMA 3 8B Instruct | vllm-llama-3-8b-instruct | vLLM LLaMA 3 8B model for instruction tasks | 8000 |
| vLLM Mistral-7B Instruct | vllm-mistral-7b-v0.3-instruct | vLLM Mistral-7B v0.3 model for instruction tasks | 8000 |
| vLLM Mistral-NeMo 12B | vllm-mistral-nemo-instruct-2407 | The Mistral-NeMo-Instruct-2407 large language model (LLM) | 8080 |
| vLLM LLaVA 7B | vllm-llava-1.5-7b-hf | LLaVA, an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data | 8080 |
| vLLM OpenAI Compatible | vllm-openai | vLLM image compatible with the OpenAI API | 8000 |
| PostgreSQL | pgvector:latest | PostgreSQL with the pgvector extension | 5432 |

To deploy the listed images, follow the step-by-step instructions in our Run:AI Quickstart guide, replacing the example Harbor Registry address with your own. Because these images are continuously updated, you can always pull the most recent build by using the latest tag, for example:

docker pull <HARBOR_ADDRESS>/{repo-public,rax-public}/vllm-llama-3-8b-instruct:latest
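The tag and push steps mentioned above follow the same pattern. A sketch of the full pull, tag, and push workflow, using a placeholder registry address (harbor.example.com) and a hypothetical destination project (my-project):

```shell
# Pull an image from the registry (placeholder registry address and project).
docker pull harbor.example.com/repo-public/vllm-llama-3-8b-instruct:latest

# Retag it under your own project (my-project is hypothetical).
docker tag harbor.example.com/repo-public/vllm-llama-3-8b-instruct:latest \
  harbor.example.com/my-project/vllm-llama-3-8b-instruct:v1

# Push the retagged image back to the registry.
docker push harbor.example.com/my-project/vllm-llama-3-8b-instruct:v1
```

Pushing to a project requires that you are logged in (`docker login <HARBOR_ADDRESS>`) with an account that has push permission on that project.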

vLLM High-Performance Language Model Serving Image

This image is designed for efficient, GPU-accelerated serving of large language models. It leverages vLLM to provide an OpenAI-compatible API server, making it ideal for deploying and scaling language model applications.

Key Features:

  • Flexible Configuration: Supports various environment variables for easy customization of model serving parameters.
  • OpenAI API Compatibility: Provides an API server compatible with OpenAI's format for seamless integration.
  • High-Performance Serving: Uses vLLM for efficient serving of large language models.
  • Quantization Support: Allows for model quantization to optimize memory usage and inference speed.
  • Customizable Chat Templates: Supports setting custom chat templates for varied use cases.
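To illustrate the OpenAI API compatibility above, the request below is a minimal sketch assuming a server started from one of the vLLM images is reachable on localhost:8000; the model name is a placeholder for whatever SERVED_MODEL_NAME you configured:

```shell
# Example chat completion request against the OpenAI-compatible endpoint.
# Assumes a running server on localhost:8000; "mistral-7b-instruct" is a
# placeholder for your SERVED_MODEL_NAME value.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "mistral-7b-instruct",
        "messages": [{"role": "user", "content": "Summarize vLLM in one sentence."}]
      }'
```

Because the endpoint follows the OpenAI format, existing OpenAI client libraries can be pointed at it by changing only the base URL.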

Advanced Features

  • GPU Optimization: Automatically detects and utilizes available NVIDIA GPUs.
  • NCCL P2P Control: Exposes NVIDIA Collective Communications Library (NCCL) peer-to-peer communication settings.
  • Distributed Computing: Integrates the Ray framework for distributed serving.
  • Eager Execution: Supports eager execution mode for immediate operation processing.
  • Tensor Parallelism: Employs tensor parallelism for improved performance on multi-GPU setups.

Available Environment Variables

  • MODEL: Path to the model. Default: "/models/Mistral-7B-Instruct-v0.3"
  • SERVED_MODEL_NAME: Name under which the model is served. Default: same as MODEL if not set
  • EXTRA_ARGS: Additional arguments passed to the server startup command.
  • QUANTIZATION: Enables quantization for the model. Requires DTYPE to be set when used.
  • DTYPE: Data type to use when QUANTIZATION is set.
  • GPU_MEMORY_UTILIZATION: Fraction of GPU memory to use. Default: 1
  • MAX_MODEL_LEN: Maximum model context length.
  • CHAT_TEMPLATE: Custom chat template.
  • NUM_GPU: Number of GPUs to use. Detected automatically, but can be overridden with a lower or higher value. Default: auto
  • PORT: Port the API server listens on. Default: 8080
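Putting these variables together, a hypothetical docker run invocation might look like the following; the registry address, project name, and all values are placeholders to adapt to your environment:

```shell
# Sketch of launching the vLLM serving image with the environment variables
# above. harbor.example.com and repo-public are placeholders.
docker run --gpus all -p 8080:8080 \
  -e MODEL=/models/Mistral-7B-Instruct-v0.3 \
  -e SERVED_MODEL_NAME=mistral-7b-instruct \
  -e GPU_MEMORY_UTILIZATION=0.9 \
  -e MAX_MODEL_LEN=8192 \
  -e NUM_GPU=2 \
  harbor.example.com/repo-public/vllm-mistral-7b-v0.3-instruct:latest
```

Setting NUM_GPU=2 here engages tensor parallelism across two GPUs, as described under Advanced Features; omit it to let the image auto-detect available GPUs.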