Registry Image Library

Overview of the Image Library

This article covers the fundamentals of the Harbor Registry and how to use it, including the pull, tag, and push workflows for the images listed below.

Hosted or Developed Images

Our Harbor Registry hosts custom Docker images optimized for AI development and deployment, featuring CUDA, Jupyter, vLLM models, and Triton Server.

| Name | Image Name | Description | Exposed Port |
| --- | --- | --- | --- |
| CUDA Development Kit | cuda-12.4.0-devel-ubuntu22.04 | CUDA 12.4.0 development image based on Ubuntu 22.04 | N/A |
| Jupyter Notebook Base | jupyter-base-notebook | Base Jupyter Notebook image | 8888 |
| NVIDIA VS Code Launcher | nvidia-vscode-rs-launcher | VS Code launcher with NVIDIA and tools support | 8080 |
| vLLM LLaMA 3 8B Instruct | vllm-llama-3-8b-instruct | vLLM LLaMA 3 8B model for instruction tasks | 8000 |
| vLLM Mistral-7B Instruct | vllm-mistral-7b-v0.3-instruct | vLLM Mistral-7B v0.3 model for instruction tasks | 8000 |
| vLLM Mistral-NeMo 12B | vllm-mistral-nemo-instruct-2407 | The Mistral-NeMo-Instruct-2407 large language model (LLM) | 8080 |
| vLLM LLaVA 7B | vllm-llava-1.5-7b-hf | LLaVA, an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data | 8080 |
| vLLM OpenAI Compatible | vllm-openai | vLLM image compatible with the OpenAI API | 8000 |
| PostgreSQL | pgvector:latest | PostgreSQL with the pgvector extension | 5432 |

To deploy the listed images, follow the step-by-step instructions in our Run:AI Quickstart guide, replacing the example Harbor Registry address with your own. Because these images are continuously updated, you can always pull the most recent build by using the latest tag, for example:

docker pull <HARBOR_ADDRESS>/{repo-public,rax-public}/vllm-llama-3-8b-instruct:latest
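The tag and push steps mentioned above follow the same pattern. A sketch of the full pull, tag, and push workflow, using a placeholder registry address (harbor.example.com) and a hypothetical destination project (my-project):

```shell
# Pull an image from the registry (placeholder registry address and project).
docker pull harbor.example.com/repo-public/vllm-llama-3-8b-instruct:latest

# Retag it under your own project (my-project is hypothetical).
docker tag harbor.example.com/repo-public/vllm-llama-3-8b-instruct:latest \
  harbor.example.com/my-project/vllm-llama-3-8b-instruct:v1

# Push the retagged image back to the registry.
docker push harbor.example.com/my-project/vllm-llama-3-8b-instruct:v1
```

Pushing to a project requires that you are logged in (`docker login <HARBOR_ADDRESS>`) with an account that has push permission on that project.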

vLLM High-Performance Language Model Serving Image

This image is designed for efficient, GPU-accelerated serving of large language models. It leverages vLLM to provide an OpenAI-compatible API server, making it ideal for deploying and scaling language model applications.

Key Features:

  • Flexible Configuration: Supports various environment variables for easy customization of model serving parameters.
  • OpenAI API Compatibility: Provides an API server compatible with OpenAI's format for seamless integration.
  • High-Performance Serving: Uses vLLM for efficient serving of large language models.
  • Quantization Support: Allows for model quantization to optimize memory usage and inference speed.
  • Customizable Chat Templates: Supports setting custom chat templates for varied use cases.
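To illustrate the OpenAI API compatibility above, the request below is a minimal sketch assuming a server started from one of the vLLM images is reachable on localhost:8000; the model name is a placeholder for whatever SERVED_MODEL_NAME you configured:

```shell
# Example chat completion request against the OpenAI-compatible endpoint.
# Assumes a running server on localhost:8000; "mistral-7b-instruct" is a
# placeholder for your SERVED_MODEL_NAME value.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "mistral-7b-instruct",
        "messages": [{"role": "user", "content": "Summarize vLLM in one sentence."}]
      }'
```

Because the endpoint follows the OpenAI format, existing OpenAI client libraries can be pointed at it by changing only the base URL.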

Advanced Features

  • GPU Optimization: Automatically detects and utilizes available NVIDIA GPUs.
  • NCCL P2P Control: Exposes NVIDIA Collective Communications Library (NCCL) peer-to-peer communication settings.
  • Distributed Computing: Integrates the Ray framework for distributed serving.
  • Eager Execution: Supports eager execution mode for immediate operation processing.
  • Tensor Parallelism: Employs tensor parallelism for improved performance on multi-GPU setups.

Available Environment Variables

  • MODEL: Path to the model. Default: "/models/Mistral-7B-Instruct-v0.3"
  • SERVED_MODEL_NAME: Name under which the model is served. Default: same as MODEL if not set
  • EXTRA_ARGS: Additional arguments passed to the server startup command.
  • QUANTIZATION: Enables quantization for the model. Requires DTYPE to be set when used.
  • DTYPE: Data type to use when QUANTIZATION is set.
  • GPU_MEMORY_UTILIZATION: Fraction of GPU memory to use. Default: 1
  • MAX_MODEL_LEN: Maximum model context length.
  • CHAT_TEMPLATE: Custom chat template.
  • NUM_GPU: Number of GPUs to use. Detected automatically, but can be overridden with a lower or higher value. Default: auto
  • PORT: Port the API server listens on. Default: 8080
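Putting these variables together, a hypothetical docker run invocation might look like the following; the registry address, project name, and all values are placeholders to adapt to your environment:

```shell
# Sketch of launching the vLLM serving image with the environment variables
# above. harbor.example.com and repo-public are placeholders.
docker run --gpus all -p 8080:8080 \
  -e MODEL=/models/Mistral-7B-Instruct-v0.3 \
  -e SERVED_MODEL_NAME=mistral-7b-instruct \
  -e GPU_MEMORY_UTILIZATION=0.9 \
  -e MAX_MODEL_LEN=8192 \
  -e NUM_GPU=2 \
  harbor.example.com/repo-public/vllm-mistral-7b-v0.3-instruct:latest
```

Setting NUM_GPU=2 here engages tensor parallelism across two GPUs, as described under Advanced Features; omit it to let the image auto-detect available GPUs.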