Deploying the Doubleword Control Layer
These guides provide instructions for deploying the Doubleword Inference Stack to Kubernetes using Helm.
Kubernetes deployment is ideal for organizations requiring high availability, automatic scaling, and integration with existing Kubernetes infrastructure. For simpler single-server deployments, we recommend running containers directly.
Prerequisites
Before beginning your deployment, ensure you have the necessary infrastructure and credentials prepared.
System Requirements
Your Kubernetes cluster must be running version 1.24 or later with kubectl configured to access your target cluster. You'll also need Helm 3.8 or later installed for managing the deployment.
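You can confirm both tools from your workstation before proceeding; for example:

```bash
# Check the kubectl client and cluster server versions
# (requires kubectl to be configured for your target cluster)
kubectl version

# Check the Helm client version (should report v3.8 or later)
helm version --short
```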
Node Availability
Ensure that your Kubernetes nodes have sufficient CPU, memory, and disk space to run the Inference Stack components. We recommend dedicated nodes for production deployments.
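To verify what each node can actually schedule, inspect its allocatable resources. The taint shown here is illustrative; if you dedicate nodes this way, the workloads will need a matching toleration in your deployment configuration:

```bash
# List the nodes in the cluster
kubectl get nodes

# Show allocatable CPU, memory, and ephemeral storage for every node
kubectl describe nodes | grep -A 6 "Allocatable:"

# Optionally dedicate nodes to the Inference Stack with a taint
# (the key/value pair here is an example, not a required convention)
kubectl taint nodes <node-name> dedicated=inference:NoSchedule
```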
📄️ Getting Started
In this guide, we'll walk you through the steps to deploy the Doubleword Inference Stack using Helm on a Kubernetes cluster. Customization and advanced configurations will not be covered in this introductory guide.
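As a preview of the shape of the deployment, a typical Helm installation looks roughly like the sketch below. The repository URL, chart name, and release name are placeholders; use the actual values given in the guide:

```bash
# Illustrative only: the repo URL and chart name are placeholders
helm repo add doubleword https://example.com/helm-charts
helm repo update

# Install the stack into its own namespace
helm install inference-stack doubleword/doubleword-inference-stack \
  --namespace doubleword \
  --create-namespace
```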
📄️ Faster Model Loading
LLM weights must be downloaded from the internet during the initial startup of inference containers. These downloads can significantly delay first deployments and scaling operations, and extend downtime during upgrades. This is especially problematic for GPU workloads, where provisioning additional high-performance nodes for blue-green deployments is expensive.
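One common mitigation, sketched below under the assumption that models come from the Hugging Face Hub and that your pods can mount a shared volume, is to download weights once into a persistent cache that inference containers read at startup. The model name and cache path here are illustrative:

```bash
# Illustrative sketch: pre-populate a shared model cache so inference
# pods load weights from local disk instead of the network.
pip install huggingface_hub

# Download weights into a directory backed by a shared PersistentVolume
# (model name and mount path are examples, not required values)
huggingface-cli download meta-llama/Llama-3.1-8B-Instruct \
  --local-dir /mnt/model-cache/llama-3.1-8b-instruct
```

Pods that mount the cache directory then start without fetching weights over the internet, which also avoids repeated downloads when scaling out replicas.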