Deploying the Doubleword Control Layer
These guides provide instructions for deploying the Doubleword Inference Stack to Kubernetes using Helm.
Kubernetes deployment is ideal for organizations requiring high availability, automatic scaling, and integration with existing Kubernetes infrastructure. For simpler single-server deployments, we recommend running containers directly.
Prerequisites
Before beginning your deployment, ensure you have the necessary infrastructure and credentials prepared.
System Requirements
Your Kubernetes cluster must be running version 1.24 or later with kubectl configured to access your target cluster. You'll also need Helm 3.8 or later installed for managing the deployment.
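You can confirm both tools from your workstation before proceeding; for example:

```bash
# Check the kubectl client and cluster server versions
# (requires kubectl to be configured for your target cluster)
kubectl version

# Check the Helm client version (should report v3.8 or later)
helm version --short
```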
Node Availability
Ensure that your Kubernetes nodes have sufficient CPU, memory, and disk space to run the Inference Stack components. We recommend dedicated nodes for production deployments.
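To verify what each node can actually schedule, inspect its allocatable resources. The taint shown here is illustrative; if you dedicate nodes this way, the workloads will need a matching toleration in your deployment configuration:

```bash
# List the nodes in the cluster
kubectl get nodes

# Show allocatable CPU, memory, and ephemeral storage for every node
kubectl describe nodes | grep -A 6 "Allocatable:"

# Optionally dedicate nodes to the Inference Stack with a taint
# (the key/value pair here is an example, not a required convention)
kubectl taint nodes <node-name> dedicated=inference:NoSchedule
```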
📄️ Getting Started
In this guide, we'll walk you through the steps to deploy the Doubleword Inference Stack using Helm on a Kubernetes cluster. Customization and advanced configurations will not be covered in this introductory guide.
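As a preview of the shape of the deployment, a typical Helm installation looks roughly like the sketch below. The repository URL, chart name, and release name are placeholders; use the actual values given in the guide:

```bash
# Illustrative only: the repo URL and chart name are placeholders
helm repo add doubleword https://example.com/helm-charts
helm repo update

# Install the stack into its own namespace
helm install inference-stack doubleword/doubleword-inference-stack \
  --namespace doubleword \
  --create-namespace
```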
📄️ Faster Model Loading
LLM weights must be downloaded from the internet during the initial startup of inference containers. These downloads can significantly delay first deployments and scaling operations, and extend downtime during upgrades. This is especially problematic for GPU workloads, where provisioning additional high-performance nodes for blue-green deployments is expensive.
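One common mitigation, sketched below under the assumption that models come from the Hugging Face Hub and that your pods can mount a shared volume, is to download weights once into a persistent cache that inference containers read at startup. The model name and cache path here are illustrative:

```bash
# Illustrative sketch: pre-populate a shared model cache so inference
# pods load weights from local disk instead of the network.
pip install huggingface_hub

# Download weights into a directory backed by a shared PersistentVolume
# (model name and mount path are examples, not required values)
huggingface-cli download meta-llama/Llama-3.1-8B-Instruct \
  --local-dir /mnt/model-cache/llama-3.1-8b-instruct
```

Pods that mount the cache directory then start without fetching weights over the internet, which also avoids repeated downloads when scaling out replicas.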