4 docs tagged with "kubernetes"

Faster Model Loading

LLM weights must be downloaded from the internet when an inference container first starts. These downloads can significantly delay first deployments and scale-out operations, and prolong downtime during upgrades. This is especially costly for GPU workloads, where provisioning additional high-performance nodes for blue-green deployments is very expensive.
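As a minimal sketch of one common mitigation, the manifest below pre-fetches weights into a PersistentVolumeClaim with an init container so the serving container reads a local copy instead of re-downloading on every start. The image names, model ID, PVC name, and paths are hypothetical placeholders, not the stack's actual configuration.

```yaml
# Hypothetical example: cache model weights on a PVC so restarts and
# scale-ups read from cluster storage instead of the public internet.
apiVersion: v1
kind: Pod
metadata:
  name: llm-server                      # placeholder name
spec:
  volumes:
    - name: weights
      persistentVolumeClaim:
        claimName: model-weights-pvc    # assumed pre-created PVC
  initContainers:
    - name: fetch-weights
      image: python:3.11-slim           # placeholder image
      command: ["sh", "-c"]
      args:
        - |
          # Skip the download if a previous pod already populated the volume.
          if [ ! -f /models/.ready ]; then
            pip install --quiet huggingface_hub &&
            python -c "from huggingface_hub import snapshot_download; \
              snapshot_download('org/model-name', local_dir='/models')" &&
            touch /models/.ready
          fi
      volumeMounts:
        - name: weights
          mountPath: /models
  containers:
    - name: inference
      image: inference-server:latest    # placeholder serving image
      volumeMounts:
        - name: weights
          mountPath: /models
```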

Ingress

This guide details how to set up network access to models deployed in a Kubernetes cluster.
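For orientation, here is a minimal sketch of the kind of resource the guide covers: a standard networking.k8s.io/v1 Ingress routing external traffic to a model-serving Service. The hostname, Service name, port, and ingress class are hypothetical and depend on your deployment.

```yaml
# Hypothetical Ingress routing external requests to a model-serving Service.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: model-ingress                   # placeholder name
  annotations:
    # LLM responses can take a while; raise the proxy timeout accordingly.
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
spec:
  ingressClassName: nginx               # assumes an NGINX ingress controller
  rules:
    - host: models.example.com          # placeholder hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: inference-service # placeholder Service name
                port:
                  number: 8080          # placeholder port
```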

Your First Deployment

In this guide, we'll walk you through deploying the Doubleword Inference Stack on a Kubernetes cluster using Helm. Customization and advanced configuration are out of scope for this introduction.
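As a rough preview of the workflow the guide describes, the commands below follow the standard Helm install pattern. The repository URL, chart name, release name, and namespace are placeholders; the real values come from the guide itself.

```bash
# Hypothetical Helm workflow; substitute the actual repo URL and chart
# name from the deployment guide.
helm repo add doubleword https://example.com/helm-charts   # placeholder URL
helm repo update

# Install the chart into its own namespace.
helm install inference-stack doubleword/inference-stack \
  --namespace doubleword \
  --create-namespace

# Check that the pods come up.
kubectl get pods --namespace doubleword
```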