Usage
Take a look at the usage guides below to get started with the Inference Stack.
📄️ Metrics
We use Prometheus, an open-source time-series database and monitoring solution, to aggregate metrics from your LLM applications.
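As a rough sketch of how such scraping is typically wired up, the snippet below shows a minimal Prometheus scrape configuration that discovers pods in a Kubernetes cluster. The job name, metrics path, and use of pod-level service discovery are illustrative assumptions, not Inference Stack defaults.

```yaml
# Minimal Prometheus scrape config sketch.
# The job name and metrics path are assumptions, not stack defaults.
scrape_configs:
  - job_name: llm-inference      # hypothetical job name
    metrics_path: /metrics       # assumed metrics endpoint
    kubernetes_sd_configs:
      - role: pod                # discover model pods in the cluster
```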
📄️ Ingress
This guide details how to expose network access to your models deployed in a Kubernetes cluster.
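For orientation, a standard Kubernetes Ingress resource routing external traffic to a model Service might look like the following. The host, Service name, and port are placeholders, not values shipped with the stack.

```yaml
# Hypothetical Ingress exposing a model Service.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: model-ingress
spec:
  rules:
    - host: models.example.com      # placeholder hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: model-service # placeholder Service name
                port:
                  number: 8080      # placeholder port
```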
📄️ Active Monitoring
Once your Inference Stack is deployed, set up active monitoring so you are alerted to any issues before they impact your users.
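With Prometheus already collecting metrics, alerting is usually driven by rules like the sketch below. The metric name and latency threshold are illustrative assumptions; substitute the metrics your deployment actually exposes.

```yaml
# Hypothetical Prometheus alerting rule; metric name and
# threshold are illustrative, not shipped with the stack.
groups:
  - name: inference-alerts
    rules:
      - alert: HighRequestLatency
        expr: histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m])) > 2
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "p99 request latency above 2s for 10 minutes"
```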
📄️ Probes
Kubernetes offers liveness, readiness, and startup probes to monitor the health of your applications. These can be configured in the Inference Stack to ensure your models are running correctly.
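As a sketch, the three probe types attach to a container spec as shown below. The container name, endpoints, and port are assumptions, not Inference Stack defaults; the generous startup `failureThreshold` reflects that large models can take minutes to load.

```yaml
# Sketch of liveness, readiness, and startup probes on a model container.
# Paths and port are assumed endpoints, not stack defaults.
containers:
  - name: model-server            # placeholder container name
    livenessProbe:
      httpGet:
        path: /healthz            # assumed health endpoint
        port: 8080
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /ready              # assumed readiness endpoint
        port: 8080
      periodSeconds: 5
    startupProbe:
      httpGet:
        path: /healthz
        port: 8080
      failureThreshold: 30        # tolerate slow model load (30 x 10s)
      periodSeconds: 10
```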