Usage
Take a look at the usage guides below to get started with the Inference Stack.
📄️ Metrics
We use Prometheus, an open-source time-series database and monitoring solution, to aggregate metrics from your LLM applications.
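As a rough sketch of how such scraping is typically wired up, the snippet below shows a minimal Prometheus scrape configuration that discovers pods in a Kubernetes cluster. The job name, metrics path, and use of pod-level service discovery are illustrative assumptions, not Inference Stack defaults.

```yaml
# Minimal Prometheus scrape config sketch.
# The job name and metrics path are assumptions, not stack defaults.
scrape_configs:
  - job_name: llm-inference      # hypothetical job name
    metrics_path: /metrics       # assumed metrics endpoint
    kubernetes_sd_configs:
      - role: pod                # discover model pods in the cluster
```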
📄️ Ingress
This guide details how to expose network access to your models deployed in a Kubernetes cluster.
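For orientation, a standard Kubernetes Ingress resource routing external traffic to a model Service might look like the following. The host, Service name, and port are placeholders, not values shipped with the stack.

```yaml
# Hypothetical Ingress exposing a model Service.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: model-ingress
spec:
  rules:
    - host: models.example.com      # placeholder hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: model-service # placeholder Service name
                port:
                  number: 8080      # placeholder port
```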
📄️ Active Monitoring
Once your Inference Stack is deployed, set up active monitoring so you are alerted to any issues before they impact your users.
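With Prometheus already collecting metrics, alerting is usually driven by rules like the sketch below. The metric name and latency threshold are illustrative assumptions; substitute the metrics your deployment actually exposes.

```yaml
# Hypothetical Prometheus alerting rule; metric name and
# threshold are illustrative, not shipped with the stack.
groups:
  - name: inference-alerts
    rules:
      - alert: HighRequestLatency
        expr: histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m])) > 2
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "p99 request latency above 2s for 10 minutes"
```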
📄️ Probes
Kubernetes offers liveness, readiness, and startup probes to monitor the health of your applications. These can be configured in the Inference Stack to ensure your models are running correctly.
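As a sketch, the three probe types attach to a container spec as shown below. The container name, endpoints, and port are assumptions, not Inference Stack defaults; the generous startup `failureThreshold` reflects that large models can take minutes to load.

```yaml
# Sketch of liveness, readiness, and startup probes on a model container.
# Paths and port are assumed endpoints, not stack defaults.
containers:
  - name: model-server            # placeholder container name
    livenessProbe:
      httpGet:
        path: /healthz            # assumed health endpoint
        port: 8080
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /ready              # assumed readiness endpoint
        port: 8080
      periodSeconds: 5
    startupProbe:
      httpGet:
        path: /healthz
        port: 8080
      failureThreshold: 30        # tolerate slow model load (30 x 10s)
      periodSeconds: 10
```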