Active Monitoring
Once your inference stack is deployed, set up active monitoring so that you are alerted to issues before they affect your users.
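Kubernetes offers liveness, readiness, and startup probes to monitor the health of your applications. A liveness probe restarts a container that has stopped responding, a readiness probe keeps traffic away from a pod until it can serve requests, and a startup probe holds off the other two while a slow-starting container initializes. These can be configured in the Inference Stack to ensure your models are running correctly.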
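As a rough illustration of what the three probes look like, the sketch below builds them as plain Python dictionaries and prints them as JSON. The field names (httpGet, periodSeconds, failureThreshold) come from the standard Kubernetes container spec. The /health and /ready paths, the port 8000, the thresholds, and the helper names http_probe and build_probes are assumptions made for this example. How the Inference Stack exposes these settings is not shown here, so check its configuration reference before copying values.

```python
import json


def http_probe(path, port, period_seconds=10, failure_threshold=3):
    """Return a Kubernetes HTTP probe definition as a plain dict."""
    return {
        "httpGet": {"path": path, "port": port},
        "periodSeconds": period_seconds,
        "failureThreshold": failure_threshold,
    }


def build_probes(port=8000):
    """Build the three probe definitions for one container.

    The endpoint paths and thresholds are illustrative assumptions,
    not values taken from the Inference Stack.
    """
    return {
        # Allow up to 60 * 10s = 10 minutes for the container to start
        # before the liveness and readiness probes take over.
        "startupProbe": http_probe("/health", port, period_seconds=10, failure_threshold=60),
        # Restart the container after 3 consecutive failed checks.
        "livenessProbe": http_probe("/health", port),
        # Stop routing traffic to the pod while this check fails.
        "readinessProbe": http_probe("/ready", port, period_seconds=5),
    }


if __name__ == "__main__":
    print(json.dumps(build_probes(), indent=2))
```

A generous startup probe budget is the main design choice here: a container that needs several minutes to become healthy would otherwise be restarted by the liveness probe before it finishes starting.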
We use Prometheus to aggregate metrics from your LLM applications. Prometheus is an open-source time-series database and monitoring system.
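Prometheus collects metrics by scraping an HTTP endpoint that returns them in a plain-text format. The sketch below shows roughly what that looks like using only the Python standard library. The metric name llm_requests_total, the model label, the port 9100, and the Counter and MetricsHandler classes are hypothetical names chosen for this example, not metrics the Inference Stack is known to emit. In practice you would more likely use the official prometheus_client package, which also handles label escaping and thread safety, both of which this sketch leaves out.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class Counter:
    """A minimal counter rendered in the Prometheus text format."""

    def __init__(self, name, help_text):
        self.name = name
        self.help_text = help_text
        self._values = {}  # maps a sorted tuple of (label, value) pairs to a float

    def inc(self, amount=1.0, **labels):
        """Increase the counter for one label combination (string label values assumed)."""
        key = tuple(sorted(labels.items()))
        self._values[key] = self._values.get(key, 0.0) + amount

    def render(self):
        """Return the HELP line, the TYPE line, and one sample line per label set."""
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for key, value in sorted(self._values.items()):
            label_text = ",".join(f'{k}="{v}"' for k, v in key)
            suffix = "{" + label_text + "}" if label_text else ""
            lines.append(f"{self.name}{suffix} {value}")
        return "\n".join(lines) + "\n"


# Hypothetical metric for illustration only.
REQUESTS = Counter("llm_requests_total", "Total inference requests handled.")


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve the current metric values at /metrics for Prometheus to scrape."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = REQUESTS.render().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    REQUESTS.inc(model="demo-model")
    # Port 9100 is an arbitrary choice for this sketch.
    HTTPServer(("0.0.0.0", 9100), MetricsHandler).serve_forever()
```

Prometheus would then be pointed at this endpoint through a scrape job, and alerting rules can be written against the stored series.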