0.10.0

January 22, 2024

Introduced a new custom takeoff inference engine, which standardizes backend processes and offers an enhanced interface for generation models.
With the unified backend in place, continuous batching now works for all generation models.
Implemented GPU/CPU utilization tracking metrics.
Released takeoff_client, a Python client package on PyPI for server interaction.
Removed the option to select backends from the management frontend.
Overhauled all documentation and added an API References section.
Added support for Mixtral.