- Introduced a new custom Takeoff inference engine, which standardizes backend processes and offers an enhanced interface for generation models.
- With the unified backend, continuous batching now works for all generation models.
- Implemented GPU/CPU utilization tracking metrics.
- Released `takeoff_client`, a Python client package on PyPI for server interaction.
- Removed the option to select backends from the management frontend.
- Overhauled all documentation and added an API References section.
- Added support for Mixtral.