Model Name
zai-org/GLM-5.1-FP8
- Type: Generation
- Capabilities: reasoning
Overview
GLM-5.1-FP8 is Z.ai's next-generation flagship model for agentic engineering, with significantly stronger coding capabilities than GLM-5. It achieves state-of-the-art performance on SWE-Bench Pro and leads GLM-5 by a wide margin on NL2Repo and Terminal-Bench 2.0, making it especially strong for real-world coding, repository generation, terminal tasks, and long-horizon agentic workflows. GLM-5.1 is designed to stay productive over extended sessions: breaking down ambiguous problems, running experiments, reading results, identifying blockers, and improving through repeated iteration.
Best for:
- Agentic engineering and complex coding tasks
- Long-running tool-use workflows
- Repository generation and terminal-based development
- Ambiguous problems requiring judgment, experimentation, and sustained reasoning
Max Total Tokens: 202752
Sampling Parameters:
We have set the default sampling parameters using the recommended values from the GLM-5.1 generation configuration: `temperature=1.0` and `top_p=0.95`.
You can adjust these on a per-request basis by setting the sampling parameters in the request body.
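As a minimal sketch of overriding the defaults per request, the snippet below builds a chat-completion request body with explicit sampling parameters. The payload shape assumes an OpenAI-compatible chat API; the `build_request` helper is illustrative, not part of any SDK.

```python
import json

def build_request(prompt: str, temperature: float = 1.0, top_p: float = 0.95) -> dict:
    """Build a chat-completion request body with explicit sampling parameters.

    Defaults mirror the GLM-5.1 generation configuration (temperature=1.0,
    top_p=0.95); pass other values to override them for a single request.
    """
    return {
        "model": "zai-org/GLM-5.1-FP8",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "top_p": top_p,
    }

# Example: lower the temperature for a more deterministic coding task.
body = build_request("Write a binary search in Python.", temperature=0.2)
print(json.dumps(body, indent=2))
```

Send the resulting body as JSON to your provider's chat-completions endpoint; omitted fields fall back to the defaults above.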
Thinking Mode:
This model reasons step-by-step before responding by default. To disable thinking, include the following in your request body: `"chat_template_kwargs": {"enable_thinking": false}`
This model is built for long-horizon agentic work and can sustain useful reasoning over extended sessions involving planning, tool use, experiments, and iterative debugging.
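A minimal sketch of toggling thinking mode, using the `chat_template_kwargs` switch documented above. The request shape assumes an OpenAI-compatible chat API, and the `chat_body` helper is hypothetical.

```python
import json

def chat_body(prompt: str, thinking: bool = True) -> dict:
    """Build a request body, optionally disabling step-by-step reasoning."""
    body = {
        "model": "zai-org/GLM-5.1-FP8",
        "messages": [{"role": "user", "content": prompt}],
    }
    if not thinking:
        # The documented switch for turning off thinking mode.
        body["chat_template_kwargs"] = {"enable_thinking": False}
    return body

# Thinking is on by default; disable it for fast, direct answers.
direct = chat_body("Summarise this commit message.", thinking=False)
print(json.dumps(direct))
```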
Pricing
| Priority | Input Tokens (per 1M) | Output Tokens (per 1M) |
|---|---|---|
| Realtime¹ | $1.40 | $4.40 |
| High (1h) | $1.05 | $3.30 |
| Standard (24h) | $0.70 | $2.20 |
Playground
Open this model in the Playground.
Footnotes
1. Realtime availability is limited. Doubleword is primarily a batch API.