DoublewordDoubleword

Model Name

zai-org/GLM-5.1-FP8

zai-org/GLM-5.1-FP8

  • Type: Generation
  • Capabilities: reasoning

Overview

Meet GLM-5.1-FP8 - Z.ai's next-generation flagship model for agentic engineering, with significantly stronger coding capabilities than GLM-5. It achieves state-of-the-art performance on SWE-Bench Pro and leads GLM-5 by a wide margin on NL2Repo and Terminal-Bench 2.0, making it especially strong for real-world coding, repository generation, terminal tasks, and long-horizon agentic workflows. GLM-5.1 is designed to stay productive over extended sessions, breaking down ambiguous problems, running experiments, reading results, identifying blockers, and improving through repeated iteration.

Best for:

  • Agentic engineering and complex coding tasks
  • Long-running tool-use workflows
  • Repository generation and terminal-based development
  • Ambiguous problems requiring judgment, experimentation, and sustained reasoning

Max Total Tokens: 202752

Sampling Parameters:

We have set the default sampling parameters using the recommended values from the GLM-5.1 generation configuration:


Temperature=1.0 and TopP=0.95.


You can adjust these on a per-request basis by setting the sampling parameters in the request body.


Thinking Mode:

This model reasons step-by-step before responding by default. To disable thinking, include the following in your request body: "chat_template_kwargs": {"enable_thinking": false}

This model is built for long-horizon agentic work and can sustain useful reasoning over extended sessions involving planning, tool use, experiments, and iterative debugging.

Pricing

PriorityInput Tokens (per 1M)Output Tokens (per 1M)
Realtime1$1.40$4.40
High (1h)$1.05$3.30
Standard (24h)$0.70$2.20

Playground

Open this model in the Playground.

Footnotes

  1. Realtime availability is limited. Doubleword is primarily a batch API.