Model Name
zai-org/GLM-5.1-FP8
- Type: Generation
- Capabilities: reasoning
Overview
GLM-5.1-FP8 is Z.ai's next-generation flagship model for agentic engineering, with significantly stronger coding capabilities than GLM-5. It achieves state-of-the-art performance on SWE-Bench Pro and leads GLM-5 by a wide margin on NL2Repo and Terminal-Bench 2.0, making it especially strong for real-world coding, repository generation, terminal tasks, and long-horizon agentic workflows. GLM-5.1 is designed to stay productive over extended sessions: breaking down ambiguous problems, running experiments, reading results, identifying blockers, and improving through repeated iteration.
Best for:
- Agentic engineering and complex coding tasks
- Long-running tool-use workflows
- Repository generation and terminal-based development
- Ambiguous problems requiring judgment, experimentation, and sustained reasoning
Max Total Tokens: 202752
Sampling Parameters:
We have set the default sampling parameters using the recommended values from the GLM-5.1 generation configuration: `temperature=1.0` and `top_p=0.95`.
You can adjust these on a per-request basis by setting the sampling parameters in the request body.
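As a minimal sketch of overriding the defaults per request, the snippet below builds a chat-completion request body with explicit sampling parameters. The payload shape assumes an OpenAI-compatible chat API; the `build_request` helper is illustrative, not part of any SDK.

```python
import json

def build_request(prompt: str, temperature: float = 1.0, top_p: float = 0.95) -> dict:
    """Build a chat-completion request body with explicit sampling parameters.

    Defaults mirror the GLM-5.1 generation configuration (temperature=1.0,
    top_p=0.95); pass other values to override them for a single request.
    """
    return {
        "model": "zai-org/GLM-5.1-FP8",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "top_p": top_p,
    }

# Example: lower the temperature for a more deterministic coding task.
body = build_request("Write a binary search in Python.", temperature=0.2)
print(json.dumps(body, indent=2))
```

Send the resulting body as JSON to your provider's chat-completions endpoint; omitted fields fall back to the defaults above.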
Thinking Mode:
This model reasons step-by-step before responding by default. To disable thinking, include the following in your request body: `"chat_template_kwargs": {"enable_thinking": false}`
This model is built for long-horizon agentic work and can sustain useful reasoning over extended sessions involving planning, tool use, experiments, and iterative debugging.
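A minimal sketch of toggling thinking mode, using the `chat_template_kwargs` switch documented above. The request shape assumes an OpenAI-compatible chat API, and the `chat_body` helper is hypothetical.

```python
import json

def chat_body(prompt: str, thinking: bool = True) -> dict:
    """Build a request body, optionally disabling step-by-step reasoning."""
    body = {
        "model": "zai-org/GLM-5.1-FP8",
        "messages": [{"role": "user", "content": prompt}],
    }
    if not thinking:
        # The documented switch for turning off thinking mode.
        body["chat_template_kwargs"] = {"enable_thinking": False}
    return body

# Thinking is on by default; disable it for fast, direct answers.
direct = chat_body("Summarise this commit message.", thinking=False)
print(json.dumps(direct))
```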
Pricing
| Priority | Input Tokens (per 1M) | Output Tokens (per 1M) |
|---|---|---|
| Realtime¹ | $1.40 | $4.40 |
| High (1h) | $1.05 | $3.30 |
| Standard (24h) | $0.70 | $2.20 |
Playground
Open this model in the Playground.
Footnotes
1. Realtime availability is limited. Doubleword is primarily a batch API.