Model Name
Qwen/Qwen3.5-4BQwen3.5 4B
- Type: Generation
- Capabilities:
vision,reasoning
Overview
Qwen3.5-4B is a compact open 4B model with a native 262K context window, designed to deliver strong reasoning, coding, and long-context performance in a very small footprint. Qwen reports that it outperforms GPT-OSS-20B across several key benchmarks, including MMLU-Pro, GPQA Diamond, AA-LCR, and LongBench v2, making it a standout small model for cost-sensitive workloads.
Thinking Mode:
This model reasons step-by-step before responding by default. To disable thinking, include the following in your request body: "chat_template_kwargs": {"enable_thinking": false}
This model does not support graduated thinking levels. Parameters such as reasoning_effort are not supported and will have no effect.
Pricing
| Priority | Input Tokens (per 1M) | Output Tokens (per 1M) |
|---|---|---|
| High (1h) | $0.05 | $0.08 |
| Standard (24h) | $0.04 | $0.06 |
Playground
Open this model in the Playground.