Qwen3.5 4B

Type: Generation
Capabilities: vision, reasoning

Overview

Qwen3.5-4B is a compact open 4B model with a native 262K context window, designed to deliver strong reasoning, coding, and long-context performance in a very small footprint. Qwen reports that it outperforms GPT-OSS-20B across several key benchmarks, including MMLU-Pro, GPQA Diamond, AA-LCR, and LongBench v2, making it a standout small model for cost-sensitive workloads.

Thinking Mode:

This model reasons step-by-step before responding by default. To disable thinking, include the following in your request body: "chat_template_kwargs": {"enable_thinking": false}

This model does not support graduated thinking levels. Parameters such as reasoning_effort are not supported and will have no effect.

Pricing

Priority	Input Tokens (per 1M)	Output Tokens (per 1M)
Async	$0.05	$0.08
Batch (24h)	$0.04	$0.06

Playground

Open this model in the Playground.