DoublewordDoubleword

Model Name

Qwen/Qwen3.5-4B

Qwen3.5 4B

  • Type: Generation
  • Capabilities: vision, reasoning

Overview

Qwen3.5-4B is a compact open 4B model with a native 262K context window, designed to deliver strong reasoning, coding, and long-context performance in a very small footprint. Qwen reports that it outperforms GPT-OSS-20B across several key benchmarks, including MMLU-Pro, GPQA Diamond, AA-LCR, and LongBench v2, making it a standout small model for cost-sensitive workloads.


Thinking Mode:

This model reasons step-by-step before responding by default. To disable thinking, include the following in your request body: "chat_template_kwargs": {"enable_thinking": false}

This model does not support graduated thinking levels. Parameters such as reasoning_effort are not supported and will have no effect.

Pricing

PriorityInput Tokens (per 1M)Output Tokens (per 1M)
High (1h)$0.05$0.08
Standard (24h)$0.04$0.06

Playground

Open this model in the Playground.