Model Name
Qwen/Qwen3-VL-235B-A22B-Instruct-FP8Qwen3 VL 235B A22B Instruct
- Type: Generation
- Capabilities:
vision
Overview
Meet Qwen3-VL-235B - delivering performance similar to GPT-5 Chat and Claude 4 Opus Thinking on challenging tasks including advanced reasoning, mathematics, and complex code generation. Offers frontier-level capabilities at a fraction of the cost. Best for:
- Tasks requiring maximum intelligence
- Complex analysis
- Sophisticated coding projects
- Scenarios where quality justifies the additional cost over smaller models
Max New Tokens: 16384
Max Total Tokens: 262144
Sampling Parameters:
We have set the default sampling parameters using the recommended values set out by the Qwen team:
We suggest using Temperature=0.7, TopP=0.8, TopK=20, and MinP=0.
For supported frameworks, you can adjust the presence_penalty parameter between 0 and 2 to reduce endless repetitions. However, using a higher value may occasionally result in language mixing and a slight decrease in model performance.
We use a default presence_penalty of 1.5 to bias the model against endless repetitions, if you still notice this behaviour try increasing the presence_penalty.
You can adjust these on a per-request basis by setting the sampling parameters in the request body.
Pricing
| Priority | Input Tokens (per 1M) | Output Tokens (per 1M) |
|---|---|---|
| Realtime1 | $0.60 | $1.20 |
| High (1h) | $0.15 | $0.55 |
| Standard (24h) | $0.10 | $0.40 |
Playground
Open this model in the Playground.
Footnotes
-
Realtime availability is limited. Doubleword is primarily a batch API. ↩