Doubleword

Models & Pricing

Doubleword Batch API is priced per model based on token usage. Costs are calculated separately for input tokens (the content you send) and output tokens (the content generated by the model).

The table below lists the available models and their pricing. If you are interested in pricing for a model not listed below, or if you'd like to request a new model, please reach out to support@doubleword.ai.

| Model Name | Priority | Input Tokens (per 1M) | Output Tokens (per 1M) |
|---|---|---|---|
| Qwen/Qwen3.5-4B | High (1h) | $0.05 | $0.08 |
| Qwen/Qwen3.5-4B | Standard (24h) | $0.04 | $0.06 |
| nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 | Realtime¹ | $0.30 | $0.75 |
| nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 | High (1h) | $0.23 | $0.56 |
| nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 | Standard (24h) | $0.00 | $0.00 |
| Qwen/Qwen3.5-9B | Realtime¹ | $0.08 | $0.70 |
| Qwen/Qwen3.5-9B | High (1h) | $0.04 | $0.35 |
| Qwen/Qwen3.5-9B | Standard (24h) | $0.03 | $0.29 |
| Qwen/Qwen3.5-35B-A3B-FP8 | Realtime¹ | $0.25 | $2.00 |
| Qwen/Qwen3.5-35B-A3B-FP8 | High (1h) | $0.07 | $0.30 |
| Qwen/Qwen3.5-35B-A3B-FP8 | Standard (24h) | $0.05 | $0.20 |
| Qwen/Qwen3-14B-FP8 | Realtime¹ | $0.05 | $0.60 |
| Qwen/Qwen3-14B-FP8 | High (1h) | $0.03 | $0.30 |
| Qwen/Qwen3-14B-FP8 | Standard (24h) | $0.02 | $0.20 |
| Qwen/Qwen3.5-397B-A17B | Realtime¹ | $0.60 | $3.60 |
| Qwen/Qwen3.5-397B-A17B | High (1h) | $0.30 | $1.80 |
| Qwen/Qwen3.5-397B-A17B | Standard (24h) | $0.15 | $1.20 |
| Qwen/Qwen3-Embedding-8B | Realtime¹ | $0.04 | $0.00 |
| Qwen/Qwen3-Embedding-8B | High (1h) | $0.03 | $0.00 |
| Qwen/Qwen3-Embedding-8B | Standard (24h) | $0.02 | $0.00 |
| openai/gpt-oss-20b | Realtime¹ | $0.04 | $0.30 |
| openai/gpt-oss-20b | High (1h) | $0.03 | $0.20 |
| openai/gpt-oss-20b | Standard (24h) | $0.02 | $0.15 |
| Qwen/Qwen3-VL-30B-A3B-Instruct-FP8 | Realtime¹ | $0.16 | $0.80 |
| Qwen/Qwen3-VL-30B-A3B-Instruct-FP8 | High (1h) | $0.07 | $0.30 |
| Qwen/Qwen3-VL-30B-A3B-Instruct-FP8 | Standard (24h) | $0.05 | $0.20 |
| Qwen/Qwen3-VL-235B-A22B-Instruct-FP8 | Realtime¹ | $0.60 | $1.20 |
| Qwen/Qwen3-VL-235B-A22B-Instruct-FP8 | High (1h) | $0.15 | $0.55 |
| Qwen/Qwen3-VL-235B-A22B-Instruct-FP8 | Standard (24h) | $0.10 | $0.40 |

If you'd like to estimate the cost of your job, please upload your file in the Doubleword Console to view a cost estimate prior to submitting a batch.
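For a back-of-the-envelope estimate without uploading a file, cost is simply token count divided by one million, times the per-1M price from the table. The helper below is an illustrative sketch (not an official tool); the example prices are the Qwen/Qwen3.5-9B Standard (24h) tier.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate batch cost in USD from token counts and per-1M-token prices."""
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

# Qwen/Qwen3.5-9B at Standard (24h): $0.03 input / $0.29 output per 1M tokens
cost = estimate_cost(2_000_000, 1_000_000, 0.03, 0.29)
print(f"${cost:.2f}")  # $0.35
```

The Console estimate remains the authoritative figure, since it counts tokens with the model's own tokenizer.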

Note

Priority indicates the maximum processing time for batch requests. Actual processing times are typically faster than stated.

Model Details

Qwen/Qwen3.5-4B

Playground

Qwen3.5-4B is a compact 4B-parameter reasoning model with a 262K-token native context length, designed for strong reasoning performance while remaining extremely cost-efficient. Despite its small size, it performs remarkably well on complex tasks, and Qwen's benchmarks show it is comparable to the much larger GPT-OSS-20B model.

nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4

Playground

NVIDIA Nemotron 3 Super 120B A12B NVFP4 is an open hybrid Mamba-Transformer LatentMoE model with 120 billion total parameters and 12 billion active parameters, built for agentic reasoning workloads such as coding, planning, tool use, and long-context tasks. It sits in the same capability tier as Qwen3.5-122B non-reasoning and ahead of GPT-OSS-120B, while also delivering higher throughput.


In line with NVIDIA's guidance, we use temperature=1.0 and top_p=0.95 across all tasks and serving backends, including reasoning, tool calling, and general chat.


To enable reasoning, pass extra_body={"chat_template_kwargs": {"enable_thinking": true}}. For a more concise reasoning mode that uses significantly fewer reasoning tokens than full thinking mode, add "low_effort": true to the same payload.
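Putting the sampling guidance and the reasoning toggle together, one request might look like the sketch below. This assumes an OpenAI-style batch-file line (JSONL with `custom_id`, `method`, `url`, and `body`); Doubleword's exact batch format may differ, and the `custom_id` and prompt are placeholders.

```python
import json

# Sketch of one batch-file line (JSONL), assuming an OpenAI-style batch format.
# The chat_template_kwargs values follow the guidance above; "custom_id" is arbitrary.
request_line = {
    "custom_id": "nemotron-reasoning-001",
    "method": "POST",
    "url": "/v1/chat/completions",
    "body": {
        "model": "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4",
        "messages": [{"role": "user", "content": "Plan a three-step refactor."}],
        "temperature": 1.0,           # NVIDIA's recommended sampling
        "top_p": 0.95,
        "chat_template_kwargs": {
            "enable_thinking": True,  # turn reasoning on
            "low_effort": True,       # concise reasoning mode (optional)
        },
    },
}
print(json.dumps(request_line))
```

Depending on your client library, `chat_template_kwargs` may need to be nested under an `extra_body` argument rather than placed top-level.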

Qwen/Qwen3.5-9B

Playground

Qwen3.5-9B is a compact 9B-parameter reasoning model with a 262K-token native context length, designed for strong reasoning performance while remaining extremely cost-efficient. Despite its small size, it performs remarkably well on complex tasks, and in Qwen's benchmarks it outperformed the much larger GPT-OSS-120B model.

Qwen/Qwen3.5-35B-A3B-FP8

Playground

Qwen3.5-35B-A3B is a high-intelligence, mid-sized model that hits a very compelling price/performance point for async workloads. In Qwen's published benchmarks, this model outperformed GPT-5-mini, GPT-OSS-120B, and Claude Sonnet 4.5.


Thinking Mode:

This model reasons step-by-step before responding by default. To disable thinking, include the following in your request body: "chat_template_kwargs": {"enable_thinking": false}

This model does not support graduated thinking levels. Parameters such as reasoning_effort are not supported and will have no effect.
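As a minimal sketch, disabling thinking looks like the request body below. The field layout assumes an OpenAI-compatible chat-completions request; the prompt is a placeholder, and some clients require `chat_template_kwargs` to go through an `extra_body` argument.

```python
# Sketch of a chat-completion request body with step-by-step thinking disabled.
body = {
    "model": "Qwen/Qwen3.5-35B-A3B-FP8",
    "messages": [{"role": "user", "content": "Classify this ticket: 'login fails'"}],
    "chat_template_kwargs": {"enable_thinking": False},  # skip the reasoning phase
    # Note: reasoning_effort is NOT supported by this model and would be ignored.
}
```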

Qwen/Qwen3-14B-FP8

Playground

Meet Qwen3-14B - a small text-only model from the Qwen3 release.

Best for:

  • High volume tasks
  • Tasks that do not require maximum performance, such as classification, extraction, or summarization

Max New Tokens: 16384

Max Total Tokens: 262144

Sampling Parameters:

We have set the default sampling parameters using the recommended values set out by the Qwen team:


We suggest using Temperature=0.7, TopP=0.8, TopK=20, and MinP=0.

For supported frameworks, you can adjust the presence_penalty parameter between 0 and 2 to reduce endless repetitions. However, using a higher value may occasionally result in language mixing and a slight decrease in model performance.


We use a default presence_penalty of 1.5 to bias the model against endless repetitions; if you still notice this behaviour, try increasing the presence_penalty.

You can adjust these on a per-request basis by setting the sampling parameters in the request body.
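For example, a request that mirrors Qwen's recommended values while overriding the defaults might look like this sketch. Whether `top_k` and `min_p` are accepted top-level or must be passed via an `extra_body` field depends on your client, so treat the layout as an assumption; the prompt is a placeholder.

```python
# Per-request sampling override, mirroring Qwen's recommended values.
# top_k / min_p placement varies by client; shown top-level here as an assumption.
body = {
    "model": "Qwen/Qwen3-14B-FP8",
    "messages": [{"role": "user", "content": "Summarize the following text: ..."}],
    "temperature": 0.7,
    "top_p": 0.8,
    "top_k": 20,
    "min_p": 0,
    "presence_penalty": 1.5,  # the default here; raise it if repetitions persist
    "max_tokens": 1024,       # must stay within the 16384 max-new-tokens limit
}
```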

Qwen/Qwen3.5-397B-A17B

Playground

Meet Qwen3.5-397B-A17B. Released in February 2026, it is Qwen's most powerful model, delivering performance similar to GPT-5.2 and Claude Opus 4.5 on challenging tasks, including advanced reasoning, mathematics, and complex code generation. It offers frontier-level capabilities at a fraction of the cost. Best for:

  • Tasks requiring maximum intelligence
  • Complex analysis
  • Sophisticated coding projects
  • Scenarios where quality justifies the additional cost over smaller models

Max New Tokens: 16384

Max Total Tokens: 262144

Sampling Parameters:

We have set the default sampling parameters using the recommended values set out by the Qwen team:


We suggest using Temperature=0.7, TopP=0.8, TopK=20, and MinP=0.

For supported frameworks, you can adjust the presence_penalty parameter between 0 and 2 to reduce endless repetitions. However, using a higher value may occasionally result in language mixing and a slight decrease in model performance.


We use a default presence_penalty of 1.5 to bias the model against endless repetitions; if you still notice this behaviour, try increasing the presence_penalty.

You can adjust these on a per-request basis by setting the sampling parameters in the request body.

Qwen/Qwen3-Embedding-8B

Playground

The Qwen3 Embedding series is the latest embedding model family from Qwen, specifically designed for text embedding and ranking tasks. Building upon the dense foundational models of the Qwen3 series, it provides a comprehensive range of text embedding and reranking models in various sizes (0.6B, 4B, and 8B). This series inherits the exceptional multilingual capabilities, long-text understanding, and reasoning skills of its foundational model. The Qwen3 Embedding series represents significant advancements in multiple text embedding and ranking tasks, including text retrieval, code retrieval, text classification, text clustering, and bitext mining.

Exceptional Versatility: The embedding model has achieved state-of-the-art performance across a wide range of downstream application evaluations. The 8B size embedding model ranks No.1 in the MTEB multilingual leaderboard (as of June 5, 2025, score 70.58), while the reranking model excels in various text retrieval scenarios.

Comprehensive Flexibility: The Qwen3 Embedding series offers a full spectrum of sizes (from 0.6B to 8B) for both embedding and reranking models, catering to diverse use cases that prioritize efficiency and effectiveness. Developers can seamlessly combine these two modules. Additionally, the embedding model allows for flexible vector definitions across all dimensions, and both embedding and reranking models support user-defined instructions to enhance performance for specific tasks, languages, or scenarios.

Multilingual Capability: The Qwen3 Embedding series offers support for over 100 languages, thanks to the multilingual capabilities of Qwen3 models. This includes various programming languages, and provides robust multilingual, cross-lingual, and code retrieval capabilities.

Qwen3-Embedding-8B has the following features:

  • Model Type: Text Embedding
  • Supported Languages: 100+ Languages
  • Number of Parameters: 8B
  • Context Length: 32k
  • Embedding Dimension: Up to 4096, supports user-defined output dimensions ranging from 32 to 4096

For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our blog, GitHub.
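Since the model supports user-defined output dimensions, an embeddings request might look like the sketch below. The `dimensions` field name follows the OpenAI embeddings API and is an assumption here; Doubleword's accepted parameter name may differ, and the inputs are placeholders.

```python
# Sketch of an embeddings request with a user-defined output dimension.
# "dimensions" follows the OpenAI embeddings API naming; this is an assumption.
body = {
    "model": "Qwen/Qwen3-Embedding-8B",
    "input": ["first document", "second document"],
    "dimensions": 1024,  # any value from 32 to 4096 per the model card
}
```

Smaller dimensions trade a little retrieval quality for cheaper storage and faster similarity search.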

openai/gpt-oss-20b

Playground

Meet gpt-oss-20b, a compact model (21B total parameters, 3.6B active) built for lower latency and for local or specialized use cases.

Qwen/Qwen3-VL-30B-A3B-Instruct-FP8

Playground

Meet Qwen3-VL-30B, the smaller model of the Qwen3-VL family, delivering performance similar to GPT-4.1-mini and Claude Sonnet 4. This highly capable mid-size model is suited to cost-sensitive tasks and those requiring high token volumes, and it excels at reasoning, coding, and structured output generation.

Best for:

  • Production workloads requiring strong performance without frontier model costs
  • Complex reasoning tasks
  • Code generation

Qwen/Qwen3-VL-235B-A22B-Instruct-FP8

Playground

Meet Qwen3-VL-235B, delivering performance similar to GPT-5 Chat and Claude 4 Opus Thinking on challenging tasks, including advanced reasoning, mathematics, and complex code generation. It offers frontier-level capabilities at a fraction of the cost. Best for:

  • Tasks requiring maximum intelligence
  • Complex analysis
  • Sophisticated coding projects
  • Scenarios where quality justifies the additional cost over smaller models

Max New Tokens: 16384

Max Total Tokens: 262144

Sampling Parameters:

We have set the default sampling parameters using the recommended values set out by the Qwen team:


We suggest using Temperature=0.7, TopP=0.8, TopK=20, and MinP=0.

For supported frameworks, you can adjust the presence_penalty parameter between 0 and 2 to reduce endless repetitions. However, using a higher value may occasionally result in language mixing and a slight decrease in model performance.


We use a default presence_penalty of 1.5 to bias the model against endless repetitions; if you still notice this behaviour, try increasing the presence_penalty.

You can adjust these on a per-request basis by setting the sampling parameters in the request body.

Footnotes

  1. Realtime availability is limited. Doubleword is primarily a batch API.