Models & Pricing
Doubleword Batch API is priced per model based on token usage. Costs are calculated separately for input tokens (the content you send) and output tokens (the content generated by the model).
The table below outlines the models we have available and their pricing. If you are interested in pricing for a model not listed below, or if you'd like to request a new model, please reach out to support@doubleword.ai.
If you'd like to estimate the cost of your job, please upload your file in the Doubleword Console to view a cost estimate prior to submitting a batch.
Priority indicates the maximum processing time for batch requests. Actual processing times are typically faster than stated.
Model Details
google/gemma-4-31B-it
Gemma 4 31B is Google DeepMind’s most capable open model, built for advanced reasoning, coding, and multimodal understanding. It sits in the same general tier as Claude 4.5 Haiku and NVIDIA Nemotron 3 Super, with native function calling and structured JSON output for agentic workflows; strong image and video understanding for tasks like OCR and chart analysis; 256K context for long documents and repositories; and support for 140+ languages.
Thinking Mode
To enable reasoning, include the following in your request body: "chat_template_kwargs": {"enable_thinking": true}
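For example, a complete request body with thinking enabled might look like this (a minimal sketch; the prompt is illustrative):
{
  "model": "google/gemma-4-31B-it",
  "messages": [
    {"role": "user", "content": "Explain step by step why the sky is blue."}
  ],
  "chat_template_kwargs": {"enable_thinking": true}
}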
Multimodal Input
Gemma 4 supports multimodal input, so you can send images or videos together with text in a single request.
Image Example
"messages": [
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {
"url": "https://example.com/image.jpg"
}
},
{
"type": "text",
"text": "Describe this image."
}
]
}
]
Video Example
"messages": [
{
"role": "user",
"content": [
{
"type": "video_url",
"image_url": {
"url": "https://example.com/sample_video.mp4"
}
},
{
"type": "text",
"text": "Summarize what happens in this video."
}
]
}
]
nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4
NVIDIA Nemotron 3 Super 120B A12B NVFP4 is an open hybrid Mamba-Transformer LatentMoE model with 120 billion total parameters and 12 billion active parameters, built for agentic reasoning workloads such as coding, planning, tool use, and long-context tasks. It sits in the same capability tier as Qwen3.5-122B non-reasoning and ahead of GPT-OSS-120B, while also delivering higher throughput.
In line with NVIDIA's guidance, we use temperature=1.0 and top_p=0.95 across all tasks and serving backends, including reasoning, tool calling, and general chat.
To enable reasoning, pass extra_body={"chat_template_kwargs": {"enable_thinking": true}}. For a more concise reasoning mode that uses significantly fewer reasoning tokens than full thinking mode, add "low_effort": true to the same payload.
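As a sketch, here is how this might look with the OpenAI-compatible Python client (the base_url and prompt are illustrative assumptions; substitute your actual endpoint and key):
from openai import OpenAI

# Illustrative client setup; replace with your real base URL and API key.
client = OpenAI(base_url="https://api.doubleword.ai/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4",
    messages=[{"role": "user", "content": "Plan the steps to refactor a legacy module."}],
    temperature=1.0,  # NVIDIA's recommended sampling settings
    top_p=0.95,
    # Enable full reasoning; add "low_effort": True to the same dict for the concise mode.
    extra_body={"chat_template_kwargs": {"enable_thinking": True}},
)
print(response.choices[0].message.content)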
Qwen/Qwen3.5-9B
Qwen3.5-9B is a compact 9B-parameter reasoning model with a 262K-token native context length, designed for strong reasoning performance while remaining extremely cost-efficient. Despite its small size, it performs remarkably well on complex tasks and, in Qwen's benchmarks, outperformed the much larger GPT-OSS-120B model.
Thinking Mode:
This model reasons step-by-step before responding by default. To disable thinking, include the following in your request body: "chat_template_kwargs": {"enable_thinking": false}
This model does not support graduated thinking levels. Parameters such as reasoning_effort are not supported and will have no effect.
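As a sketch, a request body with thinking disabled might look like this (the prompt is an illustrative assumption):
{
  "model": "Qwen/Qwen3.5-9B",
  "messages": [
    {"role": "user", "content": "Classify the sentiment of: 'Great service!'"}
  ],
  "chat_template_kwargs": {"enable_thinking": false}
}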
Qwen/Qwen3.5-4B
Qwen3.5-4B is a compact open 4B model with a native 262K context window, designed to deliver strong reasoning, coding, and long-context performance in a very small footprint. Qwen reports that it outperforms GPT-OSS-20B across several key benchmarks, including MMLU-Pro, GPQA Diamond, AA-LCR, and LongBench v2, making it a standout small model for cost-sensitive workloads.
Thinking Mode:
This model reasons step-by-step before responding by default. To disable thinking, include the following in your request body: "chat_template_kwargs": {"enable_thinking": false}
This model does not support graduated thinking levels. Parameters such as reasoning_effort are not supported and will have no effect.
Qwen/Qwen3.5-35B-A3B-FP8
Qwen3.5-35B-A3B is a high-intelligence, mid-sized model that hits a very compelling price/performance point for async workloads. In Qwen's published benchmarks, this model outperformed GPT-5-mini, GPT-OSS-120B, and Claude Sonnet 4.5.
Thinking Mode:
This model reasons step-by-step before responding by default. To disable thinking, include the following in your request body: "chat_template_kwargs": {"enable_thinking": false}
This model does not support graduated thinking levels. Parameters such as reasoning_effort are not supported and will have no effect.
Qwen/Qwen3.5-397B-A17B
Meet Qwen3.5-397B-A17B. Released in February 2026, it is Qwen's most powerful model, delivering performance similar to GPT-5.2 and Claude Opus 4.5 on challenging tasks, including advanced reasoning, mathematics, and complex code generation. It offers frontier-level capabilities at a fraction of the cost. Best for:
- Tasks requiring maximum intelligence
- Complex analysis
- Sophisticated coding projects
- Scenarios where quality justifies the additional cost over smaller models
Max New Tokens: 16384
Max Total Tokens: 262144
Sampling Parameters:
We have set the default sampling parameters to the values recommended by the Qwen team: temperature=0.7, top_p=0.8, top_k=20, and min_p=0.
For supported frameworks, you can adjust the presence_penalty parameter between 0 and 2 to reduce endless repetitions. However, using a higher value may occasionally result in language mixing and a slight decrease in model performance.
We use a default presence_penalty of 1.5 to bias the model against endless repetitions; if you still notice this behaviour, try increasing the presence_penalty.
You can adjust these on a per-request basis by setting the sampling parameters in the request body.
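For example, a request body that overrides the defaults might look like this (a minimal sketch; whether top_k and min_p are accepted as top-level parameters depends on the serving backend):
{
  "model": "Qwen/Qwen3.5-397B-A17B",
  "messages": [
    {"role": "user", "content": "Summarize the plot of Hamlet in two sentences."}
  ],
  "temperature": 0.7,
  "top_p": 0.8,
  "top_k": 20,
  "min_p": 0,
  "presence_penalty": 1.5
}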
Thinking Mode:
This model reasons step-by-step before responding by default. To disable thinking, include the following in your request body: "chat_template_kwargs": {"enable_thinking": false}
This model does not support graduated thinking levels. Parameters such as reasoning_effort are not supported and will have no effect.
deepseek-ai/DeepSeek-OCR-2
Meet DeepSeek-OCR-2, DeepSeek's latest OCR model. It expands on DeepSeek-OCR with a novel causal vision encoder that captures reading order to enhance structured extraction of text.
Usage Tips
Use the prompt "Free OCR." for plain text extraction when you want only the text content from the image, without preserving layout or structure.
messages = [{"role": "user", "content": [{"type": "text", "text": "Free OCR."}, {"type": "image_url", "image_url": {"url": image_url}}]}]
Use the prompt "<|grounding|>Convert the document to markdown." for structured markdown extraction when you want to preserve headings, paragraphs, lists, and tables.
messages = [{"role": "user", "content": [{"type": "text", "text": "<|grounding|>Convert the document to markdown."}, {"type": "image_url", "image_url": {"url": image_url}}]}]
lightonai/LightOnOCR-2-1B-bbox-soup
LightOnOCR-2 is an efficient end-to-end 1B-parameter vision-language model for converting documents (PDFs, scans, images) into clean, naturally ordered text without relying on brittle pipelines. This second version is trained on a larger and higher-quality corpus with stronger French, arXiv, and scan coverage, improved LaTeX handling, and cleaner normalization. LightOnOCR-2 achieves state-of-the-art performance on OlmOCR-Bench while being ~9× smaller and significantly faster than competing approaches.
Merged bbox variant: This model combines OCR-improving RLVR signals with bounding-box-focused RLVR updates via joint merging, preserving OCR quality while providing image localization.
Usage Tips
Do not include a system prompt or user prompt as the model has a tendency to repeat the prompt in its answer.
payload = {
"model": "lightonai/LightOnOCR-2-1B-bbox-soup",
"messages": [{
"role": "user",
"content": [{
"type": "image_url",
"image_url": {"url": f"data:image/png;base64,{image_base64}"}
}]
}],
"max_tokens": 4096,
"temperature": 0.2,
"top_p": 0.9,
}
Rendering and Preprocessing Tips
- Render PDFs to PNG or JPEG at a target longest dimension of 1540px
- Maintain aspect ratio to preserve text geometry
- Use one image per page
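A minimal rendering sketch following these tips, using PyMuPDF (the library choice and file name are assumptions; any renderer that hits the 1540px target works):
import base64
import fitz  # PyMuPDF: pip install pymupdf

doc = fitz.open("document.pdf")
page = doc[0]  # one image per page; loop over doc for multi-page files

# Scale so the longest dimension lands at 1540px, preserving aspect ratio.
zoom = 1540 / max(page.rect.width, page.rect.height)
pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom))

# Base64-encode the PNG for use in an image_url data URI.
image_base64 = base64.b64encode(pix.tobytes("png")).decode("utf-8")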
allenai/olmOCR-2-7B-1025-FP8
This is a release of the olmOCR model, fine-tuned from Qwen2.5-VL-7B-Instruct on the olmOCR-mix-1025 dataset. It has been additionally fine-tuned with GRPO reinforcement learning to boost its performance on math equations, tables, and other tricky OCR cases.
Usage Tips
This model expects a prompt alongside the image. The default one used in the olmocr repo is this:
prompt = "Attached is one page of a document that you must process. Just return the plain text representation of this document as if you were reading it naturally. Convert equations to LateX and tables to HTML.\nIf there are any figures or charts, label them with the following markdown syntax \nReturn your output as markdown, with a front matter section on top specifying values for the primary_language, is_rotation_valid, rotation_correction, is_table, and is_diagram parameters"
messages = [
{
"role": "user",
"content": [
{"type": "text", "text": prompt},
{"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_base64}"}},
],
}
]
Image Processing
This model expects as input a single document image, rendered such that the longest dimension is 1288 pixels.
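A minimal preprocessing sketch with Pillow (an assumption; any image library that scales the longest side to 1288px works):
import base64
import io
from PIL import Image

img = Image.open("page.png")
# Scale so the longest dimension is 1288px, preserving aspect ratio.
scale = 1288 / max(img.size)
img = img.resize((round(img.width * scale), round(img.height * scale)))

buf = io.BytesIO()
img.save(buf, format="PNG")
image_base64 = base64.b64encode(buf.getvalue()).decode("utf-8")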
Qwen/Qwen3-VL-30B-A3B-Instruct-FP8
Meet Qwen3-VL-30B, the smaller model of the Qwen3-VL family, delivering performance similar to GPT-4.1-mini and Claude Sonnet 4. This highly capable mid-size model is suited to cost-constrained tasks or workloads that require high token volumes, and it excels at reasoning, coding, and structured output generation.
Best for:
- Production workloads requiring strong performance without frontier model costs
- Complex reasoning tasks
- Code generation
Qwen/Qwen3-VL-235B-A22B-Instruct-FP8
Meet Qwen3-VL-235B, delivering performance similar to GPT-5 Chat and Claude 4 Opus Thinking on challenging tasks, including advanced reasoning, mathematics, and complex code generation. It offers frontier-level capabilities at a fraction of the cost. Best for:
- Tasks requiring maximum intelligence
- Complex analysis
- Sophisticated coding projects
- Scenarios where quality justifies the additional cost over smaller models
Max New Tokens: 16384
Max Total Tokens: 262144
Sampling Parameters:
We have set the default sampling parameters to the values recommended by the Qwen team: temperature=0.7, top_p=0.8, top_k=20, and min_p=0.
For supported frameworks, you can adjust the presence_penalty parameter between 0 and 2 to reduce endless repetitions. However, using a higher value may occasionally result in language mixing and a slight decrease in model performance.
We use a default presence_penalty of 1.5 to bias the model against endless repetitions; if you still notice this behaviour, try increasing the presence_penalty.
You can adjust these on a per-request basis by setting the sampling parameters in the request body.
openai/gpt-oss-20b
Meet gpt-oss-20b, OpenAI's smaller open-weight model (21B total parameters, 3.6B active), suited to lower-latency, local, or specialized use cases.
Qwen/Qwen3-Embedding-8B
The Qwen3 Embedding series is the latest embedding model family from Qwen, specifically designed for text embedding and ranking tasks. Building upon the dense foundational models of the Qwen3 series, it provides a comprehensive range of text embedding and reranking models in various sizes (0.6B, 4B, and 8B). The series inherits the exceptional multilingual capabilities, long-text understanding, and reasoning skills of its foundational models, and represents significant advancements in multiple text embedding and ranking tasks, including text retrieval, code retrieval, text classification, text clustering, and bitext mining.
Exceptional Versatility: The embedding model has achieved state-of-the-art performance across a wide range of downstream application evaluations. The 8B size embedding model ranks No.1 in the MTEB multilingual leaderboard (as of June 5, 2025, score 70.58), while the reranking model excels in various text retrieval scenarios.
Comprehensive Flexibility: The Qwen3 Embedding series offers a full spectrum of sizes (from 0.6B to 8B) for both embedding and reranking models, catering to diverse use cases that prioritize efficiency and effectiveness. Developers can seamlessly combine these two modules. Additionally, the embedding model allows for flexible vector definitions across all dimensions, and both embedding and reranking models support user-defined instructions to enhance performance for specific tasks, languages, or scenarios.
Multilingual Capability: The Qwen3 Embedding series offers support for over 100 languages, thanks to the multilingual capabilities of Qwen3 models. This includes various programming languages, and provides robust multilingual, cross-lingual, and code retrieval capabilities.
Qwen3-Embedding-8B has the following features:
- Model Type: Text Embedding
- Supported Languages: 100+ Languages
- Number of Parameters: 8B
- Context Length: 32k
- Embedding Dimension: Up to 4096, supports user-defined output dimensions ranging from 32 to 4096
For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to the Qwen blog and GitHub repository.
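As a sketch, here is how you might request embeddings with a reduced output dimension via an OpenAI-compatible Python client (the base_url is an illustrative assumption; the dimensions parameter follows the flexible 32-4096 range noted above):
from openai import OpenAI

client = OpenAI(base_url="https://api.doubleword.ai/v1", api_key="YOUR_API_KEY")

response = client.embeddings.create(
    model="Qwen/Qwen3-Embedding-8B",
    input=["What is the capital of France?", "Paris is the capital of France."],
    dimensions=1024,  # any value from 32 to 4096
)
vectors = [item.embedding for item in response.data]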
Qwen/Qwen3-14B-FP8
Meet Qwen3-14B, a small text-only model from the Qwen3 release.
Best for:
- High volume tasks
- Tasks that do not require maximum performance, such as classification, extraction, or summarization
Max New Tokens: 16384
Max Total Tokens: 262144
Sampling Parameters:
We have set the default sampling parameters to the values recommended by the Qwen team: temperature=0.7, top_p=0.8, top_k=20, and min_p=0.
For supported frameworks, you can adjust the presence_penalty parameter between 0 and 2 to reduce endless repetitions. However, using a higher value may occasionally result in language mixing and a slight decrease in model performance.
We use a default presence_penalty of 1.5 to bias the model against endless repetitions; if you still notice this behaviour, try increasing the presence_penalty.
You can adjust these on a per-request basis by setting the sampling parameters in the request body.