Model Name
allenai/olmOCR-2-7B-1025-FP8
olmOCR 2 7B 1025
- Type: OCR
- Capabilities: vision
Overview
This is a release of the olmOCR model, fine-tuned from Qwen2.5-VL-7B-Instruct on the olmOCR-mix-1025 dataset. It was further fine-tuned with GRPO reinforcement learning to improve its performance on math equations, tables, and other difficult OCR cases.
Usage Tips
The model expects a text prompt alongside the image. The default prompt used in the olmocr repo is:
prompt = "Attached is one page of a document that you must process. Just return the plain text representation of this document as if you were reading it naturally. Convert equations to LateX and tables to HTML.\nIf there are any figures or charts, label them with the following markdown syntax \nReturn your output as markdown, with a front matter section on top specifying values for the primary_language, is_rotation_valid, rotation_correction, is_table, and is_diagram parameters"
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_base64}"}},
        ],
    }
]
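The snippet above leaves `image_base64` undefined. The following is a minimal sketch of one way to produce it, assuming the page has already been rendered to PNG bytes. The helper name `build_messages` is hypothetical and is not part of the olmocr repo.

```python
import base64


def build_messages(png_bytes: bytes, prompt: str) -> list[dict]:
    """Wrap a prompt and PNG bytes in the chat message layout shown above.

    Hypothetical helper for illustration only.
    """
    # Base64-encode the raw PNG bytes so they can be embedded in a data URL.
    image_base64 = base64.b64encode(png_bytes).decode("ascii")
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_base64}"},
                },
            ],
        }
    ]
```

Calling `build_messages(open("page.png", "rb").read(), prompt)` would then yield a `messages` list in the same shape as the one above, where `page.png` stands in for whatever rendered page image you have.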
Image Processing
The model expects a single document page image as input, rendered so that its longest dimension is 1288 pixels.
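As a rough illustration of that sizing rule, the sketch below computes the pixel dimensions a page should be rendered or resized to, preserving the aspect ratio. The function name `target_size` is hypothetical, and the choice to scale smaller images up as well as larger ones down is an assumption; the source only states the 1288-pixel target.

```python
def target_size(width: int, height: int, longest: int = 1288) -> tuple[int, int]:
    """Scale (width, height) so the longer side equals `longest`.

    Hypothetical helper; the aspect ratio is kept and results are rounded
    to whole pixels.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    max_side = max(width, height)
    # Multiply before dividing so the longer side lands exactly on `longest`.
    new_width = max(1, round(width * longest / max_side))
    new_height = max(1, round(height * longest / max_side))
    return new_width, new_height


# A US Letter page rendered at 300 DPI is 2550 x 3300 pixels.
print(target_size(2550, 3300))  # (995, 1288)
```

The resulting pair can be passed to the resize call of whichever image library you use before encoding the page as PNG.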
Pricing
| Priority | Input Tokens (per 1M) | Output Tokens (per 1M) |
|---|---|---|
| High (1h) | $0.15 | $0.15 |
| Standard (24h) | $0.10 | $0.10 |
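To make the table concrete, here is a small sketch that estimates the cost of a job from token counts. The prices are copied from the table above; the `PRICES` dictionary, its keys, and the `estimate_cost` function are hypothetical names used only for illustration.

```python
# USD per 1M tokens, copied from the pricing table above.
PRICES = {
    "high": {"input": 0.15, "output": 0.15},      # High (1h)
    "standard": {"input": 0.10, "output": 0.10},  # Standard (24h)
}


def estimate_cost(input_tokens: int, output_tokens: int, priority: str = "standard") -> float:
    """Return the estimated cost in USD for the given token counts."""
    rates = PRICES[priority]
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000
```

For example, 2,000,000 input tokens and 500,000 output tokens at Standard priority come to about $0.25, and the same job at High priority comes to about $0.375.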