LightOnOCR 2 1B bbox soup

Type: OCR
Capabilities: vision

Overview

LightOnOCR-2 is an efficient end-to-end 1B-parameter vision-language model for converting documents (PDFs, scans, images) into clean, naturally ordered text without relying on brittle pipelines. This second version is trained on a larger and higher-quality corpus with stronger French, arXiv, and scan coverage, improved LaTeX handling, and cleaner normalization. LightOnOCR-2 achieves state-of-the-art performance on OlmOCR-Bench while being ~9× smaller and significantly faster than competing approaches.

Merged bbox variant: This model combines OCR-improving RLVR signals with bounding-box-focused RLVR updates via joint merging, preserving OCR quality while providing image localization.

Usage Tips

Send only the image. Omit the system prompt and any user text, because the model tends to repeat prompt text in its answer.

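import base64

# "page.png" is a placeholder path for one rendered page image.
with open("page.png", "rb") as f:
    image_base64 = base64.b64encode(f.read()).decode("ascii")

# Image-only user message: no system prompt and no text part.
payload = {
    "model": "lightonai/LightOnOCR-2-1B-bbox-soup",
    "messages": [{
        "role": "user",
        "content": [{
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{image_base64}"}
        }]
    }],
    "max_tokens": 4096,
    "temperature": 0.2,
    "top_p": 0.9,
}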
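
The payload above follows the chat-message layout, so one way to send it is as a JSON POST. The sketch below only builds the request object and sends nothing. The /chat/completions route, the base URL, and bearer-token authentication are assumptions not stated on this page, and build_request is a hypothetical helper name. Check your provider's API reference for the actual endpoint.

```python
import json
import urllib.request


def build_request(payload: dict, base_url: str, api_key: str) -> urllib.request.Request:
    """Wrap a chat payload in an HTTP POST request (nothing is sent here)."""
    # Assumption: the server exposes an OpenAI-style /chat/completions route.
    url = base_url.rstrip("/") + "/chat/completions"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the API key is passed as a bearer token.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

To actually send it, pass the returned object to urllib.request.urlopen and decode the JSON body of the response.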

Rendering and Preprocessing Tips

Render PDFs to PNG or JPEG at a target longest dimension of 1540px
Maintain aspect ratio to preserve text geometry
Use one image per page
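
The sizing rule above can be sketched as a small helper that computes the output dimensions for one page. This is a minimal illustration, not the official preprocessing code: target_size is a hypothetical name, and it assumes pages are scaled in either direction so that the longest side lands exactly on 1540px.

```python
def target_size(width: int, height: int, longest: int = 1540) -> tuple[int, int]:
    """Return (width, height) scaled so the longest side equals `longest`."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if width >= height:
        # Landscape or square: pin the width, scale the height to match.
        return longest, max(1, round(height * longest / width))
    # Portrait: pin the height, scale the width to match.
    return max(1, round(width * longest / height)), longest
```

The resulting tuple can be handed to whatever renderer or image library produces the page image, for example as the size argument of Pillow's Image.resize.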

Pricing

Priority	Input Tokens (per 1M)	Output Tokens (per 1M)
Async	$0.08	$0.08
Batch (24h)	$0.05	$0.05
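
As a rough illustration of how the rates above translate into a bill, the sketch below multiplies token counts by the per-million prices from the table. estimate_cost and the "async"/"batch" keys are hypothetical names chosen for this example, and any token counts you plug in are your own estimates.

```python
# USD per 1M tokens, copied from the pricing table above.
PRICES = {
    "async": {"input": 0.08, "output": 0.08},
    "batch": {"input": 0.05, "output": 0.05},
}


def estimate_cost(input_tokens: int, output_tokens: int, priority: str = "async") -> float:
    """Estimate the cost in USD for the given token counts and priority tier."""
    rate = PRICES[priority]
    return (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1_000_000
```

For instance, one million input tokens plus one million output tokens at the Async rate comes to about $0.16.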
