Doubleword

Model Name

lightonai/LightOnOCR-2-1B-bbox-soup

LightOnOCR 2 1B bbox soup

  • Type: OCR
  • Capabilities: vision

Overview

LightOnOCR-2 is an efficient end-to-end 1B-parameter vision-language model for converting documents (PDFs, scans, images) into clean, naturally ordered text without relying on brittle pipelines. This second version is trained on a larger and higher-quality corpus with stronger French, arXiv, and scan coverage, improved LaTeX handling, and cleaner normalization. LightOnOCR-2 achieves state-of-the-art performance on OlmOCR-Bench while being ~9× smaller and significantly faster than competing approaches.

Merged bbox variant: this model merges RLVR (reinforcement learning with verifiable rewards) updates that improve OCR with RLVR updates that target bounding boxes. The joint merge preserves OCR quality while adding bounding-box localization of images.
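The merge step can be illustrated with linear task-vector merging ("task arithmetic"), a common way to combine fine-tuned checkpoints that share a base model. This is a generic sketch on toy parameter dictionaries, assuming both RLVR runs start from the same base. It is not LightOn's published procedure, and the helper name and values are hypothetical.

```python
from typing import Dict, List

# Toy "checkpoint": parameter name -> flat list of weights (stand-in for tensors)
Params = Dict[str, List[float]]


def merge_task_vectors(base: Params, finetuned: List[Params], weights: List[float]) -> Params:
    """Hypothetical helper: merged = base + sum_i weights[i] * (finetuned[i] - base)."""
    if len(finetuned) != len(weights):
        raise ValueError("need exactly one weight per fine-tuned checkpoint")
    merged: Params = {}
    for name, base_values in base.items():
        values = list(base_values)
        for params, w in zip(finetuned, weights):
            # Add this checkpoint's task vector (its difference from the shared base), scaled by w
            for i, (ft, b) in enumerate(zip(params[name], base_values)):
                values[i] += w * (ft - b)
        merged[name] = values
    return merged


base = {"layer.weight": [1.0, 2.0]}
ocr_rl = {"layer.weight": [1.5, 2.0]}   # toy update that only moves the first weight
bbox_rl = {"layer.weight": [1.0, 3.0]}  # toy update that only moves the second weight
print(merge_task_vectors(base, [ocr_rl, bbox_rl], [1.0, 1.0]))  # {'layer.weight': [1.5, 3.0]}
```

With both weights at 0.5 the same function returns the plain average of the two checkpoints ({'layer.weight': [1.25, 2.5]}), i.e. a uniform "model soup". Real merges operate on full tensors and may use other weighting schemes.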


Usage Tips

Send only the image: do not include a system prompt or any text in the user message, because the model tends to repeat prompt text in its answer.

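import base64

# Hypothetical input file: one rendered page (see the rendering tips below)
with open("page.png", "rb") as f:
    image_base64 = base64.b64encode(f.read()).decode("ascii")

# Image-only user message: no system prompt and no text prompt
payload = {
    "model": "lightonai/LightOnOCR-2-1B-bbox-soup",
    "messages": [{
        "role": "user",
        "content": [{
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{image_base64}"}
        }]
    }],
    "max_tokens": 4096,
    "temperature": 0.2,
    "top_p": 0.9,
}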
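The payload appears to follow the OpenAI-style chat completions format, so it can be sent with a plain HTTP POST. Below is a minimal standard-library sketch. The endpoint URL, the `DOUBLEWORD_API_KEY` environment variable, and the helper names are assumptions for illustration; check the provider's API documentation for the actual base URL and authentication. If you send JPEG instead of PNG, adjust the MIME type in the data URL.

```python
import base64
import json
import os
import urllib.request

# Hypothetical endpoint: replace with the provider's documented chat completions URL
API_URL = "https://api.example.com/v1/chat/completions"


def build_ocr_request(image_bytes: bytes, api_key: str) -> urllib.request.Request:
    """Build an image-only chat completions request (no system prompt, no text prompt)."""
    image_base64 = base64.b64encode(image_bytes).decode("ascii")
    payload = {
        "model": "lightonai/LightOnOCR-2-1B-bbox-soup",
        "messages": [{
            "role": "user",
            "content": [{
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{image_base64}"},
            }],
        }],
        "max_tokens": 4096,
        "temperature": 0.2,
        "top_p": 0.9,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


def extract_text(response_json: dict) -> str:
    """Return the generated text from an OpenAI-style chat completions response."""
    return response_json["choices"][0]["message"]["content"]


def ocr_page(path: str) -> str:
    """Send one rendered page and return the model's text (needs network access and a real key)."""
    with open(path, "rb") as f:
        req = build_ocr_request(f.read(), os.environ["DOUBLEWORD_API_KEY"])
    with urllib.request.urlopen(req) as resp:
        return extract_text(json.load(resp))
```

Building the request separately from sending it keeps the payload easy to inspect or log before any network call is made.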

Rendering and Preprocessing Tips

  • Render PDFs to PNG or JPEG at a target longest dimension of 1540px
  • Maintain aspect ratio to preserve text geometry
  • Use one image per page
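The 1540px target can be turned into a render resolution for each page. The sketch below computes the output size that keeps the aspect ratio, plus the matching DPI for a PDF page whose size is given in points (1 pt = 1/72 inch in PDF user space). The helper name is hypothetical; pass the resulting DPI or pixel size to whichever PDF rasterizer you use.

```python
from typing import Tuple

TARGET_LONGEST_PX = 1540  # target longest dimension from the tips above
POINTS_PER_INCH = 72      # PDF user-space unit is 1/72 inch


def render_plan(width_pt: float, height_pt: float,
                longest_px: int = TARGET_LONGEST_PX) -> Tuple[int, int, float]:
    """Return (width_px, height_px, dpi) so the longest side is longest_px and the aspect ratio is kept."""
    longest_pt = max(width_pt, height_pt)
    width_px = round(width_pt * longest_px / longest_pt)
    height_px = round(height_pt * longest_px / longest_pt)
    dpi = POINTS_PER_INCH * longest_px / longest_pt
    return width_px, height_px, dpi


print(render_plan(612, 792))  # US Letter portrait -> (1190, 1540, 140.0)
```

For example, an A4 page (595 × 842 pt) comes out at 1088 × 1540 px, roughly 132 DPI. Landscape pages work the same way because the scale is taken from the longer side.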

Pricing

Priority          Input tokens (per 1M)   Output tokens (per 1M)
High (1h)         $0.08                   $0.08
Standard (24h)    $0.05                   $0.05
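Because input and output tokens are both billed per million, the cost of a job is a weighted sum. The sketch below encodes the prices from the table; the token counts in the example are hypothetical placeholders, not measurements of this model, so substitute the usage figures reported for your own requests.

```python
from decimal import Decimal

# USD per 1M tokens, taken from the pricing table: (input, output)
PRICES = {
    "high": (Decimal("0.08"), Decimal("0.08")),
    "standard": (Decimal("0.05"), Decimal("0.05")),
}


def job_cost(input_tokens: int, output_tokens: int, priority: str = "standard") -> Decimal:
    """Estimated cost in USD for the given token counts at a priority tier."""
    in_price, out_price = PRICES[priority]
    million = Decimal(1_000_000)
    return (Decimal(input_tokens) * in_price + Decimal(output_tokens) * out_price) / million


# Hypothetical job: 1,000 pages at 1,500 input and 800 output tokens per page
print(job_cost(1_000 * 1_500, 1_000 * 800, "standard"))
```

`Decimal` is used instead of floats so that small per-token prices sum without rounding drift.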
