Model Name
LightOnOCR 2 1B bbox soup (lightonai/LightOnOCR-2-1B-bbox-soup)
- Type: OCR
- Capabilities: vision
Overview
LightOnOCR-2 is an efficient end-to-end 1B-parameter vision-language model for converting documents (PDFs, scans, images) into clean, naturally ordered text without relying on brittle pipelines. This second version is trained on a larger and higher-quality corpus with stronger French, arXiv, and scan coverage, improved LaTeX handling, and cleaner normalization. LightOnOCR-2 achieves state-of-the-art performance on OlmOCR-Bench while being ~9× smaller and significantly faster than competing approaches.
Merged bbox variant: this model merges the OCR-improving RLVR (reinforcement learning with verifiable rewards) updates with bounding-box-focused RLVR updates, preserving OCR quality while adding image localization (bounding boxes).
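The card does not spell out the merging recipe. As a rough illustration only, one common way to combine several fine-tunes of a shared base model is task arithmetic: add each fine-tune's weighted parameter delta to the base. With weights that sum to 1 this reduces to a weighted average of the checkpoints, the classic "model soup". The toy sketch below uses plain Python lists in place of tensors; the function name, checkpoints, and merge weights are hypothetical, and this is not claimed to be LightOn's procedure.

```python
def merge_task_vectors(
    base: dict[str, list[float]],
    finetuned: list[dict[str, list[float]]],
    weights: list[float],
) -> dict[str, list[float]]:
    """Task-arithmetic merge: merged = base + sum_i weights[i] * (finetuned[i] - base).

    Parameters are toy lists of floats standing in for tensors.
    """
    if len(finetuned) != len(weights):
        raise ValueError("need exactly one weight per fine-tuned checkpoint")
    merged = {}
    for name, base_vals in base.items():
        # Add every checkpoint's weighted delta from the base, element by element.
        merged[name] = [
            b + sum(w * (ft[name][k] - b) for ft, w in zip(finetuned, weights))
            for k, b in enumerate(base_vals)
        ]
    return merged


# Hypothetical toy checkpoints: one "OCR" update and one "bbox" update of a shared base.
base = {"w": [1.0, 2.0]}
ocr = {"w": [1.5, 2.0]}
bbox = {"w": [1.0, 3.0]}
print(merge_task_vectors(base, [ocr, bbox], [1.0, 1.0]))  # {'w': [1.5, 3.0]}
print(merge_task_vectors(base, [ocr, bbox], [0.5, 0.5]))  # {'w': [1.25, 2.5]}
```

With weights of 1.0 each, both deltas are kept in full; with 0.5 each, the result is the plain average of the two fine-tunes.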
Usage Tips
Send only the image: do not include a system prompt or a text prompt in the user message, as the model tends to repeat the prompt in its output.
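The tip above can be checked before a request is sent. The sketch below, using only the Python standard library, rejects payloads that contain a system prompt or any non-image content part, and wraps an OpenAI-style chat payload like the one in this section in a JSON POST request. The endpoint URL, the bearer-token header, and the `choices[0].message.content` response layout are assumptions based on that OpenAI-style format, not documented details of this service; `ENDPOINT_URL`, `API_KEY`, and all function names are hypothetical.

```python
import json
import urllib.request

# Hypothetical placeholders: substitute your provider's real endpoint and key.
ENDPOINT_URL = "https://example.invalid/v1/chat/completions"
API_KEY = "YOUR_API_KEY"


def assert_image_only(payload: dict) -> None:
    """Raise ValueError if any message is a system prompt or carries text."""
    for message in payload["messages"]:
        if message["role"] == "system":
            raise ValueError("system prompt present; the model may repeat it")
        content = message["content"]
        if isinstance(content, str):
            raise ValueError("plain-text content present; the model may repeat it")
        for part in content:
            if part.get("type") != "image_url":
                raise ValueError("non-image content part present; the model may repeat it")


def build_request(payload: dict, url: str = ENDPOINT_URL,
                  api_key: str = API_KEY) -> urllib.request.Request:
    """Validate the payload, then build (but do not send) a JSON POST request."""
    assert_image_only(payload)
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


def extract_text(response_body: bytes) -> str:
    """Return the generated text from an OpenAI-style chat-completions response."""
    data = json.loads(response_body)
    return data["choices"][0]["message"]["content"]


# Usage (makes a network call, so it is shown commented out):
# with urllib.request.urlopen(build_request(payload)) as resp:
#     print(extract_text(resp.read()))
```

Validating inside `build_request` means a payload that violates the tip never reaches the network.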
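```python
import base64

# "page.png" is a placeholder path for one rendered page image.
with open("page.png", "rb") as f:
    image_base64 = base64.b64encode(f.read()).decode("ascii")

# A single user message whose content is only the image: no system prompt, no text prompt.
payload = {
    "model": "lightonai/LightOnOCR-2-1B-bbox-soup",
    "messages": [{
        "role": "user",
        "content": [{
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{image_base64}"},
        }],
    }],
    "max_tokens": 4096,
    "temperature": 0.2,
    "top_p": 0.9,
}
```
Rendering and Preprocessing Tips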
- Render PDF pages to PNG or JPEG with a target longest side of 1540 px
- Maintain aspect ratio to preserve text geometry
- Use one image per page
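The first two tips amount to scaling each page so its longest side is 1540 px while keeping the width-to-height ratio. The sketch below computes that target pixel size, and the zoom factor (pixels per PDF point, where 1 pt = 1/72 inch) to hand to a PDF rasterizer. The function names are hypothetical, rounding to the nearest pixel is an assumption, and the actual rendering call depends on the PDF library you use.

```python
TARGET_LONGEST_PX = 1540  # target longest dimension from the tips above


def target_size(width: float, height: float,
                longest: int = TARGET_LONGEST_PX) -> tuple[int, int]:
    """Scale (width, height) so the longer side equals `longest`, preserving aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("dimensions must be positive")
    scale = longest / max(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))


def pdf_render_scale(width_pt: float, height_pt: float,
                     longest: int = TARGET_LONGEST_PX) -> float:
    """Zoom factor (pixels per PDF point) that renders the longer side at `longest` px."""
    return longest / max(width_pt, height_pt)


# US Letter is 612 x 792 pt; each page becomes one image of this size:
print(target_size(612, 792))                     # (1190, 1540)
print(round(pdf_render_scale(612, 792), 3))      # 1.944
```

Because the scale is derived from the longer side, portrait and landscape pages both end up with 1540 px on their long edge.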
Pricing
| Priority | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| High (1h) | $0.08 | $0.08 |
| Standard (24h) | $0.05 | $0.05 |
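Since prices are quoted per million tokens, a job costs input tokens × input price ÷ 1,000,000 plus output tokens × output price ÷ 1,000,000. The sketch below hard-codes the rates from the table; the tier keys and function name are illustrative, the token counts in the example are hypothetical, and it assumes billing is purely linear in token counts, with no minimums or rounding.

```python
# USD per 1M tokens, copied from the pricing table above.
PRICES_PER_MILLION = {
    "high": {"input": 0.08, "output": 0.08},      # High (1h)
    "standard": {"input": 0.05, "output": 0.05},  # Standard (24h)
}


def estimate_cost(input_tokens: int, output_tokens: int, tier: str = "standard") -> float:
    """Estimated USD cost, assuming linear per-token billing."""
    rates = PRICES_PER_MILLION[tier]
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000


# Hypothetical job: 300,000 input tokens and 200,000 output tokens at Standard priority.
print(f"${estimate_cost(300_000, 200_000, 'standard'):.4f}")  # $0.0250
```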