OCR Engine Selection
Routing logic for selecting the right OCR engine based on file type, size, and content
Selection Logic
PDF Files
- Has text layer (textual PDF): Extract text directly — no OCR needed
- Scanned PDF with tables: Use DocTR
- Scanned PDF, no tables, >5MB: Use Google Vision
- Scanned PDF, no tables, <5MB: Use PaddleOCR
Image Files (PNG, JPEG)
- >10MB: Use Google Vision (cloud handles large images)
- <10MB: Use PaddleOCR (fast default)
- Simple text fallback: Use Tesseract
CSV Files
- Use CSV Loader (deterministic parsing, confidence = 1.0)
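The rules above can be sketched as a single routing function. This is a minimal illustration, not the service's actual code: the function name `select_engine`, its parameters, and every engine identifier except `paddleocr` are hypothetical. It also assumes that "MB" means 1024 × 1024 bytes and that a file exactly at a threshold goes to PaddleOCR, since the rules do not say. The Tesseract fallback is left out because the rules do not state what triggers it.

```python
MB = 1024 * 1024  # assumption: the source does not say whether MB is 10^6 or 2^20 bytes


def select_engine(file_type: str, size_bytes: int,
                  has_text_layer: bool = False,
                  has_tables: bool = False) -> str:
    """Pick an engine name from file type, size, and content flags."""
    kind = file_type.lower().lstrip(".")

    if kind == "csv":
        return "csv_loader"  # deterministic parsing, no OCR involved

    if kind == "pdf":
        if has_text_layer:
            return "direct_text"  # textual PDF: extract text, no OCR needed
        if has_tables:
            return "doctr"
        # Assumption: a PDF of exactly 5 MB is routed to PaddleOCR.
        return "google_vision" if size_bytes > 5 * MB else "paddleocr"

    if kind in {"png", "jpg", "jpeg"}:
        # Assumption: an image of exactly 10 MB is routed to PaddleOCR.
        return "google_vision" if size_bytes > 10 * MB else "paddleocr"

    raise ValueError(f"unsupported file type: {file_type!r}")
```

A caller would first detect whether a PDF has a text layer or tables, then pass those flags in, for example `select_engine("pdf", 7 * MB, has_tables=True)`.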
Manual Override
Users can specify an engine explicitly in the request. If the specified engine is unavailable, the API returns 400 with a list of available engines.
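A rough sketch of how the override check might look. The function name `resolve_engine`, the error body keys (`error`, `available_engines`), and the idea of returning a status/body pair are assumptions for illustration; only the 400 status and the list of available engines come from the description above.

```python
from typing import Iterable, Optional, Tuple


def resolve_engine(requested: Optional[str],
                   available: Iterable[str],
                   auto_choice: str) -> Tuple[int, dict]:
    """Return (status_code, body) for an optional explicit engine request."""
    available_set = set(available)

    if requested is None:
        # No override given: keep whatever auto-selection picked.
        return 200, {"engine": auto_choice}

    name = requested.strip().lower()
    if name not in available_set:
        # Hypothetical error shape; the source only promises a 400
        # that includes the list of available engines.
        return 400, {
            "error": f"engine '{name}' is unavailable",
            "available_engines": sorted(available_set),
        }
    return 200, {"engine": name}
```

Returning the sorted list keeps the error response stable between calls, which makes it easier for clients to display or test against.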
Priority Order (Auto-detect)
DocTR → EasyOCR → Google Vision → PaddleOCR → Tesseract → Omniparser
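One possible reading of this order is that auto-detect walks the list and takes the first engine that is currently available. The source does not spell out how the order interacts with the selection rules, so the sketch below is an assumption; `first_available` and the lowercase engine identifiers are hypothetical names.

```python
from typing import Iterable, Optional, Sequence

# Order taken from the list above; identifier spellings are assumed.
PRIORITY = ("doctr", "easyocr", "google_vision",
            "paddleocr", "tesseract", "omniparser")


def first_available(available: Iterable[str],
                    priority: Sequence[str] = PRIORITY) -> Optional[str]:
    """Return the highest-priority engine that is available, or None."""
    available_set = set(available)
    for name in priority:
        if name in available_set:
            return name
    return None
```

For example, with only Tesseract and PaddleOCR reachable, `first_available({"tesseract", "paddleocr"})` yields `"paddleocr"` because it appears earlier in the list.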
Response Format
{
  "text": "Extracted text content...",
  "confidence": 0.95,
  "engine_used": "paddleocr"
}
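A client could parse this response into a typed object along the following lines. `OcrResult` and `parse_ocr_response` are hypothetical names, and the check that `confidence` lies between 0 and 1 is an assumption based on the sample value and the CSV Loader's confidence of 1.0.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrResult:
    text: str
    confidence: float
    engine_used: str


def parse_ocr_response(body: str) -> OcrResult:
    """Parse the JSON response body; raises KeyError if a field is missing."""
    data = json.loads(body)
    result = OcrResult(
        text=str(data["text"]),
        confidence=float(data["confidence"]),
        engine_used=str(data["engine_used"]),
    )
    # Assumption: confidence is a probability-like score in [0, 1].
    if not 0.0 <= result.confidence <= 1.0:
        raise ValueError(f"confidence out of range: {result.confidence}")
    return result
```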
Configuration
- Google Vision requires GOOGLE_VISION_API_KEY env var
- PaddleOCR runs on VPS at 76.13.123.120:8866
- DocTR runs on VPS via mindee/doctr Docker image
- Tesseract is optional and is not installed by default on Render
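A startup check for the two optional pieces might look like the sketch below. `config_warnings` is a hypothetical helper, and looking for a `tesseract` binary on `PATH` is an assumption about how Tesseract would be installed. The check reports problems without failing, since both engines are optional.

```python
import os
import shutil
from typing import List, Mapping, Optional


def config_warnings(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """List configuration gaps that would make an engine unavailable."""
    env = os.environ if env is None else env
    warnings = []

    if not env.get("GOOGLE_VISION_API_KEY"):
        warnings.append(
            "GOOGLE_VISION_API_KEY is not set: Google Vision is unavailable")

    # Assumption: Tesseract, when installed, is reachable as `tesseract` on PATH.
    if shutil.which("tesseract") is None:
        warnings.append(
            "tesseract binary not found on PATH: Tesseract is unavailable")

    return warnings
```

Passing the environment in as a mapping keeps the function easy to test without touching real environment variables.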