OCR Engine Selection

Routing logic for selecting the right OCR engine based on file type, size, and content

Selection Logic

PDF Files

Has text layer (textual PDF): Extract text directly — no OCR needed
Scanned PDF with tables: Use DocTR
Scanned PDF, no tables, >5MB: Use Google Vision
Scanned PDF, no tables, <5MB: Use PaddleOCR
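
The PDF rules above can be read as a short decision function. The following is a minimal sketch, not the pipeline's actual code: the `PdfInfo` type, the function name, and the engine identifiers other than `paddleocr` are hypothetical, and it assumes that 1 MB is 1024 * 1024 bytes and that a file of exactly 5 MB is routed to PaddleOCR, since the rules do not say.

```python
from dataclasses import dataclass

# Assumption: "MB" means 1024 * 1024 bytes; the rules do not specify this.
FIVE_MB = 5 * 1024 * 1024


@dataclass
class PdfInfo:
    """Hypothetical summary of a PDF produced by an earlier inspection step."""
    size_bytes: int
    has_text_layer: bool
    has_tables: bool


def select_pdf_engine(pdf: PdfInfo) -> str:
    """Return an engine name for a PDF, checking the rules in listed order."""
    if pdf.has_text_layer:
        return "direct_text"      # textual PDF: extract directly, no OCR
    if pdf.has_tables:
        return "doctr"            # scanned PDF with tables
    if pdf.size_bytes > FIVE_MB:
        return "google_vision"    # large scanned PDF without tables
    # Assumption: exactly 5 MB falls through to PaddleOCR.
    return "paddleocr"
```

The checks run from most to least specific, so a scanned PDF with tables goes to DocTR regardless of its size.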

Image Files (PNG, JPEG)

>10MB: Use Google Vision (cloud handles large images)
<10MB: Use PaddleOCR (fast default)
Simple text fallback: Use Tesseract
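
A comparable sketch for images follows. It makes two assumptions the rules above do not settle: that an image of exactly 10 MB is treated as small, and that "fallback" means Tesseract is used when PaddleOCR is not available. The function name, the `paddle_available` flag, and the engine identifiers other than `paddleocr` are hypothetical.

```python
# Assumption: "MB" means 1024 * 1024 bytes.
TEN_MB = 10 * 1024 * 1024


def select_image_engine(size_bytes: int, paddle_available: bool = True) -> str:
    """Return an engine name for a PNG or JPEG image."""
    if size_bytes > TEN_MB:
        return "google_vision"    # cloud engine handles large images
    if paddle_available:
        return "paddleocr"        # fast default for smaller images
    # Assumption: Tesseract is the fallback when the default is unavailable.
    return "tesseract"
```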

CSV Files

Use CSV Loader (deterministic parsing, confidence = 1.0)
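
Because CSV parsing is deterministic, the loader can report full confidence without any scoring step. The sketch below uses Python's standard `csv` module to illustrate the idea. The function name, the tab-joined text layout, and the `csv_loader` engine identifier are assumptions for illustration, not the actual loader.

```python
import csv
import io


def load_csv(raw: bytes, encoding: str = "utf-8") -> dict:
    """Parse CSV bytes and return a result with confidence fixed at 1.0."""
    # newline="" lets the csv module handle line endings itself.
    reader = csv.reader(io.StringIO(raw.decode(encoding), newline=""))
    # Assumption: cells are joined with tabs and rows with newlines.
    lines = ["\t".join(row) for row in reader]
    return {
        "text": "\n".join(lines),
        "confidence": 1.0,            # no recognition step, so nothing is uncertain
        "engine_used": "csv_loader",  # hypothetical identifier
    }
```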

Manual Override

The user can specify an engine explicitly in the request. If the specified engine is unavailable, the API returns HTTP 400 with a list of the available engines.
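
The override behavior might look like the following framework-neutral sketch. It returns a status code and payload rather than using any particular web framework. The function name and the payload field names (`error`, `available_engines`, `engine`) are hypothetical.

```python
from typing import Optional


def resolve_engine(requested: Optional[str], available: set, auto_choice: str):
    """Return (status_code, payload) for an optional engine override."""
    if requested is None:
        # No override given: keep the automatically selected engine.
        return 200, {"engine": auto_choice}
    if requested not in available:
        # Unavailable engine: reject with 400 and list the valid choices.
        return 400, {
            "error": f"Engine '{requested}' is unavailable",
            "available_engines": sorted(available),
        }
    return 200, {"engine": requested}
```

For example, requesting `tesseract` when only `doctr` and `paddleocr` are available yields a 400 whose payload lists those two engines.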

Priority Order (Auto-detect)

DocTR → EasyOCR → Google Vision → PaddleOCR → Tesseract → Omniparser
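
One plausible reading of this order is that auto-detect walks the list and picks the first engine that is currently available. The sketch below implements that reading only. How the priority order interacts with the file-type rules is not stated here, and all lowercase identifiers except `paddleocr` are assumed names.

```python
from typing import Optional

# Priority order from the list above; identifiers are assumed lowercase names.
PRIORITY = ["doctr", "easyocr", "google_vision", "paddleocr", "tesseract", "omniparser"]


def first_available(available: set) -> Optional[str]:
    """Return the highest-priority engine present in `available`, or None."""
    for name in PRIORITY:
        if name in available:
            return name
    return None
```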

Response Format

{
  "text": "Extracted text content...",
  "confidence": 0.95,
  "engine_used": "paddleocr"
}
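
A client could parse this response into a typed object, as in the sketch below. The class and function names are hypothetical, and the check that confidence lies between 0 and 1 is an assumption based on the example values shown on this page.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrResult:
    text: str
    confidence: float
    engine_used: str


def parse_ocr_response(body: str) -> OcrResult:
    """Parse a JSON response body into an OcrResult."""
    data = json.loads(body)
    result = OcrResult(
        text=str(data["text"]),
        confidence=float(data["confidence"]),
        engine_used=str(data["engine_used"]),
    )
    # Assumption: confidence is a fraction in the range [0, 1].
    if not 0.0 <= result.confidence <= 1.0:
        raise ValueError(f"confidence out of range: {result.confidence}")
    return result
```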

Configuration

Google Vision requires the GOOGLE_VISION_API_KEY environment variable
PaddleOCR runs on a VPS at 76.13.123.120:8866
DocTR runs on a VPS via the mindee/doctr Docker image
Tesseract is optional and is not installed by default on Render
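
A startup check along the following lines could report which engines are usable in a given deployment. This is a sketch under stated assumptions: `PADDLEOCR_ENDPOINT` is a hypothetical override variable, the function name and dictionary keys are illustrative, and Tesseract is detected by looking for a `tesseract` executable on the PATH.

```python
import os
import shutil
from typing import Mapping


def load_ocr_config(env: Mapping[str, str] = os.environ) -> dict:
    """Summarize engine availability from the environment."""
    return {
        # Google Vision is usable only when the API key is set and non-empty.
        "google_vision_enabled": bool(env.get("GOOGLE_VISION_API_KEY")),
        # PADDLEOCR_ENDPOINT is a hypothetical override; the default is the
        # address given in the configuration list.
        "paddleocr_endpoint": env.get("PADDLEOCR_ENDPOINT", "76.13.123.120:8866"),
        # Assumption: Tesseract counts as installed if its binary is on PATH.
        "tesseract_installed": shutil.which("tesseract") is not None,
    }
```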
