Extract text from images and PDF documents using OCR. Fast, accurate, with optional LLM vision fallback for low-confidence results.
API endpoints require authentication when the server is configured with allowed API keys. Send the key in the X-API-Key request header.
POST /api/v1/ocr extracts text from an uploaded image or PDF document, sent as the multipart form field file.
{
  "data": "The text extracted from the document...",
  "metadata": {
    "processing_time_ms": 342,
    "method": "ocr_tesseract"
  }
}
| Method | Description |
|---|---|
| ocr_tesseract | Text extracted via Tesseract OCR engine |
| text_extraction | Text extracted directly from a text-based PDF |
| ocr_tesseract_with_llm_fallback | Tesseract result refined by LLM vision (low confidence) |
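A client will usually want to branch on the method field, for example to flag results that went through the LLM fallback. The following is a minimal sketch, assuming the response shape shown above; the function name summarize_ocr_response and the human-readable labels are illustrative and not part of the service.

```python
import json

# Method values copied from the table above; the labels are illustrative.
KNOWN_METHODS = {
    "ocr_tesseract": "Tesseract OCR",
    "text_extraction": "direct PDF text extraction",
    "ocr_tesseract_with_llm_fallback": "Tesseract refined by LLM vision",
}


def summarize_ocr_response(raw: str) -> dict:
    """Parse an OCR response body and report how the text was produced."""
    payload = json.loads(raw)
    metadata = payload["metadata"]
    method = metadata["method"]
    return {
        "text": payload["data"],
        "method": method,
        "method_label": KNOWN_METHODS.get(method, "unknown method"),
        # This method value means the Tesseract result had low confidence.
        "used_llm_fallback": method == "ocr_tesseract_with_llm_fallback",
        "seconds": metadata["processing_time_ms"] / 1000,
    }


sample = (
    '{"data": "Extracted text...", '
    '"metadata": {"processing_time_ms": 342, "method": "ocr_tesseract"}}'
)
summary = summarize_ocr_response(sample)
print(summary["method_label"], summary["used_llm_fallback"], summary["seconds"])
```

Unknown method values fall back to a generic label rather than raising, so the sketch keeps working if the service adds methods later.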
POST /api/v1/analyze analyzes an uploaded image or PDF document and returns its useful text, the retained blocks, the blocks removed as noise, and metadata for images embedded in PDFs.
{
  "useful_text": "Document title\nUseful paragraph...",
  "blocks": [
    {
      "id": "block-1",
      "kind": "title",
      "text": "Document title",
      "page": 1,
      "confidence": 1.0,
      "bbox": null,
      "heading_level": 1
    }
  ],
  "images": [
    {
      "id": "image-1-1",
      "page": 1,
      "width": 640,
      "height": 480,
      "bbox": null,
      "mime_type": "image/jpeg",
      "caption": null,
      "caption_confidence": 0.0,
      "nearby_text": "",
      "alt_text": null
    }
  ],
  "removed_blocks": [
    {
      "kind": "page_number",
      "text": "Page 1 of 12",
      "page": 1,
      "reason": "pagination_pattern"
    }
  ],
  "metadata": {
    "processing_time_ms": 512,
    "method": "text_extraction",
    "pages": 1
  }
}
heading_level is set when the heading level is available from the PDF outline or can be inferred. The service does not generate alt text yet, so alt_text is returned as null, as in the example above. bbox, caption, and nearby_text remain null or empty when reliable source data is not available.
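The analysis payload lends itself to small post-processing helpers. The sketch below builds an indented outline from blocks that carry a heading_level, counts removed blocks per reason, and lists images without a caption. It assumes heading_level is null or absent on blocks without a known level, which the text above does not state explicitly. The helper names and the "paragraph" kind in the sample are hypothetical.

```python
def build_outline(analysis: dict) -> list[str]:
    """Return one line per block with a heading level, indented by depth."""
    lines = []
    for block in analysis.get("blocks", []):
        level = block.get("heading_level")
        if level is not None:  # assumption: null/absent means "no heading"
            lines.append("  " * (level - 1) + block["text"])
    return lines


def removal_reasons(analysis: dict) -> dict[str, int]:
    """Count removed blocks per reason, e.g. to audit what was treated as noise."""
    counts: dict[str, int] = {}
    for block in analysis.get("removed_blocks", []):
        counts[block["reason"]] = counts.get(block["reason"], 0) + 1
    return counts


def uncaptioned_images(analysis: dict) -> list[str]:
    """Return ids of images whose caption is null or empty."""
    return [img["id"] for img in analysis.get("images", []) if not img.get("caption")]


# A shortened payload shaped like the example response above.
# The "paragraph" kind is a hypothetical value used for illustration.
sample = {
    "blocks": [
        {"id": "block-1", "kind": "title", "text": "Document title", "heading_level": 1},
        {"id": "block-2", "kind": "title", "text": "Section", "heading_level": 2},
        {"id": "block-3", "kind": "paragraph", "text": "Body text", "heading_level": None},
    ],
    "images": [{"id": "image-1-1", "caption": None, "alt_text": None}],
    "removed_blocks": [
        {"kind": "page_number", "text": "Page 1 of 12", "page": 1,
         "reason": "pagination_pattern"}
    ],
}

print("\n".join(build_outline(sample)))
print(removal_reasons(sample))
print(uncaptioned_images(sample))
```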
GET /health is the health check endpoint. It returns the service status.
{
  "status": "ok"
}
All responses are JSON. Successful responses return the result directly. Error responses use a consistent envelope.
{
  "error": "Description of what went wrong"
}
| Status | Reason |
|---|---|
| 400 | Unsupported file format, file too large, or invalid PDF |
| 401 | Missing or invalid API key |
| 422 | Text extraction failed on a valid file |
| 500 | Internal server error (OCR engine failure) |
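On the client side, the status codes in the table above can be mapped to distinct exception types so callers can tell a bad upload from a missing key. This is a minimal sketch: the exception class names are hypothetical, and treating any 2xx status as success is an assumption, since the document does not list success codes.

```python
import json


class OcrApiError(Exception):
    """Base error carrying the HTTP status and the message from the envelope."""

    def __init__(self, status: int, message: str):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message


class InvalidUploadError(OcrApiError):
    """400: unsupported format, file too large, or invalid PDF."""


class AuthenticationError(OcrApiError):
    """401: missing or invalid API key."""


class ExtractionFailedError(OcrApiError):
    """422: text extraction failed on a valid file."""


class ServerError(OcrApiError):
    """500: internal server error (OCR engine failure)."""


ERROR_TYPES = {
    400: InvalidUploadError,
    401: AuthenticationError,
    422: ExtractionFailedError,
    500: ServerError,
}


def parse_response(status: int, body: str) -> dict:
    """Return the decoded result on success, or raise a typed error."""
    payload = json.loads(body)
    if 200 <= status < 300:  # assumption: success is any 2xx status
        return payload
    message = payload.get("error", "unknown error")
    # Statuses not listed in the table fall back to the base class.
    raise ERROR_TYPES.get(status, OcrApiError)(status, message)


try:
    parse_response(401, '{"error": "Missing or invalid API key"}')
except AuthenticationError as exc:
    print("authentication problem:", exc.message)
```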
$ curl -X POST https://your-domain.com/api/v1/ocr \
    -H "X-API-Key: your-api-key" \
    -F "file=@scan.jpg"
$ curl -X POST https://your-domain.com/api/v1/ocr \
    -H "X-API-Key: your-api-key" \
    -F "file=@document.pdf"
$ curl -X POST https://your-domain.com/api/v1/analyze \
    -H "X-API-Key: your-api-key" \
    -F "file=@document.pdf"
$ curl https://your-domain.com/health
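The curl uploads above can be reproduced from Python with only the standard library. The sketch below builds a multipart/form-data request with a single file field and the X-API-Key header, without sending it. The function name build_upload_request is hypothetical, and the three bytes used as file content are a stand-in for a real image.

```python
import mimetypes
import urllib.request
import uuid


def build_upload_request(base_url, path, filename, content, api_key=None):
    """Build a POST request that uploads `content` as the form field "file"."""
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    if api_key:
        # Only needed when the server is configured with allowed API keys.
        headers["X-API-Key"] = api_key
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=head + content + tail,
        headers=headers,
        method="POST",
    )


request = build_upload_request(
    "https://your-domain.com",
    "/api/v1/ocr",
    "scan.jpg",
    b"\xff\xd8\xff",  # placeholder bytes, not a complete JPEG
    api_key="your-api-key",
)
print(request.get_method(), request.full_url)
```

To send the request, pass it to urllib.request.urlopen and decode the response body as JSON. A non-2xx status raises urllib.error.HTTPError, whose body should hold the error envelope described earlier.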