Tesseract OCR
An open-source text recognition (OCR) engine that extracts printed text from images.
Version
Latest known: 5.5.0.20241111 (2024-11-11)
Examples
1. Print the text recognized in an image to stdout (the "-" output base), with English as the recognition language.
tesseract.exe d:\images\image-with-text.png - -l eng
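2. Recognize the text in an image file and save it as a searchable PDF (output_file.pdf). Omit the trailing pdf to write plain text to output_file.txt instead.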
tesseract.exe input_file.tiff output_file pdf
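The same two invocations can be scripted. Below is a minimal Python sketch, assuming the tesseract executable is on PATH (on Windows, tesseract.exe is found through the same lookup). The helper names build_tesseract_args and ocr_to_text are hypothetical and not part of Tesseract; the argument order (image, output base, options, then config names such as pdf) follows the commands above.

```python
import shutil
import subprocess


def build_tesseract_args(image_path, output_base="-", lang="eng", configs=()):
    """Build the argv for a tesseract call (hypothetical helper).

    output_base "-" streams the result to stdout, as in example 1; any other
    value is a file name stem to which tesseract appends the extension.
    configs holds output config names such as "pdf" or "tsv".
    """
    return ["tesseract", image_path, output_base, "-l", lang, *configs]


def ocr_to_text(image_path, lang="eng"):
    """Run tesseract and return the recognized text read from stdout."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_args(image_path, "-", lang),
        capture_output=True,
        encoding="utf-8",  # tesseract writes UTF-8 text
        check=True,        # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

For example, build_tesseract_args("input_file.tiff", "output_file", configs=["pdf"]) reproduces example 2 with an explicit -l eng.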
Try it
- OCR an image to plain text
Run 'tesseract <input> - -l eng' to read printed text from sample_page.png and stream the recognized text to stdout. The -l eng option selects English as the recognition language.
- OCR with bounding boxes (TSV)
Use 'tesseract <input> - tsv' to emit recognized words with per-word bounding boxes, page/block/paragraph IDs, and confidence scores in tab-separated form. Useful for layout-aware extraction.
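To use the TSV output for layout-aware extraction, the rows have to be filtered down to words. Below is a minimal Python sketch, assuming the standard TSV header (level, page_num, block_num, par_num, line_num, word_num, left, top, width, height, conf, text). Word rows have level 5; page, block, paragraph, and line rows carry conf -1 and no text. The function name parse_tesseract_tsv and the min_conf threshold are hypothetical.

```python
import csv
import io


def parse_tesseract_tsv(tsv_text, min_conf=0.0):
    """Return word entries (text, position IDs, box, confidence) from TSV output."""
    # QUOTE_NONE: recognized text may contain quote characters.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    words = []
    for row in reader:
        if row["level"] != "5":  # keep word-level rows only
            continue
        text = (row["text"] or "").strip()  # text may be missing or blank
        conf = float(row["conf"])  # float() accepts both "96" and "96.5"
        if not text or conf < min_conf:
            continue
        words.append({
            "text": text,
            "block": int(row["block_num"]),
            "par": int(row["par_num"]),
            "line": int(row["line_num"]),
            # bounding box as (left, top, width, height) in pixels
            "box": (int(row["left"]), int(row["top"]),
                    int(row["width"]), int(row["height"])),
            "conf": conf,
        })
    return words
```

The stdout of 'tesseract <input> - tsv' (for example, captured with subprocess.run) can be passed straight to this function; raising min_conf drops low-confidence words.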