Tesseract OCR

Open-source optical character recognition (OCR) engine for extracting printed text from images.

Category
Image
License
Apache 2.0
Platforms
Windows, macOS, Linux
Released
2005
Path
c:\tesseract\tesseract.exe
Benefits
Recognizes more than 100 languages, each provided as a downloadable trained data file.
Notes
Latest downloadable Windows build is 5.5.0 (UB Mannheim, 2024-11-11). Newer source tags (5.5.1, 5.5.2) have no published Windows binary yet.
Version
Latest known: 5.5.0.20241111 (2024-11-11)

Examples

1. Prints the recognized text to the console (standard output), using the English language model.

tesseract.exe d:\images\image-with-text.png - -l eng

2. Converts an image file into a searchable PDF. Tesseract appends the extension itself, so the result is output_file.pdf.

tesseract.exe input_file.tiff output_file pdf
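The two examples above can also be driven from a script. The following is a minimal Python sketch, assuming Python 3.9+ and that Tesseract is installed at the path listed in this entry (c:\tesseract\tesseract.exe). The helper names text_to_stdout_cmd, searchable_pdf_cmd and run_ocr_to_text are hypothetical and are not part of Tesseract; only the command-line arguments they build come from the examples.

```python
import subprocess

# Assumption: the install path listed in this catalog entry. Change it if
# tesseract.exe lives elsewhere or is already on PATH ("tesseract").
TESSERACT = r"c:\tesseract\tesseract.exe"


def text_to_stdout_cmd(image: str, langs: list[str], exe: str = TESSERACT) -> list[str]:
    """Build the argument list for example 1 (hypothetical helper).

    An outputbase of "-" makes Tesseract print the text to stdout.
    Several languages are combined with "+", e.g. ["eng", "deu"] -> "eng+deu".
    """
    return [exe, image, "-", "-l", "+".join(langs)]


def searchable_pdf_cmd(image: str, outputbase: str, exe: str = TESSERACT) -> tuple[list[str], str]:
    """Build the argument list for example 2 and the PDF path it should create.

    The "pdf" config selects PDF output; Tesseract adds the ".pdf"
    extension to outputbase itself (hypothetical helper).
    """
    return [exe, image, outputbase, "pdf"], outputbase + ".pdf"


def run_ocr_to_text(image: str, langs: list[str], exe: str = TESSERACT) -> str:
    """Run example 1 and return the recognized text.

    Requires Tesseract and the requested trained data files to be installed.
    Tesseract writes UTF-8, so decode explicitly instead of relying on the
    Windows console code page.
    """
    result = subprocess.run(
        text_to_stdout_cmd(image, langs, exe),
        capture_output=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout
```

For example, run_ocr_to_text(r"d:\images\image-with-text.png", ["eng"]) should return the same text that example 1 prints. Passing an argument list rather than a single command string avoids quoting problems with paths that contain spaces.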
