The Tesseract OCR engine was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but it is probably one of the most accurate open source OCR engines available. The source code will read a binary, grey or color image and output text, ALTO
... [More], hOCR or PDF. Tesseract can read most common image formats.
Since 2020 the Internet Archive uses Tesseract to get text for its scanned documents. [Less]
OCRopus is a state-of-the-art document analysis and OCR system, featuring pluggable layout analysis, pluggable character recognition, statistical natural language modeling, and multi-lingual capabilities.
The OCRopus engine is based on two research projects: a high-performance handwriting
... [More] recognizer developed in the mid-90s and deployed by the US Census bureau, and novel high-performance layout analysis methods.
OCRopus development is sponsored by Google and is initially intended for high-throughput, high-volume document conversion efforts. We expect that it will also be an excellent OCR system for many other applications. [Less]
Cuneiform is an OCR system originally developed and open sourced by Cognitive technologies. This project aims to create a fully portable version of Cuneiform.
Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
This site uses cookies to give you the best possible experience.
By using the site, you consent to our use of cookies.
For more information, please see our
Privacy Policy