The Tesseract OCR engine was one of the top 3 engines in the 1995 UNLV Accuracy test. Between 1995 and 2006 it had little work done on it, but it is probably one of the most accurate open source OCR engines available.
Tesseract will read a binary, grey or color image and output text, ALTO, PAGE XML, hOCR or PDF. It can read most common image formats.
Since 2020 the Internet Archive has used Tesseract to extract text from its scanned documents.
Wayback Everywhere is a browser extension/addon that automatically redirects all pages to the Internet Archive's Wayback Machine, except for sites (domains) that are in the 'Excludes' list.
Automatic detection of Wayback Machine error messages - The addon tries to detect messages displayed by the Wayback Machine and, based on the error message, either saves an available page to the Wayback Machine or adds the site to the Excludes list.
Auto-enable Reader mode for Archived pages
Load all links in an archived page as new tabs - From the Settings page, enable the option to open all links of an archived page that match a "selector" the user enters in the popup menu. This is intended for opening all "chapters" in new tabs when reading a book published as HTML pages. Example: Wikisource or Wikibooks
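The redirect rule described for Wayback Everywhere (send every page to the Wayback Machine unless its domain is in the Excludes list) could be sketched roughly as below. This is only an illustration, not the addon's actual code: the function name `redirect_target`, the `excludes` set, and the exact URL prefix are assumptions made for the example.

```python
from urllib.parse import urlsplit

# Assumed URL pattern for illustration; the addon's real redirect target may differ.
WAYBACK_PREFIX = "https://web.archive.org/web/"


def redirect_target(url: str, excludes: set[str]) -> str:
    """Return a Wayback Machine URL for `url`, or `url` unchanged if excluded.

    `excludes` is a hypothetical stand-in for the addon's 'Excludes' list,
    holding lowercase domain names.
    """
    host = (urlsplit(url).hostname or "").lower()
    # Leave excluded domains alone, and avoid re-wrapping pages that are
    # already served by the Wayback Machine.
    if host == "web.archive.org" or host in excludes:
        return url
    return WAYBACK_PREFIX + url
```

For instance, `redirect_target("https://example.com/page", {"example.org"})` yields a URL under `web.archive.org`, while any page on `example.org` is returned unchanged.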