An OCR (optical character recognition) engine for creating editable and searchable electronic files from scanned paper documents, PDFs, and digital photographs.
Features:
Recognition of Digital Camera and Mobile Phone Camera Images
Comprehensive Language Support
Complete Integration with Popular Office Applications
TXM is a free and open-source, cross-platform text/corpus analysis environment and graphical client based on Unicode, XML and TEI. It runs on Windows, Linux and Mac OS X, and can also be used online as a GWT-based, J2EE-compliant web portal with built-in access control.