The Natural Language Toolkit (NLTK) is a set of open-source Python modules, linguistic data and documentation for research and development in natural language processing and text analytics.
TXM is a free and open-source, cross-platform text/corpus analysis environment and graphical client based on Unicode, XML and TEI, supporting Windows, Linux and Mac OS X. It can also be used online as a J2EE-compliant, GWT-based web portal with built-in access control.