The goal of the Alpheios project is to help people learn how to learn languages as efficiently and enjoyably as possible, and in a way that best helps them understand their own literary heritage and culture, and the literary heritage and culture of other peoples throughout history. One of the principal tools, a Firefox plugin, allows a reader to browse a web page with Latin, ancient Greek, or Arabic, click on a word, and get a definition and morphological analysis of the word.
ANNIS2 is an open source, versatile web browser-based search and visualization architecture for complex multilevel linguistic corpora with diverse types of annotation. ANNIS, which stands for ANNotation of Information Structure, has been designed to provide access to the data of the SFB 632 ("Information Structure: The Linguistic Means for Structuring Utterances, Sentences and Texts").
The purpose of ATLAS.ti is to help researchers uncover and systematically analyze complex phenomena hidden in text and multimedia data. The program provides tools that let the user locate, code, and annotate findings in primary data material, to weigh and evaluate their importance, and to visualize complex relations between them.
This online tool can be used for a wide variety of annotation tasks, including visualization and collaboration.
brat is designed in particular for structured annotation, where the notes are not freeform text but have a fixed form that can be automatically processed and "interpreted" by a computer. It supports the annotation of n-ary associations that can link together any number of other annotations participating in specific roles. In addition, brat implements a number of features relying on natural language processing techniques to support human annotation efforts.
CHET-C, or Chapel Hill Electronic Text-Converter, is a software tool designed to convert digital texts that employ standard typographic editorial conventions into EpiDoc-compliant XML files. A series of regular expressions is applied to the source document and the results are placed into a “shell file” (i.e., a template), which can be customised in advance. The regular expressions are stored in a separate configuration file to permit customisation, but this file is generated from regular expressions embedded in the EpiDoc Guidelines source files.
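The regular-expression approach described above can be sketched in a few lines of Python. This is only an illustration, not CHET-C's own code or configuration: the two rules, the `SHELL` template, and the `convert` function are hypothetical names chosen for this example, and the patterns cover just two common editorial conventions (an expanded abbreviation and restored lost text).

```python
import re

# Hypothetical rule list (pattern, replacement); NOT CHET-C's real configuration.
RULES = [
    # Expanded abbreviation: imp(erator) -> <expan><abbr>imp</abbr><ex>erator</ex></expan>
    (re.compile(r"(\w+)\((\w+)\)"), r"<expan><abbr>\1</abbr><ex>\2</ex></expan>"),
    # Restored lost text: [caes] -> <supplied reason="lost">caes</supplied>
    (re.compile(r"\[([^\[\]]+)\]"), r'<supplied reason="lost">\1</supplied>'),
]

# Hypothetical "shell file": a template that receives the converted text.
SHELL = '<div type="edition"><ab>{body}</ab></div>'


def convert(text):
    """Apply each rule in order, then place the result into the shell template."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return SHELL.format(body=text)


if __name__ == "__main__":
    print(convert("imp(erator) [caes]ar"))
```

Keeping the rules in a list separate from the conversion function mirrors the idea of a separate, customisable configuration file.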
Part-of-speech (POS) tagging software. POS tagging is the classification of words into one or more categories based upon their definition, their relationship with other words, or other context. CLAWS (Constituent Likelihood Automatic Word-tagging System) uses several methods to identify parts of speech, most notably hidden Markov models (HMMs), which involve counting cases and making a table of the probabilities of certain sequences of words.
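The "counting cases and making a table of probabilities" idea can be illustrated with a toy tagger. The sketch below is not the CLAWS implementation; it assumes a standard bigram HMM with add-one smoothing and Viterbi decoding, and the tiny hand-tagged corpus, the tag names, and all function names are invented for the example.

```python
import math
from collections import Counter, defaultdict

# Toy hand-tagged corpus, invented for illustration only.
TAGGED = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]


def train(sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(Counter)  # trans[previous_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    for sentence in sentences:
        prev = "<s>"  # sentence-start marker
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    return trans, emit


def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability read off a table of counts."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def viterbi(words, trans, emit):
    """Return the most probable tag sequence for the given words."""
    tags = list(emit)
    vocab = {w for counts in emit.values() for w in counts}
    best = {"<s>": (0.0, [])}  # tag -> (log probability, best path so far)
    for word in words:
        step = {}
        for tag in tags:
            e = log_prob(emit[tag], word, len(vocab) + 1)  # +1 for unseen words
            candidates = [
                (score + log_prob(trans[prev], tag, len(tags)) + e, path + [tag])
                for prev, (score, path) in best.items()
            ]
            step[tag] = max(candidates, key=lambda c: c[0])
        best = step
    return max(best.values(), key=lambda c: c[0])[1]


if __name__ == "__main__":
    trans, emit = train(TAGGED)
    print(viterbi(["the", "dog", "sleeps"], trans, emit))
```

A production tagger would be trained on a large tagged corpus and would combine this kind of probabilistic step with other methods, as the description above notes.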
CollateX-based text collation client. CollateX, which runs on a server independent from the URL above, is a powerful, fully automatic, baseless text collation engine for multiple witnesses. A second collation technique, ncritic, provides a slightly different baseless text collation. The two engines complement each other nicely. The user can supply different files, or even URLs, and then output the result in GraphML, TEI, JSON, HTML, or SVG. Fuzzy matching is an option.
A software tool for producing concordances of a body of text, that is, for analysing words within their immediate context. The tool performs full concordance, reading and analysing each and every word in a text. It was initially written for the analysis of English texts, but has since been extended to cater for other Western languages. Limited support is also provided for text in East Asian scripts, such as Chinese and Korean.
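As a rough illustration of what a full concordance involves, the following Python sketch indexes every word of a text together with its immediate context. It is not the tool's own code: the function name `full_concordance`, the `width` parameter, and the simple `\w+` tokenisation are assumptions made for this example, and that tokenisation would not segment Chinese text into words.

```python
import re
from collections import defaultdict


def full_concordance(text, width=2):
    """Map every word to a list of (left context, right context) pairs.

    `width` is the number of neighbouring words kept on each side.
    """
    tokens = re.findall(r"\w+", text.lower())
    index = defaultdict(list)
    for i, token in enumerate(tokens):
        left = " ".join(tokens[max(0, i - width):i])
        right = " ".join(tokens[i + 1:i + 1 + width])
        index[token].append((left, right))
    return dict(index)


if __name__ == "__main__":
    index = full_concordance("The cat sat on the mat. The dog saw the cat.")
    for left, right in index["cat"]:
        # Keyword-in-context layout: left context, keyword, right context.
        print(f"{left:>10} | cat | {right}")
```

Because every token is indexed, looking up any word afterwards is a single dictionary access rather than a new pass over the text.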
After creating a free account, users can submit requests for mining and analyzing JSTOR content. By submitting a query, a user will receive a random sample of 1,000 of JSTOR's 4.6 million documents; more documents can be received by contacting JSTOR directly. Users can choose to receive the following results:
Citations Only (all requests come with citations by default)
DiscoverText allows users to import data from a variety of sources (including Facebook & Twitter feeds, plain text, Word, Excel, public YouTube comments, blogs/wikis, PDF, etc.), code them, and generate tag clouds and reports.
EPPT allows users to encode image-based scholarly editions without having to know XML syntax. It automates or semi-automates repeating attributes, and provides templates to reduce errors and accelerate the encoding process.