The goal of the Alpheios project is to help people learn how to learn languages as efficiently and enjoyably as possible, and in a way that best helps them understand their own literary heritage and culture, and the literary heritage and culture of other peoples throughout history. One of the principal tools, a Firefox plugin, allows a reader to browse a web page with Latin, ancient Greek, or Arabic, click on a word, and get a definition and morphological analysis of the word.
ANNIS2 is an open source, versatile web browser-based search and visualization architecture for complex multilevel linguistic corpora with diverse types of annotation. ANNIS, which stands for ANNotation of Information Structure, has been designed to provide access to the data of the SFB 632 ("Information Structure: The Linguistic Means for Structuring Utterances, Sentences and Texts").
This online tool can be used for a wide variety of annotation tasks, including visualization and collaboration.
brat is designed in particular for structured annotation, where the notes are not freeform text but have a fixed form that can be automatically processed and "interpreted" by a computer. brat also supports the annotation of n-ary associations that can link together any number of other annotations participating in specific roles. In addition, it implements a number of features relying on natural language processing techniques to support human annotation efforts.
Part-of-speech (POS) tagging software. POS tagging is the classification of words into one or more categories based upon their definition, their relationship with other words, or other context. CLAWS (Constituent Likelihood Automatic Word-tagging System) uses several methods to identify parts of speech, most notably Hidden Markov models (HMMs), which involve counting cases and making a table of the probabilities of certain sequences of words.
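The "counting cases and making a table of probabilities" step can be illustrated with a minimal Python sketch. It assumes the table records how often one tag follows another in a small hand-tagged sample; the corpus, the tag names, and the function name `transition_table` are hypothetical, and this is not CLAWS's actual implementation.

```python
from collections import Counter, defaultdict

def transition_table(tagged_sentences):
    """Estimate P(next_tag | previous_tag) by counting adjacent tag pairs."""
    pair_counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        # "<s>" is an assumed marker for the start of a sentence.
        tags = ["<s>"] + [tag for _word, tag in sentence]
        for prev_tag, next_tag in zip(tags, tags[1:]):
            pair_counts[prev_tag][next_tag] += 1

    table = {}
    for prev_tag, counts in pair_counts.items():
        total = sum(counts.values())
        # Turn raw counts into relative frequencies.
        table[prev_tag] = {tag: n / total for tag, n in counts.items()}
    return table

# Hypothetical toy corpus of (word, tag) pairs, for illustration only.
SAMPLE_CORPUS = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("old", "ADJ"), ("dog", "NOUN"), ("sleeps", "VERB")],
]

if __name__ == "__main__":
    for prev_tag, row in transition_table(SAMPLE_CORPUS).items():
        print(prev_tag, row)
```

A full tagger would combine a table like this with per-word tag probabilities and search for the most likely tag sequence for each sentence; that step is omitted here.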
CollateX-based text collation client. CollateX, run on a server independent from the URL above, is a powerful, fully automatic, baseless text collation engine for multiple witnesses. A second collation technique, ncritic, provides a slightly different baseless text collation. The two engines complement each other nicely. The user can supply different files, even URLs, and then output the result in GraphML, TEI, JSON, HTML, or SVG. Fuzzy matching is an option.
A software tool for performing concordance, the analysis of a set of words within their immediate context, on a body of text. The tool performs full concordance, reading and analysing each and every word in a text. It was initially written for the analysis of English texts, but has since been extended to cater for other Western languages. Limited support is also provided for text in East Asian scripts, such as Chinese and Korean.
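As a rough illustration of what a concordance lists, the following Python sketch collects every occurrence of a keyword together with a few words of context on each side. The function name `concordance`, the `width` parameter, and the simple regular-expression tokenizer are assumptions made for this example; they do not describe how the tool itself works.

```python
import re

def concordance(text, keyword, width=2):
    """Return (left context, keyword, right context) for each hit."""
    # Naive tokenizer: lowercased runs of word characters. It does not
    # segment scripts written without spaces, such as Chinese.
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token == target:
            # Context windows ignore sentence boundaries in this sketch.
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits

if __name__ == "__main__":
    sample = "The cat sat on the mat. The dog saw the cat."
    for left, word, right in concordance(sample, "cat"):
        print(f"{left:>10} [{word}] {right}")
```

A full concordance, as described above, would repeat this for every distinct word in the text rather than for a single keyword.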
CorpusSearch 2 allows users to construct and search syntactically annotated corpora, including finding and counting lexical and syntactic patterns, correcting systemic errors, and coding linguistic features.
EXMARaLDA (Extensible Markup Language for Discourse Annotation) is a system of concepts, data formats and tools for the computer assisted transcription and annotation of spoken language, and for the construction and analysis of spoken language corpora.
GATE (General Architecture for Text Engineering) is a sophisticated framework that allows manual and automatic annotation as well as the processing of all kinds of language resources. GATE has a broad community of users and developers, and comes with diverse plugins for specific linguistic tasks.
The term "lexomics" was originally coined to describe the computer-assisted detection of "words" (short sequences of bases) in genomes. When applied to literature as we do here, lexomics is the analysis of the frequency, distribution, and arrangement of words in large-scale patterns. The current lexomics tools are:
MorphAdorner is a Java command-line program which acts as a pipeline manager for processes performing morphological adornment of words in a text. Currently MorphAdorner provides methods for adorning text with standard spellings, parts of speech and lemmata. MorphAdorner also provides facilities for tokenizing text, recognizing sentence boundaries, and extracting names and places.
"Protégé is a free, open source ontology editor and knowledge-base framework
The Protégé platform supports two main ways of modeling ontologies via the Protégé-Frames and Protégé-OWL editors. Protégé ontologies can be exported into a variety of formats including RDF(S), OWL, and XML Schema." -- Protege Home Page (viewed 30 October 2012)
QDA Miner is an easy-to-use mixed-methods qualitative data analysis software package for coding, annotating, retrieving and analyzing small and large collections of documents and images. QDA Miner may be used to analyze interview or focus-group transcripts, legal documents, journal articles, even entire books, as well as drawings, photographs, paintings, and other types of visual documents.
A software application for the playback of audio recordings. SoundScriber offers specific functionality for researchers who wish to transcribe a recording. It was originally developed for use in the Michigan Corpus of Academic Spoken English (MICASE) project and released for use by academics performing similar work.
Audio playback via installed audio codecs (e.g. WAV, MP3)