(Text) Annotation Tools

(Text) Annotation Tools for Convenience

BRAT – brat is a web-based tool for text annotation; that is, for adding notes to existing text documents. brat is designed in particular for structured annotation, where the notes are not freeform text but have a fixed form that can be automatically processed and interpreted by a computer.
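To make "fixed form" concrete, here is a minimal sketch of reading text-bound annotation lines in the tab-separated layout that brat's standoff files use (an ID, then a type with start and end character offsets, then the covered text). The `TextBound` class, the `parse_text_bound` helper, and the sample sentence are illustrative names invented for this example, not part of brat itself, and the sketch assumes contiguous spans only.

```python
from typing import NamedTuple


class TextBound(NamedTuple):
    """One text-bound annotation: ID, type, character span, covered text."""
    ann_id: str
    ann_type: str
    start: int  # inclusive character offset into the document
    end: int    # exclusive character offset into the document
    text: str


def parse_text_bound(line: str) -> TextBound:
    """Parse a line such as 'T1<TAB>Organization 0 4<TAB>Sony'.

    Assumes a single contiguous span; discontinuous spans are not handled.
    """
    ann_id, type_and_span, covered = line.rstrip("\n").split("\t")
    ann_type, start, end = type_and_span.split(" ")
    return TextBound(ann_id, ann_type, int(start), int(end), covered)


# Hypothetical document text and its annotation lines.
document = "Sony opened an office in Berlin."
lines = [
    "T1\tOrganization 0 4\tSony",
    "T2\tLocation 25 31\tBerlin",
]

annotations = [parse_text_bound(line) for line in lines]

# Because the form is fixed, a program can check each note against the text.
for ann in annotations:
    assert document[ann.start:ann.end] == ann.text
```

Because every note follows the same layout, checks like the final loop can run automatically over a whole corpus, which is not possible with freeform comments.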

GATE – GATE is over 15 years old and is in active use for all types of computational tasks involving human language. GATE excels at text analysis of all shapes and sizes. From large corporations to small startups, from multi-million-euro research consortia to undergraduate projects, our user community is the largest and most diverse of any system of this type, and is spread across all but one of the continents.

GATE is open source free software; users can obtain free support from the user and developer community via GATE.ac.uk or on a commercial basis from our industrial partners. We are the biggest open source language processing project with a development team more than double the size of the largest comparable projects (many of which are integrated with GATE). More than €5 million has been invested in GATE development; our objective is to make sure that this continues to be money well spent for all GATE’s users.

This note summarises the GATE software and process and gives examples of some of their uses. We believe that GATE is the leading system of its type, but as scientists we have to advise you not to take our word for it; that’s why we’ve measured our software in many of the competitive evaluations over the last decade-and-a-half (MUC, TREC, ACE, DUC, …). We invite you to give it a try, to get involved with the GATE community, and to contribute to human language science, engineering and development.

Knowtator – Knowtator is a general-purpose text annotation tool that is integrated with the Protégé knowledge representation system. Knowtator facilitates the manual creation of training and evaluation corpora for a variety of biomedical language processing tasks. Building on the strengths of the widely used Protégé knowledge representation system, we have developed Knowtator as a Protégé plugin that leverages Protégé’s knowledge representation capabilities to specify annotation schemas. Knowtator’s unique advantage over other annotation tools is the ease with which complex annotation schemas (e.g. schemas which have constrained relationships between entity types) can be defined and incorporated into use. Additionally, because annotation schemas are defined using a Protégé ontology, it is straightforward to incorporate domain knowledge into an annotation schema for semantic annotation.

MATE – The MATE project [Telematics Project LE4-8370] “aims to facilitate re-use of language resources by addressing the problems of creating, acquiring, and maintaining language corpora. The problems are addressed along two lines: (1) through the development of a standard for annotating resources; (2) through the provision of tools which will make the processes of knowledge acquisition and extraction more efficient. Specifically, MATE will treat spoken dialogue corpora at multiple levels, focusing on prosody, (morpho-) syntax, co-reference, dialogue acts, and communicative difficulties, as well as inter-level interaction. The results of the project will be of particular benefit to developers of spoken language dialogue systems but will also be directly useful for other applications of language engineering.”

Callisto – The Callisto annotation tool was developed to support linguistic annotation of textual sources for any Unicode-supported language. Information Extraction (IE) systems are increasingly easy to adapt to varying domains, and by using machine learning techniques, this process is becoming largely automatic. However, adaptive/adaptable systems require training and test data against which to measure and improve their performance. Hand annotation can be an arduous task, but a well-designed user interface can greatly ease the burden. This is the function of Callisto.

Callisto has been built with a modular design and uses standoff annotation, allowing for unique tag-set definitions and domain-dependent interfaces. Standoff annotation support, provided by jATLAS, allows nearly any annotation task to be represented.
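The idea behind standoff annotation is that tags live apart from the text and point into it by character offset, so the source is never modified and any tag set can be layered on top. The sketch below illustrates that idea in Python; it is not Callisto's or jATLAS's API. The `Span` class, the `render_inline` helper, and the tag names are hypothetical, and spans are assumed not to overlap.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A standoff annotation: a tag plus offsets into an unmodified text."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    tag: str    # label from a user-defined tag set


def render_inline(text: str, spans: list[Span]) -> str:
    """Produce an inline-tagged copy of the text from standoff spans.

    Spans are applied from right to left so that inserting a tag never
    shifts the offsets of spans that have not been processed yet.
    Assumes the spans do not overlap.
    """
    out = text
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        out = (
            out[:span.start]
            + f"<{span.tag}>" + out[span.start:span.end] + f"</{span.tag}>"
            + out[span.end:]
        )
    return out


# Hypothetical source text and a custom two-tag tag set.
source = "Callisto is written in Java."
spans = [Span(0, 8, "TOOL"), Span(23, 27, "LANGUAGE")]

tagged = render_inline(source, spans)
print(tagged)
```

The original `source` string is untouched after rendering, which is what lets several independent annotation layers refer to the same document.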

The modular design of Callisto allows it to be extended with user interface components specific to a domain. Default tag editing capabilities are provided through a highlighted text display and tag attribute tables. As domain-specific extension components are developed, they may be integrated into the core of Callisto, to become part of the standard suite of available components.

Callisto is written in Java, taking advantage of its portability and language support. Java v1.5 or greater is required. Java 6 is recommended, except on Macintosh, where Java 5 is recommended. Callisto has not been tested with Java 7.

Callisto is no longer actively supported and is provided as-is. The Callisto users mailing list is not very active, but you may be able to get some help there.

tagtog – tagtog is a multi-user, web-based text annotation tool that supports both manual and automatic annotation. The supported annotation types are entities, relations, document labels, entity labels and normalizations.
For automatic annotation, users can apply dictionaries or train machine-learning models to annotate at scale in a semi-supervised fashion, using either the web editor or the API.
Text can be imported in different formats: raw text, PDFs, TXT, HTML, XML. There are also specific shortcuts for importing articles from known repositories such as PubMed. The main output is a standoff format for text annotations consisting of two files: annotations (JSON) and content (XML-compatible HTML5).

More specific details can be found in the documentation.
