MASC, as a part of the OANC, is a community-based effort. The goal is to provide completely open data and linguistic annotations for language processing research and development. Annotations are distributed in a common format. We also provide the means to download the data and selected annotations in several formats suitable for different applications, as well as modules to read MASC annotations into widely used platforms such as GATE, UIMA, and NLTK. See the downloads page for more information.
We solicit annotations for any linguistic phenomenon, for any portion of the MASC data. To prepare annotations, first download the MASC data (text only) from the downloads page.
To contribute annotations, send them via email to anc [at] anc.org.
All annotations are distributed in both their original format (as contributed) and in GrAF format. The ANC project performs the transduction to GrAF.
Whenever possible, contributed annotations should be provided in standoff form. Annotations produced using GATE may be exported using the “save as XML” option, or may be rendered directly into GrAF format using the ANC GATE plugins. The primary data, as distributed in MASC, should not be modified in any way. If annotations are produced in-line, the primary data should not be modified for normalization, correction, etc. Corrections and normalizations should ideally be provided as annotations linked to the original text, using a feature named “corrected” or “normalized” with a value providing the corrected or normalized text.
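As an illustration of the standoff approach described above, the following is a minimal Python sketch. It is not the GrAF format and not an ANC tool: the `StandoffAnnotation` class, the helper functions, the "correction" label, and the sample sentence are all hypothetical, chosen only to show annotations that point into unmodified primary text by character offset and carry a "corrected" feature.

```python
from dataclasses import dataclass, field

# Hypothetical sample text; real primary data would be read from a MASC file
# and left exactly as distributed.
primary_text = "They recieve teh data."


@dataclass
class StandoffAnnotation:
    """An annotation stored apart from the text it describes (illustrative only)."""
    start: int   # inclusive character offset into the primary text
    end: int     # exclusive character offset into the primary text
    label: str   # hypothetical annotation type name
    features: dict = field(default_factory=dict)


def annotate_correction(text, wrong, right):
    """Create a correction annotation for the first occurrence of `wrong`."""
    start = text.index(wrong)  # raises ValueError if `wrong` is absent
    return StandoffAnnotation(start, start + len(wrong), "correction",
                              {"corrected": right})


def corrected_view(text, annotations):
    """Build a corrected copy of the text without changing the original.

    Assumes the annotated spans do not overlap.
    """
    pieces = []
    cursor = 0
    for ann in sorted(annotations, key=lambda a: a.start):
        if "corrected" not in ann.features:
            continue
        pieces.append(text[cursor:ann.start])      # untouched text before the span
        pieces.append(ann.features["corrected"])   # replacement for the span
        cursor = ann.end
    pieces.append(text[cursor:])                   # untouched text after the last span
    return "".join(pieces)


annotations = [
    annotate_correction(primary_text, "recieve", "receive"),
    annotate_correction(primary_text, "teh", "the"),
]

view = corrected_view(primary_text, annotations)
```

Because the corrections live only in the annotations, the offsets stay valid against the distributed text, and a corrected reading can be generated on demand rather than stored in place of the original.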
Data contributions may be made through the OANC interface or sent via email to anc [at] anc.org. Please see the ANC Contribution page for information about the types and formats of texts that are acceptable.