Open ANC

Contents
Annotations
Using ANC Annotations
Download

Contents

The Open ANC includes over 14 million words from the Second Release that can be freely distributed. Please see the OANC license for more details.

The OANC includes the following data from the ANC Second Release:

Spoken

Name            Domain                  No. files    No. words
charlotte       face to face                   93      198,295
switchboard     telephone                   2,307    3,019,477
Spoken totals                               2,410    3,217,772

Written

Name            Domain                  No. files    No. words
911 report      government, technical          17      281,093
berlitz         travel guides                 179    1,012,496
biomed          technical                     837    3,349,714
eggan           fiction                         1       61,746
icic            letters                       245       91,318
oup             non-fiction                    45      330,524
plos            technical                     252      409,280
slate           journal                     4,531    4,238,808
verbatim        journal                        32      582,384
web data        government                    285    1,048,792
Written totals                              6,424   11,406,155

Corpus totals                               8,832   14,623,927


Annotations

The file organization and encoding conventions for the OANC are the same as in the ANC Second Release. Please consult the Second Release document encoding conventions for a full description.

The OANC data is distributed with the following annotations:

All annotations were originally produced automatically using our enhancements to GATE's ANNIE system. Some of the texts in the OANC include manually validated sentence boundaries (a list of the validated texts is provided separately). Note that these validated sentence boundaries are not included in the ANC Second Release.

In addition to the annotations distributed with the OANC, we distribute contributed annotations of the OANC, including BBN named entities and several different syntactic parses. Please consult the annotations page for details.


Using ANC Annotations

All ANC annotations are in stand-off format: each annotation type is stored in a separate file and linked to the primary data, which is contained in a plain text (UTF-8) file. Annotations are represented as a graph of feature structures according to the specifications of the ISO Linguistic Annotation Format (LAF) (ISO 24612). Please download the LAF/GrAF standard specification; see also Ide and Suderman 2012, Ide and Suderman 2007, Ide and Romary 2007, and Ide and Suderman 2006.
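To make the stand-off idea concrete, here is a minimal Python sketch. It is not the GrAF XML schema or any ANC tool: the class names `Region`, `Node` and `Annotation` and the helper `covered_text` are hypothetical, chosen only to mirror the LAF notion of graph nodes anchored to regions of an unmodified primary text. It assumes regions are character offsets into the decoded text (Python string positions); consult the LAF/GrAF specification for how anchors are actually defined.

```python
from dataclasses import dataclass, field

# Illustrative sketch of stand-off annotation. The class and field names are
# hypothetical; they only mirror the LAF idea of graph nodes anchored to
# regions of an unmodified primary text. This is not the GrAF XML schema.


@dataclass
class Region:
    """A span of the primary text as character offsets [start, end)."""
    start: int
    end: int


@dataclass
class Node:
    """A graph node anchored to one or more regions of the primary text."""
    regions: list[Region]


@dataclass
class Annotation:
    """A labeled feature structure attached to a node."""
    label: str
    node: Node
    features: dict[str, str] = field(default_factory=dict)


def covered_text(primary: str, node: Node) -> str:
    """Recover the text a node covers by slicing the primary data."""
    return " ".join(primary[r.start:r.end] for r in node.regions)


# Primary data: plain text (in practice read from a UTF-8 file).
# Adding annotations never changes it.
primary = "The ANC is freely available."

# One annotation layer (tokens with example part-of-speech features), stored
# apart from the text and linked to it only through offsets.
tokens = [
    Annotation("tok", Node([Region(0, 3)]), {"pos": "DT"}),
    Annotation("tok", Node([Region(4, 7)]), {"pos": "NNP"}),
    Annotation("tok", Node([Region(8, 10)]), {"pos": "VBZ"}),
]

for ann in tokens:
    print(ann.label, repr(covered_text(primary, ann.node)), ann.features)
```

Running it prints each token's label, the text it covers ('The', 'ANC', 'is') and its features. Because the primary text is never edited, further layers (sentences, named entities, parses) can be added as separate lists pointing at the same offsets.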

A version of all or part of the ANC data with annotations merged in-line can be generated using ANC2Go. Several output options are provided, including XML and non-XML formats that can be read by a variety of other software. In addition, GrAF annotations can be loaded into annotation tools such as GATE and UIMA; see the tools page for details.
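ANC2Go performs in-line merging for you. Purely as an illustration of what merging stand-off annotations in-line involves, the following Python sketch wraps annotated spans of a primary text in XML elements. It is not ANC2Go's implementation or output format: the function name `merge_inline`, the `(start, end, label, features)` span tuples and the element names are assumptions, and the sketch rejects overlapping or nested spans, which a real merging tool has to handle.

```python
from xml.sax.saxutils import escape, quoteattr


def merge_inline(primary: str, spans: list[tuple[int, int, str, dict[str, str]]]) -> str:
    """Wrap each (start, end, label, features) span in an XML element.

    Hypothetical helper: assumes spans neither overlap nor nest and that each
    label is a valid XML element name.
    """
    out = []
    pos = 0
    for start, end, label, feats in sorted(spans, key=lambda s: s[0]):
        if start < pos:
            raise ValueError("overlapping or nested spans are not supported")
        out.append(escape(primary[pos:start]))  # untagged text before the span
        attrs = "".join(f" {k}={quoteattr(v)}" for k, v in feats.items())
        out.append(f"<{label}{attrs}>{escape(primary[start:end])}</{label}>")
        pos = end
    out.append(escape(primary[pos:]))  # trailing untagged text
    return "".join(out)


primary = "The ANC is freely available."
spans = [(0, 3, "tok", {"pos": "DT"}), (4, 7, "tok", {"pos": "NNP"})]
print(merge_inline(primary, spans))
# prints: <tok pos="DT">The</tok> <tok pos="NNP">ANC</tok> is freely available.
```

Overlap is a well-known difficulty for inline markup: independent annotation layers can cross each other's boundaries, which a single XML tree cannot represent directly, whereas stand-off files can keep such layers side by side.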



Download the Current Version

The OANC is a community resource that is freely available for download and use for research and development, including commercial development.

We ask that you provide us with any of the following that may have resulted from your use of the OANC, which we will make freely available to the user community on this website:

  • Download the Open ANC in GrAF format (zip or tgz archive).


    Previous Versions

    Download the Open ANC in the original XML format as a zip file. (326 MB)

    Download the Open ANC in the original XML format as a self-installing jar file. (316 MB) See below for installation instructions.

    The OANC will unpack to approximately 4.8 GB.

    Installation via the Jar file

    The Java installers are executable jar files that can be used to install the Open ANC and the ANC Tool. On most operating systems you should be able to double-click the .jar file. If that does not work, open a command prompt (Windows), shell (Linux), or Terminal window (Mac OS X) and run the command:

    java -jar OANC-installer.jar

    Installation Notes

    File dialog boxes in Java are implemented slightly differently on different platforms. For instance, the "Open File" dialog box in Mac OS X does not allow the user to create a directory from within the dialog. Therefore on Mac OS X, users must do one of the following:

    1. Create the installation directory before running the installer. In this case the installer will warn you that the directory already exists when you select it; it is safe to ignore this warning.
    2. From within the installer, select the directory in which you want the OANC directory to be created, and then manually append the name of the new directory to the path.
    3. Type the full path to the installation directory manually.
