Open ANC

Contents
Annotations
Using ANC Annotations
Download

Contents

The Open ANC includes over 14 million words from the Second Release that can be freely distributed. Please see the OANC license for more details.

The OANC includes the following data from the ANC Second Release:

Spoken

Name           Domain                 No. files    No. words
charlotte      face to face                  93      198,295
switchboard    telephone                  2,307    3,019,477
Spoken Totals                             2,410    3,217,772

Written

Name           Domain                 No. files    No. words
911 report     government, technical         17      281,093
berlitz        travel guides                179    1,012,496
biomed         technical                    837    3,349,714
eggan          fiction                        1       61,746
icic           letters                      245       91,318
oup            non-fiction                   45      330,524
plos           technical                    252      409,280
slate          journal                    4,531    4,238,808
verbatim       journal                       32      582,384
web data       government                   285    1,048,792
Written Totals                            6,424   11,406,155

Corpus Totals                             8,832   14,623,927

Annotations

The file organization and encoding conventions for the OANC are the same as in the ANC Second Release. Please consult the Second Release document encoding conventions for a full description.

The OANC data is distributed with the following annotations:

All annotations were originally produced automatically using GATE's ANNIE system. Some of the texts in the OANC include manually validated sentence boundaries (the list of texts validated for sentence boundaries is here). Note that the validated sentence boundaries are not included in the ANC Second Release.


Using ANC Annotations

All ANC annotations are in stand-off format: each annotation type is stored in a separate file and linked to the primary data, which is contained in a plain text (UTF-8) file. Annotations are represented as a graph of feature structures according to the specifications of the ISO Linguistic Annotation Format (LAF) (Ide and Romary 2007; Ide and Suderman 2006).
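
To illustrate the stand-off idea in general terms, here is a minimal Python sketch. It is not the ANC or LAF file format: the Annotation tuple, the resolve function, the part-of-speech labels, and the sample text are all hypothetical, and the annotations are held in memory rather than read from the distributed XML files. It only shows how annotations that live apart from the text can point back into it by character offset.

```python
# Hypothetical illustration of stand-off annotation.
# This is NOT the ANC/LAF file format; names and data are assumptions.
from typing import List, NamedTuple, Tuple


class Annotation(NamedTuple):
    start: int  # character offset where the span begins (inclusive)
    end: int    # character offset where the span ends (exclusive)
    label: str  # annotation value, e.g. a part-of-speech tag


def resolve(text: str, annotations: List[Annotation]) -> List[Tuple[str, str]]:
    """Pair each annotation label with the text span it points at."""
    return [(a.label, text[a.start:a.end]) for a in annotations]


# The primary data is never modified; annotations only reference offsets.
primary_text = "The cat sat."
pos_annotations = [
    Annotation(0, 3, "DT"),
    Annotation(4, 7, "NN"),
    Annotation(8, 11, "VBD"),
]

resolved = resolve(primary_text, pos_annotations)
for label, span in resolved:
    print(label, span)
```

Because the primary text is left untouched, several independent annotation layers can reference the same offsets without interfering with one another.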

A version of all, or part, of the ANC data with annotations merged in-line can be generated using the ANC Tool. Several output options are provided, including XML and non-XML formats that can be input to a variety of other software.
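
The general idea of merging stand-off annotations in-line can be sketched in Python as follows. This is not the ANC Tool and does not reproduce any of its output formats: the merge_inline function, the tok element, and the pos attribute are hypothetical names chosen for illustration, and the sketch assumes the annotated spans do not overlap or nest.

```python
# Hypothetical sketch of merging stand-off annotations in-line.
# NOT the ANC Tool; element and attribute names are assumptions.
from typing import Iterable, Tuple
from xml.sax.saxutils import escape, quoteattr


def merge_inline(text: str, annotations: Iterable[Tuple[int, int, str]]) -> str:
    """Wrap each (start, end, label) span of `text` in a <tok> element.

    Assumes spans are non-overlapping; offsets are character positions,
    with `end` exclusive.
    """
    pieces = []
    cursor = 0
    for start, end, label in sorted(annotations):
        # Copy the unannotated text between the previous span and this one.
        pieces.append(escape(text[cursor:start]))
        pieces.append(f"<tok pos={quoteattr(label)}>{escape(text[start:end])}</tok>")
        cursor = end
    # Copy whatever follows the last annotated span.
    pieces.append(escape(text[cursor:]))
    return "".join(pieces)


merged = merge_inline("The cat sat.", [(4, 7, "NN"), (0, 3, "DT")])
print(merged)
```

Escaping the text while merging keeps the result well-formed even when the primary data contains characters such as & or <.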

Please Note: The OANC is distributed with UTF-8 and UTF-16 character encoded text files while the ANC Second Release uses UTF-16 only. All of the software tools provided by the ANC assume a UTF-16 character encoding as the default encoding.

Be sure to specify the correct character encoding for the text files when processing the OANC with any of the ANC tools.
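
When reading the text files with your own scripts rather than the ANC tools, the same care applies. The Python sketch below shows one way to pass the encoding explicitly, with a fallback that inspects the byte order mark (BOM). The function names are hypothetical, and the fallback rests on an assumption: it only helps when a file actually begins with a BOM. For a file without one, pass the encoding yourself.

```python
# Hypothetical helper for reading corpus text files with the right encoding.
# Assumption: BOM sniffing only works for files that start with a BOM.
import codecs
from typing import Optional


def sniff_encoding(raw: bytes, default: str = "utf-8") -> str:
    """Guess the encoding from a leading byte order mark, if present."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # decodes UTF-8 and drops the BOM
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"     # the codec uses the BOM to pick the byte order
    return default


def read_text(path: str, encoding: Optional[str] = None) -> str:
    """Read a text file, preferring an explicitly supplied encoding."""
    with open(path, "rb") as handle:
        raw = handle.read()
    return raw.decode(encoding or sniff_encoding(raw))
```

For example, read_text(path, encoding="utf-16") forces UTF-16, while read_text(path) falls back to the BOM check and then to UTF-8.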

Download

The OANC is a community resource that is freely available for download. Please see the OANC license for details.

We ask that you provide us with any of the following that may have resulted from your use of the OANC, which we will make freely available to the user community on this website:

UTF-8 Files

Download the Open ANC as a self-installing jar file. (316 MB) See below for installation instructions.
Download the Open ANC as a zip file. (326 MB)

The OANC will unpack to approximately 4.8 GB.

Download the ANC Tool (2.5 MB) (required to process the standoff annotations)

Installation via the Jar file

The Java installers are executable jar files that can be used to install the Open ANC and the ANC Tool. On most operating systems you should be able to double-click the .jar file. If that does not work, open a command prompt (Windows), shell (Linux), or terminal window (Mac OS X) and run the command:

java -jar OANC-installer.jar

Installation Notes

File dialog boxes in Java are implemented slightly differently on different platforms. For instance, the "Open File" dialog box in Mac OS X does not allow the user to create a directory from within the dialog. Therefore on Mac OS X, users must do one of the following:

  1. Create the installation directory before running the installer. In this case the installer will warn you that the directory already exists when you select it. It is safe to ignore this warning.
  2. Select the directory where you want the OANC directory created from within the installer, and then manually append the name of the directory to be created.
  3. Type the full path to the installation directory manually.
