Document Format
We accept documents in almost any format. However, because of the massive amount of data we are processing, it is essential that we process documents automatically rather than by hand. In our case, “processing” means rendering the document in an XML format, where, ideally, titles, headings, words in italics, etc. are marked with specific tags identifying them as such. So, in addition to needing texts that are easy to process, we prefer texts in which things such as titles and italicized words are clearly identified. Any document produced with a word processor or marked up in HTML as a web page will usually contain this information (1) if the markup is, where possible, descriptive rather than presentational (i.e., tags that say what the content is rather than how it should look, as when you use <em> (emphasis) instead of <i> for italic); and (2) if markup is used consistently.
The following are some rules of thumb concerning formats. Documents that are very difficult to process automatically will likely not be included in the ANC, so we ask that if you have a choice, please submit your document in a format as near the top of the following list as possible:
Your document(s) will be very easy to process if
- they are marked up with well-formed XML and use a “standard” vocabulary such as XCES, TEI, or DocBook.
- they are Word doc or docx files, or rtf files, and you have made consistent use of the styles defined by Word or that you defined yourself.
- they are marked up with well-formed XHTML and use the “strict” XHTML DTD.
- they are “plain text” with blank lines between titles, headings, and paragraphs. Note: send plain text files only as a last resort. Double-check that all characters in your document display correctly when saved as a plain text file, and use UTF-8 or UTF-16 (if that is an option) when submitting text files. Keep in mind that a lot of information is lost in plain text format: it is harder for us to identify a title as a title if your document does not contain this information explicitly (as a word processor or HTML document would).
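If you do end up sending plain text, a quick check along the following lines may help before you upload. This is only a minimal sketch in Python, not a tool we provide or require. The function names `check_utf8_bytes` and `split_blocks` are hypothetical, and the sketch assumes UTF-8 (not UTF-16) and blank lines between blocks.

```python
import re


def check_utf8_bytes(data):
    """Return (True, text) if the bytes decode as UTF-8, else (False, reason)."""
    try:
        return True, data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte that could not be decoded.
        return False, f"not valid UTF-8 at byte offset {err.start}"


def split_blocks(text):
    """Split text into blocks (titles, headings, paragraphs) at blank lines."""
    normalized = text.replace("\r\n", "\n").strip()
    if not normalized:
        return []
    # One or more whitespace-only lines separate two blocks.
    return re.split(r"\n\s*\n", normalized)
```

To try it, read your file in binary mode, pass the bytes to `check_utf8_bytes`, and pass the decoded text to `split_blocks`. If the number of blocks is far from the number of titles, headings, and paragraphs you expect, some blank lines are probably missing.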
Your document(s) will be relatively easy to process if
- they are marked up with well-formed XML that does not use a standard vocabulary
- you (or someone else) marked them up by hand in HTML; that is, they were not produced by a web-page generating program such as Dreamweaver or FrontPage
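If you are unsure whether your XML is well-formed, one way to find out is to run it through any XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The function name `well_formedness_error` is hypothetical, and the check covers well-formedness only, not validity against a DTD or schema.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_source):
    """Return None if the XML parses cleanly, else the parser's error message."""
    try:
        ET.fromstring(xml_source)
    except ET.ParseError as err:
        # The message includes the line and column where parsing failed.
        return str(err)
    return None
```

A result of `None` means the parser found no well-formedness problems. Any other result is the parser's description of the first problem it met, for example a mismatched or unclosed tag.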
Your document(s) will be harder to process if
- they were machine-generated in HTML by a program like FrontPage, Dreamweaver, etc.
- they are in PDF
Your document(s) will be virtually impossible for us to process if
- they are in Quark, InDesign, or some other “publishing” software format
- they are in double-column PDF
- they contain very non-standard fonts
Contributing Multiple Documents
If you are contributing multiple documents that contain the same kind of texts (for example, several essays, a group of stories, etc.) you can do so in a single upload as follows:
- Put all the documents you wish to contribute in a folder;
- Compress the folder contents using any standard compression software such as WinZip, gzip, etc.;
- Navigate to the upload page (see below) and fill in the information. In the space on the upload page for entering the title, indicate something like “various essays”, “various fiction”, etc.;
- Upload the compressed folder where indicated.
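Any compression tool will do for the second step. If you prefer to script it, the following minimal sketch builds a zip archive with Python's standard `zipfile` module. The function name `compress_folder` and the folder and archive names in the usage note are hypothetical.

```python
import zipfile
from pathlib import Path


def compress_folder(folder, archive_path):
    """Write every file under `folder` into a zip archive; return the archive path."""
    folder = Path(folder)
    archive_path = Path(archive_path)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(folder.rglob("*")):
            # Skip directories, and skip the archive itself if it lives inside the folder.
            if path.is_file() and path.resolve() != archive_path.resolve():
                archive.write(path, arcname=path.relative_to(folder).as_posix())
    return archive_path
```

For example, `compress_folder("my_essays", "my_essays.zip")` would produce a single file, `my_essays.zip`, that can then be uploaded where indicated.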