Collaborative Resource Development and Delivery

LREC workshop

Istanbul, Turkey

May 27, 2012


A confluence of needs and activities points to a new emphasis in computational linguistics on addressing lexical, propositional, and discourse semantics through corpora. A few examples are:

(1) the demand for high-quality linguistic annotations of corpora representing a wide range of phenomena, especially at the semantic level, to support machine learning and computational linguistics research in general;

(2) the demand for high-quality annotated corpora that represent a broad range of genres and can be extended flexibly as new needs arise;

(3) the demand for high-quality lexical and semantic resources, both as inputs to the annotation process and as products of it;

(4) the need for easy-to-use, open access to all of these resources for everyone.

 

The workshop includes a special session devoted to means and considerations for community-based linguistic annotation, with a special emphasis on the collaborative community effort surrounding the Manually Annotated Sub-Corpus (MASC) (http://www.anc.org/MASC).

Such resources can be very costly to produce because they require manual creation or validation to ensure quality. To meet the growing need and lower the cost of resource creation and enhancement, the community is moving toward collaborative resource development, including collaborative corpus annotation and collective creation and enhancement of lexical resources and knowledge bases. Collaborative development encompasses both engaging the community in the annotation and development of common resources and crowdsourcing and similar solutions.

Technological advances now enable the development of web-based environments for collaborative annotation and enhancement of language resources, including annotated corpora, lexicons, and others. They also enable platforms that support web services delivering data, annotations, and other resources, as well as high-quality automated linguistic annotations of language data. At the same time, crowdsourcing is being explored as a viable means of producing high-quality resources. Given these recent advances in technology and the novel methods for collecting manually annotated data, it is important to develop new methods of quality control, ideally ones that permit rapid acquisition and sharing of resources. This workshop covers all dimensions of collaborative resource development and delivery, with a specific focus on case studies and lessons learned. Topics include:

• Web services and platforms for collaborative resource development and distribution;

• Crowdsourcing for resource development, including studies of efficacy;

• Strategies and issues for open resource distribution;

• Evaluation of collaboratively developed resources;

• Position papers outlining issues and proposing solutions for community-based collaborative resource development and/or delivery.