[Sigmetrics] "Computational Linguistics Scientific Summarization" Task at CL-SciSumm, BIRNDL 2016

Tue Mar 1 02:28:11 EST 2016

== Call for Participation in a Shared Task ==

The 2nd "Computational Linguistics Scientific Summarization" Shared Task

http://wing.comp.nus.edu.sg/cl-scisumm2016/

You are invited to participate in the CL-SciSumm 2016 Shared Task, as
part of the Joint Workshop on Bibliometric-enhanced IR and NLP for
Digital Libraries (BIRNDL) at JCDL 2016 on 23 June 2016.

http://wing.comp.nus.edu.sg/birndl-jcdl2016/

As a part of the Joint Workshop on Bibliometric-enhanced Information
Retrieval and Natural Language Processing for Digital Libraries
(BIRNDL) at JCDL 2016, we are pleased to announce the 2nd CL-SciSumm
Shared Task on scientific paper summarization.  This task follows up
on the successful CL Pilot Task conducted as a part of the BiomedSumm
Track at the Text Analysis Conference 2014 (TAC 2014). Nine teams from
four countries expressed an interest in participating in that pilot
task; three teams submitted system descriptions and findings.

The current shared task will be on automatic paper summarization in
the Computational Linguistics (CL) domain. The output summaries will
be of two types: faceted summaries of the traditional self-summary
(the abstract) and the community summary (the collection of citation
sentences, or ‘citances’). We also propose to group the citances by
the facets of the text that they refer to.

=== The Task ===

Given: A topic consisting of a Reference Paper (RP) and up to ten
Citing Papers (CPs) that all contain citations to the RP. In each CP,
the text spans (i.e., citances) that pertain to a particular citation
to the RP have been identified.

Task 1a: For each citance, identify the spans of text (cited text
spans) in the RP that most accurately reflect the citance. A cited
text span may be a sentence fragment, a full sentence, or several
consecutive sentences (no more than 5).
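
As an illustration only, one simple way to approach Task 1a is to rank
the RP sentences by word overlap with the citance. The sketch below is
not an official baseline: the Jaccard measure, the tokenizer and the
function names (tokenize, jaccard, rank_rp_sentences) are all
assumptions made for this example, and the RP is assumed to be already
split into sentences.

```python
import re


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_rp_sentences(citance, rp_sentences, top_k=1):
    """Return the indices of the top_k RP sentences most similar to the citance.

    Hypothetical helper: similarity is plain Jaccard word overlap, which is
    an assumption of this sketch, not a method prescribed by the task.
    """
    citance_tokens = tokenize(citance)
    scored = [
        (jaccard(citance_tokens, tokenize(sentence)), index)
        for index, sentence in enumerate(rp_sentences)
    ]
    # Highest score first; ties are broken by earlier position in the RP.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in scored[:top_k]]
```

A system could call rank_rp_sentences once per citance and report the
returned sentence indices as its candidate cited text span. Stronger
systems would likely replace the Jaccard score with a better similarity
measure.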

Task 1b: For each cited text span, identify what facet of the paper it
belongs to, from a predefined set of facets.

Evaluation: Task 1 will be scored by the overlap of text spans in the
system output versus the gold standard created by human annotators.
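
To make the idea of overlap scoring concrete, here is a minimal sketch
that compares system and gold spans as sets of RP sentence IDs and
reports precision, recall and F1. The exact metric used by the
organizers is not specified in this announcement, so the sentence-level
granularity, the choice of F1 and the function name
span_overlap_scores are assumptions made for illustration.

```python
def span_overlap_scores(system_ids, gold_ids):
    """Score one citance by the overlap of system and gold sentence IDs.

    Assumption of this sketch: spans are represented as collections of RP
    sentence IDs, and overlap is measured with precision, recall and F1.
    Returns a (precision, recall, f1) tuple of floats.
    """
    system, gold = set(system_ids), set(gold_ids)
    overlap = len(system & gold)
    precision = overlap / len(system) if system else 0.0
    recall = overlap / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, a system that returns sentences 3, 4 and 5 when the gold
span is sentences 4 to 7 shares two sentences with the gold standard,
giving a precision of 2/3 and a recall of 1/2.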

=== The Corpus ===

The CL-SciSumm corpus is created by randomly sampling documents from
the ACL Anthology corpus and selecting their citing papers.  For
CL-SciSumm 2016, we have selected three portions of this source
collection to be annotated and to serve as training, development and
test collections. The training set of 10 articles is available for
download on GitHub at https://github.com/WING-NUS/scisumm-corpus and
can be used by participants to pilot their systems.  Watch for updates
to the GitHub repository, as we are still updating a few training
files. We will finalise the training set by 29 February. The
development set of 10 articles, an additional part of the same corpus,
will be released in April; participants can add it to the training set
to tune their system parameters. Finally, the test set of 10 articles
will be released in May. System outputs on the test set should be
submitted to the task organizers, who will collate the final results
for presentation at the workshop.

=== Registration ===

Organizations wishing to participate in the CL Shared Task track at
BIRNDL 2016 are invited to register on EasyChair:
https://easychair.org/conferences/?conf=birndl2016 by 30 March 2016.
Participants are advised to register as soon as possible in order to
receive timely access to evaluation resources, including training,
development and testing data. Registration for the task does not
commit you to participation, but it helps the organizers with
planning. All participants who submit system runs are welcome to
present their system at the BIRNDL Workshop.

Dissemination of CL-SciSumm work and results other than in the
workshop proceedings is welcomed, but the conditions of participation
specifically preclude any advertising claims based on these results.
Any questions about conference participation may be sent to the
organizers mentioned below.

=== Important Dates ===

February 2016: Training set posted
March 30, 2016: Expression of interest and short system descriptions
due
April 8, 2016: Development set posted
April 22, 2016: Notification of acceptance of presentation proposals
April 29, 2016: Test set posted
May 20, 2016: System reports and system runs from the test set due
June 3, 2016: Camera-ready contributions due
June 23, 2016: Participants present at BIRNDL 2016 workshop in Newark,
New Jersey, USA

The CL-SciSumm 2016 Task is expected to be of interest to a broad
community including those working in computational linguistics and
natural language processing, text summarization, discourse structure
in scholarly discourse, paraphrase, textual entailment and text
simplification.