[Asis-l] Automated Document Genre Classification Workshop: Supporting Digital Curation, Information Retrieval, and Knowledge Extraction

Tue Jul 21 07:50:15 EDT 2009

***Apologies for cross-posting***

DCC and Robert Gordon Joint Workshop:
Automated Document Genre Classification - Supporting Digital Curation,
Information Retrieval, and Knowledge Extraction
9 September 2009
Microsoft Research, Cambridge, UK
http://www.dcc.ac.uk/events/genre-classification-2009/ 

In co-operation with the International Conference on the Theory of
Information Retrieval (ICTIR) and Microsoft Research, Cambridge, UK, the
Digital Curation Centre (DCC) and Robert Gordon University are holding a
one-day workshop on Automated Document Genre Classification. This workshop
is intended as a brainstorming session for building a research agenda for
automated genre classification, identification and recognition that will
enhance and support workflows within:

-Digital curation and preservation 
-Information management 
-Information seeking, search, and retrieval 
-Information extraction and knowledge discovery 

The genre classification research community lacks consensus on how to
generate genre taxonomies, how to evaluate them, and how to apply the
research in existing systems. This event is intended to open up a discussion
forum and identify:

-How to constructively establish a useful genre taxonomy 
-How to integrate and apply genre classification within existing information
 systems 
-How to evaluate and consolidate its usefulness and effectiveness within 
 these target systems. 

This workshop will bring together core people within genre classification
research and the areas of research mentioned above to establish a research
road map for bringing genre classification research to applicable maturity.

Motivation
The automation of metadata extraction is crucial to digital curation
activities, as the information deluge is likely to make manual extraction
enormously costly. Organising documents into genre classes, which indicate
the physical and conceptual structure of the text, could serve as a starting
point for both automatic and manual extraction by narrowing down the areas
within the text from which to extract the required information.

Collection profiling is an important aspect of risk assessment and data
audit within organisational collections. Each organisation focuses on
document genres strongly associated with the activities and services central
to the organisation: e.g. a research article as part of experimental
research at a research centre; a report as part of news coverage at a
newspaper corporation; a financial budget report as part of a business
venture in a company. The identification of core document genres could
provide building blocks for defining criteria that identify risks to the
collection and that are cognizant of the procedural context of the
organisation.

Information retrieval techniques mostly rely on relevance measures
calculated on the basis of the document's topical content. However,
documents on the same topic may be created with different objectives and as
part of different processes (e.g. research as opposed to product promotion),
resulting in different levels of relevance, depth, usefulness, and
reliability as sources of information. Genre classification (e.g.
distinguishing an advertisement for a camera from a product review of the
same camera) may be an effective method of supporting finer levels of
granularity in relevance judgements.

Tentative Programme
The workshop will consist of four sessions. The first three sessions will
comprise three presentations each from selected speakers, followed by
discussion. The fourth session will take the format of open discussion.

09:00 – 09:30 Registration 

09:30 – 11:00 Session I: Understanding genre classification — building a
taxonomy 

11:00 – 11:15 Coffee 

11:15 – 12:45 Session II: Role of genre classification in existing
information systems 

12:45 – 14:00 Lunch 

14:00 – 15:30 Session III: Viability of evaluating the effectiveness and
usefulness of genre classification 

15:30 – 15:45 Coffee 

15:45 – 16:45 Session IV: Building a research road map — open discussion and
summary of previous sessions 

16:45 – 17:00 Close 

Costs
This event will cost £75.00. 

Registration
Registration is available at
http://www.dcc.ac.uk/events/genre-classification-2009/register 

Best regards,
Joy Davidson
DCC Training Coordinator and ERPANET British Editor
Humanities Advanced Technology and Information Institute (HATII)
George Service House, 11 University Gardens,
University of Glasgow
Glasgow G12 8QJ
Scotland
Tel: +44(0)141 330 8592
Fax: +44(0)141 330 3788
http://www.dcc.ac.uk
http://www.digitalpreservationeurope.eu
british.editor at erpanet.org