[Asis-l] ACM SIGIR 2012 Workshop on Open Source Information Retrieval

Andrew Trotman andrew at cs.otago.ac.nz
Mon May 14 18:38:55 EDT 2012


ACM SIGIR 2012 WORKSHOP ON OPEN SOURCE INFORMATION RETRIEVAL
16 August 2012, Portland, Oregon, USA
http://opensearchlab.otago.ac.nz/

OVERVIEW (The short version)
This workshop provides a venue for users and authors of open source IR
tools to get together and discuss their joint future. Of particular interest
is how to work together to build OpenSearchLab, an open source, live and
functioning, online web search engine for research purposes.

Short papers (posters) and Demonstrations are encouraged.

Schedule
July 2, 2012 Deadline for Paper, Poster, and Demo Submissions
August 16, 2012 SIGIR 2012 Workshop on Open Source Information Retrieval 

Submission Details
Submissions must be original work, not previously published, and not
currently submitted elsewhere. Full papers (8 pages) and short papers and
demos (4 pages) should be submitted as PDFs in the ACM format. Submitted
works will be fully peer-reviewed by an international Program Committee.



DETAILS (The long version)
INTRODUCTION
The open source IR community has been strong for many years. Early search
engines (such as MG) continue to be used in larger open source projects
(such as Greenstone). More recent open source search engines (such as Apache
Lucene) are used to power the search facilities of some of the largest
technology companies (such as IBM, AOL, and Apple). In the academic
community, such search engines are routinely used to test ranking functions,
compression algorithms, user interfaces, and so on. Open Source IR is now an
essential component of research and commerce. This workshop provides a
venue for users and authors of open source IR tools to get together and
discuss their joint future.

Of particular interest is how to work together to build OpenSearchLab, an
open source, live and functioning, online web search engine for research
purposes. We believe that the tools to build it mostly exist, that by
working together it can be built, and that it will transform the future of
research in IR.

TOPICS OF INTEREST
Position papers and posters on open source IR as well as demos of exciting
packages are sought. Topics include, but are not limited to: Software
Engineering; Hardware Engineering; Evaluation; Needs, Desires, and Demos;
Protocols.

The selection process will give extra consideration to papers on search
engines, particularly on building a web-scale live search engine; however,
all aspects of Information Retrieval will be considered.

Software Engineering
Software engineering is not normally discussed at the SIGIR conference, but
this workshop will provide such an opportunity. Writing a scalable search
engine is not a trivial task. The author must consider HTML parsing,
stemming, index compression, relevance feedback, and so on. Designing
maintainable software of this magnitude is outside the skill set of most
graduate students, but there is likely to be consensus on the design. This
workshop will provide a venue to discuss proven designs.

Software maintenance, including the use of version control, regression
tests, release schedules, bug tracking, and so on, is an important aspect of
any software project. IR has some special issues. For example, parallel
indexing may lead to non-deterministic indexes, which in turn cause problems
for regression tests.
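
As an illustration of the regression-test problem, the following is a minimal
Python sketch, not taken from any particular search engine. It assumes a toy
in-memory inverted index in which parallel workers append postings in whatever
order they finish. The function names (build_index, canonical_digest) are
hypothetical. One possible remedy is to compare a canonical form of the index
(terms and postings sorted) rather than the raw output:

```python
import hashlib
import json
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor, as_completed


def index_document(doc_id, text):
    """Return the document id and its distinct lower-cased terms."""
    return doc_id, sorted(set(text.lower().split()))


def build_index(docs, workers=4):
    """Build a term -> [doc ids] index; posting order depends on thread timing."""
    index = defaultdict(list)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(index_document, d, t) for d, t in docs.items()]
        for future in as_completed(futures):  # completion order is not fixed
            doc_id, terms = future.result()
            for term in terms:
                index[term].append(doc_id)
    return dict(index)


def canonical_digest(index):
    """Hash a canonical form of the index: sorted terms, sorted postings."""
    canonical = {term: sorted(postings) for term, postings in index.items()}
    payload = json.dumps(canonical, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


docs = {"d1": "open source search", "d2": "search engine", "d3": "open index"}
parallel_index = build_index(docs)

# The same index written by hand with postings in a different order.
reference_index = {
    "open": ["d3", "d1"],
    "source": ["d1"],
    "search": ["d2", "d1"],
    "engine": ["d2"],
    "index": ["d3"],
}

digest_parallel = canonical_digest(parallel_index)
digest_reference = canonical_digest(reference_index)
```

A regression test can then compare digests instead of raw index files. This
only works when posting order carries no meaning; an index that stores
postings in impact order, for example, would need a different canonical form.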

Selection and implementation of algorithms and data structures can lead to
significant differences in the performance of large-scale IR systems, yet
retrieval efficiency issues are not well represented at SIGIR. For example,
dynamic pruning techniques such as MaxScore and WAND (which allow efficient
document scoring without decreasing effectiveness at rank K) have
implementation intricacies that are rarely described properly in the literature.
While "implementation details" are not generally interesting in a research
paper, they are critical when designing and evaluating new experiments.
Discussion on these topics will be sought.

Hardware Engineering
The open source community has several approaches to hardware. The Hadoop
community believes that hardware is inexpensive and easily obtainable. The
smart phone community believes that hardware is expensive and a user will
have only one machine. The cloud community believes that hardware is
infinite and need only be paid for if used. Each philosophy brings a
different approach to the architecture and design of a search engine. Open
debate on these philosophies could result in increased efficiency for cloud
services as well as increased scalability of smart phone services.

Evaluation
There is variation in the performance of the same algorithms implemented in
different systems. It is well understood that two different BM25
implementations are unlikely to produce the same MAP scores, but the
variance is unknown. It is academically important to understand this
variance, and bringing together a number of open source systems that
(purportedly) implement the same algorithms is one way to understand it.
Doing so will also allow us to explore the best way to implement each search
engine component.
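
To make the point about BM25 concrete, here is a small Python sketch under
stated assumptions: a three-document toy collection, the common defaults
k1 = 1.2 and b = 0.75, and two widely used IDF formulations (the classic
Robertson-Sparck Jones form, which can go negative for common terms, and a
smoothed form that adds 1 inside the logarithm). The function names are
hypothetical and the code does not reproduce any specific system.

```python
import math


def idf_classic(N, n):
    """Robertson-Sparck Jones IDF; negative when n > N / 2."""
    return math.log((N - n + 0.5) / (n + 0.5))


def idf_smoothed(N, n):
    """Variant with 1 added inside the logarithm; always positive."""
    return math.log(1.0 + (N - n + 0.5) / (n + 0.5))


def bm25(query_terms, doc, docs, idf, k1=1.2, b=0.75):
    """Score one tokenised document against the query with the given IDF."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    score = 0.0
    for term in query_terms:
        n = sum(1 for d in docs if term in d)  # document frequency
        tf = doc.count(term)                   # term frequency in this document
        if n == 0 or tf == 0:
            continue
        length_norm = k1 * (1.0 - b + b * len(doc) / avgdl)
        score += idf(N, n) * tf * (k1 + 1.0) / (tf + length_norm)
    return score


def rank(query_terms, docs, idf):
    """Return document positions ordered from best to worst score."""
    scores = [bm25(query_terms, d, docs, idf) for d in docs]
    return sorted(range(len(docs)), key=lambda i: -scores[i])


docs = [
    ["open", "source", "search"],
    ["search", "engine", "search"],
    ["index", "compression"],
]
query = ["search", "compression"]

ranking_classic = rank(query, docs, idf_classic)
ranking_smoothed = rank(query, docs, idf_smoothed)
```

On this toy collection the two variants agree on the top document but
disagree on the order of the two documents that contain "search", because
that term occurs in more than half of the collection and so receives a
negative weight under the classic IDF. Differences of this kind are one
source of the variation between implementations.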

Needs, Desires, and Demos
This workshop will provide a venue for members of the community to
demonstrate and discuss their tools and the direction of those tools. It
will allow users to discuss requirements, and developers and users to work
together on a future for those tools.

Protocols
This is the first time the open source IR community has come together with
the intention of working together and interoperating. To do so requires
standard protocols, and these are solicited. Such protocols include
network-based protocols as well as object interfaces. As an example, the
standard model of a distributed search engine has three components: the
client; brokers; and search engines. The client issues a search request to a
broker, which distributes it to several search engines, which search in
parallel. The broker then merges the results so that the set of search
engines appears as one. If the communications protocols in this distributed
environment were standardized, then it would be possible to mix-and-match
search engines, brokers, and clients. This would permit parallel development
of systems by those who are expert in each part and it would allow those
with search engines to easily provide a distributed search engine. Moreover,
it would allow for the in-place comparison of different functional
components.
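
The client, broker, and search engine model described above can be sketched
in a few lines of Python. This is only an illustration: the SearchEngine
callable signature, the make_engine toy engine, and the merge-by-score rule
are assumptions made for the example, not a proposed standard. In particular,
merging on raw scores assumes that scores from different engines are
comparable, which is not true in general.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# A hit is (score, document id). SearchEngine is a hypothetical interface:
# any callable that takes a query and a result count and returns hits.
Hit = Tuple[float, str]
SearchEngine = Callable[[str, int], List[Hit]]


def broker_search(query: str, engines: List[SearchEngine], k: int = 10) -> List[Hit]:
    """Send the query to every engine in parallel and merge the top-k hits."""
    if not engines:
        return []
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query, k), engines))
    all_hits = [hit for hits in result_lists for hit in hits]
    # Highest score first; ties are broken by document id for determinism.
    return sorted(all_hits, key=lambda h: (-h[0], h[1]))[:k]


def make_engine(shard: Dict[str, str]) -> SearchEngine:
    """Toy engine over one shard; the score is the count of query terms."""
    def search(query: str, k: int) -> List[Hit]:
        terms = query.lower().split()
        hits = []
        for doc_id, text in shard.items():
            words = text.lower().split()
            score = float(sum(words.count(t) for t in terms))
            if score > 0:
                hits.append((score, doc_id))
        return sorted(hits, key=lambda h: (-h[0], h[1]))[:k]
    return search


engine_a = make_engine({"a1": "open source search engine", "a2": "compression algorithms"})
engine_b = make_engine({"b1": "search search ranking", "b2": "user interfaces"})

# The client sees a single result list, as if there were one search engine.
results = broker_search("search engine", [engine_a, engine_b], k=3)
```

Because the broker depends only on the callable signature, any engine that
honours it could be swapped in, which is the mix-and-match property that
standard protocols would provide across real systems.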

Agreeing on object level interfaces will help increase the interoperability
of source code for standard IR tasks such as stemming, relevance feedback,
ranking, compression, and so on.
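
As a sketch of what an object-level interface might look like, the Python
below defines a hypothetical Stemmer interface and two toy implementations.
The names (Stemmer, SuffixStemmer, IdentityStemmer, normalise) are invented
for illustration, and SuffixStemmer is deliberately trivial; it is not the
Porter algorithm or any other published stemmer.

```python
from typing import Iterable, List, Protocol


class Stemmer(Protocol):
    """Hypothetical shared interface: one token in, one stem out."""

    def stem(self, token: str) -> str:
        ...


class SuffixStemmer:
    """Toy stemmer that strips a trailing 's' from tokens longer than 3."""

    def stem(self, token: str) -> str:
        if len(token) > 3 and token.endswith("s"):
            return token[:-1]
        return token


class IdentityStemmer:
    """Baseline that performs no stemming at all."""

    def stem(self, token: str) -> str:
        return token


def normalise(tokens: Iterable[str], stemmer: Stemmer) -> List[str]:
    """Lower-case and stem tokens using whichever stemmer is supplied."""
    return [stemmer.stem(token.lower()) for token in tokens]


stemmed = normalise(["Engines", "index"], SuffixStemmer())
unstemmed = normalise(["Engines", "index"], IdentityStemmer())
```

Code written against the interface, such as normalise here, does not need to
change when one stemmer is replaced by another, so components from different
projects could be compared in place.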

OpenSearchLab
Many of the open source IR research tools have been developed in isolation
with separate goals. At the same time many software authors have raised the
issue of open global web search (a common goal). The next step in open
source IR is a fully-functional online web search engine. Such an engine
would provide research opportunities in several areas including: web
crawling, document parsing (including decoration removal), searching, user
interfaces, click-log mining, and so on. This will advance academic research
in search engines beyond the silos of isolated individuals and into a global
community of researchers working together.

PLANNED ACTIVITIES
Submitted Papers, Posters, and Demos will be fully peer-reviewed by an
international Program Committee. The program committee will select
submissions for presentation in the form most appropriate for that
submission. Time will be allocated for presentation by full paper (with
discussion), by demonstration, or by poster discussion.

Discussion on OpenSearchLab will form the basis of one session.

GOALS
We will discuss the issues of Open Source IR in an open forum. This
face-to-face discussion is invaluable when considering the future direction
of the movement. It will provide an opportunity to agree on standards, on
unaddressed issues (gaps), and on ways to share engineering. It will provide
an opportunity for those who use open source IR to work with software
authors. Importantly, it will allow us to work as a community to discuss the
viability of OpenSearchLab and to plan a future.

SCHEDULE
July 2, 2012 Deadline for Paper, Poster, and Demo Submissions 
July 23, 2012 Notification of Acceptance 
July 30, 2012 Deadline for Camera Ready Copies 
August 16, 2012 SIGIR 2012 Workshop on Open Source Information Retrieval 

SUBMISSION DETAILS
Submissions must be original work, not previously published elsewhere, and
not currently submitted to any other conference, workshop, or journal.
Submission of a paper should be regarded as an undertaking that, should the
paper be accepted, at least one of the authors will attend the workshop to
present the work.

PDFs should be submitted by the deadline. Full papers are expected to be 8
pages in length in the ACM format. Posters and demos are expected to be 4
pages in length in the ACM format.

Submitted Papers, Posters, and Demos will be fully peer-reviewed by an
international Program Committee. The selection process will give extra
consideration to papers on search engines, particularly on building a
web-scale live search engine; however, all aspects of Information Retrieval
will be considered.



