Tutorial on Web mining at CIKM 2008

Tue Aug 19 14:08:08 EDT 2008

Call for participation:

A half-day tutorial on the afternoon of October 26 on
Web search log analysis and user behavior modeling
by Peiling Wang, Lei Wu, Dietmar Wolfram

at the ACM Seventeenth Conference on Information and Knowledge Management
(CIKM 2008)
(October 26-30, 2008, Napa Valley Marriott Hotel & Spa, California, USA)

Abstract:
Web search logs capture valuable user-generated data as users naturally search 
the Website. These log data can reveal what users were searching for and how 
they searched. However, despite being rich and informative, these transactional
log records are unstructured and messy. The current IR strategies for handling 
structured documents (e.g., tf-idf, vector space) are not readily applicable to 
studying user query log data. The query corpora include a large number of search 
formulations that are short linguistic expressions, which reflects how the 
majority of users interact with the Web. With server-side logs, search 
session boundaries are undefined, which makes individual search sessions 
difficult to identify. Even though individual users can be identified in an 
intranet environment or using client-side logs, identifying individual search 
sessions remains a big challenge. Using data mining strategies and 
technologies, we can process data once into a data model that is simple and 
uniform to allow intensive exploration. We can explore the data in different 
ways to build models of Web search behaviors. However, current data mining 
tools developed for business applications do not apply to transactional query 
logs. Transforming unstructured log data into a relational database for mining 
requires a deep understanding of both IR and data mining. In addition, 
innovative tools must be developed to support ongoing analysis because new 
questions often emerge when the current hypotheses are being studied. Although 
the literature on Web transaction log analysis has grown quickly over the past 
decade, published research, with few exceptions, tends to focus on presenting 
analytical results with insufficient coverage of the technical details needed 
for later researchers to replicate the study using the same or different 
data. Many tools developed by individual projects are not shared outside of 
their research contexts. This gap must be filled so that findings can 
endure cross-examination and can be systematically compared.

This tutorial builds on research that the instructors have conducted on Web 
search behaviors over the past decade. Through a program of intensive research 
projects analyzing large volumes of transactional logs from different search 
environments, one of which is supported by a National Leadership on Research 
grant from the Institute of Museum and Library Services (IMLS), the instructors 
have gained in-depth knowledge of and insight into Web search behaviors, along 
with unique experience and skills in processing and analyzing large Web search 
transactional logs to model these behaviors. This tutorial will teach the 
algorithms and technical 
implementations that participants can use for their own research design and for 
Web applications that incorporate user observation and effective search 
support.

**Early registration closes 22 August 2008**
**Conference hotel is filling up - BOOK NOW**

Since 1992, the ACM Conference on Information and Knowledge Management
(CIKM) has successfully brought together leading researchers and
developers from the database, information retrieval, and knowledge
management communities. The purpose of the conference is to identify
challenging problems facing the development of future knowledge and
information systems, and to shape future research directions through the
publication of high-quality applied and theoretical research findings.
At CIKM 2008, we will continue the tradition of promoting collaboration
among multiple areas. CIKM 2008 topics are in the broad areas of
Databases, Information Retrieval, and Knowledge Management. CIKM 2008
also includes an Industry track.

WEBSITE:  http://www.cikm2008.org/