Workshop on Frequent Itemset Mining Implementations (FIMI'04)

November 1, 2004, Brighton, UK
in conjunction with ICDM'04

FIMI Award

The FIMI'04 'diapers and beer' best implementation award was granted to Takeaki Uno, Masashi Kiyomi and Hiroki Arimura for their LCM implementation described in "LCM ver. 2: Efficient Mining Algorithms for Frequent/Closed/Maximal Itemsets".

Proceedings

The proceedings are published online in the CEUR Workshop Proceedings.

Scope and Objectives

Frequent itemset mining (FIM) is a core problem in many data mining tasks, and varied approaches to the problem appear in numerous papers across all data mining conferences. While the problem was introduced in the context of market basket analysis, the scope of the problem is much broader. Generally speaking, the problem involves the identification of items, products, symptoms, characteristics, and so forth, that often occur together in a given dataset. As a fundamental operation in data mining, algorithms for FIM can be used as a building block for other, more sophisticated data mining processes.
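
To make the problem statement concrete, here is a minimal brute-force sketch in Python. It is not one of the workshop implementations: the function name `frequent_itemsets`, the `min_support` parameter (an absolute transaction count), and the toy baskets are all assumptions made for illustration. It enumerates candidate itemsets by increasing size and keeps those contained in at least `min_support` transactions.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset that occurs in at
    least min_support transactions (hypothetical helper, brute force)."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            # Support = number of transactions containing the whole candidate.
            support = sum(1 for t in transactions if set(candidate) <= t)
            if support >= min_support:
                result[frozenset(candidate)] = support
                found_any = True
        if not found_any:
            # Every subset of a frequent itemset is frequent, so if no
            # itemset of this size is frequent, no larger one can be.
            break
    return result

# Toy market-basket data, invented for this example.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"diapers", "milk"},
    {"beer", "chips"},
]
frequent = frequent_itemsets(baskets, min_support=2)
for itemset, support in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

Brute-force enumeration is exponential in the number of items, which is why efficient implementations are a research topic in their own right.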

The first Frequent Itemset Mining Implementations workshop (FIMI), held at ICDM-2003, provided many new and surprising insights. Many of these insights, coupled with the online availability of all source code for every participating implementation, have inspired several follow-up investigations. We therefore expect that this second edition of the workshop will provide further insight into real problems related to the FIM task.

Call for Implementations

Submissions consist of code implementing any or all of the following three main tasks:

all frequent itemset mining,
closed frequent itemset mining, and
maximal frequent itemset mining.
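
The three tasks differ only in which frequent itemsets are reported. The sketch below uses the standard definitions: a frequent itemset is closed if no proper superset has the same support, and maximal if no proper superset is frequent. The function names, the absolute `min_support` count, and the toy transactions are assumptions for illustration, and the brute-force enumeration is only meant to show the definitions, not an efficient algorithm.

```python
from itertools import combinations

def all_frequent(transactions, min_support):
    """Task 1: every itemset contained in at least min_support transactions."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            support = sum(1 for t in transactions if set(candidate) <= t)
            if support >= min_support:
                frequent[frozenset(candidate)] = support
    return frequent

def closed_frequent(frequent):
    """Task 2: frequent itemsets with no proper superset of equal support.
    A superset with equal support is itself frequent, so searching the
    frequent itemsets is sufficient."""
    return {s: n for s, n in frequent.items()
            if not any(s < other and n == m for other, m in frequent.items())}

def maximal_frequent(frequent):
    """Task 3: frequent itemsets with no frequent proper superset."""
    return {s: n for s, n in frequent.items()
            if not any(s < other for other in frequent)}

# Toy data, invented for this example.
transactions = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"a"}]
frequent = all_frequent(transactions, min_support=2)
closed = closed_frequent(frequent)
maximal = maximal_frequent(frequent)

print(len(frequent), "frequent;", len(closed), "closed;", len(maximal), "maximal")
```

On this data every non-empty subset of {a, b, c} is frequent, the closed itemsets are {a}, {a, b} and {a, b, c}, and the only maximal itemset is {a, b, c}. Every maximal itemset is closed and every closed itemset is frequent, so the three outputs are nested.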

In addition to the implementations, each submission must include a paper that describes the implemented algorithms, and provides a performance study on publicly provided datasets. Each paper should also provide a qualitative explanation of why the submitted algorithm performs well when compared to other known approaches.

The submissions will be tested independently by the co-chairs and other members of the organizing committee. All submissions will also be tested on test datasets which will not be made public until after all submissions have been received.

Workshop participants will be required to attend and discuss the submissions. There will be a heavy focus on critical evaluation: what the limitations are, under what conditions an algorithm works well, why it fails in other cases, and what the open areas are. One outcome of the workshop will be to outline the focus for research on new problems in the field.

A submission will be accepted if it correctly implements the given task and meets at least one of two criteria: (1) it is efficient compared with other submissions in the same category, or (2) it provides new insight into the FIM problem. The idea is to highlight both successful ideas and unsuccessful but interesting ones.

Source code that is accepted will be made publicly available (via a web link or source code) on the FIMI repository (with flexible licensing). Each implementation should adhere to the following rules.

Call for Datasets

The data mining community unfortunately lacks publicly available real-life datasets that can be used for benchmarking purposes. Each accepted dataset submission is allowed a one-page description in the workshop proceedings.

Submission Guidelines

All submissions should be sent electronically to fimi@cs.helsinki.fi. The email should contain exactly two files:

the tar or zip file of the entire source code directory (including the Makefile), and
the accompanying paper describing the implementation.

The body of the email should contain the title of the paper, the name of the implemented algorithm, and the list of authors with their respective affiliations and email addresses.

Each implementation should adhere to the following rules.

The accompanying paper should be in PDF or PostScript format only and should not exceed 25 pages, double-spaced, in 12pt font, including all figures, tables and references.

Important dates

Submission Deadline: September 3, 2004
Notification: October 4, 2004
Camera-ready Copies: October 11, 2004
Workshop date: November 1, 2004

Workshop Committee

Program co-chairs

Roberto Bayardo, IBM Almaden, USA
Bart Goethals, Helsinki Institute for Information Technology, Finland
Mohammed J. Zaki, Rensselaer Polytechnic Institute, USA

Program Committee

Charu Aggarwal, IBM Watson, USA
Johannes Gehrke, Cornell University, USA
Jiawei Han, University of Illinois at Urbana-Champaign, USA
Ramakrishnan Srikant, IBM Almaden, USA
Hannu Toivonen, University of Helsinki, Finland