Keynote Speakers


Surajit Chaudhuri, Microsoft Research


Data Debugger: An Operator-Centric Approach
Microsoft Research

Abstract:
Data cleaning aids data analysis and thus complements downstream analysis using data mining and SQL query processing. In this talk, I will try to summarize past developments in data cleaning and give my personal perspective on the state of the art thus far, especially its implications for the data analysis infrastructure (e.g., data mining and SQL query processing have had rather different impacts on the data analysis architecture). In this context, I will describe the direction we are now pursuing in the Data Debugger project at Microsoft Research, which is closely aligned with the query processing paradigm.




William Cohen, Carnegie Mellon University


A Framework for Learning to Query Heterogeneous Data
Machine Learning Department, Carnegie Mellon University

Abstract:
A long-term goal of research on data integration is to develop data models and query languages that make it easy to answer structured queries using heterogeneous data. In this talk, I will describe a very simple query language based on typed similarity queries, which are answered using a graph containing a heterogeneous mixture of textual and non-textual objects. The proposed similarity metric is based on a lazy graph walk, which can be approximated efficiently using methods related to particle filtering. Machine learning techniques can be used to improve this metric for specific tasks, often leading to performance far better than plausible task-specific baseline methods. We experimentally evaluate several classes of similarity queries from two domains, analysis of biomedical text and personal information management. For instance, in one set of experiments, a user's personal information is represented as a graph containing messages, calendar information, social network information, and a timeline, and similarity search is used to find people likely to attend a meeting.

This is joint work with Einat Minkov and Andrew Ng.
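The lazy graph walk and typed similarity query described in the abstract can be illustrated with a small sketch. It assumes one common formulation of a lazy walk: at each step the walker halts with probability gamma, otherwise it moves to a neighbour in proportion to edge weight, and the walk is truncated after a fixed number of steps. The function names (lazy_walk_scores, typed_query), the parameter values, and the toy meeting graph are all hypothetical. The sketch computes the walk distribution exactly by iteration rather than with the particle-filtering-style approximation the talk mentions, and it omits the learned improvements to the metric.

```python
from collections import defaultdict


def lazy_walk_scores(edges, start_nodes, gamma=0.5, steps=6):
    """Score every node by the probability that a lazy walk halts there.

    The walk starts uniformly on `start_nodes`. At each step the walker
    halts with probability `gamma`; otherwise it moves to a neighbour
    chosen in proportion to edge weight. The walk is truncated after
    `steps` steps, so mass still walking at that point is discarded and
    the scores can sum to less than 1.
    """
    # Undirected weighted adjacency: adj[u][v] = total weight of u-v edges.
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w

    walking = {n: 1.0 / len(start_nodes) for n in start_nodes}
    halted = defaultdict(float)
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, mass in walking.items():
            halted[node] += gamma * mass  # fraction of walkers that stop here
            out = adj.get(node, {})
            total = sum(out.values())
            if total == 0:
                continue  # isolated node: the continuing mass is lost
            for nbr, w in out.items():
                nxt[nbr] += (1 - gamma) * mass * w / total
        walking = nxt
    return dict(halted)


def typed_query(edges, node_types, start_nodes, target_type, **walk_args):
    """Rank nodes of `target_type` by lazy-walk similarity to `start_nodes`."""
    scores = lazy_walk_scores(edges, start_nodes, **walk_args)
    ranked = [(n, s) for n, s in scores.items()
              if node_types.get(n) == target_type]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical personal-information graph for a "who will attend?" query.
edges = [
    ("meeting", "msg1", 1.0), ("msg1", "alice", 1.0), ("msg1", "bob", 1.0),
    ("meeting", "msg3", 1.0), ("msg3", "bob", 1.0),
    ("carol", "msg2", 1.0),
]
node_types = {
    "meeting": "meeting", "msg1": "message", "msg2": "message",
    "msg3": "message", "alice": "person", "bob": "person", "carol": "person",
}
ranked = typed_query(edges, node_types, ["meeting"], "person")
print(ranked)
```

In this toy graph, bob ranks above alice because two messages link him to the meeting, while carol is not reachable from the meeting node and receives no score. Restricting the result to nodes of the requested type is what makes the query "typed"; the walk itself is the same for every query.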

Short Bio:
William Cohen received his bachelor's degree in Computer Science from Duke University in 1984, and a PhD in Computer Science from Rutgers University in 1990. From 1990 to 2000 Dr. Cohen worked at AT&T Bell Labs and later AT&T Labs-Research, and from April 2000 to May 2002 he worked at Whizbang Labs, a company specializing in extracting information from the web. Dr. Cohen is a member of the board of the International Machine Learning Society, and has served as an action editor for the Journal of Machine Learning Research, the journal Machine Learning, and the Journal of Artificial Intelligence Research. He co-organized the 1994 International Machine Learning Conference, is Program Committee co-chair for the 2006 International Machine Learning Conference, and has served on more than 20 program or advisory committees.

Dr. Cohen's research interests include information integration and machine learning, particularly information extraction, text categorization and learning from large datasets. He holds seven patents related to learning, discovery, information retrieval, and data integration, and is the author of more than 100 publications.