William Cohen, Carnegie Mellon University
A Framework for Learning to Query Heterogeneous Data
Machine Learning Department, Carnegie Mellon University
Abstract:
A long-term goal of research on data integration is to develop data
models and query languages that make it easy to answer structured
queries using heterogeneous data. In this talk, I will describe a
very simple query language based on typed similarity queries, which
are answered using a graph containing a heterogeneous mixture of
textual and non-textual objects. The proposed similarity metric is
based on a lazy graph walk, which can be approximated efficiently
using methods related to particle filtering. Machine learning
techniques can be used to improve this metric for specific tasks,
often yielding performance far better than that of plausible
task-specific baseline methods. We experimentally evaluate several
classes of similarity queries from the domains of biomedical text
analysis and personal information management: for instance, in one set of
experiments, a user's personal information is represented as a graph
containing messages, calendar information, social network information,
and a timeline, and similarity search is used to find people likely to
attend a meeting.
This is joint work with Einat Minkov and Andrew Ng.
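As a rough illustration of the kind of typed similarity query described above, the sketch below scores the nodes of a small graph by a lazy random walk and answers a typed query by ranking the nodes of the requested type. Everything in it is an assumption made for illustration, not the method presented in the talk: the toy graph `GRAPH`, the hypothetical "type:value" node-naming convention, the stay probability `stay`, the fixed number of steps, and the per-edge-label `weights`. The function `particle_walk` is a plain Monte Carlo particle simulation, a simplified stand-in for the particle-filtering-style approximation the abstract mentions.

```python
import random
from collections import defaultdict

# Hypothetical toy graph: node -> list of (edge_label, neighbor).
# Nodes are named "type:value"; every edge has a reverse edge.
GRAPH = {
    "meeting:m1": [("attendee", "person:alice"), ("on_date", "date:d1")],
    "person:alice": [("attends", "meeting:m1"), ("sent", "email:e1")],
    "email:e1": [("sent_by", "person:alice"), ("sent_to", "person:bob"),
                 ("on_date", "date:d1")],
    "person:bob": [("received", "email:e1")],
    "date:d1": [("date_of", "meeting:m1"), ("date_of", "email:e1")],
}


def lazy_walk(graph, start, steps, stay=0.5, weights=None):
    """Exact node distribution after `steps` steps of a lazy walk.

    At each step a walker stays put with probability `stay`; otherwise it
    follows an outgoing edge chosen in proportion to its label's weight
    (default 1.0). Nodes with no usable edges keep all their mass.
    """
    weights = weights or {}
    dist = dict(start)
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, p in dist.items():
            edges = graph.get(node, [])
            total = sum(weights.get(lbl, 1.0) for lbl, _ in edges)
            if total == 0:
                nxt[node] += p  # dead end: the walker cannot move
                continue
            nxt[node] += stay * p
            for lbl, nbr in edges:
                nxt[nbr] += (1 - stay) * p * weights.get(lbl, 1.0) / total
        dist = dict(nxt)
    return dist


def particle_walk(graph, start, steps, stay=0.5, weights=None, n=10000,
                  seed=0):
    """Monte Carlo estimate of lazy_walk using n independent particles."""
    weights = weights or {}
    rng = random.Random(seed)
    nodes, probs = zip(*start.items())
    particles = rng.choices(nodes, weights=probs, k=n)
    for _ in range(steps):
        for i, node in enumerate(particles):
            edges = graph.get(node, [])
            w = [weights.get(lbl, 1.0) for lbl, _ in edges]
            if sum(w) == 0 or rng.random() < stay:
                continue  # particle stays where it is
            particles[i] = rng.choices([nbr for _, nbr in edges], weights=w)[0]
    estimate = defaultdict(float)
    for node in particles:
        estimate[node] += 1.0 / n
    return dict(estimate)


def typed_query(dist, node_type):
    """Rank nodes of one type by walk probability, highest first."""
    hits = [(node, p) for node, p in dist.items()
            if node.startswith(node_type + ":") and p > 0]
    return sorted(hits, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Typed query: which people are most similar to meeting m1?
    exact = lazy_walk(GRAPH, {"meeting:m1": 1.0}, steps=3)
    approx = particle_walk(GRAPH, {"meeting:m1": 1.0}, steps=3)
    print(typed_query(exact, "person"))
    print(typed_query(approx, "person"))
```

Here the query starts all probability mass at `meeting:m1`. After three steps Alice, who is linked directly to the meeting, outranks Bob, who is reachable only through an email. The per-label weights are one natural set of parameters a learner could tune for a specific task; the sketch simply leaves them fixed.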
Short Bio:
William Cohen received his bachelor's degree in Computer Science from
Duke University in 1984, and a PhD in Computer Science from Rutgers
University in 1990. From 1990 to 2000 Dr. Cohen worked at AT&T Bell
Labs and later AT&T Labs-Research, and from April 2000 to May 2002 Dr.
Cohen worked at Whizbang Labs, a company specializing in extracting
information from the web. Dr. Cohen is a member of the board of the
International Machine Learning Society, and has served as an action
editor for the Journal of Machine Learning Research, the journal
Machine Learning, and the Journal of Artificial Intelligence Research.
He co-organized the 1994 International Machine Learning Conference, is
co-chair of the Program Committee for the 2006 International Machine
Learning Conference, and has served on more than 20 program committees
or advisory committees.

Dr. Cohen's research interests include information integration and
machine learning, particularly information extraction, text
categorization and learning from large datasets. He holds seven
patents related to learning, discovery, information retrieval, and
data integration, and is the author of more than 100 publications.