Department of Computer Science,
Room# 3302, 10 King's College Road
Toronto, ON M5S 3G4, Canada
E-MAIL:
WEB: http://queens.db.toronto.edu/~amit/
AREAS OF INTEREST
Data Cleaning, Fuzzy Matching, Database Systems, Information Retrieval and Data Mining
EDUCATION
2005-2007
Master of Science, Advisor: Prof. Nick Koudas
Department of Computer Science,
University of Toronto, Toronto, Canada
2001-2005
Bachelor of Technology, Advisor: Prof. Sunita Sarawagi
Department of Computer Science and Engineering,
Indian Institute of Technology, Bombay, INDIA
2001
Senior Secondary School Certificate,
Emmanuel Mission School, Kota, INDIA
PUBLICATIONS
Conferences
Matei A. Zaharia, Amit Chandel, Stefan Saroiu, and Srinivasan Keshav, Finding Content in File-Sharing Networks When You Can't Even Spell,
To appear in the Proceedings of the Sixth International Peer-to-Peer Workshop (IPTPS), Bellevue, WA, February 2007.
Amit Chandel, Nick Koudas, Ken Pu and Divesh Srivastava, Fast Identification of Relational Constraint Violations,
To appear in the 23rd IEEE Int'l Conference on Data Engineering (ICDE), 2007.
Technical Reports
Batch Top-k Searches on Text Columns, Advisor: Prof. Sunita Sarawagi
Senior Undergraduate Thesis Report, IIT Bombay, May 2004.
Summarizing Tree Structured XML Data Quantitatively
Amit Chandel, Nilesh Bansal, Laks V. S. Lakshmanan and Raymond T. Ng
Technical Report, University of British Columbia, Vancouver, July 2004
Keyword Search in Databases, Advisor: Prof. S. Sudarshan
Junior Undergraduate Thesis Report, IIT Bombay, April 2004.
RESEARCH EXPERIENCE
SPIDER, July 2006 - Jan 2007
Advisor: Prof. Nick Koudas, University of Toronto
Data quality is a serious concern in every organization that relies on data.
The quality of data is commonly poor due to a multitude of reasons including,
but not limited to, spelling mistakes, abbreviations, lack of standards and
inconsistent notations.
SPIDER is a declarative data cleaning tool. It incorporates a set of algorithms
that can be used to improve data quality on any relational data source.
SPIDER can be used for flexible querying, approximate joins, schema matching and
data exploration.
Fast Identification of Relational Constraint Violations, Jan - July 2006
Advisor: Prof. Nick Koudas, University of Toronto
Built and maintained specialized BDD-based
logical indices on relational tables and described
query rewrite rules that use these indices efficiently
to quickly identify violated relational constraints.
Implemented this approach in C++ on top of a Postgres
database using ODBC and tested it on large collections
of real and synthetic data sets.
Efficient Batch Top-k Search for Dictionary-based Entity Recognition, Aug 2004 - Aug 2005
Advisor: Prof. Sunita Sarawagi, IIT Bombay
We consider the problem of speeding up Entity Recognition
systems that exploit existing large databases of structured
entities to improve extraction accuracy. These systems
require the computation of the maximum similarity scores of
several overlapping segments of the input text with the entity
database. We formulate a Batch-Top-K problem with
the goal of sharing computations across overlapping segments.
Our proposed algorithm runs a factor of three
faster than independent Top-K queries and is only a factor of
two slower than an unachievable lower bound on total cost.
We then propose a novel modification of the popular Viterbi
algorithm for recognizing entities so as to work with easily
computable bounds on match scores, thereby reducing the
total inference time by a factor of eight compared to
state-of-the-art methods.
Data Integration from Web-Pages, Feb 2005 - Apr 2005
Advisor: Prof. Soumen Chakrabarti, IIT Bombay
Designed a technique to extract publication
entries from web pages and store these entries
in a structured database. The structured database is created in two steps: the first
step identifies individual publication entries and the second step performs fine-grained information
extraction. For the first step, we implemented a classifier on DOM nodes, while for the second
step we implemented an efficient inference algorithm using the A* technique.
Network Intrusion Detection using Stide Methodology, Feb 2005 - Apr 2005
Advisor: Prof. Sunita Sarawagi, IIT Bombay
Designed an intelligent system to automatically detect possible
events of network intrusion. The system monitored network logs
generated by tcpdump (per-packet activity) for anomalies and
raised a flag whenever observed behavior deviated significantly from normal.
We employed the stide methodology for classification,
using sequences of consecutive log records (over a sliding window of fixed size) to
represent activity. The basic approach is to construct a normal dictionary from data collected
when there was no intrusion. This dictionary is used to compute the anomaly count of incoming
log data. The stide methodology has previously been shown to be effective in system intrusion detection
problems. We proposed a novel encoding scheme for sequences
in the network activity log that enabled us to use the same technique in this domain as well.
We verified our results by experimenting on real-world datasets.
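The sliding-window idea described above can be sketched roughly as follows. This is only an illustrative Python sketch, not the project's implementation: the function names, the window size of 3 and the string symbols standing in for encoded log records are all assumptions, and the encoding scheme proposed in the project is not reproduced here.

```python
from typing import Iterable, List, Sequence, Set, Tuple

Window = Tuple[str, ...]


def windows(events: Sequence[str], size: int) -> List[Window]:
    """Return every run of `size` consecutive events (empty if too short)."""
    return [tuple(events[i:i + size]) for i in range(len(events) - size + 1)]


def build_normal_dictionary(traces: Iterable[Sequence[str]], size: int) -> Set[Window]:
    """Collect all windows seen in intrusion-free traces."""
    normal: Set[Window] = set()
    for trace in traces:
        normal.update(windows(trace, size))
    return normal


def anomaly_count(events: Sequence[str], normal: Set[Window], size: int) -> int:
    """Count windows of the incoming trace that never occurred in normal data."""
    return sum(1 for w in windows(events, size) if w not in normal)


# Hypothetical symbols; a real system would first encode tcpdump records.
normal_traces = [
    ["syn", "syn-ack", "ack", "data", "fin"],
    ["syn", "syn-ack", "ack", "data", "data", "fin"],
]
normal = build_normal_dictionary(normal_traces, size=3)

incoming = ["syn", "syn", "syn", "syn-ack", "ack", "data", "fin"]
count = anomaly_count(incoming, normal, size=3)
```

A flag would then be raised when the count (or its rate over recent windows) exceeds a chosen threshold; the threshold itself is a tuning choice not specified here.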
Summarizing Tree Structured XML Data Quantitatively, May - Nov 2004
Advisors: Prof. Laks Lakshmanan and Prof. Raymond Ng, UBC, Vancouver
We developed an algorithm for constructing a summary of an XML
document that captures the structural aspects of its schema and can be used
for other tasks such as query result size estimation,
structural compression and exploration. The summary preserves
various kinds of quantitative information, which can be used to determine
the number of edges or paths following a certain label pattern.
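As a rough illustration of the kind of quantitative information such a summary can hold, the sketch below counts how many nodes are reached by each root-to-node label path. It is a generic path-count summary written for this example, not the algorithm developed in the project; the function name and the sample document are assumptions.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def path_counts(xml_text: str) -> Counter:
    """Map each root-to-node label path to the number of nodes it reaches."""
    root = ET.fromstring(xml_text)
    counts: Counter = Counter()
    stack = [(root, "/" + root.tag)]
    while stack:
        node, path = stack.pop()
        counts[path] += 1
        # Children extend the label path of their parent by one step.
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return counts


# Hypothetical sample document.
doc = "<lib><book><title/><author/><author/></book><book><title/></book></lib>"
summary = path_counts(doc)
```

With this summary, the result size of a simple path query such as /lib/book/author can be read directly from `summary` without scanning the document again.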
Managing Database Snapshots in Mobile Environment, Aug - Nov 2004
Advisor: Prof. Krithi Ramamritham, IIT Bombay
We designed methods and tools to assist in building database applications
for mobile devices, which suffer frequent communication
breakdowns. The key idea is to maintain a partial, weakly consistent view of the
central database on the mobile device while disconnected and to
synchronize the data when the connection is available.
TALKS AND SEMINARS
Keyword Search in Databases, April 2004
Dept. of CSE, IIT Bombay
Presented an in-depth study of various systems like DBXplorer, DISCOVER,
DTL's DataSpot, BanKS and XRank, which enable keyword search over relational
databases, highlighting important features like relevance ranking and proximity
search.
Stock Market Prediction Using Neural Networks, March 2004
Dept. of CSE, IIT Bombay
Presented a talk to demonstrate the use of Back Propagation Neural Networks
for stock market prediction with an overview of Back Propagation NN and the
design of the prediction model.
SELECTED PROJECTS
IITB Navigator, Aug - Nov 2003
Advisor: Prof. S. Sudarshan, IIT Bombay
Developed a GUI with web front-end to locate different people, places and
locations of various ongoing events in a region, showing the shortest path to
the destination on a map.
CMS: Course Management System, May - Dec 2003
Advisor: Prof. S. Sudarshan, IIT Bombay
Provided a common web-based interface between instructors, students and teaching
assistants in an institute for routine tasks such as giving and submitting
assignments, assigning projects and demo scheduling, course information, notices,
grading and messages. Implemented using servlets, JDBC, SQL and Java.
ACADEMIC HONORS
* Selected for the University Fellowship Program at University of
Toronto, Toronto (Sep. 2005).
* Selected for the summer internship program at the
University of British Columbia, Vancouver, BC, Canada (May 2004).
* Awarded the Institute Scholarship for Academic Excellence (2001)
by Indian Institute of Technology, Bombay.