My research interests are Database Systems, Information Retrieval and Data Mining in general and
data cleaning, fuzzy matching and keyword search over databases in particular.
Data Cleaning:
In a data warehouse, collecting data from various independent data sources often introduces
duplicate entries for the same entity because the sources follow inconsistent conventions.
For example, a student's name may appear as "Amit Chandel" in the department's database,
whereas it is "Chandel, Amit" in the university's database.
Another source of inconsistency is data-entry mistakes.
For example, "Amit Chandel" can be mistyped as "Ameet Chandel".
These errors and inconsistencies are undesirable for decision-support applications, and hence
they need to be detected and corrected.
The process of detecting and correcting such errors is often referred to as data cleaning. Data-entry
mistakes are detected using fuzzy matching. Developing robust data cleaning techniques is challenging since
the errors and inconsistencies are domain-specific.
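As a rough illustration of the two kinds of inconsistency described above, the following minimal sketch normalizes a "Last, First" name to "first last" order and then scores two names with a generic string-similarity ratio. It is only an assumption-laden toy, not the technique used in the work listed here: the helper names (normalize, similarity, is_probable_duplicate) and the 0.85 threshold are hypothetical, and Python's standard difflib.SequenceMatcher stands in for a real fuzzy-matching measure.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase a name and rewrite 'Last, First' as 'first last'."""
    name = name.strip().lower()
    if "," in name:
        # Convention mismatch: "Chandel, Amit" versus "Amit Chandel".
        last, first = (part.strip() for part in name.split(",", 1))
        name = f"{first} {last}"
    # Collapse repeated whitespace.
    return " ".join(name.split())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_probable_duplicate(a: str, b: str, threshold: float = 0.85) -> bool:
    """Flag two names as likely the same entity.

    The threshold is an illustrative assumption; in practice it would be
    tuned per domain, since errors and conventions are domain-specific.
    """
    return similarity(a, b) >= threshold


if __name__ == "__main__":
    pairs = [
        ("Amit Chandel", "Chandel, Amit"),   # inconsistent convention
        ("Amit Chandel", "Ameet Chandel"),   # data-entry mistake
        ("Amit Chandel", "Nick Koudas"),     # different people
    ]
    for a, b in pairs:
        print(f"{a!r} vs {b!r}: {similarity(a, b):.2f}")
```

With these assumptions, the convention mismatch scores 1.0 after normalization and the misspelling scores 0.88, so both are flagged, while the unrelated pair falls far below the threshold.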
Amit Chandel, Nick Koudas, Ken Pu and Divesh Srivastava, Fast Identification of Relational Constraint Violations,
Submitted to the 23rd IEEE Int'l Conference on Data Engineering (ICDE), 2007.
Technical Reports:
Batch Top-k Searches on Text Columns, Advisor: Prof. Sunita Sarawagi
Senior Undergraduate Thesis Report, IIT Bombay, May 2004.
Keyword Search in Databases, Advisor: Prof. S. Sudarshan
Junior Undergraduate Thesis Report, IIT Bombay, April 2004.