Amit Chandel






Projects
Research Projects

SPIDER, (July 2006 - Present)
Data quality is a serious concern in every organization that relies on data. The quality of data is commonly poor due to a multitude of reasons including, but not limited to, spelling mistakes, abbreviations, lack of standards and inconsistent notations.
SPIDER is a declarative data cleaning tool. It incorporates a set of algorithms that help improve data quality on any relational data source. SPIDER can be used for flexible querying, approximate joins, schema matching and data exploration.
Advisor: Prof. Nick Koudas, University of Toronto

Fast Identification of Relational Constraint Violations, (Jan - July 2006)
Logical constraints (e.g., phone numbers in Toronto can have only the prefixes 416, 647 and 905) are ubiquitous in relational databases. Traditional integrity constraints, such as functional dependencies, are examples of such logical constraints as well. However, under frequent database updates, schema evolution and transformations, they can be easily violated. As a result, tables become inconsistent and data quality is degraded.
We study the problem of validating collections of user-defined constraints on a number of relational tables. Our primary goal is to quickly identify which tables violate such constraints. Logical constraints are potentially complex logical formulae, and we demonstrate that they cannot be efficiently evaluated by SQL queries. In order to enable fast identification of constraint violations, we propose to build and maintain specialized logical indices on the relational tables. We choose Boolean Decision Diagrams (BDD) as the index structure to aid in this task. We first propose efficient algorithms to construct and maintain such indices in a space-efficient manner. We then describe a set of query rewrite rules that aid in the efficient utilization of logical indices during constraint validation.
We have implemented our approach on top of a relational database and tested our techniques using large collections of real and synthetic data sets. Our results indicate that utilizing our techniques in conjunction with logical indices during constraint validation offers very significant performance advantages.
Advisor: Prof. Nick Koudas, University of Toronto
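
For illustration only, the phone-prefix constraint mentioned above can be checked with a plain SQL scan. This is the kind of baseline the project set out to improve on, not the BDD-based index itself. The table name `customers`, its columns and the sample rows are all hypothetical.

```python
import sqlite3

# Prefixes taken from the example constraint in the project description.
ALLOWED_PREFIXES = ("416", "647", "905")


def violating_rows(conn):
    """Return (name, phone) for Toronto rows whose prefix is not allowed."""
    placeholders = ",".join("?" for _ in ALLOWED_PREFIXES)
    query = (
        "SELECT name, phone FROM customers "
        "WHERE city = 'Toronto' "
        f"AND substr(phone, 1, 3) NOT IN ({placeholders}) "
        "ORDER BY name"
    )
    return conn.execute(query, ALLOWED_PREFIXES).fetchall()


def demo():
    """Build a tiny in-memory table (hypothetical data) and validate it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (name TEXT, city TEXT, phone TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [
            ("Ann", "Toronto", "4165550100"),   # satisfies the constraint
            ("Bob", "Toronto", "2125550101"),   # violates it
            ("Cy", "Montreal", "5145550102"),   # constraint does not apply
        ],
    )
    return violating_rows(conn)


if __name__ == "__main__":
    print(demo())
```

A scan like this touches every row on each validation, which is why an index over the constraint becomes attractive once there are many constraints and many tables.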

Efficient Batch Top-k Search for Dictionary-based Entity Recognition, (Aug 2004 - Aug 2005)
We consider the problem of speeding up Entity Recognition systems that exploit existing large databases of structured entities to improve extraction accuracy. These systems require the computation of the maximum similarity scores of several overlapping segments of the input text with the entity database. We formulate a Batch-Top-K problem with the goal of sharing computations across overlapping segments. Our proposed algorithm runs a factor of three faster than independent Top-K queries and only a factor of two slower than an unachievable lower bound on total cost. We then propose a novel modification of the popular Viterbi algorithm for recognizing entities so as to work with easily computable bounds on match scores, thereby reducing the total inference time by a factor of eight compared to state-of-the-art methods.
Advisor: Prof. Sunita Sarawagi, IIT Bombay

Data Integration from Web-Pages, (Feb-Apr, 2005)
Designed a technique to extract publication entries from web pages and store them in a structured database. The structured database is created in two steps: the first step identifies individual publication entries and the second step performs fine-grained information extraction. For the first step, we implemented a classifier on DOM nodes, while for the second step we implemented an efficient inference algorithm using the A* technique.
Advisor: Prof. Soumen Chakrabarti, IIT Bombay

Network Intrusion Detection using Stide Methodology, (Feb-Apr, 2005)
Designed an intelligent system to automatically detect possible events of network intrusion. The system monitored network logs generated by tcpdump (per-packet activity) for anomalies and raised a flag whenever observed behavior deviated significantly from normal. We employed the stide methodology for classification, using sequences of consecutive log records (over a sliding window of fixed size) to represent activity. The basic approach is to construct a normal dictionary from data collected when there was no intrusion. This dictionary is used to compute the anomaly count of incoming log data. The stide methodology has previously been shown to be effective in system intrusion detection problems. We proposed a novel encoding scheme for sequences in the network activity log that enabled us to use the same technique in this domain as well. Our results were verified by experiments on real-world datasets.
Advisor: Prof. Sunita Sarawagi, IIT Bombay
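
The sliding-window idea described above can be sketched as follows. This is a minimal illustration, assuming each log record has already been encoded as a short string; the project's own encoding scheme is not described here, so the record values and function names are hypothetical.

```python
def windows(records, k):
    """Yield every run of k consecutive records as a tuple."""
    for i in range(len(records) - k + 1):
        yield tuple(records[i:i + k])


def build_normal_dictionary(normal_records, k):
    """Collect all length-k sequences seen in intrusion-free data."""
    return set(windows(normal_records, k))


def anomaly_count(records, normal_dictionary, k):
    """Count length-k sequences that never occurred in the normal data."""
    return sum(1 for w in windows(records, k) if w not in normal_dictionary)


if __name__ == "__main__":
    # Hypothetical encoded records from a period with no intrusion.
    normal_log = ["syn", "ack", "data", "ack", "fin"] * 2
    normal = build_normal_dictionary(normal_log, k=3)

    # Hypothetical incoming traffic with an unusual burst of "syn" records.
    incoming = ["syn", "syn", "syn", "ack", "data"]
    print(anomaly_count(incoming, normal, k=3))
```

A deployed detector would typically compare the count against a threshold over a recent stretch of traffic before raising a flag; that threshold is a tuning choice and is left out of the sketch.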

Summarizing Tree Structured XML Data Quantitatively, (May-Nov 2004)
Developed an algorithm for constructing a summary of an XML document to discover the structural aspects of its schema, and to use the summary for other tasks such as query result size estimation, structural compression and exploration. The summary preserves various kinds of quantitative information, which can be used to extract knowledge about the number of edges or paths following a certain label pattern.
Advisor: Prof. Laks Lakshmanan and Prof. Raymond Ng, UBC, Vancouver
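
As a rough illustration of a quantitative summary, the sketch below counts how many nodes are reached by each root-to-node label path. It is a simple path-count table, not the summarization algorithm developed in the project, and the sample document is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def path_summary(xml_text):
    """Map each root-to-node label path to the number of nodes it reaches."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    stack = [(root, "/" + root.tag)]
    while stack:
        node, path = stack.pop()
        counts[path] += 1
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return counts


if __name__ == "__main__":
    # Hypothetical document: two books with three authors in total.
    doc = "<lib><book><author/><author/></book><book><author/></book></lib>"
    for path, count in sorted(path_summary(doc).items()):
        print(path, count)
```

With such a table, the result size of a simple path query like `/lib/book/author` can be read off directly instead of traversing the document.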

Managing Database Snapshots in Mobile Environment, (Aug - Nov 2004)
Designed methods and tools to assist in building database applications for mobile devices, taking into account their frequent communication breakdowns. The key idea is to maintain a partial, weakly consistent view of the central database on the mobile device while it is disconnected and to synchronize the data when the connection is available.
Advisor: Prof. Krithi Ramamritham, IIT Bombay


Development Projects

IITB Navigator, (Aug-Nov, 2003):
Developed a GUI with a web front-end to locate people, places and ongoing events in a region, showing the shortest path to the destination on a map.
Advisor: Prof. S. Sudarshan, IIT Bombay
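
The shortest-path step can be sketched with Dijkstra's algorithm. The choice of algorithm is an assumption, since the description does not say which one the project used, and the place names and distances below are hypothetical.

```python
import heapq


def shortest_path(graph, source, target):
    """Return (distance, path) from source to target, or (None, []) if unreachable.

    graph maps each node to a dict of {neighbor: non-negative edge weight}.
    """
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a shorter route was already found
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    if target not in dist:
        return None, []
    # Walk the predecessor links back from the target to rebuild the route.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[target], path


if __name__ == "__main__":
    # Hypothetical campus map with walking distances.
    campus = {
        "gate": {"library": 4, "hostel": 1},
        "hostel": {"library": 2},
        "library": {},
    }
    print(shortest_path(campus, "gate", "library"))
```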

CMS: Course Management System, (May-Dec, 2003):
Provided a common web based interface between instructors, students and teaching assistants in an institute for doing mundane tasks such as giving and submitting assignments, assigning projects and demo scheduling, course information, notices, grading and messages. Implemented using servlets, JDBC, SQL and Java.
Advisor: Prof. S. Sudarshan, IIT Bombay


Other Projects
Cricket Animation, (Oct - Dec 2005)
CSC2504H Computer Graphics Course Project, University of Toronto

We designed an animation of a cricket match between India and Australia using OpenGL. The main attractions of the animation were the Toronto cityscape, the cricket stadium filled with people, and the lighting effects. It consisted of various complex 3D objects, which were designed from scratch using OpenGL. An object-oriented C++ design was used to provide many interactive functionalities to assist the modeling. This project won the second prize in the Wooden Monkey Hall of Fame, Fall 2005.

Moving Object Segmentation To Optimize Video Transmission, (Feb-Apr, 2004):
Implemented a background registration technique to segment a given video stream spatially, that is, to separate the foreground region (moving objects) from the background region. Such techniques are useful for applications like video conferencing, where the camera and the background are stationary and hence only the speaker's face needs to be transmitted.
Advisor: Prof. S. Arunkumar
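
The separation of moving objects from a stationary background can be sketched with simple background subtraction. The sketch below uses a per-pixel median over a few frames as the background model, which is a simplification and may differ from the registration technique implemented in the project. Frames are assumed to be small grayscale images stored as nested lists, and the threshold value is arbitrary.

```python
from statistics import median


def estimate_background(frames):
    """Estimate the background as the per-pixel median over all frames."""
    rows, cols = len(frames[0]), len(frames[0][0])
    return [
        [median(frame[r][c] for frame in frames) for c in range(cols)]
        for r in range(rows)
    ]


def foreground_mask(frame, background, threshold=30):
    """Mark pixels that differ from the background by more than threshold."""
    return [
        [abs(pixel - bg) > threshold for pixel, bg in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


if __name__ == "__main__":
    # Hypothetical 2x2 frames: a bright object moves over a dark background.
    frames = [
        [[200, 10], [10, 10]],
        [[10, 200], [10, 10]],
        [[10, 10], [200, 10]],
    ]
    background = estimate_background(frames)
    print(foreground_mask(frames[1], background))
```

Only the pixels flagged in the mask would need to be transmitted for a frame; the rest can be reconstructed from the stored background.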

VNC-Server, (Feb-Apr, 2003):
Modified the VNC code (version 3.3.7) to add new functionality, fix existing bugs and support sound export on the remote desktop. The pixel depth was also modified. A GUI was provided to make it user-friendly. IP restriction on the VNC server was another achievement of the project.
Advisor: Prof. G. Sivakumar, IIT Bombay

Train Scheduling Optimization, (Feb-Apr, 2003):
Simulated a train network with several tracks and stations on an FPGA kit, with the aim of moving the trains so that each train picks up the maximum number of passengers on its route. It was programmed in VHDL.
Advisor: Prof. M. R. Bhujade, IIT Bombay

Mail application, (Feb-Mar, 2003):
Designed a mail application using Sun and Java packages.
Advisor: Prof. G. Sivakumar, IIT Bombay

Image-Compression, (July-Nov, 2002):
Implemented an image compression algorithm in C++ with an efficient encoding scheme that compresses a PPM image to one-fourth of its original size. Decompression was also very efficient, as required by the client.
Advisor: Prof. S. Arunkumar, IIT Bombay

Electrical Circuit Analysis, (Feb-Apr, 2002):
Designed and implemented a technique in Scheme to analyse a resistive circuit, i.e., to calculate all the circuit variables at a given time and to draw graphs relating these variables.
Advisor: Prof. Abhiram Ranade, IIT Bombay

ScoreBoard Maintenance, (July-Nov, 2001):
Implemented a Fortran application to maintain the scoreboard for an ongoing cricket match. For each player, it keeps track of individual statistics such as runs and balls. In the second innings it also displays the requirements to win the match.
Advisor: Prof. Ajit Deewan, IIT Bombay

2006 Amit Chandel