Albert Angel
 
Contact:
Office: #5160 Bahen Centre for Information Technology
40 St. George Street, Toronto, ON M5S 2E4, Canada
 
 
About

I am currently a PhD candidate in the Department of Computer Science at the University of Toronto, working with Prof. Nick Koudas.

My main research interests lie at the intersection of Database Systems and Information Retrieval. I am partial to problems that involve ranking structured information in the context of user-generated content, such as:

  • Text mining
  • Keyword search on structured (graph) data
  • Diversified information retrieval
  • Top-k query processing

To find out more, keep reading; you can also have a look at my résumé.

Research

Motivation

Using services such as blogs, micro-blogs, and online social-networks, people around the globe generate millions of documents (posts, articles, status updates, etc.) every day.

This user-generated content is more than just a massive stream of text. Along many dimensions, it embeds a significant amount of structural information, such as social links, mentions of real-world entities (people, locations, products, etc.), and user profile information, giving rise to large graph structures.

Exploiting this structure, rather than considering the textual aspect of user-generated content alone, enables valuable applications as diverse as:

  • Real-time news identification
  • Understanding the zeitgeist (e.g. which current events is a target demographic group most engaged with?)
  • Entity or product search based on public opinion about them (e.g. which hotels are considered ideal for family vacations?)

and so on. At the core of such computations lies the need to prioritize, or rank, information both efficiently and effectively.

Research

My research focuses on devising efficient algorithms to enable such computations, i.e. ranking in the context of user-generated content.

Existing algorithmic techniques, such as the Threshold Algorithm, can be adapted to the structured setting of user-generated content, by interleaving richer computations into the algorithm's core.

At the same time, traditional assumptions, such as how data accesses should be scheduled for optimal performance, need to be re-examined.
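For reference, the classic Threshold Algorithm (Fagin, Lotem and Naor) fits in a few lines of Python. This is a minimal sketch, not the adapted algorithms from my work: it assumes every object appears in every sorted list and that the aggregate (here `sum`) is monotone. The `policy` parameter is a hypothetical hook showing where a data-access schedule plugs in; `"max_frontier"` is an illustrative heuristic only.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple

# A sorted list holds (object_id, score) pairs in descending score order.
SortedList = List[Tuple[str, float]]


def threshold_algorithm(
    lists: Sequence[SortedList],
    k: int,
    agg: Callable[[Sequence[float]], float] = sum,
    policy: str = "round_robin",
) -> List[Tuple[float, str]]:
    """Return the top-k (score, object_id) pairs under a monotone aggregate `agg`.

    Assumes every object appears in every list, so random access always succeeds.
    """
    m = len(lists)
    lookup: List[Dict[str, float]] = [dict(lst) for lst in lists]  # random-access index per list
    depth = [0] * m                      # next sorted-access position in each list
    frontier = [float("inf")] * m        # last score seen under sorted access
    seen = set()
    top: List[Tuple[float, str]] = []    # min-heap holding the best k found so far
    turn = 0

    while True:
        active = [i for i in range(m) if depth[i] < len(lists[i])]
        if not active:
            break                        # every list exhausted
        if policy == "round_robin":
            i = active[turn % len(active)]
            turn += 1
        elif policy == "max_frontier":
            # Hypothetical schedule: read from the list whose last-seen score is
            # highest, since it currently contributes most to the threshold.
            i = max(active, key=lambda j: frontier[j])
        else:
            raise ValueError(f"unknown policy: {policy}")
        obj, score = lists[i][depth[i]]  # one sorted access
        depth[i] += 1
        frontier[i] = score
        if obj not in seen:
            seen.add(obj)
            total = agg([lookup[j][obj] for j in range(m)])  # random accesses
            if len(top) < k:
                heapq.heappush(top, (total, obj))
            elif total > top[0][0]:
                heapq.heapreplace(top, (total, obj))
        # No unseen object can score above agg(frontier): stop once the k-th best reaches it.
        if len(top) == k and top[0][0] >= agg(frontier):
            break
    return sorted(top, reverse=True)
```

With `lists = [[("a", 0.9), ("b", 0.8), ("c", 0.3)], [("b", 0.9), ("c", 0.6), ("a", 0.1)]]`, `threshold_algorithm(lists, k=1)` stops after three sorted accesses and returns `b` (aggregate score about 1.7) without reading either list to the end. Swapping `policy` changes only the order of accesses, never the answer, which is exactly why scheduling is a performance question.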

In my work, I strive to balance designing novel algorithms, analyzing their properties, and validating them experimentally.

Problems I have been dealing with include the following:

What's on the Grapevine?

Grapevine is a system that performs large-scale analysis of social media content, extracting information in real time. Its goal is to distill information and provide insights by capturing popular trends as they emerge.

Grapevine facilitates the interactive exploration of content, allowing users to discover interesting or surprising stories, optionally narrowed down to a specific demographic of interest (e.g. "What are Torontonian teens talking about on blogs?"). Stories of interest can be explored in a variety of ways, such as modifying their scope (e.g. "How is Barack Obama related to this story?"), viewing related content (blog posts, news, videos, etc.), and examining their temporal evolution.
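Capturing trends as they emerge largely comes down to spotting terms or entities whose mention rate suddenly departs from their recent baseline. The sketch below is a deliberately simple, hypothetical burst detector (a smoothed z-score over fixed time windows), not Grapevine's actual method; the window granularity and the `history` and `min_count` parameters are assumptions.

```python
from collections import Counter, deque
from statistics import mean, pstdev
from typing import Deque, Iterable, List, Tuple


class BurstDetector:
    """Hypothetical burst detector: scores each term in the newest time window by how far
    its count exceeds the mean of the previous `history` windows, in smoothed std deviations."""

    def __init__(self, history: int = 6, min_count: int = 3) -> None:
        self.min_count = min_count                           # ignore very rare terms
        self.windows: Deque[Counter] = deque(maxlen=history)  # counts of past windows

    def close_window(self, mentions: Iterable[str], top: int = 5) -> List[Tuple[str, float]]:
        """Score the window that just ended, then add it to the history."""
        current = Counter(mentions)
        scores = []
        for term, count in current.items():
            if count < self.min_count:
                continue
            past = [w[term] for w in self.windows]  # Counter returns 0 for absent terms
            mu = mean(past) if past else 0.0
            sigma = pstdev(past) if past else 0.0
            # Adding 1 to sigma avoids division by zero for terms with a flat or empty history.
            scores.append((term, (count - mu) / (sigma + 1.0)))
        self.windows.append(current)
        return sorted(scores, key=lambda ts: ts[1], reverse=True)[:top]
```

Feeding it three windows in which "obama" and "toronto" appear at steady rates, followed by a window where "leafs" suddenly appears six times, ranks "leafs" first with a score of 6.0, while the steady terms score 0.0.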

Grapevine, currently live at www.onthegrapevine.ca, was developed in collaboration with fellow graduate student Nikos Sarkas. Supporting this functionality has led us to consider exciting research questions, such as:

  • How to efficiently identify high-impact stories, across all demographics, in real time
  • How to efficiently present a diverse set of such stories to the user
  • How to understand and present the temporal evolution of a story
  • How to efficiently and effectively provide real-time trends to the user, exploiting named entities, and any hierarchical information they carry
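Presenting a diverse set of stories is a classic result-diversification problem. One standard greedy approach is Maximal Marginal Relevance (MMR, Carbonell and Goldstein), which trades each candidate's relevance against its redundancy with the items already chosen. The sketch below applies MMR under the assumption that each story is represented by the set of entities it mentions and that redundancy is their Jaccard overlap; it illustrates the problem, not how Grapevine solves it.

```python
from typing import Dict, FrozenSet, List


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Overlap between two entity sets (0 = disjoint, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mmr_select(
    relevance: Dict[str, float],
    entities: Dict[str, FrozenSet[str]],
    k: int,
    lam: float = 0.7,
) -> List[str]:
    """Greedy Maximal Marginal Relevance: lam weighs relevance, (1 - lam) penalizes redundancy."""
    selected: List[str] = []
    candidates = sorted(relevance)  # sorted for deterministic tie-breaking

    def marginal(story: str) -> float:
        redundancy = max((jaccard(entities[story], entities[s]) for s in selected), default=0.0)
        return lam * relevance[story] - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=marginal)
        selected.append(best)
        candidates.remove(best)
    return selected
```

For example, given two highly relevant stories that share most of their entities and a less relevant story about something else, `lam=0.5` picks one of the overlapping stories and then the unrelated one, whereas `lam=1.0` (relevance alone) returns both overlapping stories.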

Publications

What's on the Grapevine? [Demo Paper], in SIGMOD 2009
Albert Angel, Nick Koudas, Nikos Sarkas, Divesh Srivastava

Efficient Identification of Coupled Entities in Document Collections, to appear in ICDE 2010
Nikos Sarkas, Albert Angel, Nick Koudas, Divesh Srivastava

More work is under submission or in progress.

Other ranking problems
I have also worked on other problems in the context of ranking (structured) user-generated information; to name a couple:
  • Social network update streams can be mined for valuable nuggets of information concerning people, products or events of interest. Combining such facts can yield new, useful knowledge, the quality of which depends heavily on the sources of each fact. Efficiency is a prime concern, given the scale of the problem. We proposed an efficient algorithm for querying this type of information, and characterized the instances where our techniques yield the greatest benefit, both analytically and experimentally.
  • Aggregating user-generated information can also yield important insights, for instance by supporting queries that find "packages" of entities (e.g. holiday packages) based on imprecise, textual user requirements (e.g. "I want a hotel that is ideal for elderly couples"). We examined this general class of entity package finder queries and derived a novel, efficient algorithm for answering them.
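To make the entity package problem concrete, here is a brute-force baseline in Python: every combination of one entity per category is scored, and the best k are kept. The scoring (each entity's match to the textual requirement plus a pairwise association bonus from a hypothetical `assoc` table) is an assumption for illustration. An efficient algorithm exists precisely to avoid this exhaustive enumeration, whose cost grows with the product of the category sizes.

```python
import heapq
from itertools import combinations, product
from typing import Dict, List, Sequence, Tuple

# (entity name, how well it matches the user's textual requirement); scores are assumed given.
Item = Tuple[str, float]


def package_score(package: Tuple[Item, ...], assoc: Dict[Tuple[str, str], float]) -> float:
    """Sum of member scores plus a bonus for every associated pair (e.g. hotel near restaurant)."""
    own = sum(score for _, score in package)
    links = sum(
        assoc.get((a, b), assoc.get((b, a), 0.0))
        for (a, _), (b, _) in combinations(package, 2)
    )
    return own + links


def top_k_packages(
    categories: Sequence[List[Item]],
    assoc: Dict[Tuple[str, str], float],
    k: int,
) -> List[Tuple[float, Tuple[str, ...]]]:
    """Brute-force baseline: score every package with one entity per category, keep the best k."""
    scored = (
        (package_score(p, assoc), tuple(name for name, _ in p))
        for p in product(*categories)
    )
    return heapq.nlargest(k, scored)
```

With hotels `H1` (0.9) and `H2` (0.7), restaurants `R1` (0.8) and `R2` (0.6), and a single association `("H2", "R1"): 0.5`, the top package is `("H2", "R1")` with score 2.0, even though `H1` is the better hotel on its own.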

Publications

Ranking Objects Based on Relationships and Fixed Associations [Paper | Tech. Report], in EDBT 2009
Albert Angel, Surajit Chaudhuri, Gautam Das, Nick Koudas

Other work is under submission.

Publications

Publications in reverse chronological order (see also DBLP)

Conferences

Efficient Identification of Coupled Entities in Document Collections, to appear in ICDE 2010
Nikos Sarkas, Albert Angel, Nick Koudas, Divesh Srivastava

Ranking Objects Based on Relationships and Fixed Associations [Paper | Tech. Report], in EDBT 2009
Albert Angel, Surajit Chaudhuri, Gautam Das, Nick Koudas

What's on the Grapevine? [Demo Paper], in SIGMOD 2009
Albert Angel, Nick Koudas, Nikos Sarkas, Divesh Srivastava

Qualitative Geocoding of Persistent Web Pages [Paper], in ACM GIS 2008
Albert Angel, Alexandros Efentakis, Chara Lontou, Dieter Pfoser

More coming soon (currently under submission).

Theses

MSc Research Paper (in lieu of MSc Thesis), University of Toronto
Supervisor: Prof. Nick Koudas
Available upon request (currently under submission)

Geographic Information Extraction from Text [abstract | full text (in Greek)]
Diploma Thesis, National Technical University of Athens
Supervisor: Prof. Timos Sellis. Co-supervisor: Dieter Pfoser

More
During my graduate studies, I have also taken a number of courses and served as a teaching assistant for others. I have strived to balance my research with my personal life, and have been fortunate to have a number of great colleagues.
 
The views expressed herein are solely those of the author, and do not necessarily reflect the views of the University of Toronto, or the Department of Computer Science thereof.
Last updated: December 2009