Motivation
Using services such as blogs, micro-blogs, and online social networks, people around the globe generate millions of documents (posts, articles, status updates, etc.) every day.
This user-generated content is more than a massive stream of text. It embeds a significant amount of structural information along many dimensions, such as social links, mentions of real-world entities (people, locations, products, etc.), and user profile information, giving rise to large graph structures.
Exploiting this structure, rather than considering only the text of user-generated content, enables valuable applications as diverse as:
- Real-time news identification
- Understanding the zeitgeist (e.g., which current events is a target demographic group engaging with most?)
- Entity or product search based on public opinion (e.g., which hotels are considered ideal for family vacations?)
and so on. At the core of such computations is the need to prioritize and rank information efficiently and effectively.
Research
My research focuses on devising efficient algorithms that enable such computations, i.e., ranking in the context of user-generated content.
Existing algorithmic techniques, such as the Threshold Algorithm, can be adapted to the structured setting of user-generated content by interleaving richer computations into the algorithm's core.
At the same time, traditional assumptions need to be re-examined, such as how to schedule data accesses for optimal performance.
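For readers unfamiliar with it, the classic Threshold Algorithm for top-k aggregation can be sketched roughly as below. This is a minimal illustration of the textbook version, not of the adapted variants discussed here. It assumes non-negative scores, a sum aggregate, and in-memory dictionaries standing in for the sorted and random accesses; the name `threshold_algorithm` and the toy data are hypothetical.

```python
import heapq


def threshold_algorithm(lists, k):
    """Return the top-k (object, total) pairs under a sum aggregate.

    `lists` is a sequence of dicts mapping object -> non-negative score.
    Assumption: an object missing from a list scores 0 in that list.
    """
    # Sorted access: each list is read in descending score order.
    sorted_lists = [sorted(l.items(), key=lambda kv: -kv[1]) for l in lists]
    max_depth = max(len(s) for s in sorted_lists)

    top = []      # min-heap of (total, object), at most k entries
    seen = set()  # objects whose total has already been computed

    for depth in range(max_depth):
        last_scores = []
        for s in sorted_lists:
            if depth >= len(s):
                # List exhausted: unseen objects can only score 0 here.
                last_scores.append(0)
                continue
            obj, score = s[depth]
            last_scores.append(score)
            if obj in seen:
                continue
            seen.add(obj)
            # Random access: look up the object's score in every list.
            total = sum(l.get(obj, 0) for l in lists)
            if len(top) < k:
                heapq.heappush(top, (total, obj))
            elif total > top[0][0]:
                heapq.heapreplace(top, (total, obj))

        # No unseen object can have a total above this threshold.
        threshold = sum(last_scores)
        if len(top) >= k and top[0][0] >= threshold:
            break

    return [(obj, total) for total, obj in sorted(top, reverse=True)]


# Toy example with two hypothetical score lists.
example_lists = [
    {"a": 9, "b": 8, "c": 1},
    {"a": 2, "b": 7, "c": 9},
]
top2 = threshold_algorithm(example_lists, 2)
```

The early-termination test is where scheduling matters: how deep each list is read before the threshold drops below the k-th best total determines the number of sorted and random accesses performed.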
In my work, I strive for balance between designing novel algorithms, analytically exploring their properties, and experimentally validating them.
Problems I have been dealing with include the following: