Snowball: extracting relations from large plain-text collections
TLDR
We build on this idea and present our Snowball system.
Efficient IR-Style Keyword Search over Relational Databases
TLDR
We develop query-processing strategies that build on a crucial characteristic of IR-style keyword search: only the few most relevant matches (according to some definition of "relevance") are generally of interest.
Approximate String Joins in a Database (Almost) for Free
TLDR
We develop a technique for building approximate string join capabilities on top of commercial databases by exploiting facilities already available in them.
Beyond Trending Topics: Real-World Event Identification on Twitter
TLDR
We identify real-world event content on Twitter using a rich family of aggregate statistics of topically similar message clusters.
STHoles: a multidimensional workload-aware histogram
TLDR
We introduce STHoles, a “workload-aware” histogram that allows bucket nesting to capture data regions with reasonably uniform tuple density, which leads to accurate query selectivity estimations.
k-Shape: Efficient and Accurate Clustering of Time Series
TLDR
We propose k-Shape, a novel algorithm for time-series clustering that is domain-independent, highly accurate, and efficient, with broad applications.
Learning similarity metrics for event identification in social media
TLDR
Social media sites (e.g., Flickr, YouTube, and Facebook) are a popular distribution outlet for users looking to share their experiences and interests on the Web.
k-Shape: Efficient and Accurate Clustering of Time Series
TLDR
The proliferation and ubiquity of temporal data across many disciplines have generated substantial interest in the analysis and mining of time series.
Generalizing GlOSS to Vector-Space Databases and Broker Hierarchies
TLDR
We present gGlOSS, a generalized Glossary-Of-Servers Server, that keeps statistics on the available databases to estimate which databases are the potentially most useful for a given query.
Top-k selection queries over relational databases: Mapping strategies and performance evaluation
TLDR
We study the advantages and limitations of processing a top-k query by translating it into a single range query that a traditional relational database management system (RDBMS) can process efficiently.