Snowball: extracting relations from large plain-text collections
This paper develops a scalable evaluation methodology and metrics for the task, and presents a thorough experimental evaluation of Snowball and comparable techniques over a collection of more than 300,000 newspaper documents.
Efficient IR-Style Keyword Search over Relational Databases
Approximate String Joins in a Database (Almost) for Free
- L. Gravano, Panagiotis G. Ipeirotis, H. Jagadish, N. Koudas, S. Muthukrishnan, D. Srivastava
- Computer Science, VLDB
- 11 September 2001
This paper develops a technique for building approximate string join capabilities on top of commercial databases by exploiting facilities already available in them, and demonstrates experimentally the benefits of the technique over the direct use of UDFs.
Beyond Trending Topics: Real-World Event Identification on Twitter
This paper explores approaches for analyzing the stream of Twitter messages to distinguish between messages about real-world events and non-event messages, and relies on a rich family of aggregate statistics of topically similar message clusters.
k-Shape: Efficient and Accurate Clustering of Time Series
The proliferation and ubiquity of temporal data across many disciplines has generated substantial interest in the analysis and mining of time series. Clustering is one of the most popular data mining…
STHoles: a multidimensional workload-aware histogram
This paper introduces STHoles, a “workload-aware” histogram that allows bucket nesting to capture data regions with reasonably uniform tuple density and that outperforms the best multidimensional histogram techniques that require access to and processing of the full data sets during histogram construction.
Learning similarity metrics for event identification in social media
This paper explores a variety of techniques for learning multi-feature similarity metrics for social media documents in a principled manner; evaluation results suggest that the approach identifies events more effectively than the state-of-the-art strategies on which it is built.
Top-k selection queries over relational databases: Mapping strategies and performance evaluation
This paper studies how to determine a range query to evaluate a top-k query by exploiting the statistics available to an RDBMS, and the impact of the quality of these statistics on the retrieval efficiency of the resulting scheme.
GlOSS: text-source discovery over the Internet
This article describes GlOSS (Glossary of Servers Server) in two versions: bGlOSS, which provides a Boolean query retrieval model, and vGlOSS, which provides a vector-space retrieval model. It also describes extensively the methodology for measuring the retrieval effectiveness of these systems.
Answering General Time-Sensitive Queries
- Wisam Dakka, L. Gravano, Panagiotis G. Ipeirotis
- Computer Science, IEEE Transactions on Knowledge and Data…
- 26 October 2008
This paper proposes a more general framework for handling time-sensitive queries. It automatically identifies the important time intervals that are likely to be of interest for a query and builds scoring techniques that seamlessly integrate the temporal aspect into the overall ranking mechanism.