• Publications, sorted by influence
NAGA: Searching and Ranking Knowledge
TLDR
We propose NAGA, a new semantic search engine built on a knowledge base that is organized as a graph with typed edges and contains millions of entities extracted from Web-based corpora.
  • 258 citations
  • 25 highly influential citations
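The NAGA entry describes a knowledge base organized as a graph with typed edges. As a rough illustration of that data structure only (not NAGA's actual storage, ranking model, or query language), here is a minimal Python sketch that stores facts as (subject, relation, object) triples and answers a simple lookup by subject and edge type. The class name `TypedEdgeGraph` and the example facts are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Optional


class TypedEdgeGraph:
    """Hypothetical knowledge graph: nodes are entities, every edge carries a relation type."""

    def __init__(self) -> None:
        # Index outgoing edges by subject so lookups do not scan every stored fact.
        self._out: dict[str, set[tuple[str, str]]] = defaultdict(set)

    def add_fact(self, subject: str, relation: str, obj: str) -> None:
        """Add one typed edge subject --relation--> obj."""
        self._out[subject].add((relation, obj))

    def query(self, subject: str, relation: Optional[str] = None) -> list[tuple[str, str]]:
        """Return sorted (relation, object) pairs for a subject, optionally filtered by edge type."""
        return sorted(
            (rel, obj)
            for rel, obj in self._out.get(subject, set())
            if relation is None or rel == relation
        )


# Usage with illustrative facts.
g = TypedEdgeGraph()
g.add_fact("Albert_Einstein", "bornIn", "Ulm")
g.add_fact("Albert_Einstein", "type", "physicist")
print(g.query("Albert_Einstein", "bornIn"))  # [('bornIn', 'Ulm')]
```

Indexing edges by subject keeps single-hop lookups cheap; a real engine would also need indexes by relation and object, plus ranking over multiple matches.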
Combining linguistic and statistical analysis to extract relations from web documents
TLDR
The World Wide Web provides a nearly endless source of knowledge, most of which is expressed in natural language.
  • 198 citations
  • 5 highly influential citations
Event Detection in Twitter using Aggressive Filtering and Hierarchical Tweet Clustering
TLDR
We present a method for topic detection in Twitter streams based on aggressive tweet/term filtering and two-stage hierarchical clustering: first of tweets, then of the resulting headlines.
  • 91 citations
  • 5 highly influential citations
The Bag-of-Opinions Method for Review Rating Prediction from Sparse Text Patterns
TLDR
This paper addresses the problem of predicting a user's numeric rating in a product review from the text of the review.
  • 172 citations
  • 4 highly influential citations
Time Series Classification by Sequence Learning in All-Subsequence Space
TLDR
We propose new structure-based time series classification methods built on the popular SAX transformation and on two new adaptations of SEQL, an efficient linear sequence classifier.
  • 20 citations
  • 4 highly influential citations
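The SAX (Symbolic Aggregate approXimation) transformation referenced in this entry turns a numeric time series into a short string of symbols. A minimal Python sketch of the standard steps (z-normalization, Piecewise Aggregate Approximation, then discretization against equiprobable standard-normal breakpoints) follows. The function name `sax` and its parameters are illustrative; the sketch assumes the series length is divisible by the number of segments and does not reproduce the paper's sliding-window setup or the SEQL classifier.

```python
from __future__ import annotations

from statistics import NormalDist, fmean, pstdev


def sax(series: list[float], n_segments: int, alphabet_size: int) -> str:
    """Convert a time series to a SAX word (sketch; assumes len(series) % n_segments == 0)."""
    # 1. z-normalize so the Gaussian breakpoints below are meaningful.
    mu, sigma = fmean(series), pstdev(series)
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]

    # 2. PAA: replace each equal-width segment by its mean.
    width = len(z) // n_segments
    paa = [fmean(z[i * width:(i + 1) * width]) for i in range(n_segments)]

    # 3. Breakpoints split N(0, 1) into alphabet_size equiprobable regions.
    breakpoints = [NormalDist().inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]

    # 4. Map each PAA value to a letter: 'a' for the lowest region, 'b' for the next, and so on.
    letters = []
    for value in paa:
        region = sum(value >= b for b in breakpoints)
        letters.append(chr(ord("a") + region))
    return "".join(letters)


print(sax([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4, alphabet_size=4))  # abcd
```

Because the breakpoints are equiprobable under a standard normal, each letter is roughly equally likely for z-normalized data, which is what makes SAX words comparable across series.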
NAGA: harvesting, searching and ranking knowledge
TLDR
The presence of encyclopedic Web sources such as Wikipedia, the Internet Movie Database (IMDB), and the World Factbook calls for new querying techniques that are simple yet more expressive than those provided by standard keyword-based search engines.
  • 49 citations
  • 3 highly influential citations
Fast logistic regression for text categorization with variable-length n-grams
TLDR
A common representation used in text categorization is the bag-of-words model (also known as the unigram model).
  • 69 citations
  • 3 highly influential citations
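This entry contrasts the bag-of-words (unigram) representation with variable-length n-grams. As an illustration of the representation only (not the paper's fast logistic regression or its way of selecting n-grams), the Python sketch below counts every word n-gram up to a chosen maximum length. The parameter `max_n` and the naive whitespace tokenizer are simplifying assumptions.

```python
from __future__ import annotations

from collections import Counter


def ngram_features(text: str, max_n: int) -> Counter[str]:
    """Count every word n-gram of length 1..max_n; max_n=1 gives the plain bag of words."""
    tokens = text.lower().split()  # naive whitespace tokenization (assumption)
    counts: Counter[str] = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


features = ngram_features("not good not bad", max_n=2)
print(features["not"], features["not good"])  # 2 1
```

With `max_n=1` the phrase "not good" is lost and only the separate words remain; allowing longer n-grams recovers such local word order at the cost of a much larger feature space.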
Bounded coordinate-descent for biological sequence classification in high dimensional predictor space
TLDR
We present a framework for discriminative sequence classification in which linear classifiers work directly in the explicit high-dimensional predictor space of all subsequences in the training set (as opposed to kernel-induced spaces).
  • 28 citations
  • 3 highly influential citations
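Working in the explicit space of all subsequences means a linear model can be stored as a sparse map from subsequence to weight, and a sequence can be scored by summing the weights of the subsequences it contains. The Python sketch below illustrates only that scoring step, assuming binary (presence/absence) features of contiguous subsequences; the helper names `subsequences` and `linear_score` are hypothetical, and the paper's bounded coordinate-descent training is not implemented here.

```python
from __future__ import annotations


def subsequences(seq: str, max_len: int) -> set[str]:
    """All distinct contiguous subsequences of length 1..max_len (binary presence features)."""
    return {
        seq[i:i + n]
        for n in range(1, max_len + 1)
        for i in range(len(seq) - n + 1)
    }


def linear_score(seq: str, weights: dict[str, float], bias: float = 0.0) -> float:
    """bias + sum of weights of subsequences present in seq; absent features contribute 0."""
    # Only features with a stored (non-zero) weight matter, so enumerate up to their max length.
    max_len = max((len(s) for s in weights), default=0)
    present = subsequences(seq, max_len)
    return bias + sum(w for s, w in weights.items() if s in present)


print(linear_score("AGATTACC", {"GATT": 1.5, "CC": -0.5}))  # 1.0
```

Storing only the few subsequences that received non-zero weight keeps prediction cheap even though the full feature space grows roughly quadratically with sequence length.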
LEILA: Learning to Extract Information by Linguistic Analysis
TLDR
We present LEILA, a system that can extract instances of arbitrary given binary relations from natural-language Web documents without human interaction.
  • 77 citations
  • 2 highly influential citations
Interpretable time series classification using linear models and multi-resolution multi-domain symbolic representations
TLDR
The time series classification literature has expanded rapidly over the last decade, with many new classification approaches published each year.
  • 20 citations
  • 2 highly influential citations