Unsupervised named-entity extraction from the Web: An experimental study
TLDR
An overview of KnowItAll's novel architecture and design principles is presented, emphasizing its distinctive ability to extract information without any hand-labeled training examples, and three distinct ways to address this challenge are presented and evaluated.
Web-scale information extraction in KnowItAll: (preliminary results)
TLDR
KnowItAll, a system that aims to automate the tedious process of extracting large collections of facts from the web in an autonomous, domain-independent, and scalable manner, is introduced.
Large-scale Semantic Parsing via Schema Matching and Lexicon Extension
TLDR
A semantic parser for Freebase is developed based on a reduction to standard supervised training algorithms, schema matching, and pattern learning that is capable of parsing questions with an F1 that improves by 0.42 over a purely-supervised learning algorithm.
TextRunner: Open Information Extraction on the Web
TLDR
The TextRunner system demonstrates a new kind of information extraction, called Open Information Extraction (OIE), in which the system makes a single, data-driven pass over the entire corpus and extracts a large set of relational tuples, without requiring any human input.
Re-ranking for joint named-entity recognition and linking
TLDR
A joint model for NER and EL, called NEREL, is presented that takes a large set of candidate mentions from typical NER systems and a large set of candidate entity links from EL systems, and ranks the candidate mention-entity pairs together to make joint predictions.
Modern Natural Language Interfaces to Databases: Composing Statistical Parsing with Semantic Tractability
TLDR
The paper shows how a strong semantic model coupled with "light re-training" enables PRECISE to overcome parser errors and correctly map from parsed questions to the corresponding SQL queries.
Distributional Representations for Handling Sparsity in Supervised Sequence-Labeling
TLDR
It is demonstrated that distributional representations of word types, trained on unannotated text, can be used to improve performance on rare words and reduce the sample complexity of sequence labeling.
Methods for Domain-Independent Information Extraction from the Web: An Experimental Comparison
TLDR
Three distinct ways to improve KNOWITALL's recall and extraction rate without sacrificing precision are presented and evaluated.
Unsupervised Methods for Determining Object and Relation Synonyms on the Web
TLDR
This paper presents a scalable, fully-implemented system that runs in O(KN log N) time in the number of extractions, N, and the maximum number of synonyms per word, K, and introduces a probabilistic relational model for predicting whether two strings are co-referential based on the similarity of the assertions containing them.
To buy or not to buy: mining airfare data to minimize ticket purchase price
TLDR
A pilot study in the domain of airline ticket prices suggests that mining of price data available over the web has the potential to save consumers substantial sums of money per annum.