Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations
- Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke Zettlemoyer, Daniel S. Weld
- Computer Science, ACL
- 19 June 2011
A novel approach for multi-instance learning with overlapping relations that combines a sentence-level extraction model with a simple, corpus-level component for aggregating the individual facts is presented.
Fine-Grained Entity Recognition
A fine-grained set of 112 tags is defined, the tagging problem is formulated as multi-class, multi-label classification, an unsupervised method for collecting training data is described, and the FIGER implementation is presented.
Design Challenges for Entity Linking
This work analyzes differences between several versions of the EL problem, presents a simple yet effective, modular, unsupervised system for entity linking called Vinculum, and elucidates key aspects of the system, including mention extraction, candidate generation, entity type prediction, entity coreference, and coherence.
Spectral domain-transfer learning
This paper formulates the domain-transfer learning problem under a novel spectral classification framework, where the objective function is introduced to seek consistency between the in-domain supervision and the out-of-domain intrinsic structure through optimization of the cost function.
Effective Crowd Annotation for Relation Extraction
- Angli Liu, S. Soderland, Jonathan Bragg, C. H. Lin, Xiao Ling, Daniel S. Weld
- Computer Science, NAACL
- 1 June 2016
This paper demonstrates that crowdsourced annotation of training data can yield a much larger boost in relation extraction performance over methods based solely on distant supervision, thanks to a simple, generalizable technique, Gated Instruction.
Temporal Information Extraction
TIE, a novel information-extraction system, is presented; it distills facts from text while inducing as much temporal information as possible, and performs global inference, enforcing transitivity to bound the start and end times of each event.
Synthesizing Union Tables from the Web
This paper defines the notion of stitchable tables and identifies collections of tables that can be stitched, and designs an effective algorithm for extracting hidden attributes that are essential for the stitching process and for aligning values of those attributes across tables to synthesize new columns.
Can Chinese Web Pages Be Classified with English Data Source?
- Xiao Ling, Gui-Rong Xue, Wenyuan Dai, Yun Jiang, Qiang Yang, Yong Yu
- Computer Science, WWW
- 21 April 2008
This paper proposes an information bottleneck based approach to address the cross-language classification problem of Chinese and English Web pages, and significantly improves several existing supervised and semi-supervised classifiers.
Bootleg: Chasing the Tail with Self-Supervised Named Entity Disambiguation
This work defines core reasoning patterns for disambiguation, creates a learning procedure to encourage the self-supervised model to learn the patterns, and shows how to use weak supervision to enhance the signals in the training data.
Evaluating Entity Disambiguation and the Role of Popularity in Retrieval-Based NLP
The retrievers are found to exhibit popularity bias, significantly under-performing on rarer entities that share a name; for example, they are twice as likely to retrieve erroneous documents on queries for the less popular entity under the same name.