Corpus ID: 5477884

For the sake of simplicity: Unsupervised extraction of lexical simplifications from Wikipedia

@article{Yatskar2010ForTS,
  title={For the sake of simplicity: Unsupervised extraction of lexical simplifications from Wikipedia},
  author={Mark Yatskar and Bo Pang and Cristian Danescu-Niculescu-Mizil and Lillian Lee},
  journal={ArXiv},
  year={2010},
  volume={abs/1008.1986}
}
We report on work in progress on extracting lexical simplifications (e.g., "collaborate" → "work together"), focusing on utilizing edit histories in Simple English Wikipedia for this task. We consider two main approaches: (1) deriving simplification probabilities via an edit model that accounts for a mixture of different operations, and (2) using metadata to focus on edits that are more likely to be simplification operations. We find our methods to outperform a reasonable baseline and yield…
Learning a Lexical Simplifier Using Wikipedia
This paper extracts over 30K candidate lexical simplifications by identifying aligned words in a sentence-aligned corpus of English Wikipedia with Simple English Wikipedia, using a feature-based ranker trained on a set of labeled simplifications collected using Amazon's Mechanical Turk.
Putting it Simply: a Context-Aware Approach to Lexical Simplification
Results show that the method outperforms an established simplification baseline for both meaning preservation and simplification, while maintaining a high level of grammaticality.
Aligning Sentences from Standard Wikipedia to Simple Wikipedia
This work improves monolingual sentence alignment for text simplification, specifically for text in standard and simple Wikipedia, by using a greedy search over the document and a word-level semantic similarity score based on Wiktionary that also accounts for structural similarity through syntactic dependencies.
Identifying targets for syntactic simplification
This study uses a variety of lexical and parse features, as well as a score of the relatedness of a sentence to the topic of its document, to predict a range of sentence changes, including the standard problems of splitting and shortening.
A Simple BERT-Based Approach for Lexical Simplification
This work presents a simple BERT-based LS approach that makes use of the pre-trained unsupervised deep bidirectional representations of BERT; experimental results show that this approach obtains clear improvements over baselines that leverage linguistic databases and parallel corpora.
An Analysis of Crowdsourced Text Simplifications
The aim is to understand whether a complex-simple parallel corpus involving this version of Wikipedia is appropriate as a data source to induce simplification rules, and whether the different operations performed by humans can be automatically categorised.
Simplifying Lexical Simplification: Do We Need Simplified Corpora?
This work presents an unsupervised approach to lexical simplification that makes use of the most recent word vector representations and requires only regular corpora, and is as effective as systems that rely on simplified corpora.
SimpleScience: Lexical Simplification of Scientific Terminology
This work uses word embeddings to extract simplification rules from parallel corpora containing scientific publications and Wikipedia, and finds that the approach outperforms prior context-aware approaches at generating simplifications for scientific terms.
WordNet-based lexical simplification of a document
We explore algorithms for the automatic generation of a limited-size lexicon from a document, such that the lexicon covers as much as possible of the semantic space of the original document, as…
Simple English Wikipedia: A New Text Simplification Task
A new data set is introduced that pairs English Wikipedia with Simple English Wikipedia; it is orders of magnitude larger than any previously examined for sentence simplification and contains the full range of simplification operations, including rewording, reordering, insertion, and deletion.

References

Showing 1–10 of 20 references
Syntactic Simplification for Improving Content Selection in Multi-Document Summarization
It is shown how simplifying parentheticals by removing relative clauses and appositives results in improved sentence clustering, by forcing clustering based on central rather than background information.
Text Simplification for Information-Seeking Applications
The notion of Easy Access Sentence is defined: a unit of text from which the information it contains can be retrieved by a system with modest text-analysis capabilities, able to process single-verb sentences with named entities as constituents.
Extracting Lexical Reference Rules from Wikipedia
The extraction from Wikipedia of lexical reference rules, identifying references to term meanings triggered by other terms, is described; the rule base yields comparable performance to WordNet while providing largely complementary information.
Sentence Simplification for Semantic Role Labeling
A general method for learning how to iteratively simplify a sentence, thus decomposing complicated syntax into small, easy-to-process pieces and achieving near-state-of-the-art performance across syntactic variation.
Automatic induction of rules for text simplification
An algorithm and an implementation are described by which generalized rules for simplification are automatically induced from annotated training material, using a novel partial parsing technique which combines constituent structure and dependency information.
Mining Wikipedia's Article Revision History for Training Computational Linguistics Algorithms
A novel paradigm for obtaining large amounts of training data for computational linguistics tasks by mining Wikipedia's article revision history is presented, and it is proposed to use a sentence's persistence throughout a document's evolution as an indicator of its fitness as part of an extractive summary.
Towards Robust Context-Sensitive Sentence Alignment for Monolingual Corpora
A new monolingual sentence alignment algorithm is presented, combining a sentence-based TF*IDF score, turned into a probability distribution using logistic regression, with a global alignment dynamic programming algorithm, achieving a substantial improvement in accuracy over existing systems.
Mining a Lexicon of Technical Terms and Lay Equivalents
We present a corpus-driven method for building a lexicon of semantically equivalent pairs of technical and lay medical terms. Using a parallel corpus of abstracts of clinical studies and…
Extracting Lay Paraphrases of Specialized Expressions from Monolingual Comparable Medical Corpora
This study builds comparable corpora of specialized and lay texts in order to detect equivalent lay and specialized expressions, and demonstrates that simple paraphrase acquisition methods can also work on texts with a rather small degree of similarity, once similar text segments are detected.
Extracting lay paraphrases of specialized expressions from monolingual comparable medical corpora. Workshop on Building and Using Comparable Corpora, 2009.