Bootstrapping Lexical Choice via Multiple-Sequence Alignment

@inproceedings{Barzilay2002BootstrappingLC,
  title={Bootstrapping Lexical Choice via Multiple-Sequence Alignment},
  author={Regina Barzilay and Lillian Lee},
  booktitle={EMNLP},
  year={2002}
}
An important component of any generation system is the mapping dictionary, a lexicon of elementary semantic expressions and corresponding natural language realizations. Typically, labor-intensive knowledge-based methods are used to construct the dictionary. We instead propose to acquire it automatically via a novel multiple-pass algorithm employing multiple-sequence alignment, a technique commonly used in bioinformatics. Crucially, our method leverages latent information contained in multi…
Expanding Paraphrase Lexicons by Exploiting Generalities
TLDR
This article presents a method for systematically expanding an initial seed lexicon made up of high-quality paraphrases by automatically capturing morpho-semantic and syntactic generalizations within the lexicon and using them to leverage the power of large-scale monolingual data.
Statistical Acquisition of Content Selection Rules for Natural Language Generation
TLDR
This paper presents a method to acquire content selection rules automatically from a corpus of text and associated semantics. The method is evaluated by comparing its output with information selected by human authors in unseen texts, where it was able to filter half the input data set without loss of recall.
Adding Syntax to Dynamic Programming for Aligning Comparable Texts for the Generation of Paraphrases
TLDR
This paper describes an algorithm for incorporating syntactic features in the alignment process for non-parallel texts, with the goal of generating novel paraphrases of existing texts. It uses dynamic programming, with alignment decisions based on the local syntactic similarity between two sentences.
Curate and Generate: A Corpus and Method for Joint Control of Semantics and Style in Neural NLG
TLDR
YelpNLG is presented, a corpus of 300,000 rich, parallel meaning representations and highly stylistically varied reference texts spanning different restaurant attributes, and a novel methodology that can be scalably reused to generate NLG datasets for other domains is described.
Structural alignment for finite-state syntactic processing
In this technical report, we present some preliminary experiments on using multiple sequence alignment (MSA) techniques for inducing monolingual finite-state tagging models that capture some global
Optimal and Syntactically-Informed Decoding for Monolingual Phrase-Based Alignment
TLDR
This work examines a state-of-the-art structured prediction model for the alignment task which uses a phrase-based representation and is forced to decode alignments using an approximate search approach and proposes a straightforward exact decoding technique based on integer linear programming that yields order-of-magnitude improvements in decoding speed.
Prenominal Modifier Ordering via Multiple Sequence Alignment
TLDR
A novel approach to producing a fluent ordering for a set of prenominal modifiers in a noun phrase is presented, adapting multiple sequence alignment techniques used in computational biology to the alignment of modifiers.
Generating Phrasal and Sentential Paraphrases: A Survey of Data-Driven Methods
TLDR
A comprehensive and application-independent survey of data-driven phrasal and sentential paraphrase generation methods is conducted, while also conveying an appreciation for the importance and potential use of paraphrases in the field of NLP research.
A Metric for Paraphrase Detection
  • J. Cordeiro, G. Dias, P. Brazdil
  • Computer Science
    2007 International Multi-Conference on Computing in the Global Information Technology (ICCGI'07)
  • 2007
TLDR
This paper proposes a new metric for unsupervised detection of paraphrases and tests it over a set of standard paraphrase corpora. The results are promising, as they outperform state-of-the-art measures developed for similar tasks.
A Survey of Paraphrasing and Textual Entailment Methods
TLDR
Key ideas from the two areas of paraphrasing and textual entailment are summarized by considering in turn recognition, generation, and extraction methods, also pointing to prominent articles and resources.

References

Bootstrapping Syntax and Recursion using Alignment-Based Learning
  • M. Zaanen
  • Computer Science, Mathematics
    ICML
  • 2000
TLDR
A new type of unsupervised learning algorithm, based on the alignment of sentences and Harris’s (1951) notion of interchangeability, is introduced; it results in a labelled, bracketed version of the corpus of natural language sentences.
Trainable Methods for Surface Natural Language Generation
TLDR
Three systems for surface natural language generation are presented; they are trainable from annotated corpora and attempt to produce a grammatical natural language phrase from a domain-specific semantic representation.
Natural language understanding using statistical machine translation
TLDR
This paper investigates an approach to NLU that is derived from the field of statistical machine translation, and describes the problem of NLU as a translation from a source sentence to a formal-language target sentence.
Models of translation equivalence among words
TLDR
This article presents methods for biasing statistical translation models to reflect bitext properties, and shows how a statistical translation model can take advantage of preexisting knowledge that might be available about particular language pairs.
Generation that Exploits Corpus-Based Statistical Knowledge
We describe novel aspects of a new natural language generator called Nitrogen. This generator has a highly flexible input representation that allows a spectrum of input from syntactic to semantic
Extracting Paraphrases from a Parallel Corpus
TLDR
This work presents an unsupervised learning algorithm for identification of paraphrases from a corpus of multiple English translations of the same source text; it yields phrasal and single-word lexical paraphrases as well as syntactic paraphrases.
The Mathematics of Statistical Machine Translation: Parameter Estimation
TLDR
It is reasonable to argue that word-by-word alignments are inherent in any sufficiently large bilingual corpus, given a set of pairs of sentences that are translations of one another.
Finding consensus in speech recognition: word error minimization and other applications of confusion networks
We describe a new framework for distilling information from word lattices to improve the accuracy of the speech recognition output and obtain a more perspicuous representation of a set of alternative
Proof Verbalization as an Application of NLG
TLDR
The linguistic part of a system called PROVERB, which transforms, abstracts, and verbalizes machine-found proofs into formatted texts, is described; it works fully automatically and performs particularly well for textbook-size examples.
Exploiting a Probabilistic Hierarchical Model for Generation
TLDR
Initial results are presented showing that a tree-based model derived from a tree-annotated corpus improves on a tree model derived from an unannotated corpus, and that a tree-based stochastic model with a hand-crafted grammar outperforms both.