We train a decision tree inducer (CART) and a memory-based classifier (MBL) to predict prosodic pitch accents and breaks in Dutch text, on the basis of shallow, easy-to-compute features. We train the algorithms on both tasks individually and on the two tasks simultaneously. The parameters of both algorithms and the selection of features are optimized…
In this paper, we investigate the usefulness of normalized alignment of dependency trees for entailment prediction. Overall, our approach yields an accuracy of 60% on the RTE2 test set, which is a significant improvement over the baseline. Results vary substantially across the different subsets, with a peak performance on the summarization data. We conclude…
We propose to analyse semantic similarity in comparable text by matching syntactic trees and labeling the alignments according to one of five semantic similarity relations. We present a Memory-based Graph Matcher (MBGM) that performs both tasks simultaneously as a combination of exhaustive pairwise classification using a memory-based learner, followed by…
This paper describes an eye tracking experiment to study the processing of diphone synthesis, unit selection synthesis, and human speech, taking segmental and suprasegmental speech quality into account. The results showed that both factors influenced the processing of human and synthetic speech, and confirmed that eye tracking is a promising albeit time…
Sentence fusion is a text-to-text (revision-like) generation task that takes related sentences as input and merges them into a single output sentence. In this paper we describe our ongoing work on developing a sentence fusion module for Dutch. We propose a generalized version of alignment which not only indicates which words and phrases should be aligned…
We show that question-based sentence fusion is a better defined task than generic sentence fusion (Q-based fusions are shorter, display less variety in length, yield more identical results and have higher normalized Rouge scores). Moreover, we show that in a QA setting, participants strongly prefer Q-based fusions over generic ones, and have a preference…
To develop a data-driven text rewriting algorithm for paraphrasing, it is essential to have a monolingual corpus of aligned paraphrased sentences. News article headlines are a rich source of paraphrases; they tend to describe the same event in different ways, and can easily be obtained from the web. We compare two methods of aligning headlines…