In this paper, we introduce EVALution 1.0, a dataset designed for the training and the evaluation of Distributional Semantic Models (DSMs). This version consists of almost 7.5K tuples, instantiating several semantic relations between word pairs (including hypernymy, synonymy, antonymy, meronymy). The dataset is enriched with a large amount of additional...
This paper reports our ongoing project for constructing an English multiword expression (MWE) dictionary and NLP tools based on the developed dictionary. We extracted functional MWEs from the English part of Wiktionary, annotated the Penn Treebank (PTB) with MWE information, and conducted POS tagging experiments. We report how the MWE annotation is done on...
We propose a framework to model human comprehension of discourse connectives. Following the Bayesian pragmatic paradigm, we advocate that discourse connectives are interpreted based on a simulation of the production process by the speaker, who, in turn, considers the ease of interpretation for the listener when choosing connectives. Evaluation against the...
We propose a linguistically driven approach to represent discourse relations in Chinese text as sequences. We observe that certain surface characteristics of Chinese texts, such as the order of clauses, are overt markers of discourse structures, yet existing annotation proposals adapted from formalisms constructed for English do not fully incorporate these...
Usage of discourse connectives (DCs) differs across languages, so addition and omission of connectives are common in translation. We investigate how implicit (omitted) DCs in the source text impact various machine translation (MT) systems, and whether a discourse parser is needed as a preprocessor to explicitate implicit DCs. Based on the manual...
Discourse relations can be either implicit or explicitly expressed by markers, such as 'therefore' and 'but'. How a speaker makes this choice is a question that is not well understood. We propose a psycholinguistic model that predicts whether a speaker will produce an explicit marker given the discourse relation s/he wishes to express. Based on the...
Preface: We are very pleased to welcome you to the 1st Workshop on Semantics-Driven Statistical Machine Translation (S2MT) in conjunction with ACL, held on July 30, 2015 in Beijing, China. Over the last two decades, statistical machine translation (SMT) has made substantial progress from word-based to phrase- and syntax-based SMT. Recently, the progress...
Professional human translators usually do not employ the concept of word alignments, producing translations 'sense-for-sense' instead of 'word-for-word'. This suggests that unalignable words may be prevalent in the parallel text used for machine translation (MT). We analyze this phenomenon in depth for Chinese-English translation. We further propose a...