Learn More
Syntactic features play an essential role in identifying relationship in a sentence. Previous neural network models directly work on raw word sequences or constituent parse trees, thus often suffer from irrelevant information introduced when subjects and objects are in a long distance. In this paper, we propose to learn more robust relation representations(More)
Traditional <i>n</i> -gram language models are widely used in state-of-the-art large vocabulary speech recognition systems. This simple model suffers from some limitations, such as overfitting of maximum-likelihood estimation and the lack of rich contextual knowledge sources. In this paper, we exploit a hierarchical Bayesian interpretation for language(More)
Existing knowledge-based question answering systems often rely on small annotated training data. While shallow methods like relation extraction are robust to data scarcity, they are less expressive than the deep meaning representation methods like semantic parsing, thereby failing at answering questions involving multiple constraints. Here we alleviate this(More)
We present an approximation to the Bayesian hierarchical Pitman-Yor process language model which maintains the power law distribution over word tokens, while not requiring a computationally expensive approximate inference process. This approximation, which we term power law discounting, has a similar computational complexity to interpolated and modified(More)
In this paper, we investigate the use of bilingual parsing on parallel corpora to better estimate the rule parameters in a formal syntax-based machine translation system, which are normally estimated from the inaccurate heuristics. We use an Expectation-Maximization (EM) algorithm to re-estimate the parameters of synchronous context-free grammar (SCFG)(More)
In this paper, we propose a Weakly Supervised Matrix Factorization (WSMF) approach to the problem of image parsing with noisy tags, i.e., segmenting noisily tagged images and then classifying the regions only with image-level labels. Instead of requiring clean but expensive pixel-level labels as strong supervision in the traditional image parsing methods,(More)
We continue our previous work on the modeling of topic and role information from multiparty meetings using a hierarchical Dirichlet process (HDP), in the context of language model adaptation. In this paper we focus on three problems: 1) an empirical analysis of the HDP as a nonparametric topic model; 2) the mis-match problem of vocabularies of the baseline(More)