Learn More
This paper presents techniques to apply semi-CRFs to Named Entity Recognition tasks with a tractable computational cost. Our framework can handle an NER task that has long named entities and many labels which increase the computational cost. To reduce the computational cost, we propose two techniques: the first is the use of feature forests, which enables(More)
In this paper, we propose a novel discrim-inative language model, which can be applied quite generally. Compared to the well known N-gram language models, dis-criminative language models can achieve more accurate discrimination because they can employ overlapping features and non-local information. However, discriminative language models have been used only(More)
Estimating the conditional mean of an input-output relation is the goal of regression. However, regression analysis is not sufficiently informative if the conditional distribution has multi-modality, is highly asymmetric, or contains heteroscedastic noise. In such scenarios, estimating the conditional distribution itself would be more useful. In this paper,(More)
Shallow parsing is one of many NLP tasks that can be reduced to a sequence labeling problem. In this paper we show that the latent-dynamics (i.e., hidden sub-structure of shallow phrases) constitutes a problem in shallow parsing, and we show that modeling this intermediate structure is useful. By analyzing the automatically learned hidden states, we show(More)
We propose a perceptron-style algorithm for fast discriminative training of structured latent variable model, and analyzed its convergence properties. Our method extends the perceptron algorithm for the learning task with latent dependencies, which may not be captured by traditional models. It relies on Viterbi decoding over latent variables, combined with(More)
To compute Burrows-Wheeler Transform (BWT), one usually builds a suffix array (SA) first, and then obtains BWT using SA, which requires much redundant working space. In previous studies to compute BWT directly [6, 13], one constructs BWT incrementally, which requires O(n log n) time where n is the length of the input text. We present an algorithm for(More)
We propose a novel type of document classification task that quantifies how much a given document (review) appreciates the target object using not binary polarity (good or bad) but a continuous measure called sentiment polarity score (sp-score). An sp-score gives a very concise summary of a review and provides more information than binary classification.(More)
A linear graph is a graph whose vertices are totally ordered. Biological and linguistic sequences with interactions among symbols are naturally represented as linear graphs. Examples include protein contact maps, RNA secondary structures and predicate-argument structures. Our algorithm, linear graph miner (LGM), leverages the vertex order for efficient(More)
The concept of synthetic gradient introduced by Jaderberg et al. (2016) provides an avant-garde framework for asynchronous learning of neural network. Their model, however, has a weakness in its construction, because the structure of their synthetic gradient has little relation to the objective function of the target task. In this paper we introduce virtual(More)