Learn More
This paper presents techniques to apply semi-CRFs to Named Entity Recognition tasks with a tractable computational cost. Our framework can handle an NER task that has long named entities and many labels which increase the computational cost. To reduce the computational cost, we propose two techniques: the first is the use of feature forests, which enables(More)
In this paper, we propose a novel discrim-inative language model, which can be applied quite generally. Compared to the well known N-gram language models, dis-criminative language models can achieve more accurate discrimination because they can employ overlapping features and non-local information. However, discriminative language models have been used only(More)
Shallow parsing is one of many NLP tasks that can be reduced to a sequence labeling problem. In this paper we show that the latent-dynamics (i.e., hidden sub-structure of shallow phrases) constitutes a problem in shallow parsing, and we show that modeling this intermediate structure is useful. By analyzing the automatically learned hidden states, we show(More)
We propose a perceptron-style algorithm for fast discriminative training of structured latent variable model, and analyzed its convergence properties. Our method extends the perceptron algorithm for the learning task with latent dependencies, which may not be captured by traditional models. It relies on Viterbi decoding over latent variables, combined with(More)
Estimating the conditional mean of an input-output relation is the goal of regression. However, regression analysis is not sufficiently informative if the conditional distribution has multi-modality, is highly asymmetric, or contains heteroscedastic noise. In such scenarios, estimating the conditional distribution itself would be more useful. In this paper,(More)
To compute Burrows-Wheeler Transform (BWT), one usually builds a suffix array (SA) first, and then obtains BWT using SA, which requires much redundant working space. In previous studies to compute BWT directly [6, 13], one constructs BWT incrementally, which requires O(n log n) time where n is the length of the input text. We present an algorithm for(More)
A linear graph is a graph whose vertices are totally ordered. Biological and linguistic sequences with interactions among symbols are naturally represented as linear graphs. Examples include protein contact maps, RNA secondary structures and predicate-argument structures. Our algorithm, linear graph miner (LGM), leverages the vertex order for efficient(More)
BACKGROUND Associating literature with pathways poses new challenges to the Text Mining (TM) community. There are three main challenges to this task: (1) the identification of the mapping position of a specific entity or reaction in a given pathway, (2) the recognition of the causal relationships among multiple reactions, and (3) the formulation and(More)