Morphosyntactic Tagging with a Meta-BiLSTM Model over Context Sensitive Token Encodings

Bernd Bohnet, Ryan T. McDonald, Gonçalo Simões, Daniel Andor, Emily Pitler, Joshua Maynez
The rise of neural networks, and particularly recurrent neural networks, has produced significant advances in part-of-speech tagging accuracy. One characteristic common among these models is the presence of rich initial word encodings. These encodings typically are composed of a recurrent character-based representation with dynamically and pre-trained word embeddings. However, these encodings do not consider a context wider than a single word and it is only through subsequent recurrent layers… 
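
The limitation described in the abstract can be illustrated with a minimal sketch. It is not the authors' model: it assumes a tiny Elman-style recurrence in place of an LSTM, random weights in place of trained ones, and hypothetical names (`char_rnn_encode`, `encode_token`) and toy sentences chosen only for illustration. The initial encoding concatenates a recurrent character-based vector with a word embedding, so the same word receives the same vector in every sentence.

```python
import math
import random


def char_rnn_encode(word, char_vecs, w_in, w_rec):
    """Run a tiny Elman-style RNN over the characters; return the final state."""
    hidden = [0.0] * len(w_rec)
    for ch in word:
        x = char_vecs[ch]
        hidden = [
            math.tanh(
                sum(w_in[i][j] * x[j] for j in range(len(x)))
                + sum(w_rec[i][k] * hidden[k] for k in range(len(hidden)))
            )
            for i in range(len(hidden))
        ]
    return hidden


def encode_token(word, word_vecs, char_vecs, w_in, w_rec):
    """Initial encoding: character-based vector concatenated with word embedding."""
    return char_rnn_encode(word, char_vecs, w_in, w_rec) + word_vecs[word]


# Hypothetical dimensions and toy data (assumptions, not from the paper).
CHAR_DIM, HIDDEN_DIM, WORD_DIM = 3, 4, 5
sentence_a = ["the", "can", "rusts"]   # "can" used as a noun
sentence_b = ["they", "can", "swim"]   # "can" used as a verb

rng = random.Random(0)


def rand_vec(n):
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


vocab = sorted(set(sentence_a + sentence_b))
chars = sorted(set("".join(vocab)))
char_vecs = {c: rand_vec(CHAR_DIM) for c in chars}    # stand-in character embeddings
word_vecs = {w: rand_vec(WORD_DIM) for w in vocab}    # stand-in word embeddings
w_in = [rand_vec(CHAR_DIM) for _ in range(HIDDEN_DIM)]
w_rec = [rand_vec(HIDDEN_DIM) for _ in range(HIDDEN_DIM)]

enc_a = [encode_token(w, word_vecs, char_vecs, w_in, w_rec) for w in sentence_a]
enc_b = [encode_token(w, word_vecs, char_vecs, w_in, w_rec) for w in sentence_b]

# The encoding of "can" ignores its neighbours, so both uses are identical.
print(enc_a[1] == enc_b[1])  # prints True
```

In this sketch, any disambiguation between the noun and verb readings of "can" would have to come from later layers that read the whole sentence, which is the gap the abstract points to.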

Reproducing a Morphosyntactic Tagger with a Meta-BiLSTM Model over Context Sensitive Token Encodings

This work reproduces the morphosyntactic tagger of Bohnet et al. (2018), which integrates sentence-level and single-word context through synchronized training by a meta-model, and suggests that different reporting choices could improve the interpretability of the results.

Morphosyntactic Label Disambiguation

A variety of recurrent neural network models that use a combination of bi-directional Long Short-Term Memory and Conditional Random Fields to disambiguate morphosyntactic labels for Portuguese are presented.

Linguistic Knowledge and Transferability of Contextual Representations

It is found that linear models trained on top of frozen contextual representations are competitive with state-of-the-art task-specific models in many cases, but fail on tasks requiring fine-grained linguistic knowledge.

Part-of-Speech Tagging Using Multiview Learning

Two additional extended methods are proposed: a multihead-attention character-level representation for capturing several aspects of subword information, and an optimal structure for training two different character-level embeddings based on joint learning.

Pre-trained Contextualized Character Embeddings Lead to Major Improvements in Time Normalization: a Detailed Analysis

This work derives character-level contextual embeddings from Flair, and applies them to a time normalization task, yielding major performance improvements over the previous state-of-the-art: 51% error reduction in news and 33% in clinical notes.

Predictive Representation Learning for Language Modeling

This work shows that explicitly encoding a simple predictive task facilitates the search for a more effective language model, and proposes Predictive Representation Learning (PRL), which explicitly constrains LSTMs to encode specific predictions, like those that might need to be learned implicitly.

82 Treebanks, 34 Models: Universal Dependency Parsing with Multi-Treebank Models

The Uppsala system is a pipeline consisting of three components: the first performs joint word and sentence segmentation; the second predicts part-of-speech tags and morphological features; the third predicts dependency trees from words and tags.

A Monolingual Approach to Contextualized Word Embeddings for Mid-Resource Languages

This work uses the multilingual OSCAR corpus, extracted from Common Crawl via language classification, filtering and cleaning, to train monolingual contextualized word embeddings (ELMo) for five mid-resource languages and shows that the benefit of a larger, more diverse corpus surpasses the cross-lingual benefit of multilingual embedding architectures.

On the Importance of Delexicalization for Fact Verification

This work investigates the importance that a model assigns to various aspects of data while learning and making predictions, specifically, in a recognizing textual entailment (RTE) task, and finds that most of the weights are assigned to noun phrases.

A neural joint model for Vietnamese word segmentation, POS tagging and dependency parsing

This work proposes the first multi-task learning model for joint Vietnamese word segmentation, part-of-speech (POS) tagging and dependency parsing that extends the BIST graph-based dependency parser with BiLSTM-CRF-based neural layers.



Learning Character-level Representations for Part-of-Speech Tagging

A deep neural network is proposed that learns character-level representations of words and associates them with the usual word representations to perform POS tagging, producing state-of-the-art POS taggers for two languages.

Improved Transition-Based Parsing and Tagging with Neural Networks

This research introduces set-valued features to encode the predicted morphological properties and part-of-speech confusion sets of the words being parsed in neural network transition-based dependency parsing.

Stack-propagation: Improved Representation Learning for Syntax

This work proposes a simple method for learning a stacked pipeline of models which it calls “stack-propagation”, and applies this to dependency parsing and tagging, where the hidden layer of the tagger network is used as a representation of the input tokens for the parser.

Multilingual Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Models and Auxiliary Loss

This work presents a novel bi-LSTM model, which combines the POS tagging loss function with an auxiliary loss function that accounts for rare words, which obtains state-of-the-art performance across 22 languages, and works especially well for morphologically complex languages.

Stanford's Graph-based Neural Dependency Parser at the CoNLL 2017 Shared Task

This paper describes the neural dependency parser submitted by Stanford to the CoNLL 2017 Shared Task on parsing Universal Dependencies, which was ranked first according to all five relevant metrics for the system.

A Joint Model for Word Embedding and Word Morphology

A joint model is presented for performing unsupervised morphological analysis on words and learning a character-level composition function from morphemes to word embeddings; it is comparable to dedicated morphological analyzers at the task of morpheme boundary recovery and performs better than word-based embedding models at the task of syntactic analogy answering.

A unified architecture for natural language processing: deep neural networks with multitask learning

We describe a single convolutional neural network architecture that, given a sentence, outputs a host of language processing predictions: part-of-speech tags, chunks, named entity tags, semantic

Globally Normalized Transition-Based Neural Networks

We introduce a globally normalized transition-based neural network model that achieves state-of-the-art part-of-speech tagging, dependency parsing and sentence compression results. Our model is a

Natural Language Processing (Almost) from Scratch

We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including part-of-speech tagging, chunking, named entity