Corpus ID: 52041633

Syntree2Vec - An algorithm to augment syntactic hierarchy into word embeddings

Author: Shubham Bhardwaj
Word embeddings aim to map the sense of words into a lower-dimensional vector space in order to reason over them. Training embeddings on domain-specific data helps express concepts more relevant to the use case, but comes at a cost in accuracy when data is scarce. Our effort is to minimise this by infusing syntactic knowledge into the embeddings. We propose a graph-based embedding algorithm inspired by node2vec. Experimental results have shown that our algorithm improves the syntactic…
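The abstract does not spell out the paper's exact construction, but the general idea it points at — turning dependency trees into a word-level graph and harvesting random-walk contexts for a skip-gram trainer, as node2vec does — can be sketched roughly. Everything below (function names, the uniform walk policy, the edge format) is a hypothetical illustration, not the paper's method:

```python
import random
from collections import defaultdict

def tree_to_graph(edges, graph=None):
    """Add one sentence's dependency edges, given as (head, dependent)
    word pairs, to an undirected word-level graph (hypothetical format)."""
    if graph is None:
        graph = defaultdict(set)
    for head, dep in edges:
        graph[head].add(dep)
        graph[dep].add(head)
    return graph

def walk_contexts(graph, walk_len=5, walks_per_node=2, seed=0):
    """Uniform random walks over the word graph; each walk becomes a
    pseudo-sentence that could feed a skip-gram trainer."""
    rng = random.Random(seed)
    contexts = []
    for node in sorted(graph):
        for _ in range(walks_per_node):
            walk = [node]
            while len(walk) < walk_len:
                nbrs = sorted(graph[walk[-1]])
                if not nbrs:
                    break
                walk.append(rng.choice(nbrs))
            contexts.append(walk)
    return contexts
```

Words that are syntactically related (head/dependent pairs) then co-occur in walk contexts even when they are far apart in the surface sentence, which is one plausible way syntactic hierarchy could be infused into the embeddings.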


An Unsupervised Approach to Structuring and Analyzing Repetitive Semantic Structures in Free Text of Electronic Medical Records

This work presents an unsupervised approach to medical data annotation and shows on a validation dataset that the proposed labeling method generates meaningful labels correctly for 92.7% of groups.

Modeling Order in Neural Word Embeddings at Scale

A new neural language model incorporating both word order and character order in its embedding is proposed, which produces several vector spaces with meaningful substructure, as evidenced by its performance on a recent word-analogy task.

How to evaluate word embeddings? On importance of data efficiency and simple supervised tasks

It is proposed that evaluation of word representations should focus on data efficiency and simple supervised tasks, where the amount of available data is varied and the scores of a supervised model are reported for each subset (as is commonly done in transfer learning).

Two/Too Simple Adaptations of Word2Vec for Syntax Problems

We present two simple modifications to the models in the popular Word2Vec tool, in order to generate embeddings more suited to tasks involving syntax. The main issue with the original models is the…

Evaluation methods for unsupervised word embeddings

A comprehensive study of evaluation methods for unsupervised embedding techniques that obtain meaningful representations of words from text, calling into question the common assumption that there is one single optimal vector representation.

SyntaxNet Models for the CoNLL 2017 Shared Task

A baseline dependency parsing system for the CoNLL 2017 Shared Task, called "ParseySaurus," which uses the DRAGNN framework to combine transition-based recurrent parsing and tagging with character-based word representations.

Distributed Representations of Words and Phrases and their Compositionality

This paper presents a simple method for finding phrases in text, and shows that learning good vector representations for millions of phrases is possible and describes a simple alternative to the hierarchical softmax called negative sampling.
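Negative sampling, mentioned above as the alternative to the hierarchical softmax, trains each observed (center, context) pair against a handful of randomly sampled "negative" context words. A minimal sketch of one such skip-gram update in NumPy (the function name, learning rate, and matrix layout are illustrative assumptions, not the word2vec implementation):

```python
import numpy as np

def sgns_step(W_in, W_out, center, context, neg_ids, lr=0.025):
    """One skip-gram negative-sampling update.

    W_in, W_out: (vocab, dim) input/output embedding matrices, updated in place.
    center, context: indices of the observed (center, context) word pair.
    neg_ids: indices of k sampled negative context words.
    """
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    v = W_in[center]
    grad_v = np.zeros_like(v)
    # Positive pair: push sigma(u . v) toward 1.
    u = W_out[context]
    g = sigmoid(u @ v) - 1.0
    grad_v += g * u
    W_out[context] -= lr * g * v
    # Negative samples: push sigma(u_neg . v) toward 0.
    for n in neg_ids:
        u_n = W_out[n]
        g = sigmoid(u_n @ v)
        grad_v += g * u_n
        W_out[n] -= lr * g * v
    W_in[center] -= lr * grad_v
```

Each update touches only k + 1 output rows instead of the full vocabulary, which is what makes this cheap compared with a full softmax.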

node2vec: Scalable Feature Learning for Networks

node2vec is an algorithmic framework for learning continuous feature representations for nodes in networks; it defines a flexible notion of a node's network neighborhood and designs a biased random walk procedure that efficiently explores diverse neighborhoods.
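The biased walk is the part Syntree2Vec draws on: a second-order walk whose return parameter p and in-out parameter q interpolate between BFS-like and DFS-like exploration. A self-contained sketch on an unweighted adjacency-list graph (simplified: node2vec proper precomputes alias tables for sampling):

```python
import random

def biased_walk(graph, start, length, p=1.0, q=1.0, seed=0):
    """One node2vec-style second-order random walk.

    graph: dict mapping node -> list of neighbour nodes (unweighted).
    p: return parameter (higher -> less likely to step back to the previous node).
    q: in-out parameter (q > 1 biases toward BFS-like, q < 1 toward DFS-like walks).
    """
    rng = random.Random(seed)
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = graph.get(cur, [])
        if not nbrs:
            break
        if len(walk) == 1:
            # First step has no previous node: sample uniformly.
            walk.append(rng.choice(nbrs))
            continue
        prev = walk[-2]
        weights = []
        for x in nbrs:
            if x == prev:                    # distance 0 from previous node
                weights.append(1.0 / p)
            elif x in graph.get(prev, []):   # distance 1: shared neighbour
                weights.append(1.0)
            else:                            # distance 2: moving outward
                weights.append(1.0 / q)
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk
```

With p = q = 1 this reduces to a plain first-order random walk, as in DeepWalk.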

A Neural Probabilistic Language Model

This work proposes to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences.

Evaluating Generative Models for Text Generation

This work extends the evaluation presented for the SeqGAN model in Yu et al. (2016), using two additional datasets and an additional perplexity evaluation metric.

Character-level Convolutional Networks for Text Classification

This article constructs several large-scale datasets to show that character-level convolutional networks can achieve state-of-the-art or competitive results in text classification.