Publications
SimAlign: High Quality Word Alignments without Parallel Training Data using Static and Contextualized Embeddings
TLDR: We propose word alignment methods that require no parallel data.
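As a rough illustration of the embedding-only alignment idea in the SimAlign TLDR above, the sketch below aligns two sentences by mutual argmax over a cosine similarity matrix of their word vectors; function and variable names are illustrative and not taken from the paper's code.

```python
import numpy as np

def mutual_argmax_alignment(src_vecs, tgt_vecs):
    """Align words of two sentences from their embeddings alone.

    src_vecs: (m, d) array, one vector per source word
    tgt_vecs: (n, d) array, one vector per target word
    Returns a set of (i, j) pairs where source word i and target word j
    are each other's most similar word (mutual argmax over cosine similarity).
    """
    # Cosine similarity matrix between all source/target word pairs
    src = src_vecs / np.linalg.norm(src_vecs, axis=1, keepdims=True)
    tgt = tgt_vecs / np.linalg.norm(tgt_vecs, axis=1, keepdims=True)
    sim = src @ tgt.T                      # shape (m, n)

    best_tgt = sim.argmax(axis=1)          # best target for each source word
    best_src = sim.argmax(axis=0)          # best source for each target word

    # Keep only pairs that agree in both directions
    return {(i, j) for i, j in enumerate(best_tgt) if best_src[j] == i}
```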
Identifying Necessary Elements for BERT's Multilinguality
TLDR: We propose an experimental setup with small BERT models and a mix of synthetic and natural data that allows for fast experimentation.
Embedding Learning Through Multilingual Concept Induction
TLDR: We present a new method for estimating vector space representations of words: embedding learning by concept induction.
Modeling Graph Structure via Relative Position for Better Text Generation from Knowledge Graphs
TLDR: We present a novel Transformer-based encoder-decoder architecture for graph-to-text generation, called the Graformer, which achieves strong performance while using significantly fewer parameters than other approaches.
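The Graformer TLDR above concerns encoding graph structure via relative position. One way such relative positions can be derived (shown here only as a simplified, assumed sketch, not the paper's implementation) is from pairwise shortest-path distances between graph nodes, which can then serve as relative-position labels in self-attention; the sketch treats the graph as undirected for simplicity.

```python
from collections import deque

def shortest_path_matrix(num_nodes, edges, unreachable=-1):
    """Pairwise shortest-path distances in an undirected graph.

    Such a matrix can serve as relative-position labels between graph
    nodes, e.g. as an extra bias or embedding lookup in self-attention.
    """
    adj = [[] for _ in range(num_nodes)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    dist = [[unreachable] * num_nodes for _ in range(num_nodes)]
    for start in range(num_nodes):
        dist[start][start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if dist[start][nxt] == unreachable:
                    dist[start][nxt] = dist[start][node] + 1
                    queue.append(nxt)
    return dist
```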
Analytical Methods for Interpretable Ultradense Word Embeddings
TLDR: In this work, we investigate three methods for making word spaces interpretable by rotation: Densifier, linear SVMs, and DensRay, a new method we propose.
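As a loose sketch of the general idea behind rotation-based interpretability such as DensRay in the entry above (the exact objective, weighting, and full orthogonal rotation follow the paper, not this code), one can contrast difference vectors of cross-class and same-class seed-word pairs and take the leading eigenvector as an interpretable direction; all names here are illustrative.

```python
import numpy as np
from itertools import combinations, product

def interpretable_direction(pos_vecs, neg_vecs):
    """Sketch: find a direction separating two word classes.

    pos_vecs, neg_vecs: lists of word vectors for the two seed lexicons.
    Builds a matrix from outer products of difference vectors
    (cross-class pairs count positively, same-class pairs negatively)
    and returns its leading eigenvector.
    """
    dim = pos_vecs[0].shape[0]
    A = np.zeros((dim, dim))

    for v, w in product(pos_vecs, neg_vecs):      # pairs from different classes
        d = v - w
        A += np.outer(d, d)
    for group in (pos_vecs, neg_vecs):            # pairs from the same class
        for v, w in combinations(group, 2):
            d = v - w
            A -= np.outer(d, d)

    eigvals, eigvecs = np.linalg.eigh(A)
    return eigvecs[:, -1]                          # eigenvector of largest eigenvalue
```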
Multilingual Embeddings Jointly Induced from Contexts and Concepts: Simple, Strong and Scalable
TLDR: We propose Co+Co, a simple and scalable multilingual embedding learner that combines context-based and concept-based learning.
A Stronger Baseline for Multilingual Word Embeddings
TLDR: We propose SC-ID, an extension of S-ID: given a sentence-aligned corpus, we use sampling to extract concepts that are then processed in the same manner as S-IDs.
Quantifying the Contextualization of Word Representations with Semantic Class Probing
TLDR: We quantify the amount of contextualization, i.e., how well words are interpreted in context, by studying the extent to which semantic classes of a word can be inferred from its contextualized embedding.
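A minimal sketch of the probing setup described in the TLDR above, with illustrative placeholders: keep the contextualized embeddings frozen and fit a simple classifier that predicts a word's semantic class from them; held-out accuracy then quantifies how much of the class is recoverable from the representation (the paper's actual classifier and evaluation may differ).

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def probe_semantic_classes(embeddings, classes):
    """Fit a linear probe on frozen contextualized embeddings.

    embeddings: (num_tokens, hidden_dim) array of contextualized vectors
    classes:    (num_tokens,) array of semantic-class labels
    Returns held-out accuracy, i.e. how well the semantic class can be
    inferred from the embedding alone.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        embeddings, classes, test_size=0.2, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return probe.score(X_test, y_test)
```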
Identifying Elements Essential for BERT's Multilinguality
TLDR: We propose a multilingual pretraining setup that modifies the masking strategy using VecMap, i.e., unsupervised embedding alignment.