Hard-Coded Gaussian Attention for Neural Machine Translation
A “hard-coded” attention variant without any learned parameters is developed, offering insight into which components of the Transformer are actually important; the authors hope this will guide future work toward simpler and more efficient attention-based models.
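A minimal sketch of what a parameter-free Gaussian attention layer might look like; the function name, the fixed offset, and the standard deviation below are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

def hard_coded_gaussian_attention(values: np.ndarray, offset: int = -1, sigma: float = 1.0) -> np.ndarray:
    """Attend over `values` (seq_len x d_model) with fixed Gaussian weights.

    Nothing is learned: for query position i, the distribution over key
    positions j is a Gaussian centred at i + offset (an illustrative choice),
    so the layer has no parameters at all.
    """
    seq_len = values.shape[0]
    positions = np.arange(seq_len)
    centers = positions + offset
    # scores[i, j] = -(j - (i + offset))^2 / (2 * sigma^2)
    scores = -((positions[None, :] - centers[:, None]) ** 2) / (2.0 * sigma ** 2)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values  # each output row is a convex combination of value rows

# Toy usage: 5 tokens with 4-dimensional representations.
out = hard_coded_gaussian_attention(np.random.rand(5, 4))
print(out.shape)  # (5, 4)
```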
Do Long-Range Language Models Actually Use Long-Range Context?
- Simeng Sun, Kalpesh Krishna, Andrew Mattarella-Micke, Mohit Iyyer
- Computer Science, Conference on Empirical Methods in Natural Language Processing
- 19 September 2021
This paper performs a fine-grained analysis of two long-range Transformer language models (including the Routing Transformer, which achieves state-of-the-art perplexity on the PG-19 long-sequence LM benchmark) that accept input sequences of up to 8K tokens, and finds that long-range context helps most for literary novels.
Energy-Based Reranking: Improving Neural Machine Translation Using Energy-Based Models
- Subhajit Naskar, Pedram Rooshenas, Simeng Sun, Mohit Iyyer, A. McCallum
- Computer Science, Annual Meeting of the Association for Computational Linguistics
- 20 September 2020
The discrepancy between maximum likelihood estimation (MLE) and task measures such as BLEU has been studied before for autoregressive neural machine translation (NMT); this work trains an energy-based model to mimic the behavior of the task measure, resulting in a re-ranking algorithm over samples drawn from the NMT model.
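A minimal sketch of the reranking step under the assumptions above: a learned energy model assigns lower energy to candidates it predicts will score higher on the task measure, and the candidate with the lowest energy is kept. The function and variable names are illustrative, not the paper's API:

```python
from typing import Callable, List

def rerank_by_energy(candidates: List[str], energy_fn: Callable[[str], float]) -> List[str]:
    """Order NMT samples by an energy model trained to mimic a task measure
    such as BLEU; lower energy is assumed to mean a better translation."""
    return sorted(candidates, key=energy_fn)

# Hypothetical usage: rank sampled translations, then keep the best one.
samples = ["translation a", "translation b", "translation c"]
best = rerank_by_energy(samples, energy_fn=lambda hyp: float(len(hyp)))[0]
print(best)
```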
How to Compare Summarizers without Target Length? Pitfalls, Solutions and Re-Examination of the Neural Summarization Literature
- Simeng Sun, Ori Shapira, Ido Dagan, A. Nenkova
- Computer Science, Proceedings of the Workshop on Methods for…
A new method is proposed that normalizes the ROUGE F1 score of a system by that of a random system with the same average output length, to alleviate the effect of length during evaluation.
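The normalization itself is a single ratio; a minimal sketch with hypothetical numbers (the function name and values are illustrative):

```python
def length_normalized_rouge(system_f1: float, random_f1_at_same_length: float) -> float:
    """Normalize a system's ROUGE F1 by that of a random system whose outputs
    have the same average length, so length alone cannot inflate the score."""
    return system_f1 / random_f1_at_same_length

# Hypothetical example: a system scoring 0.40 while a length-matched random
# baseline scores 0.25 gets a normalized score of 1.6.
print(length_normalized_rouge(0.40, 0.25))
```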
System architecture for high-performance permissioned blockchains
This paper proposes a novel architecture, the Dual-Channel Parallel Broadcast model (DCPB), which addresses the inefficient transaction processing speed of blockchains through three methods: dual communication channels, parallel pipeline processing, and a block broadcast strategy.
ChapterBreak: A Challenge Dataset for Long-Range Language Models
- Simeng Sun, Katherine Thai, Mohit Iyyer
- Computer Science, North American Chapter of the Association for Computational Linguistics
- 22 April 2022
This work introduces ChapterBreak, a challenge dataset that provides an LRLM with a long segment from a narrative that ends at a chapter boundary and asks it to distinguish the beginning of the ground-truth next chapter from a set of negative segments sampled from the same narrative.
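A minimal sketch of how such a suffix-identification task could be scored, assuming the model exposes a log-likelihood function over (prefix, continuation) pairs; the field names and the scoring interface are illustrative, not the dataset's actual schema:

```python
from typing import Callable, Dict, List

def suffix_identification_accuracy(
    examples: List[Dict],
    log_likelihood: Callable[[str, str], float],
) -> float:
    """Fraction of examples where the model scores the gold next-chapter
    beginning above every negative candidate, given the long prefix."""
    correct = 0
    for ex in examples:
        gold_score = log_likelihood(ex["prefix"], ex["gold_suffix"])
        if all(gold_score > log_likelihood(ex["prefix"], neg) for neg in ex["negatives"]):
            correct += 1
    return correct / len(examples)
```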
Name Disambiguation for Chinese Scientific Authors with Multi-Level Clustering
- Simeng Sun, Hui Zhang, Ning Li, Yong Chen
- Computer Science, 2017 IEEE International Conference on…
- 1 July 2017
This paper proposes an entirely unsupervised framework to achieve well-performing disambiguation: specifically, a multi-level clustering algorithm that builds a discipline tree in which paper and author entities are matched.
The Feasibility of Embedding Based Automatic Evaluation for Single Document Summarization
The experimental results show that taking the max value over each dimension of the summary's ELMo word embeddings gives a representation that correlates highly with human ratings, and that averaging the cosine similarities across all encoders the authors tested yields high correlation with manual scores in the reference-free setting.
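A minimal sketch of the max-pooled representation and the cosine comparison, using random arrays as stand-ins for ELMo embeddings (the dimensions and function names are illustrative):

```python
import numpy as np

def max_pooled_embedding(word_embeddings: np.ndarray) -> np.ndarray:
    """Max over each dimension of a summary's word embeddings (seq_len x dim),
    giving one fixed-size vector for the whole summary."""
    return word_embeddings.max(axis=0)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical usage: compare a system summary against a reference summary.
system, reference = np.random.rand(12, 1024), np.random.rand(15, 1024)
score = cosine_similarity(max_pooled_embedding(system), max_pooled_embedding(reference))
print(score)
```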
Revisiting Simple Neural Probabilistic Language Models
This paper revisits the neural probabilistic language model of Bengio et al. (2003), which simply concatenates word embeddings within a fixed window and passes the result through a feed-forward network to predict the next word, and obtains small but consistent perplexity decreases across three word-level language modeling datasets.
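A minimal sketch of the concatenate-and-feed-forward architecture described above, with toy sizes and random parameters rather than the paper's trained configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, window, d_emb, d_hidden = 100, 3, 16, 32

E = rng.normal(size=(vocab_size, d_emb))          # word embeddings
W1 = rng.normal(size=(window * d_emb, d_hidden))  # hidden layer
W2 = rng.normal(size=(d_hidden, vocab_size))      # output projection

def nplm_next_word_probs(context_ids):
    """Bengio-style NPLM forward pass: concatenate the embeddings of the
    `window` previous words, apply a feed-forward layer, softmax over the vocab."""
    x = E[context_ids].reshape(-1)          # (window * d_emb,)
    h = np.tanh(x @ W1)                     # (d_hidden,)
    logits = h @ W2                         # (vocab_size,)
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

probs = nplm_next_word_probs([5, 17, 42])   # toy context of 3 word ids
print(probs.argmax(), probs.sum())
```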
IGA: An Intent-Guided Authoring Assistant
An interactive writing assistant is built that generates and rephrases text according to fine-grained author specifications, by fine-tuning a language model on a dataset heuristically labeled with author intents.