Byte pair encoding
Known as: Byte pair compression, Digram coding, Dual tile encoding
Byte pair encoding or digram coding is a simple form of data compression in which the most common pair of consecutive bytes of data is replaced with…
Source: Wikipedia
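The compression scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation, and it assumes: byte-string input; a replacement code is any byte value that does not occur in the current sequence; and compression stops once no pair occurs at least twice or no free byte value remains. The function names `bpe_compress` and `bpe_decompress` are hypothetical.

```python
from collections import Counter

# Sketch under stated assumptions; bpe_compress/bpe_decompress are hypothetical names.


def bpe_compress(data: bytes) -> tuple[bytes, list[tuple[int, int, int]]]:
    """Repeatedly replace the most common adjacent byte pair with an unused byte.

    Returns the compressed bytes and the replacement table, as
    (code, first, second) entries in the order they were applied.
    """
    seq = list(data)
    table: list[tuple[int, int, int]] = []
    while True:
        # Count every adjacent pair (overlapping pairs, as in "aaa", count twice).
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        (first, second), count = pairs.most_common(1)[0]
        if count < 2:
            break  # a unique pair saves nothing once its table entry is counted
        # A code is safe if it does not occur in the current sequence.
        unused = set(range(256)) - set(seq)
        if not unused:
            break  # every byte value is in use, so no code is available
        code = min(unused)
        out: list[int] = []
        i = 0
        while i < len(seq):
            # Scan left to right, replacing non-overlapping occurrences.
            if i + 1 < len(seq) and seq[i] == first and seq[i + 1] == second:
                out.append(code)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
        table.append((code, first, second))
    return bytes(seq), table


def bpe_decompress(data: bytes, table: list[tuple[int, int, int]]) -> bytes:
    """Undo the replacements in reverse order of application."""
    seq = list(data)
    for code, first, second in reversed(table):
        out: list[int] = []
        for value in seq:
            if value == code:
                out.extend((first, second))
            else:
                out.append(value)
        seq = out
    return bytes(seq)


packed, table = bpe_compress(b"aaabdaaabac")
print(len(packed), len(table))  # prints: 5 3
```

Because `Counter.most_common` breaks ties in first-seen order, `b"aaabdaaabac"` shrinks from 11 bytes to 5 after three replacements. Decompression must expand codes in reverse order, since a later pair may contain a code introduced earlier; a real format would also have to store the table, which is the overhead the `count < 2` check guards against.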

Related topics (8 relations)
Byte
Data compression
Dictionary coder
Grammar induction

Papers overview
Semantic Scholar uses AI to extract papers important to this topic.

Highly Cited
XGPT: Cross-modal Generative Pre-Training for Image Captioning
Qiaolin Xia, Haoyang Huang, +7 authors, Ming Zhou
Natural Language Processing and Chinese Computing
2020
Corpus ID: 211817758
While many BERT-based cross-modal pre-trained models produce excellent results on downstream understanding tasks like image-text…

Code Completion using Neural Attention and Byte Pair Encoding
Youri Arkesteijn, Nikhil Saldanha, Bastijn Kostense
arXiv.org
2020
Corpus ID: 215754320
In this paper, we aim to do code completion based on implementing a Neural Network from Li et al. Our contribution is that we…

An Attention Ensemble Based Approach for Multilabel Profanity Detection
Pratik Ratadiya, Deepak Mishra
International Conference on Data Mining Workshops…
2019
Corpus ID: 210695222
The amount of user-generated content in the cyberspace keeps increasing in the 21st century. However, it has also meant an…

On invariant random subgroups of block-diagonal limits of symmetric groups
A. Dudko, K. Medynets
Proceedings of the American Mathematical Society
2017
Corpus ID: 119668690
We classify the ergodic invariant random subgroups of block-diagonal limits of symmetric groups in the cases when the groups are…

Controlling byte pair encoding for neural machine translation
Alfred John Tacorda, Marvin John Ignacio, Nathaniel Oco, R. Roxas
International Conference on Asian Language…
2017
Corpus ID: 3494673
Byte pair encoding (BPE) is an approach that segments the corpus in such a way that frequent sequence of characters are combined…

Improving Password Guessing Using Byte Pair Encoding
Xingxing Wang, Dakui Wang, Xiaojun Chen, Rui Xu, Jinqiao Shi, Li Guo
Information Security Conference
2017
Corpus ID: 32872434
Recent many password guessing algorithms based on the Probabilistic Context-Free Grammars (PCFGs) model brought significant…

Neural Machine Translation by Generating Multiple Linguistic Factors
Mercedes García-Martínez, Loïc Barrault, Fethi Bougares
International Conference on Statistical Language…
2017
Corpus ID: 4708595
Factored neural machine translation (FNMT) is founded on the idea of using the morphological and grammatical decomposition of the…

Neural Machine Translation with Characters and Hierarchical Encoding
Alexander Rosenberg Johansen, Jonas Meinertz Hansen, Elias Khazen Obeid, C. Sønderby, O. Winther
arXiv.org
2016
Corpus ID: 1808124
Most existing Neural Machine Translation models use groups of characters or whole words as their unit of input and output. We…

Learning variable length units for SMT between related languages via Byte Pair Encoding
Anoop Kunchukuttan, P. Bhattacharyya
SWCN@EMNLP
2016
Corpus ID: 15140656
We explore the use of segments learnt using Byte Pair Encoding (referred to as BPE units) as basic units for statistical machine…

ISSDC: Digram Coding Based Lossless Data Compression Algorithm
A. Mesut, A. Carus
Computing and Informatics
2010
Corpus ID: 16915566
In this paper, a new lossless data compression method that is based on digram coding is introduced. This data compression method…
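
Several entries above use BPE not as a compressor but as a subword segmenter for machine translation; in the words of the Tacorda et al. abstract, it segments the corpus so that frequent sequences of characters are combined. The Python below sketches that variant under stated assumptions: toy word-frequency counts, single characters as starting symbols, and an end-of-word marker `</w>`. The names `learn_merges`, `merge_pair`, `segment`, and `END` are hypothetical, and this is not the exact procedure of any paper listed here.

```python
from collections import Counter

# Sketch under stated assumptions; all names here are hypothetical.
END = "</w>"  # assumed end-of-word marker


def merge_pair(symbols: tuple[str, ...], pair: tuple[str, str]) -> tuple[str, ...]:
    """Join every non-overlapping occurrence of `pair` into one symbol."""
    out: list[str] = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_merges(word_freqs: dict[str, int], num_merges: int) -> list[tuple[str, str]]:
    """Learn merge operations, taking the most frequent adjacent pair each round."""
    # Start from characters plus an end-of-word marker so suffixes stay distinct.
    vocab = {tuple(word) + (END,): freq for word, freq in word_freqs.items()}
    merges: list[tuple[str, str]] = []
    for _ in range(num_merges):
        pairs: Counter = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq  # weight each pair by its word's frequency
        if not pairs:
            break  # every word is already a single symbol
        best = pairs.most_common(1)[0][0]
        merges.append(best)
        vocab = {merge_pair(symbols, best): freq for symbols, freq in vocab.items()}
    return merges


def segment(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Replay the learned merges, in order, on a possibly unseen word."""
    symbols = tuple(word) + (END,)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


merges = learn_merges({"low": 5, "lower": 2, "newest": 6, "widest": 3}, num_merges=5)
print(segment("lowest", merges))  # ['low', 'est</w>']
```

Because merges are learned from frequencies and then replayed in a fixed order, a word absent from the toy counts (here "lowest") still splits into units seen during training, which is what lets subword models handle rare and unseen words.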