Corpus ID: 237485423

What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers

@article{kim2021hyperclova,
  title={What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers},
  author={Boseop Kim and Hyoungseok Kim and Sang-Woo Lee and Gichang Lee and Donghyun Kwak and Dong Hyeon Jeon and Sunghyun Park and Sungju Kim and Seonhoon Kim and Dong Hyung Seo and Heungsub Lee and Minyoung Jeong and Sungjae Lee and Minsub Kim and SukHyun Ko and Seok Min Kim and Taeyong Park and Jinuk Kim and Soyoung Kang and Na-Hyeon Ryu and Kang Min Yoo and Minsuk Chang and Soobin Suh and Sookyo In and Jinseong Park and Kyungduk Kim and Hiun Kim and Jisu Jeong and Yong Goo Yeo and Dong-hyun Ham and Dongju Park and Min Young Lee and Jaewoo Kang and Inho Kang and Jung-Woo Ha and Woomyoung Park and Nako Sung},
  year={2021}
}
GPT-3 shows the remarkable in-context learning ability of large-scale language models (LMs) trained on hundreds of billions of tokens. Here we address some remaining issues less reported by the GPT-3 paper, such as a non-English LM, the performance of models of different sizes, and the effect of recently introduced prompt optimization on in-context learning. To achieve this, we introduce HyperCLOVA, a Korean variant of the 82B GPT-3, trained on a Korean-centric corpus of 560B tokens. Enhanced by our Korean…
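The in-context learning discussed in the abstract can be sketched in a few lines: labeled demonstrations are packed into the prompt itself and a frozen LM completes the final query. The task, labels, and formatting below are illustrative assumptions, not taken from the paper:

```python
# Minimal sketch of few-shot in-context learning: the "training" examples are
# part of the prompt; no model parameters are updated. Task and labels are
# hypothetical examples for illustration.

def build_few_shot_prompt(examples, query, task_description="Classify the sentiment."):
    """Concatenate labeled demonstrations and an unlabeled query into one prompt."""
    lines = [task_description]
    for text, label in examples:
        lines.append(f"Review: {text}\nSentiment: {label}")
    # The final item is left incomplete; the LM's continuation is the prediction.
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

demos = [("Great movie!", "positive"), ("Terribly boring.", "negative")]
prompt = build_few_shot_prompt(demos, "I loved every minute.")
print(prompt)
```

Feeding such a prompt to a generative LM and reading off the first generated token is the zero-update "learning" mechanism the paper studies at different model scales.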
PAGnol: An Extra-Large French Generative Model
PAGnol, a collection of French GPT models, is introduced, and a scaling law for compute is fit for the French language; it is found that the pre-training dataset significantly conditions the quality of the outputs, with common datasets leading to low-quality offensive text.
P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks
P-Tuning v2 delivers the novel empirical finding that properly optimized prompt tuning can be universally effective across a wide range of model scales and NLU tasks, matching the performance of fine-tuning while having only 0.1%-3% tuned parameters.
Intent-based Product Collections for E-commerce using Pretrained Language Models
  • Hiun Kim, Jisu Jeong, +7 authors Rak Yeong Kim · Computer Science · ArXiv · 2021
A pretrained language model (PLM) that leverages textual attributes of web-scale products is used to build intent-based product collections; it significantly outperforms the search-based baseline model for intent-based product matching in offline evaluations.
Is the Number of Trainable Parameters All That Actually Matters?
This work emulates an increase in effective parameters, using efficient approximations: either by doping the models with frozen random parameters, or by using fast structured transforms in place of dense linear layers, and finds that the scaling relationship between test loss and compute depends only on the actual number of trainable parameters.
GPT3Mix: Leveraging Large-scale Language Models for Text Augmentation
This paper proposes a novel data augmentation technique that leverages large-scale language models to generate realistic text samples from a mixture of real samples and utilizes soft labels predicted by the language models, effectively distilling knowledge from the large-scale language models and creating textual perturbations simultaneously.
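The mixing step can be sketched as prompt construction: real labeled examples are embedded in a template and the LM is asked to continue with a new, blended example. The template text and task below are illustrative; the actual GPT3Mix prompts and soft-label extraction (from the LM's label-token probabilities) differ in detail:

```python
# Hedged sketch of a GPT3Mix-style augmentation prompt. The sentiment task,
# wording, and label set are assumptions for illustration only.

def gpt3mix_prompt(example_a, example_b, label_set=("positive", "negative")):
    """Build a prompt asking the LM to synthesize a new labeled example."""
    (text_a, label_a), (text_b, label_b) = example_a, example_b
    return (
        f"Each item is a movie review with sentiment ({' or '.join(label_set)}).\n"
        f"Review: {text_a} (Sentiment: {label_a})\n"
        f"Review: {text_b} (Sentiment: {label_b})\n"
        "Review:"  # the LM continues with a synthetic review and its label
    )

p = gpt3mix_prompt(("A joy to watch.", "positive"), ("Fell asleep twice.", "negative"))
print(p)
```

The soft label for the generated sample would then come from the LM's predicted probability over the label tokens, rather than a hard class assignment.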
Prefix-Tuning: Optimizing Continuous Prompts for Generation
Prefix-tuning is proposed, a lightweight alternative to fine-tuning for natural language generation tasks, which keeps language model parameters frozen but optimizes a small continuous task-specific vector (called the prefix).
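The core trick can be illustrated with toy attention: trainable prefix vectors are prepended to the frozen keys and values, so only the prefix would receive gradients. Dimensions and values here are illustrative, not from the paper:

```python
import math

# Toy single-head dot-product attention over lists of vectors; the "prefix"
# key/value pair stands in for prefix-tuning's trainable parameters, while
# the token keys/values stand in for frozen model activations.

def attend(query, keys, values):
    """Attention for a single query vector."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

# Frozen keys/values derived from the input tokens.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 1.0], [2.0, 0.0]]
# Trainable prefix: the only parameters prefix-tuning would update.
prefix_keys = [[0.5, 0.5]]
prefix_values = [[0.0, 3.0]]

out = attend([1.0, 0.0], prefix_keys + keys, prefix_values + values)
print(out)
```

Because the prefix participates in every attention step, a small number of vectors can steer a frozen model's generations toward the target task.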
Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm
It is suggested that the function of few-shot examples in these cases is better described as locating an already learned task rather than meta-learning, which motivates rethinking the role of prompts in controlling and evaluating powerful language models.
PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation
The experimental results demonstrate the superior capabilities of PanGu-α in performing various tasks under few-shot or zero-shot settings and investigate the effect of model scales on the few-shot performances across a broad range of Chinese NLP tasks.
Language Models are Unsupervised Multitask Learners
It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
A simple, efficient intra-layer model parallel approach that enables training transformer models with billions of parameters and shows that careful attention to the placement of layer normalization in BERT-like models is critical to achieving increased performance as the model size grows.
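The intra-layer (tensor) parallelism idea can be checked with plain arithmetic: a linear layer's weight matrix is split column-wise across devices, each device computes a partial output, and the shards are concatenated. This single-process sketch only verifies the math; real Megatron-LM shards across GPUs with collective communication:

```python
# Toy column-parallel linear layer. Sizes and values are illustrative.

def matmul(x, w):
    """x: list of rows, w: list of rows -> x @ w."""
    cols = len(w[0])
    return [[sum(xi * w[k][j] for k, xi in enumerate(row)) for j in range(cols)]
            for row in x]

def split_columns(w, parts=2):
    """Split a weight matrix column-wise into `parts` shards."""
    n = len(w[0]) // parts
    return [[row[i * n:(i + 1) * n] for row in w] for i in range(parts)]

x = [[1.0, 2.0]]
w = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]

shards = split_columns(w)                        # each shard lives on one "device"
partials = [matmul(x, shard) for shard in shards]  # computed independently
parallel = [sum((p[i] for p in partials), []) for i in range(len(x))]
print(parallel)  # matches the unsplit matmul
```

Column-parallel layers need no communication in the forward pass until the shards are gathered, which is why transformer MLP and attention blocks split this way scale well.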
The Power of Scale for Parameter-Efficient Prompt Tuning
This work explores “prompt tuning”, a simple yet effective mechanism for learning “soft prompts” to condition frozen language models to perform specific downstream tasks, and shows that conditioning a frozen model with soft prompts confers benefits in robustness to domain transfer, as compared to full model tuning.
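Mechanically, a soft prompt is just a handful of trainable "virtual token" vectors prepended to the frozen input embeddings; only those vectors would receive gradients. The embedding table, vocabulary, and sizes below are illustrative assumptions:

```python
import random

# Minimal sketch of soft prompt tuning's input pipeline. Everything except
# `soft_prompt` would stay frozen during training.

random.seed(0)
EMB_DIM = 4
PROMPT_LEN = 3

# Stand-in for the pretrained model's (frozen) embedding table.
vocab = {"the": 0, "movie": 1, "rocks": 2}
embedding_table = [[random.gauss(0, 1) for _ in range(EMB_DIM)] for _ in vocab]

# Trainable soft prompt: the only parameters prompt tuning updates.
soft_prompt = [[random.gauss(0, 0.1) for _ in range(EMB_DIM)] for _ in range(PROMPT_LEN)]

def embed_with_soft_prompt(tokens):
    """Prepend the soft prompt to the frozen token embeddings."""
    token_embs = [embedding_table[vocab[t]] for t in tokens]
    return soft_prompt + token_embs

seq = embed_with_soft_prompt(["the", "movie", "rocks"])
print(len(seq))  # PROMPT_LEN + number of input tokens
```

Because the prompt lives in embedding space rather than vocabulary space, it can encode conditioning signals no discrete prompt could express.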
CTRL: A Conditional Transformer Language Model for Controllable Generation
CTRL is released, a 1.63 billion-parameter conditional transformer language model, trained to condition on control codes that govern style, content, and task-specific behavior, providing more explicit control over text generation.
An Empirical Study of Tokenization Strategies for Various Korean NLP Tasks
Experimental results demonstrate that a hybrid approach of morphological segmentation followed by BPE works best in Korean to/from English machine translation and natural language understanding tasks such as KorNLI, KorSTS, NSMC, and PAWS-X.
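The hybrid scheme composes two stages: a morphological analyzer splits each word into morphemes, and BPE then operates within each morpheme. The sketch below uses a toy lookup table as the segmenter (real systems use an analyzer such as MeCab-ko) and a single illustrative BPE merge, so it only shows the pipeline shape, not the paper's actual tokenizer:

```python
# Hedged sketch of morphological segmentation followed by BPE for Korean.
# TOY_MORPHEMES and BPE_MERGES are made-up stand-ins for a trained analyzer
# and a learned merge table.

TOY_MORPHEMES = {"학교에": ["학교", "에"], "갔다": ["가", "았다"]}
BPE_MERGES = [("았", "다")]

def segment(word):
    """Morphological segmentation (toy lookup; unknown words pass through)."""
    return TOY_MORPHEMES.get(word, [word])

def bpe(morpheme, merges):
    """Apply BPE merges to a morpheme, starting from individual characters."""
    symbols = list(morpheme)
    for a, b in merges:
        i = 0
        while i < len(symbols) - 1:
            if symbols[i] == a and symbols[i + 1] == b:
                symbols[i:i + 2] = [a + b]
            else:
                i += 1
    return symbols

def tokenize(sentence):
    tokens = []
    for word in sentence.split():
        for morpheme in segment(word):
            tokens.extend(bpe(morpheme, BPE_MERGES))
    return tokens

print(tokenize("학교에 갔다"))
```

Running BPE inside morpheme boundaries prevents merges from crossing linguistically meaningful units, which is the property the paper credits for the hybrid approach's gains.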
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
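BERT's bidirectional conditioning comes from its masked-language-model objective: roughly 15% of input tokens are selected, mostly replaced with a [MASK] token, and predicted from both left and right context. The probabilities and 80/10/10 replacement scheme below follow the paper; the whitespace tokenizer and example sentence are toy stand-ins:

```python
import random

# Illustrative sketch of BERT-style masked-LM input creation.

random.seed(1)
MASK, MASK_PROB = "[MASK]", 0.15

def mask_tokens(tokens, vocab):
    """Return (corrupted inputs, per-position labels); None means unpredicted."""
    inputs, labels = [], []
    for tok in tokens:
        if random.random() < MASK_PROB:
            labels.append(tok)                   # model must recover the original
            r = random.random()
            if r < 0.8:
                inputs.append(MASK)              # 80%: replace with [MASK]
            elif r < 0.9:
                inputs.append(random.choice(vocab))  # 10%: random token
            else:
                inputs.append(tok)               # 10%: keep unchanged
        else:
            inputs.append(tok)
            labels.append(None)                  # position not predicted
    return inputs, labels

toks = "the cat sat on the mat".split()
inp, lab = mask_tokens(toks, toks)
print(inp, lab)
```

Because the label at a masked position is always the original token, the loss is computed only where `labels` is not None.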