Cryptonite: A Cryptic Crossword Benchmark for Extreme Ambiguity in Language

Avia Efrat, Uri Shaham, Dan Kilman, Omer Levy
Current NLP datasets targeting ambiguity can be solved by a native speaker with relative ease. We present Cryptonite, a large-scale dataset based on cryptic crosswords, which is both linguistically complex and naturally sourced. Each example in Cryptonite is a cryptic clue, a short phrase or sentence with a misleading surface reading, whose solving requires disambiguating semantic, syntactic, and phonetic wordplays, as well as world knowledge. Cryptic clues pose a challenge even for experienced… 
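One common form of the wordplay the abstract mentions is the anagram, where the letters of part of the clue rearrange into the answer. As a minimal illustrative sketch (the function name and interface are hypothetical, not from the paper), an anagram check can be written as:

```python
from collections import Counter

def is_anagram_wordplay(fodder: str, answer: str) -> bool:
    """True when the letters of `fodder` rearrange exactly into `answer`,
    ignoring case, spaces, and punctuation."""
    letters = lambda s: Counter(c for c in s.lower() if c.isalpha())
    return letters(fodder) == letters(answer)

is_anagram_wordplay("listen", "silent")  # True
```

The hard part of cryptic solving is not this check but deciding which words of the clue are the anagram fodder in the first place, which is where the ambiguity lies.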


Decrypting Cryptic Crosswords: Semantically Complex Wordplay Puzzles as a Target for NLP

A novel curriculum approach in which the model is first fine-tuned on related tasks such as unscrambling words; model systematicity is investigated by perturbing the wordplay part of clues, showing that T5 exhibits behavior partially consistent with human solving strategies.
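The word-unscrambling pre-task mentioned above can be sketched symbolically (a hypothetical helper, not the paper's code): a scrambled string matches any vocabulary word with the same multiset of letters.

```python
def unscramble(scrambled: str, vocabulary: set[str]) -> list[str]:
    """Return all vocabulary words whose letters match the scrambled input."""
    key = sorted(scrambled.lower())
    return sorted(w for w in vocabulary if sorted(w.lower()) == key)

vocab = {"cat", "act", "dog", "god", "tack"}
unscramble("tca", vocab)  # ['act', 'cat']
```

The sketch also shows why the task probes form rather than meaning: the mapping depends only on letter identity, not on context.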

Inducing Character-level Structure in Subword-based Language Models with Type-level Interchange Intervention Training

While simple character-level tokenization approaches still perform best on purely form-based tasks like string reversal, this method is superior for more complex tasks that blend form, meaning, and context, such as spelling correction in context and word search games.
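A quick illustration of why purely form-based tasks like string reversal are hard under subword tokenization (the segmentation below is hypothetical, chosen for illustration): reversing the token sequence is not the same as reversing the characters.

```python
def reverse_chars(s: str) -> str:
    """Character-level reversal: the actual target of the task."""
    return s[::-1]

# A hypothetical subword segmentation of "unbelievable".
tokens = ["un", "believ", "able"]

"".join(reversed(tokens))      # 'ablebelievun' -- token-level reversal
reverse_chars("unbelievable")  # 'elbaveilebnu' -- the correct answer
```

A subword model must therefore recover character-level structure that its input representation never exposes directly.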

A Major Obstacle for NLP Research: Let's Talk about Time Allocation!

It is demonstrated that, in recent years, subpar time allocation has been a major obstacle for NLP research; multiple concrete problems are outlined together with their negative consequences, and remedies to improve the status quo are suggested.

Crossword Puzzle Resolution via Monte Carlo Tree Search

This paper is the first to model crossword puzzle resolution as a Markov Decision Process and apply MCTS to solve it, achieving an accuracy of 97% on its dataset.

What do tokens know about their characters and how do they know it?

The mechanisms through which PLMs acquire English-language character information during training are investigated and it is argued that this knowledge is acquired through multiple phenomena, including a systematic relationship between particular characters and particular parts of speech, as well as natural variability in the tokenization of related strings.

CC-Riddle: A Question Answering Dataset of Chinese Character Riddles

A Chinese character riddle dataset covering the majority of common simplified Chinese characters, built by crawling riddles from the Web and generating brand new ones, and it is found that existing models struggle to solve such tricky questions.

Automated Crossword Solving

The Berkeley Crossword Solver is presented, a state-of-the-art approach for automatically solving crossword puzzles that improves exact puzzle accuracy from 57% to 82% on crosswords from The New York Times and obtains 99.9% letter accuracy on themeless puzzles.

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.

rdeits/crypticcrosswords.jl: v0.1.1

  • 2021

Adafactor: Adaptive Learning Rates with Sublinear Memory Cost

This work demonstrates empirically that adaptive methods can produce larger-than-desired updates when the decay rate of the second moment accumulator is too slow, and proposes update clipping and a gradually increasing decay rate scheme as remedies.
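The two remedies described above can be sketched in a few lines, assuming the paper's reported defaults (decay exponent c = 0.8, clipping threshold d = 1.0); this is an illustrative simplification of Adafactor, not the full optimizer:

```python
import math

def decay_rate(t: int, c: float = 0.8) -> float:
    """Gradually increasing second-moment decay rate: beta2_t = 1 - t^(-c).
    Early steps decay fast (fresh statistics); later steps approach 1."""
    return 1.0 - t ** (-c)

def clip_update(update: list[float], threshold: float = 1.0) -> list[float]:
    """Update clipping: scale the whole update down when its RMS exceeds
    the threshold, preventing larger-than-desired steps."""
    rms = math.sqrt(sum(u * u for u in update) / len(update))
    scale = 1.0 / max(1.0, rms / threshold)
    return [u * scale for u in update]
```

After clipping, the RMS of an oversized update is exactly the threshold, while small updates pass through unchanged.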

Question and Answer Test-Train Overlap in Open-Domain Question Answering Datasets

A detailed study of the test sets of three popular open-domain benchmark datasets finds that 30% of test-set questions have a near-duplicate paraphrase in their corresponding train sets, and that simple nearest-neighbor models outperform a BART closed-book QA model.
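The near-duplicate analysis described above can be approximated with a simple nearest-neighbor pass over token overlap (the functions and the 0.8 threshold here are illustrative choices, not the paper's exact method):

```python
def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity between two questions."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def near_duplicates(test_qs: list[str], train_qs: list[str],
                    threshold: float = 0.8) -> list[str]:
    """Flag test questions whose nearest train question exceeds the threshold."""
    flagged = []
    for q in test_qs:
        best = max((jaccard(q, t) for t in train_qs), default=0.0)
        if best >= threshold:
            flagged.append(q)
    return flagged
```

A simple nearest-neighbor baseline of this kind is exactly what the study found competitive with closed-book QA models on the overlapping portion of the test sets.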

Transformers: State-of-the-Art Natural Language Processing

Transformers is an open-source library that consists of carefully engineered state-of-the-art Transformer architectures under a unified API and a curated collection of pretrained models made by and available for the community.

iSarcasm: A Dataset of Intended Sarcasm

Examining state-of-the-art sarcasm detection models on the iSarcasm dataset showed low performance compared to previously studied datasets, indicating that those datasets might be biased or obvious, and that sarcasm may be a computationally under-studied phenomenon thus far.

Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets

It is shown that model performance improves when training with annotator identifiers as features, that models are able to recognize the most productive annotators, and that models often do not generalize well to examples from annotators who did not contribute to the training set.

WINOGRANDE: An Adversarial Winograd Schema Challenge at Scale

This work introduces WinoGrande, a large-scale dataset of 44k problems, inspired by the original WSC design, but adjusted to improve both the scale and the hardness of the dataset, and establishes new state-of-the-art results on five related benchmarks.

Automatic Sarcasm Detection: A Survey

This paper is the first known compilation of past work in automatic sarcasm detection, observing three milestones in the research so far: semi-supervised pattern extraction to identify implicit sentiment, use of hashtag-based supervision, and use of context beyond target text.

SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing

SentencePiece is a language-independent subword tokenizer and detokenizer designed for neural text processing; experiments show that it is possible to achieve comparable accuracy by training subword models directly from raw sentences.
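To make the idea of subword segmentation concrete, here is a toy greedy longest-match tokenizer over a fixed vocabulary (illustrative only; SentencePiece itself learns its vocabulary with BPE or a unigram language model rather than matching greedily):

```python
def greedy_subword_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Segment text into the longest vocabulary pieces, left to right,
    falling back to single characters for out-of-vocabulary material."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab or j == i + 1:  # single-char fallback
                tokens.append(piece)
                i = j
                break
    return tokens

vocab = {"token", "ization", "izer", "s"}
greedy_subword_tokenize("tokenization", vocab)  # ['token', 'ization']
```

Because segmentation operates on raw text, no language-specific pre-tokenizer is required, which is the design point the SentencePiece paper emphasizes.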