Corpus ID: 237453543

MATE: Multi-view Attention for Table Transformer Efficiency

@inproceedings{Eisenschlos2021MATEMA,
  title={MATE: Multi-view Attention for Table Transformer Efficiency},
  author={Julian Martin Eisenschlos and Maharshi Gor and Thomas M{\"u}ller and William W. Cohen},
  booktitle={EMNLP},
  year={2021}
}
This work presents a sparse-attention Transformer architecture for modeling documents that contain large tables. Tables are ubiquitous on the web, and are rich in information. However, more than 20% of relational tables on the web have 20 or more rows (Cafarella et al., 2008), and these large tables present a challenge for current Transformer models, which are typically limited to 512 tokens. Here we propose MATE, a novel Transformer architecture designed to model the structure of web tables…
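The truncated abstract does not show the mechanism, but the paper restricts different attention heads to row-wise or column-wise views of the table. Below is a minimal NumPy sketch of how such row/column sparsity masks could be built from per-token row and column indices; the token layout and the convention that query/text tokens carry index 0 and act as global tokens are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def table_attention_masks(row_ids, col_ids):
    """Build boolean attention masks for a row view and a column view.

    row_ids, col_ids: integer arrays of shape (seq_len,); index 0 marks
    query/text tokens, which stay visible in both views here
    (an illustrative convention, not necessarily the paper's).
    """
    row_ids = np.asarray(row_ids)[:, None]
    col_ids = np.asarray(col_ids)[:, None]
    # A token may attend to another token if they share a row (row view)
    # or a column (column view); index-0 tokens behave as global tokens.
    same_row = (row_ids == row_ids.T) | (row_ids == 0) | (row_ids.T == 0)
    same_col = (col_ids == col_ids.T) | (col_ids == 0) | (col_ids.T == 0)
    return same_row, same_col

# Example: 2 query tokens followed by a 2x2 table flattened row by row.
row_ids = [0, 0, 1, 1, 2, 2]
col_ids = [0, 0, 1, 2, 1, 2]
row_mask, col_mask = table_attention_masks(row_ids, col_ids)
print(row_mask.astype(int))
print(col_mask.astype(int))
```

Row heads would apply row_mask (e.g. additively, as -inf on disallowed positions) and column heads col_mask; an efficient implementation would exploit the block structure of these masks rather than materialize the full seq_len × seq_len matrices.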
Iterative Hierarchical Attention for Answering Complex Questions over Long Documents
TLDR: DOCHOPPER, a new model that iteratively attends to different parts of long, hierarchically structured documents to answer complex questions; it achieves state-of-the-art results on three of the datasets and is efficient at inference time, running 3–10 times faster than the baselines.

References

Showing 1–10 of 37 references
DoT: An efficient Double Transformer for NLP tasks with tables
TLDR: This work proposes a new architecture, DoT, a double transformer model that decomposes the problem into two sub-tasks: a shallow pruning transformer selects the top-K tokens, and a deep task-specific transformer then takes those K tokens as input.
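As a rough illustration of the two-stage idea (not the DoT implementation), the sketch below scores tokens with a cheap model, keeps the top-K, and hands only the survivors to an expensive model; the scoring and downstream functions are placeholders.

```python
import numpy as np

def prune_then_process(token_embeddings, score_fn, deep_fn, k):
    """Two-stage pipeline: a cheap scorer selects the top-k tokens,
    and a deep task model sees only those survivors.

    token_embeddings: (seq_len, hidden) array.
    score_fn: cheap relevance scorer, returns (seq_len,) scores.
    deep_fn: expensive model applied to the pruned sequence.
    """
    scores = score_fn(token_embeddings)
    keep = np.argsort(scores)[-k:]   # indices of the k highest-scoring tokens
    keep = np.sort(keep)             # preserve the original token order
    return deep_fn(token_embeddings[keep]), keep

# Toy usage with placeholder scorer and task model.
rng = np.random.default_rng(0)
x = rng.normal(size=(512, 64))
cheap = lambda e: e @ rng.normal(size=64)   # stand-in pruning scorer
deep = lambda e: e.mean(axis=0)             # stand-in task model
out, kept = prune_then_process(x, cheap, deep, k=128)
print(out.shape, kept.shape)                # (64,) (128,)
```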
Linformer: Self-Attention with Linear Complexity
TLDR: This paper demonstrates that the self-attention mechanism of the Transformer can be approximated by a low-rank matrix and proposes a new self-attention mechanism that reduces the overall self-attention complexity from $O(n^2)$ to $O(n)$ in both time and space.
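A minimal NumPy sketch of the low-rank idea: keys and values are projected from length n down to k along the sequence axis before attention, so the score matrix is n×k instead of n×n. Single head, no multi-head projections; all shapes and initializations are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def low_rank_attention(Q, K, V, E, F):
    """Single-head attention with sequence-length projections.

    Q, K, V: (n, d) queries/keys/values.
    E, F: (k, n) projections that compress K and V to length k,
    so the attention matrix has shape (n, k) rather than (n, n).
    """
    K_proj = E @ K                                  # (k, d)
    V_proj = F @ V                                  # (k, d)
    scores = Q @ K_proj.T / np.sqrt(Q.shape[-1])    # (n, k)
    return softmax(scores, axis=-1) @ V_proj        # (n, d)

n, d, k = 1024, 64, 128
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
E, F = (rng.normal(size=(k, n)) / np.sqrt(n) for _ in range(2))
print(low_rank_attention(Q, K, V, E, F).shape)  # (1024, 64)
```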
Reformer: The Efficient Transformer
TLDR: This work replaces dot-product attention with one based on locality-sensitive hashing and uses reversible residual layers instead of standard residuals, which allows storing activations only once during training instead of several times, making the model much more memory-efficient and much faster on long sequences.
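A small sketch of the reversible-residual part of the summary above (the LSH attention is omitted): a layer's inputs can be reconstructed from its outputs, so activations need not be stored during training. F and G are placeholder sublayers standing in for attention and feed-forward.

```python
import numpy as np

def rev_forward(x1, x2, F, G):
    """Reversible residual block: y1 = x1 + F(x2), y2 = x2 + G(y1)."""
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def rev_inverse(y1, y2, F, G):
    """Recover the block inputs from its outputs (no stored activations)."""
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2

# Placeholder sublayers standing in for attention and feed-forward.
rng = np.random.default_rng(0)
W_f, W_g = rng.normal(size=(64, 64)), rng.normal(size=(64, 64))
F = lambda x: np.tanh(x @ W_f)
G = lambda x: np.tanh(x @ W_g)

x1, x2 = rng.normal(size=(8, 64)), rng.normal(size=(8, 64))
y1, y2 = rev_forward(x1, x2, F, G)
r1, r2 = rev_inverse(y1, y2, F, G)
print(np.allclose(x1, r1), np.allclose(x2, r2))  # True True
```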
Understanding tables with intermediate pre-training
TLDR: This work adapts TAPAS (Herzig et al., 2020), a table-based BERT model, to recognize entailment, and creates a balanced dataset of millions of automatically created training examples which are learned in an intermediate step prior to fine-tuning.
Efficient Transformers: A Survey
TLDR: This paper characterizes a large and thoughtful selection of recent efficiency-flavored "X-former" models, providing an organized and comprehensive overview of existing work and models across multiple domains.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TLDR: A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; it can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
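As a toy illustration of "one additional output layer": fine-tuning for classification typically adds a single linear layer over a pooled sentence representation (here, the first token's vector). The pooling choice and shapes are illustrative assumptions, not a full BERT implementation.

```python
import numpy as np

def classify_with_head(encoder_outputs, W, b):
    """Apply a single linear output layer on top of pretrained encoder states.

    encoder_outputs: (seq_len, hidden) final-layer states from the encoder.
    W: (hidden, num_labels), b: (num_labels,) -- the only new parameters.
    """
    pooled = encoder_outputs[0]      # vector for the first ([CLS]) token
    return pooled @ W + b            # class logits

rng = np.random.default_rng(0)
states = rng.normal(size=(128, 768))            # stand-in for BERT outputs
W, b = rng.normal(size=(768, 3)) * 0.02, np.zeros(3)
print(classify_with_head(states, W, b))         # 3 class logits
```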
Attention is All you Need
TLDR: A new, simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely; it generalizes well to other tasks, as shown by applying it successfully to English constituency parsing with both large and limited training data.
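The core operation behind the summary above, scaled dot-product attention, in a few lines of NumPy (single head, no masking or multi-head projections):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                        # (n_q, n_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)         # row-wise softmax
    return weights @ V                                     # (n_q, d_v)

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K, V = rng.normal(size=(6, 8)), rng.normal(size=(6, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```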
TaPas: Weakly Supervised Table Parsing via Pre-training
TLDR: TaPas, an approach to question answering over tables without generating logical forms, outperforms or rivals semantic parsing models by improving state-of-the-art accuracy on SQA and performing on par with the state of the art on WikiSQL and WikiTQ, while using a simpler model architecture.
TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data
TLDR: TaBERT is a pretrained LM that jointly learns representations for NL sentences and (semi-)structured tables; it achieves new best results on the challenging weakly supervised semantic parsing benchmark WikiTableQuestions while performing competitively on the text-to-SQL dataset Spider.
Are Transformers universal approximators of sequence-to-sequence functions?
TLDR: It is established that Transformer models are universal approximators of continuous permutation-equivariant sequence-to-sequence functions with compact support, which is quite surprising given the amount of parameter sharing in these models.
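Stated a bit more formally (a paraphrase of the result summarized above; the notation is illustrative): for every continuous, permutation-equivariant sequence-to-sequence function $f$ with compact support and every $1 \le p < \infty$, $\epsilon > 0$, there exists a Transformer network $g$ such that

$$d_p(f, g) = \Big( \int \lVert f(X) - g(X) \rVert_p^p \, dX \Big)^{1/p} \le \epsilon.$$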