AxCell: Automatic Extraction of Results from Machine Learning Papers

@inproceedings{kardas2020axcell,
  title={AxCell: Automatic Extraction of Results from Machine Learning Papers},
  author={Marcin Kardas and Piotr Czapla and Pontus Stenetorp and Sebastian Ruder and Sebastian Riedel and Ross Taylor and Robert Stojnic},
  booktitle={Conference on Empirical Methods in Natural Language Processing},
  year={2020}
}
Tracking progress in machine learning has become increasingly difficult with the recent explosion in the number of papers. In this paper, we present AxCell, an automatic machine learning pipeline for extracting results from papers. AxCell uses several novel components, including a table segmentation subtask, to learn relevant structural knowledge that aids extraction. When compared with existing methods, our approach significantly improves the state of the art for results extraction. We also… 
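As a toy illustration of the table-segmentation idea mentioned above, a segmenter assigns each table cell a semantic class before values are extracted. The class names and the rule-based tagger below are purely illustrative assumptions, not AxCell's actual learned classifier:

```python
# Hypothetical sketch of a table-segmentation step: tag each cell with a
# semantic class. The vocabularies and rules are made up for illustration;
# the paper's segmenter is a learned model, not a rule list.
def tag_cell(text: str) -> str:
    t = text.strip()
    # Numeric cells (e.g. "93.5%") are candidate result values.
    if t.replace(".", "", 1).replace("%", "").isdigit():
        return "value"
    if t.lower() in {"cifar-10", "imagenet", "squad"}:   # toy dataset list
        return "dataset"
    if t.lower() in {"accuracy", "f1", "bleu"}:          # toy metric list
        return "metric"
    return "other"

row = ["CIFAR-10", "Accuracy", "93.5%"]
tags = [tag_cell(cell) for cell in row]
```

A real pipeline would replace the lookup tables with a classifier trained on labeled cells, but the output interface (one class per cell) is the same.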


Graph Neural Networks and Representation Embedding for Table Extraction in PDF Documents

This work tackles the problem of table extraction by exploiting Graph Neural Networks enriched with suitably designed representation embeddings, and experimentally evaluates the proposed approach on a new dataset obtained by merging the information provided in the PubLayNet and PubTables-1M datasets.

Data augmentation on graphs for table type classification

This work addresses the classification of tables using a Graph Neural Network, exploiting the table structure to guide message passing, and proposes data augmentation techniques applied directly to the table graph structures.

ChemDataExtractor 2.0: Autopopulated Ontologies for Materials Science

This work exploits data-rich sources, such as tables within documents, and presents a new model concept that enables data extraction for chemical and physical properties with the ability to organize hierarchical data as nested information.

Task Definition and Integration For Scientific-Document Writing Support

This paper defines a series of tasks related to scientific-document writing that can be pipelined, and evaluates citation worthiness and citation recommendation both individually and in combination, showing that the proposed approach is promising.

A New Neural Search and Insights Platform for Navigating and Organizing AI Research

An overview is given of the overall architecture of the system and of the components for document analysis, question answering, search, analytics, expert search, and recommendations.

STable: Table Generation Framework for Encoder-Decoder Models

A framework for text-to-table neural models applicable to problems such as extraction of line items, joint entity and relation extraction, or knowledge base population is proposed, which establishes state-of-the-art results on several challenging datasets.

Learning to Reason for Text Generation from Scientific Tables

SciGen is the first dataset to assess the arithmetic reasoning capabilities of generation models on complex input structures, i.e., tables from scientific articles and their corresponding descriptions; one of the main bottlenecks for this task is the lack of proper automatic evaluation metrics.

Automated Mining of Leaderboards for Empirical AI Research

This study investigates the problem of automated leaderboard construction using state-of-the-art transformer models.

KIETA: Key-insight extraction from scientific tables

Experiments show promising results that signal the possibility of an automated system, while also indicating limits of extracting knowledge from tables without any context.

DUE: End-to-End Document Understanding Benchmark

The Document Understanding Evaluation (DUE) benchmark, consisting of both available and reformulated datasets, is introduced to measure the end-to-end capabilities of systems in real-world scenarios and to empower research in the NLP community.

Identification of Tasks, Datasets, Evaluation Metrics, and Numeric Scores for Scientific Leaderboards Construction

This model, aimed at automatically extracting task, dataset, metric, and score from NLP papers, is a first step towards automatic leaderboard construction, e.g., in the NLP domain.

Universal Language Model Fine-tuning for Text Classification

This work proposes Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method that can be applied to any task in NLP, and introduces techniques that are key for fine-tuning a language model.

Table extraction using conditional random fields

Conditional random fields (CRFs) are compared with hidden Markov models (HMMs) for table extraction; unlike HMMs, CRFs support the use of many rich and overlapping layout and language features and, as a result, perform significantly better.
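The "rich and overlapping layout features" such a line-labeling CRF consumes can be sketched in a few lines. The feature names below are hypothetical, not the paper's actual feature set; they only illustrate the kind of per-line evidence (whitespace layout, digit density, column-like gaps) that a sequence labeler can combine:

```python
import re

def line_features(line: str) -> dict:
    """Illustrative layout/language features for one text line,
    of the kind a table-extraction CRF might consume."""
    tokens = line.split()
    return {
        "blank": line.strip() == "",
        "leading_space": len(line) - len(line.lstrip(" ")),
        "n_tokens": len(tokens),
        "digit_ratio": sum(c.isdigit() for c in line) / max(len(line), 1),
        # Runs of 2+ spaces between non-space chars suggest table columns.
        "has_multi_gap": bool(re.search(r"\S {2,}\S", line)),
    }

header = "Model        Acc.   F1"
feats = line_features(header)
```

A CRF then scores label sequences (e.g. `header`, `data-row`, `non-table`) over these per-line feature dictionaries, which is where it gains over an HMM: features may freely overlap and need not be independent.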

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

A new scaling method is proposed that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient, and its effectiveness is demonstrated by scaling up MobileNets and ResNet.
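The compound coefficient can be illustrated numerically. The constants alpha=1.2, beta=1.1, gamma=1.15 are the grid-searched values reported in the EfficientNet paper, chosen under the constraint alpha * beta^2 * gamma^2 ≈ 2; the baseline network dimensions below are made up for the example:

```python
# EfficientNet-style compound scaling: one coefficient phi scales
# depth, width, and resolution together.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth / width / resolution bases

def compound_scale(base_depth, base_width, base_resolution, phi):
    """Scale all three dimensions uniformly by the compound coefficient phi."""
    return (
        round(base_depth * ALPHA ** phi),
        round(base_width * BETA ** phi),
        round(base_resolution * GAMMA ** phi),
    )

# Hypothetical baseline: 18 layers, 64 channels, 224 px input.
d, w, r = compound_scale(18, 64, 224, phi=1)
```

Doubling the FLOPs budget then corresponds roughly to incrementing phi by 1, since the constraint fixes alpha * beta^2 * gamma^2 near 2.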

TaPas: Weakly Supervised Table Parsing via Pre-training

TaPas is presented, an approach to question answering over tables without generating logical forms that outperforms or rivals semantic parsing models, improving state-of-the-art accuracy on SQA and performing on par with the state of the art on WikiSQL and WikiTQ, but with a simpler model architecture.

A framework for information extraction from tables in biomedical literature

This research examines methods for extracting numerical and textual information from tables in the clinical literature, with an integrated mining approach that considers all the complexities and challenges of a table.

fastai: A Layered API for Deep Learning

fastai is a deep learning library which provides practitioners with high-level components that can quickly and easily deliver state-of-the-art results in standard deep learning domains.

TabVec: Table Vectors for Classification of Web Tables

There are hundreds of millions of tables in Web pages that contain useful information for many applications. Leveraging data within these tables is difficult because of the wide variety of…

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

BERT is a new language representation model designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; it can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

Automated Early Leaderboard Generation from Comparative Tables

This work presents a new system to automatically discover and maintain leaderboards in the form of partial orders between papers, based on performance reported therein, and proposes a novel performance improvement graph with papers as nodes, where edges encode noisy performance comparison information extracted from tables.
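The performance improvement graph described above can be sketched with a toy example. Paper names and edges here are made up for illustration: a directed edge u -> v records that paper v reports beating paper u, and the resulting partial order is just reachability in that graph:

```python
from collections import defaultdict

# Toy performance improvement graph: edge (worse, better) means "better"
# reports an improvement over "worse" on some shared benchmark table.
edges = [("PaperA", "PaperB"), ("PaperB", "PaperC"), ("PaperA", "PaperD")]

graph = defaultdict(set)
for worse, better in edges:
    graph[worse].add(better)

def improves_on(graph, u, v):
    """True if v is reachable from u, i.e. v (transitively) improves on u."""
    stack, seen = list(graph[u]), set()
    while stack:
        node = stack.pop()
        if node == v:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return False
```

Because the extracted comparisons are noisy, a real system would weight edges and tolerate cycles rather than assume a clean DAG; the reachability test above only shows the partial-order reading of the graph.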