A framework for information extraction from tables in biomedical literature

@article{Milosevic2019AFF,
  title={A framework for information extraction from tables in biomedical literature},
  author={Nikola Milosevic and Cassie Gregson and Robert Hernandez and G. Nenadic},
  journal={International Journal on Document Analysis and Recognition (IJDAR)},
  year={2019},
  volume={22},
  pages={55--78}
}
The scientific literature is growing exponentially, and professionals are no longer able to cope with the current volume of publications. In the past, text mining provided methods to retrieve and extract information from text; however, most of these approaches ignored tables and figures. Research on mining table data still lacks an integrated approach that considers all the complexities and challenges of a table. Our research is examining the methods for extracting…
Auto-CORPus: Automated and Consistent Outputs from Research Publications
TLDR
An automated pipeline that cleans HTML files from biomedical literature using the Auto-CORPus package, together with a model developed to standardize the section headers based on the Information Artifact Ontology.
Opportunities and challenges of text mining in materials research
TLDR
This review is directed at the broad class of researchers aiming to learn the fundamentals of TM as applied to materials science publications.
Opportunities and challenges of text mining in materials research
Research publications are the major repository of scientific knowledge. However, their unstructured and highly heterogeneous format creates a significant obstacle to large-scale analysis of the…
A Structure-Based Method for Building a Database of Extracted Figures from Scientific Documents: A Case Study of Iran Scientific Information Database (GANJ)
TLDR
A structure-based method is proposed that extracts figures and their descriptions by analyzing the file layout; the results are saved in a database with a specific structure and indexed for retrieval in the search engine.
CREGEX: A Biomedical Text Classifier Based on Automatically Generated Regular Expressions
TLDR
CREGEX (Classifier Regular Expression) is a biomedical text classifier based on an automatically generated regular-expression feature space; it outperformed both the SVM and NB classifiers in terms of accuracy and F-measure, and needed fewer training examples to achieve the same performance.
Table understanding approaches for extracting knowledge from heterogeneous tables
TLDR
This survey provides a comprehensive analysis of the research efforts so far devoted to the problem of table understanding and describes systems that support the transformation of heterogeneous tables into meaningful information.
Automatic Information Extraction and Inferencing System from Online News Sources for Substance Abuse Cases
The rising number of substance abuse cases is a serious situation that demands significant attention. Gaining insights from the reported substance abuse cases will greatly help law enforcement…
Large-Scale Data Mining of Rapid Residue Detection Assay Data From HTML and PDF Documents: Improving Data Access and Visualization for Veterinarians
TLDR
A data-mining method for automatically extracting rapid assay data from electronic documents that includes a software package module, a developed pattern recognition tool, and a data mining engine is developed.
Change in Format, Register and Narration Style in the Biomedical Literature: A 1948 Example
TLDR
The present commentary set out to review a 1948 scientific report by I.L. Bennett Jr, entitled “A study on the relationship between the fevers caused by bacterial pyrogens and by the intravenous injection of the sterile exudates of acute inflammation”, which appeared in the Journal of Experimental Medicine in September 1948.
AxCell: Automatic Extraction of Results from Machine Learning Papers
TLDR
AxCell is an automatic machine learning pipeline for extracting results from papers; it uses several novel components, including a table segmentation subtask, to learn relevant structural knowledge that aids extraction, and it significantly improves the state of the art for results extraction.
