Advancing Science through Mining Libraries, Ontologies, and Communities*

James A. Evans and A. Rzhetsky. The Journal of Biological Chemistry, pp. 23659–23666.

Life scientists today cannot hope to read everything relevant to their research. Emerging text-mining tools can help by identifying topics and distilling statements from books and articles with increased accuracy. Researchers often organize these statements into ontologies, consistent systems of reality claims. Like scientific thinking and interchange, however, text-mined information (even when accurately captured) is complex, redundant, sometimes incoherent, and often contradictory: it is… 

Exploiting Latent Features of Text and Graphs
This dissertation focuses on information available within biomedical science, including human-written abstracts of scientific papers as well as machine-generated graphs of biomedical entity relationships, and presents the Moliere system and a deep-learning approach to hypothesis generation.
MOLIERE: Automatic Biomedical Hypothesis Generation System
This work models hypotheses using Latent Dirichlet Allocation applied to abstracts found near shortest paths discovered within a multi-modal and multi-relational network of biomedical objects extracted from several heterogeneous datasets from the National Center for Biotechnology Information (NCBI).
Predicting research trends with semantic and neural networks with an application in quantum physics
The development of a semantic network for quantum physics, denoted SemNet, is demonstrated using 750,000 scientific papers and knowledge from books and Wikipedia; the network is used to predict future trends in research and to inspire personalized and surprising seeds of ideas in science.
Supersemantics for Knowledge Extraction
This thesis introduces Supersemantics as an approach to integrate different linguistic and other adjacent fields and bridges the boundaries between typical units of linguistics as well as external knowledge.
Data Mining Approach for Extraction of Useful Information About Biologically Active Compounds from Publications
This study developed and validated a data mining approach for extracting text fragments that describe bioassays, and used it to evaluate compounds and the biological activity reported for them in scientific publications. It found that papers can be categorized as relevant or irrelevant on the basis of machine learning analysis of their abstracts.
Lost and found in behavioral informatics.
Text mining applications in psychiatry: a systematic literature review
Text mining (TM) approaches are becoming essential to facilitate the automated extraction of useful biomedical information from unstructured text, and it is demonstrated that TM can contribute to complex research tasks in psychiatry.
Tradition and Innovation in Scientists’ Research Strategies
By studying prizewinners in biomedicine and chemistry, it is shown that occasional gambles for extraordinary impact are a compelling explanation for observed levels of risky innovation.
Data analysis and data mining: current issues in biomedical informatics.
Biomedical informatics represents a natural framework to properly and effectively apply data analysis and data mining methods in a decision-making context and will be necessary to preserve the inclusive nature of the field and to foster an increasing sharing of data and methods between researchers.


Strategic Reading, Ontologies, and the Future of Scientific Publishing
This article reviews how scientists use new forms of the “literature” and how these practices, combined with the ascendance of novel computing technologies, will revolutionize the way scientific data are accessed, synthesized, and turned to practical use.
Infotopia: How Many Minds Produce Knowledge
This book explores the human potential to pool widely dispersed information, and to use that knowledge to improve both our institutions and our lives. Various methods for aggregating information are
A translation approach to portable ontology specifications
This paper describes a mechanism for defining ontologies that are portable over representation systems, basing Ontolingua itself on an ontology of domain-independent, representational idioms.
Microparadigms: chains of collective reasoning in publications about molecular interactions.
It is found that published statements, regardless of their verity, tend to interfere with interpretation of the subsequent experiments and, therefore, can act as scientific "microparadigms," similar to dominant scientific theories.
How citation distortions create unfounded authority: analysis of a citation network
Citation is both an impartial scholarly method and a powerful form of social communication that can be used to generate information cascades resulting in unfounded authority of claims.
GENIES: a natural-language processing system for the extraction of molecular pathways from journal articles
A system is presented that extracts and structures information about cellular pathways from the biological literature in accordance with a knowledge model that was developed earlier and implemented by modifying an existing medical natural language processing system.
Biomedical Discovery Acceleration, with Applications to Craniofacial Development
A novel computational approach to this challenge, a knowledge-based system that combines reading, reasoning, and reporting methods to facilitate analysis of experimental data, is described and demonstrated on a large-scale gene expression array dataset relevant to craniofacial development.
The outcomes of pathway database computations depend on pathway ontology
KEGG and BioCyc pathways are compared using genome context methods, which determine the functional relatedness of pairs of genes; the results support the conclusion that the BioCyc pathway conceptualization is closer to a single conserved biological process than is that of KEGG.