Using machine learning to disentangle homonyms in large text corpora

Uri Roll, Ricardo A Correia, Oded Berger‐Tal
Conservation Biology
Systematic reviews are an increasingly popular decision-making tool that provides an unbiased summary of evidence to support conservation action. These reviews bridge the gap between researchers and managers by presenting a comprehensive overview of all studies relating to a particular topic and identifying specifically where and under which conditions an effect is present. However, several technical challenges can severely hinder the feasibility and applicability of systematic reviews, for…
Systematic Homonym Detection and Replacement Based on Contextual Word Embedding
A novel approach for the detection of homonyms based on contextual word embedding that allows a word to be understood based on the context in which it appears is proposed.
Nomenclature instability in species culturomic assessments: Why synonyms matter
Abstract Culturomics is an emerging area of study that explores human culture through the quantitative analysis of large digital bodies of text. Culturomics shows great potential for the study of
revtools: bibliographic data visualization for evidence synthesis in R
‘revtools’, an R package for exploratory investigation of bibliographic data during reviews and evidence syntheses, provides tools for the import and de-duplication of bibliographic data formats, and cluster analysis and visualization of article titles, abstracts and keywords using topic models.
A season for all things: Phenological imprints in Wikipedia usage and their relevance to conservation
It is shown that seasonality plays an important role in how and when people interact with plants and animals online, and seasonality is significantly more prevalent in pages for plants and animals than it is in a random selection of Wikipedia articles.
pyResearchInsights—An open‐source Python package for scientific text analysis
Abstract With an increasing number of scientific articles published each year, there is a need to synthesize and obtain insights across ever‐growing volumes of literature. Here, we present
revtools: An R package to support article screening for evidence synthesis.
  • M. Westgate
  • Medicine, Computer Science
    Research synthesis methods
  • 2019
Revtools is presented, an R package to support article screening during evidence synthesis projects, which provides tools for the import and de-duplication of bibliographic data, screening of articles by title or abstract, and visualization of article content using topic models.
Inferring public interest from search engine data requires caution
Front Ecol Environ doi:10.1002/fee.2048 © The Ecological Society of America of an increase in news media attention toward climate-change topics during these periods (Legagneux et al. 2018). Overall,
A season for all things
Phenology plays an important role in many human–nature interactions, but these seasonal patterns are often overlooked in conservation. Here, we provide the first broad exploration of seasonal
Identifying Knowledge Gaps Using a Graph-based Knowledge Representation
Schmidt, Daniel P. M.S., Department of Computer Science and Engineering, Wright State University, 2020. Identifying Knowledge Gaps Using a Graph-based Knowledge Representation. Knowledge integration
An analytical study of information extraction from unstructured and multidimensional big data
This research work addresses the competency and limitations of the existing IE techniques related to data pre-processing, data extraction and transformation, and representations for huge volumes of multidimensional unstructured data and presents a systematic literature review of state-of-the-art techniques for a variety of big data.


The difficulties of systematic reviews.
  • M. Westgate, D. Lindenmayer
  • Computer Science, Medicine
    Conservation biology : the journal of the Society for Conservation Biology
  • 2017
Linguistics provides a unifying framework for understanding some key challenges of systematic review and highlights 2 useful directions for future research.
Applications of text mining within systematic reviews.
It is concluded that text mining technologies do have the potential to assist at various stages of the review process, however, they are relatively unknown in the systematic reviewing community, and substantial evaluation and methods development are required before their possible impact can be fully assessed.
Text analysis tools for identification of emerging topics and research gaps in conservation science.
This work shows how a common text-mining method (latent Dirichlet allocation, or topic modeling) and statistical tests familiar to ecologists can be used to investigate trends and identify potential research gaps in the scientific literature, increasing scientists' capacity for research synthesis.
Using text mining for study identification in systematic reviews: a systematic review of current approaches
Using text mining to prioritise the order in which items are screened should be considered safe and ready for use in ‘live’ reviews, and the use of text mining as a ‘second screener’ may also be used cautiously.
Word Sense Disambiguation in the Biomedical Domain: An Overview
The current state of research in word sense disambiguation (WSD) is reviewed and the current direction of research points towards statistically based algorithms that use existing curated data and can be applied to large sets of biomedical literature.
Conservation culturomics
Words are symbolic representations of concepts, places, or objects (Carlston 2013). Thus, the frequency with which words and phrases are used within a language provides information about their cultural
Text mining for market prediction: A systematic review
A comparative analysis of the systems based on market prediction based on online-text-mining expands onto the theoretical and technical foundations behind each and should help the research community to structure this emerging field and identify the exact aspects which require further research and are of special significance.
Machine Learning Methods Without Tears: A Primer for Ecologists
An introduction to three machine learning approaches that can be broadly used by ecologists: classification and regression trees, artificial neural networks, and evolutionary computation.
Automated content analysis: Addressing the big literature challenge in ecology and evolution
The goal is to introduce ecologists and evolutionary biologists to Automated Content Analysis and illustrate its capacity to synthesize overwhelming volumes of literature, and to fill an important methodological gap and to therefore contribute to the advancement of ecological and evolutionary research.
Using network science and text analytics to produce surveys in a scientific topic
A network-based methodology combined with text analytics to construct the taxonomy of science fields and the identification of two well-defined communities in PC area is highlighted, which is consistent with the known existence of two distinct communities of researchers in the area.