Active learning-based information structure analysis of full scientific articles and two applications for biomedical literature review
@article{Guo2013ActiveLI,
title={Active learning-based information structure analysis of full scientific articles and two applications for biomedical literature review},
author={Yufan Guo and Ilona Silins and Ulla Stenius and Anna Korhonen},
journal={Bioinformatics},
year={2013},
volume={29},
number={11},
pages={1440--1447}
}

MOTIVATION
Techniques that are capable of automatically analyzing the information structure of scientific articles could be highly useful for improving information access to biomedical literature. However, most existing approaches rely on supervised machine learning (ML) and substantial labeled data that are expensive to develop and apply to different sub-fields of biomedicine. Recent research shows that minimal supervision is sufficient for fairly accurate information structure analysis of…
22 Citations
Unsupervised discovery of information structure in biomedical documents
- Computer Science
- Bioinform.
- 2015
An unsupervised approach to IS analysis is investigated, and the performance of several unsupervised methods on a large corpus of biomedical abstracts collected from PubMed is evaluated. It is demonstrated that unsupervised learning brings novel insights into the IS of biomedical literature and discovers information categories that are not present in any of the existing IS schemes.
Sequential Span Classification with Neural Semi-Markov CRFs for Biomedical Abstracts
- Computer Science
- FINDINGS
- 2020
This work proposes sequential span classification that assigns a rhetorical label, not to a single sentence but to a span that consists of continuous sentences, and introduces Neural Semi-Markov Conditional Random Fields to assign the labels to such spans by considering all possible spans of various lengths.
Contextual citation recommendation using scientific discourse annotation schemes
- Computer Science
- 2019
This thesis centers on one task: recommending contextually relevant citations to the author of a scientific paper, called Contextual Citation Recommendation (CCR). It frames CCR as an Information Retrieval task and evaluates the approach using existing publications.
A systematic review of automatic text summarization for biomedical literature and EHRs
- Computer Science
- J. Am. Medical Informatics Assoc.
- 2021
It is found that current biomedical text summarization systems have achieved good performance using hybrid methods, and the majority of the works still focus on summarizing literature.
Unsupervised Declarative Knowledge Induction for Constraint-Based Learning of Information Structure in Scientific Documents
- Computer Science
- Transactions of the Association for Computational Linguistics
- 2015
This model uses topic models to identify latent topics and their key linguistic features in input documents, induces constraints from this information and maps sentences to their dominant information structure categories through a constrained unsupervised model.
Evaluation of Scientific Elements for Text Similarity in Biomedical Publications
- Computer Science
- ArgMining@ACL
- 2019
Comparison of the tools with two strong baselines shows that the predictions provided by the ArguminSci tool can support the use case of mining alternative methods for animal experiments.
Automatic Analysis of Arguments about Controversial Educational Topics in Web Documents
- Sociology
- 2014
Decision making in social communities, such as families, companies, or parties, builds on debates and discussions, where arguments on particular topics are exchanged. With this work, we contribute to…
Automatic zone identification in scientific papers via fusion techniques
- Computer Science
- Scientometrics
- 2019
A two-level approach to zone identification in which the first level classifies the sentences in a given paper based on semantic and lexical features, and the second level applies fusion to the first-level classification results for consecutive sentences to make the final decision.
A manual corpus of annotated main findings of clinical case reports
- Computer Science
- Database
- 2018
It is envisioned that case reports in PubMed may be automatically indexed by main finding, so that users can carry out information queries for specific main findings (rather than general topics)—and given one case report, a user can retrieve those having the most similar main findings.
Predicting protein-RNA interaction amino acids using random forest based on submodularity subset selection
- Biology, Computer Science
- Comput. Biol. Chem.
- 2014
References
Weakly supervised learning of information structure of scientific abstracts - is it accurate enough to benefit real-world tasks in biomedicine?
- Computer Science
- Bioinform.
- 2011
The results suggest that weakly supervised learning could be used to improve the practical usefulness of information structure for real-life tasks in biomedicine.
A comparison and user-based evaluation of models of textual information structure in the context of cancer risk assessment
- Computer Science
- BMC Bioinformatics
- 2010
It is shown that existing schemes aimed at capturing information structure of scientific documents can be applied to biomedical abstracts and can be identified in them automatically with an accuracy which is high enough to benefit a real-life task in biomedicine.
Automatic recognition of conceptualization zones in scientific articles and two life science applications
- Computer Science
- Bioinform.
- 2012
This work presents the means to facilitate automatic access to the scientific discourse of articles by automating the recognition of 11 categories at the sentence level, called Core Scientific Concepts (CoreSCs), which provide the structure and context for all statements and relations within an article.
A baseline feature set for learning rhetorical zones using full articles in the biomedical domain
- Computer Science
- SKDD
- 2005
This work presents results for several experiments in automatic zone identification on the ZAISA-1 dataset, a new dataset composed of full biomedical research papers hand-annotated for rhetorical zones, to provide a baseline feature set for modeling.
Using argumentation to extract key sentences from biomedical abstracts
- Computer Science
- Int. J. Medical Informatics
- 2007
Multi-dimensional classification of biomedical text: Toward automated, practical provision of high-utility text to diverse users
- Computer Science
- Bioinform.
- 2008
The issues involved in this task are discussed, and the results strongly suggest that automatic annotation along most of the dimensions is highly feasible and that this new framework for scientific sentence categorization is applicable in practice.
Identifying the Information Structure of Scientific Abstracts: An Investigation of Three Different Schemes
- Computer Science
- BioNLP@ACL
- 2010
This work takes three schemes of different type and granularity and investigates their applicability to biomedical abstracts, showing that even for the finest-grained of these schemes the majority of categories appear in abstracts and can be identified relatively reliably using machine learning.
The structural and content aspects of abstracts versus bodies of full text journal articles are different
- Linguistics
- BMC Bioinformatics
- 2010
Aspects of structure and content differ markedly between article abstracts and article bodies, and a number of these differences may pose problems as the text mining field moves more into the area of processing full-text articles.
Summarizing Scientific Articles: Experiments with Relevance and Rhetorical Status
- Computer Science
- Computational Linguistics
- 2002
This article provides a gold standard for summaries of this kind consisting of a substantial corpus of conference articles in computational linguistics annotated with human judgments of the rhetorical status and relevance of each sentence in the articles.
Zone analysis in biology articles as a basis for information extraction
- Computer Science
- Int. J. Medical Informatics
- 2006







