Active learning-based information structure analysis of full scientific articles and two applications for biomedical literature review

@article{Guo2013ActiveLI,
  title={Active learning-based information structure analysis of full scientific articles and two applications for biomedical literature review},
  author={Yufan Guo and Ilona Silins and Ulla Stenius and Anna Korhonen},
  journal={Bioinformatics},
  year={2013},
  volume={29},
  number={11},
  pages={1440--1447}
}
MOTIVATION Techniques that are capable of automatically analyzing the information structure of scientific articles could be highly useful for improving information access to biomedical literature. However, most existing approaches rely on supervised machine learning (ML) and substantial labeled data that are expensive to develop and apply to different sub-fields of biomedicine. Recent research shows that minimal supervision is sufficient for fairly accurate information structure analysis of… 
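The title describes an active learning approach, and the (truncated) abstract notes that minimal supervision can be sufficient. This excerpt does not say which query strategy the paper uses. As a generic illustration only, here is a minimal sketch of pool-based active learning with entropy-based uncertainty sampling, one common strategy: in each round the sentences whose predicted label distributions are most uncertain are sent to an annotator. The function names and the `train`, `predict_proba`, and `oracle` callables are hypothetical placeholders, not the authors' code.

```python
import math
from typing import Callable, Dict, List


def entropy(probs: List[float]) -> float:
    """Shannon entropy (natural log) of one predicted label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_queries(pool: Dict[str, List[float]], k: int) -> List[str]:
    """Uncertainty sampling: return the ids of the k unlabeled sentences
    whose predicted label distributions have the highest entropy."""
    return sorted(pool, key=lambda sid: entropy(pool[sid]), reverse=True)[:k]


def active_learning_loop(
    labeled: Dict[str, str],
    pool: List[str],
    train: Callable[[Dict[str, str]], object],            # hypothetical: fits a classifier
    predict_proba: Callable[[object, str], List[float]],  # hypothetical: label distribution for one sentence
    oracle: Callable[[str], str],                         # hypothetical: human annotator
    k: int,
    rounds: int,
) -> Dict[str, str]:
    """Pool-based active learning: retrain, query the k most uncertain
    sentences, add their labels, and repeat for a fixed number of rounds."""
    labeled = dict(labeled)  # copy so the caller's data is not mutated
    pool = list(pool)
    for _ in range(rounds):
        if not pool:
            break
        model = train(labeled)
        probs = {sid: predict_proba(model, sid) for sid in pool}
        for sid in select_queries(probs, k):
            labeled[sid] = oracle(sid)  # ask the annotator for this sentence's label
            pool.remove(sid)
    return labeled
```

With predicted distributions `{"s1": [0.98, 0.01, 0.01], "s2": [0.4, 0.35, 0.25], "s3": [0.7, 0.2, 0.1]}`, `select_queries(..., 2)` picks `s2` and `s3`, the two least confident predictions. Other query strategies (for example margin-based selection) would only change `select_queries`.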
Unsupervised discovery of information structure in biomedical documents
TLDR
An unsupervised approach to information structure (IS) analysis is investigated, the performance of several unsupervised methods is evaluated on a large corpus of biomedical abstracts collected from PubMed, and it is demonstrated that unsupervised learning brings novel insights into the IS of biomedical literature and discovers information categories that are not present in any of the existing IS schemes.
Sequential Span Classification with Neural Semi-Markov CRFs for Biomedical Abstracts
TLDR
This work proposes sequential span classification, which assigns a rhetorical label not to a single sentence but to a span of consecutive sentences, and introduces Neural Semi-Markov Conditional Random Fields that label such spans by considering all possible spans of various lengths.
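Labeling spans rather than sentences means decoding must compare segmentations built from spans of different lengths. A minimal sketch of semi-Markov Viterbi decoding follows, under stated assumptions: span scores and label-transition scores are supplied as plain Python callables (in the paper they come from a neural model whose details are not in this excerpt), only decoding is shown (no training), and spans are capped at a hypothetical `max_len` (setting `max_len = n` considers all spans). The function name and the toy scores are illustrative only.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label)


def semi_markov_viterbi(
    n: int,
    labels: Sequence[str],
    max_len: int,
    span_score: Callable[[int, int, str], float],
    trans_score: Callable[[Optional[str], str], float],
) -> Tuple[float, List[Span]]:
    """Find the highest-scoring segmentation of sentences 0..n-1 into labeled
    spans of length 1..max_len. span_score(i, j, y) scores sentences i..j-1
    as one span with label y; trans_score(prev, y) scores a label transition
    (prev is None for the first span)."""
    neg = float("-inf")
    # best[j][y]: best score of segmenting sentences [0, j) with last label y
    best: List[Dict[str, float]] = [{y: neg for y in labels} for _ in range(n + 1)]
    back: List[Dict[str, Tuple[int, Optional[str]]]] = [{} for _ in range(n + 1)]
    for j in range(1, n + 1):
        for y in labels:
            for d in range(1, min(max_len, j) + 1):  # try every span length ending at j
                i = j - d
                local = span_score(i, j, y)
                prevs: List[Optional[str]] = [None] if i == 0 else list(labels)
                for yp in prevs:
                    prev_best = 0.0 if yp is None else best[i][yp]
                    if prev_best == neg:
                        continue
                    cand = prev_best + trans_score(yp, y) + local
                    if cand > best[j][y]:
                        best[j][y] = cand
                        back[j][y] = (i, yp)
    # Choose the best label for the final span, then follow back-pointers.
    y_last: Optional[str] = max(labels, key=lambda lab: best[n][lab])
    total = best[n][y_last]
    spans: List[Span] = []
    j = n
    while j > 0:
        i, yp = back[j][y_last]
        spans.append((i, j, y_last))
        j, y_last = i, yp
    spans.reverse()
    return total, spans


if __name__ == "__main__":
    # Hypothetical per-sentence label scores for a four-sentence abstract.
    sent = [
        {"BACKGROUND": 2.0, "METHODS": 0.0},
        {"BACKGROUND": 2.0, "METHODS": 0.0},
        {"BACKGROUND": 0.0, "METHODS": 2.0},
        {"BACKGROUND": 0.0, "METHODS": 2.0},
    ]

    def span_score(i: int, j: int, y: str) -> float:
        return sum(sent[k][y] for k in range(i, j))

    def trans_score(prev: Optional[str], y: str) -> float:
        return -1.0 if prev == y else 0.0  # discourage splitting one zone in two

    print(semi_markov_viterbi(4, ["BACKGROUND", "METHODS"], 4, span_score, trans_score))
    # expected: (8.0, [(0, 2, 'BACKGROUND'), (2, 4, 'METHODS')])
```

The same-label transition penalty is an illustrative design choice that makes the decoder prefer one span per zone; with purely additive span scores and no penalty, many segmentations would tie.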
Contextual citation recommendation using scientific discourse annotation schemes
TLDR
This thesis is built around one task, recommending contextually relevant citations to the author of a scientific paper, called Contextual Citation Recommendation (CCR); it frames CCR as an Information Retrieval task and evaluates the approach using existing publications.
A systematic review of automatic text summarization for biomedical literature and EHRs
TLDR
It is found that current biomedical text summarization systems have achieved good performance using hybrid methods, and the majority of the works still focus on summarizing literature.
Unsupervised Declarative Knowledge Induction for Constraint-Based Learning of Information Structure in Scientific Documents
TLDR
This model uses topic models to identify latent topics and their key linguistic features in input documents, induces constraints from this information and maps sentences to their dominant information structure categories through a constrained unsupervised model.
Evaluation of Scientific Elements for Text Similarity in Biomedical Publications
TLDR
Comparison of the tools with two strong baselines shows that the predictions provided by the ArguminSci tool can support the use case of mining alternative methods for animal experiments.
Automatic Analysis of Arguments about Controversial Educational Topics in Web Documents
Decision making in social communities, such as families, companies, or parties, builds on debates and discussions, where arguments on particular topics are exchanged. With this work, we contribute to
Automatic zone identification in scientific papers via fusion techniques
TLDR
A two-level approach to zone identification is proposed: the first level classifies the sentences in a given paper based on semantic and lexical features, and the second level applies fusion to the first-level classification results for consecutive sentences in order to make the final decision.
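The second level combines first-level predictions for consecutive sentences. The TLDR does not say which fusion rule is used; as an assumption, the sketch below fuses by averaging each sentence's first-level label distribution with those of its neighbours in a sliding window and taking the highest-scoring label. The function name, the window size, and the zone labels are hypothetical.

```python
from typing import Dict, List


def fuse_window(probs: List[Dict[str, float]], window: int = 1) -> List[str]:
    """Second-level decision (assumed rule): for each sentence, average the
    first-level label distributions of the sentences within `window`
    positions of it (clipped at the document boundaries) and return the
    label with the highest mean score."""
    n = len(probs)
    fused: List[str] = []
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        neighbours = probs[lo:hi]
        mean = {
            label: sum(p[label] for p in neighbours) / len(neighbours)
            for label in probs[i]
        }
        fused.append(max(mean, key=mean.get))
    return fused


if __name__ == "__main__":
    # Hypothetical first-level output for five consecutive sentences.
    first_level = [
        {"AIM": 0.8, "METHOD": 0.2},
        {"AIM": 0.45, "METHOD": 0.55},  # isolated METHOD prediction
        {"AIM": 0.9, "METHOD": 0.1},
        {"AIM": 0.2, "METHOD": 0.8},
        {"AIM": 0.1, "METHOD": 0.9},
    ]
    print(fuse_window(first_level, window=1))
    # expected: ['AIM', 'AIM', 'AIM', 'METHOD', 'METHOD']
```

Taken alone, the first level would label the second sentence METHOD; fusing with its neighbours smooths it to AIM. Any other fusion rule (weighted voting, a second classifier over neighbouring outputs) would slot into the same place.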
A manual corpus of annotated main findings of clinical case reports
TLDR
It is envisioned that case reports in PubMed may be automatically indexed by main finding, so that users can query for specific main findings (rather than general topics) and, given one case report, retrieve those with the most similar main findings.

References

SHOWING 1-10 OF 53 REFERENCES
Weakly supervised learning of information structure of scientific abstracts - is it accurate enough to benefit real-world tasks in biomedicine?
TLDR
The results suggest that weakly supervised learning could be used to improve the practical usefulness of information structure for real-life tasks in biomedicine.
A comparison and user-based evaluation of models of textual information structure in the context of cancer risk assessment
TLDR
It is shown that existing schemes aimed at capturing information structure of scientific documents can be applied to biomedical abstracts and can be identified in them automatically with an accuracy which is high enough to benefit a real-life task in biomedicine.
Automatic recognition of conceptualization zones in scientific articles and two life science applications
TLDR
The paper presents means to facilitate automatic access to the scientific discourse of articles by automating the recognition of 11 sentence-level categories, called Core Scientific Concepts (CoreSCs), which provide the structure and context for all statements and relations within an article.
A baseline feature set for learning rhetorical zones using full articles in the biomedical domain
TLDR
This work presents results for several experiments in automatic zone identification on the ZAISA-1 dataset, a new dataset composed of full biomedical research papers hand-annotated for rhetorical zones, to provide a baseline feature set for modeling.
Using argumentation to extract key sentences from biomedical abstracts
Multi-dimensional classification of biomedical text: Toward automated, practical provision of high-utility text to diverse users
TLDR
The issues involved in this task are discussed; the results strongly suggest that automatic annotation along most of the dimensions is highly feasible and that this new framework for scientific sentence categorization is applicable in practice.
Identifying the Information Structure of Scientific Abstracts: An Investigation of Three Different Schemes
TLDR
This work takes three schemes of different type and granularity and investigates their applicability to biomedical abstracts, showing that even for the finest-grained of these schemes the majority of categories appear in abstracts and can be identified relatively reliably using machine learning.
The structural and content aspects of abstracts versus bodies of full text journal articles are different
TLDR
Aspects of structure and content differ markedly between article abstracts and article bodies, and a number of these differences may pose problems as the text mining field moves more into the area of processing full-text articles.
Summarizing Scientific Articles: Experiments with Relevance and Rhetorical Status
TLDR
This article provides a gold standard for summaries of this kind consisting of a substantial corpus of conference articles in computational linguistics annotated with human judgments of the rhetorical status and relevance of each sentence in the articles.