Corpus ID: 233481964

Improving the Accessibility of Scientific Documents: Current State, User Needs, and a System Solution to Enhance Scientific PDF Accessibility for Blind and Low Vision Users

Authors: Lucy Lu Wang, Isabel Cachola, Jonathan Bragg, Evie (Yu-Yen) Cheng, Chelsea Hess Haupt, Matt Latzke, Bailey Kuehl, Madeleine van Zuylen, Linda M. Wagner, Daniel S. Weld
The majority of scientific papers are distributed in PDF, which poses challenges for accessibility, especially for blind and low vision (BLV) readers. We characterize the scope of this problem by assessing the accessibility of 11,397 PDFs published 2010-2019 and sampled across various fields of study, finding that only 2.4% of these PDFs satisfy all of our defined accessibility criteria. We introduce the SciA11y system to offset some of the issues around inaccessibility. SciA11y incorporates…
SciA11y: Converting Scientific Papers to Accessible HTML
SciA11y uses machine learning models to extract and understand the content of scientific PDFs, and reorganizes the resulting paper components into a form that better supports skimming and scanning for blind and low vision readers.
Auto-CORPus: A Natural Language Processing Tool for Standardising and Reusing Biomedical Literature
This work presents Auto-CORPus (Automated pipeline for Consistent Outputs from Research Publications), a novel NLP tool for the standardisation and conversion of publication HTML and table image files to three convenient machine-interpretable outputs to support biomedical text analytics.
Incorporating Visual Layout Structures for Scientific Text Classification
This work introduces new methods for incorporating VIsual LAyout (VILA) structures, e.g., the grouping of page texts into text lines or text blocks, into language models to further improve performance, and designs a hierarchical model, H-VILA, that encodes the text based on layout structures.


Extracting Scientific Figures with Distantly Supervised Neural Networks
This paper induces high-quality training labels for the task of figure extraction in a large number of scientific documents, with no human intervention, and uses this dataset to train a deep neural network for end-to-end figure detection, yielding a model that can be more easily extended to new domains compared to previous work.
Making the field of computing more inclusive
More accessible conferences, digital resources, and ACM SIGs will lead to greater participation by more people with disabilities.
S2ORC: The Semantic Scholar Open Research Corpus
This work introduces S2ORC, a large corpus of 81.1M English-language academic papers spanning many academic disciplines, which is expected to facilitate research and development of tools and tasks for text mining over academic text.
An Uninteresting Tour Through Why Our Research Papers Aren't Accessible
This work overviews the context in which PDFs became their publication format, the difficulty in making PDF documents accessible given current tools, what the authors have tried to make their PDFs more accessible, and potential options for doing better in the future.
Creating accessible PDFs for conference proceedings
This work reports on the accessibility of 1,811 papers in the technical program of several top conferences related to accessibility and human-computer interaction, and offers thoughts on research challenges and future work that may make the community's research more accessible.
Web Content Accessibility Guidelines (WCAG) 2.0
Web Content Accessibility Guidelines (WCAG) 2.0 covers a wide range of recommendations for making Web content more accessible to a wider range of people with disabilities, including blindness and low vision, deafness and hearing loss, learning disabilities, limited movement, and more.
Web content accessibility guidelines
  • Interactions
  • 2001
Use of Ranks in One-Criterion Variance Analysis
Given $C$ samples, with $n_i$ observations in the $i$th sample, a test of the hypothesis that the samples are from the same population may be made by ranking the observations from 1 to $\sum n_i$…
A Formative Study on Designing Accurate and Natural Figure Captioning Systems
This work crawled, annotated, and analyzed a corpus of real-world human-written figure captions, showing that real-world captions usually consist of a finite set of caption units and that automatic figure captioning should be formulated as a multi-stage task.
How science should support researchers with visual impairments.
Naheda Sahtout says being legally blind doesn’t fundamentally affect her skills, and argues that science needs to start a conversation to attract and empower more researchers like her.