Detecting In-line Mathematical Expressions in Scientific Documents
TLDR
Although the method is naive and uses only a small amount of annotated training data, it achieved an 88.95% F-measure, compared with 22.81% for existing math OCR software.
An Evaluation Dataset for Identifying Communicative Functions of Sentences in English Scholarly Papers
TLDR
To show the usefulness of the dataset, a series of experiments were conducted that determined to what extent sentence representations acquired by recent models, such as word2vec and BERT, can be employed to detect communicative functions in sentences.
Extraction and Evaluation of Formulaic Expressions Used in Scholarly Papers
TLDR
A new approach that is robust to variation in the spans and forms of formulaic expressions is proposed, along with a new extraction method that utilises named entities and dependency structures to remove the non-formulaic part from a sentence.
Using Formulaic Expressions in Writing Assistance Systems
TLDR
This work proposes a new framework for semantic searches of formulaic expressions (FEs) and a new method to leverage both existing dictionaries and domain sentence corpora, and expands an existing FE dictionary with a view to building a more comprehensive and domain-specific FE dictionary.
Extraction of Formulaic Expressions from Scientific Papers
TLDR
This paper proposes a sentence-level FE extraction method that takes communicative functions (CFs) into account, and compares it with existing methods to demonstrate that it is better at extracting CF-oriented FEs.
Towards Interactive Information Access based on Document Structures
A framework is examined in which users interactively access documents, such as scientific papers, that have a physical structure appearing in the layout and a logical structure based on their contents.
Communicative-Function-Based Sentence Classification for Construction of an Academic Formulaic Expression Database
TLDR
This study considers the fully automated construction of a CF-labelled FE database using a top–down approach, in which CF labels are first assigned to sentences and the FEs are then extracted.