# AnnoMathTeX - a formula identifier annotation recommender system for STEM documents

@article{Scharpf2019AnnoMathTeXA, title={AnnoMathTeX - a formula identifier annotation recommender system for STEM documents}, author={Philipp Scharpf and Ian Mackerracher and Moritz Schubotz and Joeran Beel and Corinna Breitinger and Bela Gipp}, journal={Proceedings of the 13th ACM Conference on Recommender Systems}, year={2019} }

Documents from science, technology, engineering and mathematics (STEM) often contain a large number of mathematical formulae alongside text. Semantic search, recommender, and question answering systems require the occurring formula constants and variables (identifiers) to be disambiguated. We present a first implementation of a recommender system that enables and accelerates formula annotation by displaying the most likely candidates for formula and identifier names from four different sources…
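The abstract describes ranking likely candidate names drawn from several sources. The sketch below shows one way such candidates could be merged into a single ranked list. It is not the AnnoMathTeX implementation. The abstract does not say which four sources are used or how they are weighted, so the source labels (`source_a` to `source_d`), the `rank_candidates` helper, and the scoring rule are all hypothetical. The scoring rule normalizes each source's scores and then sums them.

```python
from collections import defaultdict


def rank_candidates(source_results, top_k=5):
    """Merge (name, score) candidate lists from several sources and rank them.

    Hypothetical sketch: source_results maps an illustrative source label to a
    list of (candidate_name, score) pairs with non-negative scores. Each
    source's scores are normalized to sum to 1 so that no single source
    dominates, and the normalized scores are then summed across sources.
    """
    totals = defaultdict(float)
    for candidates in source_results.values():
        total = sum(score for _, score in candidates)
        if total <= 0:
            continue  # skip sources that returned no usable evidence
        for name, score in candidates:
            totals[name] += score / total
    # Sort by combined score (descending), then alphabetically for stable ties.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:top_k]]


# Illustrative candidates for the identifier "E" (made-up scores).
suggestions = rank_candidates({
    "source_a": [("energy", 3.0), ("electric field", 1.0)],
    "source_b": [("energy", 1.0), ("expected value", 1.0)],
    "source_c": [("electric field", 1.0), ("energy", 1.0)],
    "source_d": [],
})
print(suggestions)  # ['energy', 'electric field', 'expected value']
```

Normalizing per source is one simple choice that keeps a source with large raw scores from drowning out the others. A real recommender might instead learn source weights from the annotations that users accept.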

## 13 Citations

Mathematics in Wikidata

- Computer Science · Wikidata@ISWC
- 2021

The current state, challenges, and discussions related to integrating Mathematical Entity Linking into Wikidata and Wikipedia are summarized and some data mining methods and applications of the mathematical information are outlined.

ARQMath Lab: An Incubator for Semantic Formula Search in zbMATH Open?

- Computer Science · CLEF
- 2020

The ARQMath Task at CLEF 2020 aims to tackle the problem of linking newly posted questions from Math Stack Exchange (MSE) to existing ones that were already answered by the community, and several formula retrieval methods were explored.

Towards Explaining STEM Document Classification using Mathematical Entity Linking

- Computer Science · arXiv
- 2021

First advances towards explainable STEM document classification using classical and mathematical Entity Linking are presented, and the results indicate that mathematical entities have the potential to provide high explainability, as they are a crucial part of a STEM document.

Mathematical Information Retrieval Trends and Techniques

- Computer Science · Advances in Computational Intelligence and Robotics
- 2021

This chapter discusses recent advances in formula-based search engines, various formula representation styles and indexing techniques, and the benefits of formula-based search engines in future applications such as plagiarism detection and math recommendation systems.

Classification and Clustering of arXiv Documents, Sections, and Abstracts, Comparing Encodings of Natural and Mathematical Language

- Computer Science · JCDL
- 2020

It is shown that the computer outperforms a human expert when classifying documents, and that the classification and clustering can be employed, e.g., for document search and recommendation.

Mathematical Formulae in Wikimedia Projects 2020

- Computer Science · JCDL
- 2020

This poster summarizes the contributions to Wikimedia's processing pipeline for mathematical formulae and describes the plans to improve the accessibility and discoverability of mathematical knowledge in Wikimedia projects further.

An Analysis of Variable-Size Vector Based Approach for Formula Searching

- Mathematics, Computer Science · CLEF
- 2020

Results show that the variable-size formula embedding approach requires significant improvement to retrieve syntactically and semantically similar formulae.

Data-Driven Recognition and Extraction of PDF Document Elements

- Computer Science · Technologies
- 2019

This paper proposes a system that forms the basis for structuring unstructured PDF documents, so that the identified document elements can subsequently be retrieved and analyzed with tailor-made approaches.

zbMATH Open: API Solutions and Research Challenges

- Computer Science · DISCO@JCDL
- 2021

An overview of the current and future services offered by zbMATH is given, the initial version of the zbMATH links API is presented, and the potentials and limitations of the links API are analyzed using the NIST Digital Library of Mathematical Functions as an example.

zbMATH Open: API Solutions and Research Challenges

- 2021

We present zbMATH Open, the most comprehensive collection of reviews and bibliographic metadata of scholarly literature in mathematics. Besides our website zbMATH.org which is openly accessible since…

## 12 References

Introducing MathQA - A Math-Aware Question Answering System

- Computer Science · Information Discovery and Delivery
- 2018

An open-source, math-aware Question Answering System based on Ask Platypus is presented. It returns a single mathematical formula in response to a natural language question in English or Hindi and outperformed a commercial computational mathematical knowledge engine by 13 per cent.

Semantification of Identifiers in Mathematics for Better Math Information Retrieval

- Computer Science · SIGIR
- 2016

This work learns namespace definitions by clustering the MLP results and mapping those clusters to subject classification schemata, and discovers that identifier namespaces improve the performance of automated identifier-definition extraction, and elevate it to a level that cannot be achieved within the document context alone.

Extracting Textual Descriptions of Mathematical Expressions in Scientific Papers

- Computer Science · D-Lib Magazine
- 2014

This study developed a method for automatic description extraction, whereby the problem was formulated as a binary classification by pairing each mathematical expression with its description candidates and classifying the pairs as correct or incorrect.

Faceted Search for Mathematics

- Computer Science · LWA
- 2015

This paper describes one way of solving the faceted search problem in mathematics: extracting recognizable formula schemata from a given set of formulae and using these schemata to divide the initial set into formula classes.

Math Object Identifiers - Towards Research Data in Mathematics

- Computer Science · LWDA
- 2017

Math Object Identifiers (MOIs) constitute a very lightweight form of semantic annotation that can support many knowledge-based workflows in mathematics, e.g., classification of articles via the objects mentioned, or object-based search.

Wikidata

- 2014

This collaboratively edited knowledgebase provides a common source of data for Wikipedia, and everyone else.

Wikidata: a free collaborative knowledgebase

- Computer Science · Commun. ACM
- 2014

This collaboratively edited knowledgebase provides a common source of data for Wikipedia, and everyone else, to help improve the quality of the encyclopedia.

AnnoMathTeX - a Formula Identifier Annotation Recommender System for STEM Documents

- Proceedings of the 13th ACM Conference on Recommender Systems (RecSys)
- 2019