Software Framework for Topic Modelling with Large Corpora
This work describes a Natural Language Processing software framework based on the idea of document streaming, i.e. processing corpora one document at a time in a memory-independent fashion. It implements several popular algorithms for topical inference, including Latent Semantic Analysis and Latent Dirichlet Allocation, in a way that makes them completely independent of the training corpus size.
Gensim -- Statistical Semantics in Python
Gensim was created for large digital libraries, but its underlying algorithms for large-scale, distributed, online SVD and LDA are like the Swiss Army knife of data analysis: also useful on their own, outside of the domain of Natural Language Processing.
Indexing and Searching Mathematics in Digital Libraries - Architecture, Design and Scalability Issues
The design and architecture of the MIaS (Math Indexer and Searcher) system is presented, and the design decisions are discussed in detail.
The art of mathematics retrieval
The design and architecture of MIaS (Math Indexer and Searcher), a system for mathematics retrieval, is presented, and design decisions are discussed. We argue for an approach based on Presentation…
Automated Classification and Categorization of Mathematical Knowledge
Results of machine learning of the MSC on full texts of papers in the mathematical digital libraries DML-CZ and NUMDAM show that the F1-measure achieved on the task of classifying top-level MSC categories exceeds 89%.
Web Interface and Collection for Mathematical Retrieval: WebMIaS and MREC
The solution, the WebMIaS system, allows the retrieval of mathematical expressions written in TeX or MathML, using a math-aware search engine based on the state-of-the-art system Lucene that implements proximity math indexing with a subformulae similarity search.
Gait Recognition from Motion Capture Data
Experiments on the CMU MoCap database show that the suggested method outperforms 13 relevant methods based on geometric features and a method to learn the features by a combination of Principal Component Analysis and Linear Discriminant Analysis.
Similarity Search for Mathematics: Masaryk University Team at the NTCIR-10 Math Task
This paper describes and summarizes the experiences of the Masaryk University team MIRMU with the mathematical search performed for the NTCIR pilot Math Task, and shows that the system performs best using TeX queries that are translated to combined Presentation-Content MathML.
Math Indexer and Searcher under the Hood: Fine-tuning Query Expansion and Unification Strategies
This paper summarizes the experience of the Math Information Retrieval team of Masaryk University with the NTCIR-12 MathIR arXiv Main Task and its subtasks, and describes an evaluation platform the team developed based on NTCIR-11 Math-2 Task relevance judgements.
DML-CZ: The Objectives and the First Steps
The DML-CZ digital library is intended to ensure the availability and digital archiving of the significant mathematical literature published in the Czech lands throughout their history to date.