• Corpus ID: 235755085

Topic Modeling in the Voynich Manuscript

Rachel Sterneck, Annie Polish, Claire Bowern
This article presents the results of investigations using topic modeling of the Voynich Manuscript (Beinecke MS408). Topic modeling is a set of computational methods used to identify clusters of subjects within a text. We use latent Dirichlet allocation, latent semantic analysis, and non-negative matrix factorization to cluster Voynich pages into ‘topics’. We then compare the topics derived from the computational models to clusters derived from the Voynich illustrations and from…
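As a rough illustration of what clustering pages into ‘topics’ can look like, the following is a minimal sketch of one of the three named methods, non-negative matrix factorization, written in plain Python with multiplicative updates. It is not the authors' pipeline: the toy "pages", their placeholder tokens, the number of topics, the iteration count, and every function name here are assumptions made for illustration only, and the tokens are not Voynich transcription.

```python
import random

def build_matrix(pages):
    """Return (vocab, V) where V[i][j] counts word j on page i."""
    vocab = sorted({w for page in pages for w in page.split()})
    index = {w: j for j, w in enumerate(vocab)}
    V = [[0.0] * len(vocab) for _ in pages]
    for i, page in enumerate(pages):
        for w in page.split():
            V[i][index[w]] += 1.0
    return vocab, V

def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def frob_error(V, W, H):
    """Squared Frobenius distance between V and the product W H."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))

def nmf(V, k, iters=300, seed=0, eps=1e-9):
    """Factor V (pages x words) into W (pages x topics) and H (topics x words)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return W, H

# Hypothetical toy pages: two groups with disjoint placeholder vocabularies.
pages = [
    "leaf root leaf stem", "root stem leaf root", "stem leaf leaf root",
    "star moon star sun", "moon sun star moon", "sun star star moon",
]
vocab, V = build_matrix(pages)
W, H = nmf(V, k=2)

# Assign each page to the topic with the largest weight in its row of W.
labels = [max(range(2), key=lambda a: row[a]) for row in W]
print(labels)
```

Each row of `W` gives a page's weight on each topic, and each row of `H` gives a topic's weight on each word, so the largest entry in a page's row is a simple stand-in for its topic assignment. The abstract's other two methods, latent Dirichlet allocation and latent semantic analysis, also start from a page-by-word count matrix of this kind but factor it differently.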
1 Citation

The Linguistics of the Voynich Manuscript

The Voynich Manuscript is a fifteenth-century illustrated cipher manuscript. In this overview of recent approaches to the Voynich Manuscript, we summarize and evaluate current work on the language ...



A possible generating algorithm of the Voynich manuscript

The results support the so-called “hoax hypothesis,” i.e., interpretation of the text as a set of meaningless strings, and present a concrete text-generator algorithm (the “self-citation” process), easily executable without additional tools even by a medieval scribe.

Hoaxing statistical features of the Voynich Manuscript

The main unusual qualitative and quantitative features of the Voynich Manuscript are explicable as products of a low-technology hoax, with no need to invoke an undiscovered new type of code and/or the presence of meaningful text in the manuscript.

Latent Dirichlet Allocation

Probing the Statistical Properties of Unknown Texts: Application to the Voynich Manuscript

A framework for determining whether a text is compatible with a natural language and to which language it could belong is proposed, based on three types of statistical measurements obtained from first-order statistics of word properties in a text.

On the Voynich manuscript

The intriguing, multilateral statistical matches place the investigated sections of these two documents in the same linguistic universality class, suggesting that the Voynich manuscript most likely carries no rationally comprehensible content, and offering a plausible explanation of why the ciphertext has so far been unbreakable.

Visualizing Data using t-SNE

A new technique called t-SNE that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map, a variation of Stochastic Neighbor Embedding that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.

A proposed partial decoding of the Voynich script

The aim of the paper is to lay the groundwork for an eventual complete decipherment of this fascinating document by proposing a partial decoding of the Voynich script.

The Voynich Manuscript - An Elegant Enigma

Abstract: In spite of all the papers that others have written about the manuscript, there is no complete survey of all the approaches, ideas, background information and analytic studies that have


G. Rugg, Computer Science, 2004
Describes how sixteenth-century cryptographic techniques can be adapted to generate text similar to that in the Voynich manuscript, and concludes that the hoax hypothesis is now a plausible explanation for the manuscript.

A Statistical Approach to Mechanized Encoding and Searching of Literary Information

Literature searching by machine still presents major difficulties. A statistical approach to this problem is outlined, and the various steps of a system based on this approach are described.