Mapping subsets of scholarly information

@article{ginsparg2003mapping,
  title={Mapping subsets of scholarly information},
  author={Paul H. Ginsparg and Paul Houle and Thorsten Joachims and Jae Hoon Sul},
  journal={Proceedings of the National Academy of Sciences of the United States of America},
  pages={5236--5240}
}
  • Published 11 December 2003
We illustrate the use of machine learning techniques to analyze, structure, maintain, and evolve a large online corpus of academic literature. An emerging field of research can be identified as part of an existing corpus, permitting the implementation of a more coherent community structure for its practitioners. 


Usage bibliometrics

The state of the art in usage-based informetrics, i.e., the use of usage data to study the scholarly process, is reviewed.

Scholarly information network

The use of machine learning techniques to analyze, structure, maintain, and evolve a large online corpus of academic literature is described, and it is illustrated how these techniques could provide not only more efficient means of accessing and navigating the information, but also more cost-effective means of authentication and quality control.

Differentiating, describing, and visualizing scientific space: A novel approach to the analysis of published scientific abstracts

This paper develops and demonstrates Latent Semantic Differentiation, a novel method for analyzing published scientific abstracts that identifies the dominant themes, clusters the articles accordingly, visualizes the results, and provides a qualitative description of each cluster.

A note concerning primary source knowledge

It is concluded that not even the arXiv filters, which are otherwise successful in filtering fringe‐topic papers, can fully acquire “Domain‐Specific Discrimination” and thus distinguish technical papers that are taken seriously by an expert community from those that are not.

Managing Knowledge in Light of Its Evolution Process: An Empirical Study on Citation Network-Based Patent Classification

This study focuses on the process of knowledge evolution and proposes to incorporate this perspective into knowledge management tasks and introduces a labeled citation graph kernel to classify patents under a kernel-based machine learning framework.

Automating the Horae: Boundary-work in the age of computers

This article describes the intense software filtering that has allowed the arXiv e-print repository to sort and process large numbers of submissions with minimal human intervention.

Visual overviews for discovering key papers and influences across research fronts

A novel network-visualization tool is applied that uses meaningful node layouts to present research fronts and shows citation links that indicate influences across them.

A text visualization method for cross-domain research topic mining

This study investigates the evolution of cross-domain topics in three interdisciplinary research domains: a hierarchical topic model is adopted to extract and correlate the topics of the three domains, and a visual analytic approach is used to determine the topics unique to each domain.



Discovering Informative Patterns and Data Cleaning

A method for discovering informative patterns is presented, by which data can be reduced to only a few representative entries, making it an attractive candidate for new applications in knowledge discovery and data cleaning.

Mapping knowledge domains

  • R. Shiffrin, K. Börner
  • Computer Science
    Proceedings of the National Academy of Sciences of the United States of America
  • 2004
The Arthur M. Sackler Colloquium on Mapping Knowledge Domains was designed to showcase the ongoing developments in this transformation and provide pointers toward the directions it will move.

Creating a global knowledge network

Key questions raised by the past decade of initial experience with new forms of electronic research infrastructure are suggested, with more complete answers expected on the 5-10 year timescale.

CONSTRUE/TIS: A System for Content-Based Indexing of a Database of News Stories

The CONSTRUE news story categorization system assigns indexing terms to news stories according to their content using knowledge-based techniques. Reuters expects the speed and consistency of TIS to provide a significant competitive advantage and, hence, an increased market share for Country Reports and other products from Reuters' Historical Information Products Division.

Text Categorization with Support Vector Machines: Learning with Many Relevant Features

This paper explores the use of Support Vector Machines (SVMs) for learning text classifiers from examples. It analyzes the particular properties of learning with text data and identifies why SVMs are appropriate for this task.
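
As a rough illustration of this approach, a linear SVM can be trained directly on bag-of-words vectors with a Pegasos-style stochastic subgradient method. The toy corpus, labels, and hyperparameters below are invented for illustration and are not from the paper:

```python
import numpy as np

# Toy bag-of-words corpus: each row is a document's term counts over a
# 4-word vocabulary; labels are +1 / -1 for two hypothetical subject classes.
X = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 1],
    [0, 0, 2, 1],
    [0, 1, 1, 2],
], dtype=float)
y = np.array([1.0, 1.0, -1.0, -1.0])

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic subgradient descent for a linear SVM."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)      # decaying step size
            margin = y[i] * (w @ X[i])
            w *= (1 - eta * lam)       # regularization shrinkage
            if margin < 1:             # hinge-loss subgradient step
                w += eta * y[i] * X[i]
    return w

w = train_linear_svm(X, y)
print(np.sign(X @ w))  # predicted labels for the training documents
```

In practice the feature vectors would be sparse tf-idf weights over thousands of terms, which is exactly the high-dimensional setting where linear SVMs are effective.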

Inductive learning algorithms and representations for text categorization

Five different automatic learning algorithms for text categorization are compared in terms of learning speed, real-time classification speed, and classification accuracy.

Finding scientific topics

  • T. Griffiths, M. Steyvers
  • Computer Science
    Proceedings of the National Academy of Sciences of the United States of America
  • 2004
A generative model for documents is described, introduced by Blei, Ng, and Jordan, and a Markov chain Monte Carlo algorithm is presented for inference in this model, which is used to analyze abstracts from PNAS by using Bayesian model selection to establish the number of topics.
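
A minimal sketch of this kind of inference: a collapsed Gibbs sampler for the Blei-Ng-Jordan topic model (LDA) on an invented six-word toy corpus. The corpus, hyperparameters, and number of topics are illustrative assumptions, not those used in the PNAS analysis:

```python
import numpy as np

# Tiny corpus as word-index lists over a 6-word vocabulary.
# Words 0-2 and words 3-5 come from two distinct (hypothetical) themes.
docs = [
    [0, 1, 2, 0, 1],
    [2, 0, 1, 1, 2],
    [3, 4, 5, 3, 4],
    [5, 3, 4, 4, 5],
]
V, K = 6, 2                  # vocabulary size, number of topics
alpha, beta = 0.1, 0.01      # symmetric Dirichlet hyperparameters

rng = np.random.default_rng(0)
z = [rng.integers(K, size=len(d)) for d in docs]   # topic assignments
ndk = np.zeros((len(docs), K))   # document-topic counts
nkw = np.zeros((K, V))           # topic-word counts
nk = np.zeros(K)                 # topic totals
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        k = z[d][i]
        ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

for _ in range(500):             # collapsed Gibbs sweeps
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]          # remove the current assignment
            ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
            # full conditional p(z_i = k | all other assignments)
            p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
            k = rng.choice(K, p=p / p.sum())
            z[d][i] = k          # resample and restore counts
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

phi = (nkw + beta) / (nk[:, None] + V * beta)  # estimated topic-word dists
print(np.round(phi, 2))
```

On this block-structured toy data the sampler recovers two topics, one concentrated on words 0-2 and one on words 3-5; model selection over K (as in the paper) would compare such fits across different topic counts.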

A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge.

A new general theory of acquired similarity and knowledge representation, latent semantic analysis (LSA), is presented and used to successfully simulate such learning and several other psycholinguistic phenomena.
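
The computational core of LSA can be sketched in a few lines: take a truncated SVD of a term-document matrix and compare documents in the resulting latent space. The toy matrix and vocabulary below are invented for illustration:

```python
import numpy as np

# Toy term-document count matrix (rows = terms, cols = documents).
# Docs 0 and 1 share nautical vocabulary; doc 2 uses unrelated terms.
A = np.array([
    [1, 1, 0],   # "ship"
    [1, 0, 0],   # "boat"
    [0, 1, 0],   # "ocean"
    [0, 0, 1],   # "tree"
    [0, 0, 1],   # "wood"
], dtype=float)

# LSA: rank-k truncated SVD of the term-document matrix.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T   # documents in the latent space

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Docs 0 and 1 end up nearly parallel in latent space even though they
# share only one literal term; doc 2 stays orthogonal to both.
print(cos(doc_vecs[0], doc_vecs[1]), cos(doc_vecs[0], doc_vecs[2]))
```

The key effect, per the theory, is that the low-rank projection induces similarity between documents (and terms) that never co-occur directly but share latent structure.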

Advances in Large Margin Classifiers

This book provides an overview of recent developments in large margin classifiers, examines connections with other methods, and identifies strengths and weaknesses of the method, as well as directions for future research.