We describe a large-scale application of methods for finding plagiarism and self-plagiarism in research document collections. The methods are applied to a collection of 284,834 documents collected by arXiv.org over a 14 year period, covering a few different research disciplines. The methodology efficiently detects a variety of problematic author behaviors,… (More)
Motivated by preservation and resource discovery, we examine how digital resources, and not just metadata about resources, can be harvested using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). We review and critique existing techniques for identifying and gathering digital resources using metadata harvested through the OAI-PMH. We… (More)
Are the e-prints (electronic preprints) from the arXiv repository being used instead of the journal articles? In this paper we show that the e-prints have not undermined the usage of journal papers in the astrophysics community. As soon as the journal article is published, the astronomical community prefers to read the journal article and the use of… (More)
A variety of approaches have emerged in HCI that grapple with the ineffable, ill-defined, and idiosyncratic nature of aesthetic experience. The most straightforward approach is to transform the ineffable aspects of these experiences into precise representations, producing systems that are well-defined and testable but may miss the fullness of the… (More)
In this article I outline the ideas behind the Open Archives Initiative metadata harvesting protocol (OAIMH), and attempt to clarify some common misconceptions. I then consider how the OAIMH protocol can be used to expose and harvest metadata. Perl code examples are given as practical illustration.
The Open Archives Initiative (OAI) was created as a practical way to promote interoperability between eprint repositories. Although the scope of the OAI has been broadened, eprint repositories still represent a significant fraction of OAI data providers. In this article I present a brief survey of OAI eprint repositories, and of services using metadata… (More)
Aggregations of Web resources are increasingly important in scholarship as it adopts new methods that are data-centric, collaborative, and networked-based. The same notion of ag-gregations of resources is common to the mashed-up, socially networked information environment of Web 2.0. We present a mechanism to identify and describe aggregations of Web… (More)