Collaborative tagging systems, in which many casual users annotate objects with free-form strings (tags) of their choosing, have recently emerged as a powerful way to label and organize large collections of data. During our recent investigation into these types of systems, we discovered a simple but remarkably effective algorithm for converting a large …
Social bookmarking is a recent phenomenon which has the potential to give us a great deal of data about pages on the web. One major question is whether that data can be used to augment systems like web search. To answer this question, over the past year we have gathered what we believe to be the largest dataset from a social bookmarking site yet analyzed by …
Automatically clustering web pages into semantic groups promises improved search and browsing on the web. In this paper, we demonstrate how user-generated tags from large-scale social bookmarking websites such as del.icio.us can be used as a complementary data source to page text and anchor text for improving automatic clustering of web pages. This paper …
In recent years, social Web sites have become important components of the Web. With their success, however, has come a growing influx of spam. If left unchecked, spam threatens to undermine resource sharing, interactivity, and openness. This article surveys three categories of potential countermeasures: those based on detection, demotion, and prevention. …
In this paper, we look at the "social tag prediction" problem. Given a set of objects, and a set of tags applied to those objects by users, can we predict whether a given tag could/should be applied to a particular object? We investigated this question using one of the largest crawls of the social bookmarking system del.icio.us gathered to date. For URLs in …
Tagging systems allow users to interactively annotate a pool of shared resources using descriptive tags. As tagging systems are gaining in popularity, they become more susceptible to tag spam: misleading tags that are generated in order to increase the visibility of some resources or simply to confuse users. We introduce a framework for modeling …
Tagging systems allow users to interactively annotate a pool of shared resources using descriptive strings called tags. Tags are used to guide users to interesting resources and help them build communities that share their expertise and resources. As tagging systems are gaining in popularity, they become more susceptible to tag spam: …
Social cataloging sites, tagging systems where users tag books, provide us with a rare opportunity to contrast tags to other information organization systems. We contrast tags to a controlled vocabulary, the Library of Congress Subject Headings (LCSH), which has been developed over several decades. We find that many of the keywords designated by tags and LCSH are …