• Publications
Mining the peanut gallery: opinion extraction and semantic classification of product reviews
This work develops a method for automatically distinguishing positive from negative reviews, drawing on information retrieval techniques for feature extraction and scoring. Results for the various metrics and heuristics vary with the testing situation.
Face recognition: a convolutional neural-network approach
Presents a hybrid neural-network approach to human face recognition that compares favourably with other methods, analyzes its computational complexity, and discusses how new classes could be added to the trained recognizer.
CiteSeer: an automatic citation indexing system
CiteSeer has many advantages over traditional citation indexes, including the ability to create more up-to-date databases that are not limited to a preselected set of journals or restricted by journal publication delays, completely autonomous operation with a corresponding reduction in cost, and powerful interactive browsing of the literature using the context of citations.
Efficient identification of Web communities
A focused crawler that crawls to a fixed depth can approximate community membership by augmenting the graph induced by the crawl with links to a virtual sink node.
Accessibility of information on the web
As the web becomes a major communications medium, the data on it must become more accessible, and search engines need to help make it so.
Digital Libraries and Autonomous Citation Indexing
Digital libraries incorporating autonomous citation indexing (ACI) can help organize scientific literature, may significantly improve the efficiency of dissemination and feedback, and may speed the transition to scholarly electronic publishing.
Focused Crawling Using Context Graphs
Presents a focused crawling algorithm that builds a model of the context within which topically relevant pages occur on the web. The context model can capture typical link hierarchies within which valuable pages occur, as well as model content on documents that frequently co-occur with relevant pages.
Free online availability substantially increases a paper's impact
  • S. Lawrence
  • Political Science, Medicine
  • Nature
  • 31 May 2001
The results are dramatic: in computer science, there is a clear correlation between the number of times an article is cited and the probability that the article is online.
Collaborative Filtering by Personality Diagnosis: A Hybrid Memory and Model-Based Approach
This work describes and evaluates a new method called personality diagnosis (PD), which computes the probability that a user is of the same "personality type" as other users and, in turn, the likelihood that he or she will like new items.
Winners don't take all: Characterizing the competition for links on the web
A simple generative model quantifies the degree to which the rich nodes grow richer and how new (and poorly connected) nodes can compete. The model accurately accounts for the true connectivity distributions of category-specific web pages, the web as a whole, and other social networks.