CiteSeer: an autonomous Web agent for automatic retrieval and identification of interesting publications

Kurt D. Bollacker, Steve Lawrence, C. Lee Giles
AGENTS '98
Research papers available on the World Wide Web (WWW or Web) are often poorly organized, often exist in forms opaque to search engines (e.g. PostScript), and increase in quantity daily. Significant amounts of time and effort are typically needed in order to find interesting and relevant publications on the Web. We have developed a Web based information agent that assists the user in the process of performing a scientific literature search. Given a set of keywords, the agent uses Web search engines and…

Autonomous citation matching

This work presents machine learning techniques that identify variant forms of citations to the same paper, and presents a number of algorithms that perform best and are sufficiently accurate for unassisted use in an autonomous citation indexing system.

Information Gathering System: Internet Navigation

A multi-agent cooperative information gathering system (CIGS) that assists different users in locating, retrieving, and integrating information on the WWW, and that can be used to improve the functionality, performance, and quality of Internet search engines.

Automated gathering of Web information: An in-depth examination of agents interacting with search engines

This research provides a classification for information agents based on stages of information gathering, gathering approaches, and agent architecture, and examines an implementation of one of the resulting classifications in detail, investigating how agents search for information on Web search engines, including the session, query, term, duration and frequency of interactions.

Complementing search engines with online web mining agents

  • F. Menczer
  • Computer Science
    Decis. Support Syst.
  • 2003

Information Retrieval on the World Wide Web and Active Logic: A Survey and Problem Definition

Recent advances in machine learning and crawling problems related to the web are surveyed, the continuum of supervised to semi-supervised to unsupervised learning problems is reviewed, the specific challenges which distinguish information retrieval in the hypertext domain are highlighted, and an Information Integration Environment is proposed.

Indexing and retrieval of scientific literature

This paper discusses the creation of digital libraries of scientific literature on the web, including the efficient location of articles, full-text indexing of the articles, autonomous citation indexing, information extraction, display of query-sensitive summaries and citation context, hubs and authorities computation.

PubSearch: a Web citation‐based retrieval system

This work proposes a Web citation‐based retrieval system, known as PubSearch, for the retrieval of Web publications, which indexes Web publications based on citation indices and stores them in a Web Citation Database.

Mining a web citation database for document clustering

A mining process to extract document cluster knowledge from the Web Citation Database to support the retrieval of Web publications is proposed and incorporated into a citation-based retrieval system known as PubSearch for Web scientific publications.

SERGEANT: A framework for building more flexible web agents by exploiting a search engine

This paper proposes SERGEANT, a framework for building flexible web agents that handle imperfect situations, and exploits an information retrieval (IR) system as a general discovery tool to assist finding and pruning information.

Can Collective Use Help for Searching?

  • D. Dicheva, Christo Dichev
  • Computer Science
    2011 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery
  • 2011
Experimental results indicate that the algorithm exploiting meta-information about the documents provides a good approximation of the understanding of the contextual dependency of the notion of similarity.



CIFI: An Intelligent Agent for Citation Finding on The World-wide Web

CIFI, a rule-based agent that autonomously finds citations on the Web, is described; it uses multiple search strategies and multiple Web-based information sources, and relies on the Lycos search engine.

ParaSite: Mining Structural Information on the Web

Syskill & Webert: Identifying Interesting Web Sites

The naive Bayesian classifier offers several advantages over other learning algorithms on this task, and an initial portion of a web page is sufficient for making predictions on its interestingness, substantially reducing the amount of network transmission required to make predictions.

The Institute of Scientific Information

ISI, a multinational corporation that provides a wide variety of information services to scientists throughout the world, is an unusual mix of scientific academia and commercial business, whose services reflect the information scientist’s concern with bibliographic subtleties and innovative methodology.

An adaptive Web page recommendation service

The Fab system strikes a balance between these two approaches, taking advantage of the shared interests among users without losing the benefits of the representations provided by content analysis.

ARACHNID: Adaptive Retrieval Agents Choosing Heuristic Neighborhoods for Information Discovery

This analysis highlights an interesting feature of the Web environment that bodes well for ARACHNID's search methods and discusses the role played in both by user relevance feedback and unsupervised learning by individual agents.


It is shown that the use of bibliographic citations in addition to the normal keyword‐type indicators produces improved retrieval performance, and that in some circumstances, citations are more effective for retrieval purposes than other more conventional terms and concepts.

Exploiting learning technologies for World Wide Web agents

A generic Web browsing assistant; an agent which constructs personalised travel brochures; the application of clustering techniques to Web browser histories; and multi-agent support for adaptive user querying are discussed.

A Universal Citation Database as a Catalyst for Reform in Scholarly Communication

A universal, Internet-based, bibliographic and citation database would link every scholarly work ever written - no matter how published - to every work that it cites and every work that cites it.