Accessibility of information on the web

Steve Lawrence and C. Lee Giles
Search engines do not index sites equally, may not index new pages for months, and no engine indexes more than about 16% of the web. As the web becomes a major communications medium, the data on it must be made more accessible. 
Link Analysis in Web Information Retrieval
This survey describes two successful link analysis algorithms and the state of the art of the field.
Web Search
In only a few years, the World Wide Web has become the largest cultural endeavour of all time, and it can be seen as a vast, diverse, rapidly changing and unstructured database.
The indexable web is more than 11.5 billion pages
The size of the public indexable web is estimated at 11.5 billion pages and the overlap and the index size of Google, MSN, Ask/Teoma and Yahoo are estimated.
Keeping up with the changing Web
What "current" means for Web search engines and how often they must reindex the Web to keep current with its changing pages and structure are quantified.
Web searching, search engines and Information Retrieval
The challenges in indexing the World Wide Web, the user behaviour, and the ranking factors used by these engines are discussed, mainly of those that are based on the widely used link popularity measures.
Web Information Retrieval Support Systems: The Future of Web Search
O. Hoeber, 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, 2008
The goal of this paper is to provide an overview of some of the key issues, challenges, and opportunities in WIRSS research.
Database Selection for Longer Queries
This paper focuses on the area of general-purpose search engines, such as Google and Yahoo, which have been attempting to index the whole Web and provide a search capability for all Web documents.
Context in Web Search
Next-generation search engines will make increasing use of context information, either by using explicit or implicit context information from users or by implementing additional functionality within restricted contexts.
On the Automatic Extraction of Data from the Hidden Web
An increasing amount of Web data is accessible only by filling out HTML forms to query an underlying data source. This is most welcome from a user perspective (queries are easy and precise).
Web impact factors and search engine coverage
The results indicate that search engine coverage, even of large national domains is extremely uneven and would be likely to lead to misleading calculations.
The MetaCrawler architecture for resource aggregation on the Web
The paper discusses the MetaCrawler Softbot parallel Web search service that has been available at the University of Washington since June 1995 and has some sophisticated features that allow it to obtain results of much higher quality than simply regurgitating the output from each search service.
Digital Libraries and Autonomous Citation Indexing
Digital libraries incorporating ACI can help organize scientific literature and may significantly improve the efficiency of dissemination and feedback and speed the transition to scholarly electronic publishing.
Evolutionary Dynamics of the World Wide Web
We present a theory for the growth dynamics of the World Wide Web that takes into account the wide range of stochastic growth rates in the number of pages per site, as well as the fact that new sites are created over time.
Internet: Growth dynamics of the World-Wide Web
It is found that web pages are distributed among sites according to a universal power law: many sites have only a few pages, whereas very few sites have hundreds of thousands of pages.
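The power-law finding above can be illustrated with a small simulation. This is a sketch with hypothetical parameters (the exponent `beta`, the cutoff `max_pages`, and the helper `sample_site_sizes` are illustrative assumptions, not values from the paper): it draws site sizes from a truncated discrete power law via inverse-transform sampling and shows the characteristic heavy tail, with many tiny sites and very few huge ones.

```python
import random

def sample_site_sizes(num_sites, beta=2.0, max_pages=10**6, seed=42):
    """Draw site sizes (pages per site) from a truncated power law
    p(x) ~ x^(-beta) on [1, max_pages]. Parameters are illustrative."""
    rng = random.Random(seed)
    sizes = []
    for _ in range(num_sites):
        u = rng.random()
        # Inverse of the truncated power-law CDF
        # F(x) = (1 - x^(1-beta)) / (1 - max_pages^(1-beta)), beta > 1
        x = (1 - u * (1 - max_pages ** (1 - beta))) ** (1 / (1 - beta))
        sizes.append(int(x))
    return sizes

sizes = sample_site_sizes(100_000)
small = sum(1 for s in sizes if s <= 10)      # sites with at most 10 pages
huge = sum(1 for s in sizes if s >= 100_000)  # sites with 100k+ pages
print(small, huge)  # the vast majority are small; huge sites are rare
```

With `beta = 2`, roughly 90% of sampled sites have ten pages or fewer, while sites with hundreds of thousands of pages appear only a handful of times in 100,000 draws, mirroring the distribution the paper describes.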