Accessibility of information on the web

@article{Lawrence1999AccessibilityOI,
  title={Accessibility of information on the web},
  author={S. Lawrence and C. Lee Giles},
  journal={Nature},
  year={1999},
  volume={400},
  pages={107-109}
}
Search engines do not index sites equally, may not index new pages for months, and no engine indexes more than about 16% of the web. As the web becomes a major communications medium, the data on it must be made more accessible. 

Next Generation Web Search: Setting Our Sites
TLDR
A new way to support task-based site search is to dynamically present appropriate metadata that organizes the search results and suggests what to look at next, as a personalized intermixing of search and hypertext.
The indexable web is more than 11.5 billion pages
TLDR
The size of the public indexable web is estimated at 11.5 billion pages and the overlap and the index size of Google, MSN, Ask/Teoma and Yahoo are estimated.
Next Generation Web Search: Setting Our Sites
The analysis of the hyperlink structure of the web has led to significant improvements in web information retrieval. This survey describes two successful link analysis algorithms and the...
Structured databases on the web
The Web has been rapidly "deepened" by the prevalence of databases online. With the potentially unlimited information hidden behind their query interfaces, this "deep Web" of searchable databases is...
Keeping up with the changing Web
TLDR
What "current" means for Web search engines and how often they must reindex the Web to keep current with its changing pages and structure are quantified.
Web searching, search engines and Information Retrieval
TLDR
The challenges of indexing the World Wide Web, user behaviour, and the ranking factors used by these engines are discussed, mainly those based on the widely used link popularity measures.
Web Information Retrieval Support Systems: The Future of Web Search
  • O. Hoeber
  • Computer Science
  • 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology
  • 2008
TLDR
The goal of this paper is to provide an overview of some of the key issues, challenges, and opportunities in WIRSS research.
Database Selection for Longer Queries
TLDR
This paper focuses on the area of general-purpose search engines, such as Google and Yahoo, which have been attempting to index the whole Web and provide a search capability for all Web documents.
On the Automatic Extraction of Data from the Hidden Web
An increasing amount of Web data is accessible only by filling out HTML forms to query an underlying data source. While this is most welcome from a user perspective (queries are easy and precise) and...
Context in Web Search
TLDR
Next-generation search engines will make increasing use of context information, either by using explicit or implicit context information from users, or by implementing additional functionality within restricted contexts.

References

The MetaCrawler architecture for resource aggregation on the Web
TLDR
The paper discusses the MetaCrawler Softbot parallel Web search service that has been available at the University of Washington since June 1995 and has some sophisticated features that allow it to obtain results of much higher quality than simply regurgitating the output from each search service.
Digital Libraries and Autonomous Citation Indexing
TLDR
Digital libraries incorporating ACI can help organize scientific literature and may significantly improve the efficiency of dissemination and feedback and speed the transition to scholarly electronic publishing.
Evolutionary Dynamics of the World Wide Web
We present a theory for the growth dynamics of the World Wide Web that takes into account the wide range of stochastic growth rates in the number of pages per site, as well as the fact that new sites...
Internet: Growth dynamics of the World-Wide Web
TLDR
It is found that web pages are distributed among sites according to a universal power law: many sites have only a few pages, whereas very few sites have hundreds of thousands of pages.