Ashutosh Dixit

The World Wide Web is a huge source of hyperlinked information contained in hypertext documents. Search engines use web crawlers to collect these documents from the web for storage and indexing. However, many of these documents contain dynamic information that changes on a daily, weekly, monthly, or yearly basis, and hence we need to refresh…
The World Wide Web is an interlinked collection of billions of documents. Ironically, the very size of this collection has become an obstacle to information retrieval. The user has to sift through scores of pages to find the desired information. Web crawlers are the heart of search engines. Mercator is a scalable, extensible web crawler,…
Research on blogs has focused on blog posts only, ignoring the title of the blog page. Also, in summarization, only a set of representative sentences is extracted. Analysis has found that the blog post contains the content that is likely to be related to the topic of the blog post. Thus, proposed…
A question answering system can be seen as the next step in information retrieval, allowing users to pose questions in natural language and receive compact answers. Research has shown that, for a question answering system to be successful, correct classification of the question with respect to the expected answer type is required. We propose a novel…
The web consists mainly of the surface web and the hidden web. Traditional web crawlers can easily access the surface web, but they cannot crawl the hidden portion. These crawlers retrieve content from web pages connected by hyperlinks, ignoring the information hidden behind form pages, which cannot…
The Deep Web is content hidden behind HTML forms. Since it represents a large portion of the structured, unstructured, and dynamic data on the Web, accessing Deep Web content has been a long-standing challenge for the database community. This paper describes a crawler for accessing the Deep Web using ontologies. Performance evaluation of the proposed work showed that this…
Blogs are undoubtedly the richest source of information available in cyberspace. Blogs can be of various kinds: personal blogs, which contain posts on mixed issues, or domain-specific blogs, which contain posts on particular topics. For this reason, they offer a wide variety of relevant information that is often focused. A general search engine…