Ashutosh Dixit

The Deep Web is content hidden behind HTML forms. Since it represents a large portion of the structured, unstructured and dynamic data on the Web, accessing Deep-Web content has been a long-standing challenge for the database community. This paper describes a crawler for accessing the Deep Web using ontologies. Performance evaluation of the proposed work showed that this …
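An ontology-guided form crawler has to decide which ontology concept each HTML form field corresponds to before it can submit queries. The sketch below illustrates that matching step only; the tiny ontology, synonym sets, and field labels are hypothetical examples, not the paper's actual implementation.

```python
# Illustrative sketch: map HTML form field labels to ontology concepts via
# synonym sets, so a deep-web crawler knows which values to fill in.
# ONTOLOGY below is a made-up example, not the paper's ontology.
ONTOLOGY = {
    "author": {"author", "writer", "written by"},
    "title": {"title", "book title", "name of book"},
    "price": {"price", "cost", "max price"},
}

def map_field_to_concept(field_label):
    """Return the ontology concept whose synonym set contains the label,
    or None when the label is unknown to the ontology."""
    label = field_label.strip().lower()
    for concept, synonyms in ONTOLOGY.items():
        if label in synonyms:
            return concept
    return None
```

A real system would add fuzzy matching and use the mapped concepts to generate form submissions; this shows only the lookup.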
The WWW's expansion, coupled with the high change frequency of web pages, poses a challenge for maintaining and fetching up-to-date information. Traditional crawling methods can no longer keep up with this updating and growing web. An alternative distributed crawling scheme that uses migrating crawlers tries to maximize network utilization by minimizing the …
The World Wide Web is a huge source of hyperlinked information contained in hypertext documents. Search engines use web crawlers to collect these documents from the web for storage and indexing. However, many of these documents contain dynamic information that changes on a daily, weekly, monthly or yearly basis, and hence we need to refresh …
Studies report that about 40% of current Internet traffic and bandwidth consumption is due to web crawlers that retrieve pages for indexing by the different search engines. As the size of the web continues to grow, searching it for useful information has become increasingly difficult. Centralized crawling techniques are unable to cope with …
A general crawler downloads web pages of any kind, thus forming a source of information for the search engine. A blog crawler is similar to a general crawler except that it restricts its crawl boundary to the blog space, downloading only blog pages and ignoring the rest of the web. Since blogs are an emerging phenomenon and serve as a very useful …
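Restricting the crawl boundary to the blog space can be pictured as a filter applied to the crawl frontier. The URL-pattern heuristics below are illustrative assumptions for the sketch, not the paper's actual blog detector.

```python
# Minimal sketch of a blog-space boundary filter for a crawl frontier.
# BLOG_HINTS is a hypothetical heuristic list, not the paper's classifier.
from urllib.parse import urlparse

BLOG_HINTS = ("blog", "wordpress", "blogspot", "typepad")

def is_blog_url(url):
    """Crude URL-level test: keep a page only when its host or path
    hints that it belongs to the blog space."""
    parsed = urlparse(url.lower())
    return any(h in parsed.netloc or h in parsed.path for h in BLOG_HINTS)

def filter_frontier(urls):
    """Drop non-blog URLs before they enter the crawl queue."""
    return [u for u in urls if is_blog_url(u)]
```

A production blog crawler would also inspect page content (feeds, permalink structure), since URL hints alone miss many blogs.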
The WWW is a decentralized, distributed and heterogeneous information resource. With the increased availability of information through the WWW, it is very difficult to read all documents to retrieve the desired results; therefore there is a need for summarization methods that can present the contents of a given document in a precise manner. The keywords of a document …
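The keyword step that the abstract leads into can be sketched as simple frequency-based term extraction. The stop-word list below is a small illustrative subset, and this sketch stands in for whatever weighting the paper actually uses.

```python
# Hedged sketch: frequency-based keyword extraction as a first step toward
# keyword-driven summarization. STOP_WORDS is a tiny illustrative subset.
import re
from collections import Counter

STOP_WORDS = {"the", "is", "a", "of", "and", "to", "in", "it", "for"}

def top_keywords(text, k=5):
    """Return the k most frequent non-stop-word terms in the document."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [w for w, _ in counts.most_common(k)]
```

Sentences containing the top keywords can then be selected to form the extractive summary.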
Due to the lack of efficient refresh techniques, current crawlers add unnecessary traffic to the already overloaded Internet. The frequency of visits to sites can be optimized by calculating refresh time dynamically. This improves the effectiveness of the crawling system by efficiently managing the revisit frequency of a website, and appropriate …
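Calculating refresh time dynamically can be illustrated with a simple adaptive rule: shorten the revisit interval when a change is observed and lengthen it when the page is unchanged. The multipliers and bounds below are assumptions for illustration, not values from the paper.

```python
# Illustrative sketch of dynamic refresh-time calculation for a crawler.
# Halving/1.5x factors and the 1..720-hour bounds are made-up parameters.
def next_refresh_interval(current, page_changed, lo=1.0, hi=720.0):
    """Return the next revisit interval in hours, adapted to the
    observed change behaviour of the page."""
    if page_changed:
        current /= 2.0   # page changes often: visit sooner
    else:
        current *= 1.5   # page looks static: back off
    return max(lo, min(hi, current))
```

Over repeated visits each site converges toward an interval matched to its own change frequency, which is the traffic saving the abstract describes.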
A question answering system can be seen as the next step in information retrieval, allowing users to pose questions in natural language and receive compact answers. For a question answering system to be successful, research has shown that correct classification of the question with respect to the expected answer type is a prerequisite. We propose a novel …
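Classification by expected answer type can be illustrated with a minimal rule-based baseline; the label set and cue words below are illustrative assumptions and stand in for, rather than reproduce, the proposed novel classifier.

```python
# Minimal rule-based sketch of expected-answer-type classification.
# The (cue, label) pairs are hypothetical examples of the idea.
RULES = [
    ("who", "PERSON"),
    ("where", "LOCATION"),
    ("when", "DATE"),
    ("how many", "NUMBER"),
]

def answer_type(question):
    """Return a coarse expected-answer-type label for a question,
    based on its leading interrogative cue."""
    q = question.strip().lower()
    for cue, label in RULES:
        if q.startswith(cue):
            return label
    return "OTHER"
```

The downstream QA pipeline would then restrict candidate answers to entities of the predicted type.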
The WWW is a distributed, heterogeneous information resource. With the exponential growth of the WWW, it has become difficult to access desired information that matches user needs and interests. In spite of strong crawling, indexing and page-ranking techniques, the result sets returned by search engines lack accuracy and precision. A large number of …