Because of its many applications in data mining and information technology, automatic document classification is an important topic in computer science. Classification plays a vital role in many information management and retrieval tasks. Document classification, also known as document categorization, is the process of assigning a document to one or…
The prevalent use of human language in computer systems and in language-oriented document processing, especially on the web, has created a need for mechanisms to develop natural language processing. In this paper, a survey on the Natural Language Processing Laboratory is presented and a framework for its development is proposed. Achieving…
Identifying topics and concepts associated with a set of documents is a critical task for information retrieval systems. One approach is to associate a query with a set of topics selected from a fixed ontology or vocabulary of terms. The core idea of this research is to use Wikipedia articles and their associated pages to build a topic ontology for this purpose.…
Statistical n-gram language modeling is applied in many domains, such as speech recognition, language identification, machine translation, character recognition and topic classification. Most language modeling approaches work on n-grams of words. In this paper, we employ a language model classifier based on word-level n-grams for Persian text classification. The…
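The idea of classifying text with word-level n-gram language models can be illustrated with a minimal sketch. It is not the method of the paper above, whose details are truncated here. It assumes one bigram model per class, add-one (Laplace) smoothing, whitespace tokenization, and a toy English corpus. The class name `NgramClassifier`, the labels `sports` and `tech`, and the example sentences are all hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return word-level n-grams, padded with start and end markers."""
    padded = ["<s>"] * (n - 1) + tokens + ["</s>"]
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]


class NgramClassifier:
    """One n-gram language model per class; predict the most likely class.

    Assumptions (not from the source): add-one smoothing, whitespace tokens.
    """

    def __init__(self, n=2):
        self.n = n
        self.counts = {}    # label -> Counter of n-grams
        self.contexts = {}  # label -> Counter of (n-1)-gram contexts
        self.vocab = {"</s>"}

    def fit(self, docs, labels):
        for text, label in zip(docs, labels):
            tokens = text.split()
            self.vocab.update(tokens)
            grams = ngrams(tokens, self.n)
            self.counts.setdefault(label, Counter()).update(grams)
            self.contexts.setdefault(label, Counter()).update(
                g[:-1] for g in grams
            )
        return self

    def log_prob(self, text, label):
        """Smoothed log-probability of the text under the class model."""
        v = len(self.vocab) + 1  # one extra slot for unseen words
        total = 0.0
        for g in ngrams(text.split(), self.n):
            numerator = self.counts[label][g] + 1
            denominator = self.contexts[label][g[:-1]] + v
            total += math.log(numerator / denominator)
        return total

    def predict(self, text):
        return max(self.counts, key=lambda label: self.log_prob(text, label))


if __name__ == "__main__":
    # Toy training data, purely illustrative.
    docs = [
        "the team won the match",
        "the player scored a goal",
        "the computer runs the program",
        "the software update fixed a bug",
    ]
    labels = ["sports", "sports", "tech", "tech"]
    clf = NgramClassifier(n=2).fit(docs, labels)
    print(clf.predict("the team scored a goal"))
```

This sketch ignores class priors, so it implicitly treats all classes as equally likely. For Persian text, a real system would likely need language-aware tokenization and normalization in place of `str.split`.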
Today the Internet is found in almost all ethnic groups and cultures, and Web pages are growing very quickly in most countries and in different languages. The size and incoherence of the information available on the Internet have made the use of search engines an obvious necessity. Since search engines pay less attention to the linguistics and…
Since the World Wide Web contains a large set of data in different languages, retrieving language-specific information creates a new challenge in information retrieval called language-specific crawling. In this paper, a new approach is proposed for language-specific crawling in which a combination of some selected content and context features of web documents have…
The accumulation of web documents on the Internet, on the one hand, and the rapid change and exponential growth of these pages, on the other, have made organizing and retrieving them manually almost impossible. Therefore, it is necessary to have a system that can automatically put these pages into the related classes to provide their results for the applied tools to be…
Because the Internet includes millions of web pages for each and every search query, fast retrieval of the desired and related information from the Web becomes a very challenging subject. Automatic classification of web pages into relevant categories is an important and effective way to deal with the difficulty of retrieving information from the Internet. There…