Jebari Chaker

Learn More
With the increase of the number of web pages, it is very difficult to find wanted information easily and quickly out of thousands of web pages retrieved by a search engine. To solve this problem, many researches propose to classify documents according to their genre, which is another criteria to classify documents different from the topic. Most of these(More)
— This paper presents a new approach for automatic document categorization. Exploiting the logical structure of the document, our approach assigns a HTML document to one or more categories (thesis, paper, call for papers, email, …). Using a set of training documents, our approach generates a set of rules used to categorize new documents. The approach(More)
This paper proposes a new centroid-based approach to classify web pages by genre using character ngrams extracted from different information sources such as URL, title, headings and anchors. To deal with the complexity of web pages and the rapid evolution of web genres, our approach implements a multi-label and adaptive classification scheme in which web(More)
This paper presents a new method for genre identification that combines homogeneous classifiers using OWA (Ordered Weighted Averaging) operators. Our method uses character n-grams extracted from different information sources such as URL, title, headings and anchors. To deal with the complexity of web pages, we applied MLKNN as a multi-label classifier, in(More)