Thierson Couto

Learn More
Due to the increasing amount of information present on the Web, Automatic Document Classification (ADC) has become an important research topic. ADC usually follows a standard supervised learning strategy, where we first build a model using preclassified documents and then use it to classify new unseen documents. One major challenge for ADC in many scenarios(More)
It is well known that links are an important source of information when dealing with Web collections. However, the question remains on whether the same techniques that are used on the Web can be applied to collections of documents containing citations between scientific papers. In this work we present a comparative study of digital library citations and Web(More)
In this work we propose a model to represent the web as a directed hypergraph (instead of a graph), where links connect pairs of disjointed sets of pages. The web hypergraph is derived from the web graph by dividing the set of pages into non-overlapping blocks and using the links between pages of distinct blocks to create hyperarcs. A hyperarc connects a(More)
BACKGROUND Cancer is a critical disease that affects millions of people and families around the world. In 2012 about 14.1 million new cases of cancer occurred globally. Because of many reasons like the severity of some cases, the side effects of some treatments and death of other patients, cancer patients tend to be affected by serious emotional disorders,(More)
Automatic document classification can be used to organize documents in a digital library, construct on-line directories, improve the precision of web searching, or help the interactions between user and search engines. In this paper we explore how linkage information inherent to different document collections can be used to enhance the effectiveness of(More)
In this work we propose a representation of the web as a directed hypergraph, instead of a graph, where links can connect not only pairs of pages, but also pairs of disjoint sets of pages. In our model, the web hypergraph is derived from the web graph by dividing the set of pages into non-overlapping blocks and using the links between pages of distinct(More)
This paper addresses the problem of automatically learning to classify texts by exploiting information derived from meta-level features (i.e., features derived from the original bag-of-words representation). We propose new meta-level features derived from the class distribution, the entropy and the within-class cohesion observed in the k nearest neighbors(More)
The unprecedented growth of available data nowadays has stimulated the development of new methods for organizing and extracting useful knowledge from this immense amount of data. Automatic Document Classification (ADC) is one of such methods, that uses machine learning techniques to build models capable of automatically associating documents to well-defined(More)