Eduardo F. A. Silva

Information Extraction (IE) aims to extract from textual documents only the relevant data required by the user. In this paper, we propose a hybrid machine learning approach for IE on semi-structured texts that combines conventional text classification techniques and Hidden Markov Models (HMM). In this approach, a text classifier technique generates an...
This paper proposes a new method for binarization of digital documents. The proposed approach performs binarization by using a heuristic algorithm with two different thresholds and the combination of the thresholded images. The method is suitable for binarization of complex background document images. In experiments, it obtained better results than...
The fast growth of electronic text collections (in particular, the Web) and the diversity of available documents have greatly increased the difficulty of retrieving relevant documents efficiently. A variety of Web search engines have been built to help users in this task. These systems, however, lack precision in the retrieved documents. Different...
In this paper, we propose a hybrid machine learning approach to Information Extraction by combining conventional text classification techniques and Hidden Markov Models (HMM). A text classifier generates a (locally optimal) initial output, which is refined by an HMM, providing a globally optimal classification. The proposed approach was evaluated in two...
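The refinement step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the classifier's per-token label probabilities are used directly as emission scores and that Viterbi decoding selects the globally best label sequence. The function name `viterbi_refine`, the labels, and all probabilities are hypothetical, chosen only for illustration.

```python
import math


def _log(p):
    # Log-probability, with log(0) mapped to negative infinity.
    return math.log(p) if p > 0 else float("-inf")


def viterbi_refine(class_probs, transition, initial):
    """Return the highest-scoring label sequence under an HMM.

    class_probs: one dict per token, label -> classifier probability
                 (assumption: used here as the HMM emission score).
    transition:  dict prev_label -> dict next_label -> probability.
    initial:     dict label -> probability of starting in that label.
    """
    if not class_probs:
        return []
    labels = list(initial)
    # Best log-score of any path ending in each label at the first token.
    scores = {y: _log(initial[y]) + _log(class_probs[0][y]) for y in labels}
    backpointers = []
    for probs in class_probs[1:]:
        new_scores, pointers = {}, {}
        for y in labels:
            # Pick the predecessor label that maximizes the path score into y.
            best_prev = max(
                labels, key=lambda p: scores[p] + _log(transition[p][y])
            )
            new_scores[y] = (
                scores[best_prev] + _log(transition[best_prev][y]) + _log(probs[y])
            )
            pointers[y] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace the best path backwards from the best final label.
    path = [max(labels, key=lambda y: scores[y])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical classifier output for three tokens of a reference string.
class_probs = [
    {"author": 0.9, "title": 0.1},
    {"author": 0.4, "title": 0.6},  # locally, "title" looks slightly better
    {"author": 0.8, "title": 0.2},
]
# Hypothetical "sticky" transitions: fields tend to span several tokens.
transition = {
    "author": {"author": 0.9, "title": 0.1},
    "title": {"author": 0.1, "title": 0.9},
}
initial = {"author": 0.5, "title": 0.5}

local = [max(p, key=p.get) for p in class_probs]
refined = viterbi_refine(class_probs, transition, initial)
print(local)
print(refined)
```

In this toy input, the per-token argmax labels the middle token differently from its neighbors, while the decoded sequence keeps a single field across all three tokens because the transition probabilities penalize rapid label switching.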
Information Extraction (IE) aims to extract from textual documents only the fragments which correspond to data fields required by the user. In this paper, we present new experiments evaluating a hybrid machine learning approach for IE that combines text classifiers and Hidden Markov Models (HMM). In this approach, a text classifier technique generates an...