Applying Data Mining Techniques to Improve Breast Cancer Diagnosis
OBJECTIVE To identify, with the aid of computational techniques, rules based on conditions of the physical environment for classifying risk micro-areas.

METHODS Exploratory research carried out in Curitiba, Southern Brazil, in 2007. It was divided into three phases: identifying the attributes used to classify a micro-area; constructing a database; and knowledge discovery in the database using data mining. The set of attributes covered infrastructure conditions, hydrography, soil, recreation areas, community characteristics, and the existence of vectors. The database was built from data collected by community health workers in interviews, using questionnaires with closed-ended questions developed from the essential attributes selected by specialists.

RESULTS Forty-nine attributes were identified, of which 41 were considered essential and eight irrelevant. Data mining yielded 68 rules, which were analyzed in terms of performance and quality and divided into two sets: inconsistent rules and rules that confirm expert knowledge. Comparison of the two sets showed that the rules confirming expert knowledge, despite having lower computational performance, were considered more interesting.

CONCLUSIONS Data mining provided a set of useful and understandable rules capable of characterizing risk areas based on the characteristics of the physical environment. Using the proposed rules allows faster and less subjective area classification, maintaining a standard across health teams and overcoming the influence of each team member's individual perception.
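As an illustration of what evaluating a mined rule's performance might involve, the sketch below scores one if-then rule against a small set of records. The abstract does not name the metrics or the mining algorithm used, so coverage, support, and confidence are assumed here as common rule measures. The attribute names (`open_sewage`, `flood_prone`, `risk`), the records, and the `rule_metrics` helper are all hypothetical and invented for the example; they are not study data.

```python
from typing import Dict, List

Record = Dict[str, str]

# Hypothetical micro-area records: attribute names and values are invented
# for illustration only and do not come from the study's database.
RECORDS: List[Record] = [
    {"open_sewage": "yes", "flood_prone": "yes", "risk": "high"},
    {"open_sewage": "yes", "flood_prone": "no", "risk": "high"},
    {"open_sewage": "no", "flood_prone": "yes", "risk": "low"},
    {"open_sewage": "no", "flood_prone": "no", "risk": "low"},
    {"open_sewage": "yes", "flood_prone": "yes", "risk": "low"},
]


def rule_metrics(
    records: List[Record],
    antecedent: Dict[str, str],
    target: str,
    label: str,
) -> Dict[str, float]:
    """Score the rule 'IF antecedent THEN target == label' (assumed metrics)."""
    total = len(records)
    if total == 0:
        return {"coverage": 0.0, "support": 0.0, "confidence": 0.0}

    # Records whose attributes match every condition of the antecedent.
    covered = [
        r for r in records
        if all(r.get(attr) == value for attr, value in antecedent.items())
    ]
    # Covered records for which the rule's predicted class is also correct.
    correct = [r for r in covered if r.get(target) == label]

    return {
        "coverage": len(covered) / total,
        "support": len(correct) / total,
        "confidence": len(correct) / len(covered) if covered else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical rule: IF open_sewage = yes THEN risk = high
    print(rule_metrics(RECORDS, {"open_sewage": "yes"}, "risk", "high"))
```

Measures of this kind capture only the computational side of a rule. Whether a rule is interesting or consistent with expert knowledge, as assessed in the study, requires judgment by specialists and is not computed here.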