Domain Knowledge to Support Understanding and Treatment of Outliers

Abstract

Understanding and treating outliers is complex and non-trivial in many data analysis and data mining exercises, and it is not always done well. One approach, which can be used in combination with others, is to understand the domain of interest and use this knowledge to guide data preparation and subsequent steps in the treatment and interpretation of outliers. To demonstrate the proposed approach, a study of web usage in the tertiary education sector is used. A particular issue in this sector, where the World Wide Web is widely used, is monitoring students' web use in the environment, which is important in evaluating and improving teaching outcomes. Data mining techniques play a key role in analyzing student interaction as captured in web logs. This paper considers the non-trivial task of preparing and analyzing web data and, in particular, the treatment of outliers in this domain. Some conclusions are drawn on how to define an outlier in terms of the strategic aims of the particular analysis, along with some general conclusions on how to classify outliers in a web environment as either noise or meaningful indicators. It is argued that the approach demonstrated can be applied across a range of domains and serves as a guide to how the knowledge discovery task may be partially automated.

Cite this paper

@inproceedings{Redpath2006DomainKT, title={Domain Knowledge to Support Understanding and Treatment of Outliers}, author={Robert Redpath and J. I. Sheard}, year={2006} }