The term "outlier" can generally be defined as an observation that differs significantly from the other values in a data set. Outliers may be instances of error or may indicate events. Outlier detection aims to identify such outliers in order to improve data analysis and to discover interesting and useful knowledge about unusual events in numerous application domains. In this paper, we report on contemporary unsupervised outlier detection techniques for multiple types of data sets and provide a comprehensive taxonomy framework and two decision trees for selecting the most suitable technique based on the data set. Furthermore, we highlight the advantages, disadvantages and performance issues of each class of outlier detection techniques under this taxonomy framework.
|Place of publication|Enschede|
|Publisher|Centre for Telematics and Information Technology (CTIT)|
|Number of pages|40|
|Publication status|Published - 14 Nov 2007|
|Series name|CTIT Technical Report Series|
|Series publisher|Centre for Telematics and Information Technology, University of Twente|
- CAES-PS: Pervasive Systems
Zhang, Y., Meratnia, N., & Havinga, P. J. M. (2007). A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets. (CTIT Technical Report Series; No. Paper P-NS/TR-CTIT-07-79). Enschede: Centre for Telematics and Information Technology (CTIT).