Sarah M. Erfani

High-dimensional problem domains pose significant challenges for anomaly detection. Irrelevant features can conceal the presence of anomalies. This problem, known as the ‘curse of dimensionality’, is an obstacle for many anomaly detection techniques. Building a robust anomaly detection model for use in high-dimensional spaces requires the …
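The masking effect of irrelevant features can be illustrated with a small synthetic experiment. This is a minimal sketch under stated assumptions, not the method of the paper summarised above (its approach is truncated here). The Gaussian data, the k-nearest-neighbour distance score and the hypothetical helpers `knn_scores` and `anomaly_contrast` are illustrative choices. One feature separates a single anomaly from the normal points, and pure-noise features are then appended.

```python
import numpy as np


def knn_scores(X: np.ndarray, k: int = 5) -> np.ndarray:
    """Score each row by its Euclidean distance to its k-th nearest neighbour."""
    sq = (X ** 2).sum(axis=1)
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, clipped at 0 to absorb rounding error.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    dist = np.sqrt(d2)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    return np.sort(dist, axis=1)[:, k - 1]


def anomaly_contrast(n_irrelevant: int, n_normal: int = 200, k: int = 5, seed: int = 0) -> float:
    """Hypothetical helper: anomaly score divided by the median normal-point score."""
    rng = np.random.default_rng(seed)
    # One relevant feature: normal points ~ N(0, 1); the last row is an anomaly at 6.
    relevant = np.vstack([rng.normal(size=(n_normal, 1)), [[6.0]]])
    # Irrelevant features: pure N(0, 1) noise for every point, anomaly included.
    noise = rng.normal(size=(n_normal + 1, n_irrelevant))
    X = np.hstack([relevant, noise])
    scores = knn_scores(X, k)
    return float(scores[-1] / np.median(scores[:-1]))


for d in (0, 10, 100, 1000):
    print(f"{d:5d} irrelevant features -> score ratio {anomaly_contrast(d):.2f}")
```

In this setup the anomaly's score is many times the median score when only the relevant feature is present. As irrelevant features are added, the ratio drifts towards 1, so the anomaly is no longer clearly separated from the normal points. Exact values depend on the seed.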
Participatory sensing using mobile devices is emerging as a promising method for large-scale data sampling. A critical challenge for participatory sensing is how to preserve the privacy of individual contributors' data. In addition, the integrity of the data aggregation is vital to ensure the acceptance of the participatory sensing model by the …
The problem of unsupervised anomaly detection arises in a wide variety of practical applications. While one-class support vector machines have demonstrated their effectiveness as an anomaly detection technique, their ability to model large datasets is limited by the memory and time complexity of training. To address this issue for supervised learning …
The ubiquity of mobile sensing devices in the Internet of Things (IoT) enables an emerging data crowdsourcing paradigm called participatory sensing, where multiple individuals collect data and use a cloud service to analyse the union of the collected data. An example of such collaborative analysis is collaborative anomaly detection. Given the possibility …
Identifying unusual or anomalous patterns in an underlying dataset is an important but challenging task in many applications. The focus of the unsupervised anomaly detection literature has mostly been on vectorised data. However, many applications are more naturally described using higher-order tensor representations. Approaches that vectorise tensorial …
In collaborative anomaly detection, multiple data sources submit their data to an online service in order to detect anomalies with respect to the wider population. A major challenge is how to achieve reasonable detection accuracy without disclosing the actual values of the participants’ data. We propose a lightweight and scalable privacy-preserving …
Many conventional statistical machine learning algorithms generalise poorly if distribution bias exists in the datasets. For example, distribution bias arises in the context of domain generalisation, where knowledge acquired from multiple source domains needs to be used in a previously unseen target domain. We propose Elliptical Summary Randomisation …