# Outlier Detection for Text Data : An Extended Version

@article{Kannan2017OutlierDF, title={Outlier Detection for Text Data : An Extended Version}, author={Ramakrishnan Kannan and Hyenkyun Woo and Charu C. Aggarwal and Haesun Park}, journal={ArXiv}, year={2017}, volume={abs/1701.01325} }

The problem of outlier detection is extremely challenging in many domains such as text, in which the attribute values are typically non-negative, and most values are zero. […] Our iterative algorithm TONMF is based on the block coordinate descent (BCD) framework. We define blocks over the term-document matrix such that the function becomes solvable. Given the most recently updated values of the other matrix blocks, we always update one block at a time to its optimum.
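The block-wise update idea can be illustrated with a minimal sketch. It is not TONMF itself, whose objective is not given here: it applies block coordinate descent to a plain non-negative factorization A ≈ Σₖ w[k] h[k]ᵀ, treating each vector w[k] or h[k] as one block and setting it to its closed-form non-negative optimum while the other blocks stay fixed. The function names (`residual`, `objective`, `bcd_nmf`) and the toy matrix are assumptions made for illustration.

```python
import random

def residual(A, w, h, skip=None):
    """Return A minus the sum of rank-one terms w[l] h[l]^T (term `skip` is left out)."""
    m, n = len(A), len(A[0])
    R = [list(map(float, row)) for row in A]
    for l in range(len(w)):
        if l == skip:
            continue
        for i in range(m):
            for j in range(n):
                R[i][j] -= w[l][i] * h[l][j]
    return R

def objective(A, w, h):
    """Squared Frobenius norm of the full residual."""
    return sum(x * x for row in residual(A, w, h) for x in row)

def bcd_nmf(A, rank, sweeps=30, seed=0):
    """Hypothetical helper: BCD over the blocks w[0..rank-1], h[0..rank-1]."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    w = [[rng.random() for _ in range(m)] for _ in range(rank)]
    h = [[rng.random() for _ in range(n)] for _ in range(rank)]
    history = [objective(A, w, h)]
    for _ in range(sweeps):
        for k in range(rank):
            # R does not depend on w[k] or h[k], so it serves both block updates.
            R = residual(A, w, h, skip=k)
            ww = sum(x * x for x in w[k])
            if ww > 1e-12:
                # Exact non-negative minimizer over h[k], all other blocks fixed.
                h[k] = [max(0.0, sum(w[k][i] * R[i][j] for i in range(m)) / ww)
                        for j in range(n)]
            hh = sum(x * x for x in h[k])
            if hh > 1e-12:
                # Exact non-negative minimizer over w[k], using the updated h[k].
                w[k] = [max(0.0, sum(R[i][j] * h[k][j] for j in range(n)) / hh)
                        for i in range(m)]
        history.append(objective(A, w, h))
    return w, h, history

# Toy non-negative, mostly-zero "term-document" matrix (made-up values).
A = [
    [2, 0, 1, 0, 0],
    [0, 3, 0, 0, 1],
    [1, 0, 2, 0, 0],
    [0, 1, 0, 4, 0],
]
w, h, history = bcd_nmf(A, rank=2, sweeps=20)
print(history[0], history[-1])
```

Because every block is set to its exact optimum given the others, the recorded objective cannot increase from one sweep to the next, which is the property that makes this kind of scheme convenient to reason about.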

## 15 Citations

### Outlier Detection for Text Data

- Computer Science
- SDM
- 2017

This paper presents a matrix factorization method, which is naturally able to distinguish the anomalies with the use of low rank approximations of the underlying data, and has significant advantages over traditional methods for text outlier detection.

### A Lightweight Yet Robust Approach to Textual Anomaly Detection

- Computer Science
- TRAC
- 2022

A new approach based on an alternative regularization of the NMF objective is introduced, which surpasses other linear AD models and is on par with deep models, performing comparably well even at very small outlier concentrations.

### A Study on Different Methods of Outlier Detection Algorithms in Data Mining

- Computer Science
- 2020

This survey provides an overview of outliers and existing outlier detection methods, classifying them along different dimensions, and reports the performance of the rough set based entropy measure with weighted density value over existing methods.

### Anomaly Detection Between Judicial Text-Based Documents

- Computer Science
- 2020 IEEE 14th International Conference on Application of Information and Communication Technologies (AICT)
- 2020

Two methods for searching for anomalies in judicial practice are presented and a comparative analysis of the results of the effectiveness of both methods is given.

### Progress in Outlier Detection Techniques: A Survey

- Computer Science
- IEEE Access
- 2019

This survey presents a comprehensive and organized review of the progress of outlier detection methods from 2000 to 2019 and categorizes them into diverse families of techniques, such as distance-, clustering-, density-, ensemble-, and learning-based methods.

### Convolutional Neural Networks for Unsupervised Anomaly Detection in Text Data

- Computer Science
- IDEAL
- 2017

A specific CNN architecture is presented that consists of one convolutional layer and one subsampling layer and uses an RBF activation function and a logarithmic loss function on the final layer; minimizing the corresponding objective function helps to calculate the location parameter of the features' weights discovered on the last network layer.

### GMOTE: Gaussian based minority oversampling technique for imbalanced classification adapting tail probability of outliers

- Computer Science
- ArXiv
- 2021

This paper proposes a Gaussian based minority oversampling technique (GMOTE) with a statistical perspective for imbalanced datasets, which generates instances with a Gaussian Mixture Model and adapts the tail probability of instances through the Mahalanobis distance to account for local outliers.

### Anomaly Detection in Text Documents using HTM Networks

- Computer Science
- ITAT
- 2021

This work combines multiple algorithms, including a non-traditional neural network model, the Hierarchical Temporal Memory (HTM) network, to find anomalies in texts, using semantic folding to represent the text inputs for the HTM algorithm.

### DATE: Detecting Anomalies in Text via Self-Supervision of Transformers

- Computer Science
- NAACL
- 2021

This work learns the DATE model end-to-end, enforcing two independent and complementary self-supervision signals, one at the token level and one at the sequence level, and shows strong quantitative and qualitative results on the 20Newsgroups and AG News datasets.

### Topic modeling for sequential documents based on hybrid inter-document topic dependency

- Computer Science
- J. Intell. Inf. Syst.
- 2021

Two new topic modeling methods for sequential documents based on hybrid inter-document topic dependency are proposed, which outperform state-of-the-art models in terms of the accuracy of topic modeling, the quality of topic clustering, and the effectiveness of outlier detection.

## References

Showing 1-10 of 36 references.

### Outlier detection for high dimensional data

- Computer Science
- SIGMOD '01
- 2001

New techniques for outlier detection which find the outliers by studying the behavior of projections from the data set are discussed.

### LOF: identifying density-based local outliers

- Computer Science
- SIGMOD '00
- 2000

This paper contends that for many scenarios, it is more meaningful to assign to each object a degree of being an outlier, called the local outlier factor (LOF), and gives a detailed formal analysis showing that LOF enjoys many desirable properties.
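As a rough illustration of the local outlier factor, the sketch below computes LOF scores by brute force using the standard definitions: k-distance, k-distance neighborhood, reachability distance, and local reachability density. The function name `lof_scores` and the toy points are assumptions, and the code assumes all points are distinct so that no local reachability density becomes infinite.

```python
import math

def lof_scores(points, k):
    """Brute-force LOF for a small list of distinct points (illustrative only)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    k_dist, neigh = [], []
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]                      # distance to the k-th nearest neighbor
        k_dist.append(kd)
        # The neighborhood holds every point within the k-distance (ties included).
        neigh.append([j for d, j in others if d <= kd])

    def reach_dist(i, o):
        # Reachability distance of point i with respect to point o.
        return max(k_dist[o], dist[i][o])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        mean_reach = sum(reach_dist(i, o) for o in neigh[i]) / len(neigh[i])
        lrd.append(1.0 / mean_reach)

    # LOF: average ratio of the neighbors' density to the point's own density.
    return [sum(lrd[o] for o in neigh[i]) / (len(neigh[i]) * lrd[i]) for i in range(n)]

# A tight cluster plus one isolated point (made-up data).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (5, 5)]
scores = lof_scores(points, k=2)
for p, s in zip(points, scores):
    print(p, round(s, 3))
```

Points inside the cluster receive scores close to 1, while the isolated point receives a much larger score, which reflects the "degree of being an outlier" reading rather than a binary label.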

### Algorithms for Mining Distance-Based Outliers in Large Datasets

- Computer Science
- VLDB
- 1998

This paper provides formal and empirical evidence showing the usefulness of DB-outliers and presents two simple algorithms for computing such outliers, both having a complexity of O(k N²), k being the dimensionality and N being the number of objects in the dataset.

### Feature bagging for outlier detection

- Computer Science
- KDD '05
- 2005

A novel feature bagging approach for detecting outliers in very large, high dimensional and noisy databases is proposed, which combines results from multiple outlier detection algorithms that are applied using different sets of features.

### Outlier Ranking via Subspace Analysis in Multiple Views of the Data

- Computer Science
- 2012 IEEE 12th International Conference on Data Mining
- 2012

This work proposes Outrank, a novel outlier ranking concept that exploits subspace analysis to determine the degree of outlierness, and outperforms state-of-the-art outlierness measures.

### HiCS: High Contrast Subspaces for Density-Based Outlier Ranking

- Computer Science
- 2012 IEEE 28th International Conference on Data Engineering
- 2012

A novel subspace search method is presented that selects high contrast subspaces for density-based outlier ranking, together with a first measure for the contrast of subspace dimensions to enhance the quality of traditional outlier rankings.

### Outlier Detection in Arbitrarily Oriented Subspaces

- Computer Science
- 2012 IEEE 12th International Conference on Data Mining
- 2012

In this paper, we propose a novel outlier detection model to find outliers that deviate from the generating mechanisms of normal instances by considering combinations of different subsets of…

### Efficient algorithms for mining outliers from large data sets

- Computer Science
- SIGMOD '00
- 2000

A novel formulation for distance-based outliers that is based on the distance of a point from its kth nearest neighbor is proposed and the top n points in this ranking are declared to be outliers.
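The ranking described in this summary can be sketched in a few lines: score each point by the distance to its kth nearest neighbor and report the n highest-scoring points. This is a naive quadratic-time version for illustration, not one of the efficient algorithms the paper refers to; the function name `top_n_knn_outliers` and the toy data are assumptions.

```python
import math

def top_n_knn_outliers(points, k, n):
    """Return indices of the n points with the largest k-th nearest neighbor distance."""
    scored = []
    for i, p in enumerate(points):
        # Distances from p to every other point, in ascending order.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scored.append((dists[k - 1], i))   # k-th nearest neighbor distance
    scored.sort(reverse=True)              # largest distance first
    return [i for _, i in scored[:n]]

# A tight cluster plus two isolated points (made-up data).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (5, 5), (-4, 3)]
print(top_n_knn_outliers(points, k=2, n=2))  # -> [5, 6]
```

The choice of k matters: with k = 1, two isolated points that happen to sit next to each other would both receive small scores, which is one reason the kth neighbor is used rather than the nearest one.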

### Outlier Analysis

- Computer Science
- Springer New York
- 2013

Outlier Analysis is a comprehensive exposition, as understood by data mining experts, statisticians and computer scientists, and emphasis was placed on simplifying the content, so that students and practitioners can also benefit.

### Robust PCA via Outlier Pursuit

- Computer Science
- IEEE Transactions on Information Theory
- 2012

This work presents an efficient convex optimization-based algorithm that it calls outlier pursuit, which under some mild assumptions on the uncorrupted points recovers the exact optimal low-dimensional subspace and identifies the corrupted points.