Corpus ID: 10431152

Distributed and parallel time series feature extraction for industrial big data applications

@article{Christ2016DistributedAP,
  title={Distributed and parallel time series feature extraction for industrial big data applications},
  author={Maximilian Christ and A. Kempa-Liehr and Michael Feindt},
  journal={ArXiv},
  year={2016},
  volume={abs/1610.07717}
}
The all-relevant problem of feature selection is the identification of all strongly and weakly relevant attributes. [...] Key method: The proposed algorithm combines established feature extraction methods with a feature importance filter. It has low computational complexity, allows one to start on a problem with only limited domain knowledge available, can be trivially parallelized, is highly scalable, and is based on well-studied non-parametric hypothesis tests. We benchmark our proposed algorithm on all binary…
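The abstract describes two stages: feature extraction, then a relevance filter built on hypothesis tests. In both stages the work for each series and each feature is independent. The sketch below is a minimal Python version under stated assumptions. The snippet above does not specify any of the following, so all are illustrative choices:

- the three summary features;
- the two-sided Mann-Whitney U test with a normal approximation;
- the Benjamini-Yekutieli false discovery rate procedure;
- the hypothetical function names.

Threads are used only to show that the per-series and per-feature work is independent. CPU-bound workloads would need processes or a cluster for a real speedup.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def extract_features(series):
    """Hypothetical extractor: map one time series to a few summary features."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    return {"mean": mean, "maximum": max(series), "std": std}


def build_feature_table(all_series):
    """Extract features for every series independently, then pivot to columns."""
    with ThreadPoolExecutor() as pool:
        rows = list(pool.map(extract_features, all_series))
    return {name: [row[name] for row in rows] for name in rows[0]}


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U p-value (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average 1-based rank of the tie group
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:
        return 1.0  # every value is tied, so the feature carries no signal
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_yekutieli(p_values, fdr=0.05):
    """Return the indices of hypotheses rejected at the given false discovery rate."""
    m = len(p_values)
    c_m = sum(1 / k for k in range(1, m + 1))
    order = sorted(range(m), key=lambda k: p_values[k])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * fdr / (m * c_m):
            k_max = rank
    return set(order[:k_max])


def relevant_features(table, target, fdr=0.05):
    """Keep features whose distribution differs between the two target classes (0/1)."""
    names = list(table)

    def p_value(name):
        values = table[name]
        x = [v for v, t in zip(values, target) if t == 1]
        y = [v for v, t in zip(values, target) if t == 0]
        return mann_whitney_p(x, y)

    with ThreadPoolExecutor() as pool:
        p_values = list(pool.map(p_value, names))
    keep = benjamini_yekutieli(p_values, fdr)
    return [names[k] for k in sorted(keep)]
```

Consider twenty short ramps per class, with the positive class shifted by +100. On that input, `relevant_features(build_feature_table(series), target)` keeps `mean` and `maximum`. It drops `std`, which is identical for every ramp.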
FLOps: On Learning Important Time Series Features for Real-Valued Prediction
TLDR
This paper proposes an automated feature learning approach that selects the most useful of hundreds of available features for time series prediction problems, using a novel mechanism that dynamically filters the features most suitable for the given input time series.
Energy Time-Series Features for Emerging Applications on the Basis of Human-Readable Machine Descriptions
TLDR
This article studies the extraction of features from energy time series for a novel use case: deriving human-understandable descriptions of smart-meter measurements from industrial production machines. It selects features suitable for this use case to derive machine descriptions for an industrial production facility.
tofee-tree: automatic feature engineering framework for modeling trend-cycle in time series forecasting
TLDR
The proposed automatic feature engineering framework for modeling the trend-cycle in time series forecasting (tofee-tree) improved the overall Symmetric Mean Absolute Percentage Error (SMAPE) for one-step, medium-term, and long-term forecasting.
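SMAPE has several variants in the literature, and the TLDR does not say which one tofee-tree reports. As a hedged illustration, the sketch below uses the common form 100/n · Σ |F_t − A_t| / ((|A_t| + |F_t|) / 2), which ranges from 0 to 200 percent. Treating a term where both actual and forecast are zero as zero error is an assumption of this sketch.

```python
def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent (0 to 200 variant)."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equally long")
    total = 0.0
    for a, f in zip(actual, forecast):
        denominator = (abs(a) + abs(f)) / 2
        # Assumption: a perfect forecast of an actual zero contributes no error.
        total += 0.0 if denominator == 0 else abs(f - a) / denominator
    return 100.0 * total / len(actual)


if __name__ == "__main__":
    # Errors of 10 on 100 and 20 on 200: (10/105 + 20/190) / 2 * 100
    print(round(smape([100, 200], [110, 180]), 2))  # prints 10.03
```

Swapping actual and forecast leaves the score unchanged, which is the "symmetric" part of the name.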
Hierarchical Time Series Feature Extraction for Power Consumption Anomaly Detection
TLDR
A novel, systematic time series feature extraction method named hierarchical time series feature extraction is proposed. Its features feed a supervised binary classification model that uses only user registration information and daily power consumption data to detect users with anomalous consumption, outputting a probability of electricity theft.
Towards a Near Universal Time Series Data Mining Tool: Introducing the Matrix Profile
TLDR
The utility of matrix profile is demonstrated for many time series data mining problems, including motif discovery, discord discovery, weakly labeled time series classification, and representation learning on domains as diverse as seismology, entomology, music processing, bioinformatics, human activity monitoring, electrical power-demand monitoring, and medicine.
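The matrix profile stores, for every subsequence of length m, the z-normalized Euclidean distance to its nearest non-trivial match. Its minima point to motifs and its maxima to discords. The brute-force Python sketch below runs in O(n² m) time and only illustrates the definition; faster published algorithms exist (for example STOMP). Two choices are assumptions of this sketch:

- constant subsequences are treated as all-zero vectors;
- matches within max(1, m // 4) positions are excluded as trivial.

```python
import math


def znormalize(subsequence):
    """Scale a subsequence to zero mean and unit variance (constant -> zeros)."""
    m = len(subsequence)
    mu = sum(subsequence) / m
    sd = math.sqrt(sum((v - mu) ** 2 for v in subsequence) / m)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in subsequence]


def matrix_profile(ts, m, exclusion=None):
    """Brute-force matrix profile: nearest-neighbor distance and index per subsequence."""
    if exclusion is None:
        exclusion = max(1, m // 4)  # assumed trivial-match exclusion zone
    n_sub = len(ts) - m + 1
    subs = [znormalize(ts[i:i + m]) for i in range(n_sub)]
    profile = [math.inf] * n_sub
    index = [-1] * n_sub
    for i in range(n_sub):
        for j in range(n_sub):
            if abs(i - j) <= exclusion:
                continue  # skip trivial matches too close to position i
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(subs[i], subs[j])))
            if d < profile[i]:
                profile[i], index[i] = d, j
    return profile, index


def motif_and_discord(profile):
    """Start positions of the best-matching (motif) and most isolated (discord) subsequence."""
    positions = range(len(profile))
    return min(positions, key=profile.__getitem__), max(positions, key=profile.__getitem__)
```

On a periodic series with one corrupted sample, every clean subsequence has an exact twin (profile value 0). The discord therefore lands on a subsequence that covers the corrupted sample.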
A comparative evaluation of novelty detection algorithms for discrete sequences
TLDR
An experimental comparison of candidate methods for the novelty detection problem applied to discrete sequences is provided to identify which state-of-the-art methods are efficient and appropriate candidates for a given use case.
VEST: Automatic Feature Engineering for Forecasting
TLDR
Combining the features generated by VEST with auto-regression was found to significantly improve forecasting performance, with evidence provided from 90 time series with high sampling frequency.
Master Computer Science Automated Regression Pipeline for Time-Series problems with Real-World applications
In this Master’s Thesis research project, an automated machine learning pipeline for time series regression problems is proposed and applied to a number of real-world data sets in areas such as
Time Series Features Extraction Versus LSTM for Manufacturing Processes Performance Prediction
TLDR
This research work addresses several challenges arising from the complexity of manufacturing processes. These include determining the best threshold for distinguishing between performant and unperformant processes, identifying the most frequent patterns in unperformant processes, and evaluating several techniques for replacing missing data.
Sensor Data Preprocessing, Feature Engineering and Equipment Remaining Lifetime Forecasting for Predictive Maintenance
TLDR
An overview of common univariate time series preprocessing steps and the most appropriate methods for each is provided, taking the field of application into consideration.

References

Showing 1-10 of 85 references
Some issues on scalable feature selection
TLDR
Two algorithms are proposed. The first is a scalable probabilistic algorithm that further expedites feature selection and scales up without sacrificing the quality of the selected features. The second is an incremental algorithm that adapts to a newly extended feature set and captures concept drift by removing features from both the previously selected and the newly added ones.
Highly Comparative Feature-Based Time-Series Classification
TLDR
A highly comparative, feature-based approach to time series classification is introduced that uses an extensive database of algorithms to extract thousands of interpretable features from time series, allowing the method to perform well on very large data sets containing long time series or time series of different lengths.
Characteristic-Based Clustering for Time Series Data
TLDR
This paper proposes a method for clustering of time series based on their structural characteristics, which reduces the dimensionality of the time series and is much less sensitive to missing or noisy data.
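Clustering on structural characteristics replaces each raw series with a short, fixed-length vector, which is where the dimensionality reduction comes from. The sketch below computes three such characteristics: skewness, excess kurtosis, and lag-1 autocorrelation. This set is an illustrative assumption, not the paper's full list. Any standard clustering algorithm could then run on the resulting vectors.

```python
import math


def characteristics(series):
    """Map a series of any length to (skewness, excess kurtosis, lag-1 autocorrelation)."""
    n = len(series)
    mu = sum(series) / n
    dev = [v - mu for v in series]
    var = sum(d * d for d in dev) / n
    if var == 0:
        return (0.0, 0.0, 0.0)  # assumption: constant series map to the origin
    skewness = sum(d ** 3 for d in dev) / (n * var ** 1.5)
    excess_kurtosis = sum(d ** 4 for d in dev) / (n * var ** 2) - 3.0
    acf1 = sum(dev[t] * dev[t + 1] for t in range(n - 1)) / (n * var)
    return (skewness, excess_kurtosis, acf1)


def euclidean(u, v):
    """Distance between characteristic vectors, usable by k-means or hierarchical clustering."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

A 100-point ramp and a 20-point alternating series both become 3-dimensional vectors, so series of different lengths can be compared directly.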
Consistent Feature Selection for Pattern Recognition in Polynomial Time
TLDR
It is proved that ALL-RELEVANT is much harder than MINIMAL-OPTIMAL. Two consistent, polynomial-time algorithms are also proposed to simplify feature selection in a wide range of machine learning tasks.
Automatic Feature Extraction for Classifying Audio Data
TLDR
A unifying framework for feature extraction from value series is presented. Its operators can be combined automatically into feature extraction methods using a genetic programming approach.
The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances
TLDR
This work implemented 18 recently proposed algorithms in a common Java framework. It compared them against two standard benchmark classifiers (and against each other) by performing 100 resampling experiments on each of the 85 datasets. The results indicate that only nine of these algorithms are significantly more accurate than both benchmarks.
Comparison of different weighting schemes for the kNN classifier on time-series data
TLDR
This article revisits the kNN classifier on time-series data by considering ten classic distance-based vote weighting schemes in the context of Euclidean distance, as well as four commonly used elastic distance measures: DTW, Longest Common Subsequence, Edit Distance with Real Penalty and Edit Distance on Real sequence.
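As an illustration of distance-weighted voting with an elastic measure, the sketch below pairs an unconstrained dynamic time warping (DTW) distance with simple inverse-distance vote weighting. Inverse-distance weighting is only one illustrative choice among the schemes studied. The `eps` guard against division by zero and the function names are assumptions of this sketch.

```python
import math
from collections import defaultdict


def dtw(a, b):
    """Unconstrained DTW distance with squared pointwise cost (no warping window)."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


def euclidean(a, b):
    """Lock-step Euclidean distance; requires equal-length series."""
    if len(a) != len(b):
        raise ValueError("Euclidean distance needs equal-length series")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def weighted_knn(train, labels, query, k=3, distance=dtw, eps=1e-12):
    """Predict the label with the largest sum of inverse-distance votes among k neighbors."""
    neighbors = sorted(
        ((distance(query, x), y) for x, y in zip(train, labels)), key=lambda pair: pair[0]
    )[:k]
    votes = defaultdict(float)
    for d, y in neighbors:
        votes[y] += 1.0 / (d + eps)  # closer neighbors get larger votes
    return max(votes, key=votes.get)
```

DTW can align series of different lengths, for example `[0, 1, 2]` and `[0, 1, 1, 2]` at distance 0. The Euclidean measure only compares equal-length series point by point.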
A Practical Approach to Feature Selection
TLDR
Comparison with other feature selection algorithms shows Relief's advantages in terms of learning time and the accuracy of the learned concept, suggesting Relief's practicality.
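Relief weights each feature by comparing two neighbors of a sampled instance. A feature gains weight when it separates the instance from its nearest neighbor of the other class (nearest miss). It loses weight when it separates the instance from its nearest neighbor of the same class (nearest hit). The sketch below handles a binary target and numeric features. The following are assumptions of this sketch:

- range normalization of each feature;
- Manhattan neighbor distance;
- seeded sampling with `random.Random`;
- a default of one iteration per instance.

```python
import random


def relief(X, y, n_iter=None, seed=0):
    """Return one relevance weight per feature (higher = more relevant); binary y."""
    n, p = len(X), len(X[0])
    spans = []
    for f in range(p):
        column = [row[f] for row in X]
        spans.append((max(column) - min(column)) or 1.0)  # avoid division by zero

    def diff(f, a, b):
        return abs(a[f] - b[f]) / spans[f]  # feature difference scaled to [0, 1]

    def distance(a, b):
        return sum(diff(f, a, b) for f in range(p))  # Manhattan on scaled features

    rng = random.Random(seed)
    m = n_iter or n
    weights = [0.0] * p
    for _ in range(m):
        i = rng.randrange(n)
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda j: distance(X[i], X[j]))
        miss = min(misses, key=lambda j: distance(X[i], X[j]))
        for f in range(p):
            # reward separating the classes, penalize differing within a class
            weights[f] += (diff(f, X[i], X[miss]) - diff(f, X[i], X[hit])) / m
    return weights
```

Each iteration costs one pass over the data, which is the source of the low learning time the TLDR mentions.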
Feature Selection for High-Dimensional Data
TLDR
This chapter sets out some inherent difficulties that these datasets may present, which pose a challenge for any learning technique, including feature selection. It then turns to huge datasets, where dimensionality reduction becomes a necessity.
Pattern Extraction for Time Series Classification
TLDR
It is argued that many time-series classification problems can be solved by detecting and combining local properties or patterns in time series, and a technique is proposed to find patterns which are useful for classification.