DPASF: a flink library for streaming data preprocessing

@article{AlcaldeBarros2019DPASFAF,
  title={DPASF: a flink library for streaming data preprocessing},
  author={Alejandro Alcalde-Barros and Diego Garc{\'i}a-Gil and Salvador Garc{\'i}a and Francisco Herrera},
  journal={Big Data Analytics},
  year={2019},
  volume={4},
  pages={1--17}
}
  • Alejandro Alcalde-Barros, Diego García-Gil, Salvador García, Francisco Herrera
  • Published in ArXiv 2019
  • Computer Science, Mathematics
  • Big Data Analytics
  • Background: Data preprocessing techniques are devoted to correcting or alleviating errors in data. Discretization and feature selection are two of the most widely used data preprocessing techniques. Although there are many proposals for static Big Data preprocessing, there is little research devoted to the continuous Big Data problem. Apache Flink is a recent Big Data framework, following the MapReduce paradigm, focused on distributed stream and batch data processing. In this paper, we…

    References

    Publications referenced by this paper.

    Contrary to Popular Belief Incremental Discretization can be Sound, Computationally Efficient and Extremely Useful for Streaming Data

    • Geoffrey I. Webb
    • Computer Science
    • 2014 IEEE International Conference on Data Mining
    • 2014

    Online Feature Selection and Its Applications


    elbaulp/dpasf: 0.1.1 release (Oct. 2018)

    • A. Alcalde
    • URL https://github.com/elbaulp/DPASF
    • 2018