• Corpus ID: 15356409

Data Mining and Data Pre-processing for Big Data

Ashish R. Jagdale, Kavita Sonawane, Shamsuddin Sultan Khan
Big Data is a term used to describe the massive amounts of data generated by digital sources or the internet, usually characterized by the 3 V's, i.e. Volume, Velocity and Variety. Over the past few years, data has been growing exponentially due to the use of connected devices such as smartphones, tablets, laptops and desktop computers. Moreover, E-commerce (also known as the online market), internet services and social networking sites are generating tremendous amounts of user data in the form of documents… 
MapReduce Based Multilevel Association Rule Mining from Concept Hierarchical Sales Data
This paper overcomes the drawback of single-node computing by distributing the task across a cluster of nodes, and the time efficiency of the proposed algorithms is compared with that of the existing Multilevel Frequent Pattern Mining Algorithm (MFPM).
Mining Big Data to Predicting Future
This paper discusses the secure management and privacy of big data as one of the essential issues and defines some of these problems, using illustrations with applications from various areas.
Filtering and Analysis of Big Data to Improve Performance
This research paper focuses on extracting useful data while leaving out noisy and redundant data, using a 2 Phase filtering model and a proposed architecture in which Rule-based filtering, Cluster-based filtering and a String matching algorithm (using the Jaro-distance formula) are run with Spark to produce the filtered data.
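The summary above names the Jaro-distance formula but gives no implementation. As a rough illustration only, the following Python sketch computes the standard Jaro similarity between two strings. It is not the paper's code: the function name `jaro_similarity` is an assumption made here, and the Spark integration is not shown.

```python
def jaro_similarity(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0.0, 1.0]; 1.0 means identical strings.

    Illustrative sketch only (hypothetical helper, not taken from the paper).
    """
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters count as matching only if they are equal and lie
    # within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)

    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = True
                matched2[j] = True
                matches += 1
                break

    if matches == 0:
        return 0.0

    # Count matched characters that appear in a different order,
    # then halve that count to get the number of transpositions.
    out_of_order = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2

    return (
        matches / len1
        + matches / len2
        + (matches - transpositions) / matches
    ) / 3.0


if __name__ == "__main__":
    print(round(jaro_similarity("MARTHA", "MARHTA"), 4))
```

In a filtering step, a score like this would typically be compared against a threshold to flag near-duplicate records. The threshold value is a tuning choice that the summary does not specify.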
A Survey on Big Data Pre-processing
  • Zhi-bin Guan, Tongkai Ji, Xu Qian, Yan Ma, Xuehai Hong
  • Computer Science
    2017 5th Intl Conf on Applied Computing and Information Technology/4th Intl Conf on Computational Science/Intelligence and Applied Informatics/2nd Intl Conf on Big Data, Cloud Computing, Data Science (ACIT-CSII-BCD)
  • 2017
The four phases of data pre-processing (data cleansing, data integration, data reduction, and data transformation) are discussed, and different approaches for a variety of purposes are presented, which show that current methods and techniques need to be further modified in order to improve the quality of data before data analysis.
A Newfangled IoT Big Data Parallel Preprocessing Framework to Facilitate Quality IoT Big Data Analytics
A newfangled IoT Big data parallel preprocessing framework has been proposed to convert the raw data into treasurable information thereby enabling quality IoT big data analytics to attain the full fruition of this emerging technology.
A Novel Perspective on Hand Vein Patterns for Biometric Recognition: Problems, Challenges, and Implementations
Biometric recognition using hand vein patterns is a relatively new technology that has shown a lot of promise from its inception, with features matching or even exceeding well-established biometric


Mining of Massive Datasets
Determining relevant data is key to delivering value from massive amounts of data and big data is defined less by volume which is a constantly moving target than by its ever-increasing variety, velocity, variability and complexity.
Scaling big data mining infrastructure: the twitter experience
This paper discusses the evolution of Twitter's infrastructure and the development of capabilities for data mining on "big data", and observes that a major challenge in building data analytics platforms stems from the heterogeneity of the various components that must be integrated together into production workflows.
Mining big data: current status, and forecast to the future
This issue introduces four articles, written by influential scientists in the field, covering the most interesting and state-of-the-art topics on Big Data mining, and presents a broad overview of the topic, its current status, controversy, and a forecast to the future.
Mining Big Data in Real Time
  • A. Bifet
  • Computer Science
  • 2013
The current and future trends of mining evolving data streams, and the challenges that the field will have to overcome during the next years are discussed.
Mining Big Data in the Enterprise for Better Business Intelligence
Intel IT is deploying a big data platform in 2012 in close partnership with Intel business groups in proofs of concept to demonstrate its utility in providing BI within the enterprise.
Challenges and Opportunities with Big Data
The controversies surrounding Big Data are explored, and an attempt is made to debunk the myths around it.
On the Origin(s) and Development of the Term 'Big Data'
The origins of the now-ubiquitous term "Big Data," in industry and academics, in computer science and statistics/econometrics, are investigated, with results indicating that Big Data the term is now firmly entrenched, Big Data the phenomenon continues unabated, and Big Data the discipline is emerging.
How to Semantically Enhance a Data Mining Process?
This paper focuses first on the pre-processing steps of business understanding and data understanding in order to build an ontology driven information system (ODIS), then shows how the knowledge base is used for the post-processing step of model interpretation.
Reinventing society in the wake of big data
  • Edge.org, http://www.edge.org/conversation/reinventing-society-in-thewake-of-big-data,
  • 2012
3-D Data Management: Controlling Data Volume, Velocity and Variety
  • META Group Research Note, February
  • 2001