V for Vicissitude: The Challenge of Scaling Complex Big Data Workflows

@inproceedings{Ghit2014VFV,
  title={V for Vicissitude: The Challenge of Scaling Complex Big Data Workflows},
  author={Bogdan Ghit and Mihai Capota and Tim Hegeman and Jan Hidders and Dick H. J. Epema and Alexandru Iosup},
  booktitle={2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing},
  year={2014},
  pages={927--932}
}
In this paper, we present the scaling of BTWorld, our MapReduce-based approach to observing and analyzing the global BitTorrent network, which we have been monitoring for the past 4 years. BTWorld currently provides a comprehensive and complex set of queries implemented in Pig Latin, with data dependencies between them, which translate to several MapReduce jobs that have a heavy-tailed distribution with respect to both execution time and input size. Processing BitTorrent data in…

Citations

Publications citing this paper.

Massivizing Computer Systems: A Vision to Understand, Design, and Engineer Computer Ecosystems Through and Beyond Modern Distributed Systems

  • 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS)
  • 2018

The AtLarge Vision on the Design of Distributed Systems and Ecosystems

Self-awareness of Cloud Applications

  • Self-Aware Computing Systems
  • 2016