Optimizing the data distribution layer of BOINC with BitTorrent

@inproceedings{Costa2008OptimizingTD,
  title={Optimizing the data distribution layer of BOINC with BitTorrent},
  author={Fernando Costa and Lu{\'i}s Moura Silva and Ian Kelley and Gilles Fedak},
  booktitle={2008 IEEE International Symposium on Parallel and Distributed Processing},
  year={2008},
  pages={1--8}
}
  • Fernando Costa, Luís Moura Silva, Ian Kelley, Gilles Fedak
  • Published 14 April 2008
  • 2008 IEEE International Symposium on Parallel and Distributed Processing
In this paper we show how we applied BitTorrent data distribution techniques to the BOINC middleware. Our goal was to decentralize BOINC's data model to take advantage of client network capabilities. To achieve this, we developed a prototype that adds BitTorrent functionality for task distribution and conducted small-scale tests of the environment. Additionally, we measured the impact of the BitTorrent components in both the BOINC client and server, and compared it with the original…
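The decentralized data model the abstract describes can be pictured as a client that first tries to fetch a workunit's input from the peer swarm and only falls back to the project's central server when no peer can serve it. The sketch below is a hypothetical illustration of that fallback logic, not BOINC's actual API; the names `fetch_from_peers` and `fetch_from_server` are stand-ins for the real transports.

```python
# Hypothetical sketch of peer-first data distribution with server fallback,
# in the spirit of the paper's BitTorrent-enabled BOINC prototype.
from typing import Callable, Optional

def download_input(
    filename: str,
    fetch_from_peers: Callable[[str], Optional[bytes]],
    fetch_from_server: Callable[[str], bytes],
) -> tuple:
    """Return (data, source): 'p2p' when peers served the file,
    'server' when the client fell back to the central server."""
    data = fetch_from_peers(filename)
    if data is not None:
        return data, "p2p"
    return fetch_from_server(filename), "server"

# Toy swarm: only "wu_1.dat" is currently seeded by peers.
swarm = {"wu_1.dat": b"seeded-bytes"}

def peers(name):          # stand-in for a BitTorrent swarm lookup
    return swarm.get(name)

def server(name):         # stand-in for the project's HTTP server
    return b"server-bytes"

print(download_input("wu_1.dat", peers, server))  # served by the swarm
print(download_input("wu_2.dat", peers, server))  # falls back to the server
```

The point of this shape is that the central server stays a correctness backstop: decentralization reduces its bandwidth load without making it a single point of failure for availability.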


Citations

BitWorker, a Decentralized Distributed Computing System Based on BitTorrent
This paper evaluates the performance of BitWorker using mathematical models and real tests, showing processing and robustness gains; the system is available for download and use by the community.
nuBOINC: BOINC Extensions for Community Cycle Sharing
  • J. Silva, L. Veiga, P. Ferreira
  • 2008 Second IEEE International Conference on Self-Adaptive and Self-Organizing Systems Workshops
Describes a set of BOINC extensions that allow any user to create and submit jobs that can take advantage of remote idle cycles, allowing an expressive definition of jobs with considerable speed gains, while leveraging a cycle-sharing platform and widely available commodity applications in a truly global computer cycle market.
Large-scale volunteer computing over the Internet
GiGi-MR, a framework that allows non-expert users to run CPU-intensive jobs on top of volunteer resources over the Internet, obtains a performance increase of over 60% in application turnaround time, while reducing the bandwidth used by an order of magnitude.
Scaling the Deployment of Virtual Machines in UnaCloud
Tests showed that BitTorrent, a P2P file transfer protocol, outperforms copying a single image using other protocols and can be used to scale the deployment in UnaCloud to support clusters with a large number of nodes.
Distributed computing and communication in peer-to-peer networks
Results show that a distributed processing system based on a decentralised peer-to-peer network can provide results similar to those of distributed processing systems based on traditional client/server networking architectures.
Internet-scale support for map-reduce
VMR is presented, a VC system able to run MapReduce applications on top of volunteer resources, spread throughout the Internet, that obtains a performance increase of over 60% in application turnaround time, while reducing server bandwidth use by two orders of magnitude and showing no discernible overhead.
Internet-scale support for map-reduce processing
VMR is presented, a VC system able to run MapReduce applications on top of volunteer resources, spread throughout the Internet, that obtains a performance increase of over 60% in application turnaround time, while reducing server bandwidth use by two orders of magnitude and showing no discernible overhead.
A scalable super-peer approach for public scientific computation
VMR: volunteer MapReduce over the large scale internet
VMR is presented, a VC system able to run MapReduce applications on top of volunteer resources, over the large-scale Internet, that obtains a performance increase of over 60% in application turnaround time, while reducing the bandwidth use by an order of magnitude.

References

Showing 1-10 of 27 references
Scheduling independent tasks sharing large data distributed with BitTorrent
Proposes a performance model to select the better of the FTP and BitTorrent protocols according to the size of the file to distribute and the number of receiver nodes, and proposes an enhancement of the BitTorrent protocol that provides more predictable communication patterns.
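The selection rule that reference describes can be sketched as a simple decision function: direct FTP transfer wins for small files or few receivers, while BitTorrent amortizes its overhead once a large file must fan out to many nodes. This is an illustrative sketch only; the threshold values and the function name are assumptions, not the paper's actual model.

```python
# Hypothetical protocol selector in the spirit of the cited performance
# model: BitTorrent pays off only when the file is large AND the receiver
# set is big enough for swarming to amortize its setup/piece overhead.
# Thresholds below are illustrative placeholders, not measured values.

def choose_protocol(file_size_mb: float, n_receivers: int,
                    size_threshold_mb: float = 50.0,
                    receiver_threshold: int = 10) -> str:
    """Return 'bittorrent' when swarming is expected to win, else 'ftp'."""
    if file_size_mb >= size_threshold_mb and n_receivers >= receiver_threshold:
        return "bittorrent"
    return "ftp"

print(choose_protocol(200.0, 50))  # large file, many receivers
print(choose_protocol(1.0, 50))    # small file: direct transfer is cheaper
print(choose_protocol(200.0, 2))   # few receivers: no swarm to exploit
```

In a real deployment the thresholds would come from a fitted model of measured transfer times, not constants.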
Incentives Build Robustness in BitTorrent
The BitTorrent file distribution system uses tit-for-tat as a method of seeking Pareto efficiency. It achieves a higher level of robustness and resource utilization than any currently known…
Dissecting BitTorrent: Five Months in a Torrent's Lifetime
This paper studies BitTorrent, a new and already very popular peer-to-peer application that allows distribution of very large content to a large set of hosts, and assesses the performance of the algorithms used in BitTorrent through several metrics.
The Julia Content Distribution Network
Compared with the state-of-the-art BitTorrent content distribution network, Julia achieves slightly slower average finishing times but reduces the total communication cost in the network by approximately 33%.
The KaZaA Overlay: A Measurement Study
This study builds two measurement apparatuses and uses the measurement results to set forth a number of key principles for the design of a successful unstructured P2P overlay.
Tackling the Collusion Threat in P2P-enhanced Internet Desktop Grids
Presents sabotage-tolerance techniques that can be used by a middleware like BOINC if enhanced with a P2P infrastructure for data distribution; the master builds the reputation of all workers by observing their trustworthiness from their previous results and their compliance with the P2P data delivery protocols.
FreeLoader: Scavenging Desktop Storage Resources for Scientific Data
The experiments show that FreeLoader is an appealing low-cost solution for storing massive datasets, delivering higher data access rates than traditional storage facilities; novel data striping techniques that allow FreeLoader to efficiently aggregate a workstation's network communication bandwidth and local I/O bandwidth are also presented.
A performance vs. cost framework for evaluating DHT design tradeoffs under churn
PVC analysis shows that the key to efficiently using additional bandwidth is for a protocol to adjust its routing table size, and that routing table stabilization is wasteful and can be replaced with opportunistic learning through normal lookup traffic.
Scheduling data-intensive bags of tasks in P2P grids with BitTorrent-enabled data distribution
Proposes combining several existing technologies and patterns to perform efficient data-aware scheduling: use of the BitTorrent P2P file-sharing protocol to transfer data, data caching on computational resources, and a new task-selection scheduling algorithm based on the temporally grouped scheduling of tasks sharing input data files.