Corpus ID: 12430011

Design, Configuration, Implementation, and Performance of a Simple 32 Core Raspberry Pi Cluster

Vincent A. Cicirello
In this report, I describe the design and implementation of an inexpensive, eight-node, 32-core cluster of Raspberry Pi single-board computers, as well as the performance of this cluster on two computational tasks: one that requires significant data transfer relative to its computation time, and one that does not. We have two use cases for the cluster: (a) as an educational tool for classroom use, such as covering parallel algorithms in an algorithms course; and (b) as a test…

Budget Beowulfs: A Showcase of Inexpensive Clusters for Teaching PDC
In this special session, several PDC educators will bring, present, and demonstrate their innovative Beowulf clusters, each designed and built using a different inexpensive multiprocessor board.
The Micro-Cluster Showcase: 7 Inexpensive Beowulf Clusters for Teaching PDC
In this special session, six cluster designers will bring and demonstrate micro-clusters they have built using inexpensive single-board computers (SBCs) and describe how they have used their clusters to provide their students with hands-on experience using the shared-memory, distributed-memory, and heterogeneous computing paradigms.
BEOWULF: A Parallel Workstation for Scientific Computation
It is shown that the Beowulf architecture provides a new operating point in performance to cost for high performance workstations, especially for file transfers under favorable conditions.
Communication-optimal parallel algorithm for Strassen's matrix multiplication
A new parallel algorithm that is based on Strassen's fast matrix multiplication and minimizes communication is obtained, and it exhibits perfect strong scaling within the maximum possible range.
Teaching with parallella: a first look in an undergraduate parallel computing course
The Parallella, an energy-efficient single-board computer (SBC) with 18 cores, is a very attractive option for introducing students to parallel computing because of its small form factor, high core count, and low cost.
Recursive array layouts and fast parallel matrix multiplication
It is demonstrated that carrying the recursive layout down to the level of individual matrix elements is counterproductive, and that a combination of recursive layouts down to canonically ordered matrix tiles instead yields higher performance.
Variable Annealing Length and Parallelism in Simulated Annealing
The results show that the approach can achieve substantial performance gains, throughout the course of the run, demonstrating the approach to be an effective anytime algorithm.
Matrix Multiplication, a Little Faster
A generalization of Probert's lower bound that holds under change of basis is proved, showing that for matrix multiplication algorithms with a 2x2 base case, the leading coefficient of the algorithm cannot be further reduced, hence optimal.
Gaussian elimination is not optimal
Below we will give an algorithm which computes the coefficients of the product of two square matrices A and B of order n from the coefficients of A and B with less than $4.7 \cdot n^{\log 7}$ arithmetical
Determining Sample Sizes for Monte Carlo Integration
In an introductory course in probability, one discusses the use of random samples for estimating an unknown population average $\mu$. If one chooses a large enough random sample $x_1, \ldots, x_n$, then by the