
ClusterHAT Review for the Raspberry Pi Zero

Cluster computing on a £100 budget with 4 Raspberry Pi Zeros and a ClusterHAT, networked together using USB Gadget mode and no cables

The ClusterHAT from UK firm 8086 Consultancy is an ideal tool for teaching, testing or simulating small-scale clusters. It interfaces a (controller) Raspberry Pi A+/B+/2/3 with four Raspberry Pi Zeros configured to use USB Gadget mode, which effectively networks them together without needing any cables or separate Ethernet switch hardware (and of course the Pi Zeros don’t actually have an Ethernet port). The ClusterHAT is easy to set up and use, and there is no need to learn the ins-and-outs of device drivers or networking interfaces: an already-configured version of Raspbian is provided for download.

Many thanks to Pimoroni in Sheffield for kindly sending me a ClusterHAT to play with. It took me months to buy the four Pi Zeros one at a time!

[Images: What you get with the ClusterHAT; Raspberry Pi 3, ClusterHAT and four Pi Zeros]

Power requirements

I’m using a Raspberry Pi 3 for the controller board, and with 4 slave Pi Zeros powered on the cluster only uses a total of 3.8W at idle. (Or 1.2W if you use the ClusterHAT commands to cut power to the Pi Zeros when you don’t need them.)

At 100% load the cluster uses a total of 7.4W. I found a standard 2A microUSB power supply worked fine, although ClusterHAT recommend 2.5A if you use a Pi 3.
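
As a rough sanity check on the power supply choice, the measured wattages can be converted to current at the 5V supply voltage. This is only a sketch: the helper name `watts_to_amps` is made up for illustration, and it ignores supply losses and short current peaks.

```shell
#!/bin/sh
# Convert a power reading in watts to current in amps at a given voltage.
# watts_to_amps is a hypothetical helper, not part of the ClusterHAT tools.
watts_to_amps() {
    # $1 = watts, $2 = volts (defaults to the 5V of a microUSB supply)
    awk -v w="$1" -v v="${2:-5}" 'BEGIN { printf "%.2f\n", w / v }'
}

watts_to_amps 1.2   # Pi Zeros powered off
watts_to_amps 3.8   # idle, all four Pi Zeros on
watts_to_amps 7.4   # 100% load
```

By this estimate the cluster draws about 1.48A at full load, which is consistent with a 2A supply coping, though without much headroom.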

Temperature & Cooling

The Raspberry Pi 3 SoC runs much hotter than in previous models, and if it reaches 80°C (176°F) it will automatically throttle down its clock speed, to avoid getting any hotter and damaging the chip. It can safely run long-term at this temperature, but you don’t get maximum performance.

Using:
clusterhat on all
vcgencmd measure_temp
to measure the SoC core temperature, the cluster idles at an average of 50°C (122°F) with the Pi Zeros powered up and just passive cooling, i.e. no fans, with hot air convecting out through grilles in the case.

At 100% load, using:
sysbench --test=cpu --cpu-max-prime=200000 --num-threads=4 run &
on each node, the SoC core temperatures eventually reached an average of 70°C (158°F) and none of the boards needed to throttle down their clock speed. The top of the case gets pretty warm to the touch, but I think it is (just about) practical to run the cluster 24/7 without needing a fan.
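
To keep an eye on temperatures during a long run, the `vcgencmd measure_temp` reading can be reduced to a plain number and paired with a throttling check. This is a minimal sketch: `parse_temp` is a hypothetical helper name, and `vcgencmd get_throttled` is assumed to be available, which needs reasonably recent firmware.

```shell
#!/bin/sh
# Extract the numeric temperature from vcgencmd output such as "temp=50.5'C".
# parse_temp is a hypothetical helper name used only for this sketch.
parse_temp() {
    printf '%s\n' "$1" | sed -e 's/^temp=//' -e "s/'C\$//"
}

# Only query the hardware when vcgencmd is present (i.e. on a Raspberry Pi).
if command -v vcgencmd >/dev/null 2>&1; then
    echo "SoC temperature: $(parse_temp "$(vcgencmd measure_temp)") C"
    # get_throttled reports a bitmask; throttled=0x0 means no throttling
    # or under-voltage has been recorded since boot.
    vcgencmd get_throttled
fi
```

Run on each node in turn (for example over SSH), this gives one temperature figure per board that can be averaged by hand.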

Networking

The ClusterHAT registers each Pi Zero on the Ethernet port of the controller Pi, essentially acting like a network switch. After powering up the Pi Zeros, the Linux network configuration will be something like this:
pi@controller:~ $ ifconfig

# Actual Pi 3 controller Ethernet interface
eth0      Link encap:Ethernet  HWaddr 1c:d1:12:bb:88:ee
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

# Bridge interface created by ClusterHAT, linking the Pi 3 and the Pi Zeros
br0       Link encap:Ethernet  HWaddr 00:22:82:ff:ff:01
          inet addr:192.168.1.120  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::ba27:ebff:feac:6a6e/64 Scope:Link

# Pi Zero #1 gadget (IPv4 address from DHCP)
ethpi1    Link encap:Ethernet  HWaddr 00:22:82:ff:ff:01
          inet6 addr: fe80::222:82ff:feff:ff01/64 Scope:Link

# Pi Zero #2 gadget
ethpi2    Link encap:Ethernet  HWaddr 00:22:82:ff:ff:02
          inet6 addr: fe80::222:82ff:feff:ff02/64 Scope:Link

# Pi Zero #3 gadget
ethpi3    Link encap:Ethernet  HWaddr 00:22:82:ff:ff:03
          inet6 addr: fe80::222:82ff:feff:ff03/64 Scope:Link

# Pi Zero #4 gadget
ethpi4    Link encap:Ethernet  HWaddr 00:22:82:ff:ff:04
          inet6 addr: fe80::222:82ff:feff:ff04/64 Scope:Link
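
The link-local IPv6 addresses shown for the ethpiN gadget interfaces follow from each interface's MAC address by the usual EUI-64 rule: flip the universal/local bit of the first octet and insert ff:fe in the middle. The sketch below illustrates that mapping; `mac_to_link_local` is a hypothetical helper written for this example, not something the ClusterHAT image provides.

```shell
#!/bin/sh
# Derive the EUI-64 based IPv6 link-local address from a MAC address.
# mac_to_link_local is a hypothetical helper written for this example.
mac_to_link_local() {
    # Split the MAC on ":" into the positional parameters $1..$6
    old_ifs=$IFS
    IFS=:
    set -- $1
    IFS=$old_ifs
    # Flip the universal/local bit (0x02) of the first octet
    first=$(printf '%02x' $(( 0x$1 ^ 0x02 )))
    # Insert ff:fe in the middle; %x drops leading zeros in each group
    printf 'fe80::%x:%x:%x:%x\n' "0x${first}${2}" "0x${3}ff" "0xfe${4}" "0x${5}${6}"
}

mac_to_link_local 00:22:82:ff:ff:01   # prints fe80::222:82ff:feff:ff01
```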

I used iPerf between 2 boards to do some simple speed tests:
# Install iperf package on all 5 boards
apt-get install iperf
# Run server on controller
iperf -s -V
# Run client for 100 seconds on a Pi Zero, connecting to controller
iperf -c controller -i 1 -t 100 -V

[Chart: iPerf v2.0.5 (TCP, ClusterHAT, board-to-board), Mbits/sec, more is better]

I was impressed with the speed of the network-over-USB that the ClusterHAT provides. Transfer from one of the Pi Zeros to the Pi 3 controller is only slightly slower than the external Pi 3 Ethernet (94.1Mbits/sec). The total usable bandwidth within the ClusterHAT is 162.9Mbits/sec, so an average of about 40Mbits/sec each if all 4 Pi Zeros are using the internal network at once. This will obviously be lower if the Zeros are sharing the external Pi 3 Ethernet to connect outside of the ClusterHAT.
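
The per-node figure is just the shared bandwidth divided evenly between the Pi Zeros. A small sketch of that arithmetic follows; `fair_share` is a made-up helper, and the even split is an assumption, since real traffic will rarely divide this neatly. The second call applies the same split to the external Ethernet figure as a rough upper bound.

```shell
#!/bin/sh
# Even split of a shared bandwidth figure across a number of nodes.
# fair_share is a hypothetical helper; real traffic will not divide this evenly.
fair_share() {
    # $1 = total Mbits/sec, $2 = number of nodes
    awk -v total="$1" -v nodes="$2" 'BEGIN { printf "%.1f\n", total / nodes }'
}

fair_share 162.9 4   # all four Pi Zeros on the internal network
fair_share 94.1 4    # four Pi Zeros sharing the Pi 3's external Ethernet
```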

Cluster Benchmarks

It proved harder than expected to find a suitable benchmark for the cluster. I’ve used distributed Linpack on other clusters, but the standard release only caters for distributing the benchmark across multiple identical processors+memory (homogeneous computing). The ClusterHAT cluster has four faster A53 1.2GHz cores in the Pi 3 SoC sharing 1GB of RAM, and four slower ARM11 1.0GHz cores, one in each Raspberry Pi Zero, each with its own 512MB of RAM (2GB in total). This makes it a heterogeneous cluster. I originally tried running Linpack across all the Pis, but the Pi 3 cores finish each unit of work quickly, and then wait for the Pi Zero cores to complete before starting the next unit of work.

I looked at a Linpack customised for heterogeneous clusters, and also the SHOC benchmark, but neither of these is available for Raspberry Pi boards (missing a working OpenCL library, etc.).

So I settled for benchmarking with HPC Challenge Linpack from the Debian hpcc package. Software was a customised Raspbian Linux (Debian Jessie 8.4) download provided by ClusterHAT, running on a Kingston class 10 16GB microSDHC card in each of the Pis. I did separate benchmarks for a single Pi Zero, four Pi Zeros at once, and finally just the quad-core Raspberry Pi 3:

[Chart: Linpack TPP v1.4.1 (Linear Equation Solver), MFLOPS, more is better]

A very rough estimate of the total cluster performance would be to simply add up the separate Linpack scores for 4× Pi Zeros and the Pi 3 controller. This cluster setup clearly isn’t about total computing power: a single Pi 3 gives more than double the MFLOPS performance of even 4× Pi Zeros working together. This is partly down to the faster individual cores in the Pi 3, but node-to-node communication is also a great deal faster within the Pi 3 SoC than between separate Zeros using Ethernet-over-USB.

Shell commands to set up each of the five Pis:
# Expand file system to fill SD card on each Pi
sudo su
raspi-config
# Install hpcc benchmark on each Pi
apt-get install hpcc
# Add a new user for running cluster benchmarks on each Pi
adduser mpiuser
# Disable swap on all Raspberry Pis
dphys-swapfile swapoff
dphys-swapfile uninstall
update-rc.d dphys-swapfile remove

and on the Pi 3 (controller) node in the ClusterHAT:
su - mpiuser
cp /usr/share/doc/hpcc/examples/_hpccinf.txt hpccinf.txt
# Edit hpccinf.txt so that NB=108, N=9972, P=2 and Q=2 (P x Q = 4 cores)
sed -i "8s/.*/108\tNBs/; 6s/.*/9972\tNs/; 11s/.*/2\tPs/; 12s/.*/2\tQs/" hpccinf.txt
mpirun -n 4 hpcc
grep -F -e HPL_Tflops hpccoutf.txt

and on one of the Pi Zero slave nodes (use the hostnames or IP addresses for your Pis):
su - mpiuser
cp /usr/share/doc/hpcc/examples/_hpccinf.txt hpccinf.txt
# Edit hpccinf.txt so that NB=108, N=13176, P=2 and Q=2 (P x Q = 4 cores)
sed -i "8s/.*/108\tNBs/; 6s/.*/13176\tNs/; 11s/.*/2\tPs/; 12s/.*/2\tQs/" hpccinf.txt
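# Generate & copy SSH keys across the cluster (the controller is NOT benchmarked, only the Pi Zeros)
# Note: DSA keys are disabled by default from OpenSSH 7.0 onwards; on newer systems use -t rsa or -t ed25519
ssh-keygen -t dsa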
ssh-copy-id p1
ssh-copy-id p2
ssh-copy-id p3
ssh-copy-id p4
# Run 1 process on each PiZero
mpirun -npernode 1 -host p1,p2,p3,p4 hpcc
grep -F -e HPL_Tflops hpccoutf.txt
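
Two bits of arithmetic sit behind the commands above: the problem size N determines how much RAM the Linpack matrix needs (N × N double-precision values of 8 bytes each), and the HPL_Tflops figure has to be multiplied by a million to get MFLOPS. The sketch below shows both. The helper names are hypothetical, the memory estimate ignores HPL's extra workspace, and the summary line in hpccoutf.txt is assumed to have the form `HPL_Tflops=<value>`.

```shell
#!/bin/sh
# Approximate memory needed for the HPL matrix: N x N doubles of 8 bytes each.
# hpl_matrix_mib is a hypothetical helper; it ignores HPL's workspace overhead.
hpl_matrix_mib() {
    echo $(( $1 * $1 * 8 / 1024 / 1024 ))
}

# Convert an "HPL_Tflops=<value>" line (read from stdin) into MFLOPS.
# tflops_to_mflops is also a made-up name for this sketch.
tflops_to_mflops() {
    awk -F= '/^HPL_Tflops=/ { printf "%.1f\n", $2 * 1000000 }'
}

hpl_matrix_mib 9972     # Pi 3 run: whole matrix in the controller's 1GB
hpl_matrix_mib 13176    # Pi Zero run: matrix spread across four nodes

# Example with a made-up value, not a measured result
echo "HPL_Tflops=0.0002" | tflops_to_mflops
```

With these inputs the matrix comes to roughly 758MiB on the Pi 3 and 1324MiB across the four Pi Zeros (about 331MiB per Zero). On a real run, `grep -F -e HPL_Tflops hpccoutf.txt | tflops_to_mflops` would give the score in MFLOPS.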

Conclusion

The ClusterHAT is a low cost and hassle-free way to dive into the world of cluster computing. You can run standard HPC/distributed applications (at least those written for CPU rather than GPUs) in a cluster that literally fits in the palm of your hand, and only consumes a few watts of power.

The Raspberry Pi Zero is undoubtedly very slow in terms of modern computing performance, but a tiny cluster like this is ideal for teaching, or even developing distributed software that can then be ported to much more powerful systems.

Making a ClusterHAT case

Ever since I discovered how easy laser-cutting was a couple of months ago, I haven’t been able to stop making new computer cases... so here is one that you can download the design for, customise if you want to, and laser-cut in 3mm extruded acrylic (perspex):

[Images: 2D design of case parts; Finished ClusterHAT acrylic case]

For the initial 2D design of the box I used my “Making a laser-cut box/case with elastic clips” web app, and then added holes for Ethernet, HDMI, USB and the Video/Audio jack using the free Inkscape application, ready for exporting to the laser cutter. Each colour is a different pass of the laser, at different power/speed levels: the green lines are cut first to make holes for ports, pink lines are extra cuts to help extract delicate parts, orange is text/lines that are etched, and finally blue cuts the outside of each panel. Download files for laser cutting (requires approx 280×150mm of 3mm thick sheet):

  1. SVG format Sheet or
  2. DXF format Sheet

The Pi 3 is mounted in the case with 4× M2.5 hex spacer/standoffs which screw through the PCB into the 12mm standoffs that the ClusterHAT uses, and there are 4× screws left over to fix the entire cluster to the bottom of the case. The adhesive feet from the ClusterHAT look much better on the bottom of the acrylic case, rather than stuck directly onto the PCB...

I don’t know if small cases are harder to design, but I got through 6 or 7 prototype cases in MDF before I had the position & sizing of the ports just right! I only exposed the Audio/Video port to make it easier to connect through the case to the microUSB and HDMI ports (because the A/V port sticks out from the PCB). And then my largest microUSB plug wouldn’t fit through the case holes. It was finally (?) perfect until I tried to put the microSD cards into the Pi Zeros, and realised I needed another 0.5mm of clearance at that end of the case. Doh.

Bill of materials

Item                                                          Cost
3mm extruded clear perspex 280×150mm                          £0.93
Laser cutting charge                                          n/a
M2.5 nylon hex spacer 5mm female+6mm male (4 from 20 pack)    £1.17
Total for case, inc P&P                                       £2.10
ClusterHAT, inc 4× M2.5 spacers, 8× screws, 4× feet           £28.00
Raspberry Pi Zero (4 pack... good luck ;))                    £18.50
Raspberry Pi 3                                                £32.00
2.5A 5V power supply                                          £7.50
Kingston class 10 16GB microSDHC card (5 pack)                £16.95
Total for everything, inc P&P                                 £105.05
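
If you swap parts in or out, the totals are easy to recompute. A tiny sketch, using the prices listed above; `sum_prices` is a hypothetical helper name.

```shell
#!/bin/sh
# Sum a list of prices in pounds; sum_prices is a hypothetical helper.
sum_prices() {
    printf '%s\n' "$@" | awk '{ total += $1 } END { printf "%.2f\n", total }'
}

sum_prices 0.93 1.17                          # case only
sum_prices 2.10 28.00 18.50 32.00 7.50 16.95  # whole cluster
```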

Future improvements/ideas

Clusters of other Single Board Computers

So far I’ve built clusters using the following ARM boards:

I’d like to build a small cluster from each of the current crop of sub-$100 ARM SBCs (e.g., Odroid C2/XU4 and the Banana Pi M3), comparing the different features, and with detailed benchmarks. Please email me if you’d like to send boards for review.
Nick Smith, August 2016.