ClusterHAT Review for the Raspberry Pi Zero
Cluster computing on a £100 budget with 4 Raspberry Pi Zeros and a ClusterHAT, networked together using USB Gadget mode and no cables.

The Cluster HAT from UK firm 8086 Consultancy is an ideal tool for teaching, testing or simulating small-scale clusters. It interfaces a controller Raspberry Pi A+/B+/2/3 with four Raspberry Pi Zeros configured to use USB Gadget mode, which effectively networks them together without needing any cables or a separate Ethernet switch (and of course the Pi Zeros don’t have an Ethernet port anyway). Fortunately the ClusterHAT is easy to set up and use, and there is no need to learn the ins and outs of device drivers or networking interfaces – an already-configured version of Raspbian is provided for download.
Many thanks to Pimoroni in Sheffield for kindly sending me a ClusterHAT to play with. It took me months to buy the four Pi Zeros one at a time!
Power requirements
I’m using a Raspberry Pi 3 for the controller board, and with 4 slave Pi Zeros powered on the cluster only uses a total of 3.8W at idle (or 1.2W if you use the ClusterHAT commands to cut power to the Pi Zeros when you don’t need them). At 100% load the cluster uses a total of 7.4W. I found a standard 2A microUSB power supply worked fine, although ClusterHAT recommend 2.5A if you use a Pi 3.
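Those wattages translate into a very small energy bill. As a rough sketch (using the idle and full-load figures measured above, and assuming the cluster runs continuously):

```shell
# Rough annual energy use from the measured figures above,
# assuming the cluster runs 24/7 at each power level
awk 'BEGIN {
  hours = 24 * 365                       # hours in a year
  printf "idle (3.8W): %.1f kWh/year\n", 3.8 * hours / 1000
  printf "load (7.4W): %.1f kWh/year\n", 7.4 * hours / 1000
}'
```

Even at full load that is under 65 kWh per year, i.e. only a handful of pounds at typical domestic electricity tariffs.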
Temperature & Cooling
The Raspberry Pi 3 SoC runs much hotter than in previous models, and if it reaches 80°C (176°F) it will automatically throttle down its clock speed to avoid getting any hotter and damaging the chip. It can safely run long-term at this temperature, but you don’t get maximum performance. Using:

clusterhat on all
vcgencmd measure_temp

to measure the SoC core temperature, the cluster idles at an average of 50°C (122°F) with the Pi Zeros powered up and just passive cooling, i.e. no fans, and hot air convecting out through grills in the case. At 100% load, using:
sysbench --test=cpu --cpu-max-prime=200000 --num-threads=4 run &

on each node, the SoC core temperatures eventually reached an average of 70°C (158°F) and none of the boards needed to throttle down their clock speed. The top of the case gets pretty warm to the touch, but I think it is (just about) practical to run the cluster 24/7 without needing a fan.
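vcgencmd prints its result as a string like temp=50.2'C, so if you want to log temperatures over time while sysbench is running, a small helper to strip out the numeric value is handy. This is my own sketch, not part of the ClusterHAT tools:

```shell
# Extract the numeric value from vcgencmd's output, e.g. temp=50.2'C -> 50.2
temp_c() {
  echo "$1" | sed -E "s/temp=([0-9.]+)'C/\1/"
}

# Example: log the core temperature every 10 seconds (run on each node)
# while true; do temp_c "$(vcgencmd measure_temp)" >> temps.log; sleep 10; done
temp_c "temp=50.2'C"    # prints 50.2
```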
Networking
The ClusterHAT registers each Pi Zero on the Ethernet port of the controller Pi, essentially acting like a network switch. After powering up the Pi Zeros, the Linux network configuration will be something like this:

pi@controller:~ $ ifconfig
# Actual Pi 3 controller Ethernet interface
eth0      Link encap:Ethernet  HWaddr 1c:d1:12:bb:88:ee
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
# Bridge interface created by ClusterHAT, routing between Pi 3 and Pi Zeros
br0       Link encap:Ethernet  HWaddr 00:22:82:ff:ff:01
          inet addr:192.168.1.120  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::ba27:ebff:feac:6a6e/64 Scope:Link
# Pi Zero #1 gadget (IPv4 address from DHCP)
ethpi1    Link encap:Ethernet  HWaddr 00:22:82:ff:ff:01
          inet6 addr: fe80::222:82ff:feff:ff01/64 Scope:Link
# Pi Zero #2 gadget
ethpi2    Link encap:Ethernet  HWaddr 00:22:82:ff:ff:02
          inet6 addr: fe80::222:82ff:feff:ff02/64 Scope:Link
# Pi Zero #3 gadget
ethpi3    Link encap:Ethernet  HWaddr 00:22:82:ff:ff:03
          inet6 addr: fe80::222:82ff:feff:ff03/64 Scope:Link
# Pi Zero #4 gadget
ethpi4    Link encap:Ethernet  HWaddr 00:22:82:ff:ff:04
          inet6 addr: fe80::222:82ff:feff:ff04/64 Scope:Link

I used iPerf between 2 boards to do some simple speed tests:
# Install iperf package on all 5 boards
apt-get install iperf
# Run server on controller
iperf -s -V
# Run client for 100 seconds on a Pi Zero, connecting to controller
iperf -c controller -i 1 -t 100 -V
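iperf 2.x finishes each run with a one-line summary whose second-to-last field is the bandwidth figure, which is handy if you want to collect results from several runs into a file. A quick sketch (the sample line below is made up for illustration, not one of my measurements):

```shell
# iperf 2.x prints a summary line in this shape; the bandwidth figure
# is the second-to-last whitespace-separated field
line='[  3]  0.0-100.0 sec  1.15 GBytes  98.9 Mbits/sec'
echo "$line" | awk '{ print $(NF-1), $NF }'    # prints: 98.9 Mbits/sec
```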
» Mbits/sec, More Is Better
Cluster Benchmarks
It proved harder than expected to find a suitable benchmark for the cluster. I’ve used distributed Linpack on other clusters, but the standard release only caters for distributing the benchmark across multiple identical processors+memory (homogeneous computing). The ClusterHAT has four faster A53 1.2GHz cores in the Pi 3 SoC sharing 1GB of RAM, and four slower ARM11 1.0GHz cores, one in each Raspberry Pi Zero, with 2GB of RAM between them (heterogeneous computing). I originally tried running Linpack across all the Pis, but the Pi 3 cores finish each unit of work quickly, and then wait for the Pi Zero cores to complete before starting the next unit of work.

I looked at a Linpack customised for heterogeneous clusters, and also the SHOC benchmark, but neither of these is available for Raspberry Pi boards (missing a working OpenCL library, etc.)
So I settled for benchmarking with HPC Challenge Linpack from the Debian hpcc package. Software was the customised Raspbian Linux (Debian Jessie 8.4) download provided by ClusterHAT, running on a Kingston class 10 16GB microSDHC card in each of the Pis. I did separate benchmarks for a single Pi Zero, four Pi Zeros at once, and finally just the quad-core Raspberry Pi 3:
» MFLOPS, More Is Better
# Expand file system to fill SD card on each Pi
sudo su
raspi-config
# Install hpcc benchmark on each Pi
apt-get install hpcc
# Add a new user for running cluster benchmarks on each Pi
adduser mpiuser
# Disable swap on all Raspberry Pis
dphys-swapfile swapoff
dphys-swapfile uninstall
update-rc.d dphys-swapfile remove

and on the Pi 3 (controller) node in the ClusterHAT:
su - mpiuser
cp /usr/share/doc/hpcc/examples/_hpccinf.txt hpccinf.txt
# Edit hpccinf.txt so that NB=108, N=9972, P=2 and Q=2 (P x Q = 4 cores)
sed -i "8s/.*/108\tNBs/; 6s/.*/9972\tNs/; 11s/.*/2\tPs/; 12s/.*/2\tQs/" hpccinf.txt
mpirun -n 4 hpcc
grep -F -e HPL_Tflops hpccoutf.txt

and on one of the Pi Zero slave nodes (use the hostnames or IP addresses for your Pis):
su - mpiuser
cp /usr/share/doc/hpcc/examples/_hpccinf.txt hpccinf.txt
# Edit hpccinf.txt so that NB=108, N=13176, P=2 and Q=2 (P x Q = 4 cores)
sed -i "8s/.*/108\tNBs/; 6s/.*/13176\tNs/; 11s/.*/2\tPs/; 12s/.*/2\tQs/" hpccinf.txt
# Generate & copy SSH keys across cluster; controller NOT benchmarked - only the Pi Zeros
ssh-keygen -t dsa
ssh-copy-id p1
ssh-copy-id p2
ssh-copy-id p3
ssh-copy-id p4
# Run 1 process on each Pi Zero
mpirun -npernode 1 -host p1,p2,p3,p4 hpcc
grep -F -e HPL_Tflops hpccoutf.txt
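The N values are not arbitrary: HPL factorises a dense N×N matrix of 8-byte doubles, so the problem needs roughly 8×N² bytes, and a common rule of thumb is to size N to use a large fraction of the available RAM (1GB on the Pi 3; 4× 512MB across the Zeros). A quick sanity check of the sizes used above:

```shell
# HPL works on a dense N x N matrix of 8-byte doubles, so memory use is ~8*N^2
awk 'BEGIN {
  printf "Pi 3, N=9972:     %.2f GB of 1GB\n", 8 * 9972 * 9972 / 1024^3
  printf "4x Zero, N=13176: %.2f GB of 2GB\n", 8 * 13176 * 13176 / 1024^3
}'
```

Both sizes leave headroom for the OS and MPI, which matters since swap has been disabled.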
Conclusion
The ClusterHAT is a low-cost and hassle-free way to dive into the world of cluster computing. You can run standard HPC/distributed applications (at least those written for CPUs rather than GPUs) in a cluster that literally fits in the palm of your hand and only consumes a few watts of power.

The Raspberry Pi Zero is undoubtedly very slow in terms of modern computing performance, but a tiny cluster like this is ideal for teaching, or even for developing distributed software that can then be ported to much more powerful systems.
Making a ClusterHAT case
Ever since I discovered how easy laser-cutting was a couple of months ago, I haven’t been able to stop making new computer cases... so here is one that you can download the design for, customise if you want to, and laser-cut in 3mm extruded acrylic (perspex):
For the initial 2D design of the box I used my Making a laser-cut box/case with elastic clips web app, and then added holes for the Ethernet, HDMI, USB and Video/Audio ports using the free Inkscape application, ready for exporting to the laser cutter. Each colour is a different pass of the laser at different power/speed levels: the green lines are cut first to make holes for ports, pink are extra cuts to help extract delicate parts, orange is text/lines that are etched, and finally blue cuts the outside of each panel. Download files for laser cutting (requires approx 280×150mm of a 3mm thick sheet):
The Pi 3 is mounted in the case with 4× M2.5 hex spacers/standoffs which screw through the PCB into the 12mm standoffs that the ClusterHAT uses, and there are 4× screws left over to fix the entire cluster to the bottom of the case. The adhesive feet from the ClusterHAT look much better on the bottom of the acrylic case, rather than stuck directly onto the PCB...
I don’t know if small cases are harder to design, but I got through 6 or 7 prototype cases in MDF before I had the position & sizing of the ports just right! I only exposed the Audio/Video port to make it easier to connect through the case to the microUSB and HDMI ports (because the A/V port sticks out from the PCB), and then my largest microUSB plug wouldn’t fit through the case holes. I thought it was finally perfect until I tried to put the microSD cards into the Pi Zeros, and realised I needed another 0.5mm of clearance at that end of the case. Doh.
Bill of materials
| Item | Price |
| --- | --- |
| 3mm extruded clear perspex 280×150mm | £0.93 |
| Laser cutting charge | n/a |
| M2.5 nylon hex spacer 5mm female+6mm male (4 from 20 pack) | £1.17 |
| Total for case, inc P&P | £2.10 |
| ClusterHAT, inc 4× M2.5 spacers, 8× screws, 4× feet | £28.00 |
| Raspberry Pi Zero (4 pack... good luck) | £18.50 |
| Raspberry Pi 3 | £32.00 |
| 2.5A 5V power supply | £7.50 |
| Kingston class 10 16GB microSDHC card (5 pack) | £16.95 |
| Total for everything, inc P&P | £105.05 |
Future improvements/ideas
- Booting the Pi Zeros over the network from the Pi 3 would save the cost of 4× microSDHC cards. Not yet possible?
Clusters of other Single Board Computers
So far I’ve built clusters using the following ARM boards:

- DIY 5 Node Cluster of Raspberry Pi 3s
- 40-core ARM cluster using the NanoPC-T3
- 5 Node Cluster of Orange Pi Plus 2e
- Bargain 5 Node Cluster of PINE A64+
- ClusterHAT with 4× Raspberry Pi Zero
- 96-core ARM supercomputer using the NanoPi-Fire3
Nick Smith, August 2016.