2019-01-06

Build farm, version 1 (2015-2016)

Context

Being a developer and having built many thousands of Linux kernels over the last two decades, I can say that build time is something which really counts. I've always been interested in distributed build systems, and I started to explore these possibilities in 1996 at the university. Nowadays hardware is much cheaper, there's a lot of choice, and there are lots of software possibilities as well, starting with distcc.

First serious attempt

When I read this article about a very powerful tiny quad-core device called T034 in late 2014, I figured that it was about time to give it a try. The device features 4 Cortex A17 cores, which by then were the fastest ARM cores available. I could test the device in early 2015 and published the results here. By then this device was sold with an Android image, and it took me a long time to get Linux to work reliably on it, and even longer to recover correct performance (the RAM was running at 200 MHz stock, the CPU was limited to 1.6 GHz, etc.). Once this was reliably achieved, I ordered 4 such devices, which by then had been replaced in stores by an exact copy with a new name, CS008, for the same price, about $65 apiece:


The devices were assembled together, connected to an L-shaped micro-USB connector. The cables were cut to reduce losses, a DC power meter was installed on each device, glued to the unused HDMI connection, and a 4x4x1cm heat sink was installed on top of the original aluminum thermal plate:



The power was fed from a 19V power supply after conversion via a 5V/3A DC/DC converter:



First disappointment

The devices booted well and installed well, but when starting a kernel build, the power suddenly cut off. I thought that the devices drew too much power, but in reality it was the DC/DC power converters which were not precise enough. The SYR927/928 voltage regulators on the boards support an absolute maximum of 5.5V. Two boards had died in the first few seconds, indicating an unexpected over-voltage: the power drawn by the boards varies hugely, and the regulators do not react to these variations fast enough, leading to much more than 5.0V when the CPU's power usage drops. I had to order two more boards to replace the defective ones, which were not fixable.

Better solution for the power supply

More serious solutions were appearing in early 2016, like this 5-port 50W USB power supply, which I ordered:


By the time this device arrived, I had found quite a robust DC/DC regulator that I could salvage from a dead power supply. It took 12V in and delivered 5.1V out at up to 30A, exactly what I needed. So I identified the pin-out and connected it on a piece of experimentation board with some decoupling capacitors and USB female connectors:


It's not pretty but it works fine. Since then I've added a 5th connector and the device never failed, powering up to 5 boards at full speed, spreading very little heat, indicating it's extremely efficient.
Overclocking attempts on the boards showed that at 2 GHz they could draw up to 3-4 A in short peaks, especially due to losses in the micro-USB connector (up to 450 mV lost there, up to 700 mV between the connector and the on-board regulation chip).
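
As a rough sanity check of these figures, the sketch below adds up the worst-case peak current of 5 boards against the 30A quoted for the salvaged regulator, and converts the worst-case voltage drop into an equivalent series resistance and wasted heat. It assumes that the 700 mV figure is the total drop between the connector and the on-board regulator, and that the peak current is the upper 4 A value; the function names are only illustrative.

```python
REGULATOR_MAX_A = 30.0   # rating quoted for the salvaged DC/DC regulator
BOARDS = 5               # number of boards fed by that regulator
PEAK_A = 4.0             # assumption: upper end of the 3-4 A peaks at 2 GHz
WORST_DROP_V = 0.700     # assumption: total connector-to-regulator drop


def peak_budget(boards, peak_a, limit_a):
    """Return (total peak current, remaining headroom), both in amps."""
    total = boards * peak_a
    return total, limit_a - total


def path_loss(drop_v, current_a):
    """Return (equivalent series resistance in ohms, heat in watts)."""
    if current_a <= 0:
        raise ValueError("current must be positive")
    return drop_v / current_a, drop_v * current_a


total_a, headroom_a = peak_budget(BOARDS, PEAK_A, REGULATOR_MAX_A)
print(f"peak draw: {total_a:.0f} A, headroom: {headroom_a:.0f} A")

ohms, watts = path_loss(WORST_DROP_V, PEAK_A)
print(f"path resistance: {ohms * 1000:.0f} mOhm, wasted: {watts:.1f} W per board")
```

Even with every board peaking at once, the total stays well below the regulator's rating, which is consistent with it barely heating up; the losses are in the connectors and cables rather than in the regulator itself.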

The ordered power supply arrived. It would occasionally cut off during builds, so I had to patch it to disable over-current detection. It's well made: there's one such detector per port, made of a 50 mOhm resistor. But this resistor causes too much loss under high current, so I shorted it. Since then this power supply has always been working flawlessly, and it's much better packaged than the hack above:
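
To get an idea of what such a sense resistor costs, here is a minimal Ohm's law sketch. Only the 50 mOhm value comes from the power supply; the list of currents is an assumption chosen to cover the range the boards were seen to draw.

```python
SHUNT_OHMS = 0.050  # per-port over-current sense resistor


def shunt_loss(current_a, shunt_ohms=SHUNT_OHMS):
    """Return (voltage drop in volts, heat in watts) across the shunt."""
    drop_v = current_a * shunt_ohms
    return drop_v, drop_v * current_a


# Assumed currents, from a quiet board up to an overclocked peak.
for amps in (1.0, 2.0, 3.0, 4.0):
    drop_v, heat_w = shunt_loss(amps)
    print(f"{amps:.0f} A: {drop_v * 1000:.0f} mV dropped, {heat_w:.2f} W lost")
```

At 4 A the resistor alone eats 200 mV, which comes on top of what is already lost in the micro-USB connector.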

Final status

The boards were stacked on top of a 5-port Gigabit Ethernet switch, itself powered by the same power supply. Everything was connected using 20cm Ethernet cables. By the way, I had to order 10 of them to find 5 working ones, as the build quality was terrible. The power meters were removed as they were barely readable, and new L-shaped connectors were used:


This solution worked well enough for tests and demos, but the boards would overheat and occasionally hang. It was necessary to use a heartbeat LED trigger to know which ones were working reliably and which ones were hung. The boards were of very low build quality, and some had to be fixed due to poorly soldered components, so the freezing issues were not very surprising. The DDR chips were marked with the reference of a DDR3 one while they didn't even match the form factor; they definitely were DDR2 chips. Oh, and there was no easy way to keep all the boards together: I tried to make a plastic piece, then hard foam, then I soldered a thick copper plate to all HDMI connectors, but none of these was a good or durable solution.
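
For reference, the heartbeat trigger is selected through the kernel's LED class interface in sysfs. The sketch below shows one way to do it; it assumes a kernel built with the heartbeat LED trigger and run as root, and the LED name in the usage comment is purely hypothetical since it differs from board to board.

```python
from pathlib import Path

LEDS_ROOT = "/sys/class/leds"  # standard location of the LED class in sysfs


def set_led_trigger(led_name, trigger="heartbeat", leds_root=LEDS_ROOT):
    """Select a trigger for one LED by writing its name to the trigger file."""
    trigger_file = Path(leds_root) / led_name / "trigger"
    trigger_file.write_text(trigger + "\n")


def active_trigger(led_name, leds_root=LEDS_ROOT):
    """Return the active trigger, which the kernel shows between brackets."""
    text = (Path(leds_root) / led_name / "trigger").read_text()
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


# Usage on a board (hypothetical LED name, list /sys/class/leds to find yours):
#   set_led_trigger("board:green:status")
#   print(active_trigger("board:green:status"))
```

With this trigger the LED blinks as long as the kernel keeps running, so a board whose LED has stopped blinking is a board that has hung.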

However, despite the stability issues, the performance was quite good thanks to the impressive power of the RK3288 SoC. A full-modules kernel build on my laptop (Core i5-3320M at 3.1 GHz) went down from 45 min to 13 min when using the build farm. When factoring in the total cost of the solution, around $280, it's impressively efficient.
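
The gain is easy to put into numbers. The short sketch below only uses the two build times quoted above; the builds-per-hour figure is a simple extrapolation that assumes builds run back to back.

```python
LAPTOP_ONLY_MIN = 45.0   # full-modules kernel build on the laptop alone
WITH_FARM_MIN = 13.0     # same build when the farm helps


def speedup(before_min, after_min):
    """Return how many times faster the build became."""
    if after_min <= 0:
        raise ValueError("build time must be positive")
    return before_min / after_min


def builds_per_hour(build_min):
    """Return how many back-to-back builds fit in one hour."""
    return 60.0 / build_min


print(f"speedup: {speedup(LAPTOP_ONLY_MIN, WITH_FARM_MIN):.2f}x")
print(f"saved per build: {LAPTOP_ONLY_MIN - WITH_FARM_MIN:.0f} min")
print(f"builds/hour: {builds_per_hour(LAPTOP_ONLY_MIN):.1f} -> "
      f"{builds_per_hour(WITH_FARM_MIN):.1f}")
```

That is roughly a 3.5x speedup, or 32 minutes saved on each full build.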

A solution was thus needed to improve the quality and stability, but the CPU was convincing.

More information

A presentation of this build farm was given at Kernel Recipes 2016 and was featured on LWN. The first one includes the video showing a live build of the complete kernel.

Version 2 of the build farm is described here. Version 3 of the build farm is described here.
