2024-05-21

An affordable 10GbE capable NAS

Background

Two weeks ago, in early May, Tom Cubie of Radxa asked me if I would be interested in testing their new ROCK 5 ITX board. I had been following the development of this really nice board a little, and Tom probably valued some of my tests and comments during the ROCK 5B debug party. He knows I'm not shy when I disagree with certain choices, and he confirmed that he wasn't seeking a complacent review at all. That was enough for me to accept the offer, because I'm one of those who believe that making issues public is the most efficient way to collectively find the best solution to them.

First impressions

Unpacking

A week later I received a parcel from DHL. First impression: the board is packaged just like a PC motherboard. There's a rationale for this: it's an ITX form factor, meant to be installed in a PC enclosure. The customer must feel in known territory when installing it ;-)



Front panel and buttons

Like on a PC motherboard, there's a front panel connector with Power switch, Power LED, Reset switch and HDD LED. These are nice improvements, since many Arm-based boards lack a reset button, which is much appreciated during development and kernel porting. However, a button could have been placed here to the right of the SPDIF connector, for debugging periods when the board is left on a desk without anything connected to this front panel connector. That's no big deal since a screwdriver suffices to short the pins, but it would be cleaner. Some PC motherboards have adopted this principle nowadays, by the way:

Installing a cooling solution

One excellent design choice on this board concerns the cooling. Radxa adopted a design compatible with LGA115x thermal solutions. This means that instead of having to resort to mixes of inefficient and inconvenient solutions as is often the case, here reusing an old PC heat sink will work. I found one from an old 1U server in my tray and decided to install it. It even has the PWM pin to control the fan's speed (which I won't use except for testing).

Installing a serial console

Another point to note regarding the connectors is that there is no externally accessible serial console, though there's a pin header on the board next to the micro-SD connector. Serial consoles are still needed in the Arm world because of the boot loaders. While you can often do most of the day-to-day operations using an HDMI display and a USB keyboard, each time the machine fails to boot, the only option is to grab the screwdriver, open the box and connect a UART adapter inside to fix the boot problem. This tends to be less of a problem with systems adopting the Arm SystemReady approach, which provides a PC-like UEFI BIOS where you can really control everything from early boot, and normally don't have to fiddle with low-level commands just to load a recovery kernel. Here there's no UEFI at the moment, so the only recovery option is the serial port. And in general I don't like the idea of having to plug/unplug a screen connector and move it around in the rack between all my machines; it even happens quite often that a sick machine fails to enable the frame buffer and displays nothing. A USB serial console is much easier to use and allows multiple machines to be cross-connected, so that all the machines in the rack are accessible at the same time from the same display.

I noticed that the connector has the same pinout as more and more boards I'm seeing these days, so I could reuse an adapter I had prepared for another board. This one is cheap and based on a CH340N. It's tiny, only requires soldering 3 wires, doesn't cause trouble when not powered, and supports Rockchip's speed of 1.5 Mbps.

Surprise of the power connector

When trying to plug in the 12V adapter cable that was lying on my desk, I noticed it wouldn't fit. I looked closer and saw a huge central pin. Grrr... it's a 2.5mm one. These are really not common. I checked all my adapters here (about 20, all voltages included). None of them had a 2.5mm connector, all were 2.1mm. Fortunately I had a 2.5mm male jack and a 2.1mm female one, so I could make an adapter to connect the 12V power block.

I suspect that the reason for using a larger connector than usual is to make sure users don't accidentally connect a 19V laptop power supply. That's understandable of course. Another option could be to make the board accept a wider input voltage range. Some PC boards do this. For example some boards will take 8 to 25V, and will only need more than 12V on input if they really have to deliver 12V. Most of the time the 12V pins are not used: they're basically only used for spinning SATA disks, and most often not even for SSDs.


First power up

Once plugged in, the console immediately shows that it sees 8 GB of LPDDR5 DRAM installed as 4 channels of 2 GB each, and that the speed is configured to 2400 MHz, hence 4800 MT/s:

DDR 9fffbe1e78 cym 24/02/04-10:09:20,fwver: v1.16
LPDDR5, 2400MHz
channel[0] BW=16 Col=10 Bk=16 CS0 Row=16 CS=1 Die BW=16 Size=2048MB
channel[1] BW=16 Col=10 Bk=16 CS0 Row=16 CS=1 Die BW=16 Size=2048MB
channel[2] BW=16 Col=10 Bk=16 CS0 Row=16 CS=1 Die BW=16 Size=2048MB
channel[3] BW=16 Col=10 Bk=16 CS0 Row=16 CS=1 Die BW=16 Size=2048MB
Manufacturer ID:0x6
CH0 RX Vref:29.7%, TX Vref:19.0%,0.0%
CH1 RX Vref:31.0%, TX Vref:19.0%,0.0%
CH2 RX Vref:31.8%, TX Vref:20.0%,0.0%
CH3 RX Vref:28.5%, TX Vref:20.0%,0.0%
change to F1: 534MHz
change to F2: 1320MHz
change to F3: 1968MHz
change to F0: 2400MHz
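As a quick sanity check of these figures, the theoretical peak DRAM bandwidth can be derived from the log. This is only a sketch: it assumes two transfers per clock (which is how 2400 MHz becomes 4800 MT/s) and that the four 16-bit channels ("BW=16") are accessed in parallel. Real-world throughput will be lower.

```shell
# Values read from the DDR init log above.
clock_mhz=2400      # "LPDDR5, 2400MHz"
channels=4          # channel[0] .. channel[3]
width_bits=16       # "BW=16" on each channel
size_mb=2048        # "Size=2048MB" on each channel

# Assumption: two transfers per clock cycle (double data rate).
mts=$((clock_mhz * 2))

# Peak bandwidth in MB/s: transfers per second times bytes moved per transfer.
peak_mb_s=$((mts * channels * width_bits / 8))
total_mb=$((channels * size_mb))

echo "$mts MT/s, $total_mb MB total, theoretical peak $peak_mb_s MB/s"
```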

The system starts to boot a pre-installed Debian 11 image and presents a login prompt showing that the host name is "roobi". The console is polluted a bit by some Bluetooth messages (there's no BT device on this board):

[   24.409994] dma-pl330 fea30000.dma-controller: fill_queue:2263 Bad Desc(2)
[   24.463642] dma-pl330 fea30000.dma-controller: fill_queue:2263 Bad Desc(2)
[   24.685977] Bluetooth: hci0: command 0xfc18 tx timeout

Debian GNU/Linux 11 roobi ttyFIQ0

roobi login: [   27.981985] dma-pl330 fea30000.dma-controller: fill_queue:2263 Bad Desc(2)
[   32.792957] Bluetooth: hci0: BCM: failed to write update baudrate (-110)
[   32.793062] Bluetooth: hci0: Failed to set baudrate
[   34.926158] Bluetooth: hci0: command 0x0c03 tx timeout
[   42.819615] Bluetooth: hci0: BCM: Reset failed (-110)

roobi login:

At this point I tried many combinations of "root/root", "rock/rock", "radxa/radxa", "roobi/roobi", but none worked, so I looked on the net and couldn't find any relevant info. When rebooting I noticed that U-Boot offers a second boot choice:

U-Boot menu
1:	Debian GNU/Linux 11 (bullseye) 5.10.110-33-rockchip
2:	Debian GNU/Linux 11 (bullseye) 5.10.110-33-rockchip (rescue target)
Enter choice: 2

But the result is basically the same, no valid login/password found:

Cannot open access to console, the root account is locked.
See sulogin(8) man page for more details.

Press Enter to continue.
[   24.167338] Bluetooth: hci0: command 0xfc18 tx timeout
[   32.060943] Bluetooth: hci0: BCM: failed to write update baudrate (-110)
[   32.061077] Bluetooth: hci0: Failed to set baudrate
[   34.194156] Bluetooth: hci0: command 0x0c03 tx timeout
[   42.087379] Bluetooth: hci0: BCM: Reset failed (-110)


[   87.668622] dma-pl330 fea30000.dma-controller: fill_queue:2263 Bad Desc(2)
[   87.735524] dma-pl330 fea30000.dma-controller: fill_queue:2263 Bad Desc(2)


Debian GNU/Linux 11 roobi ttyFIQ0

roobi login: [   91.286888] dma-pl330 fea30000.dma-controller: fill_queue:2263 Bad Desc(2)

Request for help and first surprises

Since the board is pretty new, I suspected that the login/password were well known to a few users but not yet documented anywhere easy to find. I knew that a few other people had got their hands on this board as well, so I asked for help on the Radxa forum. Thomas Kaiser responded, suggesting that I was apparently not supposed to log in there, because this "roobi" image was in fact an installer that's supposed to be used via a keyboard+display or via a browser. Some doc for it is found here.

A first feeling of over-engineering and needless complexity started to build up. Usually it's as simple as downloading an image of choice, writing it to a micro-SD or USB thumb drive, plugging it in and booting off it, and you're done. I generally don't like installers, which tend to half-work, don't let you decide what you install nor how, and easily leave the system in an unrecoverable state. Thomas found in the image build scripts that a user "ps/ps" is created.

I tried it and this user worked, even though it shows some syntax errors in some scripts:

roobi login: ps
Password: 
Linux roobi 5.10.110-33-rockchip #65700d485 SMP Wed Apr 3 04:26:57 UTC 2024 aarch64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Fri May 17 19:29:29 UTC 2024 on tty1
-bash: [: : integer expression expected
-bash: [: : integer expression expected
-bash: [: : integer expression expected
ps@roobi:~$

This account is allowed to sudo and from there it's possible to create new users and install packages, so I could run some frequency tests and DRAM latency tests locally. BTW, sshd is already present and enabled, and gcc is there as well.

After all these tests were done, I tried to connect to this machine's IP on port 80 using a browser. I was presented with an installation screen that required me to trigger the power button 3 times to validate that I was on the right board:

Due to remote access to this device, authentication is required.
After clicking the start button, please press the power button
three times within 60 seconds to complete the verification.

Note that I still had a root shell on this machine over the network, so it's definitely not a security measure, most likely it's just a way to avoid mistakes when installing multiple boards in parallel.

From there I could choose to install one of two possible Debian images (how one is supposed to install other operating systems with this is unknown for now; maybe one needs to enter a specific URL, which is still complicated when you already have your image on an SD).

When choosing the installation target, the installer lists available block devices. Here, "no device available" is displayed.

Thomas suggested that Radxa did not intend for the eMMC to be usable by the end user, and instead wasted it on hosting this absurdly huge installer (a full-fledged Debian distro). Now this is where it sounds absurd: they design a really great device that corresponds exactly to what one would expect from a server, with the right choice of connectivity, storage and extensions, but then someone comes along and says "no, please leave me the eMMC, I would like the installer to stay there forever so that I can spend my life reinstalling this board every day".

OK, I'm a bit sarcastic, but why suddenly ruin valid and efficient use cases for the sake of keeping an installer there that you'll need only once in the product's life? Micro-SD is made for this! There are so many owners of competing boards that lack eMMC and who are asking for one that it's really not morally acceptable to sacrifice the eMMC for an unused installer.

Migration of the installer to SD

The eMMC's layout is the following:

  • 0 to 16MB: U-Boot, before the first partition 
  • 16 to 32MB: partition 1, 16MB FAT, mounted in /config
  • 32 to 332MB: partition 2, 300 MB EFI, unused for now
  • 332MB to 7.3G: partition 3, 7G ext4, debian 11 for installer

More precisely it looks like this:

$ sudo fdisk -l /dev/mmcblk0
Disk /dev/mmcblk0: 7.28 GiB, 7818182656 bytes, 15269888 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 388778EF-E065-4B77-A699-D0C2B1832E62

Device          Start      End  Sectors  Size Type
/dev/mmcblk0p1  32768    65535    32768   16M Microsoft basic data
/dev/mmcblk0p2  65536   679935   614400  300M EFI System
/dev/mmcblk0p3 679936 15269854 14589919    7G EFI System
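The layout summarized in the bullet list can be cross-checked against the sector numbers reported by fdisk. The sketch below only redoes that arithmetic with the values copied from the listing; the helper name `mib` is mine, and sizes are rounded down to whole MiB.

```shell
sector=512   # bytes, from "Units: sectors of 1 * 512 = 512 bytes"

# Convert a number of sectors into whole MiB (rounded down).
mib() { echo $(( $1 * sector / 1048576 )); }

# name, start sector, end sector: copied from the fdisk listing above.
while read -r name start end; do
    count=$((end - start + 1))
    echo "$name: starts at $(mib "$start") MiB, size $(mib "$count") MiB"
done <<EOF
mmcblk0p1 32768 65535
mmcblk0p2 65536 679935
mmcblk0p3 679936 15269854
EOF
```

The first partition starting at 16 MiB is what leaves room for U-Boot in front of it.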

Let's have a look at what options U-Boot offers us. For this, we need to enter U-Boot. The boot can be interrupted during the first second, while U-Boot lists the available system images: just pressing Ctrl-C returns to the U-Boot prompt:

=> printenv
... boot_targets=nvme0 mmc1 mmc0 mtd2 mtd1 mtd0 usb0 pxe dhcp
...
distro_bootcmd= ... for target in ${boot_targets}; do run bootcmd_${target}; done

OK, so the boot loader will check mmc1 (micro-SD) before mmc0 (eMMC). Let's just copy the eMMC to an 8 GB micro-SD and try again. The boot loader will still be loaded from the eMMC (first 16 MB), and the kernel will be loaded from the SD. We'd like the OS to load from the SD as well, and since the copy carries the same filesystem UUID as the original, we need to change the UUID on the SD and adjust it in the extlinux.conf file so that only the SD is used. Let's reboot into the installer, insert a micro-SD of at least 8 GB, and prepare it this way:

roobi login: ps
Password:
ps@roobi:~$ sudo -s

# eval $(blkid -o export /dev/mmcblk0p3)
# echo $UUID
b055efba-0f72-448b-927c-07f40f2714c8
# NEW_UUID=$(cat /proc/sys/kernel/random/uuid)
# echo $NEW_UUID
5ee2e87c-0db4-4235-a5ec-aabe607d9c48

# dd if=/dev/mmcblk0 of=/dev/mmcblk1 bs=1M status=progress
# cat /proc/partitions
# e2fsck -f /dev/mmcblk1p3
# tune2fs -U $NEW_UUID /dev/mmcblk1p3
# mount /dev/mmcblk1p3 /mnt/
# sed -i -e "s/$UUID/$NEW_UUID/g" /mnt/boot/extlinux/extlinux.conf
# umount /mnt/
# reboot

Now the system reboots from the micro-SD. Logging into the system shows that it has only mounted the SD (mmcblk1) and not the eMMC (mmcblk0).

Installation to eMMC

Using the browser again to connect to the installer shows that now it properly lists /dev/mmcblk0 as an available installation device. Yay!

However when trying to continue, it starts to scare you by suggesting that everything will be wiped:

This made me hesitate for a while, but I thought it wouldn't make sense to wipe the boot loader parts on a target device from which the system later hopes to possibly boot (e.g. a micro-SD). So I finally confirmed and it installed on it. It took a few minutes after which it automatically rebooted. Of course, since I had left the SD card in, it booted again on the installer, but this showed me that it hadn't wiped the boot loader at least. Removing the SD card and booting again this time ended up with the Debian 11 prompt corresponding to the new image:

Debian GNU/Linux 11 rock-5-itx ttyFIQ0


roobi login: rock
Password:
Linux rock-5-itx 5.10.110-33-rockchip #65700d485 SMP Wed Apr 3 04:26:57 UTC 2024 aarch64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
rock@rock-5-itx:~$

The contents of /etc/passwd show that radxa/radxa is also valid. It's nice to see that the distro is not too fat and that there's plenty of room left to install whatever one wants to install on their NAS:

$ df
Filesystem     1K-blocks    Used Available Use% Mounted on
udev             3929848       0   3929848   0% /dev
tmpfs             813580    1700    811880   1% /run
/dev/mmcblk0p3   7116452 1396500   5393508  21% /
tmpfs            4067900       0   4067900   0% /dev/shm
tmpfs               5120       4      5116   1% /run/lock
/dev/mmcblk0p1     16112       1     16111   1% /config
tmpfs             813580       4    813576   1% /run/user/1001

Phew... that was a bit more complex than usual and than needed, but it was worth it in the end. Now we have the target operating system set up and running on the board. One annoying point with such an installer is that it copies a pre-installed distro to your system: you're not offered the choice of the FS layout, for example. It's as if you had just used "dd" to dump the .img to the target device.

In fact, what this installer should do is let you select an installation image for the distro of your choice, which you then write to a micro-SD and boot from to install on the target device(s).

Connecting SATA SSDs

For a NAS, one needs storage devices and cabling. I didn't expect to receive the board that fast and I thought I didn't have power cables. But after digging in my boxes, I managed to find enough cable converters to connect 4 devices. I did have 4 used 120 GB Intel SSDs. They're not extremely fast but at least they work, so I started to hack with them. The result looks like an ugly octopussy :-)




Running a simple basic test of all disks at once

I intend to use these SSDs in RAID5, as an NFS server doing mostly reads, though writes will be needed as well of course. Let's first see what the whole set of devices is capable of in terms of read speed. For this I'm just running vmstat 1 while reading from 1 disk first, then from all 4 disks at once:

# dd if=/dev/sda of=/dev/null bs=1M

$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 7851296 8552 95092 0 0 0 0 110 114 0 0 100 0 0
1 0 0 7625296 233644 95240 0 0 224800 0 972 757 0 5 95 0 0
0 1 0 7301748 556392 95900 0 0 323072 0 1283 1053 0 5 95 1 0
1 0 0 6977948 879464 96804 0 0 323072 0 1289 1073 0 5 95 1 0
0 1 0 6653140 1203560 97540 0 0 324096 0 1301 1080 0 5 94 1 0
0 1 0 6328852 1527144 98204 0 0 323584 0 1289 1084 0 5 95 1 0

# for i in a b c d; do dd if=/dev/sd$i of=/dev/null bs=1M & done

$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 7848444 8548 93428 0 0 0 0 97 116 0 0 100 0 0
0 4 0 6773340 1082724 95060 0 0 1074224 0 3942 3252 0 17 53 30 0
1 3 0 5573384 2279780 97580 0 0 1197056 0 3932 3607 0 8 49 42 0
1 3 0 4365080 3485540 100276 0 0 1205760 0 3962 3657 0 7 50 43 0
2 2 0 3155736 4696504 103184 0 0 1210880 0 4055 3668 0 10 49 41 0
0 4 0 1927256 5922148 105972 0 0 1225728 0 4043 3705 0 8 50 42 0
0 4 0 711880 7135076 108428 0 0 1212928 0 4010 3670 0 9 49 42 0
3 1 0 294572 7548632 110632 0 0 1211904 0 4194 3993 0 11 49 40 0

This test was running on the big cores, which were mostly waiting for the disks (42%) and using little CPU (8-10%). Running on the little cores instead showed almost the same performance:

# for i in a b c d; do taskset -c 0-3 dd if=/dev/sd$i of=/dev/null bs=1M & done

$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 7851584 8548 94188 0 0 0 0 98 102 0 0 100 0 0
2 2 0 7679972 179556 93864 0 0 171076 0 819 530 0 10 89 1 0
1 3 0 6508936 1348452 96700 0 0 1168896 0 3931 3619 0 19 72 8 0
3 1 0 5343568 2511204 98952 0 0 1162752 0 3852 3595 0 17 74 9 0
3 1 0 4177208 3674548 101952 0 0 1163264 0 3837 3617 0 18 73 9 0
4 0 0 2993412 4855344 104268 0 0 1180672 0 3951 3660 0 19 72 9 0
0 4 0 1807820 6038372 107168 0 0 1183232 0 3953 3650 0 18 73 9 0
1 3 0 630820 7212876 109680 0 0 1174016 0 3931 3663 0 19 72 9 0

Here the CPU is used approximately at twice the load, and iowait is much smaller, indicating that there's less margin on the CPU. But the performance is almost the same, at 1.17 GB/s vs 1.21 for the big cores.

One will note that this bandwidth is slightly smaller than the sum of all 4 SSDs (4*323 = 1292 MB/s vs 1210 MB/s measured). Could we be hitting a wall? Note that for 10 GbE it's fine, because that's slightly above the limit of what one can transfer over TCP at 10 Gbps. But it's still interesting to know.
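The figures quoted above are simply the averages of the "bi" column in the vmstat outputs once the reads have ramped up. A small sketch to redo that averaging, with the steady-state samples copied by hand (the first, partial sample of each run is left out, and the helper name `avg` is mine):

```shell
# Average of the values given as arguments, rounded to an integer.
avg() { printf '%s\n' "$@" | awk '{ s += $1 } END { printf "%.0f\n", s / NR }'; }

# Steady-state "bi" samples from the three vmstat runs above.
one=$(avg 323072 323072 324096 323584)
big=$(avg 1197056 1205760 1210880 1225728 1212928 1211904)
little=$(avg 1168896 1162752 1163264 1180672 1183232 1174016)

echo "1 disk: $one, 4 disks: $big (big cores), $little (little cores)"

# How close the 4-disk run gets to 4 times the single-disk rate.
awk -v o="$one" -v b="$big" \
    'BEGIN { printf "scaling: %.1f%% of 4x the single disk rate\n", 100 * b / (4 * o) }'
```

On a live system the same helper can be fed from a capture of `vmstat 1`, picking the ninth column.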

The SATA controller is connected using 2 PCIe 3.0 lanes. Each lane is running at 8 GT/s, encoded using 128/130 coding (128 bits transported over 130 bits). The MaxPayload is at the minimum, 128 bytes:

root@rock-5-itx:~# lspci -nnvv -s 0001:11:00.0 | grep -A2 Ctl:
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 256 bytes
--
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s (ok), Width x2 (ok)

This is a 64-bit system so the PCIe overhead is 26 bytes. Thus each transfer of 128 bytes requires an extra 26 bytes, that's 154 bytes total, or a transfer efficiency of 83.1%, which becomes 81.8% once the wire encoding is taken into account. Thus the absolute maximum bandwidth with the SATA controller, not taking into account commands and control stuff, is 81.8% of 2*8 GT/s = 13.09 Gbps, or 1.64 GB/s or 1.52 GiB/s. We're not there yet, only at ~75%, but the margin is small. Anyway, as previously said, we're already reaching the speed that a 10 GbE adapter could deliver, so there's nothing lost here, and the choice of assigning 2 lanes to the SATA controller was the best one.
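The ceiling computed in this paragraph can be reproduced with a couple of awk one-liners. This is only a sketch of the same arithmetic: the function names are mine, the 26-byte overhead and the 128/130 encoding are the ones quoted in the text, and commands and flow control traffic are ignored just like above.

```shell
# Link efficiency for a given MaxPayload in bytes: payload / (payload + 26 bytes
# of per-packet overhead), times the 128/130 line encoding.
pcie_eff() { awk -v p="$1" 'BEGIN { printf "%.4f\n", p / (p + 26) * 128 / 130 }'; }

# Usable ceiling of a link. Arguments: lanes, GT/s per lane, MaxPayload.
pcie_ceiling() {
    awk -v l="$1" -v g="$2" -v p="$3" 'BEGIN {
        gbps = l * g * p / (p + 26) * 128 / 130
        printf "%.2f Gbps, %.2f GB/s, %.2f GiB/s\n", gbps, gbps / 8, gbps * 1e9 / 8 / 1073741824
    }'
}

pcie_eff 128          # efficiency with the current 128-byte MaxPayload
pcie_ceiling 2 8 128  # the x2 Gen3 link of the SATA controller
```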

Connecting a 10 GbE NIC

For a 10 GbE NAS, one will need a 10 GbE NIC. I already had such a device that I bought for testing the ROCK 5B, and knew that it now works pretty well. The NIC fits nicely into the M.2 adapter, and comes with a fixture that lets the RJ45 port use a regular slot:

This NIC is built around an AQC107 chip and requires the module called "atlantic", which is enabled via "CONFIG_AQTION" under "CONFIG_NET_VENDOR_AQUANTIA". Unfortunately it's not enabled in the default kernel. I needed to rebuild the kernel, but it's a bit complicated to find the relevant sources. The wiki says that there's a bsp tool used to download, patch and build the kernel. I already have tons of kernel versions here ("git branch | wc -l" reports more than 400), and having to go through such a process when you know you already have most of it, and having to learn yet another tool, is not the most developer-friendly solution IMHO.

With that said, "bsp" is at least a clean and readable script, not a Python horror that requires downloading half of GitHub. But it's still annoying to have to read a script to figure out which directory's config file to read to find the kernel's URL. I found a few candidate URLs, one of them being Joshua Riek's rk-5.10-rkr6. (I later found that it was apparently linux-5.10-gen-rkr3.4 from Radxa's repo that ought to be used).

I also noticed a 6.1 kernel ("linux-6.1-stan-rkr1"). I tried this one first, at least to gauge the progress of the port to newer kernels. It properly detected the 10GbE NIC but not the SATA controller: there were PCIe errors, and I'm wondering whether it properly applies the bifurcation to see two distinct devices. It was a bit late at night, so I quickly gave up. Instead I tried "rk-5.10-rkr6", which is based on 5.10.160 (vs 5.10.110 for rkr3.4), and everything works there, both the SSDs and the 10GbE NIC!

BTW for those interested in using the repositories above, issuing "make rockchip_defconfig" is all you need to do to configure the base kernel (after setting your ARCH and CROSS_COMPILE as usual, of course).

Basic testing of the network speed

I'm used to measuring the bit rate using the if_rate utility that I adopted many years ago after it looked unmaintained. It's simple and convenient and supports a vmstat-like output that eases monitoring and logging during tests:

$ git clone https://github.com/wtarreau/if_rate
$ cd if_rate
$ make
$ ./bin/if_rate -l -i enp1s0 1
# time enp1s0(ikb ipk okb opk)
1716215181 0.0 0.0 0.0 0.0
1716215182 0.0 0.0 0.0 0.0
1716215183 0.0 0.0 0.0 0.0
1716215184 0.0 0.0 0.0 0.0
1716215185 28290.2 53104.4 3698500.4 305417.7
1716215186 89379.8 168813.3 9974752.8 823541.1
1716215187 90188.3 170637.7 9940160.9 820688.8
1716215188 90498.4 171225.5 9974026.0 823474.4
1716215189 89900.4 170266.6 9974559.0 823537.7
1716215190 90303.3 171029.9 10016757.3 827012.2
1716215191 88657.9 167802.2 9941214.6 820773.3
1716215192 90085.0 170447.7 9974735.9 823542.2
1716215193 90063.5 170265.5 10006728.4 826183.3

In parallel a simple netcat of /dev/zero was being sent over the network to another 10GbE machine on the network. So the NIC is properly able to saturate the wire on output, that's what we needed to check.
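One more thing can be read from this output: dividing the output bit rate by the output packet rate gives the average frame size, which tells whether the link is filled with full-size frames. The sketch below assumes that the "okb" column is in kilobits per second (1000 bits) and "opk" in packets per second. The helper name `avg_frame` is mine.

```shell
# Average frame size in bytes from a bit rate (kbit/s) and a packet rate (pkt/s).
avg_frame() { awk -v kb="$1" -v pk="$2" 'BEGIN { printf "%.0f\n", kb * 1000 / 8 / pk }'; }

# Two of the samples above (okb, opk).
avg_frame 9974752.8 823541.1
avg_frame 9940160.9 820688.8
```

A result close to 1514 bytes would correspond to a 1500-byte MTU plus the 14-byte Ethernet header, i.e. the traffic is made of full-size frames only.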

Installing everything in an enclosure

Choice of enclosure

One of my ancient file servers was based on an ATOM D510 and was installed in a really nice APLUS CS-CUPID 2 ITX enclosure, so I decided to remove the board and reuse this perfect enclosure:



Installing the SSDs

The enclosure comes with a frame to carry up to two 3.5" disks, so I needed a way to attach all 4 2.5" SSD frames to it. Since I had already done something similar in the past, it was easy to replicate. I measured that everything would fit with a 14mm pitch between devices, so I used my laser cutter to make a support:



Attaching the serial port

The serial adapter was attached using a metal bracket to which I screwed an unused piece of PCB, on top of which the adapter was stuck using double-sided tape (it's not very clean but it works):


Final assembly

The final result looks like this, with the board, the 4 SSDs, the 10GbE NIC and its connector, and the console connector at the bottom. The enclosure's power board is not used for now.


Running a full load test

Now that we have all the components together, let's see what the whole system is capable of. The aim will ultimately be to turn this into a real NAS server that will replace my local server currently running on an Odroid-H3 once it works fine with a mainline kernel. The indications based on the various measurements to date are that the device should be 10G-capable.

I didn't want to go into setting up NFS etc. So, in order to verify the ability of the board to pull 10G from the disks to the network, I simply installed NGINX, created a RAID5 array on the disks, and created a set of 32 files of 1 GB each so that the total working set cannot fit into memory (8 GB).

On the other machine (a Core2 Quad with a similar NIC), I started 32 instances of h1load, each requesting its own file in loops on a single connection (to avoid the risk of reuse). In parallel I collected the CPU usage, the network bit rate and the I/O rate (which I converted to bits per second for the scale, by multiplying by 8). This gives this (I stopped the test after around 4 minutes):


The fact that the disk I/O is always a bit lower than 10G is in part due to the Ethernet, IP and TCP overhead (94.1% efficiency), and in part because, depending on the load order, some pieces may occasionally still be in cache. Regardless, the 10G link stayed saturated, and the CPU was at less than 50%, which indicates quite some headroom. More modern SSDs than these old ones would also probably do a better job at keeping the link full.
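For reference, the 94.1% figure can be rebuilt from the usual framing overheads. This sketch assumes a standard 1500-byte MTU, IPv4 and TCP headers without options other than timestamps (12 bytes, assumed to be enabled), and counts the Ethernet header, FCS, preamble and inter-frame gap on the wire:

```shell
mtu=1500
ip_hdr=20        # IPv4 header without options
tcp_hdr=20       # TCP header without options
tcp_ts=12        # assumption: TCP timestamps option enabled
eth_hdr=14       # Ethernet header
eth_fcs=4        # frame check sequence
preamble=8       # preamble + start of frame delimiter
ifg=12           # inter-frame gap

payload=$((mtu - ip_hdr - tcp_hdr - tcp_ts))
wire=$((mtu + eth_hdr + eth_fcs + preamble + ifg))
eff=$(awk -v p="$payload" -v w="$wire" 'BEGIN { printf "%.1f\n", 100 * p / w }')

echo "$payload payload bytes per $wire bytes on the wire: $eff% efficiency"
```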

Power measurements

Note that the following measurements were already reported on Radxa's forum here. I finally replaced the SSDs with slightly more recent ones (2x Intel X25M 160 GB and 2x Intel 530 180 GB), and conducted power measurements using various methods:

  • feeding 12V into the motherboard's jack
  • feeding 12V into the enclosure's power board (I only later noticed that it's supposed to be fed 19V, it should be tested again)
  • feeding 12V into an aliexpress 12V jack-to-ATX "160W" adapter:


The power was measured using an ammeter in series with the 12V connector, and a voltmeter in parallel as close as possible to the connector:


Here are the measurements:

  • 12V via the aliexpress ATX 160W adapter: 1.06A * 11.83V = 12.54W
  • 12V via the enclosure’s adapter board: 1.16A * 11.95V = 13.86W
  • 12V via the motherboard’s jack, ATX adapter still connected: 1.15A * 11.95V = 13.74W
  • 12V via the motherboard’s jack, ATX unplugged: 1.02A * 11.96V = 12.20W

Thus the board’s power design looks extremely efficient, beating the other two. There’s 1.7W saved here by powering the board via its own jack instead of the enclosure’s adapter.

In addition I measured the individual power draw of various components, all powered from the motherboard’s jack since it's the most efficient:

  • removed all SSDs: 0.81A * 12.03V = 9.74W => 2.45W drawn by the 4 SSDs in idle
  • no SSD and 10GbE link down: 0.59A * 12.14V = 7.16W => the 10GbE RJ45 link draws 2.58W alone
  • no SSD, 10GbE adapter removed: 0.40A * 12.21V = 4.88W => the 10GbE adapter draws 2.28W with the link down, and 4.86W with the link up
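All the figures above are plain current times voltage products, and the per-component numbers are differences between two successive measurements. A small sketch to redo them from the raw readings (the helper names `watts` and `delta` are mine; differences are computed before rounding):

```shell
# Power in watts from a current (A) and a voltage (V).
watts() { awk -v a="$1" -v v="$2" 'BEGIN { printf "%.2f\n", a * v }'; }

# Difference in watts between two (A, V) measurements.
delta() {
    awk -v a1="$1" -v v1="$2" -v a2="$3" -v v2="$4" \
        'BEGIN { printf "%.2f\n", a1 * v1 - a2 * v2 }'
}

echo "board jack:      $(watts 1.02 11.96) W"
echo "enclosure board: $(watts 1.16 11.95) W (+$(delta 1.16 11.95 1.02 11.96) W)"
echo "4 idle SSDs:     $(delta 1.02 11.96 0.81 12.03) W"
echo "10GbE link:      $(delta 0.81 12.03 0.59 12.14) W"
echo "NIC, link down:  $(delta 0.59 12.14 0.40 12.21) W"
echo "NIC, link up:    $(delta 0.81 12.03 0.40 12.21) W"
```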

Testing mainline kernel

Apparently, according to Collabora's page, the SoC is now quite well supported. I'm interested in seeing how mainline behaves on this board because in my opinion using a BSP kernel is a showstopper for storing data. So I gave kernel 6.9 a quick try but found no working DTB for now: the kernel is loaded and not a single message is emitted. Usually this indicates a console mapped to a wrong address or a missing node for the UART. I tried to reuse a ROCK 5B DTS to see if it made any difference, but no, everything seems to hang. I'll have to investigate later.

PCIe limitations

The organization of the PCIe lanes around the SoC on this board is really great, I would even say optimal. Two Gen3 lanes are used for SATA, two for M.2, and each 2.5GbE port uses one Gen2 lane.

This means that we're having the same bandwidth between M.2 and SATA, so as long as the CPU is able to move the bytes, there's enough bandwidth to use M.2 for the 10GbE network adapter.

However I noticed that the MaxPayload is set to 128 bytes on all devices while they are all capable of at least 256:

root@rock-5-itx:~# lspci -vv | grep MaxPayload | cut -f1 -d,
DevCap: MaxPayload 256 bytes
MaxPayload 128 bytes
DevCap: MaxPayload 512 bytes
MaxPayload 128 bytes
DevCap: MaxPayload 256 bytes
MaxPayload 128 bytes
DevCap: MaxPayload 256 bytes
MaxPayload 128 bytes
DevCap: MaxPayload 256 bytes
MaxPayload 128 bytes
DevCap: MaxPayload 256 bytes
MaxPayload 128 bytes
DevCap: MaxPayload 256 bytes
MaxPayload 128 bytes
DevCap: MaxPayload 256 bytes
MaxPayload 128 bytes

All the "DevCap" lines indicate what the device is capable of. The other ones indicate what is currently configured. 128 bytes result in an efficiency of 81.8% (128/(128+26)*128/130), while 256 bytes would reach 89.4% (256/(256+26)*128/130), a 9.2% performance increase. It might be worth finding out what is causing this limitation.
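The potential gain can be estimated directly from such a listing. The sketch below extracts the smallest advertised capability (a payload size generally cannot exceed what the least capable device on the path supports) and computes the relative gain over the current 128 bytes. The function names are mine, and the capabilities are fed by hand from the listing above.

```shell
# Smallest advertised MaxPayload among the "DevCap:" lines read from stdin.
min_devcap() {
    awk '/DevCap:/ && /MaxPayload/ { if (min == "" || $3 + 0 < min) min = $3 + 0 }
         END { print min }'
}

# Throughput gain in percent when moving from payload $1 to payload $2.
# The 128/130 encoding factor cancels out in the ratio.
payload_gain() {
    awk -v a="$1" -v b="$2" \
        'BEGIN { printf "%.1f\n", ((b / (b + 26)) / (a / (a + 26)) - 1) * 100 }'
}

# Capabilities copied from the listing above, one "DevCap:" line per device.
cap=$(printf 'DevCap: MaxPayload %s bytes\n' 256 512 256 256 256 256 256 256 | min_devcap)

echo "smallest capability: $cap bytes, gain over 128 bytes: $(payload_gain 128 "$cap")%"
```

On a live system, something like `lspci -vv | min_devcap` should give the same value, provided lspci is allowed to read the capabilities (usually as root).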

Closing words

That's all for now. This board is amazing from a hardware perspective. First, it looks extremely clean and well designed. Second, its I/O distribution makes an optimal use of the SoC's capability. The SoC doesn't heat that much and I managed to leave the fan disconnected during operation and this will be its target state anyway. The onboard DC-DC converters show a much higher efficiency than the two other options I tested, which also indicates a choice of great components.

I missed a reset button on the board, and a USB console connector on the back (there's not much room for this at the back, but maybe some combo connectors now exist with an extra USB-C connector that could appear above / below the RJ45 connectors for example, or maybe atop the existing USB-C one). If / when the board adopts a UEFI installer (the SPI NOR still remains empty), then the console will no longer be needed.

One point that I really disliked is the annoying Roobi installer that made everything more complicated than usual. Furthermore, the fact that it confiscates the only storage available to put an operating system seriously needs to be revisited. This is a totally bogus choice. I'm definitely not going to install the OS on a micro-SD, that's the place for an installer. And I'm not going to put the OS on a data disk either. Having had to deal with that painful experience in the past, making it super complicated to exchange data disks when trying to recover data or just for a migration, I seriously don't want to do that again.

Thomas Kaiser showed me that Radxa recently merged a patch in their kernel to disable the eMMC and reserve it for the installer only. Not only does this make no sense, but it simply voids the interest of the product for me if it only leaves me with the option to remove one data disk and cut the I/O performance and storage capacity by 1/3. And by the way, the 16MB SPI NOR is still unused and 16MB is plenty to store an installer, I'm personally stuffing full-featured OSes into that on other machines!

Just like with the ROCK 5B, I'll keep this device reserved for testing for as long as it will not have an LTS mainline kernel available (likely by the end of the year). I've ordered new SSDs to run better tests, and I'll have to run many more tests and also to test other distros (notably Slackware ARM64). I'll also evaluate how it deals with HTTP/HTTPS load balancing now that it has a good NIC ;-)

2024-03-10

Optimizing the shelf life of battery-powered devices


Background

For a few years now I've been placing small lithium batteries in a lot of devices, including a digital camera, a multi-meter, an RC car remote controller, a table vacuum cleaner, a few UPS etc.

Unexpectedly, one of the devices that required the most adjustments was the voltmeter:

I carry it everywhere in my computer bag and don't use it very often. As usual with lithium battery powered devices, it's difficult to have a gauge because the battery is connected to a regulated DC-DC converter so you discover that the battery is depleted when you need to use it. I drilled a hole to plug a micro-USB connector to recharge it but it's quite annoying to have to wait before using it:

So it was about time to investigate how to improve this.

Opening the assembly again

The voltmeter displays the "batt low" warning once the voltage goes below 7.00V, which never happens here. I thought about ways to improve that so that the output voltage starts to progressively drop when the battery is getting near its end, but that's not the point for now, as I've been pretty sure that the battery should last longer. Let's have a look inside. We find my good old battery with the DC-DC converter + charger I found back then, I don't even remember the module's name nor reference but similar ones are easy to find on Aliexpress or eBay. That one produces 9V from a single lithium battery and can be charged over micro-USB:



These are two separate chips, a classical TP4056 for the charge, and a B6287q for the conversion:
 

We already see a voltage divider made of a 10k resistor at the top (the one marked "1002") and a 140k at the bottom left (the one marked "1403"). That indeed gives 0.6V from 9V output (9*10/(10+140)). But... wait a minute, 150k in series on 9V should drain quite a bit of power, no? 9/150k = 60 microamps, or 540 microwatts. Considering 80% conversion efficiency, that's roughly 675 microwatts drained from the 3.7V battery, or 182 microamps! That's sufficient to drain this entire 240mAh battery in 1300 hours or about 50 days. That's not much more than what I'm observing: it's drained in one month.
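For those who want to replay this arithmetic with other values, here is a minimal Python sketch. The function and parameter names are only illustrative, and the 80% efficiency is the same rough guess as above, not a measured figure.

```python
def divider_drain(v_out, r_high, r_low, v_batt, efficiency, capacity_mah):
    """Estimate how fast the feedback divider alone empties the battery."""
    v_fb = v_out * r_low / (r_high + r_low)   # voltage seen by the feedback pin (V)
    i_div = v_out / (r_high + r_low)          # current through the divider (A)
    p_out = v_out * i_div                     # power burnt in the divider (W)
    p_batt = p_out / efficiency               # power taken from the battery (W)
    i_batt = p_batt / v_batt                  # current taken from the battery (A)
    hours = capacity_mah / (i_batt * 1000.0)  # mAh divided by mA
    return v_fb, i_div, i_batt, hours


# Values read on the board: 140k + 10k on a 9V output, 3.7V 240mAh battery.
v_fb, i_div, i_batt, hours = divider_drain(9.0, 140e3, 10e3, 3.7, 0.80, 240)
print(f"FB: {v_fb:.2f} V, divider: {i_div * 1e6:.0f} uA, "
      f"battery: {i_batt * 1e6:.0f} uA, {hours:.0f} h ({hours / 24:.0f} days)")
```

This only accounts for the divider. The regulator's own quiescent current comes on top of it.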

Let's verify by placing the ammeter in series between the battery and the module:


Bingo, 254 microamps drained from the battery! That's 40 days, roughly in line with my observations.

Solution

The solution sounds simple: let's just change the resistors. But these are already high values, it's not certain that the voltage regulator will operate correctly with larger ones. Also many voltage regulators still have a quiescent current that's a bit too high for such a use case.
 
After some research, I found the perfect chip: TI's TPS61040 and TPS61041 have a voltage input range from 1.8 to 6V that's compatible with single-cell lithium batteries, an output from Vin to 28V that's quite sufficient, a switching current of 400mA (or 250mA for the -41 variant), a very low quiescent current of 28µA and a feedback current of only 1µA at 1.3V which allows even larger resistors to be used. And best of all, it has the same pinout as the existing chip, it just requires soldering the top two pins together, which is not too hard :-)

At such very low current values it requires a bit of experimentation though, because the IFB isn't exactly 1.00µA but around 1µA. In practice I found that using a 2Meg resistor at the top and a 330k at the bottom of the divider were sufficient to provide approx 9V on output, thus very close to the theoretical value (the resistor drains 3.9µA which is still above the theoretical 1µA from the chip).
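To see why some experimentation is needed, the divider can be modeled in a few lines of Python. This is only a sketch under assumptions: the 1.233V reference is the typical value as I recall it from the datasheet and should be checked there, and the direction of the feedback bias current is a guess. The names are illustrative.

```python
V_FB = 1.233  # assumed typical feedback reference (V), to be checked in the datasheet


def boost_output(v_fb, r_top, r_bottom, i_fb=0.0):
    """Output voltage set by the divider, with an optional FB pin bias current.

    i_fb is counted as flowing into the FB pin, so it adds to the current
    through the top resistor. Its real value and sign are assumptions.
    """
    i_bottom = v_fb / r_bottom                  # current in the bottom resistor (A)
    v_out = v_fb + (i_bottom + i_fb) * r_top    # drop across the top resistor added to v_fb
    return v_out, i_bottom


v_ideal, i_div = boost_output(V_FB, 2e6, 330e3)
v_biased, _ = boost_output(V_FB, 2e6, 330e3, i_fb=1e-6)
print(f"ideal: {v_ideal:.2f} V with {i_div * 1e6:.2f} uA in the divider")
print(f"with 1 uA into FB: {v_biased:.2f} V")
```

With a 2Meg top resistor, each microamp of bias current moves the output by 2V, which is why the final values have to be found on the bench rather than on paper.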

Tests

Let's assemble this and test again:


Bingo! The current was divided by 11! One will note that the measured value is even lower than the datasheet's quiescent current now.

Measuring the output voltage shows irregularities when the voltmeter is turned off: the output oscillates between 8 and 10V, with 10V peaks every few seconds. This seems to be caused by the low switching frequency: the pulses, despite being short, are sufficient to significantly refill the output capacitor. This does not happen when the voltmeter is turned on, so I don't care. The most important thing in fact is that it remains slightly above 7V so that the multi-meter always operates in the recommended range without needlessly draining current through the bleeding resistors.

After about two months, I found the voltmeter discharged again but this time it wouldn't appear to take the charge. I found that the regulator's output wouldn't work anymore, it was trying to emit 30V in open circuit and much less when connected, and the battery was totally depleted. I think I finally understood what happened. The battery was an unprotected one because the previous chip likely had an under-voltage protection. But this new chip does not, so it continues to drain current even when the battery is almost dead, and it very likely failed when entering very low voltages which draw more current.

I replaced the chip and used a protected battery instead. Now it might have been 3 months, I've used the multi-meter a few times and it's still operating fine this time.

Thus this time it looks like a solved issue!

Improvements

It would be nice to design a small circuit that would bias the feedback voltage so that a lower battery voltage increases the voltage on the FB pin and lowers the output. But when I tried this in the past it used to always drain significant current. It would be better done directly inside the chips, the battery-focused ones could instead modulate their internal reference in parallel to the battery voltage so that the output varies with the battery voltage, but varies less. On a DC-DC regulator with an external Vref pin it might be sufficient to connect a 200k-to-2M resistor between Vin and Vref to adjust the output voltage. This was the same problem with the digital camera and other devices. This would allow the multi-meter to report a battery low condition. Maybe these things will finally appear over time.

Conclusion

In conclusion, such a mod to replace a 9V battery is quite easy with these boards, it remains easy to recharge the device, and assuming the willingness to replace the SOT23-6 chip and two SMD resistors, the shelf-life after the conversion becomes basically as good as with the original battery. Even without the chip change, the results were very good (sufficient for other use cases).

Here it should last around 10k hours or more than a year without a recharge, not counting leakage, which is why it's not dramatic not to have a "battery low" indicator anymore. And I feel like I'm no longer annoyed with replacing batteries in such not frequently used devices.
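The estimate can be checked from the figures given earlier (240mAh battery, 254 microamps measured before the change, current divided by 11 after it). This is just a quick sketch that ignores the battery's self-discharge and its protection circuit; the names are illustrative.

```python
CAPACITY_MAH = 240   # battery capacity quoted earlier
I_BEFORE_UA = 254    # measured with the original regulator
REDUCTION = 11       # the current was divided by 11 after the mod


def shelf_life_hours(capacity_mah, current_ua):
    """Hours before the battery is empty at a constant standby current."""
    return capacity_mah / (current_ua / 1000.0)


before = shelf_life_hours(CAPACITY_MAH, I_BEFORE_UA)
after = shelf_life_hours(CAPACITY_MAH, I_BEFORE_UA / REDUCTION)
print(f"before: {before:.0f} h ({before / 24:.0f} days)")
print(f"after:  {after:.0f} h ({after / 24:.0f} days)")
```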

2020-09-22

Breadbee: Build your own single board computer

Background

Back in April, the hardware news site cnx-software featured a very nice new small Linux board called Breadbee, that was designed and is still being developed by one of the site's readers, Daniel Palmer (aka 'dgp'). The board has many unique particularities that appealed to me:

  • it is small, very small, about 3x3cm, barely larger than the Ethernet jack, because yes, at this size, it comes with Ethernet!
  • it is quite cheap, roughly $10 for a complete board in small quantities
  • it is entirely made out of discrete components that are human-friendly and it can be assembled at home!
  • despite its small size, it comes with a 1 GHz Cortex A7 CPU, 64 MB RAM, 16 MB SPI NOR flash, and has plenty of I/Os available.
  • Daniel wants it to be not only blob-free, but also supported by a mainline Linux kernel

Daniel had started assembling and giving away a few of these to various testers. I could have been interested but had no immediate use case for it and didn't want to abuse. But Daniel insisted on sending me the required parts so that I could try to assemble two boards. I guess he was interested in testing how feasible it was for the board to be assembled by an amateur, and in sharing experience to improve the design. With that in mind, I finally accepted, and after a few months (due to lock down here) I received his package with all the components to make two boards plus quite some extras allowing me to fail multiple times.



Presentation of the board

The board exists in two versions, the original one, made by JLCPCB and the new one, made by OSH Park with a much higher quality. I was originally impressed by the precision of the first board until I met a number of difficulties related to the insufficient solder mask (see below), but was totally impressed by the second version! The blue one is the JLCPCB one, the purple one is made by OSH Park.


The board is made around an MStar/SigmaStar MSC313E system-on-chip. This device is normally found in IP cameras. It is a very interesting chip to make a small hand-solderable Linux-based computer, because the RAM chip is integrated, so there are very few pins (80) and it comes in a QFN package which isn't as painful to deal with as BGA. Note that the 0.35mm pin pitch is still very small: pins are 0.18mm wide and spaced 0.17mm. The chip has quite some connectivity, such as USB 2.0 (host/device), Ethernet with integrated PHY, meaning that an Ethernet jack with integrated magnetics is all you need to connect to a network, SPI to access a NOR flash for the operating system, I2S for sound, and plenty of GPIO. The chip itself can be found for ~$2.10/pc by 5, making it an appealing competitor to ESP32 and other WiFi-based chips whenever TCP/IP is involved, because it will allow to run a real operating system with a solid software stack.

It's visible that 1/4 of the board is used by decoupling capacitors and 1/4 by the DC-DC voltage regulation, so depending where the chip is expected to be used, it might be possible to achieve much smaller footprints if needed (e.g. install it inside a USB connector for example).

The board contains a USB-to-serial adapter, namely a CH340E, in MSOP10 form factor, allowing access to the SoC's console from a PC. The same micro-USB port that is used for the console is also used to power the board. I would personally have wired it differently, by powering the CH340E's VCC from the USB connector, so that the PC optionally powers the board. This would ease development by keeping the serial port continually connected even when the board is powered down (from an external power adapter or an onboard switch or jumper allowing to connect it to the micro-USB port). But this is a minor detail.

The boot loader and operating system are stored in an SPI NOR flash. This flash is 16 MB. For those fearing this is a bit tight, a few vendors now provide 32 MB models but rare are those able to provide it in SOP-8 form factor. At least Macronix does with the MX25L25645G, though it's untested yet. Beyond this there are SPI-NAND solutions storing up to 512 MB but requiring a different driver as they may or may not emulate SPI-NOR. Otherwise there are enough I/Os on the chip to attach an SD-card. But honestly when you see this as a potential alternative to replace an ESP32, 16MB ought to be way sufficient for many users!

Assembling the board

 

Prepare everything

Let's be honest on this: assembling the board is not for everybody, you need to have quite some experience soldering SMD components, to have the patience to fix your mistakes and not to get discouraged when you fear you've fried something, because you will make mistakes. In addition you need a few tools:

  • hot air gun with controllable temperature and air flow. You don't need the temperature to be accurate (most are not), you just need to know how to set it to solder or desolder; a little bit of training on an old dead board might help. Even a $30 one like mine is OK. Do not even imagine doing anything right without this because you won't be able to desolder.

  • good quality tweezers. Mine are large and soft and not really convenient. Good ones must be very fine at the edge and rigid enough to perfectly hold the components. For example these ones look correct though I have not tested them.
  • a good soldering iron with a fine tip. Mine was 0.2mm when new, it's probably more 0.4mm now. You need it to be less than 0.5mm wide so that you can selectively solder each 0402 component (0.5mm wide). A wide one is appreciable to remove excess solder around the SoC. A knife-shaped one could be perfect.
  • thin and recent solder paste. Mine is type 4 (25-38µm balls) and was a few months old, making it look thicker and less sticky. A better one is type 5 (15-20µm balls) in a syringe-like dispenser. This is hard to find however, but I'll show some tricks below.
  • some liquid flux in a syringe. The more fluid the better. You will need it to help heat small components to place them correctly, to clean up the pads after you've badly messed up with a solder joint, to remove bridges between pins of the QFN chips, to remove excess solder from some joints, etc. Do not even start if you don't have some.
  • a multimeter, with thin probes.
  • a magnifying glass, and if possible, a microscope. I do have a tiny crappy microscope like this one. It's basically useless as you can't see further than a few mm deep so you can't inspect solder joints. My camera however has a macro mode at ~5000 DPI (about 5µm per pixel) and is extremely convenient to closely inspect the board.
  • some isopropyl alcohol (IPA) to clean up the flux. If you can't find it, acetone also works but is not as good, and smells a lot. IPA is odorless, and probably less toxic.
  • a toothpick or any thin stick with a 0.5mm tip that can be used to precisely deposit the solder paste. Don't use a needle, that's too thin and the paste will not stick to it. A nail might work though (not tested).

If you're missing some tools, don't get too impatient, better order them and wait for them to arrive than ruin your board.


Use a large nozzle with the hot air gun

Contrary to what could seem intuitive when soldering 0.5mm wide components, you'll need a large nozzle on the hot air gun. The one approximately as large as the SoC is OK. The reasons for this are multiple:

  • the solder paste will not melt until the pads are hot enough, which means the PCB itself must be hot enough. With a thin nozzle, it will take ages to heat the PCB, and you can't easily stay perfectly aligned with the pads you're targeting without moving;
  • the lower air flow on the small one tends to encourage increasing the temperature to get solder joints to actually melt, but the higher temperature also results in burning plastics when you have connectors on your board;
  • you'd think that you can act individually on each component with a small nozzle, but it's almost the opposite. Usually there are just a few seconds of delay between the moments when adjacent solder joints start melting, since it mostly depends on the PCB's temperature. However with a small nozzle, you'll have a much faster air flow on output, which will quickly displace the lightest components and pack them together. Conversely, with a larger nozzle you'll use a slower and softer air flow that you send vertically. Many components will then see their joints melt but they have no reason for moving under this slow air flow, and using the tweezers you pick only the one you're interested in.
  • you'll need a large one anyway for the SoC and you certainly don't want to try to change it while it's hot.

 

Familiarize yourself with the board layout and print it

You don't want to discover the components while soldering them. Take some time to familiarize yourself with the board's schematic and its layout, to roughly understand what component does what and to locate them on the board. From memory I can say that at the middle top there's the DC-DC converter, on the right there's the NOR, below it are some decoupling capacitors and ethernet resistors, below is the crystal oscillator, middle bottom are other capacitors and the reset circuit, on the middle left, just 2 capacitors and test points, and above is the USB-serial adapter and 3 other capacitors. Simply knowing this helps locating the components you're about to place, and also helps avoid mistakes. Daniel provided a nice dynamic imprint that allows to spot components and their locations.

I personally like to print the whole layout with all components names and values on a full size sheet of paper that's placed on the table under the device I'm working on. And seeing the copper tracks on this layout definitely helps understand if a bridge between two solder joints is expected or not. I also like to note the various voltage measurement points, that provides a quick help during debugging.
 

 

Prepare all the small components

For most components that come in rolls, there's nothing more annoying than permanently switching tools between the tweezers needed to place the components and whatever you use to peel off the rolls and pick one component at a time. A solution to avoid this consists in checking the BOM (Bill of materials) first and preparing as many of each type as needed. I like to place them on a sheet of paper under their reference, though the risk of moving them away or mixing them is a bit high. They can also be placed in small plastic boxes or even in an ice tray. Note that values are not marked on 0402 components, so you must absolutely not mix them, or you'll need your multimeter to sort them out. You've been warned!
 
Note that the official list of components is here using a nice lookup interface. I tested an alternate list of resistors that reduces the number of references from 11 to 7. This is only useful if you want to order a small number of component rolls, otherwise it will not bring you any benefit.

 

Prepare the solder paste

Many people using solder paste for the first time believe it's a chemical that turns to tin once heated. That's wrong. It's made of microscopic tin balls glued together in a fatty liquid very similar to flux. This is visible in the following close-up of solder paste dispensed on a metal plate. The second image zooms on the first one at approx 2.5 µm per pixel:


When heated, the liquid evaporates and releases only the tin balls, which melt (if the temperature is high enough). Over time this liquid dries and becomes thicker, a bit like wax, and loses its sticking abilities. It doesn't prevent the tin balls from melting, it's just that it doesn't want to stick to pads, forcing you to put a lot of it, which results in huge solder joints that bridge many pins at once.

If you got it with a syringe, as long as it gets out of the syringe it's probably OK. When it's in a small pot like mine, you never know. And you don't even know how long it was stored at the vendor's. A good test is to put a toothpick in it, and slowly pull. If it doesn't make a thin wire, it's probably not sticky enough. And if the toothpick stays vertical once planted just in a few millimeters, it's definitely too thick.

But do not worry, there are workarounds. A first one is that if the paste is slightly heated, it can become fluid and sticky again. For example if your soldering iron comes with a transformer block that's getting hot, you may leave the pot over it during all the operations.
 
Another solution consists in re-injecting a bit of fresh flux into it. That's what I've got used to doing now. It's not great because it doesn't last long (i.e. 8 hours later it's thick again) but it allows me to reach the level of fluidity I want.

In order to prepare the solution, I just pick a few cubic millimeters of paste and place them in a metal cup. Then I add 1/4 to 1/3 in volume of flux and stir the mixture until it becomes fluid and sticky. It takes time to get them properly mixed, you need to be patient. If it still doesn't stick, you can add a bit more flux, but be careful that it doesn't become liquid. A good test is to use the toothpick, pick some and deposit it on a pad without pressing. If it sticks that's OK. If it spreads like ink, you'll need to add a bit more paste to thicken it again.
 

It could seem tempting to "fix" the whole pot of paste, but it's not a good idea considering that the fix doesn't last long: after a few days you'd get a thick block again and all you'd have achieved would be to waste a lot of flux.

 

Decide what method you'll use

There are at least 3 methods accessible at home to solder the components, and they don't require the same organization.

 

Soldering iron

This is the first method I tried. With a very fine tip, it is perfectly possible to solder all the small components. But let's face it, it is extremely difficult. If the tip isn't hot enough the solder doesn't stick to the pads and if it's too hot it oxidizes and the solder only melts as large balls creating bridges and you end up with something like below:

Yeah, there's way too much solder there. The only way I found to make this less of a hassle was by using solder paste on the first of the two pads, melting it with the soldering iron tip, then placing the component on top of it using the tweezers, melting it again, then using the soldering wire to make the second joint.
 
Never ever tin the two pads at once or you won't be able to place your component flat! If some tin leaks on one pad from a previous solder joint, restart with this one without tinning the other one. If the first of two joints leaks on a nearby pad, place the second component there before making the second joint so that in the eventuality where the second joint will leak as well, you won't end up with two tinned pads for the next component.

Given how close all components are to each other, it is necessary to process them in the same order, from the furthest to the closest of your hand so that the tip never touches an already installed component. Regardless you'll very likely cause shorts that need to be inspected under a microscope and/or with a multimeter.

Another important point to take into consideration is to always start from the thinnest components and finish with the tallest. A capacitor will hinder your access to a nearby resistor and an inductor will make it even worse. You'll note that this sometimes contradicts the previous rule, especially for the alternating resistors and capacitors that are placed above the DC-DC converter, which probably took me one hour to install and fix.

From time to time you can use the hot gun to rectify your components alignment or smoothen the solder joints that show some spikes. You'll need to recheck afterwards because tall solder joints may spill over adjacent ones.

 

Hot air gun

This is the second method I tried, mainly in order to finish what I couldn't realistically finish with the soldering iron whose tip was oxidizing too fast. But the hot air gun comes with its own lot of challenges. You cannot solder individual components with it, you only operate within an area without much precision, so you spend your time heating what was already in place. If you blow too strongly, it will displace some components and create packs that are hard to separate. For this reason it's better to use a large nozzle with low speed air than a small nozzle with high speed air.

If you don't place a protective metal plate under your board, you'll ruin your desk. I used an aluminum plate that came from an old CD player for this. However if you place the board directly on the aluminum, it will not heat, which is the reason why I inserted a small piece of plywood between them, and attached everything using Kapton tape.
 

 

Another issue with the air gun is that you need to be very precise in the way you place the solder paste, because if you leave some on a nearby pad, it will melt and make it difficult to later place a component. So you have to proceed in groups of components, or aggregates. It's not always easy to proceed like this, because each component is close to another one so it's not always simple to delimit aggregates. This is what I tried on the first board captured in the photos below with solder paste that was too thick and thus too abundant, causing some components to move and some bridges to appear:


On this board it is possible to proceed like this, but whatever is around the DC-DC regulator is difficult to split into small areas. Due to this, the temptation can be high to place all components at once and try to solder all of them in one go. And this is not a bad solution at all!

If you go this way, I'd suggest that you first place all capacitors, all resistors, and the LEDs, and proceed with all of these. Then you can add the inductors and the crystal oscillator, then the DC-DC regulator. At this point you can take a break, verify all your solder joints, check for shorts between nearby components (particularly resistors that are very close to capacitors and which tend to move), and you can connect a current-limited 5V PSU to the J1 connector and verify that you have the expected voltages on the various test points: 3.3V, 1.8V and 1.0V. Otherwise you need to debug. Once that's OK you can add the CH340E USB-serial chip and the micro-USB connector, and test again. Once OK you just have to solder the SoC, the flash chip, and flip the board to solder the 3 0R resistors and 2 capacitors. Do not install the ethernet jack before the board is shown to work, as it will be a real pain to later solder/desolder.
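When noting down the measurements at this checkpoint, a tiny script can flag a rail that is off. This is only a sketch: the rail names and the 5% tolerance are arbitrary choices for the illustration, not values taken from the board's documentation.

```python
EXPECTED_RAILS = {"3V3": 3.3, "1V8": 1.8, "1V0": 1.0}  # volts, as listed above
TOLERANCE = 0.05  # +/-5%, an arbitrary margin chosen for this sketch


def check_rails(measured, expected=EXPECTED_RAILS, tol=TOLERANCE):
    """Return the names of the rails that are missing or out of tolerance."""
    bad = []
    for name, nominal in expected.items():
        value = measured.get(name)
        if value is None or abs(value - nominal) > nominal * tol:
            bad.append(name)
    return bad


# Example readings typed in from the multimeter (made-up values).
readings = {"3V3": 3.31, "1V8": 0.20, "1V0": 1.02}
print("rails to debug:", check_rails(readings))
```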

 

Oven

No, I'm not talking about buying a reflow oven, just about using your kitchen's oven. I've done this many times already, including to fix issues under BGA chips, and it has become my preferred method, being easier, faster, and more accurate. You just need a thermometer to figure out approximately what temperature to use, because as you can guess, the displayed temperature doesn't match at all the one in the oven, and while a 10% difference is probably not important for cooking, it can make a big difference for soldering. I found that mine heats to about 260°C when it displays 230, and that temperature is perfect for melting solder paste. If you have multiple settings, prefer those with circulating hot air instead of those with a top infrared resistor: the latter one tends to heat black areas much more. Also it's very convenient to have a rotating plate.
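If your oven is off in the same way, the dial setting for a given target can be estimated from a single thermometer reading. The sketch below assumes the error scales proportionally with the temperature, which is a simplification: one calibration point cannot tell a ratio from a fixed offset, so measure near the temperature you intend to use. The names are illustrative.

```python
DISPLAYED_C = 230.0  # what my oven's dial shows
MEASURED_C = 260.0   # what the thermometer reads at that setting


def dial_setting(target_c, displayed=DISPLAYED_C, measured=MEASURED_C):
    """Dial value expected to give target_c, assuming a proportional error."""
    return target_c * displayed / measured


for target in (240, 250, 260):
    print(f"for {target} C inside, set the dial to about {dial_setting(target):.0f}")
```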

The operating mode that works best for me is the following: I first place all the components like for the hot air gun, except the QFN chips since I know I'll have to fix them afterwards anyway, and except the USB connector in order to protect it against the risk of melting. I then place a small ceramic dish in the oven to hold the PCB, and pre-heat the oven until my thermometer shows it reaches about 260°C. Then I open it, I carefully place the PCB onto the dish, and let it cook for 4-5 minutes in circulating air mode.
 
 
Be very careful not to accidentally enable microwaves! If your oven reaches a higher temperature, reduce the duration. Once the time is over, I open it again and have a look at the solder joints. It's possible they haven't melted yet, in which case only the flux will be gone and you'll see plenty of small shiny tin balls. In this case, cook it again for an extra 2 minutes and inspect again, solder joints should be smooth and shiny:
 
 
Once that's done, I stop the oven and it automatically enters a cool down phase which lasts a few minutes. After that the circuit is still very hot but can be picked using tweezers or by placing a knife underneath, and you can inspect it.

Some components will have moved, don't worry about that. Usually those with unequal amounts of solder paste on their pads will move. But overall it's less than 10% of them which get displaced.
 

Inspect very carefully because some components may turn 90 degrees and look normal:
 

 

Install the DC-DC regulator

The EA3036 is probably the most difficult piece to install. There's no clearance to operate around it, and being very small, it's very light as well and floats on top of melted tin. As such, if you put too much solder, its pins will not make contact, as can be seen below. The worst part was when I removed it and discovered a large amount of tin!


I found that you need to put almost no tin on the ground plane under it, much less than a cubic millimeter! Just put the minimum you can. One approach consists in just touching the ground plane with the tip of your soldering iron, and removing excess solder using solder wick, trying not to pull off the tracks. An alternative is to do that on the chip itself. Also, when your solder paste is very fluid, you can just leave a small drop.

The second difficulty is to figure out how much solder paste is enough, and how much is too much. I found that "painting" the pads with a toothpick impregnated with solder paste, making four thin, 1mm-wide, threads of paste passing through the center of the pads, gave best results. If you fear your paste is too liquid, dispense some on the chip's pins before placing it. As long as the solder quantity is well balanced between the 4 sides, the chip will not move and you can heat it vertically using the hot gun. Once done, inspect it using a microscope. There will likely be shorts, especially with the version 1 board, don't worry about that, you'll easily remove them using flux and the tip of your soldering iron:


What matters is to make sure the chip touches the PCB and is flat on it. If not, you'll need to heat it again and possibly to press over it using your tweezers. If it resists, it means you've put too much paste under it and you'll have to remove it, clean it and place it again.

Also double-check that the chip is well aligned. For example below, after it went through the oven, the regulator turned a little bit and displaced by one pin on two sides. But it's also possible that it's simply shifted by one pin, which is not always easy to spot. Thus be very careful, and fix it if needed:


Oh and by the way, after fixing the alignment of the chip above, I realized it was incorrectly turned so I had to desolder and solder it again after rotating it. Incidentally, while it used to be very difficult not to have shorts under the chip on the version 1 board, I never had any failure when desoldering and resoldering it on the new board!

Once the chip touches flat on each side, you can remove the bridges that possibly exist between some pins. Just dispense some flux on each side, and using your soldering iron tip, gently sweep each side by pressing against the chip a little bit. This will re-melt excess solder which will deposit on your tip. Clean the tip and repeat for another side. At the end the chip will be perfectly soldered with no bridge between pins.

 

Preliminary tests without the SoC

I found that it's better not to solder the SoC first. It reduces the risk of frying it and frying the voltage regulator. Same with the CH340E that's powered from the 3V3 output. This is why I prefer to solder all passives and the DC-DC regulator first, then test.

First, locate the 1.0V, 1.8V, 3.3V and 5V points, and verify using an ohm-meter that they're neither shorted to the ground nor shorted together.  A short to the ground may indicate a short under a capacitor or bleeding resistor. A short between each other indicates a short in the rare places where they come close to each other, very likely in the capacitor row at the bottom of the board, below the SoC. Once that looks OK, it's time to test.
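With four supply points, that makes ten measurements, and it's easy to forget one. The sketch below lists them and flags suspicious readings. The rail names and the 10 ohm threshold are arbitrary choices for the illustration. Keep in mind that a rail loaded with capacitors may read low at first and climb as they charge, so wait for the reading to settle.

```python
from itertools import combinations

RAILS = ["1V0", "1V8", "3V3", "5V"]
SHORT_OHMS = 10.0  # below this a reading is treated as a short (arbitrary threshold)


def pairs_to_probe(rails=RAILS):
    """Every measurement to take: each rail to ground, then each pair of rails."""
    return [(rail, "GND") for rail in rails] + list(combinations(rails, 2))


def find_shorts(readings, threshold=SHORT_OHMS):
    """readings maps (point_a, point_b) to the measured resistance in ohms."""
    return [pair for pair, ohms in readings.items() if ohms < threshold]


for a, b in pairs_to_probe():
    print(f"measure {a} <-> {b}")

# Made-up readings: the 1.0V rail is shorted to ground here.
print(find_shorts({("1V0", "GND"): 0.4, ("3V3", "5V"): 12000.0}))
```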

For testing, I like to power the board through a current limiter circuit like this one I made out of an MCP2544.
Mine is limited to about 250 mA max, which is enough to protect the DC-DC regulator for a short time. If you haven't yet soldered the micro-USB connector, just cut a USB cable and connect the red and black wires to the +5V and GND pins below. If you've already soldered the micro-USB connector, simply connect it using a regular cable to the current limiter. I preferred not to solder the connector too early because it's easier to solder the CH340E when the connector is not there.
 

When you find the voltages are OK, it's the moment to install the CH340E and the micro-USB connector. The CH340E is the second easiest device to install (the easiest one being the crystal oscillator). With the right amount of solder paste, it will align perfectly with no effort, even using your soldering iron if you want.

The micro-USB connector is less easy to solder because it can also float on the paste you'll place on the ground plane. Thus be careful and prefer adding some extra later to putting too much initially. If there's too much paste, it may short with the +5V pad under the connector (pin 1 at the top). It's wise to verify there is no such short with an ohm-meter before connecting again. Once connected again, the device will draw about 2.4 mA if the chip works. Connecting it to a PC will show a USB-serial port if everything works. If you get USB errors you might have a short between one of the DM or DP pins and the ground (or between each other).

 

Installation of the SoC

The SoC is totally scary. It has a 0.35mm pitch, its pins are 0.18mm wide and there's only 0.17mm spacing between them. How is it possible to solder this by hand, you will wonder? Just trust your skills and do it! It's easier than it looks! Just like for the EA3036 DC-DC regulator, I found that painting 4 thin threads of solder paste over the pins on each side and doing the same on the chip before placing it works pretty well:
 

I find it convenient to also put a bit of paste on the chip itself and some flux on the ground plane. This is particularly handy after replacing the chip because the board is not perfectly smooth anymore and this allows each pin to start to grip the pads. It was mostly necessary for the v1 board, the v2 is of much better quality and I've made one without needing this:
 

 
However I do have a few warnings to mention:
  • the chip is large. Well, by "large" I mean that it has many pins per side (20) and that it is absolutely crucial that the paste is evenly spread between them on each side. If that's not the case, some pins will not be connected and you'll need to desolder it and try again. Be very careful about this.
  • only hold the chip using the tweezers. It has nothing to do with static electricity, it's just that if you hold it using your fingers, and put some paste on the chip itself, you can be certain that you'll put them in the paste at some point and make some areas uneven.
  • put very little paste on the ground plane, less than one cubic millimeter, or the chip will not touch on the sides. Worse, the ground plane acts like a heat spreader and once soldered it will be very hard to melt it again or remove any excess solder:

  • perfectly align the chip before heating. When the solder paste is placed, you won't see the alignment points anymore. Don't confuse the lines used to show capacitor locations with alignment points, they're not the same. In case of doubt, make sure there's an even pad width on each side and you'll be fine.
  • if you put too much solder paste on the pads, for example if there is some paste making bridges with the ground plane, better wipe it and try again.
  • do not wait too long after you've placed the paste, especially if you diluted it with flux, because it will progressively spill and expand.
Once properly placed, heat the SoC vertically using the hot air gun, and be patient. Make sure the PCB is not directly in contact with cold metal, as the heat from the ground plane could spread into it and prevent the solder from melting. Once the solder paste melts, you may see the alignment marks again. If you don't see them, it means they're perfectly under the chip. If the chip is properly aligned, the pads will look even all around. If you have the slightest doubt, don't worry, it means it's OK, because when it's wrong it's totally obvious, with 0.7mm difference in clear area between two sides.
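To see where a figure like 0.7mm can come from, here is a small worked example using the dimensions given earlier (0.35mm pitch, 0.18mm pins, 0.17mm spacing). It rests on my own reading of the sentence above, namely that a chip landing one whole pin off shows one pitch more bare pad on one side and one pitch less on the opposite side. The function name is made up for the illustration:

```python
# Dimensions from the text, in micrometres to keep the arithmetic exact.
PITCH_UM = 350      # pin pitch (0.35 mm)
PIN_WIDTH_UM = 180  # pin width (0.18 mm)
GAP_UM = 170        # spacing between pins (0.17 mm)


def clear_area_difference_um(offset_pins):
    """Difference in visible pad length between two opposite sides
    when the chip is shifted sideways by offset_pins whole pins
    (assumed model: one side gains the shift, the other loses it)."""
    shift_um = offset_pins * PITCH_UM
    return 2 * shift_um


# The pitch is simply one pin plus one gap.
print(PIN_WIDTH_UM + GAP_UM == PITCH_UM)

# A chip that is off by one pin shows this much difference, in mm.
print(clear_area_difference_um(1) / 1000)
```

Under that assumption a one-pin error gives 0.7mm of difference between two opposite sides, which is about four pin widths and hard to miss.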

Now's time for visual inspection using a microscope. If any pin looks disconnected (the pad looks like there's very little solder on it), you can try your luck by filling the hole using solder paste and melting it again, but it's likely that you'll need to desolder and resolder.

Once OK, you'll just have to clean the bridges. Do not worry if what you start with looks awful like this (these chips were replaced several times due to misconnected pins and the excess of solder starts to be quite visible):
 

Dispense some flux all around the chip, and slowly wipe it with the tip of your soldering iron in one direction from one corner to the next one. It will melt the solder under the tip and remove the excess that will stick to the tip. Just clean the tip after each pass. A knife-shaped tip is better for this operation but is not mandatory.
 

You'll need to visually inspect it again using a microscope, and you may have to clean up the excess flux using isopropyl alcohol or acetone. I used to place the board in a glass and pour a bit of alcohol on top then jiggle it for a few minutes:

A quick inspection after this shows that it's much better:

Note that there are two places where adjacent pins are connected together and these are not short circuits.

 

Install the flash

Like anyone who embarks on building an electronic circuit, you've read the whole doc before starting, haven't you ? :-)  So you know it can take one hour to program the flash and that it had better be done before finishing.

I programmed mine using a Bus Pirate board, and it's indeed extremely slow (1 hour). It can even be twice as slow if you enable verify. The annoying part is that programming the flash in-circuit doesn't work, so you really need to flash a working image before soldering it, and if it doesn't work you'll need to desolder it and fix it.

The flash is pretty easy to install. Just tin one pad on the board, then using tweezers, you place the chip while keeping the pad hot, until all pins are perfectly aligned with the other pads. Then you just need to solder the other pads and that's done.

You can clean the board, if you're lucky you're done with this side :-)  Doesn't it look awesome for something home-made ?

 

Quick test

When all this is installed, you can hope to start to communicate with the SoC. Locate the C38 capacitor at the bottom right below the SoC, it's the one used to reset the board. The reset pin is the one at the bottom, connected to the R23 resistor. Using your tweezers you can make a contact across the capacitor to reset the SoC again. Be extremely careful, the top pin is connected to +3.3V and the capacitor next to it is connected to 1.0V, if you short the two, you'll fry the chip. Just train yourself with the board disconnected first.

Then connect the board to your PC through the current limiter and possibly a current meter. Open a serial terminal at 115200 bps and observe. You likely won't see anything. Just reset the board using the tweezers as indicated above. You may see a greeting message speaking about IPL and HW RESET. This is starting to get pleasant, isn't it ? :-)
 
The board should be consuming about 80-100 mA at this point:
 

If it consumes much less (around 10mA) probably it's stuck in reset. Re-check all voltages and verify that the reset pin isn't pulled up, e.g. due to a short with one of the console pins. You can also check with an oscilloscope that you see a 24 MHz signal on the crystal oscillator pins. If it still doesn't do anything despite all voltages being good, it may indicate that there are still some bad solder joints below the SoC and that you'll need to fix them.
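The current readings quoted so far make a handy quick triage. The sketch below turns them into a small Python helper. The figures (80-100 mA when running, around 10 mA when stuck in reset, 250 mA for my limiter) come from the text, while the margins around them and the labels are assumptions of mine:

```python
# Figures quoted in the text, in milliamps.
RUNNING_RANGE_MA = (80, 100)  # SoC fitted and running
STUCK_IN_RESET_MA = 10        # typical draw when the SoC is stuck in reset
LIMITER_MA = 250              # where the current limiter kicks in

# Assumed margins (not from the text) to absorb meter inaccuracy.
RESET_MARGIN_MA = 10
LIMITER_MARGIN_MA = 10


def classify_current(milliamps):
    """Map a measured supply current to a rough diagnosis label."""
    low, high = RUNNING_RANGE_MA
    if milliamps >= LIMITER_MA - LIMITER_MARGIN_MA:
        return "limiter"          # likely a short: disconnect quickly
    if low <= milliamps <= high:
        return "running"          # matches the expected 80-100 mA
    if milliamps <= STUCK_IN_RESET_MA + RESET_MARGIN_MA:
        return "stuck in reset"   # re-check voltages, reset pin and crystal
    return "unexpected"           # in between: inspect before going further


for reading in (90, 10, 250, 150):
    print(reading, "mA ->", classify_current(reading))
```

Anything landing in "unexpected" deserves a look at the voltages before leaving the board powered.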

 

Now's time to finish

Disconnect everything, flip the board, solder the three 0R resistors (R11, R17 & R12), and the two capacitors C33 & C34:
 

 Solder the Ethernet jack, and you're done.
 

Your new computer should boot and greet you with a console. Note that by default U-Boot waits up to 10 seconds so the board may remain silent for 10 seconds after displaying some garbage. That's expected.

If things go wrong

I should have titled this "when things go wrong" because things will go wrong. A few of the possible issues are listed here.


Short circuits

The first issue I met with the initial JLCPCB board was that the solder mask was insufficient and there was no protection between adjacent pads. There were lots of shorts everywhere. I had to desolder and resolder the USB-serial chip 5 times, the micro-USB connector 3 times, the SoC 6 times and the DC-DC 8 times, all of this because of shorts under them. In such a situation there's no other solution but visual inspection using a microscope, and putting the smallest possible amount of solder paste under the chips, and most of it in visible areas so that excess solder goes where you can pick it. Fortunately with the latest versions of the PCB, the solder mask is excellent, the most accurate I've ever seen, and this problem hasn't happened a single time. However there can still be shorts between pins on the CH340E chip and between nearby components.
 
For the CH340E chip, the best way to remove thin shorts is to put some flux and use a clean soldering iron tip. The tin will melt and fix by itself. If there's too much tin, or if you accidentally add some with your tip, you'll need to remove it using solder wick. The tiny short below between pins 2 and 3 is quite scary given how thin it is, it was only visible in macro mode, but not under the magnifying glass. And it prevented the chip from working. Fortunately this cannot happen anymore with the new boards:
 

Be extremely careful with solder wick, it's always with this that copper tracks get pulled off, like here:
 
 
Never pull it if it doesn't want to come, it means some solder managed to cool down, and that you need to properly heat it up until it comes by itself. Overall it was needed with the version 1 of the board due to the numerous shorts and the incomplete solder mask, but with version 2 I never needed it.


For nearby components like a resistor and a capacitor, first check on the board layout whether they are supposed to touch or not, because most of the adjacent ones have pads in common and actually touch only because of this. If they are not supposed to touch, you'll need to separate them. Most of the time I try with the iron because I believe I can, but it's always a mess. You can't add more solder since there's already too much, and you can't heat both sides of the component without adding some. You end up making the component very dirty or displacing other ones. The cleanest method consists in patiently heating the area using the hot air gun placed vertically over the component at a low air flow rate, and waiting for its joints to melt so that you can pick it using your tweezers. It may happen that it will come with the other one it was attached to. It's no big deal, just pick them both and separate them (this time the iron might make this easier). Remove the excess solder, clean them up and place them again.

 

Misoriented components

Only for a few components the orientation matters: the 4 integrated circuits and the 2 LEDs. For all passives you don't care, neither do you for the crystal oscillator. The inductors require a bit of care, because they look perfectly square from above but only support 2 positions out of 4, so better double-check when placing them.

If you figure that you misplaced a component, you'll need to desolder it using the hot air gun, clean its pins, clean the pads, and solder it again. Note that it's exactly what happened to me with the EA3036 on the last board, I got a bit too confident when soldering it, and fortunately I rechecked before plugging. If you need to desolder and resolder the EA3036 or the MSC313E, you may need to put some paste on the chip's pins to compensate for the roughness of the board due to solder residue, as shown below:


For LEDs, you'd better check with your multimeter first. Many have a diode position allowing you to spot the + (anode) and - (cathode). Be careful if your multimeter only has a resistor position, as many of them have their polarity reversed in ohm-meter mode, so better double-check on a known diode first. Otherwise you can use a battery and a 1k resistor to figure out the LED's polarity.


Wrong voltages

If two voltages appear identical and match a correct one (e.g. 1.8V), you likely have a short between their outputs. 1.0V and 1.8V outputs are very close to each other at several places, and 1.0 and 3.3V as well, especially in the capacitors area below the SoC.
 
If two voltages appear identical and don't match any known one (e.g. 1.5V or 2.4V) you likely have a short between their feedback sensing circuits. This most likely concerns the 1.0V and 3.3V that are in the LED area at the top of the circuit.

If an output shows almost 5V it may indicate that a feedback resistor isn't properly connected, that the pin is tied to ground, or that the equivalent feedback pin on the DC-DC regulator doesn't properly touch its pad. You'll need to locate it on the schematic and find it on the layout to verify.

If an output shows 0V, it could indicate that the output pin of the DC-DC regulator doesn't properly touch its pad, that the feedback pin is shorted with a power pin, or that the output pin is shorted with ground (making the inductor heat up very quickly).

If voltages look inverted, or two voltages seem wrong, with one too high and the other too low, it's very likely that you've mixed up feedback resistors, so you'll have to recheck them.
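The rules above can be summed up as a small decision procedure. Below is a rough Python sketch of it for the three regulator outputs. The tolerances and the function names are assumptions of mine, and the order in which the rules are tried is just one reasonable choice, so take it as a memo rather than a diagnostic tool:

```python
from itertools import combinations

NOMINAL = {"1.0V": 1.0, "1.8V": 1.8, "3.3V": 3.3}  # regulator outputs
SUPPLY = 5.0             # input voltage
TOLERANCE = 0.1          # volts, assumed measurement tolerance
SUPPLY_TOLERANCE = 0.3   # volts, assumed meaning of "almost 5V"


def near(a, b, tol=TOLERANCE):
    return abs(a - b) <= tol


def diagnose(measured):
    """measured maps a rail name from NOMINAL to the voltage read on it."""
    findings = []
    identical_found = False

    # Two rails reading the same non-zero value.
    for a, b in combinations(sorted(measured), 2):
        va, vb = measured[a], measured[b]
        if near(va, vb) and va > TOLERANCE:
            identical_found = True
            if any(near(va, nom) for nom in NOMINAL.values()):
                findings.append(f"{a} and {b}: short between outputs")
            else:
                findings.append(f"{a} and {b}: short between feedback circuits")

    # A single rail stuck near the supply or at zero.
    for rail, volts in measured.items():
        if near(volts, SUPPLY, SUPPLY_TOLERANCE):
            findings.append(f"{rail}: almost 5V, check its feedback resistors and pin")
        elif near(volts, 0.0):
            findings.append(f"{rail}: 0V, check output pin, feedback pin, short to ground")

    # One rail too high and another too low, without being identical.
    high = [r for r, v in measured.items()
            if v > NOMINAL[r] + TOLERANCE and not near(v, SUPPLY, SUPPLY_TOLERANCE)]
    low = [r for r, v in measured.items() if TOLERANCE < v < NOMINAL[r] - TOLERANCE]
    if high and low and not identical_found:
        findings.append(f"{high[0]} too high, {low[0]} too low: mixed-up feedback resistors")

    return findings


print(diagnose({"1.0V": 2.4, "1.8V": 1.8, "3.3V": 2.4}))
```

An empty list means every rail is within tolerance of its nominal value.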


The system doesn't start

The yellow LED (top left) should light firmly when things are OK. If you don't see it turn on or if you observe just a faint glow in the dark, the chip doesn't boot. For example it just stops on "IPL" or "HW RESET". This could indicate it doesn't see the flash (poor connections, bad contents), doesn't see its internal DDR (bad connection under the VDD pins at the top left corner of the SoC) or that some of the 0R resistors are missing on the other side.


The system displays output but takes no input

This just looks like a short on the Rx pin of the SoC or a misconnected pad, look under the bottom right corner.

 

Some chip heats a lot

That's not expected. The EA3036 remains barely warm to the touch, the flash and the USB adapter are totally cold, only the SoC heats up. Something heating up may indicate a short circuit or a dead component, so do not leave the power connected too long when checking, and avoid connecting it to your PC!

 

Did I kill that chip ?

I fried two regulators and one SoC with the first generation of boards that brought me tons of shorts, and I didn't have my current limiting circuit. Other than that, the components resist severe abuse. Let me give a few examples:
  • EA3036 desoldered/resoldered 8 times
  • SoC desoldered/resoldered 6 times. Kept at 650°C for more than 30 seconds at least once when it refused to desolder (the PCB started to turn slightly brown).
  • SoC wrongly powered on 1.8V several times due to shorts between 1.0 and 1.8V. The DDR controller finally died when 1.8V and 3.3V were shorted for an extended period of time, but the SoC continues to respond to reset.
  • flash can be desoldered/resoldered many times without losing contents
  • crystal oscillator fell on the floor with no damage

So if you're not seeing something working, it's natural to wonder if you did anything wrong that might have resulted in killing the chip. But I'd say it's extremely unlikely, even with repeated mistakes. As long as the voltages are present on output, the DC-DC regulator does its job. As long as the 24 MHz crystal shows a signal, the SoC isn't completely dead. You'd better desolder, clean the pins and resolder than consider you killed something.

What can be done with that board

The board is still in active development, Daniel got a number of his patches integrated into kernel 5.9, and the remaining ones will hopefully be in 5.10. His development branches already work pretty well for daily use. As such, in its current state, this board lets you do whatever can be done with a computer having 64 MB RAM and an Ethernet port. It can be a small proxy, SSH bounce point, VPN gateway, sensor for various stuff, etc.

In addition Daniel has already made some breakout boards for micro-SD cards, GPIO and USB connectors. For example with a USB-OTG port, the card could become a smart USB-ethernet port, featuring internet connectivity via a VPN. With I2S and micro-SD it could be used as a music player. It could also be integrated into a digital music instrument to record/play MIDI files stored on a remote computer.

There is also a WiFi breakout board supporting small and inexpensive USB WiFi cards, opening the field of IoT, where this board can become much more interesting than the regular ESP32 devices:
 

There are countless possibilities in fact, and seeing that thanks to Daniel's work you can now build your own Linux computer yourself opens even more possibilities.

Perspectives

Actually while I wasn't initially much interested in playing with that board since I didn't have an immediate use for it, I figured it could make a very good option for our next generation ALOHA Pocket at work. It's more powerful than the previous one made out of a GL.iNet 6416A, will work with a mainline kernel, which will ease our port, and really looks like an awesomely geeky device! At least I was curious enough to try to boot the ALOHA on it and it worked pretty well ;-)

I'm pretty sure there are plenty of usages to be found in home automation. Let's just connect the WiFi breakout board and a touch display over SPI to see what can be done. Currently equivalent devices run on microcontrollers with limited or even insecure HTTP/TCP stacks. Here you could have a full blown HTTP/REST/H2/gRPC/MQTT/QUIC/whatever over TCP/TLS and the standard tools you use every day to program your home applications. Plus you could ssh into your device to upgrade or debug it.

Improvements

Daniel is looking at other chips of the same family. Some have more RAM, more cores, more MHz, etc. I think that in the current state, the only trivial improvement could be to double the NOR flash size and/or to add an SD card adapter. It's interesting to follow Daniel's work on the kernel, as the chip supports a watchdog, some crypto extensions, a hardware RNG, sound and plenty of other stuff. There's still some work to do, and you can help by testing the development code and giving feedback to Daniel.

Final words

Not only do I want to thank Daniel for sending me this kit and making me push my soldering skills beyond their previous limits, but I really want to congratulate him for this absolutely awesome project. I'm willing to help him as time permits to review and test code. We really need such projects to clean the current state of the IoT world by deploying clean and secure operating systems. And such boards are fantastic for education. I've already bought more packs of components to give away and let others make their own boards in turn, hoping this can become viral :-)