Willy Tarreau's stuff: 2024

2024-12-08

Adding a cheap and simple RTC to Rockchip devices

Background

There are plenty of nice devices these days designed around ARMv8.2 SoCs such as RK3568 or variations around RK3588. Many of them have been using the HYM8563 I2C RTC chip for about a decade. This device is reasonably cheap, requires few components, consumes very little power and is long proven to work well. Despite its low price, entry-level devices are often lacking it and only have the pads on the board, which is understandable when every dollar counts. Some such devices include Radxa's E20C and E52C mini routers/versatile servers, which are absolutely awesome devices, which come with either dual-1Gbps or dual-2.5Gbps, feature a USB console so as to always provide a local access, and have a metal enclosure. But they're lacking the RTC, which is quite annoying for a firewall or mini-server, as a power outage always has a painful effect on the dependency chain at home (typically the OS boots faster than the ISP's box and has to start with a bad date). At least once that issue was reported, the products' creator, Tom Cubie (aka @hipboi) acknowledged the problem and suggested that new devices should have it.

Can I do it myself ?

How to proceed with existing devices in the short term ? Isn't it possible to just order the chip and solder it on board ?

One problem with HYM8563 is that it's almost always only found in TSSOP8 format, which means it's a few millimeters wide, with a pitch of 0.65mm, which means that pins are roughly 0.35mm wide with a spacing of 0.3mm between. Actually that's not that difficult to deal with, provided that you have a fine enough soldering iron. The real problem is the components that come around are also of the same scale and very close to each other, as can be seen on the photo of the E52C below (click on the photo to zoom):

The resistors and capacitors are in 0201 format, which is 0.6mm tall by 0.3mm wide, and are very difficult to solder without causing a short circuit. To get a sense of the scale above, the chip is 3mm*3mm. The crystal oscillator is a 32.768 kHz in a flat format that's not easy to find for a hobbyist.

However, the I2C pads (SDA and SCL) are "large enough" and moderately accessible, and there's power on the other side, let's keep that in mind.

Finding a pre-made board

There are various I2C RTC boards available on the net. Most of them are DS1307, and a very nice small one is based on DS3231 with a battery like this one. However, there's no HYM8563 one, and I'd prefer to stay on the same that is referenced in the DTS so that it works out of the box. But I could find the chip alone.

Making a new board instead

Then I started to think whether I could make a board myself based on the chip. Looking at the HYM8563 datasheet above reveals that it's actually not that hard to assemble the few components around. I drew a schematic on by hand (takes one minute vs one hour in Eagle):

I found a few rare occurrences of the chip in SOP8 package, which is easy to deal with, so I could make a PCB with that chip and the 6 components around it. I ordered a pack of 10pcs of that chip, and unfortunately received the tiny TSSOP8 ones instead :-(

But that reminded me that I had some small TSSOP8/SOP8/DIL PCB adapters, which support SOP8 on one side and TSSOP8 on the other side, with the DIL pads in the holes. These are convenient for on-air cabling since pads of both sides are connected together and to the holes:

After all, there were so few components that I could probably solder them directly on that PCB on either side. So let's try.

Bill of materials

Based on the schematic, I'll need this:

1 such PCB
1 HYM8563TS chip
1 diode in 0402 format
2 4.7k resistors in 0402 format
1 82pF capacitor
1 crystal oscillator
1 "large enough" capacitor (I counted about 6mn per 100µF), supporting at least 3.3V

The secret to avoid shorts when soldering TSSOP chips is to use solder paste. I have some in a syringe that I keep cool in the fridge (otherwise it dries in a few months and is unusable the day you need it):

Assembly steps

As a first step, we'll need to cut a trace on the PCB so as to isolate the VCC input pin from its connection to the IC's pin 8, as we'll want to place the diode there instead. We'll need to keep the resistors connected to the external VCC so that the capacitors doesn't discharge its energy into them when off. Thus we'll cut the trace after the via that connects to the other side. What's nice with these boards is that the pads of both sides correspond to the ones of the other side at the same location, so they're easy to match. The final pinout of the board will look like this:

So let's cut the trace:

Now we're going to place the chip before the diode, because it's already painful enough to solder such small chips, we don't want to be hindered by the diode. The approach for this is to place a very little drop of solder paste over all the pads on each side of the chip. Think in terms of volume, considering that the solder paste is mostly flux that will disappear (the flux will avoid shorts by making it difficult for the solder to make bridges between pins). Count that the resulting volume will be roughly 1/3 of the disposed one. Making a small trail roughly as wide as a pad is a good estimate. Make very sure not to leave any beneath the chip, as it will never melt and will stay them forever, risking to make shorts later. Only cover the pads and areas you can clean later. Once done, just place the chip over the paste:

Now solder everything with the soldering iron, without worrying about the risk of shorts, just focus on aligning the chip as best as possible with the pads, and melt absolutely all the paste. Then check with a magnifier or better, a microscope that everything's OK. With a multi-meter you need to check there's no short between adjacent pins:

Now's time to scratch the right side of the track and to place the diode, with the positive on the right and the negative on the left:

Let's now flip the board to solder the resistors. They will attach to the two bottom left pins, and connect to the beneath pad corresponding to the Vdd pin, which is in fact connected to the positive pin at the bottom right. It has the pleasant advantage of being placed immediately next to the two pins, and close enough to have the resistors directly touching on both ends:

Now let's connect the capacitor between the chip's Vss pin and the top rightmost hole that's connected to the chip's pin 1:

On the other side, the oscillator can be soldered. The pins on the left (when reading the reference) or under the notch are the useful ones. The two other ones on the right are not connected so I could connect them to the pin1 pad, though that will depend on the model since some have integrated capacitors and might not work well when doing so, but I could verify with my multi-meter that there was no capacitor there. In order to solder these pins beneath the component, solder paste is your friend again, being cautious again not to put too much:

We're only left with the capacitor that serves as an energy storage during outages. Ideally you'd use 5.5V super capacitors of 0.1F for several days, or the new 3.8V lithium-based supercaps that have even higher capacities (typically 10F and above). But given that my goal here really to just cover occasional power cuts from the mains, when a power supply dies or when I need to move cables inside my rack, I don't need more than a few minutes. And I did happen to have a 4V/150µF capacitor that perfectly matched my needs (should support 7 to 10mn of outage, and was super flat).

Soldering it is just like for the oscillator above except that it's not easy to use solder paste. The positive terminal (the one with the colored bar) needs to be connected to pin 8 of the chip (the top leftmost here) and the negative to the topmost right hole. Soldering that one is not too hard, just melt some solder inside the hole. Alternately, using thin wires is OK as well.

Now's time to connect wires and test:

I connected these to another board that I'm using for testing I2C, and ran i2cdetect:

Good, the device appears at address 0x51, so at least it's detected. Now we need to verify that its oscillator is properly ticking. For this we'll have to write the current time to the chip (which appears as rtc1 on this board). It will emit an "invalid argument" because it first tries to read the time which contains a bit "VL" ("Voltage Low") set in the second field indicating the the device lost power since last time it was set. But we can ignore it when writing the date, reading it next will indicate whether it works or not (the time must change):

Perfect, it works! Now's time to replace my testing wires with thin ones, to put all of that into shrink tube to protect it and solder it in the device.

Installation

Let's spot the 4 pads we'll need inside the E52C. First, at the bottom of the board, we'll find the I2C SDA and SCL pins at the bottom on these photos, where we'll solder our SDA/SCL wires (violet and green here). They'll have to pass between the board and the enclosure so they must be very thin, but where I'm passing them, there's enough room:

Next step is to install the module on the other side of the board. We're going to glue it on top of the micro SD card reader with some double-sided tape. The capacitor close to it has both positive (3.3V) and negative and is large enough to support soldering directly to it:

Conclusion

It was fun to make but took me most of the day to build; soldering small components requires delicate manipulations, and dealing with flux and solder paste requires lots of cleaning along the operations. I've made two modules so I still have one extra left. This will allow me to migrate another of my machines to a new one, but I'm impatient to see them produced with the chip already soldered so that I don't have to do this anymore!

2024-07-28

Improving A Laser Engraver's Resolution And Accuracy

Baby steps are important

Four years after I wrote about some improvements brought to my Eleksmaker laser engraver, I made quite a lot of progress on multiple fronts.

Laser module

I was regularly annoyed by the too irregular laser beam and finally acquired a NEJE A40640 model that's supposedly 15W optical, made of two diodes and that contains lenses offering a Fast Axis Correction (FAC) to reduce the beam's divergence. The result is an almost square beam that's roughly 60x80 µm by default and can be shrunk to even 60x60 µm at a shorter distance, or be made less than 60 µm wide if made taller.

This constituted the most important improvement because with a poorly shaped beam you can't do any good work. I must say that as of now I think I will never buy again a module without FAC. Think about it, previously if you wanted to cut a circle in wood, half of the circle was completed (e.g. horizontal direction) while the other one was still burning large areas due to the line-shaped beam. Now the same amount of energy is spread in all directions and can be made very narrow.

Air assist

I had been trying various approaches using aquarium pumps to implement some form of air assist to blow the smoke and make a better cut, but these didn't work well. The air flow was made of many irregular pulses and you could hear "puff puff puff" at the output of the laser's head.

I finally decided to order NEJE's air assist accessory for my laser module. Strangely it didn't fit well, it was forcing against the lens' screw. I suspect that the module evolved a little bit since I acquired mine and maybe some dimensions were slightly adjusted on new ones. Nevertheless, I could enlarge the opening of the air assist head so that it fits on my module.

And to get rid of the bad pumps, I finally ordered an AtomStack air pump. The device is really great. It takes 12V input, has a potentiometer to set the air flow speed, and has an output. I just assembled their hose with the one I already had. It further smooths the air flow and now at the output of the module, the flow is super regular and can vary from tiny to quite strong. In practice, most of the time, it's sufficient to set it to 1/4 to 1/3 of the speed in order to blow the smoke away:

The results are definitely better, but contrary to what is often seen on ads, it's not on wood that I found the most impressive results, but when engraving images or cutting acrylic. Previously the smoke was causing the beam to diverge a bit, making the result less precise. Now it's super accurate. It's important however to make sure that the piece being worked on doesn't move, because a moving air stream on top of it can make it move (which is another reason for not setting it too strong). That used to be true already for the module's fan anyway. I found that using a few tiny neodymium magnets can be very effective.

Changing the motors' resolution

OK now we're having a nice laser head with much better precision and less disturbances caused by smoke. Isn't that enough ?

In fact I'm using my engraver mostly for PCBs, sometimes for cutting stuff (wood, acrylic), and sometimes to engrave drawings or photos.

For PCBs you do want to have good resolution, otherwise you can't make a track pass between two integrated circuit pins. Or it will touch one side, or be too thin and disappear while etching. With 0.1mm resolution, when you have 0.6mm between two pads of an IC chip, that leaves a single space of 0.1mm on each side, and 0.4mm for the track. Or 0.2mm on each side and 0.2mm for the track. You don't even have an option of 0.15mm each and 0.3mm for the track. And it depends how these are aligned with the motors' steps.

For photos, you generally use Floyd-Steinberg dithering which further reduces the photo's resolution, and when working with 0.1mm dots, that becomes quite visible.

I had been thinking for a long time if it would be possible to find motors with more steps per round. But there was another option that suddenly came to my mind and that I had not yet been considering: what about finding pulleys with less teeth so that it requires more steps to make the same distance ? I searched the net for a few hours and found that the type of belt I'm using is designated as 2GT or GT2 and has a step every 2 mm. My pulleys had 20 teeth and a 5mm axis, and I found others with 16 and even 12. There is a 10-teeth model as well, but only for 4mm axis. So I ordered these, and managed to install the 12-teeth on my motors to replace the 20-teeth one. Here's the photo from the vendor's site:

One assembled, it looks like this (replaced, and with the old one as a comparison):

Doing this requires to adjust the number of steps per mm. It changed from 80 to 133.333 in GRBL's settings, but that's all that needed to be adjusted. I feared that the head would travel slower, but that's not the case. Apparently the speed is more limited by the head's weight and the motors power. However, instead of having a reproducible resolution of 0.1mm, I'm now getting 0.06mm, which precisely is the default size of the beam. Converted to DPI, that's 423 DPI.

This new resolution now allows to export PCBs as images and print them, the result is now good enough. And that's quite convenient because it also means using the same tool for all exports, with the same coordinates system, giving the ability to produce multiple images of various planes, such as the cream plane used to remove varnish and expose solder pads:

In terms of images, this has tremendously improved the result. The dog photo at the bottom is 675x875 pixels and is printed on black-painted aluminum at 0.06mm per dot, resulting in an image of 41x53mm. The result is really impressive, that's 16.6 pixels per mm, or 278 pixels per mm², vs 100 before. That multiplied by 2.78 the pixel density hence the possibilities of nuances in an image.

One problem however is that an image is almost twice as long to print now because there are almost twice as many lines. That was the opportunity for another improvement.

Bidirectional printing

In order to improve print performance, one possibility consists in printing in S form instead of Z, that is, printing even lines from left to right and odd lines from right to left, effectively avoiding a slow return-to-home operation after each line. The software I wrote to convert PNG images to GCODE, png2gcode, already supported such bidirectional printing, but this had always been ugly at high speeds. It was quite visible that there was an offset between each direction. This is not surprising, for three reasons:

the startup acceleration is not necessarily the same in both directions. However this was addressed long ago with an option to add an acceleration margin to both sides, that I'm typically setting to 3mm ("-A3") so that the beam arrives at full speed on the first pixel to be printed.
micro-stepping is used to control motor positions. Despite using the high-quality TMC 2209 drivers, which take the delivered energy into account to make steps homogenous, it's understandable that the belt's elasticity will not reproduce the exact same position when pulled in one direction or the other, and that it will depend on its tension. Here the belt is quite tight, but tightening it too much can also make it difficult for the motor to make it move in micro-steps.
the instruction processing time in the micro-controller counts as well.

Till now, png2gcode would offer an option to set an absolute offset for right-to-left printing (that corresponds to the belt tension), and a time-based offset as well for the processing time. However, approximations that had been used till now were hardly reproducible.

The new laser head combined with the new gears was a great opportunity for trying to improve the situation by taking new measurements.

The test consists in printing on anodized aluminum, a rectangle that's 7 pixels high, 40 pixels wide, with a vertical line at 0, 5, 10 and 20 pixels. When printed only left-to-right ("-Mraster-lr"), it's perfectly regular. When printed in bidirectional mode ("-Mraster"), it's visible that every other line is shifted right by one or a few pixels. The same rectangle was printed at speeds of 600, 1200, 2400 and 3000 mm/min from top to bottom. The pixels are 0.12mm wide. The expected pattern is easier to understand on the top and the deformation is increasingly visible as speed increases. The photo had increased contrast to better see the dots:

This allows to see how much variation there is between them, explaining what is dependent on time, and what is fixed. After some calculation and tests, it appeared that the pixels when going right to left had to be shifted left by 0.12mm and delayed by 2.6ms. This delay is converted to mm depending on the travel speed so that in the end it gives only a distance.

With the right adjustments it's possible to align left-to-right and right-to-left and almost double the print speed. Here's a capture of the final results. It's still visible on the large rectangle that there can be around 10 µm variations in positioning because the vertical lines are not always perfectly straight, but that's very hard to notice on the microscope, let alone to the naked eye! The one on the right was printed at 0.06mm per pixel, and there the positioning resolution remains imperceptible.

The image of the lunch at the top of the sky scraper at the bottom of this page is 1206x943 pixels, rendered on a visit card of 72x56 mm, and took approximately 25 minutes to engrave. With unidirectional printing previously it would have taken approximately 45 minutes (it's not exactly twice as long because the return can be a bit faster when not engraving).

Beam narrowing

As can be seen on the test image below, the beam can be made narrower when it's a bit taller. This is important because if the beam is as large as a pixel, then when it sweeps an area as large as a pixel, it has effectively engraved two pixels. The image shows a test pattern with one dot every 0.12mm (the ruler is in millimeters). A zoom on the dots shows they're between 25 and 30 µm wide and approx 100 µm tall:

We can then exploit this principle to consider that the beam will imprint larger than desired and consider this for granted as soon as the beam turns on.

Example: let's say we want to print the green dashed line below (each square is one pixel the size of the beam). The laser dot (in blue) by default will scan from the left of the square to the right, and will effectively span twice its size. The part that was constantly under the beam will have received more energy, and the borders which were only a limited time under the beam will have received less. As can be seen as the beam advances from left to right (line after line), the pattern is reproduced but the contrast is limited.

Now if we take that beam width into account, we can make the beam start to light up later and extinguish it earlier, still providing a shadow around the borders but leaving the intervals totally unexposed. This is more efficiently done by having the dots twice as large as the beam and reducing the beam duration in half, as this would then preserve the intervals.

When taking all that into account, it becomes possible to print photos on painted metal after passing them through Floyd-Steinberg dithering and produce such stunning outputs:

When zooming in macro mode, it's possible to see the dog's hair as 1/16mm dots:

Similarly, a soom at the center/bottom of the lunch photo, zooming on the metallic beam close to a building shows some of the details, then we can zoom further on the beam and it's visible how the gray is obtained by alternating clear and dark lines:

Wrap up

All of this is a question of patience and experimentation. Now I'm able to easily print a photo on metal by using bright black paint and have enough resolution to almost always do it well on the first pass. In the past I had to adjust power, speed, and color conversion to compensate for the risk of too wide pixels ruining everything. That's no longer the case, as can be seen above with direct printing of photos at very high resolution!

2024-05-21

An affordable 10GbE capable NAS

Background

Two weeks ago, in early May, Tom Cubie of Radxa asked me if I would be interested in testing their new ROCK 5 ITX board. I've followed a little bit the development of this really nice board, and Tom probably valued some of my tests and comments during the ROCK 5B debug party. And he knows I'm not shy when I disagree with certain choices, and he confirmed he doesn't seek a complacent review at all, so that was enough for me to accept the offer, because I'm one of those who believe that making issues public is the most efficient way to collectively find the best solution to them.

First impressions

Unpacking

A week later I received a parcel from DHL. First impression, the board is packaged just like a PC mother board. There's a rationale for this, it's an ITX form factor, aiming at being installed in a PC enclosure. The customer must feel in known territory when installing it ;-)

Front panel and buttons

Like on a PC mother board, there's a front panel connector with Power switch, Power LED, Reset switch, HDD LED. These are nice improvements since many Arm-based boards are missing a reset button which would be appreciated during development and kernel porting. However, here a button could have been placed on the right of the SPDIF connector for debugging periods when the card is left on a desk without anything connected to this front panel connector. That's no big deal since a screw driver suffices to make contact to the pins, but it would be cleaner. Some PC mother boards have adopted this principle nowadays by the way:

Installing a cooling solution

One excellent design choice on this board concerns the cooling. Radxa adopted a design compatible with LGA115x thermal solutions. This means that instead of having to resort to mixes of inefficient and inconvenient solutions as is often the case, here reusing an old PC heat sink will work. I found one from an old 1U server in my tray and decided to install it. It even has the PWM pin to control the fan's speed (which I won't use except for testing).

Installing a serial console

Another point to note regarding the connectors is that there is no externally accessible serial console, though there's a pin header on the board next to the micro-SD connector. Serial connectors are still needed in the Arm world because of the boot loaders. While you can often do most of the day-to-day operation using an HDMI display and a USB keyboard, each time the machine fails to boot, the only option is to pick the screw driver, open the box and connect a UART connector inside to fix the boot problem. This tends to be less of a problem with systems adopting the Arm SystemReady approach which provides a PC-like UEFI BIOS where you can really control everything from the early boot, and normally don't have to fiddle with low-level commands just to load a recovery kernel. Here there's no UEFI at the moment so the only recovery option is the serial port. And in general I don't like the idea of having to plug/unplug a screen connector and move it around in the rack between all my machines, it even happens quite often that a sick machine fails to enable the frame buffer and display anything. The USB serial console is much easier to use and allows for multiple machines to cross-connect so that all your machines in the rack are accessible at the same time from the same display.

I noticed that the connector has the same pinout as more and more boards I'm seeing these days, so I could reuse an adapter I prepared for another board. This one is cheap and based on a CH340N. It's tiny, only requires to solder 3 wires, doesn't cause trouble when not powered, and support Rockchip's speed of 1.5 Mbps.

Surprise of the power connector

When trying to plug the 12V adapter cable that was lying on my desk, I noticed it wouldn't enter. I looked closer and saw a huge central pin. Grrr... it's a 2.5mm one. These ones are really really not common. I checked all my adapters here (about 20, all voltages included). None of them had a 2.5mm connector, all were 2.1mm. Fortunately I had a 2.5mm male jack and a 2.1mm female one, so I could make an adapter to connect the 12V power block.

I suspect that the reason for using a larger connector than usual is to make sure users don't accidentally connect a laptop 19V power input. That's understandable of course. Another option could be to make the board accept a wider input voltage range. Some PC boards do this. For example some boards will take 8 to 25V, and will only need more than 12 if they really need to deliver 12V. Most of the time the 12V pins are not used, they're basically only used for SATA spinning disks, but most often not even for SSDs.

First power up

Once plugged in, the console immediately shows that it sees 8 GB of LPDDR5 DRAM installed as 4 banks of 2 GB each and that the speed is configured to 2400 MHz, hence 4800 MT/s:

DDR 9fffbe1e78 cym 24/02/04-10:09:20,fwver: v1.16
LPDDR5, 2400MHz
channel[0] BW=16 Col=10 Bk=16 CS0 Row=16 CS=1 Die BW=16 Size=2048MB
channel[1] BW=16 Col=10 Bk=16 CS0 Row=16 CS=1 Die BW=16 Size=2048MB
channel[2] BW=16 Col=10 Bk=16 CS0 Row=16 CS=1 Die BW=16 Size=2048MB
channel[3] BW=16 Col=10 Bk=16 CS0 Row=16 CS=1 Die BW=16 Size=2048MB
Manufacturer ID:0x6
CH0 RX Vref:29.7%, TX Vref:19.0%,0.0%
CH1 RX Vref:31.0%, TX Vref:19.0%,0.0%
CH2 RX Vref:31.8%, TX Vref:20.0%,0.0%
CH3 RX Vref:28.5%, TX Vref:20.0%,0.0%
change to F1: 534MHz
change to F2: 1320MHz
change to F3: 1968MHz
change to F0: 2400MHz

The system starts to boot to an pre-installed debian 11 image and presents a login prompt showing that the host name is called "roobi". The console is polluted a bit by some bluetooth messages (there's no BT device on this board):

[   24.409994] dma-pl330 fea30000.dma-controller: fill_queue:2263 Bad Desc(2)
[   24.463642] dma-pl330 fea30000.dma-controller: fill_queue:2263 Bad Desc(2)
[   24.685977] Bluetooth: hci0: command 0xfc18 tx timeout

Debian GNU/Linux 11 roobi ttyFIQ0

roobi login: [   27.981985] dma-pl330 fea30000.dma-controller: fill_queue:2263 Bad Desc(2)
[   32.792957] Bluetooth: hci0: BCM: failed to write update baudrate (-110)
[   32.793062] Bluetooth: hci0: Failed to set baudrate
[   34.926158] Bluetooth: hci0: command 0x0c03 tx timeout
[   42.819615] Bluetooth: hci0: BCM: Reset failed (-110)

roobi login:

At this point I tried many combinations of"root/root", "rock/rock", "radxa/radxa", "roobi/roobi", but none worked, so I looked on the net and couldn't find any relevant info. When rebooting I noticed that u-boot proposes a second boot choice:

U-Boot menu
1:	Debian GNU/Linux 11 (bullseye) 5.10.110-33-rockchip
2:	Debian GNU/Linux 11 (bullseye) 5.10.110-33-rockchip (rescue target)
Enter choice: 2

But the result is basically the same, no valid login/password found:

Cannot open access to console, the root account is locked.
See sulogin(8) man page for more details.

Press Enter to continue.
[   24.167338] Bluetooth: hci0: command 0xfc18 tx timeout
[   32.060943] Bluetooth: hci0: BCM: failed to write update baudrate (-110)
[   32.061077] Bluetooth: hci0: Failed to set baudrate
[   34.194156] Bluetooth: hci0: command 0x0c03 tx timeout
[   42.087379] Bluetooth: hci0: BCM: Reset failed (-110)


[   87.668622] dma-pl330 fea30000.dma-controller: fill_queue:2263 Bad Desc(2)
[   87.735524] dma-pl330 fea30000.dma-controller: fill_queue:2263 Bad Desc(2)


Debian GNU/Linux 11 roobi ttyFIQ0

roobi login: [   91.286888] dma-pl330 fea30000.dma-controller: fill_queue:2263 Bad Desc(2)

Request for help and first surprises

Since the board is pretty new, I suspected that the login/pass were well-known by a few users and not yet put into an easy to find documentation. I knew that a few other people had got their hands on this board as well, so I asked for help on the Radxa forum. Thomas Kaiser responded, suggesting that apparently I was not supposed to log in there, because this "roobi" image was in fact an installer that's supposed to be used via a keyboard+display or via a browser. Some doc for it is found here.

A first feeling of over-engineering and needless complexity started to build up. Usually it's as simple as downloading an image of choice, writing it on a micro-SD or USB thumb drive, plugging it and booting off it, and you're done. I generally don't like installers that tend to half-work and not to let you decide what nor how you install, or easily leave the system in an unrecoverable state. Thomas found in the image build scripts that a user "ps/ps" is created.

I tried it and this user worked, even though it shows some syntax errors in some scripts:

roobi login: ps
Password: 
Linux roobi 5.10.110-33-rockchip #65700d485 SMP Wed Apr 3 04:26:57 UTC 2024 aarch64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Fri May 17 19:29:29 UTC 2024 on tty1
-bash: [: : integer expression expected
-bash: [: : integer expression expected
-bash: [: : integer expression expected
ps@roobi:~$

This account is allowed to sudo and from there it's possible to create new users and install packages, so I could run some frequency tests and DRAM latency tests locally. BTW, sshd is already present and enabled, and gcc is there as well.

After all these tests were done, I tried to connect using a browser to this machine's IP on port 80. I was presented with an installation screen, that required me to contact the power connector 3 times to validate that I was on the right board:

Due to remote access to this device, authentication is required.
After clicking the start button, please press the power button
three times within 60 seconds to complete the verification.

Note that I still had a root shell on this machine over the network, so it's definitely not a security measure, most likely it's just a way to avoid mistakes when installing multiple boards in parallel.

From there I could choose to install one between two possible debian images (how one is supposed to install other operating systems with this is unknown for now, maybe one needs to enter a specific URL, still complicated when you already have your image on an SD),.

When choosing the installation target, the installer lists available block devices. Here, "no device available" is displayed.

Thomas suggested that Radxa did not intend for the eMMC to be usable by the end user and instead waste it to host this absurdly huge installer (a full fledged debian distro). Now this is where it sounds absurd: they design a really great device, that corresponds exactly to the design one would expect as a server, with the right choice of connectivity, storage and extensions, but someone passes after this and says "no, please leave me the eMMC, I would like the installer to stay there forever so that I can spend my life reinstalling this board every day".

OK, I'm a bit sarcastic, but why suddenly ruin valid and efficient use cases for the sake of keeping an installer there that you'll need only once in the product's life ? Micro-SD is made for this! There are so many owners of competing boards that lack eMMC and that are asking to get one that it's really not morally acceptable to sacrifice the eMMC for an unused installer.

Migration of the installer to SD

The eMMC's layout is the following:

0 to 16MB: U-Boot, before the first partition
16 to 32MB: partition 1, 16MB FAT, mounted in /config
32 to 332MB: partition 2, 300 MB EFI, unused for now
332MB to 7.3G: partition 3, 7G ext4, debian 11 for installer

More precisely it looks like this:

$ sudo fdisk -l /dev/mmcblk0
Disk /dev/mmcblk0: 7.28 GiB, 7818182656 bytes, 15269888 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 388778EF-E065-4B77-A699-D0C2B1832E62

Device          Start      End  Sectors  Size Type
/dev/mmcblk0p1  32768    65535    32768   16M Microsoft basic data
/dev/mmcblk0p2  65536   679935   614400  300M EFI System
/dev/mmcblk0p3 679936 15269854 14589919    7G EFI System

Let's have a look at what options are offered to us by U-Boot. For this, we need to enter U-Boot. It interrupts during the first second when it lists the available system images, just pressing Ctrl-C returns to the U-Boot prompt:

=> printenv
...
boot_targets=nvme0 mmc1 mmc0 mtd2 mtd1 mtd0 usb0 pxe dhcp
...
distro_bootcmd= ... for target in ${boot_targets}; do run bootcmd_${target}; done

OK so the boot loader will check mmc1 (micro-SD) before mmc0 (eMMC). Let's just copy eMMC to a 8 GB micro-SD and try again. The boot loader will still be loaded from the eMMC (first 16 MB), and the kernel loaded from the SD. We'd like the OS to load from the SD as well, so for this we'll need to rename the UUID on the SD and adjust it in the extlinux.conf file so that only the SD is used. Let's reboot to the installer, insert a 8 GB minimum micro-SD, and prepare it this way:

roobi login: ps
Password:
ps@roobi:~$ sudo -s

# eval $(blkid -o export /dev/mmcblk0p3)
# echo $UUID
b055efba-0f72-448b-927c-07f40f2714c8
# NEW_UUID=$(cat /proc/sys/kernel/random/uuid)
# echo $NEW_UUID
5ee2e87c-0db4-4235-a5ec-aabe607d9c48

# dd if=/dev/mmcblk0 of=/dev/mmcblk1 bs=1M status=progress
# cat /proc/partitions
# e2fsck -f /dev/mmcblk1p3
# tune2fs -U $NEW_UUID /dev/mmcblk1p3
# mount /dev/mmcblk1p3 /mnt/
# sed -i -e "s/$UUID/$NEW_UUID/g" /mnt/boot/extlinux/extlinux.conf
# umount /mnt/
# reboot

Now the system reboots from the micro-SD. Logging into the system shows that it has only mounted the SD (mmcblk1) and not the eMMC (mmcblk0).

Installation to eMMC

Using the browser again to connect to the installer shows that now it properly lists /dev/mmcblk0 as an available installation device. Yay!

However when trying to continue, it starts to scare you by suggesting that everything will be wiped:

This made me hesitate for a while, but I thought it wouldn't make sense to wipe the boot loader parts on a target device from which the system later hopes to possibly boot (e.g. a micro-SD). So I finally confirmed and it installed on it. It took a few minutes after which it automatically rebooted. Of course, since I had left the SD card in, it booted again on the installer, but this showed me that it hadn't wiped the boot loader at least. Removing the SD card and booting again this time ended up with the Debian 11 prompt corresponding to the new image:

Debian GNU/Linux 11 rock-5-itx ttyFIQ0



roobi login: rock


Password:
Linux rock-5-itx 5.10.110-33-rockchip #65700d485 SMP Wed Apr 3 04:26:57 UTC 2024 aarch64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
rock@rock-5-itx:~$

The contents of /etc/passwd show that radxa/radxa is also valid. It's nice to see that the distro is not too fat and that there's plenty of room left to install whatever one wants to install on their NAS:

$ df
Filesystem     1K-blocks    Used Available Use% Mounted on
udev             3929848       0   3929848   0% /dev
tmpfs             813580    1700    811880   1% /run
/dev/mmcblk0p3   7116452 1396500   5393508  21% /
tmpfs            4067900       0   4067900   0% /dev/shm
tmpfs               5120       4      5116   1% /run/lock
/dev/mmcblk0p1     16112       1     16111   1% /config
tmpfs             813580       4    813576   1% /run/user/1001

Pfew... that was a bit more complex than usual and than needed but it was worth it in the end. Now we have the target operating system set up and running on the board. One annoying point with such an installer is that you copy a pre-installed distro to your system, you're not offered the choice to choose the FS layout for example. It's as if you just used "dd" to dump the .img to the target device.

In fact what should be done with this installer is that it lets you select an installation image for the distro of your choice, that you deposit on a micro-SD and that you then boot from to install on the target device(s).

Connecting SATA SSDs

For a NAS, one needs to have storage devices and cabling. I didn't intend to receive the board that fast and I thought I didn't have power cables. But after digging in my boxes, I managed to find sufficient cable converters to connect 4 devices. I did have 4 used 120 GB intel SSD. They're not extremely fast but at least they do work so I started to hack with them. The result looks like an ugly octopussy :-)

Running a simple basic test of all disks at once

I'd intend to use these SSDs in RAID5, as an NFS server doing mostly reads though writes will be needed as well of course. Let's first see what the whole devices are capable of in terms of read speed. For this I'm just running vmstat 1 while reading from 1 disk first, then from all 4 disks at once:

# dd if=/dev/sda of=/dev/null bs=1M

$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 7851296   8552  95092    0    0     0     0  110  114  0  0 100  0  0
 1  0      0 7625296 233644  95240    0    0 224800     0  972  757  0  5 95  0  0
 0  1      0 7301748 556392  95900    0    0 323072     0 1283 1053  0  5 95  1  0
 1  0      0 6977948 879464  96804    0    0 323072     0 1289 1073  0  5 95  1  0
 0  1      0 6653140 1203560  97540    0    0 324096     0 1301 1080  0  5 94  1  0
 0  1      0 6328852 1527144  98204    0    0 323584     0 1289 1084  0  5 95  1  0

# for i in a b c d; do dd if=/dev/sd$i of=/dev/null bs=1M & done

$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 7848444   8548  93428    0    0     0     0   97  116  0  0 100  0  0
 0  4      0 6773340 1082724  95060    0    0 1074224     0 3942 3252  0 17 53 30  0
 1  3      0 5573384 2279780  97580    0    0 1197056     0 3932 3607  0  8 49 42  0
 1  3      0 4365080 3485540 100276    0    0 1205760     0 3962 3657  0  7 50 43  0
 2  2      0 3155736 4696504 103184    0    0 1210880     0 4055 3668  0 10 49 41  0
 0  4      0 1927256 5922148 105972    0    0 1225728     0 4043 3705  0  8 50 42  0
 0  4      0 711880 7135076 108428    0    0 1212928     0 4010 3670  0  9 49 42  0
 3  1      0 294572 7548632 110632    0    0 1211904     0 4194 3993  0 11 49 40  0

This test was running on the big cores, which were mostly waiting for the disks (42%) and using little CPU (8-10%). Running on the little cores instead showed almost the same performance:

# for i in a b c d; do taskset -c 0-3 dd if=/dev/sd$i of=/dev/null bs=1M & done

$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 7851584   8548  94188    0    0     0     0   98  102  0  0 100  0  0
 2  2      0 7679972 179556  93864    0    0 171076     0  819  530  0 10 89  1  0
 1  3      0 6508936 1348452  96700    0    0 1168896     0 3931 3619  0 19 72  8  0
 3  1      0 5343568 2511204  98952    0    0 1162752     0 3852 3595  0 17 74  9  0
 3  1      0 4177208 3674548 101952    0    0 1163264     0 3837 3617  0 18 73  9  0
 4  0      0 2993412 4855344 104268    0    0 1180672     0 3951 3660  0 19 72  9  0
 0  4      0 1807820 6038372 107168    0    0 1183232     0 3953 3650  0 18 73  9  0
 1  3      0 630820 7212876 109680    0    0 1174016     0 3931 3663  0 19 72  9  0

Here the CPU is used approximately at twice the load, and iowait is much smaller, indicating that there's less margin on the CPU. But the performance is almost the same, at 1.17 GB/s vs 1.21 for the big cores.

One will note that this bandwidth is slightly smaller than the sum of all 4 SSDs (4*323 = 1292 MB/s vs 1210 MB/s measured). Could we be hitting a wall ? Note that for 10 GbE it's fine because that's slightly above the limit of what one can transfer over TCP at 10 Gbps. But it's still interesting to know.

The SATA controller is connected using 2 PCIe 3.0 lanes. Each lane is running at 8 GT/s, encoded using 128/130 coding (128 bits transported over 130 bits). The MaxPayload is at the minimum, 128 bytes:

root@rock-5-itx:~# lspci -nnvv -s 0001:11:00.0 | grep -A2 Ctl:
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 256 bytes
--
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s (ok), Width x2 (ok)

This is a 64-bit system so the PCIe overhead is 26 bytes. Thus each transfer of 128 bytes requires extra 26 bytes, that's 154 bytes total, or a transfer efficiency of 83.1%, which becomes 81.8% once the wire encoding is taken into account. Thus the absolute maximum bandwidth with the SATA controller, not taking into account commands and control stuff, is 81.8% of 2*8 GT/s = 13.09 Gbps, or 1.63 GB/s or 1.52 GiB/s. We're not there yet, only at ~75% but the margin is small. Anyway, as previously said, we're already reaching the speed that at 10 GbE Ethernet adapter could deliver, so there's nothing lost here, and the choice of assigning 2 lanes to the SATA controller was the best one.

Connecting a 10 GbE NIC

For a 10 GbE NAS, one will need a 10 GbE NIC. I already had such a device that I bought for testing the ROCK 5B, and knew that it now works pretty well. The NIC fits nicely into the M.2 adapter, and has fixature for the RJ45 port to use a regular slot:

This NIC is made around a chip ACQ107 and requires the module called "atlantic" that's enabled via "CONFIG_AQTION" under "CONFIG_NET_VENDOR_AQUANTIA". Unfortunately it's not enabled in the default kernel. I needed to rebuild the kernel, but it's a bit complicated to find the relevant sources. The wiki says that there's a bsp tool used to download, patch and build the kernel. I already have tons of kernel versions here ("git branch | wc -l" reports more than 400) and having to go through such a process when you know you already have most of it, and having to learn yet another tool is not the most developer-friendly solution IMHO.

With that said, at least "bsp" is a clean and readable script, at least it's not a Python horror that requires to download half of Github. But it's still annoying to have to read a script to figure what directory's config file to read to find the kernel's URL. I found a few candidate URLs, one of them being Joshua Riek's rk-5.10-rkr6. (I later found that it was apparently linux-5.10-gen-rkr3.4 from Radxa's repo that ought to be used).

I also noticed a 6.1 kernel ("linux-6.1-stan-rkr1"). I tried this one first, at least to gauge the progress of the porting to newer kernels. This one properly detected the 10GbE NIC but not the SATA, there were PCIe errors and I'm wondering if it properly applies the bifurcation to see two distinct devices. It was a bit late at night, so I quickly gave up. Instead I tried with "rk-5.10-rkr6" that's based on 5.10.160 (vs 5.10.110 for rkr3.4), and everything works there, both SSD and the 10GbE NIC!

BTW for those interested in using the repositories above, issuing "make rockchip_defconfig" is all you need to do to configure the base kernel (after setting your ARCH and CROSS_COMPILE as usual, of course).

Basic testing of the network speed

I'm used to measuring the bit rate using the if_rate utility that I adopted many years ago after it looked unmaintained. It's simple and convenient and supports a vmstat-like output that eases monitoring and logging during tests:

$ git clone https://github.com/wtarreau/if_rate
$ cd if_rate
$ make
$ ./bin/if_rate -l -i enp1s0 1
#   time   enp1s0(ikb ipk okb opk) 
1716215181 0.0 0.0 0.0 0.0 
1716215182 0.0 0.0 0.0 0.0 
1716215183 0.0 0.0 0.0 0.0 
1716215184 0.0 0.0 0.0 0.0 
1716215185 28290.2 53104.4 3698500.4 305417.7 
1716215186 89379.8 168813.3 9974752.8 823541.1 
1716215187 90188.3 170637.7 9940160.9 820688.8 
1716215188 90498.4 171225.5 9974026.0 823474.4 
1716215189 89900.4 170266.6 9974559.0 823537.7 
1716215190 90303.3 171029.9 10016757.3 827012.2 
1716215191 88657.9 167802.2 9941214.6 820773.3 
1716215192 90085.0 170447.7 9974735.9 823542.2 
1716215193 90063.5 170265.5 10006728.4 826183.3

In parallel a simple netcat of /dev/zero was being sent over the network to another 10GbE machine on the network. So the NIC is properly able to saturate the wire on output, that's what we needed to check.

Installing everything in an enclosure

Choice of enclosure

One of my ancient file servers was based on an ATOM D510 and was installed in a really nice APLUS CS-CUPID 2 ITX enclosure, so I decided to remove the board and reuse this perfect enclosure:

Installing the SSDs

The enclosure comes with a frame to carry up to two 3.5" disks, so I needed a way to attach all 4 2.5" SSD frames to it. Since I had already done something similar in the past, it was easy to replicate. I measured that everything would fit with a 14mm pitch between devices, so I used my laser cutter to make a support:

Fixing the serial port

The serial port was fixed thanks to a metal square to which I screwed an unused piece of PCB on top of which the adapter was stuck using double-sided tape (it's not very clean but it works):

Final assembly

The final result looks like this, with the board, the 4 SSD, the 10GbE NIC and connector and the console connector at the bottom. The enclosure's power board is not used for now.

Running a full load test

Now that we have all the components together, let's see what the whole system is capable of. The aim will ultimately be to turn this into a real NAS server that will replace my local server currently running on an Odroid-H3 once it works fine with a mainline kernel. The indications based on the various measurements to date are that the device should be 10G-capable.

I didn't want to go into setting up NFS etc. So in order to verify the ability of the board to pull 10G from the disks to the network, I simply installed NGINX, created a RAID5 array on the disks, and created a set of 32 files 1GB in size so that the total work set cannot fit into memory (8G).

On the other machine (a Core2 Quad with a similar NIC), I started 32 instances of h1load each requesting its own file in loops on a single connection (to avoid the risk of reuse). In parallel I collected CPU usage, network bit rate and the I/O rate (that I turned to bits per second for the scale, by multiplying by 8). This gives this (I stopped the test around 4 minutes):

The fact that the disk I/O is always a bit lower than 10G is in part that there's ethernet, ip and TCP overhead (94.1% efficiency) and in part because depending on the load order, it may occasionally happen that some pieces are still in cache. Regardless, the 10G cable was full flat, and the CPU was at less than 50%, which indicates quite some headroom. More modern SSDs than these old ones would also probably do a better job at keeping the cable full.

Power measurements

Note that the following measurements were already reported on the Radxa's forum here. I finally replaced the SSDs with slightly more recent 2x intel X25M 160 GB and 2x intel 530 180 GB, and conducted a power measurement using various methods:

feeding 12V into the motherboard's jack
feeding 12V into the enclosure's power board (I only later noticed that it's supposed to be fed 19V, it should be tested again)
feeding 12V into an aliexpress 12V jack-to-ATX "160W" adapter:

The power was measured using an ampmeter in series with the 12V connector, and a voltmeter in parallel at the closest possible to the connector:

Here are the measurements:

12V via the aliexpress ATX 160W adapter: 1.06 * 11.83 = 12.54W
12V via the enclosure’s adapter board: 1.16A * 11.95V = 13.86W
12V via the motherboard’s jack, ATX adapter still connected: 1.15A * 11.95V = 13.74W
12V via the motherboard’s jack, ATX unplugged: 1.02A * 11.96V = 12.20W

Thus the board’s power design looks extremely efficient, beating the other two. There’s 1.7W saved here by powering the board via its own jack instead of the enclosure’s adapter.

In addition I measured the individual power draw of various components, all powered from the motherboard’s jack since it's the most efficient:

removed all SSDs: 0.81*12.03 = 9.74W => 2.45W drawn by the 4 SSDs in idle
no SSD and 10GbE link down: 0.59 * 12.14W = 7.16W => the 10GbE RJ45 link draws 2.58W alone.
no SSD, 10GbE adapter removed: 0.40 * 12.21 = 4.88W => the 10GbE adapter draws 2.28W with a link down, and 4.86W with a link up

Testing mainline kernel

Apparently according to Collabora's page, the SoC is now quite well supported. I'm interested in seeing how mainline behaves on this board because in my opinion using a BSP kernel is a showstopper for storing data. So I gave a quick try at kernel 6.9 but found no working DTB for now, the kernel is loaded and no single message is emitted. Usually this indicates a console mapped to a wrong address or a missing node for the UART. I tried to reuse a ROCK 5B DTS to see if it made any difference but no, everything seems to hang. I'll have to investigate later.

PCIe limitations

The organization of the PCIe lines around the SoC on this board is really great, should I say optimal. 2 Gen3 lines are used for SATA, two for M.2, and each 2.5GbE port uses one Gen2 line.

This means that we're having the same bandwidth between M.2 and SATA, so as long as the CPU is able to move the bytes, there's enough bandwidth to use M.2 for the 10GbE network adapter.

However I noticed that the MaxPayload is set to 128 bytes on all devices while they are all capable of 256:

root@rock-5-itx:~# lspci -vv | grep MaxPayload | cut -f1 -d,
                DevCap: MaxPayload 256 bytes
                        MaxPayload 128 bytes
                DevCap: MaxPayload 512 bytes
                        MaxPayload 128 bytes
                DevCap: MaxPayload 256 bytes
                        MaxPayload 128 bytes
                DevCap: MaxPayload 256 bytes
                        MaxPayload 128 bytes
                DevCap: MaxPayload 256 bytes
                        MaxPayload 128 bytes
                DevCap: MaxPayload 256 bytes
                        MaxPayload 128 bytes
                DevCap: MaxPayload 256 bytes
                        MaxPayload 128 bytes
                DevCap: MaxPayload 256 bytes
                        MaxPayload 128 bytes

All the "DevCap" lines indicate what the device is capable of. The other ones indicate what was negotiated. 128 bytes cause an efficiency of 81.8% (128/(128+26)*128/130), while 256 bytes would reach 89.4% (256/(256+26)*128/130) and achieve a 9.2% performance increase. It might be worth finding what is causing this limitation.

Closing words

That's all for now. This board is amazing from a hardware perspective. First, it looks extremely clean and well designed. Second, its I/O distribution makes an optimal use of the SoC's capability. The SoC doesn't heat that much and I managed to leave the fan disconnected during operation and this will be its target state anyway. The onboard DC-DC converters show a much higher efficiency than the two other options I tested, which also indicates a choice of great components.

I missed a reset button on the board, and a USB console connector on the back (there's not much room for this at the back, but maybe some combo connectors now exist with an extra USB-C connector that could appear above / below the RJ45 connectors for example, or maybe atop the existing USB-C one). If / when the board adopts a UEFI installer (the SPI NOR still remains empty), then the console will no longer be needed.

One point that I really disliked is the annoying Roobi installer that made everything more complicated than usual. Furthermore, the fact that it confiscates the only storage available to put an operating system seriously needs to be revisited. This is a totally bogus choice. I'm definitely not going to install the OS on a micro-SD, that's the place for an installer. And I'm not going to put the OS on a data disk either. Having had to deal with that painful experience in the past, making it super complicated to exchange data disks when trying to recover data or just for a migration, I seriously don't want to do that again.

Thomas Kaiser showed me that Radxa recently merged a patch in their kernel to disable the eMMC, and only reserve it for the installer. Not only this makes no sense, but it just voids the interest of the product for me if it only leaves me with the option to remove one data disk and cut the I/O performance and storage capacity by 1/3. And by the way, the 16MB SPI NOR is still unused and 16MB is plenty to store an installer, I'm personally stuffing full-featured OS on that on other machines!

Just like with the ROCK 5B, I'll keep this device reserved for testing for as long as it will not have an LTS mainline kernel available (likely by the end of the year). I've ordered new SSDs to run better tests, and I'll have to run many more tests and also to test other distros (notably Slackware ARM64). I'll also evaluate how it deals with HTTP/HTTPS load balancing now that it has a good NIC ;-)