2013-02-24

Mirabox: much better than GuruPlug

This quick review aims at describing my first contact with GlobalScale's Mirabox, how it compares with the other machines I've used before, namely the GuruPlug and the Dockstar, then how I managed to unbrick it.

Switching to the Mirabox

The Dockstar is nice but still limited to one port and has limited performance. After all the overheating issues, GlobalScale finally abandonned the GuruPlug Server Plus and replaced it with the DreamPlug which was a much nicer and safer design. I had one in hands and was considering buying one. I'm among the people who complain a lot about poor quality and point the finger at companies who put awful products on the market and do nothing to fix them. But when these people go back to the blackboard to completely redesign the product, I applaud. So I was OK with ordering a new product from them again.

When wandering on GlobalScale's web site last year, I noticed some teasing for the upcoming Smile Box, then the Mirabox. Both were using the same platform, a new Marvell Armada370 (ARMv7) at 1.2 GHz. The Mirabox has 1GB of RAM, 1GB of NAND flash, 2 GigE ports, a PCIe port, a MicroSD slot, an RTC with a battery, 2 USB3 ports, well it looked really nice. It was not much more expensive than the DreamPlug, so I finally decided to order one as well as a JTAG adapter in case things go wrong.

As soon as I received the box, I couldn't resist opening it. I was quite impressed by the quality of the hardware design. There is a very clean PCB with BGA chips on both sides, not a single wire at all, not even a heatsink. The device is very thin, basically the thickness of the RJ45 ports. There are jumpers inside, as well as serial and JTAG connectors that are compatible with the GuruPlug's adapter. Be careful when opening, the small plastic part which conducts the light from the leds sits in an unstable position and is annoying to reinstall. I finally glued it to the case.

Board inside
Enclosure
Unstable parts
WiFi antenna
Board bottom
Board top
Among some nice things I noted, I found that the internal serial ports were connected to the same serial port as the USB console (which goes to a PL2303 chip), and since they're using pull-ups, both are usable simultaneously. The PL2303 chip is powered by the USB and not by the Mirabox so that you don't lose the ttyUSB from the client when you power-cycle the Mirabox. This is much appreciated, the Snowball should adopt such a design. The device does not heat much and the CPU can always be touched. The MicroSD and MMC internal connectors are directly connected to the USB2.0 ports of the SoC. The jumpers are there to change the CPU/cache/DRAM frequencies though I only identified a few of them at the moment.

The Good, the Bad and the Ugly

A debian is installed on the device. I quickly installed haproxy to see if the network performance was any better than with the Dockstar. I noticed that traffic would not flow at all ! After installing strace, I discovered that the splice() system call would systematically fail in an unexpected way, meaning that some nasty untested patches were applied to the 2.6.35.9 kernel. So I went to their site to download the sources an found none at all. The only thing I could find there were a binary kernel image and a broken boot loader image (Note that a few days ago, the boot loader image was fixed there, and the kernel and U-Boot sources were finally released on Plugcomputer.org).

So I continued the tests by disabling splice() in haproxy and found that the performance was very low due to iptables and conntrack being hard-linked and impossible to disable.

So I looked on the net to find another kernel. I found one in Arch Linux ARM. Fine! Tried to boot it, it booted correctly and UBIFS complained about a lot of errors, then the kernel died consecutive to the inability to mount the root FS. After that the original kernel would also fail to mount the rootfs. Thanks to the captures I had taken earlier, I finally found that the config in the boot loader is wrong about the partitions sizes, which must have been hard-coded into their proprietary kernel that ignores the boot loader's settings. And it seems that UBIFS performs some recovery attempts before failing, resulting in a corrupted FS. Pfff.....

I could boot it with a Formilux rootfs and their proprietary kernel to recover the installation by reformating the partition and reflashing the original rootfs which is provided on their site. I got a bit angry at the product because it's full proprietary and bogus. I want to use it as a gigabit network sniffer and its network performance sucks because of the proprietary kernel!

First hope

Looking at kernel 3.7-rc sources, I found that the platform was recently introduced by the cool guys at Free Electrons, namely Thomas and Grégory, which are already known for porting Linux to a number of devices. So I contacted them to get some pointers and they told me about the Git repository where all their work is. I could test their latest work and could start hacking the device and providing them some feedback on some of their patches.

Second crash

Unfortunately I managed to boot a kernel with the incorrect partition table a second time and it again destroyed my rootfs and marked a lot of blocks as bad... Except that this time even after several passes of flash_erase and nand_write, there were some remains of "UBI" on some blocks, and I couldn't get rid of the fake bad blocks. Probably that the NAND driver is still a bit young... I finally used the nand scrub option in the boot loader to completely erase the rootfs partition... Error! This one is bogus too, it randomly erases other areas, and marks block 0 as bad ! (the boot loader). So now I must not cut power! I only had one attempt left! I reformated the whole flash using nand scrub again and all went fine. I had to reflash the U-Boot boot loader, but the one on the site above was defective. I finally found one on another site. It looked right, contained references to the reference design board. I crossed fingers and flashed it, checked that it was correctly flashed, then rebooted... Aie, Bricked!

BootROM 1.08                                                                    
Booting from NAND flash
BootROM: Bad header at offset 00000000
BootROM: Bad header at offset 00004000
BootROM: Bad header at offset 00008000
BootROM: Bad header at offset 0000C000
...

Second disappointment

My only hope was flashing the NAND via JTAG. I took my GuruPlug JTAG adapter which directly connects to this board and to the GPIO board. But Armada370 is totally unknown from OpenOCD, and no patches are available. Then I was really angry at GlobalScale. They sold me a JTAG device to unbrick the mirabox and which I cannot use at all because the software to use it does not exist! I contacted the support, they offered me to reprogram it for $25, except that shipping costs are as high as the device's price. And they'd reinstall the same bogus kernel that I cannot use. So I declined and thought that I'd rather find some time to try to reverse-engineer the JTAG TAP and the flash controller on a spare week-end. (Note: I was recently told that GlobalScale is considering sponsoring a port to OpenOCD, which is nice then).

New hope

One or two months after leaving the device unused on my desk, I decided to get it to work using OpenOCD. I downloaded all the doc, read it, started editing some board and target files, and could detect the TAP ID, which means the JTAG link is OK. I managed to reset the board via JTAG, which was good.

While reading the OpenOCD doc, I noticed something changed in an xterm behind. It was my minicom connected to the Mirabox which stopped scrolling on the "BootROM Bad Header" messages, and which displayed "Trying to boot from UART" ! UART ? I searched the net and found references to kwboot which is a tool made for loading firmwares into Marvell SoC via the serial port. In fact, Thomas had suggested it to me a while ago but I forgot since I couldn't figure how I could use it. Anyway now I got hope again and left OpenOCD to completely focus on kwboot.

Booting an image using kwboot

So I downloaded kwboot. I initially didn't find it alone, it's integrated into U-Boot in this tree and depends on a number of includes from this tree. I was about to clone the whole repository when I found a precompiled binary version here.

This utility sends a "magic" sequence to the BootROM boot loader installed in the Armada370's ROM (or maybe in the small I2C flash that's next to it, I don't know). The magic is a 8-byte sequence : 0xBB 0x11 0x22 0x33 0x44 0x55 0x66 0x77. The boot loader reads this image before the first attempt to boot, and each time it loops over the whole flash image (which is quite long on the 1 GB NAND). So it's better to start sending the sequence with the Mirabox powered down, and then to power it up. That's why it's very important that USB consoles are powered by the PC and not by the device!.

Fixing the boot loader image

The original U-Boot image could not be loaded via kwboot. The reason is that the first byte of the image indicates the boot device. Here I have 0x8B which indicates the image boots from the NAND. To boot on the serial port, we need 0x69. Changing it by hand will not work because there's a checksum. kwboot is able to patch this byte and recompute the checksum. But it looks like the cheksumming algorithm has changed on this new platform, because a fixed image does not boot either, and the original image does not show the correct checksum.

I found another tool, kwuartboot. I tried it in case it would handle a different checksumming algorithm, but that did not work either, it failed very similarly. So I concluded that I had to find how to regenerate my boot loader image to boot from the serial port and write the correct checksum.

Find new doimage

I found various incompatible versions of the "doimage" utility used to produce the boot image. I finally found that a version packaged for ArmadaXP was compatible with my Mirabox. (Note that since the patched U-Boot sources have finally been released, there is no need anymore to use the ArmadaXP image). Here's how I managed to rebuild a working doimage utility :

1. retrieve this U-Boot patch

2. rebuild the sources from the patch :

$ mkdir tmp-tools-doimage
$ cd tmp-tools-doimage
$ patch -Ntp1 < ../u-boot-2009.08-mv78460-20110404.patch >/dev/null 2>&1
$ cd tools
$ tar cf - doimage_armada_xp | gzip -9 > ../../doimage_armada_xp.tar.gz
$ cd ../..
$ rm -rf tmp-tools-doimage
3. rebuilt the executable from the sources :
$ tar zxf doimage_armada_xp.tar.gz
$ cd doimage_armada_xp
$ make

Find the various parts in the original image

The sources helped me a lot understand the image format. There are two binary images included in the U-Boot image, one is the DDR3 initialization code, and another one is the boot loader itself. There are several headers in the image and checksums that are computed on the global image. So it is possible to extract the embedded images, if necessary to modify them, then to reassemble them together and put the 32-bit checksum on them. After probably more than one hundred of attempts, I found that I needed the following parts of my original mtd0 partition :
TypeOffsetLengthmd5sumBegins with
DDR30x2448584f4b165ce...02 00 00 00 5B 00 00 00
U-Boot0xC000675420c739a005...12 00 00 EA 14 F0 9F E5
These images can then be assembled together using the shiny new doimage utility, to produce a new u-boot.bin image :
$ doimage -T uart -D 0x600000 -E 0x6A0000 -G mtd0-hdr.bin \
  mtd0-uboot.bin u-boot.bin

Still fails to load at 48k and issue DDR3 on the console

I then tried to flash the image this way :
$ kwboot -b u-boot.bin /dev/ttyUSB0

Unfortunately, the mirabox would still reject this image, but in a new and more consistent way :   it systematically rejects the image after 48kB of data, and the green LED "D5" turns on   just at the moment of the hang. I switched to kwuartboot to see if I got the same error, and it   behaved exactly the same way. So I modified it to display the invalid characters it got that   confused it, and I saw "DDR3 Training Sequence". Wow! This means that the DDR3 code was   executed, because this line is normally displayed at boot when the Mirabox boots. So I suspected   that the DDR3 initialization code is loaded into the cache or some SRAM on the device, which must   then initialize DDR3 to load the rest. So if this code is executed early during the boot sequence   and displays things on the console port I'm using to upload an image, it is conceivable that it   breaks the upload sequence... 

Patch the DDR3 thing to talk to ttyS1 instead

So I wondered how to shut that DDR3 code down. Since the device has two serial ports, I thought that it could be easier to make it chat on ttyS1 instead. ttyS0 is located at 0x10000 and ttyS1 is at 0x12000. I looked for occurrences of 0x10000 in the image and patched them to use 0x12000 instead. I was a bit scared because on ARM you don't have many bits for immediate values, and I was afraid not to be able to add a 2 there if some higher bit was set. This was not the case, the values were used absolute. I found 4 locations which needed to be patched in mtd0-hdr.bin : 0x1BC0, 0x1C08, 0x1D0C, 0x1D30. I then rebuilt the whole image using doimage :
$ doimage -T uart -D 0x600000 -E 0x6A0000 -G mtd0-hdr-uart1.bin \
  mtd0-uboot.bin u-boot-uart1.bin

New attempt to boot

Last step was to try to boot the image again :
$ ./kwboot  -b u-boot-uart1.bin -t /dev/ttyUSB0
Sending boot message. Please reboot the target...\
Sending boot image...
0 % [......................................................................]
1 % [......................................................................]
2 % [......................................................................]
...
Well, it goes further than 48kB this time, let's cross fingers... After about 45s, I got this :
99 % [.........................................................]
[Type Ctrl-\ + c to quit]

__   __                      _ _
|  \/  | __ _ _ ____   _____| | |
| |\/| |/ _` | '__\ \ / / _ \ | |
| |  | | (_| | |   \ V /  __/ | |
|_|  |_|\__,_|_|    \_/ \___|_|_|
        _   _     ____              _
       | | | |   | __ )  ___   ___ | |_ 
       | | | |___|  _ \ / _ \ / _ \| __| 
       | |_| |___| |_) | (_) | (_) | |_ 
        \___/    |____/ \___/ \___/ \__| 
** LOADER **

U-Boot 2009.08 (Sep 16 2012 - 22:50:06)Marvell version: 1.1.2 NQ
U-Boot Addressing:
       Code:            00600000:006AFFF0
       BSS:             006F8E40
       Stack:           0x5fff70
       PageTable:       0x8e0000
       Heap address:    0x900000:0xe00000
Board: DB-88F6710-BP
SoC:   MV6710 A1
CPU:   Marvell PJ4B v7 UP (Rev 1) LE
       CPU @ 1200Mhz, L2 @ 600Mhz
       DDR @ 600Mhz, TClock @ 200Mhz
       DDR 16Bit Width, FastPath Memory Access
PEX 0: Detected No Link.
PEX 1: Root Complex Interface, Detected Link X1
DRAM:   1 GB
       CS 0: base 0x00000000 size 512 MB
       CS 1: base 0x20000000 size 512 MB
       Addresses 14M - 0M are saved for the U-Boot usage.
NAND:  1024 MiB
Bad block table found at page 262016, version 0x01
Bad block table found at page 261888, version 0x01
FPU not initialized
USB 0: Host Mode
USB 1: Host Mode
Modules/Interfaces Detected:
       RGMII0 Phy
       RGMII1 Phy
       PEX0 (Lane 0)
       PEX1 (Lane 1)
phy16= 72 
phy16= 72 
MMC:   MRVL_MMC: 0
Net:   egiga0 [PRIME], egiga1
Hit any key to stop autoboot:  0 
Marvell>>

Yesss! For those who want to experiment with this without bricking their devices,   I'm putting here  this working image with a modified prompt to display "Recover>>" instead of   "Marvell>>" so that it's always easy to tell what boot loader you're running from. 

Now reflash the recovered device

Now that I'm on the device again for the first time in a few months, I don't want to see it go away. It's urgent to erase and reflash it. WARNING! do not copy-paste what follows if you don't understand what it is about, it will wipe out your whole device and may render it unusable! First, erase the whole flash :
Marvell>> nand erase clean
Marvell>>

Let's check that the flash correctly shows 0xff everywhere :
Marvell>> nand dump 0
Page 00000000 dump:
        ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff
        ...
        ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff
OOB:                                                                            
        ff ff ff ff ff ff ff ff
        ...
        ff ff ff ff ff ff ff ff
Marvell>>

OK. We'll pre-fill the memory with 0xFF before loading the boot loader there, to avoid   flashing crap : 
Marvell>> mw.l 0x7000000 0xffffffff 0x00100000
Marvell>> md.b 0x7000000 100
07000000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff    ...............
...
070000f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff    ...............
Marvell>>
Now copy the original mtd0 image from a TFTP server (from a file called "mtd0" there) to that location :
Marvell>> tftpboot 0x7000000 mtd0
Marvell>> md.b 0x7000000 100
07000000: 8b 00 00 00 60 4e 0a 00 01 00 00 c0 00 c0 00 00    ....`N..........
07000010: 00 00 60 00 00 00 6a 00 00 02 01 00 00 00 01 0a    ..`...j.........
...
070000f0: aa 25 ad 00 ed 19 2b 60 00 23 ab 20 80 00 c0 19    .%....+`.#. ....
Marvell>>

The contents look fine (note the 8b at the beginning which means that   this is a NAND flash image). If everything looks OK and only in this case, you   can write the memory contents down to the flash then control that it looks similar to   the dump above : 
Marvell>> nand write 0x7000000 0 0x00400000
Marvell>> nand dump 0

Then reset the device, it should reboot directly from the flash. If it fails, you just   have to retry the procedure above and figure what you got wrong. 
Marvell>> reset

Back to hacking the device again

Since the device is fixed and I don't fear losing it anymore, I'm back playing with it. I'm running it with a 3.8 kernel with all development patches from Free-Electrons, as well as a recent attempt I made to port Marvell's NAND Flash Controller driver to this kernel. Right now it works but the code is ugly, contains many copy-pastes and I don't trust it a lot. I found it safer to buy a microSD card and install my FS there.

I could also remove some of Marvell's patches from their ugly kernel (I could get splice() to work again) and rebase it on 2.6.35.14, but at this point in time, I think it does not make sense to spend more time with this dead kernel, better try to get most features working with 3.8 and 3.9-rc. Oh, and BTW, UBIFS managed to destroy my rootfs again using Marvell's kernel when it tried to mount an uncleanly shutdown rootfs! I don't know if it's the FS or the Flash controller which is to blame, but what's certain is that a filesystem driver should not destroy the data it's responsible for, and that it should at least offer tools to fix devices which report errors. Here, after any minimal error, the file system is definitely lost. So that's one more reason for switching to a replaceable microSD for the root FS.
There are still a number of issues that I would like to see fixed in future versions :
  • the mtdparts variable in the boot loader does not match the real partition size, which apparently is responsible for UBIFS corrupting my root FS.
  • the Ethernet MAC addresses at the bottom of the box are different from those on the sticker on the board ! I don't know which ones are the right one, so if I plug this device on a network with another one sharing the same addresses, it could cause trouble.
  • their bogus kernel needs some fixing. I don't understand why they patched (and broke) the splice() system call. Also, having iptables+conntrack hard-linked really is problematic (and there is no NOTRACK target).
  • the boot loader only reserves 4 MB for the kernel, which is too small for development kernels. Anyway you'll probably have to repartition the flash after the rootfs gets corrupted.
  • the internal jumpers and connectors are not documented.
  • I tried a dual-gigE mini-PCIe boards (i350-based Jetway ADMPEIDLA) and it did not work, it was not even detected. I don't know which one is the culprit since the NIC works on an Atom board and other cards work on this board.

A few captures

Here come a few captures of what people always want to see from a new board :-)

cpuinfo

# cat /proc/cpuinfo
processor       : 0
model name      : ARMv7 Processor rev 1 (v7l)
BogoMIPS        : 1196.85
Features        : swp half fastmult vfp edsp vfpv3 vfpv3d16 tls
CPU implementer : 0x56
CPU architecture: 7
CPU variant     : 0x1
CPU part        : 0x581
CPU revision    : 1

Hardware        : Marvell Armada 370/XP (Device Tree)
Revision        : 0000
Serial          : 0000000000000000

meminfo

# cat /proc/meminfo
MemTotal:        1034348 kB
MemFree:          998484 kB
Buffers:            4476 kB
Cached:             9820 kB
SwapCached:            0 kB
Active:             7116 kB
Inactive:           8576 kB
Active(anon):       1768 kB
Inactive(anon):       76 kB
Active(file):       5348 kB
Inactive(file):     8500 kB
Unevictable:           0 kB
Mlocked:               0 kB
HighTotal:        270336 kB
HighFree:         248016 kB
LowTotal:         764012 kB
LowFree:          750468 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:          1424 kB
Mapped:             2484 kB
Shmem:               448 kB
Slab:               4388 kB
SReclaimable:        952 kB
SUnreclaim:         3436 kB
KernelStack:         280 kB
PageTables:          152 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:      703356 kB
Committed_AS:       5236 kB
VmallocTotal:     245760 kB
VmallocUsed:        6120 kB
VmallocChunk:     237500 kB

lspci

# lspci -nnv
00:01.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:7846] (pro
g-if 00 [Normal decode])
        Flags: bus master, 66MHz, user-definable features, ?? devsel, latency 0
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        Capabilities: [fc] <chain broken>

00:02.0 PCI bridge [0604]: Marvell Technology Group Ltd. Device [11ab:7846] (pro
g-if 00 [Normal decode])
        Flags: bus master, 66MHz, user-definable features, ?? devsel, latency 0
        Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
        Memory behind bridge: c1000000-c10fffff
        Capabilities: [fc] <chain broken>

02:00.0 USB Controller [0c03]: Device [1b73:1009] (rev 02) (prog-if 30)
        Subsystem: Device [1b73:0000]
        Flags: bus master, fast devsel, latency 0, IRQ 105
        Memory at c1000000 (64-bit, non-prefetchable) [size=64K]
        Memory at c1010000 (64-bit, non-prefetchable) [size=4K]
        Memory at c1011000 (64-bit, non-prefetchable) [size=4K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable- Count=1/8 Maskable- 64bit+
        Capabilities: [70] Express Endpoint, MSI 00
        Capabilities: [b0] MSI-X: Enable- Count=8 Masked-
        Capabilities: [100] Advanced Error Reporting
        Kernel driver in use: xhci_hcd

iomem

# cat /proc/iomem
00000000-3fffffff : System RAM
  00008000-0044b073 : Kernel code
  00738000-007f6393 : Kernel data
c1000000-c100ffff : xhci_hcd
d0010300-d001031f : d0010300.rtc
d0012000-d001201f : serial
d0018100-d001813f : /soc/gpio@d0018100
d0018140-d001817f : /soc/gpio@d0018140
d0018180-d00181bf : /soc/gpio@d0018180
d0050000-d00504ff : ehci_hcd
d0051000-d00514ff : ehci_hcd

interrupts

# cat /proc/interrupts
           CPU0
 16:      29435  armada_370_xp_irq  armada_370_xp_per_cpu_tick
 17:       2592  armada_370_xp_irq  serial
 23:        684  armada_370_xp_irq  mvneta
 26:          1  armada_370_xp_irq  d0010300.rtc
 27:        231  armada_370_xp_irq  ehci_hcd:usb1
 28:          0  armada_370_xp_irq  ehci_hcd:usb2
105:          0  armada_370_xp_irq  xhci_hcd:usb3
106:          2  armada_370_xp_irq  d0060800.xor
107:          2  armada_370_xp_irq  d0060800.xor
108:          2  armada_370_xp_irq  d0060900.xor
109:          2  armada_370_xp_irq  d0060900.xor
Err:          0

dmesg

Booting Linux on physical CPU 0x0
Linux version 3.8.0-mbx (willy@pcw) (gcc version 4.5.2 (Sourcery G++ Lite 2011.0
3-41) ) #2 Sun Feb 24 11:58:31 CET 2013
CPU: ARMv7 Processor [561f5811] revision 1 (ARMv7), cr=10c53c7d
CPU: PIPT / VIPT nonaliasing data cache, PIPT instruction cache
Machine: Marvell Armada 370/XP (Device Tree), model: Globalscale Mirabox
Memory policy: ECC disabled, Data cache writeback
On node 0 totalpages: 262144
free_area_init_node: node 0, pgdat c075ba60, node_mem_map c0b90000
  Normal zone: 1520 pages used for memmap
  Normal zone: 0 pages reserved
  Normal zone: 193040 pages, LIFO batch:31
  HighMem zone: 528 pages used for memmap
  HighMem zone: 67056 pages, LIFO batch:15
pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
pcpu-alloc: [0] 0
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 260096
Kernel command line: console=ttyS0,115200 mtdparts=armada-nand:4m(uboot),4m(uima
ge),8m(nv),16m(rescue),480m(rootfs),-(pad) ubi.mtd=4 root=/dev/ram0
PID hash table entries: 4096 (order: 2, 16384 bytes)
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
__ex_table already sorted, skipping sort
Memory: 1024MB = 1024MB total
Memory: 1020756k/1020756k available, 27820k reserved, 270336K highmem
Virtual kernel memory layout:
    vector  : 0xffff0000 - 0xffff1000   (   4 kB)
    fixmap  : 0xfff00000 - 0xfffe0000   ( 896 kB)
    vmalloc : 0xf0000000 - 0xff000000   ( 240 MB)
    lowmem  : 0xc0000000 - 0xef800000   ( 760 MB)
    pkmap   : 0xbfe00000 - 0xc0000000   (   2 MB)
    modules : 0xbf000000 - 0xbfe00000   (  14 MB)
      .text : 0xc0008000 - 0xc044b074   (4365 kB)
      .init : 0xc044c000 - 0xc0736934   (2987 kB)
      .data : 0xc0738000 - 0xc075c480   ( 146 kB)
       .bss : 0xc075c480 - 0xc07f6394   ( 616 kB)
NR_IRQS:16 nr_irqs:16 16
Aurora cache controller enabled
l2x0: 4 ways, CACHE_ID 0x00000100, AUX_CTRL 0x1a086302, Cache size: 262144 B
sched_clock: 32 bits at 18MHz, resolution 53ns, wraps every 229064ms
Calibrating delay loop... 1196.85 BogoMIPS (lpj=5984256)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 512
CPU: Testing write buffer coherency: ok
Setting up static identity map for 0x354c20 - 0x354c78
devtmpfs: initialized
pinctrl core: initialized pinctrl subsystem
NET: Registered protocol family 16
DMA: preallocated 1024 KiB pool for atomic coherent allocations
irq: Cannot allocate irq_descs @ IRQ33, assuming pre-allocated
irq: Cannot allocate irq_descs @ IRQ69, assuming pre-allocated
irq: Cannot allocate irq_descs @ IRQ102, assuming pre-allocated
Initializing Coherency fabric
bio: create slab <bio-0> at 0
mvebu-pcie pcie-controller.1: PCIe0.0: link down
mvebu-pcie pcie-controller.1: PCIe1.0: link up
mvebu-pcie pcie-controller.1: PCI host bridge to bus 0000:00
pci_bus 0000:00: root bus resource [io  0x0000-0xffff]
pci_bus 0000:00: root bus resource [mem 0xc1000000-0xc8ffffff]
pci_bus 0000:00: root bus resource [bus 00-ff]
pci_bus 0000:00: scanning bus
pci 0000:00:01.0: [11ab:7846] type 01 class 0x060400
pci 0000:00:01.0: calling pci_fixup_ide_bases+0x0/0x11c
pci 0000:00:02.0: [11ab:7846] type 01 class 0x060400
pci 0000:00:02.0: calling pci_fixup_ide_bases+0x0/0x11c
pci_bus 0000:00: fixups for bus
PCI: bus0: Fast back to back transfers disabled
pci 0000:00:01.0: scanning [bus 00-00] behind bridge, pass 0
pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pci 0000:00:02.0: scanning [bus 00-00] behind bridge, pass 0
pci 0000:00:02.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pci 0000:00:01.0: scanning [bus 00-00] behind bridge, pass 1
pci_bus 0000:01: scanning bus
pci_bus 0000:01: fixups for bus
PCI: bus1: Fast back to back transfers enabled
pci_bus 0000:01: bus scan returning with max=01
pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
pci 0000:00:02.0: scanning [bus 00-00] behind bridge, pass 1
pci_bus 0000:02: scanning bus
pci 0000:02:00.0: [1b73:1009] type 00 class 0x0c0330
pci 0000:02:00.0: reg 10: [mem 0x42000000-0x4200ffff 64bit]
pci 0000:02:00.0: reg 18: [mem 0x42010000-0x42010fff 64bit]
pci 0000:02:00.0: reg 20: [mem 0x42011000-0x42011fff 64bit]
pci 0000:02:00.0: calling pci_fixup_ide_bases+0x0/0x11c
pci 0000:02:00.0: supports D1
pci 0000:02:00.0: PME# supported from D0 D1 D3hot
pci 0000:02:00.0: PME# disabled
pci_bus 0000:02: fixups for bus
PCI: bus2: Fast back to back transfers disabled
pci_bus 0000:02: bus scan returning with max=02
pci_bus 0000:02: busn_res: [bus 02-ff] end is updated to 02
pci_bus 0000:00: bus scan returning with max=02
pci 0000:00:01.0: fixup irq: got 135
pci 0000:00:01.0: assigning IRQ 135
pci 0000:00:02.0: fixup irq: got 135
pci 0000:00:02.0: assigning IRQ 135
pci 0000:02:00.0: fixup irq: got 105
pci 0000:02:00.0: assigning IRQ 105
pci 0000:00:02.0: BAR 8: assigned [mem 0xc1000000-0xc10fffff]
pci 0000:00:01.0: PCI bridge to [bus 01]
pci 0000:02:00.0: BAR 0: assigned [mem 0xc1000000-0xc100ffff 64bit]
pci 0000:02:00.0: BAR 0: set to [mem 0xc1000000-0xc100ffff 64bit] (PCI address [
0xc1000000-0xc100ffff])
pci 0000:02:00.0: BAR 2: assigned [mem 0xc1010000-0xc1010fff 64bit]
pci 0000:02:00.0: BAR 2: set to [mem 0xc1010000-0xc1010fff 64bit] (PCI address [
0xc1010000-0xc1010fff])
pci 0000:02:00.0: BAR 4: assigned [mem 0xc1011000-0xc1011fff 64bit]
pci 0000:02:00.0: BAR 4: set to [mem 0xc1011000-0xc1011fff 64bit] (PCI address [
0xc1011000-0xc1011fff])
pci 0000:00:02.0: PCI bridge to [bus 02]
pci 0000:00:02.0:   bridge window [mem 0xc1000000-0xc10fffff]
PCI: enabling device 0000:00:01.0 (0140 -> 0143)
pci 0000:00:01.0: enabling bus mastering
PCI: enabling device 0000:00:02.0 (0140 -> 0143)
pci 0000:00:02.0: enabling bus mastering
SCSI subsystem initialized
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
Switching to clocksource armada_370_xp_clocksource
NET: Registered protocol family 2
TCP established hash table entries: 8192 (order: 4, 65536 bytes)
TCP bind hash table entries: 8192 (order: 3, 32768 bytes)
TCP: Hash tables configured (established 8192 bind 8192)
TCP: reno registered
UDP hash table entries: 512 (order: 1, 8192 bytes)
UDP-Lite hash table entries: 512 (order: 1, 8192 bytes)
NET: Registered protocol family 1
pci 0000:02:00.0: calling quirk_usb_early_handoff+0x0/0x8c4
PCI: CLS 64 bytes, default 64
Trying to unpack rootfs image as initramfs...
rootfs image is not initramfs (junk in compressed archive); looks like an initrd
Freeing initrd memory: 10608K
bounce pool size: 64 pages
squashfs: version 4.0 (2009/01/31) Phillip Lougher
msgmni has been set to 1486
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
io scheduler noop registered
io scheduler deadline registered
io scheduler cfq registered (default)
armada-370-pinctrl d0018000.pinctrl: registered pinctrl driver
mv_xor d0060800.xor: Marvell XOR driver
mv_xor d0060800.xor: Marvell XOR: ( xor cpy )
mv_xor d0060800.xor: Marvell XOR: ( xor fill cpy )
mv_xor d0060900.xor: Marvell XOR driver
mv_xor d0060900.xor: Marvell XOR: ( xor cpy )
mv_xor d0060900.xor: Marvell XOR: ( xor fill cpy )
Serial: 8250/16550 driver, 2 ports, IRQ sharing disabled
d0012000.serial: ttyS0 at MMIO 0xd0012000 (irq = 17) is a 8250
console [ttyS0] enabled
brd: module loaded
loop: module loaded
libphy: orion_mdio_bus: probed
mvneta d0070000.ethernet eth0: mac: f0:ad:4e:01:a5:f3
mvneta d0074000.ethernet eth1: mac: f0:ad:4e:01:a5:f4
ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
orion-ehci d0050000.usb: Marvell Orion EHCI
orion-ehci d0050000.usb: new USB bus registered, assigned bus number 1
orion-ehci d0050000.usb: irq 27, io mem 0xd0050000
orion-ehci d0050000.usb: USB 2.0 started, EHCI 1.00
usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb1: Product: Marvell Orion EHCI
usb usb1: Manufacturer: Linux 3.8.0-mbx ehci_hcd
usb usb1: SerialNumber: d0050000.usb
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 1 port detected
orion-ehci d0051000.usb: Marvell Orion EHCI
orion-ehci d0051000.usb: new USB bus registered, assigned bus number 2
orion-ehci d0051000.usb: irq 28, io mem 0xd0051000
orion-ehci d0051000.usb: USB 2.0 started, EHCI 1.00
usb usb2: New USB device found, idVendor=1d6b, idProduct=0002
usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb2: Product: Marvell Orion EHCI
usb usb2: Manufacturer: Linux 3.8.0-mbx ehci_hcd
usb usb2: SerialNumber: d0051000.usb
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 1 port detected
ehci-pci: EHCI PCI platform driver
xhci_hcd 0000:02:00.0: enabling bus mastering
xhci_hcd 0000:02:00.0: xHCI Host Controller
xhci_hcd 0000:02:00.0: new USB bus registered, assigned bus number 3
xhci_hcd 0000:02:00.0: enabling Mem-Wr-Inval
xhci_hcd 0000:02:00.0: irq 105, io mem 0xc1000000
usb usb3: New USB device found, idVendor=1d6b, idProduct=0002
usb usb3: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb3: Product: xHCI Host Controller
usb usb3: Manufacturer: Linux 3.8.0-mbx xhci_hcd
usb usb3: SerialNumber: 0000:02:00.0
xHCI xhci_add_endpoint called for root hub
xHCI xhci_check_bandwidth called for root hub
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 2 ports detected
xhci_hcd 0000:02:00.0: xHCI Host Controller
xhci_hcd 0000:02:00.0: new USB bus registered, assigned bus number 4
usb usb4: New USB device found, idVendor=1d6b, idProduct=0003
usb usb4: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb4: Product: xHCI Host Controller
usb usb4: Manufacturer: Linux 3.8.0-mbx xhci_hcd
usb usb4: SerialNumber: 0000:02:00.0
xHCI xhci_add_endpoint called for root hub
xHCI xhci_check_bandwidth called for root hub
hub 4-0:1.0: USB hub found
hub 4-0:1.0: 2 ports detected
Initializing USB Mass Storage driver...
usbcore: registered new interface driver usb-storage
USB Mass Storage support registered.
rtc-mv d0010300.rtc: rtc core: registered d0010300.rtc as rtc0
i2c /dev entries driver
usbcore: registered new interface driver usbhid
usbhid: USB HID core driver
IPv4 over IPv4 tunneling driver
TCP: bic registered
TCP: cubic registered
NET: Registered protocol family 17
8021q: 802.1Q VLAN Support v1.8
VFP support v0.3: implementor 56 architecture 2 part 20 variant 9 rev 6
UBI error: ubi_init: UBI error: cannot initialize UBI, error -19
rtc-mv d0010300.rtc: setting system clock to 2013-02-24 14:03:59 UTC (1361714639
)
Warning: unable to open an initial console.
Freeing init memory: 2984K
usb 1-1: new high-speed USB device number 2 using orion-ehci
usb 1-1: New USB device found, idVendor=1a40, idProduct=0101
usb 1-1: New USB device strings: Mfr=0, Product=1, SerialNumber=0
usb 1-1: Product: USB 2.0 Hub
hub 1-1:1.0: USB hub found
hub 1-1:1.0: 4 ports detected
usb 1-1.1: new high-speed USB device number 3 using orion-ehci
usb 1-1.1: New USB device found, idVendor=05e3, idProduct=0723
usb 1-1.1: New USB device strings: Mfr=3, Product=4, SerialNumber=0
usb 1-1.1: Product: USB Storage
usb 1-1.1: Manufacturer: Generic
usb-storage 1-1.1:1.0: Quirks match for vid 05e3 pid 0723: 8000
scsi0 : usb-storage 1-1.1:1.0
usb 1-1.2: new high-speed USB device number 4 using orion-ehci
usb 1-1.2: New USB device found, idVendor=05e3, idProduct=0723
usb 1-1.2: New USB device strings: Mfr=3, Product=4, SerialNumber=0
usb 1-1.2: Product: USB Storage
usb 1-1.2: Manufacturer: Generic
usb-storage 1-1.2:1.0: Quirks match for vid 05e3 pid 0723: 8000
scsi1 : usb-storage 1-1.2:1.0
scsi 0:0:0:0: Direct-Access     Generic  STORAGE DEVICE   9451 PQ: 0 ANSI: 0
sd 0:0:0:0: [sda] Attached SCSI removable disk
scsi 1:0:0:0: Direct-Access     Generic  STORAGE DEVICE   9451 PQ: 0 ANSI: 0
sd 1:0:0:0: [sdb] 15523840 512-byte logical blocks: (7.94 GB/7.40 GiB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 03 00 00 00
sd 1:0:0:0: [sdb] No Caching mode page present
sd 1:0:0:0: [sdb] Assuming drive cache: write through
sd 1:0:0:0: [sdb] No Caching mode page present
sd 1:0:0:0: [sdb] Assuming drive cache: write through
 sdb: sdb1 sdb2 sdb3 < sdb5 > sdb4
sd 1:0:0:0: [sdb] No Caching mode page present
sd 1:0:0:0: [sdb] Assuming drive cache: write through
sd 1:0:0:0: [sdb] Attached SCSI removable disk
mvneta d0070000.ethernet eth0: link up

Jumpers

Zoom on jumpers
Jumpers are numberred like this : J7 J4 J6 J5 J9 J8 J1 J2 J3. Pin 1 is at the top, pin 2 in the middle, and pin 3 at the bottom. We'll note 1 for jumpers between pins 1 and 2, and 0 for jumpers connecting pins 2 and 3.

The only combinations I found to work so far are the following ones. Overall we could say that J6/J5/J2 affect the CPU/L2/DDR frequencies, and athat J7/J4/J9 prevent the system from booting if changed.

CPU@1200 MHz, L2@600, DDR@600 (Default settings)

J7J4J6J5J9J8J1J2J3
011000101
CPU:   Marvell PJ4B v7 UP (Rev 1) LE
       CPU @ 1200Mhz, L2 @ 600Mhz
       DDR @ 600Mhz, TClock @ 200Mhz
       DDR 16Bit Width, FastPath Memory Access

CPU@1200, L2@800, DDR@400

J7J4J6J5J9J8J1J2J3
011000111
CPU:   Marvell PJ4B v7 UP (Rev 1) LE
       CPU @ 1200Mhz, L2 @ 800Mhz
       DDR @ 400Mhz, TClock @ 200Mhz
       DDR 16Bit Width, FastPath Memory Access

CPU@1000, L2@500, DDR@500

J7J4J6J5J9J8J1J2J3
010000101
CPU:   Marvell PJ4B v7 UP (Rev 1) LE
       CPU @ 1000Mhz, L2 @ 500Mhz
       DDR @ 500Mhz, TClock @ 200Mhz
       DDR 16Bit Width, FastPath Memory Access

CPU@1067, L2@534, DDR@534

J7J4J6J5J9J8J1J2J3
010100101
CPU:   Marvell PJ4B v7 UP (Rev 1) LE
       CPU @ 1067Mhz, L2 @ 534Mhz
       DDR @ 534Mhz, TClock @ 200Mhz
       DDR 16Bit Width, FastPath Memory Access

CPU@1333, L2@667, DDR@667

Note: this configuration did not boot.
J7J4J6J5J9J8J1J2J3
011100101
CPU:   Marvell PJ4B v7 UP (Rev 1) LE
       CPU @ 1333Mhz, L2 @ 667Mhz
       DDR @ 667Mhz, TClock @ 200Mhz
       DDR 16Bit Width, FastPath Memory Access

Useful links

Some news from the GuruPlug

Some of you probably remember my very disappointing experience with the Guruplug Server Plus 2.5 years ago. This device was overheating and very poorly designed. I was very angry at GlobalScale's for their crappy design and quite clearly I was not the only one considering the number of people reporting such issues with various fixing methods.

I finally had to resort to similar methods :-(. I had to remove the dangerous power supply from the box, and use its space to install a power plug, a large heatsink, and a thermally-controlled fan. I bought an external 5V/2A power supply and that resulted in a device that I could carry with me for various network experiments requiring two ports. Recently I also installed a small 4-pin connector to access the I2C bus from the outside to ease development.
Power plug
Fan
Heatsink

Switching to the Dockstar

I discovered the Seagate Dockstar, too late, when it was almost impossible to find one. I could put my hands on 4 of them, 3 of which were for friends. This device is tiny, uses the same CPU as the GuruPlug, can be powered from a single USB port with an easy mod, has free space to add a serial port, and is cheap. I always carry it with me everywhere and it's my network testing peer. It's much more convenient than the GuruPlug when a single ethernet port is required, and it can be powered by the GuruPlug's USB port when both are needed. Unfortunately it's impossible to find it now, and if you have one you don't use anymore, feel free to send it to me, I will be really pleased. Here are a few photos of the mods I made to a few devices.

2011-12-18

Elastic Binary Trees - ebtree

Administrivia

This article was initially posted on Wikipedia in 2008, which explains why it's written at the 3rd person and looks familiar to articles on binary trees. A few moderators decided that the article was only self-promotion and removed it without notification a few days after its publication. After many arguments with them, I was told that Wikipedia cannot hold original work, only copies of what can be found somewhere else, and I gave up the fight, realizing that for once I found dumber than me! At least they accepted to let me retrieve the original work so that I can publish it somewhere else (I didn't even have a copy of it). Needless to say I have stopped donating to Wikipedia since then. Three years later, I spent part of a week-end putting this online here. When I have time, I'll rewrite it differently.

Introduction

In computer science, an elastic binary tree (or EB tree or EBtree) is a binary search tree specially optimized to very frequently store, retrieve and delete discrete integer or binary data without having to deal with memory allocation. It is particularly well suited for operating system schedulers where fast time-ordering and priority-ordering are strong requirements. Insertion and lookups are performed in O(log n) while removal is done in O(1). The tree is not balanced but its height is bound by the type of data to store in it. Ordered duplicate entries are also natively supported.

The elastic binary tree was invented between 2002 and 2007 by Willy Tarreau as part of a research work on an event-based task scheduler for user-space network applications. This type of components require a sorted list of future events, and insertion and removal are very common operations which must be optimized. An early implementation relied on a naive linked list approach, soon replaced with a fast radix tree. The disadvantages of the radix tree is that it is often necessary to allocate memory for one node to store a value, and that some garbage collection must be performed after removal in order to eliminate unused nodes. There are situations where this is not desirable nor even possible. An alternative was to use a balanced tree, but the very frequent removal operation costs O(log n).

The solution was to find some sort of hybrid type of tree between both, where each inserted entry would carry both an intermediate nodes and a leaf, both of which would stray from each other as the tree leaves, thus stretching the initial node across levels, hence the name "elastic binary tree".

License

The concept and algorithms have been released under public domain. The author's implementation was first released under GPL then switched to LGPL for more openness. The algorithm is simple enough to allow other implementations to be initiated for very specific needs.

Definitions

In an EB tree, data are stored in EB nodes. An EB node contains two parts :
  • the node part, which is responsible for tying together upper and lower nodes or leaves ;
  • the leaf part, which carries the data (an integer key), and keeps a pointer to its upper node (leaf_p).

The node part itself is composed of a parent pointer (node_p), a level (bit), and a root, which itself links to lower nodes constituting a subtree.

The level part represents the lowest bit position in the key, above which all bits are equal for all keys in the lower subtrees.

A root only contains two links called branches, which represent the value of the bit below current node's. There is always a left and a right branch, except for the top of the tree. Left branch represents bit value 0 while right branch reprensents value 1.

Branches are typed. This means that a node knows whether its branches point to other nodes or to leaves. Parent pointers are typed too. This means that each node or leaf knows if it is attached to the left or the right branches of the upper root.

A typical EB tree begins with a root at its head, containing 0 or 1 branch, below which the tree is complete. The right branch of an EB tree root is always empty.

Features

EB trees offer a wide range of useful data manipulation features :
  • Lookup :
    • lookup first or last key
    • exact match: lookup exact key (signed/unsigned integer, poitner, string or memory block)
    • longest match: lookup longest matching key (network address match)
    • lookup first matching prefix: retrieve first entry matching the beginning of a key
    • lookup closest smaller value
    • lookup closest greater value
    • lookup previous or next different value: quickly skip duplicates
    • lookups of duplicate keys always performed in key insertion order
  • Insertion :
    • standard key insertion : if the key exists, create a duplicate entry
    • unique key insertion : if the key exists, return the existing one

Complexity

EB trees algorithmic complexity can be derived from 3 variables :
  • the number of possible different keys in the tree : P
  • the number of entries in the tree : N
  • the number of duplicates for one key : D
EB trees are deliberately not balanced. For this reason, the worst case may happen with a small tree (for instance, 32 distinct keys of one bit). But the operations required to manage such data are so much cheap that they make it worth using these trees even under such conditions. For instance, a balanced tree may require only 6 levels to store those 32 keys when an EB tree will require 32. But if per-level operations are 5 times cheaper, EB tree wins.
Minimal, Maximal and Average times are specified in number of operations. Minimal is given for best condition, maximal for worst condition, and the average is reported for a tree containing random keys. An operation generally consists in jumping from one node to another.
Complexity :
  • lookup  : min=1, max=log(P), avg=log(N)
  • insertion from root : min=1, max=log(P), avg=log(N)
  • insertion of dups  : min=1, max=log(D), avg=log(D)/2 after lookup
  • deletion  : min=1, max=1, avg=1
  • prev/next  : min=1, max=log(P), avg=2 :

 N/2 nodes need 1 hop=> 1*N/2
 N/4 nodes need 2 hops => 2*N/4
 N/8 nodes need 3 hops => 3*N/8
 ...
 N/x nodes need log(x) hops => log2(x)*N/x
 Total cost for all N nodes : sum[i=1..N](log2(i)*N/i) = N*sum[i=1..N](log2(i)/i)
 Average cost across N nodes = total / N = sum[i=1..N](log2(i)/i) = 2

Current EB tree design is limited to only two branches per node. Most of the tree descent algorithm would be compatible with more branches (eg: 4, to cut the height in half), but this would probably require more complex operations and the deletion algorithm would be problematic.

Properties

EB trees have several useful properties :
  • a node is always added above the leaf it is tied to, and never can get below nor in another branch. This implies that leaves directly attached to the root do not use their node part, which is indicated by a NULL value in node_p. This also enhances the cache efficiency when walking down the tree, because when the leaf is reached, its node part will already have been visited (unless it's the first leaf in the tree).
  • pointers to lower nodes or leaves are stored in "branch" pointers. Only the root node may have a NULL in either branch, it is not possible for other branches. Since the nodes are attached to the left branch of the root, it is not possible to see a NULL left branch when walking up a tree. Thus, an empty tree is immediately identified by a NULL left branch at the root. Conversely, the one and only way to identify the root node is to check that it right branch is NULL. Note that the NULL pointer may have a few low-order bits set on some implementations.
  • a node connected to its own leaf will have one and only one of its branches pointing to itself, and leaf_p pointing to itself.
  • a node can never have node_p pointing to itself.
  • a node is linked in a tree if and only if it has a non-null leaf_p.
  • a node can never have both branches equal, except for the root which can have them both NULL.
  • deletion only applies to leaves. When a leaf is deleted, its parent must be released too (unless it's the root), and its sibling must attach to the grand-parent, replacing the parent. Also, when a leaf is deleted, the node tied to this leaf will be removed and must be released too. If this node is different from the leaf's parent, the freshly released leaf's parent will be used to replace the node which must go. A released node will never be used anymore, so there's no point in tracking it.
  • the bit index in a node indicates the bit position in the key which is represented by the branches. That means that a node with (bit == 0) is just above two leaves. Negative bit values are used to build a duplicate tree. The first node above two identical leaves gets (bit == -1). This value logarithmically decreases as the duplicate tree grows. During duplicate insertion, a node is inserted above the highest bit value (the lowest absolute value) in the tree during the right-sided walk. If bit -1 is not encountered (highest < -1), we insert above last leaf. Otherwise, we insert above the node with the highest value which was not equal to the one of its parent + 1.
  • the "eb_next" primitive walks from left to right, which means from lower to higher keys. It returns duplicates in the order they were inserted. The "eb_first" primitive returns the left-most entry.
  • the "eb_prev" primitive walks from right to left, which means from higher to lower keys. It returns duplicates in the opposite order they were inserted. The "eb_last" primitive returns the right-most entry.
  • a tree which has 1 in the lower bit of its root's right branch is a tree with unique nodes. This means that when a node is inserted with a key which already exists will not be inserted, and the previous entry will be returned.

Principles of operations

Insertion



Initially, a tree is empty. It only consists in a root with two NULL pointers (one for each branch).

When the first EB node is inserted, its leaf is directly linked to the root 's left branch. Thus, its node part is not used. It is the only situation where a node part is not used. After that, a new EB node is inserted. A node part is needed between the root and existing leaf to connect the new leaf. This is where the node part provided with the new EB node will be used. The second EB node's node part will have one of its branch pointing to the previous leaf, and the other branch pointing to its own leaf. For this reason, it is not possible to have an empty branch below a node.

When the third EB node is added, it may very well insert itself between the node and the leaf of previous EB node, which will be stretched by the operation. It is important to note that the node part of a stretched EB node is always located somewhere between the root and the leaf of the same EB node. The same principle goes on with further keys, as shown on the diagrams below.
An EB tree with keys 1 to 5
Adding keys 21 to 25 to a previous tree.
In Yellow, the links which changed.

Insertion of duplicates

An EB tree supports duplicates and keeps them ordered. This is a very important property for a fair scheduler because events stored with the same wake-up date must be processed in the same sequence to avoid risks of starvation.
In order to store a duplicate key, we grow a binary sub-tree at the place where the leaf was to be located, using the same algorithm as for the upper tree, but with negative bit offsets. This allows duplicates to be stored without overhead data, and to be walked through using the same functions, without any processing overhead. The nodes order is respected and all nodes will be walked in their insertion order.

Deletion

When a leaf is deleted, its parent must be released too (unless it's the root), and its sibling must attach to the grand-parent, replacing the parent. Also, when a leaf is deleted, the node tied to this leaf will be removed and must be released too. If this node is different from the leaf's parent, the freshly released leaf's parent will be used to replace the node which must go. EB trees do not need to be rebalanced after deletion. This results in tree deletion being performed in O(1). A released node will never be used anymore, so there's no point in tracking it.
In yellow, links affected by the
removal of key 1 from previous EB tree.
In yellow, links affected by removal
of keys 2 to 5 from previous tree.

Tree walk

A tree walk may be performed from left to right or right to left, always following numerical order and duplicates insertion order if applicable. It relies on the following primitives :
  • eb_first(root): return the first node of the tree starting at [root]
  • eb_last(root): return the last node of the tree starting at [root]
  • eb_next(node): return next node after [node]
  • eb_prev(node): return previous node before [node]
All of these functions return either a node or NULL when the end of tree is reached. Additionnally, two primitives are particularly useful to skip duplicates :
  • eb_next_unique(node) : return the next node with a different key from current [node].
  • eb_prev_unique(node) : return the previous node with a different key from current [node].

Implementation specifics

The only currently known implementation is the author's. It works on 32- and 64-bit platforms, either little or big endian. It uses the fact that pointer storage is 32-bit aligned at least, causing structure pointers to always have their two lowest bits cleared. This allows for pointer type to be stored in the pointer itself :
  • a node pointer will be stored verbatim (with lowest bit cleared)
  • a leaf pointer will be stored with lowest bit set (which equals pointer + 1)
  • a parent pointer in a left branch will be stored verbatim (with lowest bit cleared)
  • a parent pointer in a right branch will be stored with lowest bit set (which equals pointer + 1)
This is an important optimisation which reduces the number of memory dereferences and the memory structures sizes.
The right branch of the root must always be NULL. However, applying the same principle as above lets us put global flags in this location. Currently, the only used flag is EB_ROOT_UNIQUE, indicated by setting the lowest bit of the right branch pointer in the root. It indicates that duplicates are not desired and that when a key already exists during an insertion, nothing may be inserted and the existing node will be returned instead.
The current implementation is split between generic code and type-specific code. This makes it easier to later add support for newer types. Currently, pointers, signed/ unsigned 32/64 bit integers, strings and arbitrary memory blocks are supported.
Sixth version of the author's implementation brought support for prefix-based matching of memory blocks, which supports prefix insertion and longest match which are still compatible with the rest of the common features (walk, duplicates, delete, ...). This is typically used for network address matching or route selection. This update is not (yet) covered by this document.
Seventh version of the author's implementation brings support for a remappable variant of the ebtree called rebtree (rremappable elastic binary tree). This version is slightly slower but works on relative pointers, making it suitable for use in mapped files.
The current implementation distinguishes between eb_node which is type-agnostic and ebXX_nodewhich is a node of type XX. In practise, an eb_node has everything except the key itself. Only insertion and lookup need the key, all other operations only operate on the structure of the tree. This allows most primitives to be shared between type-specific implementations :

/* generic types */
typedef void eb_troot_t;

struct eb_root {
 eb_troot_t*b[EB_NODE_BRANCHES]; /* left and right branches */
};

struct eb_node {
 struct eb_root branches; /* branches, must be at the beginning */
 eb_troot_t*node_p; /* link node's parent */
 eb_troot_t*leaf_p; /* leaf node's parent */
 int bit; /* link's bit position. */
};

/* 32-bit specific types */
typedef unsigned int u32;

struct eb32_node {
 struct eb_node node; /* the tree node, must be at the beginning */
 u32 key;
};

The structures were designed to be located at the very beginning of any structure making use of EB trees. It is important for cache efficiency and memory access time that the EB tree data are located at the beginning of a CPU cache line, especially the 4 first pointers.

Performance considerations

The author uses EB trees in areas where red-black trees were burning too many CPU cycles. The haproxy load balancer has switched to EB trees and shows minor CPU usage at speeds up to 500000 insertions/removals per second, and a firewall log analyser relies on it to sort logs at speeds above one million lines per second. An insertion typically takes less than 100 nanoseconds on modern machines.
EB trees are much faster than conventional balanced trees when looking up strings or memory blocks because during the tree descent, only the bits specific to the new node are compared, while in a balanced tree, the string compare has to be performed on the whole string for each step. This advantage is emphasized in the halog log analyser which is able to process more than 4 million lines per second while indexing URLs.
The ability to lookup prefixes has made EB trees able to perform an operation which previously was possible only in radix trees but not in binary trees. Having the ability to use the same tree for various purposes can sensibly reduce complexity in some implementations, and reduce the number of operations (eg: network routing and filtering). Tests have shown that network lookups within a BGP table containing 450000 routes could be performed several million times a second on a modern machine.
However, a tree node is typically 20-bytes long on a 32-bit system. On systems with very small caches or low memory bandwidth, EB trees can be slower than red-black trees if insertions/removals are not much frequent and do not compensate for the increased bandwidth usage. Also, red-black trees rely on the ability to compare values, which make them suitable for many data types. Conversely, EB trees heavily rely on bit-addressable data (typically integers), which confer them a very high performance due to the cheap operations involved, but limit their application field. Trying to use EB trees to store non-bit-adressable data would completely defeat their purpose and will likely provide poor performance.

Practical uses

The HAProxy load balancer uses EB trees for the following features :
  • timers: timers are stored ordered, which ensures that checks for timeouts are always performed in O(1).
  • scheduler: the scheduler maintains the list of runnable tasks by weighting their wake-up order with their nice value. EB trees make this operation very cheap, typically O(Log(N)), allowing priorities to be used with huge amounts of tasks without any noticeable cost.
  • ACL: string and IP address matches rely on EB trees so that even very large pattern file do not cause any noticeable slowdown (eg: full internet BGP routing table).
  • stick-tables: data learned from traffic (cookies, SSL ID, headers, URLs, ...) are stored and retrieved using EB trees
  • The multi-peer sync protocol uses the ability to lookup ranges in EB trees.

The halog log analyser provided with HAProxy uses EB trees to sort counters, stats, URLs and servers.
The Scalable TLS Unwrapping Daemon (stud) uses EB trees for its shared session cache.
At least one proprietary log analysis solution makes extensive use of the remappable version of the EB trees for fast log indexing and lookup.
At least one thread-safe variant is known to exist, though very little is known about it at this time.

See also

External links

2010-05-30

GuruPlug Server Plus : don't waste your money on it!

Introduction

Guruplug Server and Guruplug Server Plus are small computers that fit into a power plug and which are made by GlobalScale. I was first informed about them from someone who asked on the haproxy mailing list if haproxy had ever been tested on them. These devices run on a Marvell 88F6281 processor, which is a system-on-chip (SoC) powered by an ARM-derived Sheeva processor core at 1.2 GHz, coming with 512 MB of RAM and as much of flash. The Server Plus also has 2 Gigabit Ethernet ports, and the whole is supposed to consume just a few watts, so I found it very appealing for building high performance, low consumption haproxy servers.
I pre-ordered one at NewIT about two months ago and it finally arrived on Friday. The packaging was clean and the device looked compact and solid. A free JTAG+console adapter board was also provided, which is absolutely required to be able to connect to the serial port. The box also contains a european plug, as well as power and Ethernet cables. The devices come pre-installed with a debian system so that it's not needed to install anything to test it.

First connection

I plugged my Guruplug on my network and observed my DHCP server's logs in order to find the address I should connect to (no doc is shipped with the device, everything has to be downloaded from the net or guessed). Well, the dhcp client looked a bit buggy, it was sending DHCP_DISCOVER packets forever, ignoring the DHCP_OFFER it got in response to all of its packets. So I couldn't connect to the device over the network. I had to use the serial console.
Trying to connect using the serial console was another adventure. The JTAG adapter was detected by the usbserial driver, but all I got were endless "1". I finally found that I had to build and load the ftdi_siodriver too to use the JTAG adapter. Nice, I can now connect via minicom, using login root and password nosoup4u. I manually killed dhclient3 and restarted it by hand without all the strange arguments on its command line and this time it worked. I did not spend more time investigating this, probably that some of these args passed via the init script are not very functional.

First network test

Being interested in running haproxy on it, I had to run a benchmark. I installed the haproxy-1.4.4 package which was already packaged for this distro, because it was faster than building a cross-compiler on my machine. I started it with a simple test configuration which only forwards traffic to a single server without logging nor playing with headers or cookies. This is the configuration with which I push my ALIX to 2700 hits/s, and a Celeron 1.3 GHz to about 11000 hits/s. Being running at 1.2 GHz with some DDR2 memory, I expected to see something between 8000 and 10000 hits/s. I carefully unloaded iptables and fired the test. What a disappointment : 2200 hits/s at 100% CPU ! Even 20% less than my ALIX running at 500 MHz on DDR-400 and 100 Mbps ports ! I tried changing some TCP optimisations in the configuration but I could only hardly reach 2400 hits/s with a config in which my ALIX reaches 3000. Still 20% less.
Then I decided to run a bit rate test which should favor this CPU which integrates the network controllers. Instead of fetching empty objects, I fetch large ones (10 MB). Result : only 310 Mbps at 100% CPU where the ALIX gives 200 Mbps with some CPU margin (two 100 Mbps ports), and the Celeron slightly more than 800. Quite obviously this CPU is a donkey !

Test on other kernels

I finally decided to give this box a second chance by booting another distro's kernel on it, just in case we'd have a problem with debian's 2.6.32. Fedora and Slackware have pre-built kernels and images that should be able to run there. Following the Slackware's howto hangs at boot. The howto also says that some plugs were shipped with a buggy bootloader which hangs during USB detection. Mine does and is 6 months old. A new bootloader is available on plugcomputer.org so I carefully updated it, hoping not to brick the box. That was successful, I could boot the Slackware and run my test with the same haproxy binary.It was slightly faster, 2250 hits/s and 320 Mbps, but still far from what I'd expect.
After a reboot, I discovered that the debian kernel shipped from factory does not boot anymore after the bootloader has been upgraded. There is a platform identifier which is different between what is documented as being a Guruplug (0x0A63) and what was installed in this beast (0x0A29). I was starting to get a bit fed up with the low level of platform preparation at GlobalScale, so I unplugged the thing.

Massive heat dissipation

While unplugging wires, I burnt my fingers on the RJ45 plug! The metal was at something like 70 or 80 degrees celsius, I don't know. It was simply untouchable, even after a long idle period. The Geode in my ALIX runs under less than 1W at full load and around 0.1W when in idle. It does not even have a heatsink. Many people have reported major heating issues on these Guruplugs in the past, with power supplies dying after less than 3 months, but GlobalScale said they had replaced the PSUs in recent units. Check here for photos taken with a thermal camera and reporting more than 72 degrees C on external parts.

So I decided to open the unit to see how it was inside. There were two signs that the power supply got replaced. The first one is that the "Warranty void" seal was a new one stuck over the previous one, whose remains were pinched in the plastic. So this box was reopened, closed and had a new stick on it.






The second sign is that two plastic legs in it were cut to accomodate the new, larger power supply. Click on the images on the right to get a full sized view.






I ran the tests again with the box open. This power supply does not heat at all. It's only the CPU which heats like mad. The small aluminum plate it completely untouchable, it really hurts to touch it even half a second. And the power supply is located just on top of that plate. So now I realized that it was not the power supply that was killing the Guruplugs, but the CPU which is cooking the power supplies. I can't leave that running when I'm not at home, I would fear that it puts my flat into fire !

Poor design

Considering the amount of heat emitted by this Marvell CPU, the box would have needed a large heatsink. I don't think anybody would have complained if the box had been 1.5cm thicker to accept a normal heatsink, with aeration holes on the sides so that the heat dissipates. Instead we have dangerous plugs which are too hot to touch, and which die in a few months of idling because capacitors can't stand sustained heat.
Also, the internal wiring is of poor quality. Two of the power output wires broke while I was taking photos of them, and a third one is about to break too. The mains input wires are similarly bad and risk to break too. And it looks like the PSU was modded by adding resistors to it, maybe to slightly increase output power. This PSU features a 105 degree capacitor though, so it may last slightly longer than previous ones.



Close-up of the 5V connector

Inscriptions at the bottom

The PSU in the bottom block

Bottom with battery, SD connector,
Blutooth controller and WiFi

Connectors side

Left side with UART and JTAG
connectors

Right side with reset button and
Micro-SD

Motherboard at the top


PSU cover removed

PSU removed

The 105°C capacitor

The PSU was partched

The PSU

First 5V wire broke instantly

Second 5V wire followed

Next one will be the ground wire

Mains wire about to break

Conclusion

The Guruplug was announced by some sites as the NSLU2 killer, but for me it's just right now a Slow Heater, and by no way will it replace even one NSLU2 as long as it heats like that. It's slower than my old ALIX, and heats a lot more. I think that one of the reasons for it to be so slow despite the high frequency is the RAM bus width : at first I thought it would be a fast system because it runs on DDR2-800, but when you see that it's only a 16-bit bus, it is equivalent to only 200 MHz for a 64-bit bus, but with higher latencies due to DDR2. My ALIX's Geode LX processor runs on a 64-bit, 400 MHz bus, so it has twice the memory bandwidth the Guruplug has. Marvell also has another CPU in the family (MV78200) with a 64-bit RAM bus, which should be better, but I have not seen it anywhere yet. The fact that the Marvell CPU consumes 5 times more power than the Geode LX and is slower is also an indication of something wrong with the design. Apparently this CPU makes it way into entry-level NAS appliances. It's probably one of the applications where the heat issues and limited memory bandwidth will not be too much of a concern. I'm really disappointed, I would have expected it either to replace my old NSLU2 (but it heats too much) or to become sort of an ALIX upgrade with Gigabit ports, but now I think that if I need Gigabit, I'll turn to these dual-gigabit mini-PCI cards. They will give me about the same level of performance without having an overheating CPU.

I don't even think I will waste my time trying to repair and adapt that Guruplug, it will probably spend the next 10 years lying in a box with other dead boards, waiting for my solering iron to pick some components out of it. However, I think that if/when the manufacturer finally decides to do a complete redesign of the product (ie include a BIG heatsink and leave some holes for airflow), then it may be an acceptable replacement for the NSLU2.

External links