ARMing Haskell

Posted on December 15, 2015 by Calvin Beck
Tags: Haskell, ARM, QEMU, Raspberry Pi

I have spent the last few weeks desperately trying to get Haskell working on a Raspberry Pi 2 with Raspbian (Jessie). I have had… some problems

Edit 2015-12-16: Thanks to the help of the wonderful Haskell community, my issues have been resolved. Make sure you have LLVM 3.5.2, or you may encounter the same problems that I did!

Edit 2015-12-17: Thanks to slyfox on #ghc on freenode we now have a sort of working frankenstein’s cross-compiler!

Cross-Compilation

There are a few useful guides available for setting up cross-compilation:

It’s fairly easy to get a cross-compiler up and running, as long as you carefully follow instructions. This is all well and good, but there are still issues with Template Haskell. Template Haskell essentially requires that you can run code built from the compiler on the system running the compiler. Roughly speaking Template Haskell uses Haskell code to generate more Haskell during compilation. Since you can’t run an ARM executable on x86 / x86_64, you can’t compile anything that uses Template Haskell with the cross-compiler as of yet. More on this later, but here’s some more information.

In addition to the lack of Template Haskell, there are problems with certain FFI libraries, like zlib. So, while my 7.10.2 can produce an acceptable ARM “Hello, World!”, I can’t compile my more complicated project. Thus I am forced to look for another option!

Cross-Compilation with QEMU

So, let’s say you have read the section on using a QEMU ARM user chroot with binfmt_misc to run ARM code on your non-ARM machine. If we can 1) have ARM libraries on our machine, and 2) run ARM executables, then it seems like we should be able to run the cross compiler from above, and then when we execute ARM code, drop into qemu-arm.

The only thing we should have to do is tell qemu-arm to load dynamic libraries from somewhere else, so that we can load ARM libraries, and not x86 / x86_64 libraries. Turns out there is an option for this:

$ qemu-arm

...
-L path       QEMU_LD_PREFIX    set the elf interpreter prefix to 'path'
...

Yeah, that will work. Using the cross compiler above, and the ARM Gentoo environment set up below I was able to compile my program just fine using:

export QEMU_LD_PREFIX=$HOME/arm-chroot

cabal sandbox init
cabal --with-ghc=arm-unknown-linux-gnueabihf-ghc --with-ghc-pkg=arm-unknown-linux-gnueabihf-ghc-pkg --with-ld=arm-linux-gnueabihf-ld --with-strip=arm-linux-gnueabihf-strip install

Which is AWESOME because the native compiler is so much faster than the one emulated with QEMU. This is not without caveats. For instance when compiling with multiple jobs it took forever and then ran out of memory, setting -j1 with cabal install fixed this particular issue.

A bigger problem is that there appears to be issues with the C FFI. I’m not yet sure how this works, but when I try to run an ARM binary which uses JuicyPixels to write a PNG (which relies upon zlib), I get this error:

user error (Codec.Compression.Zlib: incompatible zlib version)

but otherwise this works surprisingly well.

After a bit of digging I have found that this error comes from here, and since we have got an actual failure, it’s probably from calling the failIfError function. That means our issue is from here.

I have checked the zlib version on all of my environments, and the version is 1.2.8 everywhere. So the version isn’t the issue. In fact if we look at the code for c_deflateInit2_ we’ll notice that it’s only checking the first version number anyway:

if (version == Z_NULL || version[0] != my_version[0] ||
    stream_size != sizeof(z_stream)) {
    return Z_VERSION_ERROR;
}

But there’s our problem. The size of our z_stream must differ somehow. If we check the Haskell Zlib library we notice the following:


c_deflateInit2 :: StreamState
               -> CInt -> CInt -> CInt -> CInt -> CInt -> IO CInt
c_deflateInit2 z a b c d e =
  withCAString #{const_str ZLIB_VERSION} $ \versionStr ->
    c_deflateInit2_ z a b c d e versionStr (#{const sizeof(z_stream)} :: CInt)

I’m on an x86_64 machine, so it’s pretty much guaranteed that the z_stream structure on my machine differs in size to the one on the 32 bit ARM machines. This should be processed by hsc2hs, so that’s where the issue is. I need to be running an hsc2hs that targets ARM, not my native one. I had an arm-unknown-linux-gnueabihf-hsc2hs, but it seemed to loop forever, as did the hsc2hs from my chroot (but not when running it in my chroot). So that’s no good either.

hsc2hs by default generates a C program, which when run spits out the appropriate Haskell file. So we just need to get hsc2hs to generate ARM code. I can pass hsc2hs options with cabal, which means I can tell the native hsc2hs to compile using a cross compiler and linker.

--hsc2hs-option="-c arm-linux-gnueabihf-gcc -l arm-linux-gnueabihf-ld"

Unfortunately that still doesn’t work! We’re on the right track, but not quite there yet. The problem now is that arm-linux-gnueabihf-ld doesn’t actually know where to look for libc (we need the ARM one). We can use the --sysroot option to make it use the ARM chroot. I actually had to change to gcc for the linker as well, because it handles linking with the C runtime much more nicely.

--hsc2hs-option="-c arm-linux-gnueabihf-gcc -l arm-linux-gnueabihf-gcc -C "--sysroot=$HOME/arm-chroot/" -L "--sysroot=$HOME/arm-chroot/""

This loops forever with qemu-arm, however the executable when run on the Raspberry Pi 2 works perfectly fine. This seems to be a QEMU bug.

Note that hsc2hs actually has cross compilation options (-x). When these are used instead of creating a C program hsc2hs uses tricks to figure out what’s going on with the target. Unfortunately this doesn’t handle const_str, which we need, so this won’t work for us.

Running on the Raspberry Pi

Raspbian has an old version of GHC in its repos, GHC 7.6.3, which works with simple pieces of code. For instance “Hello, World!” might compile and run perfectly well. However, my small image processing program encountered nasty, randomly changing, run time errors, and segmentation faults. Sometimes, if the stars aligned, the program would run to completion producing correct results, other times it seemed to loop forever. This is not good. One such error that I received was:

allocGroup: free list corrupted

which, if we look at the source code, should definitely never happen. Everything that is happening here SCREAMS that memory is getting stomped on somewhere. This is not something which is going to be easy to debug as the problem could quite literally be anywhere in the code, or in fact in a different library entirely. Not good.

What about a newer compiler?

So, the Raspbian compiler is horribly ill, and we have to try something different. I attempted to compile GHC from scratch, but this is an effort which takes a very long time on a Raspberry Pi, and after fixing build errors I still had issues. Fortunately since GHC 7.10.2 there are binaries for ARMv7. Additionally, I found a guide which suggested that there were no issues on a Scaleway server:

http://statusfailed.com/blog/2015/11/29/haskell-and-servant-on-scaleway-arm-servers.html

Following these suggestions I was able to install 7.10.2, and 7.10.3 on the Raspberry Pi. Unfortunately these compilers have given me quite a bit of grief, as I am incapable of producing a working “Hello, World!” with them (despite the 7.10.2 cross-compiler working just fine for this):

https://ghc.haskell.org/trac/ghc/ticket/11190

Cabal-install hangs, and this too is a dead end for me.

Or it would be if it wasn’t for Ben Gamari, who replied to my issue in the GHC trac. As it turns out the LLVM version available with Raspbian is 3.5.0. This version has an issue on ARM which breaks the binaries that GHC spits out. Upgrading to 3.5.2 should fix this problem!

QEMU

QEMU is an attractive option. We can emulate an ARM machine on a fast x86 processor, which may be faster. At any rate it’s nice to be able to compile, and even test ARM binaries, on your development system, without having to dedicate a Raspberry Pi as a build bot. I only have access to one Raspberry Pi 2 at the moment, and it’s occupied with work stuff. QEMU would make things easier, and hell, it might even work.

QEMU System Emulation

This is also a dead end. I am horrendously bad at getting QEMU machine emulation to work. I got it working with Raspbian Wheezy, but it wouldn’t work with Jessie for some reason. I spent a lot of time trying to get an ARM machine emulated, but all of the guides are filled with archaic options, vmlinuz kernels, and magic. Ultimately nothing worked, and this was a huge waste of time that would be slow anyway. Here be dragons. All who dare enter should beware.

QEMU User Emulation

Initially I thought that system emulation would be much easier to set up and get working than user emulation. I was wrong. QEMU system emulation will emulate an entire ARM machine, whereas user emulation essentially just lets you run ARM binaries on a different machine as a regular user process. User emulation translates all of the instructions, and sys-calls, but uses the same kernel and filesystem as before. This is cool. What’s especially cool is that Linux has a feature called binfmt_misc, which lets you run arbitrary executable formats. So, we can actually get our normal x86 / x86_64 Linux system to transparently execute ARM binaries. Essentially we use binfmt_misc to tell the kernel to run ARM executables with the qemu-arm program. This is pretty awesome.

There are some interesting possibilities with this, as we might be able to leverage this to make the cross compiler work with Template Haskell. After turning this on, instead of getting Exec format error’s with Template Haskell and the cross compiler, it was instead complaining about missing ARM libraries. Cool. This is something that might be worth exploring at a later date, but I don’t really want to mess with mixing ARM and x86_64 binaries on my computer. That’s bad mojo. Worth messing with in the future, because it has the possibility of being faster, but it’s probably filled with caveats and it seems a bit scary.

So, what’s the next best option? How about a chroot!? Yeah. That’ll work. So, here’s the plan! In order to make sure we get all of the ARM libraries we need, we’ll just install an ARM Linux distribution in a chroot, and use binfmt_misc and qemu-arm to transparently execute any ARM binaries in the chroot. I got some information for how to set up binfmt_misc on Gentoo here:

https://wiki.gentoo.org/wiki/Crossdev_qemu-static-user-chroot

This guide seems to be doing more complicated stuff with LXC, which I don’t really understand, so I have deviated and just went with a simple chroot Linux install. I’m going to install Gentoo, because it’s what I know, and I find that it doesn’t hide problems from me, and it will let me try different versions of LLVM and other libraries which might be causing problems quite easily.

We need access to qemu-arm in our chroot if we want to use it to run ARM binaries. The simple way to do this is to statically link qemu-arm, which will allow us to copy the qemu-arm binary into the chroot, and not have to worry about copying any dynamically loaded libraries as well. In order to do this on Gentoo we simply add the static-user use flag to the qemu package, your distro may vary. We can check that we are good to go with ldd:

$ ldd `which qemu-arm`
    not a dynamic executable

Looking good! Make sure you set up binfmt_misc first, but now we can make our chroot.

# First we make the directory for our chroot:
mkdir ~/arm-chroot
cd ~/arm-chroot

# Get the Gentoo stage3
wget http://distfiles.gentoo.org/releases/arm/autobuilds/20151116/stage3-armv7a_hardfp-20151116.tar.bz2
tar -xjf stage3-armv7a_hardfp-20151116.tar.bz2

# Copy qemu-arm into the chroot, so we can use ARM binaries when chrooted.
cp `which qemu-arm` ~/arm-chroot/usr/bin/qemu-arm

# Then we do the magical mounting: https://wiki.gentoo.org/wiki/Handbook:AMD64/Installation/Base#Mounting_the_necessary_filesystems
mount -t proc proc proc
mount --rbind /sys /sys
mount --make-rslave sys
mount --rbind /dev dev
mount --make-rslave dev

# And now we chroot!
sudo -s
chroot . /bin/bash

# At this point you can pretty much just follow the Gentoo Handbook https://wiki.gentoo.org/wiki/Handbook:AMD64/Installation/Base#Configuring_Portage
emerge-webrsync
emerge --sync

# ...
# Configure the system how you want...
# ...

# Note that we need LLVM version 3.5.2, which you may need to add to package.accept_keywords
echo "=sys-devel/llvm-3.5.2 **" >> /etc/portage/package.accept_keywords

# Now we can install whatever we need for GHC, like LLVM
emerge -a llvm binutils zlib

# Grab and install GHC ARMv7 binaries...
wget http://downloads.haskell.org/~ghc/7.10.3/ghc-7.10.3-armv7-deb8-linux.tar.bz2
tar -xjf ghc-7.10.3-armv7-deb8-linux.tar.bz2
cd ghc-7.10.3
./configure
make install

The first time I did this I still encountered the same problem as on the Raspberry Pi with the qemu-arm GHC, however after installing LLVM 3.5.2 I have had no more problems. You can even install Cabal and cabal-install pretty easily. Just download the tarballs from here, and follow the instructions in the README files. It really is that simple. Make sure to move the cabal executable somewhere in your path, and run cabal update, and then you should be set!

I have since been able to compile my image processing program, and stuff it on the Raspberry Pi 2. No problems at all. QEMU is a bit slow, but it’s nice to not have to worry about compiling on the Pi, and now I can do everything from my development machine. This is quite a bit slower than the Pi 2 itself, but works very well.

Current Status

After much pain and suffering I have a way to do Haskell development on ARM! Others have had success come quite a bit more easiely. I have had a conversation with somebody on Twitter whom found success with Scaleway servers and a Beaglebone Black. The difference here seems to be that whatever image they were using on Scaleway had LLVM 3.5.2, and not the broken LLVM 3.5.0.

Good luck to anybody trying to do the same! Ask questions, or boast about your success in the comments.

The Future

So with that done and working, here’s some stuff that could greatly improve the Haskell on ARM development experience: