Adventures in Freebernetes: Will Linux bhyve?

Part 3 of experiments in FreeBSD and Kubernetes: Linux guests

Prep Work

In the previous post, we started compiling the sysutils/grub2-bhyve port, required for running Linux guests with bhyve. We also need the ISO 9660 image for a Linux installer (I’m using Arch Linux).

grub-bhyve is the bhyve boot loader for Linux images. Just as we needed to run bhyveload before we could start a FreeBSD guest in the previous post, we need to run grub-bhyve before we can boot a Linux guest.

We’re assuming you’ve already created the virtual network interfaces (see the previous post) and that no other VMs currently exist.

Booting Linux… Maybe

root@nucklehead:/vm # truncate -s 4G linux.img
root@nucklehead:/vm # cat > <<EOF
(hd0) ./linux.img
(cd0) ./images/archlinux-2020.10.01-x86_64.iso
EOF
root@nucklehead:/vm # grub-bhyve -m -M 1024M -r cd0 arch
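
The steps above come down to writing a grub device map and handing it to grub-bhyve with -m. Here is a minimal sketch of that; the map file name device.map and the helper name write_device_map are assumptions made for illustration, not names taken from the session above.

```shell
#!/bin/sh
# Sketch: write a grub device map for grub-bhyve.
# "device.map" and write_device_map are illustrative names (assumptions).

# write_device_map MAPFILE DISK [ISO]
# Emits one "(grubdev) path" line per device; the cd0 entry is optional.
write_device_map() {
    map=$1 disk=$2 iso=${3:-}
    printf '(hd0) %s\n' "$disk" > "$map"
    if [ -n "$iso" ]; then
        printf '(cd0) %s\n' "$iso" >> "$map"
    fi
}

workdir=$(mktemp -d)
write_device_map "$workdir/device.map" ./linux.img \
    ./images/archlinux-2020.10.01-x86_64.iso
cat "$workdir/device.map"

# On the FreeBSD host the loader would then be started along these lines:
#   grub-bhyve -m "$workdir/device.map" -M 1024M -r cd0 arch
```

Calling the helper without the third argument writes only the hd0 line, which is the shape of map needed once the installer ISO is no longer attached.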

This should bring up the GRand Unified Bootloader (grub) prompt. Now we need to figure out where Arch Linux hides the vmlinuz kernel and load that.

GNU GRUB version 2.00
Minimal BASH-like line editing is supported. For the first word, TAB
lists possible command completions. Anywhere else TAB lists possible
device or file completions.
grub> ls
(hd0) (cd0) (cd0,msdos2) (host)
grub> ls (cd0)/
arch/ EFI/ isolinux/ loader/ shellx64.efi
grub> ls (cd0)/isolinux/
isohdpfx.bin isolinux.bin isolinux.cfg ldlinux.c32
grub> ls (cd0)/arch/
boot/ pkglist.x86_64.txt x86_64/
grub> ls (cd0)/arch/boot/
amd-ucode.img intel-ucode.img licenses/ memtest syslinux/ x86_64/
grub> ls (cd0)/arch/boot/x86_64/
initramfs-linux.img vmlinuz-linux
grub> linux (cd0)/arch/boot/x86_64/vmlinuz-linux
grub> initrd (cd0)/arch/boot/x86_64/initramfs-linux.img
grub> boot

And now we can boot the loaded kernel.

root@nucklehead:/vm # bhyve -H -A -P \
-s 0,hostbridge \
-s 1,lpc \
-s 2:0,virtio-net,tap0 \
-s 3,virtio-blk,linux.img \
-s 31,ahci-cd,/vm/images/archlinux-2020.10.01-x86_64.iso \
-l com1,stdio \
-c 2 -m 1024M \
arch
rdmsr to register 0x3a on vcpu 0
rdmsr to register 0x140 on vcpu 0
rdmsr to register 0x3a on vcpu 1
rdmsr to register 0x140 on vcpu 1
rdmsr to register 0x64e on vcpu 0
rdmsr to register 0x34 on vcpu 0
:: running early hook [udev]
Starting version 246.6-1-arch
:: running early hook [archiso_pxe_nbd]
:: running hook [udev]
:: Triggering uevents…
:: running hook [memdisk]
:: running hook [archiso]
:: running hook [archiso_loop_mnt]
:: running hook [archiso_pxe_common]
:: running hook [archiso_pxe_nbd]
:: running hook [archiso_pxe_http]
:: running hook [archiso_pxe_nfs]
:: Mounting '/dev/disk/by-label/' to '/run/archiso/bootmnt'
Waiting 30 seconds for device /dev/disk/by-label/ …
ERROR: '/dev/disk/by-label/' device did not show up after 30 seconds…
Falling back to interactive prompt
You can try to fix the problem manually, log out when you are finished
sh: can't access tty; job control turned off
[rootfs ]#

Hmmm, that didn’t work. Some searching turns up the fix: add the ISO image’s volume label to the kernel arguments. (Arch labels its ISOs ARCH_YYYYMM, using the year and month of the release.) I also made grub-bhyve barf at least once before I got it all working.
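
As a rough illustration of that naming convention, the label can be guessed from the ISO file name. This is only a sketch: arch_iso_label is a hypothetical helper, and it assumes both the ARCH_YYYYMM convention and a file name of the form archlinux-YYYY.MM.DD-x86_64.iso. The label actually stored on the ISO is what counts, so treat the result as a guess to verify.

```shell
#!/bin/sh
# Sketch: guess the volume label from an Arch ISO file name, assuming the
# ARCH_YYYYMM convention and an archlinux-YYYY.MM.DD-x86_64.iso file name.
# arch_iso_label is a hypothetical helper name.
arch_iso_label() {
    name=$(basename "$1")
    # Pull out the YYYY and MM fields that follow "archlinux-".
    ym=$(printf '%s\n' "$name" |
        sed -n 's/^archlinux-\([0-9]\{4\}\)\.\([0-9]\{2\}\)\..*$/\1\2/p')
    # Fail if the file name does not match the assumed pattern.
    [ -n "$ym" ] || return 1
    printf 'ARCH_%s\n' "$ym"
}

label=$(arch_iso_label ./images/archlinux-2020.10.01-x86_64.iso)
# Kernel arguments to append to the "linux" line at the grub prompt.
printf 'archisolabel=%s archisobasedir=arch ro\n' "$label"
```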

grub> linux (cd0)/arch/boot/x86_64/vmlinuz-linux archisolabel=ARCH_202010 archisobasedir=arch ro
grub> initrd (cd0)/arch/boot/x86_64/initramfs-linux.img
grub> boot
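
If a retry misbehaves, a leftover VM instance from the previous attempt is one plausible culprit (an assumption; the exact failure isn't recorded here). bhyve keeps a VM's state in the kernel, visible under /dev/vmm, until it is destroyed with bhyvectl. A minimal sketch of clearing it before running grub-bhyve again, where reset_vm is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: tear down a leftover bhyve VM before running grub-bhyve again.
# reset_vm is a hypothetical helper name; "arch" is the VM name used above.
reset_vm() {
    vm=$1
    if [ -e "/dev/vmm/$vm" ]; then
        # The VM's kernel state persists until it is explicitly destroyed.
        bhyvectl --destroy --vm="$vm" && echo "destroyed $vm"
    else
        echo "no existing VM named $vm"
    fi
}

reset_vm arch
```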

Then run the same bhyve command again to boot. Victory!

Arch Linux 5.8.12-arch1-1 (ttyS0)
archiso login: root
To install Arch Linux follow the installation guide:
For Wi-Fi, authenticate to the wireless network using the iwctl utility.
Ethernet and Wi-Fi connections using DHCP should work automatically.
After connecting to the internet, the installation guide can be accessed
via the convenience script Installation_guide.
Last login: Fri Oct 30 15:34:12 on tty1
root@archiso ~ # fdisk /dev/vda
Welcome to fdisk (util-linux 2.36).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.
Device does not contain a recognized partition table.
Created a new DOS disklabel with disk identifier 0x6343b449.
Command (m for help): n
Partition type
p primary (0 primary, 0 extended, 4 free)
e extended (container for logical partitions)
Select (default p): p
Partition number (1-4, default 1):
First sector (2048-8388607, default 2048):
Last sector, +/-sectors or +/-size{K,M,G,T,P} (2048-8388607, default 8388607):
Created a new partition 1 of type 'Linux' and of size 4 GiB.
Command (m for help): p
Disk /dev/vda: 4 GiB, 4294967296 bytes, 8388608 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 131072 bytes
I/O size (minimum/optimal): 131072 bytes / 131072 bytes
Disklabel type: dos
Disk identifier: 0x6343b449
Device Boot Start End Sectors Size Id Type
/dev/vda1 2048 8388607 8386560 4G 83 Linux
Command (m for help): a
Selected partition 1
The bootable flag on partition 1 is enabled now.
Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.
root@archiso ~ #

I just gave the whole disk to the root partition because lazy.

Next we format and mount the root partition, and because the network interface was configured via DHCP at boot, we can start installing.

root@archiso ~ # mkfs.ext4 /dev/vda1
mke2fs 1.45.6 (20-Mar-2020)
Creating filesystem with 1048320 4k blocks and 262144 inodes
Filesystem UUID: 4bfe330e-53f8-4103-a4f2-f66f1a4ebf98
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736
Allocating group tables: done
Writing inode tables: done
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: done
root@archiso ~ # ls /mnt
root@archiso ~ # mount /dev/vda1 /mnt
root@archiso ~ # pacstrap /mnt base linux linux-firmware
[ … ]
==> Image generation successful
(12/13) Reloading system bus configuration…
Running in chroot, ignoring command 'try-reload-or-restart'
(13/13) Warn about old perl modules
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = (unset),
LC_ALL = (unset),
LANG = "en_US.UTF-8"
are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
pacstrap /mnt base linux linux-firmware 46.66s user 37.43s system 14% cpu 9:29.76 total
root@archiso ~ #

Ohai perl.

root@archiso ~ # genfstab -U /mnt >> /mnt/etc/fstab
root@archiso ~ # arch-chroot /mnt
[root@archiso /]# passwd
New password:
Retype new password:
passwd: password updated successfully
[root@archiso /]#

Then set up time zone, locale, network configuration, etc., and power off the VM.
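
For reference, those settings amount to a handful of small files. The sketch below writes them relative to a root directory, so it can be pointed at / inside the chroot or tried against a scratch directory first. The function name, time zone, and locale are assumptions chosen for illustration, and network configuration is left out.

```shell
#!/bin/sh
# Sketch: basic post-install settings, written relative to a root directory.
# configure_guest and the example values below are assumptions.
configure_guest() {
    root=$1 zone=$2 locale=$3 host=$4
    mkdir -p "$root/etc"
    # Time zone: symlink /etc/localtime to the chosen zoneinfo file.
    ln -sf "/usr/share/zoneinfo/$zone" "$root/etc/localtime"
    # Locale: list it in locale.gen and make it the system default.
    printf '%s UTF-8\n' "$locale" >> "$root/etc/locale.gen"
    printf 'LANG=%s\n' "$locale" > "$root/etc/locale.conf"
    # Host name.
    printf '%s\n' "$host" > "$root/etc/hostname"
}

scratch=$(mktemp -d)
configure_guest "$scratch" America/Los_Angeles en_US.UTF-8 archlinux
cat "$scratch/etc/locale.conf" "$scratch/etc/hostname"
# Inside the chroot you would pass / as the root and then run locale-gen.
```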

I take the cd0 entry out of the device map file and run grub-bhyve again, this time giving it hd0 as the root device: grub-bhyve -m -M 1024M -r hd0 arch

grub> linux (hd0,1)/boot/vmlinuz-linux root=/dev/vda1
grub> initrd (hd0,1)/boot/initramfs-linux.img
grub> boot
root@nucklehead:/vm # bhyve -H -A -P \
-c 2 -m 1024M \
-s 0,hostbridge \
-s 1,lpc \
-s 2:0,virtio-net,tap0 \
-s 3,virtio-blk,linux.img \
-l com1,stdio \
arch
[ … ]
Arch Linux 5.9.2-arch1-1 (ttyS0)
archlinux login: root
[root@archlinux ~]#

A Guest on ZFS

Now that I have reacquainted myself with exactly how minimal Arch Linux actually is (-bash: which: command not found; really???), I’ll use Debian to test giving a VM its own ZFS volume instead of using an img file for its virtual disk.

root@nucklehead:/vm # zfs create -V2G -o volmode=dev zroot/debianguest0
root@nucklehead:/vm # cat
(hd0) /dev/zvol/zroot/debianguest0
(cd0) /vm/images/debian-10.6.0-amd64-netinst.iso
root@nucklehead:/vm # grub-bhyve -m -M 1024 -r cd0 debian

The Debian image already has grub installed and pre-configured, so all we have to do is choose “Install.”

GNU GRUB version 2.00
|Graphical install |
|Install |
|Advanced options … |
|Accessible dark contrast installer menu … |
|Install with speech synthesis |
| |
| |
| |
| |
| |
| |
| |
Use the ^ and v keys to select which entry is highlighted.
Press enter to boot the selected OS, `e' to edit the commands
before booting or `c' for a command-line.

Selecting that entry only loads the kernel, along with the installer arguments, into the VM’s memory. We still have to start the VM with bhyve, which should drop us immediately into the interactive installer.

root@nucklehead:/vm # bhyve -H -A -P \
-s 0,hostbridge \
-s 1,lpc \
-s 2:0,virtio-net,tap0 \
-s 3,virtio-blk,/dev/zvol/zroot/debianguest0 \
-s 31,ahci-cd,/vm/images/debian-10.6.0-amd64-netinst.iso \
-l com1,stdio \
-c 2 -m 1024M \
debian

[Screenshot of the text-based Debian installer]
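
Comparing the Arch and Debian invocations, the only thing that changes between a file-backed guest and a ZFS-backed one is the path given to virtio-blk. A small sketch that assembles the command line makes that explicit; bhyve_cmd is a hypothetical helper, and it only prints the command rather than running it. The flags and slot numbers mirror the ones used in this post.

```shell
#!/bin/sh
# Sketch: print the bhyve command line for a guest, given its disk backing.
# bhyve_cmd is a hypothetical helper; flags and slots mirror this post.
bhyve_cmd() {
    vm=$1 disk=$2 iso=${3:-}
    cmd="bhyve -H -A -P -c 2 -m 1024M"
    cmd="$cmd -s 0,hostbridge -s 1,lpc -s 2:0,virtio-net,tap0"
    # The disk can be an image file or a zvol device node.
    cmd="$cmd -s 3,virtio-blk,$disk"
    # Attach the installer ISO only when one is given.
    [ -n "$iso" ] && cmd="$cmd -s 31,ahci-cd,$iso"
    # The VM name goes last and must match the one given to grub-bhyve.
    printf '%s -l com1,stdio %s\n' "$cmd" "$vm"
}

# File-backed disk, as with the Arch guest:
bhyve_cmd arch linux.img
# ZFS volume plus installer ISO, as with the Debian guest:
bhyve_cmd debian /dev/zvol/zroot/debianguest0 \
    /vm/images/debian-10.6.0-amd64-netinst.iso
```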

Nobody Panic, But…

When the Debian installation finished, I exited, ran grub-bhyve -m -M 1024M -r hd0 debian, and then at the grub prompt ran ls … and FreeBSD panicked. I rebooted and tried again, with the same outcome.

Fortunately, dmesg captured the error.

All buffers synced.
lock order reversal:
1st 0xfffff80007dc7c10 zfs (zfs, lockmgr) @ /usr/src/sys/kern/vfs_mount.c:1711
2nd 0xfffff801168baa20 devfs (devfs, lockmgr) @ /usr/src/sys/fs/msdosfs/msdosfs_vfsops.c:943
lock order devfs -> zfs established at:
#0 0xffffffff80c4ddfd at witness_checkorder+0x46d
#1 0xffffffff80bb1e25 at lockmgr_xlock+0x55
#2 0xffffffff80cd7874 at _vn_lock+0x54
#3 0xffffffff80cb72c1 at vfs_domount+0xe71
#4 0xffffffff80cb59c2 at vfs_donmount+0x872
#5 0xffffffff80cb9e27 at kernel_mount+0x57
#6 0xffffffff80cbc521 at parse_mount+0x4a1
#7 0xffffffff80cbaa49 at vfs_mountroot+0x589
#8 0xffffffff80b718bf at start_init+0x1f
#9 0xffffffff80b9b590 at fork_exit+0x80
#10 0xffffffff80ffe68e at fork_trampoline+0xe
lock order zfs -> devfs attempted at:
#0 0xffffffff80c4e75c at witness_checkorder+0xdcc
#1 0xffffffff80bb1e25 at lockmgr_xlock+0x55
#2 0xffffffff80cd7874 at _vn_lock+0x54
#3 0xffffffff80a8bae4 at msdosfs_sync+0x1d4
#4 0xffffffff80a8b69f at msdosfs_unmount+0x2f
#5 0xffffffff80cb839f at dounmount+0x41f
#6 0xffffffff80cc2f6a at vfs_unmountall+0x6a
#7 0xffffffff80c9817b at bufshutdown+0x2cb
#8 0xffffffff80bdff43 at kern_reboot+0x213
#9 0xffffffff80bdfcd4 at sys_reboot+0x3a4
#10 0xffffffff8102abe5 at amd64_syscall+0x135
#11 0xffffffff80ffdf5e at fast_syscall_common+0xf8
Uptime: 32m2s

Some searching turned up this FAQ, which explains that the witness(4) lock diagnostic watches for potential deadlocks in the kernel. (Ever since I read The Design and Implementation of the FreeBSD Operating System about 10 years ago, the word “mutex” has immediately made me think of the FreeBSD kernel.)

witness(4) is enabled by default in CURRENT kernels for development and debugging purposes, and as I’m running a build of FreeBSD 13.0-CURRENT from last week, that fits.
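
A quick way to check whether a given kernel has witness(4) active is to query its sysctl. The sketch below assumes the debug.witness.watch sysctl described in the witness(4) manual page; witness_status is a hypothetical helper name, and the wording of its messages is mine.

```shell
#!/bin/sh
# Sketch: report whether witness(4) is active on the running kernel.
# debug.witness.watch is only present on kernels built with WITNESS.
witness_status() {
    watch=$(sysctl -n debug.witness.watch 2>/dev/null) || watch=
    case "$watch" in
        "")     echo "witness not available on this kernel" ;;
        0|-1)   echo "witness compiled in but disabled" ;;
        *)      echo "witness enabled (debug.witness.watch=$watch)" ;;
    esac
}

witness_status
```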

Anyway, I am now down the FreeBSD maintainers rabbit hole, so I will throw this post out there for now. The next post should have a resolution.

The next post in this series, as we work toward, yes, someday actually running Kubernetes on FreeBSD, will hopefully show a working Linux-with-ZFS-disk VM and then look at CBSD, which helps manage your bhyve VMs.
