My hypervisor is named nucklehead (it’s an Intel NUC) and is running FreeBSD 13.0-CURRENT.
My home network, including the NUC, is in the 192.168.0.0/16 space.
The Kubernetes cluster will exist in the 10.0.0.0/8 block, which exists solely on my FreeBSD host.
The controllers and workers are in the 10.10.0.0/16 block.
The internal service network is in the 10.20.0.0/24 (changed from 10.50.0.0/24) block.
The cluster pod network is in the 10.100.0.0/16 block.
The cluster VMs are all in the something.local domain.
The kubernetes.something.local endpoint for kube-apiserver has the virtual IP address 10.50.0.1, which gets round-robin load-balanced across all three controllers by ipfw on the hypervisor.
Yes, I am just hanging out in a root shell on the hypervisor.
Fixes
I Need This in a Larger Size
First off, the disks on the controllers filled up. The bulk of the usage came from etcd data in /var/lib/etcd. As I noted, I had only allocated 10 GB to each disk, because I don’t have an infinite amount of storage on the NUC. However, I still have enough room on the hypervisor’s disk to grow each controller’s virtual disk, especially because those disks are ZFS clones of a snapshot base image. ZFS clones use copy-on-write (COW); they only consume disk space when a change is made on the cloned volume.
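If you want to see the copy-on-write behavior for yourself, zfs can report a clone’s origin snapshot and how much data it has written since diverging from it. The dataset name below is a guess at what CBSD created for the controller’s disk, so adjust it to whatever zfs list shows on your system:

root@nucklehead:~ # zfs get origin,used,referenced zroot/cbsd/controller-0/dsk1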
However, as I also noted, using ZFS volumes and CBSD makes it pretty easy to increase the size of the guest VM’s virtual disk.
Stop the VM
Use the cbsd bhyve-dsk command to resize the VM’s virtual disk
Run gpart against the volume’s device to increase the size of the partition. Note this only works safely if it’s the last partition on the disk. We do this from the hypervisor so we don’t have to rewrite the partition table of a live VM, which would mean either booting a rescue CD (which I’m too lazy to figure out) or making the change from inside the running VM, which is a bit scary when modifying a mounted partition, but can be done.
Restart the VM
Log in to the VM and run resize2fs on the resized partition.
root@nucklehead:~ # cbsd bstop controller-0
Send SIGTERM to controller-0. Soft timeout is 30 sec. 0 seconds left […………………………]
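The rest of the procedure looks roughly like the following. I didn’t capture the exact session, so treat the cbsd bhyve-dsk arguments, the zvol path, the partition index, and the guest device name as placeholders for whatever your setup actually uses:

root@nucklehead:~ # cbsd bhyve-dsk mode=resize jname=controller-0 dsk_path=dsk1.vhd dsk_size=20g
root@nucklehead:~ # gpart recover zvol/zroot/controller-0/dsk1
root@nucklehead:~ # gpart resize -i 2 zvol/zroot/controller-0/dsk1
root@nucklehead:~ # cbsd bstart controller-0

Then, from inside the guest, grow the filesystem into the newly enlarged partition:

root@controller-0:~# resize2fs /dev/vda2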
I increase the virtual disks on all three controllers and everything is up and running again, except etcd on one controller won’t restart because of a corrupted data file. I read more docs and figure out I need to stop etcd, remove the bad member from the cluster, delete the contents of /var/lib/etcd, then re-add the member to the cluster. etcdctl outputs several lines that need to be added to the errant member’s /etc/etcd/etcd.conf before restarting etcd on that host. (I forgot to grab the shell output.) etcd came up, joined the existing cluster, and started streaming a data snapshot from an existing member.
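Since I didn’t save the shell output, here is roughly what the recovery looked like. The controller names, member ID, and peer URL are placeholders, and etcdctl also wants the usual --endpoints and TLS flags, which I’ve left off for brevity:

root@controller-1:~# systemctl stop etcd
root@controller-1:~# rm -rf /var/lib/etcd/*
root@controller-0:~# etcdctl member list
root@controller-0:~# etcdctl member remove <member-id>
root@controller-0:~# etcdctl member add controller-1 --peer-urls=https://controller-1.something.local:2380
root@controller-1:~# systemctl start etcd

The member add step prints the ETCD_NAME, ETCD_INITIAL_CLUSTER, and ETCD_INITIAL_CLUSTER_STATE values that go into /etc/etcd/etcd.conf on the rejoining member before it’s restarted.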
Except kube-apiserver keeps writing its own self-signed TLS certificate to /var/run/kubernetes, and that certificate’s SAN (Subject Alternative Name) list contains only the primary IP address on the primary interface plus the gateway IP address, while connections to etcd and the other K8s controller services go over the loopback interface.
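One way to see exactly what lands in that SAN list is to point openssl at the generated certificate. The apiserver.crt path is kube-apiserver’s default location for its self-generated cert, so treat it as an assumption:

root@controller-0:~# openssl x509 -noout -text -in /var/run/kubernetes/apiserver.crt | grep -A1 'Subject Alternative Name'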
Space
I check the documentation: as long as kube-apiserver is given a certificate and key via the --tls-cert-file and --tls-private-key-file options at start time, it should use them instead of generating its own. They’re both present and correctly set in /etc/systemd/system/kube-apiserver.service, so I am confused. I happen to be running ps on one of the controllers while checking the etcd cluster, and the command’s arguments look odd. As in, they end with a trailing \. I look at /etc/systemd/system/kube-apiserver.service and sure enough, there’s a trailing space after the end-of-line continuation \ of the last argument passed to kube-apiserver at start. The ‘\ ’ was interpreted not as an end-of-line continuation, but as a literal string.
systemd supports a list of start commands, so it started kube-apiserver without error and only complained when it couldn’t run the trailing options as an actual command. But since the service process itself started without issue, I didn’t notice. Removing the trailing space from the unit file fixed the problem.
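If you ever need to hunt down the same kind of typo, a one-liner will flag any line in the unit file that ends with a backslash followed by a space:

root@controller-0:~# grep -n '\\ $' /etc/systemd/system/kube-apiserver.service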
Most of this section is straightforward, other than updating IP addresses and ranges.
When I created the worker VMs, I added, through some hackery of CBSD’s cloud-init data handling, a pod_cidr field to the instance metadata to configure each worker with its unique slice of the pod network. cloud-init puts the metadata in /run/cloud-init/instance-data.json. We need this value now to configure the CNI (Container Network Interface) plugin.
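Pulling the value back out on a worker is a one-liner. The exact nesting inside instance-data.json depends on how CBSD’s datasource presents the metadata, so the recursive jq search below is a hedge rather than a known-good path:

root@worker-0:~# jq -r '.. | .pod_cidr? // empty' /run/cloud-init/instance-data.json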
I still haven’t found a better terminal type
After finishing the configuration, everything looks as expected.
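For reference, the bridge CNI config each worker ends up with is essentially the tutorial’s 10-bridge.conf with that worker’s slice dropped in. This is a sketch with the first worker’s 10.100.0.0/24 filled in, not a copy of my actual file:

root@worker-0:~# cat <<EOF > /etc/cni/net.d/10-bridge.conf
{
    "cniVersion": "0.4.0",
    "name": "bridge",
    "type": "bridge",
    "bridge": "cnio0",
    "isGateway": true,
    "ipMasq": true,
    "ipam": {
        "type": "host-local",
        "ranges": [[{"subnet": "10.100.0.0/24"}]],
        "routes": [{"dst": "0.0.0.0/0"}]
    }
}
EOF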
This part gets a little more complicated. The tutorial relies on Google Compute Engine’s inter-VM networking and routing abilities. However, since all the inter-VM networking for this cluster goes through the FreeBSD bridge1 interface, where ipfw is already doing all kinds of heavy lifting, I can also use ipfw to handle the routing for the pod network.
root@nucklehead:~ # for i in 0 1 2; do
ipfw add 25${i} fwd 10.10.0.2${i} ip from any to 10.100.${i}.0/24 keep-state
done
00250 fwd 10.10.0.20 ip from any to 10.100.0.0/24 keep-state :default
00251 fwd 10.10.0.21 ip from any to 10.100.1.0/24 keep-state :default
00252 fwd 10.10.0.22 ip from any to 10.100.2.0/24 keep-state :default
In this section, since my IP blocks differ, I had to download and edit the YAML for the DNS deployment.
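Roughly what that looked like, assuming the manifest URL from the tutorial and assuming the DNS Service gets 10.20.0.10 inside my service block (the tutorial’s manifest pins it to 10.32.0.10, if memory serves):

napalm@nucklehead:~ $ curl -sL -o coredns.yaml https://storage.googleapis.com/kubernetes-the-hard-way/coredns-1.8.yaml
napalm@nucklehead:~ $ sed -i '' -e 's/10.32.0.10/10.20.0.10/' coredns.yaml
napalm@nucklehead:~ $ kubectl apply -f coredns.yaml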
At this point I end up massaging my cluster network a bit: I change the cluster network’s netmask from /8 to /16 and add the alias 10.10.0.1 to the bridge interface on the FreeBSD hypervisor, putting a gateway inside the 10.10.0.0/16 VM network now that it is no longer in the same CIDR block as the old gateway, 10.0.0.1.
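On the FreeBSD side that’s a one-liner, plus a matching ifconfig_bridge1_aliasN entry in /etc/rc.conf to make it survive a reboot. A sketch, assuming bridge1 carries the alias with a /16 mask:

root@nucklehead:~ # ifconfig bridge1 inet 10.10.0.1/16 alias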
Oddly, though, my cluster ended up assigning 10.0.0.1 to the cluster’s internal Kubernetes API proxy Service, even though that is outside the configured 10.50.0.0/24 Service network. I wonder whether that happened because 10.50.0.1 was already a routable IP address within the cluster, since I had configured it as the external Kubernetes API endpoint and as a virtual IP on the controllers.
Either way, I need to get the “public” API endpoint address out of the Service network. Rather than renumber the endpoint, it’s arguably easier to change the service CIDR block for the existing cluster, since it’s only set as arguments to kube-apiserver and kube-controller-manager. I will also need to update the kubernetes.pem used by kube-apiserver to add the new internal service IP address to the certificate’s list of accepted server names.
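Regenerating kubernetes.pem is the same cfssl invocation the tutorial uses for the original certificate, just with the new internal service IP added to the hostname list. Everything below other than 10.20.0.1, 10.50.0.1, and the domain name is a placeholder for my actual controller names and addresses:

napalm@nucklehead:~ $ cfssl gencert \
  -ca=ca.pem -ca-key=ca-key.pem \
  -config=ca-config.json \
  -hostname=10.20.0.1,10.50.0.1,${CONTROLLER_IPS},127.0.0.1,kubernetes.something.local \
  -profile=kubernetes \
  kubernetes-csr.json | cfssljson -bare kubernetes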
napalm@nucklehead:~ $ kubectl get service -A
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
I was so focused on getting the updated certificate files copied around and restarting dependent services that I initially forgot to update the --service-cluster-ip-range option for kube-apiserver and kube-controller-manager. I’m still not completely sure why the Service was initially given the cluster IP 10.0.0.1; I could have tested my theory by deleting the kubernetes Service before making the changes, but I didn’t think of it until afterward. Once everything was updated, the Service was recreated with the 10.20.0.1 address, as expected.
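With the flags fixed and the control plane restarted, forcing the recreation is just a matter of deleting the built-in Service and letting kube-apiserver put it back. A sketch rather than captured output:

napalm@nucklehead:~ $ kubectl -n default delete service kubernetes
napalm@nucklehead:~ $ kubectl -n default get service kubernetes -o jsonpath='{.spec.clusterIP}'
10.20.0.1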
This series of posts shows one way you can run a Kubernetes cluster on FreeBSD, using bhyve virtual machines to create the traditional, supported Linux environment for Kubernetes. My next post will look at some potential alternatives in various stages of development and support.