RKE and iSCSI Race Condition

Resolving a race between the kubelet and the host
May 29, 2018
rancher kubernetes iscsi

Pods backed by iSCSI storage fail to mount their volumes after a reboot of the host. The error from kubectl describe looks something like this:

MountVolume.WaitForAttach failed for volume "mqtt-data" : failed to get any path for iscsi disk, last err seen:
iscsi: failed to attach disk: Error: iscsiadm: Could not login to [iface: default, target: iqn.2006-04.us.monach:nas.mqtt-data, portal: 10.68.0.11,3260].
iscsiadm: initiator reported error (12 - iSCSI driver not found. Please make sure it is loaded, and retry the operation)
iscsiadm: Could not log into all portals
Logging in to [iface: default, target: iqn.2006-04.us.monach:nas.mqtt-data, portal: 10.68.0.11,3260] (multiple)
 (exit status 12)

Note the iSCSI driver not found error. Strange, right? The driver is there - lsmod shows the iSCSI modules loaded into the kernel. To resolve this, I have to log into the host and stop and disable the iscsid service on the host itself, which lets the kubelet container load and execute the drivers directly.
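
For reference, the check and the workaround look roughly like this on the host (just a sketch of what I run, nothing fancy):

# confirm the iSCSI kernel modules really are loaded
lsmod | grep -i iscsi

# stop the host's iscsid and keep it from starting at boot
sudo systemctl stop iscsid
sudo systemctl disable iscsid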

The part that I haven’t yet figured out is why the service keeps reactivating. I disable it with systemctl disable iscsid, and the next time the host boots…it’s running again.


Update 2018/06/14: This is resolved, and the answer is…well…easy. It appears that Ubuntu automatically installs and enables open-iscsi, which I noticed during another installation on a different system. That led me to the idea that the service I need to disable is not iscsid, but open-iscsi. Disabling it kept the service from reactivating, and the next time I brought the RKE nodes back up, Kubernetes correctly mounted the iSCSI targets for the pods without my intervention.
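
Something like the following shows which iSCSI-related units the host actually has, and disables the one Ubuntu enables out of the box:

# find the iSCSI units on the host
systemctl list-unit-files | grep -i iscsi

# open-iscsi, not iscsid, is the one to disable here
sudo systemctl disable open-iscsi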

Sweet.

Update 2018/06/27: I’m rolling out a new Rancher cluster, and this problem reappeared. It now seems that I have to disable both the open-iscsi and iscsid services for this to work.
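
So the working recipe, at least on these Ubuntu nodes, is to knock out both services before bringing the cluster up:

sudo systemctl stop open-iscsi iscsid
sudo systemctl disable open-iscsi iscsid

# confirm neither will come back on the next boot
systemctl is-enabled open-iscsi iscsid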