Last week, my former colleague Gabrie van Zanten wrote a very interesting article about using USB sticks to boot an ESXi host.
After reading this excellent post, I remembered that in the early ESX 2.5 days it was (and still is…) good practice to install ESX on a RAID-1 volume to keep your host in business in case of a disk failure. One day we received a first-generation IBM BladeCenter. Each blade had just one 2.5-inch disk and a small amount of memory. It was decided to install ESX (ESXi did not even exist yet…) and to gain some experience with blades. The blades suffered several disk crashes, each of which immediately led to host failure and disk replacement…
In recent years, servers with embedded USB storage have become common. Today, all major hardware vendors deliver servers with embedded ESXi. Even in my home lab, the servers are equipped with an onboard USB connector, a USB stick and ESXi. Recently, on one host, the USB stick was moved to an external connector. So after watching an episode of MythBusters, I wondered what would happen to an ESXi host when its USB stick fails or, even worse, when the stick is pulled out.
So, after booting up my two-node cluster, I made a fresh backup of a few important VMs and checked the vCenter Service Status. Then it was time to remove the USB stick from one host. This is what happened:
- VMs on the affected host are still running.
- Tasks & Events of the affected host shows this message: “Lost connectivity to storage device mpx.vmhba32:C0:T0:L0. Path vmhba32:C0:T0:L0 is down. Affected datastores: “Hypervisor1”, “Hypervisor2”, “Hypervisor3”.”
- Followed by 3 Alarms “Cannot connect to storage”.
- Another message in Tasks & Events is “Boot partition /bootbank cannot be found (0:02:33:03.304 cpu1:30722)”.
- Time for some testing. All of these actions still work: powering on a VM, migrating a VM, putting the host into Maintenance Mode and exiting Maintenance Mode (the HA agent is configured correctly).
- Also the ESXi console is doing fine, System Customization is in place, and so are the System Logs.
- From time to time the messages above are repeated, and on some occasions “The Operation is not allowed in the current state” messages appear while migrating VMs.
- After 24 hours, the host was still running and performing. So finally, I decided to put the host into Maintenance Mode and shut it down. The power-down took about 10 minutes (less than 2 minutes is normal).
- After insertion of the USB stick, the host was powered on and was automatically reconnected to the cluster.
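If you export the host's events and want to spot the “Lost connectivity” message automatically, a small parser along these lines might help. It is only a sketch: the regular expression is built from the single message quoted above, the function name `parse_lost_connectivity` is my own, and other ESXi versions may word the event differently.

```python
import re

# Pattern derived from the one event message quoted in this post.
# Assumption: the wording is stable; adjust it if your build differs.
LOST_DEVICE = re.compile(
    r"Lost connectivity to storage device (?P<device>\S+)\. "
    r"Path (?P<path>\S+) is down\. "
    r"Affected datastores: (?P<datastores>.+)"
)


def parse_lost_connectivity(message):
    """Return device, path and datastore names from the event text, or None."""
    match = LOST_DEVICE.search(message)
    if match is None:
        return None
    # Datastore names are quoted; accept both straight and curly quotes.
    names = re.findall(r'["“]([^"”]+)["”]', match.group("datastores"))
    return {
        "device": match.group("device"),
        "path": match.group("path"),
        "datastores": names,
    }


if __name__ == "__main__":
    sample = (
        "Lost connectivity to storage device mpx.vmhba32:C0:T0:L0. "
        "Path vmhba32:C0:T0:L0 is down. "
        'Affected datastores: "Hypervisor1", "Hypervisor2", "Hypervisor3".'
    )
    print(parse_lost_connectivity(sample))
```

Any line that does not match simply returns `None`, so the function can be run over a whole exported event list without special handling.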
At this time, my tentative conclusion is that a failed, or even a missing, USB stick does not have much impact on a running ESXi host. Thanks for reading; I’m very interested in your experience and opinions on this subject.
P.S. A few days after posting, I stumbled onto this post, written by Alan Renouf. The first part explains why ESXi keeps running without its USB boot device.