Following on from my disaster with returning a failed disk to the SATA RAID in my lab ESXi hosts (“Mark Online” may cause corruption; I should have chosen “Rebuild Disk”), I had the above issue with my vCenter server when reconnecting the resurrected ESXi host.
The Process
When I reconnected my ESXi host, the vCenter server service crashed, which disconnected the vSphere client. With the service restarted and the client reconnected, the ESXi host was still in the inventory but showing as disconnected. The vCenter event logs were full of not very useful messages, and since I’d had disk corruption issues with the ESXi host I decided to rebuild it completely:
- Built a new NAS host, OpenFiler
- Moved the VMs to NAS
- Rebuilt the ESXi boot USB disk and SATA RAID array
- Moved VMs back to SATA RAID
- Registered VMs with ESXi and powered on the minimum set of VMs to restore service
Very disappointingly, I found that the vCenter service still crashed when I reconnected the ESXi host, so I decided to rebuild the vCenter host. I had had some VM disk corruption issues due to my mistake with the RAID controller, and VC was the one thing I couldn’t restore using VDR. After rebuilding vCenter I added the ESXi host and again the vCenter service crashed.
At this point I spent a while Googling for the events in the vCenter event log, comparing the entries from a successful ESXi host add with the entries from the failed add. The first difference in messages I saw was “Win32 exception: Stack overflow (0xc00000fd)”, and a quick Google of that and “vCenter” led me to the answer in a KB article. It seems that vCenter can’t cope with some types of disk changes. I first removed every snapshot from every VM: no change. Next I unregistered every VM that I didn’t need for basic service (only the vCenter and SBS server VMs remained registered and running): no change.
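The log comparison I did by eye can be roughly sketched in Python. This is only a minimal illustration, not what I actually ran: the timestamp pattern, the function names and the sample lines are all made up for the example, and real vCenter logs will need their own pattern. The idea is simply to strip the timestamps and list the messages that appear in the failed run but not in the good one.

```python
import re

# Assumed timestamp prefix such as "2010-05-01 10:00:00" (optionally bracketed).
# This format is a guess for illustration; adjust it to match your own logs.
TIMESTAMP = re.compile(
    r"^\[?\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}(?:\.\d+)?\]?\s*"
)


def normalise(line):
    """Strip surrounding whitespace and the leading timestamp from a log line."""
    return TIMESTAMP.sub("", line.strip())


def new_messages(good_lines, bad_lines):
    """Return messages from the failed run that never appear in the good run.

    Order is preserved and each message is reported only once.
    """
    seen = {normalise(line) for line in good_lines}
    result = []
    for line in bad_lines:
        message = normalise(line)
        if message and message not in seen:
            seen.add(message)
            result.append(message)
    return result


if __name__ == "__main__":
    # Invented sample lines; only the stack overflow text comes from my real log.
    good = [
        "2010-05-01 10:00:00 Connecting host",
        "2010-05-01 10:00:05 Host added",
    ]
    bad = [
        "2010-05-02 11:00:00 Connecting host",
        "2010-05-02 11:00:03 Win32 exception: Stack overflow (0xc00000fd)",
    ]
    for message in new_messages(good, bad):
        print(message)
```

Anything this prints is a candidate search term; in my case the stack overflow line was the one worth Googling.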
The Resolution
Having recently been reading the Snapshot troubleshooting guide (linked by the extremely useful Eric Sloof), I took the quick-fix approach: I snapshotted my running VMs and then used the snapshot manager to “Delete All” snapshots. This time I had fixed it, and I was able to connect the ESXi host to vCenter. I was then able to re-register and power on all the remaining VMs.
The moral of the story is:
If things are going weird, study the logs and Google carefully.
© 2010, Alastair. All rights reserved.