Another simple little posting that has grown. It doesn’t cover every eventuality, but it can be a life saver.
This one comes up a bit: someone has lost a VMFS Datastore from all their ESX servers. It was there yesterday and now it’s gone! The hosts can still see the LUN, but not a Datastore.
First off some advice:
Don’t Panic
and don’t create a new VMFS over the existing one. If you do, the data on the disk is lost and only heroic effort by VMware support will be able to get it back for you.
Reducing the Risk
If you still have ESX 3.5U3 or later (not vSphere) then you can back up the VMFS metadata as well as documenting the partitioning. The backup process is in this VMware KB article. In theory you should be able to use the same thing on vSphere, however the tool does not ship with ESX 4.0.
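Even without the backup tool you can at least record the partition layout from the service console. A minimal sketch, assuming /dev/sdb is the LUN that holds the Datastore, that sfdisk is present in your service console build, and that the file names are just examples:
[root@localhost ~]# fdisk -lu /dev/sdb > /root/sdb-partition-table.txt
[root@localhost ~]# sfdisk -d /dev/sdb > /root/sdb-partition-table.dump
The first file is a human-readable record that captures the start block; the second is a dump that sfdisk can restore later.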
Background
The first thing to understand is how the VMKernel recognises a VMFS Datastore.
- First the VMKernel looks in the disk partition table for a partition of type "fb"; this partition type number identifies the contents as a VMFS datastore.
- Within the partition the VMKernel looks for VMFS metadata. If the partition is the first extent of the VMFS datastore then it will have VMFS metadata at the start of the partition.
When the Datastore is deleted only the partition table entry is removed, so the metadata is still in the right place on the disk. As a side note, only the first extent (partition) is deleted when the Datastore is deleted; any additional extents are left behind.
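A rough way to confirm the metadata really is still in place is to read the area just past where the partition used to start and look for the old Datastore label, which is stored as plain text in the metadata. A sketch, assuming /dev/sdb, a start block of 128 and a Datastore that was called datastore01 (substitute your own name):
[root@localhost ~]# dd if=/dev/sdb bs=512 skip=128 count=8192 2>/dev/null | grep -a -c datastore01
The dd reads the first 4MB after the old start block; a non-zero count from grep means the label was found and the metadata is almost certainly intact.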
To get the Datastore back we need to recover the partition table, either from a backup or by manually recreating it using the Linux fdisk tool.
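If you took an sfdisk dump of the partition table earlier, restoring it is a one-liner; a sketch, again assuming /dev/sdb and the example file name used above:
[root@localhost ~]# sfdisk /dev/sdb < /root/sdb-partition-table.dump
After that, refresh storage and you can skip the manual fdisk steps below. With no backup, carry on with the manual recreation.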
Things to know before you start
- The correct VMKernel path to the LUN
- The correct Linux disk ID for the partition
- The correct starting block for the partition
VMKernel Path
The VMKernel path is the path shown to the now empty LUN in the “Storage” view in the VI Client. Ideally you would document this with the other Datastore documentation. There may be multiple paths to the LUN, in which case you want the path with the lowest HBA and Target numbers.
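If you are not sure which paths exist to the LUN, you can also list the multipathing information from the service console with esxcfg-mpath; a quick check (the output format differs between ESX versions):
[root@localhost ~]# esxcfg-mpath -l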
Linux Disk ID
Hopefully you now know the VMKernel path to the LUN that held the Datastore. The Linux disk ID can be resolved from the VMKernel disk ID (vmhba…. number). On VI3 use esxcfg-vmhbadevs; on vSphere use "esxcfg-scsidevs -c".
In the VI3 example below, the VMKernel device vmhba0:0:25 maps to the Linux device /dev/sdb.
[root@desx1 root]# esxcfg-vmhbadevs
vmhba0:0:0 /dev/sda
vmhba0:0:25 /dev/sdb
vmhba0:0:26 /dev/sdc
vmhba0:0:31 /dev/sdd
vmhba1:0:31 /dev/sde
vmhba2:0:0 /dev/cciss/c0d0
In the vSphere example below, the VMKernel device vmhba0:C0:T1:L0 maps to the Linux device /dev/sdb.
[root@localhost ~]# esxcfg-scsidevs -c
Device UID Device Type Console Device Size Plugin Display Name
mpx.vmhba0:C0:T0:L0 Direct-Access /dev/sda 16384MB NMP Local VMware, Disk (mpx.vmhba0:C0:T0:L0)
mpx.vmhba0:C0:T1:L0 Direct-Access /dev/sdb 8192MB NMP Local VMware, Disk (mpx.vmhba0:C0:T1:L0)
mpx.vmhba32:C0:T0:L0 CD-ROM /dev/sr0 0MB NMP Local NECVMWar CD-ROM (mpx.vmhba32:C0:T0:L0)
Starting Block
On both VI3 and vSphere, determining the starting block requires documentation or an educated guess. To find the starting block of an existing VMFS partition use the command "fdisk -lu /dev/sdx", where /dev/sdx is the Linux device ID for the LUN that contains the Datastore. The Start column will show it; below is an example with a start block of 128, which should be recorded with the rest of the documentation for the Datastore. A value of 128 is a good guess if you don’t have documentation, as it is the default start block for all VMFS partitions created using the VI Client.
[root@localhost ~]# fdisk -lu /dev/sdb
Disk /dev/sdb: 8589 MB, 8589934592 bytes
255 heads, 63 sectors/track, 1044 cylinders, total 16777216 sectors
Units = sectors of 1 * 512 = 512 bytes
Device Boot Start End Blocks Id System
/dev/sdb1 128 16771859 8385866 fb VMware VMFS
Recovery Process
First off, if you are uncertain, log a call with VMware Support; they fix these sorts of problems routinely and will make sure you don’t do more damage. Repair it yourself only if a support call is not an option.
- Before you try to recover the Datastore make sure it is really gone: refresh storage on every ESX server that can see the LUN and make sure none of them can see the Datastore.
- Choose an ESX server that is not too critical to complete the remaining steps, preferably one in maintenance mode so no VMs can be affected.
- Log on to the service console of the ESX server and elevate to root.
- Run "fdisk -l /dev/sdx", where /dev/sdx is the Linux device ID for the LUN that used to contain the Datastore. This will list the current partition table of the disk; if you see partitions then stop.
- If there is no partition table or if it is blank then run “fdisk /dev/sdx”
- In fdisk create a new primary partition occupying all of the available space
- In fdisk use "t" to change the partition type to "fb"
- In fdisk use "x" to go to expert mode, then use "b" to change the start block of the partition to the correct block
- In fdisk use “p” to print the current partition table and confirm that it corresponds to the recorded information
- Use “w” to quit fdisk and write the new partition table to disk.
- Use the VI Client to refresh storage on the ESX server, you should see your Datastore return
Below is a sequence that may help make it clearer:
The VM before the Datastore was deleted:
The VM after the Datastore was deleted:
Command sequence on the ESX server console:
[root@localhost ~]# esxcfg-scsidevs -c
Device UID Device Type Console Device Size Plugin Display Name
mpx.vmhba0:C0:T0:L0 Direct-Access /dev/sda 16384MB NMP Local VMware, Disk (mpx.vmhba0:C0:T0:L0)
mpx.vmhba0:C0:T1:L0 Direct-Access /dev/sdb 8192MB NMP Local VMware, Disk (mpx.vmhba0:C0:T1:L0)
mpx.vmhba32:C0:T0:L0 CD-ROM /dev/sr0 0MB NMP Local NECVMWar CD-ROM (mpx.vmhba32:C0:T0:L0)
[root@localhost ~]# fdisk /dev/sdb
The number of cylinders for this disk is set to 8192.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
(e.g., DOS FDISK, OS/2 FDISK)
Command (m for help): p
Disk /dev/sdb: 8589 MB, 8589934592 bytes
64 heads, 32 sectors/track, 8192 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Device Boot Start End Blocks Id System
Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-8192, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-8192, default 8192):
Using default value 8192
Command (m for help): t
Selected partition 1
Hex code (type L to list codes): fb
Changed system type of partition 1 to fb (VMware VMFS)
Command (m for help): x
Expert command (m for help): b
Partition number (1-4): 1
New beginning of data (32-16777215, default 32): 128
Expert command (m for help): p
Disk /dev/sdb: 8589 MB, 8589934592 bytes
64 heads, 32 sectors/track, 8192 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Device Boot Start End Blocks Id System
/dev/sdb1 1 8192 8388544 fb VMware VMFS
Expert command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
[root@localhost ~]# fdisk -lu /dev/sdb
Disk /dev/sdb: 8589 MB, 8589934592 bytes
64 heads, 32 sectors/track, 8192 cylinders, total 16777216 sectors
Units = sectors of 1 * 512 = 512 bytes
Device Boot Start End Blocks Id System
/dev/sdb1 128 16777215 8388544 fb VMware VMFS
[root@localhost ~]#
© 2009, Alastair. All rights reserved.
Thanks for the article, I fixed it with:
esxcfg-volume -l
esxcfg-volume -m
Sharon
Hi Sharon,
Those commands will restore to service a datastore where the LUN presentation has changed, rather than one whose partition has been deleted.
More information is in the VMware KB: kb.vmware.com/kb/1011387
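For that scenario the rough sequence, assuming the volume shows up in the unresolved/snapshot list, is:
[root@localhost ~]# esxcfg-volume -l
[root@localhost ~]# esxcfg-volume -m <VMFS UUID or label>
Using -M instead of -m mounts the volume persistently across reboots.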