Had a weird issue brought to my attention a few weeks ago that I thought was worth a blog post.
The setup was a farm of identical vSphere hosts with identical FC storage presentation. Every host could see all the LUNs, but one of the datastores was visible to only some of the hosts: I think 4 of the 6 could see it and 2 couldn’t.
It was discovered when someone tried to power on a VM and the power-on failed because the host couldn’t get a lock on one of the VMDK files:
An unexpected error was received from the ESX host while powering on VM vm-nnnn.
Reason: Failed to lock the file.
Cannot open the disk '/vmfs/volumes/nnnnnnnn-nnnnnnnn-nnnn-nnnnnnnnnnnn/xxxxxxxx/xxxxxxxx.vmdk' or one of the snapshot disks it depends on.
Investigating datastore visibility issues through vCenter is generally awkward, because when you browse the contents of a datastore it is inconsistent which host vCenter connects through. It may use the host you have open in the GUI, or it may use a different one. The only way I’ve been sure of connecting through the host I’m investigating is to point the vSphere Client directly at that host and check it there.
Investigating the VMFS filesystem with the following command:
vmkfstools -Ph -v1 /vmfs/volumes/VMFS-volume-name
and comparing the output with that of the other datastores highlighted that the affected VMFS had its mode set to “public ATS-Only”, whereas all the others were set to “public”.
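To make that comparison quicker, the mode can be pulled out of the vmkfstools output for every datastore in one pass. This is only a rough sketch: the helper names vmfs_mode and list_vmfs_modes are mine, and it assumes the output contains a line of the form "Mode: public ATS-only" and that datastore labels appear as symlinks under /vmfs/volumes.

```shell
# Print the value of the "Mode:" line from `vmkfstools -Ph -v1` output on stdin.
# Assumption: the line looks like "Mode: public ATS-only".
vmfs_mode() {
    sed -n 's/^[[:space:]]*Mode:[[:space:]]*//p' | head -n 1
}

# Hypothetical helper: print "label: mode" for each datastore label on this host.
list_vmfs_modes() {
    for vol in /vmfs/volumes/*; do
        # Labels are assumed to be symlinks to the UUID directories; skip the rest.
        [ -L "$vol" ] || continue
        printf '%s: %s\n' "${vol##*/}" "$(vmkfstools -Ph -v1 "$vol" | vmfs_mode)"
    done
}
```

Running list_vmfs_modes directly on each host (rather than through vCenter) should make the odd one out easy to spot.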
A bit of googling found a number of articles about this situation, including one that showed how to remove the setting.
A little more digging found that certain arrays support ATS (Atomic Test and Set) locking only while particular array features are not in use; once such a feature is enabled, ATS support is withdrawn. It would appear that ATS was supported when this particular VMFS was created, and that something like replication was configured later, at which point ATS was no longer supported. The datastore stayed visible to the hosts until they were rebooted. After a reboot, the array no longer allowed them to use ATS locking, so they couldn’t mount the datastore.
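One way to see what a host currently thinks about ATS on a given device is the VAAI status that esxcli reports. A minimal sketch, assuming the output of "esxcli storage core device vaai status get -d disk_ID" includes a line such as "ATS Status: supported"; the helper name ats_status is mine, not part of any VMware tooling.

```shell
# Print the ATS status value from `esxcli storage core device vaai status get`
# output read on stdin. Assumed line format: "   ATS Status: supported".
ats_status() {
    sed -n 's/^[[:space:]]*ATS Status:[[:space:]]*//p' | head -n 1
}
```

On a host this would be used as a filter, piping the esxcli output for the device into ats_status. A datastore in ATS-only mode on a device that reports unsupported would fit the symptoms described above.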
The command to remove the ATS-Only setting, where disk_ID is the device identifier and P is the number of the partition holding the VMFS, is:
vmkfstools --configATSOnly 0 /vmfs/devices/disks/disk_ID:P
Once this had been run against the disk partition, a rescan brought the datastore back into view, and the affected VM could be started up.
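If several extents need the same treatment, it can help to build the command line first and review it before running anything. The sketch below only prints the command. The function name ats_only_off_cmd and the device name in the example are placeholders of mine; on a host, the device identifier and partition number can be read from the "Device Name" and "Partition" columns of "esxcli storage vmfs extent list".

```shell
# Print (do not run) the command that clears ATS-only for one VMFS extent.
# $1 = device identifier, $2 = partition number.
ats_only_off_cmd() {
    printf 'vmkfstools --configATSOnly 0 /vmfs/devices/disks/%s:%s\n' "$1" "$2"
}

# Example with a placeholder device name (hypothetical, not from the incident):
ats_only_off_cmd naa.xxxxxxxxxxxxxxxx 1
```

After running the real command, one way to trigger the rescan from the same shell is "esxcli storage core adapter rescan --all".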