The case of the disappearing datastore

I was asked this week to investigate an issue where, across multiple hosts and multiple datastores, one host couldn't access one particular datastore.

In the past, I’ve seen issues where a datastore was only visible to a single host in the cluster, or a host had lost access to all datastores, but never a single datastore unavailable on a single host.

Talking someone else through diagnosing something this obscure is never easy, but getting hold of the end of vmkernel.log was enlightening:

2014-06-17T11:03:05.457Z cpu31:64368)WARNING: HBX: 1889: Failed to initialize VMFS3 distributed locking on volume 529dc221-e68ab3c4-3e52-b499baa3e4c6: Not supported
2014-06-17T11:03:05.457Z cpu31:64368)FSS: 890: Failed to get object f530 28 1 529dc221 e68ab3c4 99b43e52 c6e4a3ba 0 0 0 0 0 0 0 :Not supported
2014-06-17T11:03:05.457Z cpu31:64368)WARNING: Fil3: 2034: Failed to reserve volume f530 28 1 529dc221 e68ab3c4 99b43e52 c6e4a3ba 0 0 0 0 0 0 0
2014-06-17T11:03:05.457Z cpu31:64368)FSS: 890: Failed to get object f530 28 2 529dc221 e68ab3c4 99b43e52 c6e4a3ba 4 1 0 0 0 0 0 :Not supported
2014-06-17T11:03:05.583Z cpu36:64370)HBX: 676: Setting pulse [HB state abcdef02 offset 3710976 gen 35 stampUS 85863025463 uuid 539ed180-c09ff563-6044-e4115b10555a jrnl <FB 0> drv 14.54] on vol ‘Datastorename’ failed: Not supported
2014-06-17T11:03:05.583Z cpu36:64370)WARNING: FSAts: 1263: Denying reservation access on an ATS-only vol ‘Datastorename’
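If you hit a similar symptom across many hosts, a small script can pull the relevant warnings out of a copy of vmkernel.log. The following is a minimal Python sketch. The regular expressions are based only on the lines in the excerpt above, so the exact message wording on other ESXi builds is an assumption, and `find_ats_only_failures` is a hypothetical helper name.

```python
import re

# Matches the FSAts warning from the excerpt. Accepts straight or curly quotes,
# since copied logs sometimes have their quotes "prettified".
# Message wording on other ESXi builds is an assumption.
ATS_DENIED = re.compile(
    r"FSAts: \d+: Denying reservation access on an ATS-only vol "
    r"['\u2018\u2019]([^'\u2018\u2019]+)['\u2018\u2019]"
)

# Matches the HBX locking failure, which carries the VMFS volume UUID
# and the error text after the final colon.
HBX_FAILED = re.compile(
    r"HBX: \d+: Failed to initialize VMFS3 distributed locking "
    r"on volume ([0-9a-f-]+): (.+)$"
)


def find_ats_only_failures(lines):
    """Scan vmkernel.log lines for ATS-only denials and HBX locking failures.

    Returns a tuple: (list of datastore names denied access, in first-seen
    order; dict mapping volume UUID -> error text).
    """
    denied = []
    failed_volumes = {}
    for line in lines:
        m = ATS_DENIED.search(line)
        if m and m.group(1) not in denied:
            denied.append(m.group(1))
        m = HBX_FAILED.search(line.rstrip("\n"))
        if m:
            failed_volumes[m.group(1)] = m.group(2).strip()
    return denied, failed_volumes
```

Run against the excerpt above, this should report `Datastorename` as denied and map volume `529dc221-e68ab3c4-3e52-b499baa3e4c6` to `Not supported`, which gives you both the friendly name and the UUID to match against the device.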

A bit of Googling turned up this article.

Basically, what appears to have happened is this: when the datastore was created, Hardware Assisted Locking (the VAAI Atomic Test & Set, or ATS, primitive) was available on the storage array. Because it was available, the VMFS filesystem was created with the flag set to use it exclusively (ATS-only).

At some point since then, the array stopped supporting it. It seems this host lost access to the datastore, tried to reconnect (perhaps after a reboot) and failed: with ATS-only set, the host refuses to fall back to SCSI reservations (hence "Denying reservation access on an ATS-only vol" in the log), and the array no longer supported ATS as a locking mechanism.
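The reasoning above can be reduced to a simplified decision model. This is an illustrative Python sketch, not ESXi's actual logic: `choose_lock_mode` and `LockMode` are hypothetical names, and it assumes that a volume without the ATS-only flag falls back to SCSI reservations when ATS is unavailable.

```python
from enum import Enum
from typing import Optional


class LockMode(Enum):
    ATS = "ATS (hardware assisted locking)"
    SCSI_RESERVATION = "SCSI reservations"


def choose_lock_mode(ats_only: bool, array_supports_ats: bool) -> Optional[LockMode]:
    """Simplified model of how a host picks a locking mechanism for a VMFS volume.

    Returns the LockMode the host would use, or None if it cannot lock
    (and therefore cannot mount) the volume. Illustrative only.
    """
    if array_supports_ats:
        # ATS is available, so it is used whether or not the volume is ATS-only.
        return LockMode.ATS
    if ats_only:
        # The ATS-only flag forbids falling back to SCSI reservations,
        # mirroring the "Denying reservation access" warning in the log.
        return None
    # Assumed fallback for volumes without the ATS-only flag.
    return LockMode.SCSI_RESERVATION
```

In this model the fix corresponds to flipping `ats_only` from `True` to `False`: with an array that no longer supports ATS, `choose_lock_mode(True, False)` returns `None`, while `choose_lock_mode(False, False)` returns `LockMode.SCSI_RESERVATION`.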

A few days later, before the fix was implemented, a power outage took out half the hosts in the farm. When they came back up, none of them could access the datastore (unsurprisingly).

Implementing the fix from that article (vmkfstools --configATSOnly 0 /vmfs/devices/disks/device-ID:Partition) cleared the ATS-only flag and restored access to the datastore for all hosts.
