ESXi 6 – weird host HA error

I came across a strange fault with VMware HA today, where a host was reporting an error in its ability to support HA, and  wouldn’t “Reconfigure for HA”

Attempts to perform the reconfigure failed and generated a failed task with the status “Cannot install the vCenter Server agent service. Cannot upload agent”

Screen Shot 2016-08-04 at 15.59.32

Taking the host in and out of Maintenance Mode had no effect, and I could find no pertinent errors in the host logs.

I couldn’t find anything particularly relevant in a google search either, but on digging through the VCenter logs I found the following:

 2016-08-04T15:29:28.567+01:00 info vpxd[16756] [Originator@6876 sub=HostUpgrader opID=909E5426-000012CB-b0-7d] [VpxdHostUpgrader] Fdm on host-6787 has build 3018524. Expected build is 3634793 - will upgrade
2016-08-04T15:29:28.725+01:00 info vpxd[16756] [Originator@6876 sub=HostAccess opID=909E5426-000012CB-b0-7d] Using vpxapi.version.version10 to communicate with vpxa at host guebesx-dell-001.skybet.net
2016-08-04T15:29:28.910+01:00 warning vpxd[16756] [Originator@6876 sub=Libs opID=909E5426-000012CB-b0-7d] SSL: Unknown SSL Error
2016-08-04T15:29:28.911+01:00 info vpxd[16756] [Originator@6876 sub=Libs opID=909E5426-000012CB-b0-7d] SSL Error: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
2016-08-04T15:29:28.911+01:00 warning vpxd[16756] [Originator@6876 sub=Libs opID=909E5426-000012CB-b0-7d] SSL: connect failed
2016-08-04T15:29:28.911+01:00 warning vpxd[16756] [Originator@6876 sub=Default opID=909E5426-000012CB-b0-7d] [NFC ERROR] NfcNewAuthdConnectionEx: Failed to connect to peer. Error: The remote host certificate has these problems:
-->
--> * The host certificate chain is incomplete.
-->
--> * unable to get local issuer certificate
-->
2016-08-04T15:29:28.912+01:00 error vpxd[16756] [Originator@6876 sub=vpxNfcClient opID=909E5426-000012CB-b0-7d] [VpxNfcClient] Unable to connect to NFC server: The remote host certificate has these problems:
-->
--> * The host certificate chain is incomplete.
-->
--> * unable to get local issuer certificate
2016-08-04T15:29:28.913+01:00 error vpxd[16756] [Originator@6876 sub=HostAccess opID=909E5426-000012CB-b0-7d] [VpxdHostAccess] Failed to upload files: vim.fault.SSLVerifyFault
2016-08-04T15:29:28.918+01:00 error vpxd[16756] [Originator@6876 sub=DAS opID=909E5426-000012CB-b0-7d] [VpxdDasConfigLRO] InstallDas failed on host guebesx-dell-001.skybet.net: class Vim::Fault::AgentInstallFailed::Exception(vim.fault.AgentInstallFailed)
2016-08-04T15:29:28.919+01:00 info vpxd[16756] [Originator@6876 sub=MoHost opID=909E5426-000012CB-b0-7d] [HostMo::UpdateDasState] VC state for host host-6787 (uninitialized -> init error), FDM state (UNKNOWN_FDM_HSTATE -> UNKNOWN_FDM_HSTATE), src of state (null -> null)
2016-08-04T15:29:28.950+01:00 info vpxd[16756] [Originator@6876 sub=vpxLro opID=909E5426-000012CB-b0-7d] [VpxLRO] -- FINISH task-internal-15007334
2016-08-04T15:29:28.950+01:00 info vpxd[16756] [Originator@6876 sub=Default opID=909E5426-000012CB-b0-7d] [VpxLRO] -- ERROR task-internal-15007334 -- -- DasConfig.ConfigureHost: vim.fault.AgentInstallFailed:
--> Result:
--> (vim.fault.AgentInstallFailed) {
--> faultCause = (vmodl.MethodFault) null,
--> reason = "AgentUploadFailed",
--> statusCode = <unset>,
--> installerOutput = <unset>,
--> msg = ""
--> }
--> Args:
-->  

I’m not sure what had caused the certificate error, but a simple disconnect and reconnect of the host cleared the fault and allowed the HA agent to configure successfully.