vSphere/vCenter 6.5 Announced

While I’m sadly not at VMworld this year, I’m following the announcements quite closely and it’s fab to see some blog posts already on the VMware blog for the newly launched version 6.5 product.

It’s great to see the VMware Update Manager now included in the VCSA (this has been anticipated for some time), as well as direct REST APIs, the HTML5 client and a new HA option for vCenter.

I’m sure there will be further announcements and analysis during this week, but for most people in the VMware community, this should fix a significant number of ‘pain points’ within the VMware base product set.

ESXi 6 – weird host HA error

I came across a strange fault with VMware HA today, where a host was reporting an error in its ability to support HA, and wouldn’t “Reconfigure for HA”.

Attempts to perform the reconfigure failed and generated a failed task with the status “Cannot install the vCenter Server agent service. Cannot upload agent”.

Taking the host in and out of Maintenance Mode had no effect, and I could find no pertinent errors in the host logs.

I couldn’t find anything particularly relevant in a Google search either, but on digging through the vCenter logs I found the following:

2016-08-04T15:29:28.567+01:00 info vpxd[16756] [Originator@6876 sub=HostUpgrader opID=909E5426-000012CB-b0-7d] [VpxdHostUpgrader] Fdm on host-6787 has build 3018524. Expected build is 3634793 - will upgrade
2016-08-04T15:29:28.725+01:00 info vpxd[16756] [Originator@6876 sub=HostAccess opID=909E5426-000012CB-b0-7d] Using vpxapi.version.version10 to communicate with vpxa at host guebesx-dell-001.skybet.net
2016-08-04T15:29:28.910+01:00 warning vpxd[16756] [Originator@6876 sub=Libs opID=909E5426-000012CB-b0-7d] SSL: Unknown SSL Error
2016-08-04T15:29:28.911+01:00 info vpxd[16756] [Originator@6876 sub=Libs opID=909E5426-000012CB-b0-7d] SSL Error: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
2016-08-04T15:29:28.911+01:00 warning vpxd[16756] [Originator@6876 sub=Libs opID=909E5426-000012CB-b0-7d] SSL: connect failed
2016-08-04T15:29:28.911+01:00 warning vpxd[16756] [Originator@6876 sub=Default opID=909E5426-000012CB-b0-7d] [NFC ERROR] NfcNewAuthdConnectionEx: Failed to connect to peer. Error: The remote host certificate has these problems:
--> * The host certificate chain is incomplete.
--> * unable to get local issuer certificate
2016-08-04T15:29:28.912+01:00 error vpxd[16756] [Originator@6876 sub=vpxNfcClient opID=909E5426-000012CB-b0-7d] [VpxNfcClient] Unable to connect to NFC server: The remote host certificate has these problems:
--> * The host certificate chain is incomplete.
--> * unable to get local issuer certificate
2016-08-04T15:29:28.913+01:00 error vpxd[16756] [Originator@6876 sub=HostAccess opID=909E5426-000012CB-b0-7d] [VpxdHostAccess] Failed to upload files: vim.fault.SSLVerifyFault
2016-08-04T15:29:28.918+01:00 error vpxd[16756] [Originator@6876 sub=DAS opID=909E5426-000012CB-b0-7d] [VpxdDasConfigLRO] InstallDas failed on host guebesx-dell-001.skybet.net: class Vim::Fault::AgentInstallFailed::Exception(vim.fault.AgentInstallFailed)
2016-08-04T15:29:28.919+01:00 info vpxd[16756] [Originator@6876 sub=MoHost opID=909E5426-000012CB-b0-7d] [HostMo::UpdateDasState] VC state for host host-6787 (uninitialized -> init error), FDM state (UNKNOWN_FDM_HSTATE -> UNKNOWN_FDM_HSTATE), src of state (null -> null)
2016-08-04T15:29:28.950+01:00 info vpxd[16756] [Originator@6876 sub=vpxLro opID=909E5426-000012CB-b0-7d] [VpxLRO] -- FINISH task-internal-15007334
2016-08-04T15:29:28.950+01:00 info vpxd[16756] [Originator@6876 sub=Default opID=909E5426-000012CB-b0-7d] [VpxLRO] -- ERROR task-internal-15007334 -- -- DasConfig.ConfigureHost: vim.fault.AgentInstallFailed:
--> Result:
--> (vim.fault.AgentInstallFailed) {
--> faultCause = (vmodl.MethodFault) null,
--> reason = "AgentUploadFailed",
--> statusCode = <unset>,
--> installerOutput = <unset>,
--> msg = ""
--> }
--> Args:

I’m not sure what had caused the certificate error, but a simple disconnect and reconnect of the host cleared the fault and allowed the HA agent to configure successfully.
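For anyone wanting to script that fix, the disconnect and reconnect can be done with PowerCLI along these lines (a minimal sketch, assuming an existing vCenter connection; the host name is a placeholder):

```powershell
# Disconnect the affected host from vCenter, then reconnect it
# (the host name is a placeholder for your environment)
$vmhost = Get-VMHost "esxi-host-001.example.com"
Set-VMHost -VMHost $vmhost -State Disconnected -Confirm:$false
Set-VMHost -VMHost $vmhost -State Connected -Confirm:$false
```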

One chapter closes, another begins…

Today a chapter closes on my career at CSC.

I’ve been working in the same office for 22 years, originally starting straight from Uni in the Unix Support team for the Post Office, looking after NCR and HP Unix servers around the country – much of which was on dial-up modem rather than IP networking.

While part of the Post Office/Consignia/Royal Mail I moved through NT Infrastructure and Internet Infrastructure teams, picking up Windows Server and Internet technology skills, then we were outsourced to CSC in June 2003.

I spent a short time in the Firewall management team in CSC, but then moved to the team looking after Windows server infrastructure for the NHS account. This team was almost entirely formed from ex-Royal Mail staff, and had set up a significant amount of automation and standardisation already. It was here that I was first exposed to VMware ESX, and it immediately resonated with me.

Due to the similarities with Unix, and because I could see the future benefits of virtualised infrastructure, I decided to try and become the team expert in VMware ESX. I learned a lot along the way, and I’m grateful to the TAM team at VMware for the learning opportunities they made available – Joshua Lory, Adrian Voss, Jesse Shapiro and Liam Farrell, I thank you all.

I was by no means the only VMware expert though; having colleagues with the same thirst for knowledge really pushed me along, and we have been pretty competitive in our quest for certification and recognition. I wouldn’t even have thought to apply for vExpert if my colleague Darryl Cauldwell hadn’t done so, and I believe my achieving double VCAP-DCV and VCIX-NV has pushed others along the certification path.

But 22 years is a long time to spend in one location, and I’ve felt for a while that it was time to find a new challenge, so I will be starting a new role on Monday, with Sky Betting and Gaming. It will be a very different working environment compared with a global outsourcer, but one I’m really looking forward to.

NSX LoadBalancer – character “/” is not permitted in server name

This was an odd error that a colleague brought to me while testing automation around the configuration of an NSX Edge.

He had created the Edge successfully and configured the Load Balancer, but enabling it failed. When he tried enabling it through the Web Client, the error in the title was displayed and the change was automatically reverted.

After a lot of digging, I discovered that the configuration for the Load Balancer had a Pool where the “IP Address / VC Container” object was a Service Group, and one of the members of that Service Group was an IPSet for the CIDR block that NSX was trying to include in the server name.

I’m not sure whether that is even a supported configuration, but I changed it to point to a Service Group that included the members of the target web farm, and the Load Balancer could then be configured successfully.

PowerCLI code snippet to get storage driver details

This is just a brief post to share a code snippet that I built to display the storage driver in use.

The driver and its version are critical for VMware VSAN, and I needed a quick and easy way of checking them. I might revise the code at a later date to run across multiple hosts in a cluster and output the results in a table, but for now, here are the basics.

Connect-VIServer <vcname>
$esxcli = Get-EsxCli -VMHost <esxihostname>

# Identify the driver in use by the first storage adapter (vmhba0)
$adapter = $esxcli.storage.core.adapter.list() |
    Select-Object Description, Driver, HBAName |
    Where-Object { $_.HBAName -match "vmhba0" }

# VIB names use hyphens where the driver name uses underscores
$driver = $adapter.Driver -replace "_", "-"

# Find the matching driver VIB and report its details
$esxcli.software.vib.list() |
    Select-Object Name, Version, Vendor, ID, AcceptanceLevel, InstallDate, ReleaseDate, Status |
    Where-Object { $_.Name -match ($driver + "$") }

This displays output such as:

Name            : scsi-megaraid-sas
Version         : 6.603.55.00-1OEM.550.0.0.1331820
Vendor          : LSI
ID              : LSI_bootbank_scsi-megaraid-sas_6.603.55.00-1OEM.550.0.0.1331820
AcceptanceLevel : VMwareCertified
InstallDate     : 2016-05-03
ReleaseDate     :
Status          :

This works for the servers I’ve tried it on (Dell) but as usual YMMV…
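As a taster of the multi-host version mentioned above, something like this should work across a cluster (a sketch only, untested; the cluster name and the vmhba number are placeholders):

```powershell
# Sketch: report the storage driver VIB for every host in a cluster
# (<clustername> and vmhba0 are placeholders for your environment)
foreach ($vmhost in Get-Cluster <clustername> | Get-VMHost) {
    $esxcli = Get-EsxCli -VMHost $vmhost
    $adapter = $esxcli.storage.core.adapter.list() |
        Where-Object { $_.HBAName -match "vmhba0" }
    $driver = $adapter.Driver -replace "_", "-"
    $esxcli.software.vib.list() |
        Where-Object { $_.Name -match ($driver + "$") } |
        Select-Object @{N="Host";E={$vmhost.Name}}, Name, Version
}
```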

Github Desktop from behind a corporate proxy server

After having just helped a colleague get through the tortuous path of configuring Github Desktop to work through a proxy, I thought it might be worth blogging it all.

Different parts of Github Desktop require the proxy information to be provided in different ways, and without all 3 pieces of configuration, you will find that some things work, but not others.

  1. Internet Explorer proxy setting
    This *has* to be set to a specific proxy server, not an autoconfig script.
  2. .gitconfig
    This is found in your user home directory (usually C:\Users\<Username>) and requires the following lines:
    [http]
        proxy = http://<proxy-address>:<port>
    [https]
        proxy = http://<proxy-address>:<port>
  3. HTTPS_PROXY/HTTP_PROXY environment variable
    You can set this in your local environment, or in the system environment settings, as long as it’s visible to the Github Desktop processes.
    set HTTPS_PROXY=http://<proxy-address>:<port>
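A plain `set` only lasts for the current shell. If you want the variables to survive new sessions (my addition rather than part of the original walkthrough; values are placeholders), something like this from a cmd prompt should work:

```cmd
:: Persist the proxy variables for the current user profile
:: (takes effect in newly opened shells only)
setx HTTPS_PROXY http://<proxy-address>:<port>
setx HTTP_PROXY http://<proxy-address>:<port>
```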

If a userid/password is required, it’s recommended that you run something like CNTLM to do the authentication, rather than adding the plaintext credentials to the proxy string.

Once you’ve configured all that, if you’re using Enterprise Github, you will probably need to use a Personal Access Token, rather than your password, to authenticate Github Desktop. This can be created by logging in with a browser and going to Settings / Personal Access Tokens.

I hope that helps someone out, but if not, I’m sure I’ll be using it as a reminder when I have to change it all between using it at Home and at Work…

ESXi 6.0 – Switching from persistent scratch to transient scratch

KB article 1033696 is very helpful when you want to configure persistent scratch on your USB/SDCard/PXE booted ESXi host, however when you want to go the other way, things can be slightly complicated.

Consider the following situation. You have installed ESXi onto a local USB stick, and have temporarily retasked a drive from what will become your VSAN array to run up vCenter and a PSC. On the next reboot, ESXi will see the persistent local storage and automatically choose to run scratch on it. From that point onwards, how do you switch back and release the disk for use by VSAN?

You can’t set the advanced configuration "ScratchConfig.ConfiguredScratchLocation" to blank (e.g. “”); that was the first thing I tried. It accepts the command, but the setting remains pointed at the VMFS location.

You can’t just unmount or delete the VMFS filesystem; it’s in use.

You can’t set the advanced configuration "ScratchConfig.ConfiguredScratchLocation" to /tmp/scratch either; it accepts the value, but on reboot it discovers the VMFS filesystem again.

Other combinations of advanced configuration settings, and editing or removing /etc/vmware/locker.conf, also failed to stop it from loading scratch onto the VMFS filesystem at boot.

In the end, I was able to get around this by using storcli to offline the disk. The server could then be rebooted without mounting the VMFS filesystem, so scratch was then running from /tmp/scratch (on the ramdisk). The disk could then be brought online again, and the VMFS filesystem destroyed. I guess an alternative approach would be to point the scratch location at an NFS location, which should take precedence over a “discovered” local persistent VMFS filesystem, and allow the VMFS filesystem to be deleted.
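For reference, the sequence looked roughly like this (a sketch only; the controller/enclosure/slot path and the device name are placeholders for your hardware):

```shell
# Take the disk offline so the VMFS volume isn't discovered at boot
./storcli /c0/e32/s0 set offline
reboot

# After the reboot, scratch is on the ramdisk (/tmp/scratch);
# bring the disk back online and destroy the VMFS partition
./storcli /c0/e32/s0 set online
partedUtil delete /vmfs/devices/disks/naa.xxxxxxxxxxxxxxxx 1
```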

I hope that helps someone else, as I spent far more time than I should have going round in a loop, steadily losing my marbles, because there didn’t seem to be any information around about how to do it.