vSphere LifeCycle Manager Image-based Updates – Automation

When vSphere 7 introduced vLCM to replace VUM (VMware Update Manager) it was announced that the Baseline update approach would be deprecated in a later version, in favour of Image-based updates.

We make extensive use of automation around Baselines to show measurable compliance against our quarterly patching cycle. The process went something like this:

  • On the first day of each calendar quarter (e.g. 1st Jan, 1st April, etc.) a script is run against each vCenter to create a new baseline which includes all the latest VMware patches. This is attached to all datacenters, and any baselines older than a year are removed.
  • Scripts are then run against each vCenter, to apply the new baseline – starting with the Lab environment, and working through in order of criticality, finishing on the production one. For example, the Lab would be done as soon as the baseline is available, then a week later, non-production environments, then DR a week later, and then finally Production a week after DR.
  • Reports are run weekly, showing the status of each host against the baselines, enabling us to track the compliance, and the progress of the rollout.
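
As a rough illustration of the schedule described above, the sketch below works out the first day of the quarter and the staggered start date for each environment. The environment names and the one-week gaps come from the example above; the function name Get-PatchSchedule is made up for this sketch and is not part of PowerCLI.

```powershell
# Hypothetical helper: staged rollout dates for the quarter containing $Date
function Get-PatchSchedule {
    param([datetime]$Date = (Get-Date))

    # First month of the quarter: 1, 4, 7 or 10
    $firstMonth = [int]([math]::Floor(($Date.Month - 1) / 3)) * 3 + 1
    $quarterStart = [datetime]::new($Date.Year, $firstMonth, 1)

    # Days to wait after the quarter start, in rollout order (assumed one-week gaps)
    $stages = [ordered]@{ 'Lab' = 0; 'Non-Production' = 7; 'DR' = 14; 'Production' = 21 }

    foreach ($stage in $stages.GetEnumerator()) {
        [pscustomobject]@{
            Environment = $stage.Key
            StartDate   = $quarterStart.AddDays($stage.Value)
        }
    }
}

# For a date in Q4 2022, Lab starts on 1 Oct and Production on 22 Oct
Get-PatchSchedule -Date '2022-10-13'
```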

Image-based updates work in quite a different way, so this approach needed some rethinking. The only feasible way seems to be to just measure compliance against the defined image, so that is what I’ve gone with.

The main purpose of this post is to cover the PowerCLI cmdlets and structures that were used to achieve this, both as an aide-memoire for me, and to share my findings for anyone doing a similar migration.

Checking if a cluster is using Image-based updates

This is a simple one:

(Get-Cluster $cluster).CollectiveHostManagementEnabled

This returns $true if the cluster is using Image-based updates.

Updating the Image

I had to use a Try…Catch here to detect whether there was an update to apply:

try {
	$rec = Get-LcmClusterDesiredStateRecommendation -Current -Cluster $cluster -ErrorAction:Stop
} catch {
	Write-Output "Cluster $cluster has no recommended updates"
	continue
	# move on to the next vCenter
}

$update=Set-Cluster -Cluster $cluster -BaseImage $rec.Image -VendorAddOn $rec.VendorAddOn -Confirm:$false

This only updates the Base Image and Vendor AddOn. I’ve not touched Components, or the Firmware and Drivers Addon, at this time.

Testing compliance

Test-LcmClusterCompliance returns an object containing the status of the compliance, and arrays of the hosts matching certain states.

$comp=Get-Cluster -Name $cluster |Test-LcmClusterCompliance
$comp.Status
$comp.CompliantHosts
$comp.NonCompliantHosts

I’ve then used this output to produce a report (and sorry, this is quite verbose!)

# Calculated properties shared by the non-compliant and compliant host lists
$props = 'VMHost',
            @{N="CurrentImage";E={[string]$_.BaseImageCompliance.Status + " " + 
            $_.BaseImageCompliance.Current.Name + " - " + $_.BaseImageCompliance.Current.Version}}, 
            @{N="CurrentAddOn";E={[string]$_.AddOnCompliance.Status + " " + 
            ($_.AddOnCompliance.Current.Name).Replace("PowerEdge Servers running ","") + " - " + 
            $_.AddOnCompliance.Current.Version}}, 
            @{N="TargetImage";E={$_.BaseImageCompliance.Target.Name + " - " + 
            $_.BaseImageCompliance.Target.Version}},
            @{N="TargetAddOn";E={($_.AddOnCompliance.Target.Name).Replace("PowerEdge Servers running ","")+ 
            " - " + $_.AddOnCompliance.Target.Version}}

# Non-compliant hosts first, then compliant ones, skipping any empty list
$statuses = @(@($comp.NonCompliantHosts; $comp.CompliantHosts) | Where-Object { $_ } | Select-Object $props)

This is then formatted into a report.

Checking when the Image was updated

This is necessary to allow a staged approach through the environments, while keeping the quarterly updates across the estate. It’s not something that I found possible to do via the normal LCM cmdlets, so I had to dig around in the SDK cmdlets.

Anything using the SDK cmdlets has to use the bare moref IDs, which means cropping the object type off the front of the ID:

$comp=Invoke-GetClusterSoftwareCompliance -Cluster `
(Get-Cluster $cluster).Id.Replace("ClusterComputeResource-","")

$comp

incompatible_hosts  : {}
hosts               : @{host-7829=; host-7830=}
non_compliant_hosts : {}
impact              : NO_IMPACT
commit              : 10
compliant_hosts     : {host-7829, host-7830}
scan_time           : 13/10/2022 12:18:26
unavailable_hosts   : {}
notifications       :
host_info           : @{host-7829=; host-7830=}
status              : COMPLIANT

$commit=Invoke-GetClusterCommitSoftware -Cluster `
(Get-Cluster $cluster).Id.Replace("ClusterComputeResource-","") -commit $comp.commit

author           apply_status description commit_time
------           ------------ ----------- -----------
xxxxxx@xxx.xxx   APPLIED                  13/10/2022 12:30:25

That commit_time is the time the Image was updated.
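
One way this timestamp could feed a staged rollout is sketched below: an environment becomes due once an agreed number of days has passed since the image was committed. This is only an illustration of the idea. Test-StageDue and its parameters are made-up names, the dates are invented, and it assumes commit_time is (or converts cleanly to) a DateTime.

```powershell
# Hypothetical helper: is an environment due for remediation, given when the image was committed?
function Test-StageDue {
    param(
        [Parameter(Mandatory)][datetime]$CommitTime,  # e.g. the commit_time value shown above
        [Parameter(Mandatory)][int]$DelayDays,        # days to wait after the image update
        [datetime]$Now = (Get-Date)
    )
    # Due once the agreed delay has elapsed since the image was updated
    return $Now -ge $CommitTime.AddDays($DelayDays)
}

# Made-up dates: image committed on 13 Oct, and this environment waits 14 days
$committed = [datetime]'2022-10-13T12:30:25'
Test-StageDue -CommitTime $committed -DelayDays 14 -Now ([datetime]'2022-10-20')   # False
Test-StageDue -CommitTime $committed -DelayDays 14 -Now ([datetime]'2022-10-28')   # True
```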

Applying the image – Whole Cluster

This can be done with a one-liner:

Get-Cluster -Name $cluster | Set-Cluster -Remediate -AcceptEULA

Applying the image – Individual Host

This is significantly more complicated, but our VMware TAM pointed me in the direction of the SDK cmdlets again for this.

# The SDK cmdlets need bare moref IDs, so trim the object type off the front
$clusterobj=Get-Cluster $cluster
$clusterid=$clusterobj.Id.Replace("ClusterComputeResource-","")
$vmhostid=(Get-VMHost $vmhost).Id.Replace("HostSystem-","")

# Get the time just before we start the task, so we can filter the Get-Task output
$start = Get-Date

# Create a specification object - you can supply more than one $vmhostid, comma separated
$SettingsClustersSoftwareApplySpec = Initialize-SettingsClustersSoftwareApplySpec -Hosts `
 $vmhostid -AcceptEula $true 

# Apply the specification object to the cluster
Invoke-ApplyClusterSoftwareAsync -Cluster $clusterid -SettingsClustersSoftwareApplySpec `
 $SettingsClustersSoftwareApplySpec

# The apply task runs async, and the output doesn't seem to match to a task id, so now we find the task.
# The task's ObjectId is compared against the full (untrimmed) cluster Id
$task=Get-Task |?{$_.ObjectId -eq $clusterobj.Id -and $_.StartTime -gt $start -and $_.Name -eq "apply`$task"} 

# Loop until the task finishes
While ($task.State -eq "Running") {
    Start-Sleep -Seconds 60 
    $task=Get-Task |?{$_.ObjectId -eq $clusterobj.Id -and $_.StartTime -gt $start -and `
      $_.Name -eq "apply`$task"} 
}

We do things this way so that we can silence alerting for each host while it patches. If that’s not something you bother with, then it’s far simpler to do the one-liner above!
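
Both SDK examples in this post crop the object type off the front of the ID with a type-specific Replace call. A small generic helper could do the same for any object type. The sketch below is one way to write it: ConvertTo-BareMoRef is a made-up name, and the sample IDs are invented.

```powershell
# Hypothetical helper: strip the leading object type from a PowerCLI Id
function ConvertTo-BareMoRef {
    param([Parameter(Mandatory)][string]$Id)
    # Remove everything up to and including the first hyphen
    return $Id -replace '^[^-]+-', ''
}

ConvertTo-BareMoRef -Id 'ClusterComputeResource-domain-c1001'   # domain-c1001
ConvertTo-BareMoRef -Id 'HostSystem-host-7829'                  # host-7829
```

Only the first hyphen is used as the split point, so the hyphen inside the moref itself (host-7829) is left alone.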

PowerCLI – Disabling ESXi OpenSLP service for VMSA-2021-0002

OpenSLP has cropped up again as an ESXi vulnerability, and if you want to disable the service the KB article given only has details for doing so via the ESXi command line.

Far easier, if you have many hosts, is to use PowerCLI, and while it’s relatively simple I thought I would share this to help anyone else wanting to do so.

Disabling the service
Connect to the environment with ‘connect-viserver’ and then run:

Get-VMHost | %{
	$_ | Get-VMHostFirewallException -Name "CIM SLP" | Set-VMHostFirewallException -Enabled:$false
	Stop-VMHostService -HostService ($_ | Get-VMHostService | ?{$_.Key -eq "slpd"}) -Confirm:$false
	$_ | Get-VMHostService | ?{$_.key -match "slpd"} | Set-VMHostService -Policy "off"
}

Checking the status
Connect to the environment with ‘connect-viserver’ and then run:

Get-VMHost | %{
	$rule = $_ | Get-VMHostFirewallException -Name "CIM SLP"
	$serv = $_ | Get-VMHostService | ?{$_.Key -eq "slpd"}
	$_ | select Name,@{N="Rule";E={$rule.enabled}},@{N="ServiceRunning";E={$serv.Running}},@{N="ServiceEnabled";E={$serv.Policy}}
}

Edit: As per the comment from Zeev, I’d missed disabling the service. I’ve updated the Disabling and Checking scripts above to include the correct information now.

PowerCLI: Find VMs with xHCI controller

The ESXi vulnerability found at the 2020 Tianfu Cup was a Critical one, with a CVSSv3 base score of 9.3.

VMware lists an article with the fixes and workarounds here:
https://www.vmware.com/security/advisories/VMSA-2020-0026.html
The fix is to apply the latest patch, and the workaround is to remove the xHCI (USB 3.0) controller from any VMs that have it.

To quickly determine whether you have an exposure you can run the following PowerCLI against your environment and it will list the VMs which have that particular controller type attached.

Get-VM | ?{$_.ExtensionData.Config.Hardware.Device.DeviceInfo.Label -match "xhci"}

Extracting Dell Original Configuration with PowerShell

This would probably have taken less time if I’d just input each tag to the website and done “Export to csv”. However, I hate repetitive tasks, and thought I’d be able to reuse some existing PowerShell scripting that does HTML mechanisation.

The Dell Support web page allows you to submit a server tag, and view the original configuration for the server when it was delivered. This is displayed in a series of expandable sections, but with an option to export to csv.

I had a list of servers that were available for re-use, with serial numbers, but no hardware specs, so decided to download the specifications to decide whether any of them would be suitable for my requirement.

The first hurdle was that I could only run the PowerShell from my Mac, so that meant using PowerShell Core. In turn I quickly discovered that I couldn’t use the more useful HTML parsing methods, as they utilise the Internet Explorer engine.

Next I found that with the UserAgent field left as default, the request was being intercepted by the outbound proxy. Fortunately Invoke-WebRequest allows you to spoof the UserAgent, enabling the page to be retrieved.

After a few abortive attempts, I discovered that there is a field used by the “Export to csv” function in the web page which appears to contain the hardware inventory in an HTML-encoded JSON string.

Firstly I had to pull this specific line out of the raw HTML, then strip the data out of the HTML code. Next I had to fix a corruption caused by the entry relating to the 2.5″ drives (the escaping of the double quote wasn’t working), and then HTML decode the text and convert from JSON to a PowerShell object.

The PowerShell object had an entry for each different type of component, but within this it had an embedded list on some lines, for example a hard drive might have parts for Carrier, Label, Drive, Screws, so this list then had to be parsed into a readable text line.
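
To illustrate the decode-and-flatten steps on their own, here is a minimal sketch run against a made-up sample rather than the real page. The field names (SkuNumber, SkuDescription, Parts, Qty, Description) match the ones used in the script; the sample values are invented, and [System.Net.WebUtility] is used here simply because it is available without loading any extra assembly.

```powershell
# Made-up sample in the same shape as the page data: HTML-encoded JSON with a nested Parts list
$encoded = '[{&quot;SkuNumber&quot;:&quot;SKU-001&quot;,&quot;SkuDescription&quot;:&quot;Hard Drive&quot;,' +
           '&quot;Parts&quot;:[{&quot;Qty&quot;:2,&quot;Description&quot;:&quot;Carrier&quot;},' +
           '{&quot;Qty&quot;:2,&quot;Description&quot;:&quot;Drive&quot;}]}]'

# Decode the HTML entities, then parse the JSON into PowerShell objects
$table = [System.Net.WebUtility]::HtmlDecode($encoded) | ConvertFrom-Json

# Flatten each SKU's Parts list into one readable line
$rows = $table | Select-Object SkuNumber, SkuDescription,
    @{N='Parts';E={ ($_.Parts | ForEach-Object { "$($_.Qty) $($_.Description)" }) -join '; ' }}

$rows.Parts   # 2 Carrier; 2 Drive
```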

The resultant code is here:

$tag = "<dell_tag>"
$url = "http://www.dell.com/support/home/us/en/04/product-support/servicetag/$tag/configuration"
$useragent = "Mozilla/5.0 (iPad; U; CPU OS 3_2_1 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Mobile/7B405"
$html = Invoke-WebRequest -uri $url -UserAgent $useragent
$str = $html.Content.Split( [Environment]::NewLine ) | Select-String 'hdnParts'
$json = ([string]$str).Split("=")[8].split("/>")[0] -replace "2.5\\&quot","2.5inch" | convertfrom-json
$table = [System.Web.HttpUtility]::HtmlDecode($json) | convertfrom-json
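# Join the parts into a single string, so the CSV shows text rather than "System.Object[]" when a SKU has several parts
$table | Select SkuNumber,SkuDescription,@{N="Qty";E={($_.Parts | %{"$($_.Qty) $($_.Description)"}) -join "; "}} | export-csv -path "$($Tag).csv" -includetypeinformation:$false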

A lot of work for a few lines of code, but re-usable (until Dell change their website) and more interesting than a bunch of copy/paste/click.

ESXi UDP Source Port Pass Vulnerability

I’ve not blogged for a while, and one of the main reasons is that I had a VCAP exam fail to launch. I was intending to use it to recertify, and have spent a significant part of this year going back and forth with certification support trying to sort out a resolution. Anyway, after 3 extensions to my cert expiry, I went and did the VCP6.5DCV Delta exam, so I’m recertified until Sept 2020 (jeepers, 2020, that’s like, the future!) and can think about other stuff again…

Our security team do very regular vulnerability scans, which keeps us on our toes with deploying patches, and tweaking baseline configs. One of the on-going vuln reports we’ve had for some time is for ESXi UDP Source Port Pass Firewall, for UDP port 53.

This issue comes about because the ESXi firewall is stateless and doesn’t know whether inbound traffic is related to an existing outbound connection. An attacker could use this to probe multiple UDP ports by setting the UDP Source Port of a packet to 53 so that the firewall treats it as a reply to an outbound DNS lookup.

VMware support came back with a recommendation to restrict the ‘DNS Client’ firewall rule for ESXi to only allow communication with the DNS Servers, so that any other agent (such as the vulnerability scanner) would not be able to pass through traffic to or from UDP 53.
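
The effect of that recommendation can be shown with a toy model of a stateless rule. This is only a sketch of the reasoning, not how ESXi implements its firewall: Test-InboundUdpAllowed is a made-up function, and the addresses are from the documentation ranges.

```powershell
# Toy model only: this is not how ESXi implements its firewall
function Test-InboundUdpAllowed {
    param(
        [string]$SourceIp,
        [int]$SourcePort,
        [string[]]$AllowedIps = @()   # an empty list models the 'All' setting
    )
    # A stateless rule can only look at the packet itself, so source port 53 is treated as a DNS reply
    if ($SourcePort -ne 53) { return $false }
    # With no restriction any sender qualifies; otherwise the sender must be a listed DNS server
    if ($AllowedIps.Count -eq 0) { return $true }
    return $AllowedIps -contains $SourceIp
}

# Made-up addresses for the example
$dnsServers = '192.0.2.53', '192.0.2.54'
Test-InboundUdpAllowed -SourceIp '198.51.100.10' -SourcePort 53                          # True: the scanner gets through
Test-InboundUdpAllowed -SourceIp '198.51.100.10' -SourcePort 53 -AllowedIps $dnsServers  # False: now blocked
Test-InboundUdpAllowed -SourceIp '192.0.2.53' -SourcePort 53 -AllowedIps $dnsServers     # True: a real DNS server
```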

While this is achievable through the web client, it wouldn’t be practical to update a large number of hosts in that way, so I decided to look at PowerCLI.

The following code will fetch a list of hosts, then cycle through each one and set the list of allowed IPs for the DNS Client service to be the IPs which have been set as its DNS servers.

# Get list of hosts
$vmhosts = Get-VMHost

foreach ($vmhost in $vmhosts) {
    # Connect ESXCLI
    $EsxCli = Get-EsxCli -VMHost $vmhost

    # Show the current state of the DNS Client firewall ruleset
    $EsxCli.network.firewall.ruleset.list("dns")

    # List existing allowed IP addresses for firewall rule
    $EsxCli.network.firewall.ruleset.allowedip.list("dns").AllowedIPAddresses

    # If allowed IPs is currently 'All' then disable that setting
    if ($EsxCli.network.firewall.ruleset.allowedip.list("dns").AllowedIPAddresses -eq "All") {
        $EsxCli.network.firewall.ruleset.set($false, $true, "dns")
    }

    # Add any missing DNS server entries as allowed targets
    foreach ($dnsaddr in ($vmhost | Get-VMHostNetwork).DnsAddress) {
        if ($EsxCli.network.firewall.ruleset.allowedip.list("dns").AllowedIPAddresses -contains $dnsaddr) {
            Write-Output "$dnsaddr already in the list"
        } else {
            $EsxCli.network.firewall.ruleset.allowedip.add($dnsaddr, "dns")
        }
    }
}

Clearing old Host Profile answer files

We recently had a problem where the Fault Tolerance logging service seemed to be randomly getting assigned to the vMotion vmknic, instead of its dedicated vmknic. This obviously prevented FT state sync from occurring, a fact that I discovered in a 20 minute change window at 4.30AM 😦

I found the cause of the state sync failure by reading through the vmware.log file for the affected VM, and noticing that the sync seemed to be trying to happen between source and destination IPs on different subnets. Looking at the host IP services configuration within the cluster I found a host which was correct (fortunately the host the FT primary was on was correct too), and used that for the secondary VM, which enabled sync to occur.

The problem was affecting roughly 50% of the cluster, and had apparently happened a number of times earlier and been corrected. I noticed that these hosts also had remnants of a host profile answer file – just the Hostname and vMotion interface details, whereas the hosts that were still configured correctly didn’t have any answer file settings stored in vCenter.

Easy, I thought, a bit of PowerCLI will sort that, so I had a look for cmdlets for viewing/modifying answer file settings. I hit a blank pretty much straightaway. There are cmdlets for host profiles, one of which allows you to include answer files as part of applying a host profile, but nothing for viewing/modifying/removing answer files.

So to the Views we go. A bit of searching turned up a helpful reference, and after a bit of testing I came up with:

$hostProfileManagerView = Get-View "HostProfileManager"
$blank = New-Object VMware.Vim.AnswerFileOptionsCreateSpec

foreach ($vmhost in (Get-Cluster <cluster> | Get-VMhost | sort Name)) {
   $file = $hostProfileManagerView.RetrieveAnswerFile($vmhost.ExtensionData.MoRef)
   if ($file.UserInput.length -gt 0) {
     $file = $hostProfileManagerView.UpdateAnswerFile($vmhost.ExtensionData.MoRef,$blank)
     $file = $hostProfileManagerView.RetrieveAnswerFile($vmhost.ExtensionData.MoRef)
     Write-Output "$($vmhost.Name) $([string]$file.UserInput)"
   }
}

This iterates through each host in the cluster, and if it has an answerfile, it replaces it with a blank one.

PowerCLI shortcuts

I’ve just set up some shortcuts for connecting to our various VMware environments, as I was sick of typing out the full

connect-viserver vcsa-name.dns.name

every time.

If you want this to apply for just your userid, you can create (or edit if it already exists) %UserProfile%\Documents\WindowsPowerShell\profile.ps1

And if you want it to apply for all users, you can create (or edit)
%windir%\system32\WindowsPowerShell\v1.0\profile.ps1

I created the latter, and added lines such as:

function ENV1 {connect-viserver vcsa-name-1.dns.name}
function ENV2 {connect-viserver vcsa-name-2.dns.name}
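
If the list of environments grows, the same shortcuts could be generated from a table instead of being written out one by one. This is just a sketch of that idea: the $vCenters table is hypothetical and reuses the placeholder names from above, and the functions are created through PowerShell's function: drive.

```powershell
# Hypothetical map of shortcut name to vCenter address
$vCenters = [ordered]@{
    ENV1 = 'vcsa-name-1.dns.name'
    ENV2 = 'vcsa-name-2.dns.name'
}

foreach ($entry in $vCenters.GetEnumerator()) {
    # Build the function body as text so each shortcut keeps its own server name
    $body = [scriptblock]::Create("Connect-VIServer -Server '$($entry.Value)'")
    Set-Item -Path "function:global:$($entry.Key)" -Value $body
}
```

Nothing is connected when the profile loads; Connect-VIServer only runs when one of the generated shortcuts is typed.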

Now to connect to a vCenter, all I have to type is ENV1.
Do you have any favourite powershell/powerCLI shortcuts like this?