Reconfiguring VSAN storage on Dell PERC H710P Mini Array Controller

I recently had to reorganise the storage on one of our VSAN clusters. The hosts have H710P array controllers, which don’t have pass-thru capability, so each disk has to be created as a RAID0 Virtual Disk on the array controller.

In addition, the 2 SSD drives had been placed into a single RAID0 array, which needed breaking apart, to enable the use of 2 VSAN Disk Groups (giving 2 separate failure domains instead of 1 great big one!)

On top of this, with only 3 hosts in the farm at this point in time, there was no option to fully evacuate the data from each host, so I had to treat each server as a “failure” and allow VSAN to create new mirror copies after the reconfiguration of each host.

Here are the steps I went through:

Original – 2x SSD in RAID0, 4 separate RAID0 HDD drives – in one disk group

New – 2 separate RAID0 SSD, 10 separate RAID0 HDD drives – equally divided between 2 disk groups

Steps

  1. Place host in maintenance mode, with “Ensure accessibility” option.

    (can only choose “Full data migration” if there are more than 3 hosts in the cluster and sufficient storage)

  2. To complete entering maintenance mode, it will be necessary to power down the NSX Controller running on this host.
  3. Attach remote server console (iDRAC) and reboot server
  4. Enter the BIOS (F2 at server boot)
  5. Select “Device Configuration”
  6. Select “Integrated RAID Controller 1: Dell PERC < PERC H710P Mini>”
  7. Delete the old SSD Virtual Disk:
    1. Select “Select Virtual Disk Operations”
    2. Choose the SSD disk from the “Select Virtual Disk” dropdown
    3. Select “Delete Virtual Disk”
    4. Tick the checkbox to Confirm and Select Yes
    5. Click Back
  8. Select “Create Virtual Disk”
  9. For each SSD to add as a VSAN SSD disk perform the following:
    1. Leave RAID Level at RAID0
    2. Select “Select Physical Disks”
    3. Select Media Type “SSD”
    4. Select the appropriate disk from the list
    5. Select “Apply Changes”
    6. Select “OK”
    7. Enter a “Virtual Disk Name” of “VSAN1_SSD1” or “VSAN2_SSD1”
    8. Leave all other settings at default and choose “Create Virtual Disk”
    9. Tick the checkbox to Confirm and Select Yes
    10. Select “OK”
    11. Repeat for the other SSD
  1. For each HDD to add as a VSAN disk perform the following:
    1. Select “Create Virtual Disk”
    2. Leave RAID Level at RAID0
    3. Leave Media Type at “HDD”
    4. Select the appropriate disk from the list
    5. Select “Apply Changes”
    6. Select “OK”
    7. Enter a “Virtual Disk Name” of “VSAN_HDD”
    8. Leave all other settings at default and choose “Create Virtual Disk”
    9. Tick the checkbox to Confirm and Select Yes
    10. Select “OK”
    11. Repeat for the other HDD drives
  2. Select “Back”, “Back”, “Finish”, “Finish”, “Finish” to leave the BIOS
  3. Allow the host to boot back up
  4. Allow the host to reconnect into vCenter
  5. Select the Cluster the host is in, and choose the Manage tab and Virtual SAN “Disk Management” subheading
  6. Select the disk group showing “Unhealthy” and click the “Remove Disk Group” icon.
  7. Select “Yes” to remove the disk group
  8. Launch PowerCLI and use the following script to change the disk type of the SSDs to SSD:
    $server = "hostname.domain.name"
    Connect-VIServer -Server $server -User root -Password *******
    $esxcli = Get-EsxCli -VMHost $server
    # Pick out the SSD virtual disks by capacity (180-200 GB here; adjust to suit your drives)
    $localDisk = Get-ScsiLun -VmHost $server -LunType disk | where {$_.CapacityGB -lt 200 -and $_.CapacityGB -gt 180} | foreach {
        $canName = $_.CanonicalName
        $satp = ($esxcli.storage.nmp.device.list() | where {$_.Device -eq $canName}).StorageArrayType
        # Add a claim rule with the enable_ssd option, then reclaim the device so the rule takes effect
        $esxcli.storage.nmp.satp.rule.add($null,$null,$null,$canName,$null,$null,$null,"enable_ssd",$null,$null,$satp,$null,$null,$null)
        $esxcli.storage.core.claiming.reclaim($canName)
    }
    $esxcli.storage.core.device.list() | select Device, Size, IsSSD
    Disconnect-VIServer -Confirm:$false
    

    (I found this in a forum post, which I now can’t locate to give the proper attribution, sorry)

  9. Return to the Web Client and navigate to the host, select the “Manage” tab and the “Storage” and “Storage Devices” subsections. Note the “naa id” of the disks marked as SSD.
    These need their partition tables clearing so they can be reused by VSAN.
  10. Clearing the partition table:
    1. SSH to the host, and login as root
    2. cd /vmfs/devices/disks
    3. Use “ls <id>” to ensure the disk is there
    4. Issue the command “partedUtil mklabel /vmfs/devices/disks/<id> msdos” to clear the old and incorrect GPT table
    5. Repeat for the other SSD.
  11. Return to the vCenter Web Client and select the Cluster the host is in, and choose the Manage tab and Virtual SAN “Disk Management” subheading
  12. Select the host and select the “Create a New Disk Group” icon
  13. Select an SSD and 5 HDD drives and click “OK” (if the SSDs aren’t displayed, you may need to do a storage rescan)
  14. Repeat to create a second disk group
  15. Ensure both disk groups are created successfully
  16. Return to the Hosts and Clusters view
  17. Take the host out of “Maintenance Mode”
  18. Select the cluster, and navigate to the “Monitor” tab, and the “Virtual SAN” and “Virtual Disks” subsections.
  19. Monitor until all entries in “Physical Disk Placement” are showing “Active” for all VM disk components. This will not start until the timer (configurable in Advanced Setting “VSAN.ClomRepairDelay”, default 60 minutes) has expired.
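
The maintenance mode steps and the repair delay check above can also be driven from PowerCLI. The following is a minimal sketch rather than a tested procedure: it assumes VMware PowerCLI is loaded and a Connect-VIServer session is already open, and the three function names are hypothetical helpers, not part of PowerCLI.

```powershell
# Sketch only: assumes VMware PowerCLI is loaded and Connect-VIServer has been run.
# The function names below are hypothetical helpers, not PowerCLI cmdlets.

function Enter-VsanMaintenance {
    param([Parameter(Mandatory)][string]$VMHostName)
    # Equivalent of choosing "Ensure accessibility" in the Web Client
    Get-VMHost -Name $VMHostName |
        Set-VMHost -State Maintenance -VsanDataMigrationMode EnsureAccessibility -Confirm:$false
}

function Exit-VsanMaintenance {
    param([Parameter(Mandatory)][string]$VMHostName)
    Get-VMHost -Name $VMHostName | Set-VMHost -State Connected -Confirm:$false
}

function Get-VsanRepairDelay {
    param([Parameter(Mandatory)][string]$VMHostName)
    # Shows the current repair delay timer for the host
    Get-AdvancedSetting -Entity (Get-VMHost -Name $VMHostName) -Name 'VSAN.ClomRepairDelay' |
        Select-Object Entity, Name, Value
}
```

For example, Enter-VsanMaintenance -VMHostName 'hostname.domain.name' before the reboot, then Exit-VsanMaintenance with the same name once both disk groups have been created.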

Removing unwanted VMware Tools modules

I had a fault raised to me a few weeks back, over a Windows VM that was flagging a warning in its event log:

vnetfilter

I set up a test VM using the same base image, and found that it also had the same issue.

Some digging around in the VMware KB turned up this article. We don’t use vShield in this particular environment, and a little more investigation showed the HGFS driver was also loaded. Basically, the base VM template had a “Full” VMware Tools install instead of the normal “Typical” install.

I figured there should be a way of removing the unwanted modules, and this page seemed to imply it was possible. We don’t like to do anything interactively, so I moved straight to the command line.

start /wait setup.exe /S /v" /qn REBOOT=R ADDLOCAL=ALL REMOVE=VShield,Hgfs"

...looked like it should do the trick.

A quick trial run, though, showed it left the VShield and Hgfs modules installed.

My next attempt, deleting the vShield and vmHgfs “services” and running the same command line, also fundamentally failed. This time it actually reinstalled the drivers when I let it do an automatic tools upgrade to match the host.

My next approach was to perform an uninstall of the VMtools, so that I could do a clean install without the unnecessary modules. This of course failed because the VM was running on VMXNET3, and removing the VMtools removed the drivers and broke the link to the automation server!

The final solution I ended up with was the following scripted steps:

  1. Copy VMtools install files to target VM
  2. Copy VMXNET3 drivers to target VM
  3. Uninstall VMTools (with a PowerShell script)
  4. Install VMXNET3 drivers with the “pnputil” utility
  5. Reboot the VM
  6. Reinstall VMtools without the VShield and Hgfs services (using the command line shown above)
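
As a rough illustration of the uninstall and driver steps, here is a minimal sketch meant to run inside the Windows guest. It is not the script I used. It assumes the Tools uninstall entry has the display name “VMware Tools” and that its registry key name is the MSI product code; the function names and the driver path in the usage note are hypothetical, and it uses the older “pnputil -i -a” syntax.

```powershell
# Sketch only, intended to run inside the Windows guest as an administrator.
# Assumption: the uninstall entry is named 'VMware Tools' and its registry
# key name is the MSI product code. Function names are hypothetical.

function Uninstall-VMwareTools {
    $keys = 'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall\*'
    $tools = Get-ItemProperty -Path $keys -ErrorAction SilentlyContinue |
        Where-Object { $_.DisplayName -eq 'VMware Tools' } |
        Select-Object -First 1
    if (-not $tools) { throw 'VMware Tools does not appear to be installed.' }

    # Silent uninstall without an automatic reboot
    $msiArgs = "/x $($tools.PSChildName) /qn /norestart"
    $proc = Start-Process -FilePath 'msiexec.exe' -ArgumentList $msiArgs -Wait -PassThru
    $proc.ExitCode   # 0 = success, 3010 = success but a reboot is required
}

function Install-Vmxnet3Driver {
    param([Parameter(Mandatory)][string]$InfPath)
    # Add the driver package to the driver store and install it
    $proc = Start-Process -FilePath 'pnputil.exe' -ArgumentList "-i -a `"$InfPath`"" -Wait -PassThru -NoNewWindow
    $proc.ExitCode
}
```

A run would then look something like Uninstall-VMwareTools, followed by Install-Vmxnet3Driver -InfPath 'C:\Temp\vmxnet3\vmxnet3.inf' (a made-up path) and Restart-Computer -Force, with the setup.exe command line from earlier run after the reboot.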

I hope this helps anyone in the same predicament. If anyone has found a way of automating just the removal of the vShield and HGFS drivers, please let me know!

Automating NSX from PowerCLI

I’ve been working on an NSX-based project recently, and was given the task of automating the addition of new DLR Logical Switches and Edge devices.

After a discussion around the alternatives with colleagues, we decided the best way forward (for now) was to do it in PowerShell/PowerCLI, and a quick Google search found Chris Wahl’s post here.

This was a great basis to work from (Thanks Chris!), but lacked a number of things I needed: Creating DHCP Pools, attaching a Logical Switch to an existing Edge device, and some relatively minor amendments to DLR/Edge configurations.

Of these, the attachment of a new LS to an existing Edge proved the most intellectually taxing, as Chris’ scripts work by building new raw XML to PUT/POST with the REST API, and I soon discovered that the only way to amend an Edge configuration through the REST API is to pull the existing config as XML, amend it, and PUT it back.

On top of this, the XML retrieved through the “Invoke-WebRequest” PowerShell cmdlet is of type “System.Xml.XmlElement” whereas to do things like “CreateElement” – which we need to do to add new entries into the configuration –  it needs to be of type “System.Xml.XmlDocument”.

After a number of failed workarounds, I found that dumping the XML to a file, and reimporting, gave me the XML in the correct object type – this is a little ugly though, and while the automation is for something that would only be used occasionally, I don’t like ugly hacks in my code!

A little more effort, and I had a suitable alternative – casting the XML to a string object and back to an XML object yielded the result I was looking for.

$edge = Invoke-WebRequest -Uri "$uri/api/4.0/edges/$routerid" -Headers $head -ContentType "application/xml" -ErrorAction:Stop
[xml]$edgexml = $edge.Content
$textxml = $edgexml.innerxml
[xml]$body = $textxml

I could then work with $body as a normal XmlDocument object in PowerShell.
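
To make the type difference concrete, here is a small self-contained sketch that needs no NSX Manager. The sample XML is made up and only stands in for an Edge configuration. It shows an XmlElement being turned into an XmlDocument by a round trip through a string, after which CreateElement is available.

```powershell
# Made-up sample standing in for an Edge configuration
[xml]$source = '<edge><id>edge-1</id><vnics><vnic><index>0</index></vnic></vnics></edge>'

# The root element is an XmlElement, which has no CreateElement method
$element = $source.DocumentElement
$element.GetType().FullName        # System.Xml.XmlElement

# Round trip: element -> string -> [xml] cast gives a new XmlDocument
[xml]$body = $element.OuterXml
$body.GetType().FullName           # System.Xml.XmlDocument

# CreateElement now works, and the new node can be attached to the document
$new = $body.CreateElement('portgroupId')
[void]$body.edge.vnics.vnic.AppendChild($new)
```

Note that the sketch uses OuterXml, which includes the element's own tags; InnerXml on an element would return only its children. Every XmlElement also exposes the document it belongs to through its OwnerDocument property, which may be an alternative to the string round trip.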

The next issue I had was making the amendments to the XML.

First – make sure the new Logical Switch is not already attached, and then find the first unused interface:

foreach ($vnic in $body.edge.vnics.vnic) {
     if ($vnic.name -match $config.newLS.name) {
          $attached = "true"
          Write-Host -BackgroundColor:Black -ForegroundColor:Red "Warning: $($config.newLS.name) already attached. Skipping."
          break
     }
     # First unused interface found; the steps that follow all run inside this block
     if ($vnic.isConnected -match "false") {
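
The same search can be tried out away from a live Edge. The sketch below is a two-pass variant of the loop above, not the script I ran: it first checks every vnic for the switch name, and only then picks the first unconnected one. The sample XML, the switch names and the Find-FreeVnic function are all made up for illustration. It compares names with -eq rather than -match, because -match is a regular expression match and a name such as “LS1” would also match “LS10”.

```powershell
# Made-up sample of the vnics section of an Edge configuration
[xml]$sample = @'
<edge>
  <vnics>
    <vnic><index>0</index><name>uplink</name><isConnected>true</isConnected></vnic>
    <vnic><index>1</index><name>LS-Web</name><isConnected>true</isConnected></vnic>
    <vnic><index>2</index><name>vnic2</name><isConnected>false</isConnected></vnic>
    <vnic><index>3</index><name>vnic3</name><isConnected>false</isConnected></vnic>
  </vnics>
</edge>
'@

# Hypothetical helper: returns the first unconnected vnic, or $null if the
# switch name is already attached to any vnic
function Find-FreeVnic {
    param([xml]$EdgeXml, [string]$SwitchName)
    $vnics = @($EdgeXml.edge.vnics.vnic)
    # Pass 1: exact name comparison across all vnics
    if ($vnics | Where-Object { $_.name -eq $SwitchName }) { return $null }
    # Pass 2: first interface that is not connected
    return $vnics | Where-Object { $_.isConnected -eq 'false' } | Select-Object -First 1
}

$free = Find-FreeVnic -EdgeXml $sample -SwitchName 'LS-App'
$free.index                        # index of the first unconnected vnic

$dup = Find-FreeVnic -EdgeXml $sample -SwitchName 'LS-Web'
$null -eq $dup                     # True, because LS-Web is already attached
```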

Second – setting values for XML entities that were already in the XML. Easy:

$vnic.name = $config.newLS.name
$vnic.isConnected = "true"

Third – adding a new XML entity that wasn’t already there:

$elem = $body.CreateElement("portgroupId")
$vnic.AppendChild($elem)
$vnic.portgroupId = $switchvwire.get_Item($config.newLS.name)
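
Here is a self-contained sketch of the second and third steps against made-up sample XML (the names and the virtualwire ID are invented). It differs from the lines above in two small ways: the value is set through InnerText before the element is attached, and the return value of AppendChild is discarded with [void], since AppendChild returns the node it added and that would otherwise end up in the script's output.

```powershell
# Made-up sample of a single unused vnic
[xml]$doc = '<edge><vnics><vnic><index>2</index><name>vnic2</name><isConnected>false</isConnected></vnic></vnics></edge>'
$nic = $doc.edge.vnics.vnic

# Existing elements can be set directly through PowerShell's XML property syntax
$nic.name = 'LS-App'
$nic.isConnected = 'true'

# A new element has to be created by the owning document before it is attached
$elem = $doc.CreateElement('portgroupId')
$elem.InnerText = 'virtualwire-10'
[void]$nic.AppendChild($elem)

$nic.OuterXml
```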

Finally – adding entities to an empty node “<addressGroups />”. Not so easy! This took some considerable time, including many false starts! In the end I discovered that to “find” the empty node using SelectSingleNode I had to set up a namespace. Then I could find it and remove it (this seemed easier than trying to attach entries to the empty node). Then I could create some raw XML and attach it into the Edge configuration XML using ImportNode and AppendChild.

$ns = New-Object -TypeName System.Xml.XmlNamespaceManager -ArgumentList $body.NameTable
$ns.AddNamespace("ns",$body.DocumentElement.NamespaceURI)
$oldAddressGroups = $vnic.SelectSingleNode("//vnic[index=$inf]/addressGroups")
$vnic.RemoveChild($oldAddressGroups)
[xml] $addr = "<addressGroups>
  <addressGroup>
    <primaryAddress>$($config.newLS.edgeip)</primaryAddress>
    <subnetMask>$($config.newLS.mask)</subnetMask>
  </addressGroup>
</addressGroups>"
$vnic.AppendChild($body.ImportNode($addr.addressGroups, $true))
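
The namespace handling can be reproduced offline. The sketch below uses a made-up document with a default namespace (the URI and addresses are invented, and I am not claiming this is what NSX returns) to show the namespace manager being passed to SelectSingleNode with a prefix on each step of the XPath, followed by the same remove, ImportNode and AppendChild sequence.

```powershell
# Made-up sample with a default namespace; the URI and values are illustrative only
[xml]$cfg = @'
<edge xmlns="urn:example:edge">
  <vnics>
    <vnic><index>0</index><addressGroups /></vnic>
    <vnic><index>1</index><addressGroups /></vnic>
  </vnics>
</edge>
'@
$inf = 1

# Register a prefix for the document's default namespace
$ns = New-Object -TypeName System.Xml.XmlNamespaceManager -ArgumentList $cfg.NameTable
$ns.AddNamespace('ns', $cfg.DocumentElement.NamespaceURI)

# The prefix goes on every element in the XPath, and $ns is passed as the second argument
$old = $cfg.SelectSingleNode("//ns:vnic[ns:index=$inf]/ns:addressGroups", $ns)
$nic = $old.ParentNode
[void]$nic.RemoveChild($old)

# Build the replacement as its own document, then import it into the target document
[xml]$addr = @"
<addressGroups xmlns="$($cfg.DocumentElement.NamespaceURI)">
  <addressGroup>
    <primaryAddress>192.168.10.1</primaryAddress>
    <subnetMask>255.255.255.0</subnetMask>
  </addressGroup>
</addressGroups>
"@
[void]$nic.AppendChild($cfg.ImportNode($addr.DocumentElement, $true))

$nic.addressGroups.addressGroup.primaryAddress
```

In this sample the same XPath without the prefix and manager finds nothing, which is the symptom that points at a namespace problem. ImportNode is needed because a node created in one XmlDocument cannot be appended directly to another.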

Once that was done, all I had to do was send the XML back using Invoke-WebRequest:

# Attach new logical switch to existing Edge
try {
     $r = Invoke-WebRequest -Uri "$uri/api/4.0/edges/$routerid" -Body $body -Method:Put -Headers $head -ContentType "application/xml" -ErrorAction:Stop -TimeoutSec 30
} catch {Failure}
if ($r.StatusCode -match "204") {
     Write-Host -BackgroundColor:Black -ForegroundColor:Green "Status: Successfully attached new Logical Switch to $($config.edge.name)."
}
else {
     # Output the request body to help with troubleshooting
     $body
     throw "Was not able to add new Logical Switch to existing Edge. API status code was not 204."
}
break
}

I’ve no doubt there are better ways of achieving some of what I’ve done here, but I thought I would post it up in case anyone is looking to do something similar.