Terraform and vSphere – Part 4: More Compute Cluster Resources

Introduction

In my previous posts we talked about Terraform and Desired State (using a Compute Cluster resource as an example), and about importing resources in a follow-up post. However, VM/Host Groups, VM/Host Rules and VM Overrides are also part of the cluster configuration.

Unlike the resource vsphere_compute_cluster, where the state of a large number of options is automatically monitored by Terraform, this is not the case for the options discussed in this post.
Here, what you see is what you get. If you use Terraform, it is more than good practice to let Terraform take care of everything.
By this I mean the following: if you have created an affinity rule with Terraform for two VMs called VM1 and VM2, then the state of this affinity rule is monitored by Terraform. But if you manually create a second affinity rule for two VMs called VM3 and VM4, then this second affinity rule is unknown to Terraform and its state is not monitored. Should this occur, it can of course be solved by importing the second rule into the Terraform configuration afterwards.

Having said this, time for an overview of these cluster resources.
In vSphere Compute Clusters, 4 types of VM/Host rules can be created (a minimal example follows the list):

Keep Virtual Machines Together: Terraform resource:
vsphere_compute_cluster_vm_affinity_rule

Separate Virtual Machines: Terraform resource:
vsphere_compute_cluster_vm_anti_affinity_rule

Virtual Machines to Hosts: Terraform resources:
vsphere_compute_cluster_host_group
vsphere_compute_cluster_vm_group
vsphere_compute_cluster_vm_host_rule

Virtual Machines to Virtual Machines: Terraform resource:
vsphere_compute_cluster_vm_dependency_rule
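
As a minimal sketch (not code from the original post), an affinity rule keeping the VM1 and VM2 from the earlier example together could look like this; the data sources for the cluster and the VMs are assumptions:

# Keep VM1 and VM2 on the same host (sketch; data source names are assumptions)
resource "vsphere_compute_cluster_vm_affinity_rule" "vm_affinity_rule1" {
  name               = "vm1-vm2-affinity"
  compute_cluster_id = data.vsphere_compute_cluster.cluster.id

  # Both VMs are looked up via (assumed) vsphere_virtual_machine data sources
  virtual_machine_ids = [
    data.vsphere_virtual_machine.vm1.id,
    data.vsphere_virtual_machine.vm2.id,
  ]
}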

vSphere Compute Clusters also allow you to create overrides for individual objects. Terraform has three separate resources: one to add a DPM override to a cluster for an ESXi host, and two to add DRS and HA overrides for virtual machines (an example follows the list below).
vsphere_dpm_host_override
vsphere_drs_vm_override
vsphere_ha_vm_override
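
As an illustration (again a sketch under the same assumptions as above, not from the original post), a DRS override that sets the automation level for a single VM could look like this:

# DRS override: keep DRS enabled for this VM, but require manual approval of migrations
resource "vsphere_drs_vm_override" "drs_vm_override1" {
  compute_cluster_id   = data.vsphere_compute_cluster.cluster.id
  virtual_machine_id   = data.vsphere_virtual_machine.vm1.id
  drs_enabled          = true
  drs_automation_level = "manual"
}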

Before showing examples, a few words about the workflow. If you are working in a greenfield scenario (that is, only adding new resources), the flow is:
– add new resource(s) to the Terraform configuration file,
– run terraform plan and
– run terraform apply.

If you need to import existing cluster rules, the flow is a bit more complicated:

  • Add new resource(s) to the Terraform configuration file. In some cases you can start with a basic configuration and add additional code after importing the resource.
  • Import the new resource (an example command follows this list).
  • Run terraform plan to detect missing parts.
  • Update the configuration file in case of errors or missing parts.
  • Run terraform plan again to check the configuration. If everything went well, terraform plan ends with “No changes”.
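
To give an idea of the import step: for the cluster rule resources, the vSphere provider expects a JSON string containing the cluster path and the rule name as the import ID. A sketch, with the datacenter and rule names as assumptions (check the provider documentation for the exact format):

$ terraform import vsphere_compute_cluster_vm_affinity_rule.vm_affinity_rule1 '{"compute_cluster_path": "/DC01/host/Cluster-01", "name": "vm1-vm2-affinity"}'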

BTW, if you want to import existing cluster rules, you can run a PowerShell script like this to gather info.

Continue reading

Terraform and vSphere – Part 3: Import Resources

In the previous post “Terraform and vSphere – part 2: DSC”, we ended with a cluster created by Terraform called “Cluster-02” and showed that despite minimal configuration, all settings can be monitored by Terraform.

However, there is also a cluster “Cluster-01” in this Datacenter that was manually configured at the time. We would also like to manage this cluster with Terraform, together with our new cluster.
Fortunately, Terraform offers an option for this too; how that works is what we will show in this post.

In the previous post, we ended with code for creating a cluster called Cluster-02. Before proceeding with adding Cluster-01, we check the current configuration by running the following command:

$ terraform plan

It should report: No changes
Now add the following lines to the code; these lines are the minimum needed to start the import of cluster Cluster-01.

# Existing Cluster to be imported
resource "vsphere_compute_cluster" "compute_cluster1" {
  name            = "Cluster-01"
  datacenter_id   = data.vsphere_datacenter.dc.id
}
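
The import itself is then a one-liner; the import ID for a compute cluster is the inventory path to the cluster (the datacenter name DC01 is an assumption, substitute your own):

$ terraform import vsphere_compute_cluster.compute_cluster1 /DC01/host/Cluster-01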

Continue reading

Terraform and vSphere – Part 2: DSC

Desired State monitoring with Terraform?

Some time ago, I wrote a post about Terraform and vSphere. In this post, I showed how appliances (.ova files) can be deployed with Terraform. Another use case for Terraform is deploying virtual machines from templates.
But with this, we would almost forget the most important use of Terraform: deploying vSphere infrastructure, in Terraform terminology resources like Clusters, vSwitches, Datastores and more. In doing so, I discovered an interesting feature of Terraform.
I’ve long been interested in configuration management for vSphere, see older posts on Vester and DSC Resources for VMware.

In a nutshell, Configuration Management is a systematic process for setting and maintaining the configuration of a resource over its lifetime.
In my experience, maintaining it during its lifetime is the trickiest part.
And that is where Terraform differs from other tools I have seen in recent years. As an example, we compare the configuration of a Compute Cluster in a vCenter Server using “DSC Resources for VMware” (see the example in this post) on the one hand and Terraform on the other.
A simple DSC configuration for creating a Cluster may look like this:

Configuration DCKoedood {
    Import-DscResource -ModuleName VMware.vSphereDSC -ModuleVersion 2.2.0.84

    vSphereNode $AllNodes.NodeName {

        Datacenter "DCKoedood" {
            Name = 'DCKoedood'
            Location = [string]::Empty
            Ensure = 'Present'
        }

        Cluster "Cluster-02" {
            Name = 'Cluster-02'
            Location = [string]::Empty
            DatacenterName = 'DCKoedood'
            DatacenterLocation = [string]::Empty
            Ensure = 'Present'
            HAEnabled = $true
            DrsEnabled = $true
            DependsOn = "[Datacenter]DCKoedood"
        }
   }
}
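
For comparison, a minimal Terraform counterpart (a sketch, not taken from the original post; it assumes a vsphere_datacenter data source named dc) could look like this:

# Cluster-02 with DRS and HA enabled, matching the DSC example above
resource "vsphere_compute_cluster" "compute_cluster2" {
  name          = "Cluster-02"
  datacenter_id = data.vsphere_datacenter.dc.id
  drs_enabled   = true
  ha_enabled    = true
}

The interesting part, as this post goes on to show, is that Terraform subsequently monitors far more cluster settings than the handful declared here.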

Continue reading

Terraform and vSphere – Part 1

Recently, I decided to delve into Terraform. A good starting point is one of Pluralsight’s courses, like “Terraform – Getting Started” by Ned Bellavance. Terraform is an open source tool in the field of Infrastructure-as-Code, useful for the deployment and configuration of datacenter infrastructure, using a declarative configuration language. Terraform comes as a single executable for Linux, Mac and Windows; the power lies in the so-called providers (additional pieces of software) that allow for the management of a huge number of resources, ranging from well-known public cloud providers like AWS and Azure to private clouds like VMware vSphere.

There are already several excellent articles written to get you started with Terraform in a vSphere environment, I’d like to refer to this post by Luke Orellana. This article helps you deploy templates from an existing Windows or Linux VM into a vSphere environment.

A note: the Terraform file in that post still uses the old notation, for example:

data "vsphere_network" "network" {
  name          = "VM Network"
  datacenter_id = "${data.vsphere_datacenter.dc.id}"
}

The new notation is slightly clearer:

data "vsphere_network" "network" {
  name          = "VM Network"
  datacenter_id = data.vsphere_datacenter.dc.id
}

By the way, with the following command, code can be automatically converted to the new notation:

PS> terraform fmt

Continue reading

DSCR for VMware 2.2

Over the past few years I have devoted several posts to configuration management of vCenter Server and ESXi. At that time I also reviewed one of the first versions of DSC Resources for VMware. Back then, I was not entirely enthusiastic, especially with regard to the security aspects.

In February 2021 the latest version 2.2 was released and a lot has changed. Besides support for PowerShell 5.1 and 7.0, there is now also support for PowerShell Core on Linux.

The best improvement in my opinion is that the developers have made good use of the Invoke-DSCResource cmdlet introduced by Microsoft, which allows DSC resources to be executed without having to use the PowerShell LCM engine. This eliminates the need for the Windows proxy server (also one of my objections). The Invoke-DSCResource cmdlet is part of the new PSDesiredStateConfiguration module.

Based on these new capabilities, VMware has made the module VMware.PSDesiredStateConfiguration available. Looking at the contents of this module, we see the following functions:
Get-VmwDscConfiguration, New-VmwDscConfiguration, Start-VmwDscConfiguration and Test-VmwDscConfiguration. In these we recognize the three basic DSC operations: Test, Set (Start) and Get.

Another interesting enhancement, available only for PowerShell 7, is the vSphereNode. vSphereNode is a keyword that represents a connection to a vCenter Server; a configuration can contain one or more vSphereNodes. The advantage: with a normal DSC resource, Server and Credential properties must be declared for each resource, whereas a vSphereNode uses a connection to a vCenter Server set up with the familiar Connect-VIServer cmdlet. This, in my opinion, makes the configuration much more manageable. Here are examples of a configuration with and without vSphere Nodes.

Continue reading

Intel NUC – boot from iSCSI LUN

A few hours after a brand new USB flash drive failed for the second time and one of my vSAN NUC nodes couldn’t boot, I came across VMware KB 85685, titled “Removal of SD card/USB as a standalone boot device option”. The message in this KB was clear: time for another way to boot the NUCs. The expansion capabilities of NUCs are limited and both disks are already in use, which leaves Auto Deploy or booting from an iSCSI target. I decided to try the latter option. The work consists of 1. configuring the iSCSI targets (the easiest part) and 2. configuring the NUCs correctly (somewhat more difficult). After some searching and experimenting I found the desired solution, including the VLAN configuration.

The first step is creating the iSCSI targets; you will need a target for each ESXi host. In this example I will show how the targets are created on a Synology NAS.

Continue reading

Skyline Health Detector

Skyline is VMware’s proactive self-service support technology, available to customers with an active Production Support or Premier Services contract and based on the Skyline Collector and Skyline Advisor. But there is more: in the latest vCenter Server 7.x edition you will also find Skyline Health. Skyline Health is built into vCenter Server, no additional installation required. To make use of the online health checks, you must participate in the Customer Experience Improvement Program (CEIP) and vCenter Server must be able to reach the Internet. Skyline Health runs about 136 health checks and presents the results grouped in categories. While browsing the results, the “Self support Diagnostics” section caught my attention, in particular the “VMware Skyline Health Diagnostics”.

According to the documentation; “VMware Skyline Health Diagnostics (SHD) is VMware’s self-service diagnostics platform. It uses product logs to detect problems and provides recommendations in the form of KB articles or steps to remediate them. A vSphere Administrator can use this tool to troubleshoot before contacting the VMware Global Support Service.”

SHD can detect issues in vCenter Server, ESXi and vSAN. Some of the benefits of SHD:
1. SHD runs on-prem; it can also work offline, without any Internet connectivity.
2. Based on the detected symptoms, the tool provides the correct VMware Knowledge Base articles/remediation steps.
3. Get recommendations for a problem from VMware support services.
4. Early recommendations and remediation help business continuity.

Continue reading

Check your Internal Certificates!

In the past year I have experienced two incidents in which important applications were no longer available. In both cases the cause turned out to be an expired internal certificate. Although these incidents can be solved using KB articles, the lesson is to check these critical components at least once a year. With the start of a new year, this is a good time to pay attention to this topic. First vRealize Operations Manager (vROPS).

vROPS

Expiration of the vROPS internal certificate has the following symptoms:
– Unable to log into the Admin UI.
– The cluster is Offline and you are unable to bring it Online with the message “Data Retriever is not initialized yet. Please wait.”.

The procedure to Replace expired internal certificate in vRealize Operations can be found in this KB.

The best way to check the validity of the certificate is using a browser: connect to the vROPS master node over port 6061 and inspect the certificate.
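
Alternatively, the expiration date can be read from the command line with openssl (a sketch; the hostname is an assumption):

$ echo | openssl s_client -connect vrops-master.local:6061 2>/dev/null | openssl x509 -noout -enddate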

You can also run the following command, though I don’t consider it very useful because it doesn’t return an expiration date.

# /bin/grep -E --color=always -B1 'java.security.cert.CertPathValidatorException: validity check failed|java.security.cert.CertificateExpiredException' $ALIVE_BASE/user/log/*.log | /usr/bin/tail -20

For both scenarios, “Certificate has expired” and “Certificate has not yet expired”, the procedure for replacing the internal certificate is described. The procedure for “Certificate has expired” is unfortunately a bit more cumbersome to perform.

The KB mentioned before states that “Starting in vRealize Operations 8.0, a pop up is displayed in the UI, warning when certificate expiration will occur.”. But better safe than sorry: perform the check at a regular interval.

vCenter Server

Another product that comes with internal certificates is vCenter Server. After expiration of the STS certificate, you cannot log in to vCenter Server anymore. In some cases (see the KB below for more details), the STS certificate has a lifetime of only 2 years!

VMware KB “Checking Expiration of STS Certificate on vCenter Server” (79248) is there to help you identify the expiration date. Attached to the KB, you will find a Python script named checksts.py. Follow the instructions and run the script. In my case (a recent vCSA 7.x), no actions were needed.

However, in case the STS certificate is expired, you will find instructions for replacing this certificate for the vCSA or a vCenter Server on Windows.

In VMware KB “Signing certificate is not valid” error in VCSA 6.5.x/6.7.x and vCenter Server 7.0.x (76719) you will find instructions and another script named fixsts.sh for replacing the STS certificate.

Steps 5 and 6 in the resolution are important; the restart of services may fail if there are other expired certificates. Step 6 presents a one-liner to do the check.

The first certificates are about to expire in July 2022.
Also in this case there are references to KBs for replacing these certificates using the vSphere Certificate Manager. Be aware that you may encounter the situation described in VMware KB “Failed to login to vCenter as extension, Cannot complete login due to an incorrect user name or password”, ESX Agent Manager (com.vmware.vim.eam) solution user fails to log in after replacing the vCenter Server certificates in vCenter Server 6.x (2112577).

If you are on vSphere 7 (and some editions of vSphere 6.7), there is an even more convenient option: one of my colleagues (thank you, Joop Kramp) discovered a pre-configured vSphere alarm for certificate expiration that is present in these more recent versions.

Hopefully these checks will help you to avoid unexpected downtime of important management applications in your vSphere environment.

As always, I thank you for reading.

vSphere LCM – updating Images

With the release of vSphere 7.0, vSphere Update Manager has been transformed into vSphere Lifecycle Manager. The biggest improvement is the introduction of managing clusters with images, besides the familiar baselines concept. This post by Steven Bright is an excellent introduction to the concept of images and how to set them up. Of course there is also the official VMware documentation about LCM.
So after reading Steven’s post, I followed the instructions, created an image for ESXi 7.0 Update 1 – 16850804, added the USB Fling as an additional component and upgraded the NUCs in my home lab without any problem.
Recently ESXi 7.0U1a and 7.0U1b were released, which raised the question: how to update the image?

Screenshot taken after the upgrade…

Under Cluster > Updates > Hosts > Image, next to the EDIT option, is the “Check for recommended images” option. This reported no new images to download, so time for something else.

First download the ESXi depot file (I use https://my.vmware.com/group/vmware/patch#search for searching and downloading patches), in my case: VMware-ESXi-7.0U1b-17168206-depot.zip.

In the vSphere Client, go to Lifecycle Manager, select Import Updates under Actions and import the depot file. After a few moments, the latest updates will appear under Image Depot, ESXi Versions. As updates are cumulative, note that update 7.0 U1a is now also available.

Update 7.0 U1a and 7.0 U1b added…

Now go back to Cluster > Updates, select Image and select EDIT. Under ESXi Version, the latest updates are now available. Select the version of your choice (in my case 7.0 U1b).

Select the new ESXi version …

Note that the other components are still in the image and (if needed) can also be changed. When everything is completed, press VALIDATE to validate the image; if everything is OK, SAVE the image.

Almost immediately, LCM will notify you that the cluster is not compliant and you can start the remediation.

As always, I thank you for reading.

Home lab refresh

Since 2010, my VMware home lab was running on two servers: an HP ProLiant ML 110 G5 and an ML 110 G6. First the G5 was taken out of active duty because of its 8 GB memory limit. Fortunately, it was possible to upgrade the memory of the G6 from the supported 16 GB to 32 GB, so the G6 remained usable for quite some time, for labs with a vCenter Server and 3 virtual ESXi hosts. Recently, however, it became too tedious to run the latest vSphere editions on it.

A home lab is a valuable resource for various reasons. When preparing for a VMware exam, like the Datacenter VCP, you can practice the installation and configuration of ESXi and vCenter Server, but also of other tools like NSX, vROPS or LogInsight. A home lab is also useful for investigations that cannot be done at work in a production environment, for practicing changes or upgrades and, last but not least, for break and fix (one of my favorite use cases and highly educational).

If you want to practice with vSphere and other products, there are several options, which mainly depend on the available budget, but also on other factors. The possibilities vary from a lab-in-the-cloud such as VMware Hands On Labs, to VMware Workstation, to a 19-inch rack filled with servers and switches. In my situation, the decisive factors were limited space (I live in an apartment), low noise production, low energy consumption and the requirement to run a nested ESXi cluster with tools like LogInsight and vROPS. For a full vSphere 7 plus Kubernetes lab, however, a reasonable amount of hardware is required!

The old and the new, small but powerful

After some searching on the Internet you will soon come across the Intel NUCs; although not mentioned on the official VMware HCL, they are beloved by the community, see here and here.

Intel NUCs currently support 64 GB of memory. The tenth generation is, besides an i3, available with an i5 (4 cores) or an i7 (6 cores). My choice fell on the i5 (budget). Intel NUCs come with a processor, but without memory and disk(s); the final composition can be found on my Gear page.

The set-up of the Intel NUCs is not difficult; on the previously mentioned blogs Virtuallyghetto.com and Virten.net you can find enough information for a successful installation.

The NUCs are installed with the latest ESXi 7.0 and are managed by a vCSA. To support the deployment of vSphere 6.7 and 7.0 labs, I use two Windows domain controllers (DNS and DHCP), a Windows scripting host and a pfSense firewall. For the deployment of the labs I gratefully use the nested ESXi appliances and the deployment scripts provided by William Lam. With these, a complete environment is available in no time.