Troubleshooting CIM on ESXi

11/03/2020

Recently, a number of ESXi hosts were updated from version 6.0 to the latest 6.7 update. Soon after, we detected the following error message: “An application (/bin/sfcbd) running on ESXi host has crashed (1 time(s) so far). A core file might have been created at /var/core/sfcb-vmware_bas-zdump.000.” The core file was indeed created. Luckily, this was not a PSOD: the host was still up and running and workloads were not impacted. We also noticed that all upgraded hosts were affected, and it became clear that about 24 hours after (re)booting a host the same event recurred, creating a new dump file.

After some digging around in the log files, searching for events at the time the dump file was created, we found this in the syslog.log:
“sfcb-vmware_base[2100157]: tool_mm_realloc_or_die: memory re-allocation failed(orig=400000 new=800000 msg=Cannot allocate memory, aborting”,
followed by: “sfcb-ProviderManager[2100151]: handleSigChld:166681408 provider terminated, pid=2100157, exit=0 signal=6”. This looks like a memory-related issue.
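The search described above can be repeated with a small filter. This is a minimal sketch, assuming the messages have the format quoted above; the function name sfcb_crash_events and the log path in the example comment are assumptions for illustration, not part of the original investigation.

```shell
# Print sfcb messages that report a failed memory re-allocation or a
# terminated provider. The function name is hypothetical.
sfcb_crash_events() {
    logfile="$1"
    grep -E 'sfcb-[A-Za-z_]+\[[0-9]+\]: .*(memory re-allocation failed|provider terminated)' "$logfile"
}

# Example (path assumed): sfcb_crash_events /var/log/syslog.log
```

Matching on both message types keeps the allocation failure and the resulting provider termination next to each other in the output, which makes it easier to confirm that they belong to the same process ID.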

As this is not an ideal situation, it was time to engage VMware support. Before we continue, some background: sfcbd stands for “Small Footprint CIM Broker (SFCB) daemon”. For performance and health monitoring, ESXi enables an agentless approach using industry standards like CIM (Common Information Model) and WBEM (Web-Based Enterprise Management). On the ESXi side, there is the CIM agent, represented by the sfcbd. CIM providers are the counterpart, often supplied by third parties such as hardware vendors. CIM providers come as .VIB files. After detecting a third-party CIM provider, ESXi automatically starts the sfcbd (and with it the WBEM services).



Vester and DSC, a comparison

30/12/2019

Over the past couple of months, I have published several posts about configuration drift and tools like Vester and DSC Resources for VMware. Because Vester and DSC Resources for VMware serve the same goal, let us review what these tools have in common and look at some of the differences.
The topics covered are: general information about each tool, configuration of the tool, the tool in daily operations, performance and a summary.

Introduction

Both tools are built with PowerShell. Vester has been around the longest and dates from 2017. Vester comes as a PowerShell module and depends on two other modules: Pester and PowerCLI. Vester consists of three parts:

  • Commands that do the actual work, like creating configuration files, verifying the actual configuration and performing remediation in case the actual configuration does not match the desired configuration.
  • A set of test files. Each test file contains code that checks and applies a configuration item.
  • Config files, which are key-value pairs with the desired values of the configuration items. Some examples: NTP settings, DNS servers, etc.

Desired State Configuration (DSC) was introduced in PowerShell 4 and brings a declarative model for the configuration of Windows servers. DSC can copy files, edit the registry and install Windows features and components. After the initial configuration, DSC can also test the desired configuration and, if necessary, perform remediation.
DSC Resources define what can be configured on a Windows server, but today they are no longer limited to Windows servers. DSC Resources for VMware was first released in December 2018. Instead of Windows servers, these resources can configure ESXi hosts and vCenter Servers, although the first edition had only a few resources. The second edition, released in June 2019, offered considerably more resources.
Both tools are available in the PowerShell Gallery and can be found on GitHub.



Securing DSC resources for VMware

28/08/2019

Recently, DSC Resources for VMware 2.0 was released. This new version comes with a lot of new resources and other features, like availability in the PowerShell Gallery. If DSC Resources for VMware is completely new to you, I recommend reading the “Getting started” blog post, but do not follow the installation instructions. Instead, install directly from the PowerShell Gallery, using something like this:

PS> Find-Module *VMware.vSphereDSC* | Install-Module

So after exploring “Vester”, the other DSC solution, it is now time to have a look at the DSC Resources for VMware 2.0.

Disclaimer: Windows PowerShell Desired State Configuration (from now on “DSC”) is often used for configuration management of Windows systems and as such is new to me. This post focuses on the use of DSC in a vSphere environment.

My setup: I used an old Windows Server 2012 R2 as an LCM. The vSphere environment is a VCSA version 6.5 and two ESXi hosts.
This post contains links to some scripts. All files mentioned in this post can be downloaded from this location. Then, on the LCM, create a new folder named C:\VMwareDSC and place all the files in this folder.

One of my first goals was to understand how to create a good configuration. Luckily, the VMware DSC module contains an example folder, and I selected the VMHost_Config.ps1 configuration, a sample script for configuring an ESXi host.



ESXi boot fatal error 33 inconsistent data

09/06/2018

Another quick write-up. Recently, while installing patches and rebooting ESXi hosts, I encountered the following error message during the boot process of an ESXi host: “esxi boot fatal error 33 inconsistent data”, accompanied by the filename causing the inconsistency.

A quick search on the Internet returned various useful tips, ranging from reinstalling ESXi and re-running the installer to replacing the damaged file. However, not all workarounds are applicable in all situations.

Perhaps it is too obvious, but this solution should also be considered, especially while updating hosts:
Revert the ESXi host to its previous version. Now the ESXi host can boot and the installation of the patches can be continued.

This VMware KB outlines how to revert an ESXi host to its previous state.

If this error shows up during the update process of ESXi: “The host returns esxupdate error code:15. The package manager transaction is not successful. Check the Update Manager log files and esxupdate log files for more details”,
and the esxupdate.log also shows “esxupdate: esxupdate: ERROR: InstallationError: (”, ‘There was an error checking file system on altbootbank, please see log for detail.’)”, it is time to have a look at this KB.

I hope this will help. Thank you for reading.


vCSA and trusted AD sources

01/04/2018

Just a quick write-up for my own convenience. Large organizations tend to have a lot of everything, from buildings and employees to Domain Controllers.
At times when Domain Controllers undergo maintenance, like an upgrade or relocation, dependent services may be impacted.
The way identity sources are configured differs per product. Fortunately, hard-coding a single domain controller is the less common approach; usually the more flexible option of specifying the AD domain is used.

For a vCenter Server Appliance (vCSA), additional identity sources can be configured; a commonly used one is Active Directory (Integrated Windows Authentication).


By the way, as a prerequisite, the vCSA should be joined to the Windows domain.



vCSA, root partition is (almost) full

18/02/2018

A short post on a topic that I recently experienced on vCenter Server Appliance, version 6.0.
After receiving an alert that the root “/” partition is quickly filling up, it is time to act. When the root partition reaches 100% of its capacity, service disruption can occur.
The first step is to check the capacity of the vCSA partitions. Log in to the vCSA through SSH. If you are running the appliance shell, enable and access the Bash shell:

Command> shell.set --enabled true
Command> shell

In the Bash shell run this command to check the capacity of the partitions:

# df -h

The second line of the output (starting with /dev/sda3) shows the status of the root partition. If the value under Use% reaches 100%, you are in trouble. Also notice that the root partition is only 11 GB.
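To avoid reading the Use% column by eye, the check can be wrapped in a few lines of shell. This is a minimal sketch: the function name check_root_usage and the default limit of 90 are illustrative assumptions, and it relies only on the standard `df -P` output format.

```shell
# Report how full the root partition is and warn above a limit.
# check_root_usage and the default limit of 90 are illustrative choices.
check_root_usage() {
    limit="${1:-90}"
    # -P prints one line per file system, so line 2 holds the figures for "/".
    pct=$(df -P / | awk 'NR==2 { sub(/%/, "", $5); print $5 }')
    echo "root partition usage: ${pct}%"
    if [ "$pct" -gt "$limit" ]; then
        echo "WARNING: usage is above ${limit}%" >&2
        return 1
    fi
    return 0
}
```

Calling `check_root_usage 80` prints the current percentage and returns a non-zero exit status when usage exceeds 80%, so the function could also be used from a scheduled check.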
The second step is to determine the root cause of the full partition. A good strategy is to look for large consumers. The next command searches for files larger than 100 MB, only on the root partition:

# find / -xdev -type f -size +100M

In my case some interesting results:

/usr/lib/vmware-sca/wrapper/bin/wrapper.log
/usr/lib/oracle/11.2/client64/lib/libociei.so
/var/log/dnsmasq.log-20180121
/var/log/dnsmasq.log-20180128
/var/log/dnsmasq.log-20180107
/var/log/dnsmasq.log-20180114
/var/log/dnsmasq.log
/etc/vmware-vpx/docRoot/client/Vmware-viclient.exe

The most eye-catching files are the wrapper.log and the dnsmasq.log files.
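When the list is long, it helps to see the sizes as well, largest first. This is a minimal sketch built on the same find command; large_files is a hypothetical helper name, and the defaults simply mirror the values used above.

```shell
# List files above a size threshold on one file system, largest first.
# large_files is a hypothetical helper; sizes are printed in KiB.
large_files() {
    dir="${1:-/}"
    min="${2:-100M}"
    # -xdev keeps find on the file system that contains "$dir".
    find "$dir" -xdev -type f -size +"$min" -exec du -k {} + 2>/dev/null | sort -rn
}

# Example: large_files / 100M
```

Sorting numerically on the size column puts the biggest consumers at the top, which is usually where the investigation should start.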



Get-ClusterRules

16/06/2017

I recently encountered an interesting question, maybe not one you will see every day. A vCenter Server runs a large number of clusters; the VMs on those clusters are controlled by a considerable number of DRS rules. The question that arose: “How do we know if the DRS rules we once designed are still in place?” In the course of time, rules can be disabled, or VM and Host groups may no longer match. Trying to answer this question by going through the vCenter Server configuration is not the way to go.

Thankfully, the VMware PowerCLI contains a useful Cmdlet Get-DrsRule that enables you to create a dump of the configured rules for each cluster. This makes checking your configuration a lot easier.

But there is another thing. Now we know about the configuration, but what do we know about the actual situation? For instance, VM to Host affinity has “should” and “must” rules, but to what extent is a “should” rule fulfilled?

So it was time to create a PowerShell script that performs the following tasks. For each cluster within a vCenter Server, a dump of the configured DRS rules is made. The second part of the script determines on which host a VM is running and compares it to the configured rules. The script also reports whether a DRS rule is disabled and displays the power state of each VM. You will probably worry less about a powered-down VM.

The script can be found here on GitHub.

I am aware that the script and my programming skills are far from perfect, so expect updated versions in the future.