The importance of good data / How to set up a baseline document?


Lately I’ve been working on machine learning, more specifically with the Python scikit-learn library.
What I learned above all is that you need a good data set before you attempt any kind of analysis or prediction.

But what does that have to do with the subjects I usually write about? Recently I have blogged regularly about configuration drift and tools like Vester and DSC resources for VMware.
We are also working on this at the company where I work.
Recently I was asked to set up a baseline for the vCenter Server Appliances. You can’t solve configuration drift without thinking about the desired values, so it was time for a baseline. At first glance this seems simple: a baseline is a finite list of key-value pairs, with the setting on one side and the desired value on the other. In practice it turned out to be a bit more complicated. I should add that this baseline is not meant for a single vCenter Server, but for quite a few.
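
To make the key-value idea concrete, here is a minimal sketch of a baseline as an ordered hashtable, together with a small comparison helper. The setting names and desired values are purely illustrative, and Compare-Baseline is a hypothetical helper written for this example, not part of PowerCLI.

```powershell
# Illustrative baseline: setting name -> desired value.
# Names and values are examples only, not recommendations.
$baseline = [ordered]@{
    'config.log.level' = 'info'
    'mail.smtp.port'   = '25'
    'event.maxAge'     = '30'
}

# Hypothetical helper: compare the actual settings (a hashtable)
# against the baseline and report one row per baseline entry.
function Compare-Baseline {
    param(
        [System.Collections.IDictionary] $Baseline,
        [System.Collections.IDictionary] $Actual
    )
    foreach ($name in $Baseline.Keys) {
        $exists = $Actual.Contains($name)
        [pscustomobject]@{
            Name      = $name
            Desired   = $Baseline[$name]
            Actual    = if ($exists) { $Actual[$name] } else { $null }
            Exists    = $exists
            # Values are compared as strings, since exported settings are text.
            Compliant = $exists -and ("$($Actual[$name])" -eq "$($Baseline[$name])")
        }
    }
}

# Made-up "actual" values for one vCenter.
$actual = @{ 'config.log.level' = 'verbose'; 'mail.smtp.port' = '25' }
$result = @(Compare-Baseline -Baseline $baseline -Actual $actual)
$result | Format-Table -AutoSize
```

Keeping Exists separate from Compliant is a deliberate choice here: a setting that is absent on a vCenter is a different finding than a setting with the wrong value.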

To get started, after connecting to a vCenter Server, the following command produces an overview of all settings for that vCenter:

PS> Get-AdvancedSetting -Entity <vCSA FQDN or IP>

Besides the Name and Value fields, you also get the Type (of the Value) and sometimes a brief Description. From vSphere 6.5 onward, you can also collect many appliance-related settings using the API.
Now you might think: collect the settings of all vCenters, set the desired values, and done! In practice, however, some obstacles soon turned up, such as:

  1. Not all vCenters are on the same version, and settings come and go. Some settings from vSphere 6.5 have disappeared in version 6.7, and new settings have been introduced in versions 6.7 and 7.0.
  2. Sometimes a setting exists but returns an empty string. This is not the same as a setting that does not exist.
    Why worry about a setting with an empty string? Because, for whatever reason, a value might appear there at some point.
  3. Not all settings are actually settings; some contain status information. We want to filter these out of our configuration management tooling.
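
The three obstacles above can be sketched on in-memory sample data. Everything in this example is made up for illustration: the server names, the setting names and values, and in particular the assumption that status-like entries can be recognised by a name pattern. In practice that filter would probably have to be a hand-maintained list.

```powershell
# Sample data standing in for the collected settings of two vCenters.
$collected = @{
    'vc01' = @{
        'config.log.level'       = 'info'
        'mail.smtp.server'       = ''            # exists, but empty
        'example.status.lastRun' = '2020-01-01'  # status information, not a setting
    }
    'vc02' = @{
        'config.log.level'          = 'info'
        'example.newInLaterVersion' = 'true'     # only present on the newer version
        'example.status.lastRun'    = '2020-02-02'
    }
}

# Assumption: status-like entries match this (hypothetical) name pattern.
$statusPattern = '\.status\.'

# Union of all setting names seen on any vCenter.
$allNames = $collected.Values | ForEach-Object { $_.Keys } | Sort-Object -Unique

$report = foreach ($name in $allNames) {
    foreach ($vc in ($collected.Keys | Sort-Object)) {
        $settings = $collected[$vc]
        # Distinguish "does not exist" from "exists with an empty string".
        $state = if (-not $settings.ContainsKey($name)) {
            'Missing'
        } elseif ([string]$settings[$name] -eq '') {
            'Empty'
        } else {
            'Set'
        }
        [pscustomobject]@{
            Name     = $name
            VCenter  = $vc
            State    = $state
            IsStatus = $name -match $statusPattern
        }
    }
}

# Candidate baseline entries: everything that is not status information.
$candidates = $report | Where-Object { -not $_.IsStatus }
$candidates | Format-Table -AutoSize
```

Reporting one row per setting per vCenter keeps the Missing, Empty and Set cases visible side by side, which makes version differences easy to spot before any desired values are chosen.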

The baseline was created using PowerShell and PowerCLI. The first step is to collect the settings of all vCenters as described above. The result is a .csv file for each vCenter. Include the name of the vCenter in the filename, for example “vc01.csv”.
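
A minimal sketch of this collection step could look as follows. It assumes VMware PowerCLI is installed and that one credential works for all vCenters. Connect-VIServer, Get-AdvancedSetting and Disconnect-VIServer are PowerCLI cmdlets, while Get-SettingsFileName and Export-VCenterSettings are hypothetical helper names introduced here, and the server names in the usage comment are placeholders.

```powershell
# Hypothetical helper: derive the file name from the vCenter name,
# e.g. 'vc01.example.local' -> 'vc01.csv'. Assumes an FQDN or short name, not an IP.
function Get-SettingsFileName {
    param([string] $Server)
    '{0}.csv' -f ($Server -split '\.')[0].ToLower()
}

# Hypothetical helper: export the advanced settings of each vCenter to its own CSV file.
function Export-VCenterSettings {
    param(
        [Parameter(Mandatory)] [string[]]     $Server,
        [Parameter(Mandatory)] [pscredential] $Credential,
        [string] $OutputPath = '.'
    )
    foreach ($name in $Server) {
        $vc = Connect-VIServer -Server $name -Credential $Credential
        try {
            $file = Join-Path $OutputPath (Get-SettingsFileName -Server $name)
            Get-AdvancedSetting -Entity $vc |
                Select-Object Name, Value, Type, Description |
                Sort-Object Name |
                Export-Csv -Path $file -NoTypeInformation
        }
        finally {
            # Always disconnect, even if the export fails.
            Disconnect-VIServer -Server $vc -Confirm:$false
        }
    }
}

# Usage (placeholder server names):
# Export-VCenterSettings -Server 'vc01.example.local', 'vc02.example.local' -Credential (Get-Credential)
```

Sorting by Name before exporting keeps the files stable between runs, which makes it easier to compare the .csv files of different vCenters later on.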
