Check_MK and vSphere – ESXi

This post is the second part of a series about monitoring vSphere with Check_MK. The first part introduced Check_MK and explained some basic installation and configuration.

According to the documentation, for monitoring VMware ESXi and vCenter Server, Check_MK has implemented a plugin that uses the vSphere API for retrieving information. This plugin is much more efficient than versions based on the Perl API.

So let’s start and see what can be revealed. To get a clear understanding of the various options, I will perform a step-by-step configuration instead of ticking all options at once.

The first step is to go into WATO and add an ESXi host. Under WATO, choose Hosts and then New Host.

Figure 1

You must at least enter the Hostname and an IP address; the Alias is optional. Under Agent Type place a tick and select “No Agent”.

At this time, the result is not very exciting; the ESXi host is only pinged.

Figure 2

To enable the advanced monitoring, in WATO configuration go to Host & Service Parameters \ Datasource Programs and select Check state of VMware ESX via vSphere.

Figure 3

Create a new rule by clicking the button Create rule in folder. Leave it on “Main directory”.
Enter a descriptive name for the new rule.

Figure 4 – rule options

In the next section, some information must be provided.

Figure 5 – Check state

  1. You need a username and password to access the ESXi host. In this example the root account is used. It’s better to create a dedicated account with read-only rights, but for this POC the root account is OK.
  2. In most cases there is no need to change the TCP port number. You can also configure certificate checking and set a connection timeout.
  3. Select which information to retrieve: Host Systems, VMs, Datastores, Performance Counters and License Usage.
    Each of these items will be covered in detail. We start with only the first option, Host Systems.
  4. Display ESX Host power state on. The options are: The queried ESX system (vCenter / Host), The ESX Host or The Virtual Machine.
    For now, we leave this option at: The queried ESX system (vCenter / Host).
  5. Display VM power state on. The options are: The queried ESX system (vCenter / Host), The ESX Host or The Virtual Machine.
    For now, we leave this option at: The queried ESX system (vCenter / Host).
  6. Type of Query. The options are: Queried host is a host system, Queried host is the vCenter or Queried host is the vCenter with Check_MK Agent installed.
    For now, we select: Queried host is a host system.
  7. Placeholder VMs are “empty” VMs as used in Site Recovery Manager. Place a tick here.
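
To keep track of what has been selected so far, the choices from this step can be written down as a small data structure. This is only an illustrative sketch in Python: the class and field names are hypothetical and say nothing about how Check_MK stores its rules internally, and the password is a placeholder.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical model of the rule options described above.
# Field names are made up for illustration; they are not Check_MK identifiers.
@dataclass
class VSphereRuleChoices:
    username: str
    password: str
    # Only "Host Systems" is ticked in this first step.
    retrieve: List[str] = field(default_factory=lambda: ["Host Systems"])
    host_power_state_on: str = "The queried ESX system (vCenter / Host)"
    vm_power_state_on: str = "The queried ESX system (vCenter / Host)"
    query_type: str = "Queried host is a host system"
    placeholder_vms_ticked: bool = True


# The root account is used in this POC; a read-only account is preferable.
choices = VSphereRuleChoices(username="root", password="<placeholder>")
for name, value in vars(choices).items():
    if name != "password":  # never print credentials
        print(f"{name}: {value}")
```

Later steps in this post only change the retrieve list and the vm_power_state_on choice.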

In the Conditions section, we need to specify the host name(s) we want to apply this rule to.

Figure 6 – conditions

  1. Specify the host name exactly as used in the Hosts section.

Do not forget to Save your work.

Now return to the Host we have created and change the Agent Type to: Check_MK Agent (Server) and finish with Save & Test.

Figure 7

In the Diagnostic window, we can comfortably test the new settings. If everything goes well, we should now also see output in the Agent section.

Figure 8

The Agent section now returns output. From here click the button Services and observe the new checks that are available now.
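
For readers curious about what that agent output looks like: Check_MK agent data is plain text, split into sections that each start with a header line of the form <<<section_name>>>. The following Python sketch splits such output into sections. The function name is my own, and the sample data is invented for illustration; it is not real output from an ESXi host.

```python
from typing import Dict, List, Optional


def split_sections(agent_output: str) -> Dict[str, List[str]]:
    """Group agent output lines by their <<<section>>> header."""
    sections: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in agent_output.splitlines():
        if line.startswith("<<<") and line.endswith(">>>"):
            # Strip the markers and drop header options such as ":sep(59)".
            current = line[3:-3].split(":", 1)[0]
            sections.setdefault(current, [])
        elif current is not None and line.strip():
            sections[current].append(line)
    return sections


# Invented sample data; the keys and values are placeholders.
sample = """<<<esx_vsphere_hostsystem>>>
example.key example-value
<<<esx_vsphere_sensors:sep(59)>>>
"""

sections = split_sections(sample)
for name, lines in sections.items():
    print(name, len(lines))
```

Each section name corresponds to a family of checks, which is why new services appear as soon as the agent starts returning the matching sections.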

Figure 9

A brief overview of the checks:

  • esx_vsphere_hostsystem.cpu_usage and esx_vsphere_hostsystem.mem_usage check the CPU and memory usage of the host.
  • esx_vsphere_hostsystem.maintenance has status Critical when the host is in Maintenance mode.
  • esx_vsphere_hostsystem.multipath shows the status of the paths to the Datastores.
  • esx_vsphere_hostsystem.state shows the power state of the host.
  • esx_vsphere_objects shows general information about the host.
  • esx_vsphere_objects.count shows the number of VMs present on the host.
  • esx_vsphere_sensors checks the status of all hardware sensors of the host. Only failed sensors are shown.

After pressing Automatic Refresh, these checks will be added. See Figure 10.

Figure 10

Note: in this example you can see the ESXi host has a rather high memory usage. The Critical state on Multipath storage adapter vmhba32 is caused by a misbehaving USB device which has vSphere ESXi installed.

Now we return to WATO, go to Host & Service Parameters \ Datasource Programs, select Check state of VMware ESX via vSphere and edit (green pencil) the rule we have just created. Under Check state of VMware ESX via vSphere, place a tick next to Virtual Machines. This enables the check esx_vsphere_objects for the VMs on the host. No other changes are needed here at this time.

Figure 11

The Services page now lists the newly available services, including the VMs. Notice that “Powered Off” VMs have a Warning status.

Figure 12

The state of the VMs will now be presented in the ESXi host object, which is not ideal in every scenario as there may be a good reason for powering off some VMs.
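
The behaviour observed here can be pictured as a simple mapping from a VM's power state to a service state. The sketch below is my own illustration in Python, not Check_MK's implementation. It uses the power state strings of the vSphere API and the usual Nagios-style state codes; treating every other state as Unknown is an assumption, and the VM names are hypothetical.

```python
# Nagios-style state codes, as used for services in Check_MK.
OK, WARN, CRIT, UNKNOWN = 0, 1, 2, 3


def vm_power_status(power_state: str) -> int:
    """Map a vSphere power state string to a monitoring state."""
    if power_state == "poweredOn":
        return OK
    if power_state == "poweredOff":
        return WARN  # matches the Warning status seen for powered-off VMs
    return UNKNOWN  # assumption: any other state is treated as unknown here


# Hypothetical inventory of VMs on one ESXi host.
vms = {"vm-a": "poweredOn", "vm-b": "poweredOff"}
for name, state in vms.items():
    print(name, vm_power_status(state))
```

With such a mapping, a VM that is powered off on purpose still shows up as a Warning on the ESXi host, which is exactly the drawback described above.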

The final result is shown in Figure 13.

Figure 13

But wait, there is more. After finishing the initial installation of the monitoring server, I also installed a Check_MK monitoring agent on a few Windows servers and added them to the monitoring host.

After activating the Virtual Machines option, on each of these Windows servers 8 new Check Plugins have been added.

Figure 14

A brief overview:

  • esx_vsphere_vm.cpu shows the number of vCPUs and the current demand by the VM.
    This check always returns the status “OK”.
  • esx_vsphere_vm.datastores shows which datastores are in use by the VM.
  • esx_vsphere_vm.guest_tools shows whether VMware Tools are installed and current.
  • esx_vsphere_vm.heartbeat shows the heartbeat status of the VM.
  • esx_vsphere_vm.mem_usage shows the memory consumption of the VM, broken down into guest, ballooned, private and shared memory.
  • esx_vsphere_vm.name shows the name of the VM as used in the vSphere inventory.
  • esx_vsphere_vm.running_on shows on which ESXi host the VM runs.
  • esx_vsphere_vm.snapshots shows whether the VM has snapshots.

After adding these new services the result looks like this:

Figure 15

In the Services overview of the ESXi host, the powered-down VMs still have a Warning status. To resolve this issue, do the following: return once more to WATO, Host & Service Parameters \ Datasource Programs, select Check state of VMware ESX via vSphere and edit the rule. Change Display VM power state on to “The Virtual Machine”.

Figure 16

In the Services overview of the ESXi host, the VMs are gone; only the Object count shows the total number of VMs (Powered On and Powered Off) on this host.

Figure 17

The check plugins esx_vsphere_objects and esx_vsphere_objects.count have now been added to the VMs.

Figure 18

Resulting in:

Figure 19

As Check esx_vsphere_objects.count is not that informative, we can leave this check out of the configuration.

Now it’s time to add the next feature. Ticking the Datastores option adds one check plugin to the ESXi host: esx_vsphere_datastores.

Figure 20

This results in showing the status of all VMFS and NFS datastores available on the ESXi host. This check monitors the usage of disk space per Datastore. It also shows trends and overcommitment of the datastores.
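
To make the terms usage and overcommitment concrete, here is a small worked example in Python. It is a sketch of the arithmetic only, under the common definition that a datastore is overcommitted when the space provisioned to VMs exceeds its capacity. The function name and the numbers are made up and are not taken from the check itself.

```python
from typing import Tuple


def datastore_summary(capacity_gb: float, free_gb: float,
                      provisioned_gb: float) -> Tuple[float, float]:
    """Return (used %, provisioned %) for one datastore."""
    used_gb = capacity_gb - free_gb
    used_pct = 100.0 * used_gb / capacity_gb
    # Above 100 % means more space is promised to VMs than physically exists.
    provisioned_pct = 100.0 * provisioned_gb / capacity_gb
    return used_pct, provisioned_pct


# Made-up example: 500 GB datastore, 125 GB free, 600 GB provisioned.
used_pct, provisioned_pct = datastore_summary(500.0, 125.0, 600.0)
print(f"used: {used_pct:.1f} %")
print(f"provisioned: {provisioned_pct:.1f} %")
print("overcommitted:", provisioned_pct > 100.0)
```

In this example the datastore is three-quarters full, but thin-provisioned disks could still grow beyond what it can hold.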

Figure 21

No changes in the guest VMs this time.

Next are the Performance Counters. Checking this option adds 4 new Check Plugins to the ESXi host.

Figure 22

A brief overview of the checks:

  • esx_vsphere_counters shows a summary of all reads and writes (I/O) of the ESXi host at the datastore level. This check is always OK.
  • esx_vsphere_counters.diskio shows a summary of all reads and writes of the ESXi host at the disk level. This check is always OK.
  • esx_vsphere_counters.if shows the interface speed, link status, errors and bytes in/out for each NIC.
  • esx_vsphere_counters.uptime shows the uptime of the host.

Figure 23

Adding these services results in:

Figure 24

No changes in the guest VMs.

Now, we will add the final Check Plugin. With License Usage comes one new check plugin esx_vsphere_licenses. This Check Plugin shows information about the license installed on the ESXi host.

Figure 25

Figure 26

Now after activating all Check Plugins, this is the result for the ESXi host.

Figure 27

In the Services overview, services with a “graph” icon offer additional graphs. By not activating all available check plugins at once, we gain a better understanding of what each option does.

Everything we have done so far works right out of the box. Check_MK has many, many options to further tweak the configuration. In the third part of this series, we will add a vCenter Server on Windows and a vCenter Server Appliance to our configuration.

As always, I thank you for reading.

 


5 Responses to Check_MK and vSphere – ESXi

  1. eduardz says:

    Hello,

    Can I check disk I/O (esx_vsphere_counters.diskio) from a vCenter appliance rather than a Windows vCenter, so with no agent installed?

    There is a Manual Check in WATO, “Levels for disk IO”, that has this plugin: esx_vsphere_counters.diskio.

  2. marco says:

    Hi Paul,

    amazing post, I sincerely thank you for it; it was very useful and saved me a lot of time.
    Just a question:

    is it possible to retrieve storage adapter (aka HBA) usage? This has always been something I tried to figure out in the past using other monitoring systems (nagios, zenoss etc…) but I never figured it out and, in my opinion, it would be very useful. Without it I’m forced to switch to virtual center, which does provide this information (write rate, read rate, read latency, write latency)

    Do you have any idea?

    • paulgrevink says:

      Hello Marco,

      Thanks for your feedback, I really appreciate it. Concerning your question: at this time, apart from the HBA multipathing status, no other information seems to be available out of the box. I guess the only way to collect HBA statistics and present them in Check_MK is to write some kind of custom check.

      Best regards,

      Paul

      • marco says:

        What a pity! It would be very useful to have this information; anyway this plugin is really great.
        Now I have some other topics that I would like to discuss with you:

        1) Create a clear view of vcenter and its esxi hosts
        I added my VCSA and its ESXi hosts and now clicking on “All hosts” I see all of them but in this way it is not really clear how the cluster is composed (vcenter + esxi hosts). Did you create a service group or hostgroup for each cluster?

        2) vSphere resource usage
        I know that this is quite a complex topic but, as an extreme summary (based on my experience), I would say:
        Within a vSphere cluster, it isn’t so important to monitor CPU or memory usage of single hosts (I won’t go into details because it is too complex); what really matters is the overall consumption at the cluster level, and there are also considerations about which metrics to monitor, such as active vs consumed memory or, at VM level, CPU ready etc.
        To do that, there are specialized tools such as vrealize or solarwind but, using Nagios, what I did in the past was to combine active checks as you described with traps sent from vcenter (cpu ready on virtual machines, datastore latencies, network connectivity issues etc.). For example, I disabled cpu/memory checks for ESX hosts and used just those related to virtual machines.
        Could you please briefly describe what you did using check_mk?
