Log Insight – automating Groups and Roles

Adding Directory Groups

A little while ago I wrote about a proof of concept showing how to use Ansible and a pipeline to upgrade Log Insight. Shortly thereafter, I looked at the capabilities of vRealize Suite Lifecycle Manager (from now on: vRSLCM) and did deployments of Log Insight and vRealize Operations Manager. In addition to the deployment, vRSLCM can take care of some of the configuration of Log Insight, such as NTP, DNS, authentication and cluster VIPs. Configuration of Directory Groups and Roles, which is just as important, is (currently) not included. If you want to automate the whole process of deploying and configuring Log Insight, additional action is needed.
In this post, I’ll show you how to configure groups and roles in Log Insight 8.8 with PowerShell 5.x and also with Ansible 2.12.6 (a better understanding of Ansible is one of my goals for this year, and nothing beats practice).
The starting point is the documentation of the Log Insight REST API, available at: https://<fqdn Log Insight or IP address>/rest-api.
The APIs presented here have “Supported” status. In addition, Log Insight has an even larger number of APIs with “Tech Preview” status. The Tech Preview APIs are in most cases incompletely documented.
The available documentation for these can be found by including the word “internal” in the link: https://<fqdn Log Insight or IP address>/internal/rest-api.
See also my post Log Insight REST API.

Continue reading

vRSLCM – Exception while validating for Scale-Out

Recently, I have been exploring the capabilities of VMware’s vRealize Suite Lifecycle Manager (from now on: vRSLCM). vRSLCM is a product for the deployment, configuration, upgrading and patching, scale-up and scale-out of VMware products such as vRealize Automation, Orchestrator, Operations Manager, Network Insight, Log Insight and Business for Cloud. See this link for more information.

I usually do this by first installing the product and then running various scenarios, such as this one for Log Insight:
1. Deploy a 3-node Log Insight cluster, version 8.6.0
2. Upgrade to version 8.6.2
3. Scale-Out, by adding an extra worker node to the cluster

My first attempt at adding an extra worker node to the existing Log Insight cluster ended, after choosing “ADD COMPONENTS”, with the following message:

Fig. 1

The existing Log Insight nodes were up and running, so what happened?
The log file /var/log/vrlcm/vmware_vrlcm.log was very useful and explains what happens during this action. See the following lines:

2022-05-26 12:57:06.750 INFO  [pool-3-thread-20] c.v.v.l.p.v.VrliScaleoutOvaValidationTask -  -- vRLI instance :: {
  "vrliHostName" : "vRLI-1.virtual.local",
  "port" : "9543",
  "username" : "admin",
  "password" : "JXJXJXJX",
  "provider" : "Local",
  "agentId" : null
}
2022-05-26 12:57:09.817 ERROR [pool-3-thread-20] c.v.v.l.d.v.r.c.VRLIRestClient -  -- Failed to get the vRLI authentication token. No route to host (Host unreachable)
2022-05-26 12:57:12.889 ERROR [pool-3-thread-20] c.v.v.l.p.v.VrliScaleoutOvaValidationTask -  -- Exception while validating the vRLI VA OVA for Scaleout : 

Port 9543 is used for communication with the Log Insight API. The message “Failed to get the vRLI authentication token” makes it clear that vRSLCM cannot communicate with the primary node, vRLI-1.virtual.local, hence the “No route to host”. A ping from vRSLCM to the primary node by hostname yields no response, which confirms that the DNS registration has gone haywire.

After the DNS registration is restored, the primary node is resolvable again and the scale-out can be continued.

Bottom line, when you see this message, check DNS and/or network connectivity to the targets.
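The check described above can be scripted. What follows is only a minimal sketch in Python, not something vRSLCM runs itself: the function name check_target and the three result strings are made up for illustration, and the hostname and port are the ones from the log excerpt. It first tries to resolve the name and then tries to open a TCP connection to the API port.

```python
import socket


def check_target(hostname, port=9543, timeout=3.0,
                 resolve=socket.getaddrinfo,
                 connect=socket.create_connection):
    """Return 'dns-failure', 'unreachable' or 'ok' for hostname:port.

    check_target and its result strings are hypothetical names used for
    this sketch. resolve and connect can be replaced for testing.
    """
    try:
        # Step 1: can the name be resolved at all?
        resolve(hostname, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns-failure"
    try:
        # Step 2: can a TCP connection to the API port be opened?
        conn = connect((hostname, port), timeout=timeout)
    except OSError:
        # Covers "No route to host", refused connections and timeouts.
        return "unreachable"
    conn.close()
    return "ok"


if __name__ == "__main__":
    # Hostname taken from the log excerpt above.
    print(check_target("vRLI-1.virtual.local"))
```

Running this from a machine with the same DNS settings as vRSLCM gives a quick indication of whether the problem is name resolution or the network path.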

Upgrade Log Insight using Ansible and Azure DevOps

Intro

Some time ago, I demonstrated how Terraform can be applied in a vSphere environment to deploy .OVA files, like one or more Log Insight nodes. Now it is time to upgrade our Log Insight node(s), and this time we will use Ansible to do the job.

In my daily work, Ansible combined with Azure DevOps is the tool of choice for all types of tasks. This post can be seen as a demonstration of how to perform an upgrade of Log Insight with these tools.

Disclaimer

  • This is a proof of concept. The code shown is a minimum viable product, and no effort has been made to protect accounts and passwords. The code can be used as a basis for a production environment, but will need to be modified to meet the requirements of the organization.
  • I am also aware that you can still manually upgrade Log Insight or use tools like the vRealize Suite Lifecycle Manager.
  • Code for creating snapshots and other non-essentials is not included.

Preparation

The first step is to install three Log Insight nodes, version 8.4.1. After deployment of the first node, the option “Start new Deployment” was chosen and a basic installation was done, including a Virtual IP address. This is now the primary node. After this, the node was powered down and a snapshot was taken. The other two nodes were powered down immediately after deployment, and a snapshot was taken of these nodes as well. These nodes are later used as additional nodes to create a 3-node cluster.
Somewhat anticipating the topics to come, I found out pretty quickly that a web server is required to make the upgrade packages (.PAK files) available during an upgrade. In my case, I solved that by installing a simple web server, such as Miniweb, on an existing Windows host.
Installation consists of unzipping the downloaded file, creating a new folder under “htdocs” and copying the Log Insight upgrade packages (.PAK files) to this new folder.
Now we can start with the first question: how to upgrade Log Insight with Ansible?

Continue reading

Log Insight Duplicate Webhooks

After upgrading Log Insight to version 8.4.x, something strange happened. Let me explain.
Log Insight has the ability to forward Alerts for further processing via a Webhook. For more information about the use of Webhooks, see this post.

Webhooks are configured separately and can then be selected as a notification option in the configuration of an Alert, with or without email address(es).

Fig. 1 – Alert with email configured, webhook not yet selected

Continue reading

Log Insight Filtering and Masking

In this post, I want to focus on two features of Log Insight that may not be known to everyone, namely Log Filtering and Log Masking. Since I regularly work with Log Insight’s log forwarding functionality, I also want to know the impact of filtering and masking on this functionality.

Both Log Masking and Log Filtering can be found under Administration \ Log Management. Log Forwarding and Index Partitions can be found here as well.

Log Filtering

Log filtering should not be confused with event filtering. A Log Filter drops ingested events that match the defined filter criteria. Dropped events are not stored in Log Insight. The advantages of a Log Filter are that it saves storage space and that only the desired events are shown. The disadvantage is that an incorrectly configured Log Filter can cause desired events to “disappear”.

To create a new Filter, click “New Configuration”. Each filter needs a Name (no spaces allowed) and a Drop Filter, consisting of a field, an operator and a value (wildcards * and ? are supported).
A Field can be Hostname, text, appname, source, facility etc.
Four operators are available: matches, does not match, starts with and does not start with.
Values are exact matches or can be combined with the wildcards * (zero or more characters) and ? (zero or any single character) as a prefix and/or suffix.
The drop filter can be expanded with additional filter rules, together they form an AND construct.
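To make the rule semantics above concrete, here is a small Python sketch of how such a drop filter could be evaluated. This is only a model of the behaviour as described in this post, not Log Insight's actual implementation: the function names, the event dictionaries and the case-sensitive comparison are all assumptions. The wildcard ? is treated as "zero or one character", following the description above.

```python
import re


def wildcard_to_regex(pattern):
    """Translate * and ? wildcards into a regular expression body."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # zero or more characters
        elif ch == "?":
            parts.append(".?")   # zero or one character (as described above)
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def rule_matches(event, field, operator, value):
    """Evaluate one filter rule (field, operator, value) against an event."""
    text = event.get(field, "")
    body = wildcard_to_regex(value)
    if operator in ("matches", "does not match"):
        hit = re.fullmatch(body, text, re.DOTALL) is not None
    elif operator in ("starts with", "does not start with"):
        hit = re.match(body, text, re.DOTALL) is not None
    else:
        raise ValueError(f"unknown operator: {operator}")
    # The two "does not ..." operators invert the result.
    return hit != operator.startswith("does not")


def is_dropped(event, rules):
    """An event is dropped only if ALL rules match (AND construct)."""
    return all(rule_matches(event, *rule) for rule in rules)


if __name__ == "__main__":
    # Hypothetical rules and event, for illustration only.
    rules = [("hostname", "matches", "nuc10-01"),
             ("text", "matches", "*vsan*")]
    event = {"hostname": "nuc10-01", "text": "vsan: resync started"}
    print(is_dropped(event, rules))
```

Because the rules form an AND construct, an event from another host, or a non-matching event from the same host, passes the filter in this model.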

After saving the configuration, the filter will become active. New filters are enabled by default. Disabling a filter will stop dropping events.

Note: NOT selecting a filter will drop ALL logs!

As a first example, I created a Log Filter to drop all vSAN-related events from a host named nuc10-01. Other events originating from host nuc10-01 must pass the filter, and the same goes for vSAN events from other hosts like nuc10-02 and nuc10-03.

Fig.1

Continue reading

Log Insight 8.6 permissions

With the upgrade of Log Insight to version 8.6, there is a major change in terms of user permissions: permissions have become much more granular than we were used to.

Some time ago, I wrote a post about the Log Insight REST API. One reason for this was to be able to control the authorizations of the users. With these changes in permissions, it is time to see what has actually changed.

Refresher

Let’s start with a brief refresher regarding the old situation. At the lowest level we find the permissions, which determine what a user is allowed to do within the application. Permissions cannot be associated directly with a user or group, but only through a role. Roles are separate entities and consist of a name and one or more permissions. During the creation of a new user or group, one or more roles are selected.

As you can see in Figure 1, only 5 permissions are available: Edit Admin, View Admin, Edit Shared Content, Interactive Analytics and Dashboard.

Fig. 1
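The old model (permissions, grouped into roles, which are assigned to users or groups) can be summarized in a few lines of Python. This is only an illustration of the relationships described above, not Log Insight code. The role names are hypothetical, and the sketch assumes that a user with several roles gets the combined permissions of those roles.

```python
# The five permissions of the old model, as listed above.
PERMISSIONS = frozenset({
    "Edit Admin",
    "View Admin",
    "Edit Shared Content",
    "Interactive Analytics",
    "Dashboard",
})

# Hypothetical roles: a name plus one or more permissions.
ROLES = {
    "dashboard-viewer": {"Dashboard"},
    "analyst": {"Dashboard", "Interactive Analytics"},
    "read-only-admin": {"View Admin"},
}


def effective_permissions(role_names, roles=ROLES):
    """Combine the permissions of all roles assigned to a user or group.

    Permissions are never assigned directly, only through roles.
    Assumption: the result is the union over all assigned roles.
    """
    granted = set()
    for name in role_names:
        perms = set(roles[name])
        unknown = perms - PERMISSIONS
        if unknown:
            raise ValueError(f"role {name!r} has unknown permissions: {unknown}")
        granted |= perms
    return granted


if __name__ == "__main__":
    print(sorted(effective_permissions(["analyst", "read-only-admin"])))
```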

Continue reading

Terraform and vSphere – Part 1

Recently, I decided to delve into Terraform. A good starting point is one of Pluralsight’s courses, like “Terraform – Getting Started” by Ned Bellavance. Terraform is an open source tool in the field of Infrastructure-as-code, useful for the deployment and configuration of datacenter infrastructure using a declarative configuration language. Terraform comes as a single executable for Linux, Mac and Windows. The power lies in the so-called providers (additional pieces of software) that allow for the management of a huge number of resources, ranging from well-known public cloud providers like AWS and Azure to private clouds like VMware vSphere.

Several excellent articles have already been written to get you started with Terraform in a vSphere environment. I’d like to refer to this post by Luke Orellana. This article helps you deploy templates from an existing Windows or Linux VM into a vSphere environment.

A note: the Terraform file in that article still uses the old notation, for example:

 
data "vsphere_network" "network" {
  name          = "VM Network"
  datacenter_id = "${data.vsphere_datacenter.dc.id}"
}

The new notation is slightly clearer:

 
data "vsphere_network" "network" {
  name          = "VM Network"
  datacenter_id = data.vsphere_datacenter.dc.id
}

By the way, with the following command, code can be automatically converted to the new notation:

PS> terraform fmt

Continue reading

Log Insight node fails to start

Recently I experienced the following situation: the primary node of a 3-node Log Insight cluster (version 8.4.1) would not start. The OS started, but the loginsight service was stuck in an endless start/stop loop.

It was also not possible to log in via the primary node’s web interface. The other cluster nodes were still accessible via the internal load balancer. After logging in, it appeared that the status of the cluster (Administration > Cluster Nodes) was also unavailable.

Time to set up an SSH session to the impacted node and examine the log files.

In the /storage/var/loginsight/runtime.log, I noticed this event, which does not come as a surprise:

[2021-10-26 11:03:48.462+0200]
["main"/10.11.12.13 FATAL]
[com.vmware.loginsight.daemon.LogInsightDaemon] 
[Error starting services]
com.vmware.loginsight.daemon.LogInsightDaemon$StartupFailedException:
Daemon startup failed: 
All host(s) tried for query failed (tried: /10.11.12.13:9042
(com.datastax.driver.core.exceptions.TransportException: 
[/10.11.12.13:9042] Cannot connect))

Note: 10.11.12.13 being the IP address of the failed node.

In the /storage/var/loginsight/Cassandra.log, I noticed this event:

ERROR [HintsDispatcher:1] 2021-10-26 09:52:34,026 
HintsDispatchExecutor.java:243 - 
Failed to dispatch hints file 8a6f9de8-5d96-455d-a709-3e9d54826031-1634732135629-1.hints:
file is corrupted ({})

So it seems that the hints file is corrupt. Some research on Google shows that in that case the hints file should be deleted.

The hints file is part of the Cassandra database and can be found in folder:

/usr/lib/loginsight/application/lib/apache-cassandra-3.11.10/data/hints

and has a name like: 8a6f9de8-5d96-455d-a709-3e9d54826031-1634732135629-1.hints
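Finding the affected file by hand is easy enough for a single entry, but it can also be scripted. The Python sketch below scans log text for the “Failed to dispatch hints file … file is corrupted” message shown above and returns the full path of each hints file mentioned. The function name is made up for illustration, and the hints folder is the one from my installation (note the Cassandra version in the path, which may differ on other Log Insight versions). The sketch only reports paths and does not touch any files.

```python
import re
from pathlib import PurePosixPath

# Folder as found on this installation; the version number in the path
# is specific to this Log Insight release (assumption for other releases).
HINTS_DIR = PurePosixPath(
    "/usr/lib/loginsight/application/lib/apache-cassandra-3.11.10/data/hints"
)

CORRUPT_HINTS = re.compile(
    r"Failed to dispatch hints file\s+(\S+\.hints):\s*file is corrupted"
)


def corrupted_hints_files(log_text):
    """Return the paths of hints files reported as corrupted in log_text."""
    # Collapse whitespace so that entries wrapped over several lines
    # (as in the excerpt above) are matched as well.
    flat = " ".join(log_text.split())
    # dict.fromkeys removes duplicates while keeping first-seen order.
    names = dict.fromkeys(CORRUPT_HINTS.findall(flat))
    return [str(HINTS_DIR / name) for name in names]


if __name__ == "__main__":
    sample = """ERROR [HintsDispatcher:1] 2021-10-26 09:52:34,026
HintsDispatchExecutor.java:243 -
Failed to dispatch hints file 8a6f9de8-5d96-455d-a709-3e9d54826031-1634732135629-1.hints:
file is corrupted ({})"""
    for path in corrupted_hints_files(sample):
        print(path)
```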

Continue reading

DSCR for VMware 2.2

Over the past few years I have devoted several posts to configuration management of vCenter Server and ESXi. During that period I also reviewed one of the first versions of DSC Resources for VMware. At the time, I was not wholeheartedly enthusiastic, especially with regard to security aspects.

In February 2021 the latest version 2.2 was released and a lot has changed. Besides support for PowerShell 5.1 and 7.0, there is now also support for PowerShell Core on Linux.

The best improvement, in my opinion, is that the developers have made good use of the Invoke-DSCResource cmdlet introduced by Microsoft, which allows DSC resources to be executed without having to use the PowerShell LCM engine. This eliminates the need for the Windows proxy server (also one of my objections). The Invoke-DSCResource cmdlet is part of the new module PSDesiredStateConfiguration.

Based on these new capabilities, VMware has made available the module VMware.PSDesiredStateConfiguration. Looking at the contents of this module, we see the following cmdlets:
Get-VmwDscConfiguration, New-VmwDscConfiguration, Start-VmwDscConfiguration and Test-VmwDscConfiguration. In these we recognize the three basic DSC functions: Test, Set (Start) and Get.

Another interesting enhancement, available only for PowerShell 7, is vSphereNode. vSphereNode is a keyword that represents a connection to a vCenter Server. A configuration can contain one or more vSphereNodes. The advantage is this: with a normal DSC Resource, the Server and Credential properties must be declared for each resource, whereas vSphereNode uses a connection to a vCenter Server set up with the familiar Connect-VIServer cmdlet. This, in my opinion, makes the configuration much more manageable. Here are examples of configurations with and without vSphereNodes.

Continue reading

Intel NUC – boot from iSCSI LUN

A few hours after a brand new USB flash drive failed for the second time and one of my vSAN NUC nodes couldn’t boot, I came across VMware KB 85685, titled “Removal of SD card/USB as a standalone boot device option”. The message in this KB was clear: time for another way to boot the NUCs. The expansion capabilities of NUCs are limited and both disks are already in use, which leaves Auto Deploy or booting from an iSCSI target. I decided to try the latter option. The work consists of (1) configuring the iSCSI targets (the easiest part) and (2) configuring the NUCs correctly (somewhat more difficult). After some searching and experimenting I found the desired solution, including VLAN configuration.

The first step is creating the iSCSI targets. You will need a target for each ESXi host. In this example I will show how the targets are created on a Synology NAS.

Continue reading