How to Monitor VMware Infrastructure for Free

If your IaaS backbone is formed by bare-metal machines running the industry-leading VMware hypervisor, you’ve probably made this choice for a good reason – efficiency. At scale, with massive I/O, CPU and/or memory loads, public Cloud/SaaS offerings such as AWS Lambda are still far from matching the classic bare-metal model, which yields much higher performance in a controlled and predictable environment. ESXi (product name – VMware vSphere Hypervisor), VMware’s bare-metal hypervisor, is the industry’s standard hypervisor, with roughly 70%+ of the market share in this segment, thanks to its robustness and performance. With a footprint of just 150MB it can support up to 128 vCPUs, 6TB of RAM and all kinds of guest OSs on top of it.

VMware vSphere is the enterprise product that adds vCenter Server, which automates the management of ESXi hypervisors and adds enterprise-class features for mission-critical applications, such as the famous vMotion (the ability to move a VM between ESXi hosts without downtime), high availability, proactive monitoring and disaster recovery. These features, however, come at a certain price (currently 6.5k USD for a single vSphere license, according to VMware’s website). The ESXi hypervisor, on the other hand, is offered for free. If your software architecture employs a stateless design, e.g. through Docker/Kubernetes, then you can leverage the agility and robustness of stateless architectures and cut out some of the costly vSphere high-availability features, such as shadow VMs, vMotion, etc. For example, if your monitoring system shows that a container is bogged down, just kill it and start another node. From an IaaS perspective this greatly simplifies your infrastructure and reduces costs to a number of good old, free ESXi hosts – given that you have a good monitoring layer laid out.

In this post, we will review how to easily add a free, yet enterprise-class, monitoring layer to your light-weight VMware virtualisation tier.

What Do We Mean by Syslog?

Let’s first get some terminology in place. The standard way to monitor UNIX-like (*nix) systems is syslog. Syslog is a term overloaded with meaning. When people talk syslog they could mean the syslog daemon, called syslogd, which all *nix systems run to collect and store logs (usually in the /var/log directory). They could also mean the standard C library logging function, syslog(3). Less frequently they mean the syslog protocol (e.g. RFC 5424) for relaying logs over the network. There are also log relays, whose purpose is to receive, process and retransmit log messages; and there are analysers that understand the syslog protocol and can collect large amounts of logs, then extract, systemise and analyse them – even with AI-driven predictive analytics that can help you discover failures before they occur.
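For instance, on a typical Linux box you can exercise the first links of this chain with the logger utility (the tag myapp is, of course, a placeholder, and the log file location varies by distribution):

# Send a test message through the syslog API to the local syslogd
logger -t myapp -p user.info "Application started"

# The local syslogd stores it under /var/log
tail -n 1 /var/log/syslog    # /var/log/messages on RedHat-based systems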

Here is a simplistic representation of this ecosystem:
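application → syslog API → local syslogd (/var/log) → log relay (syslog protocol) → log analyser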

Exactly this kind of chain is used in the enterprise VMware vSphere monitoring feature. So our goal is to achieve a similar stack with free tools.

Choosing Your Syslog

Regardless of the transport (a UNIX domain socket locally, or a TCP/UDP socket for remote logging), the RFC 5424 syslog protocol should be observed, or major loss of information can happen. Sadly, among the plethora of syslog client and server libraries, only a very small number are strictly RFC 5424-compliant by default. If you don’t pick the right combination, there is a significant configuration and matching hassle involved.

Another important aspect is performance – how efficiently the local or remote syslogd can collect and relay messages. Among all contenders, syslog-ng seems to be a popular choice. However, when I led the vSphere monitoring team, we evaluated a number of syslogd implementations carefully, and the small, light-weight, open-source rsyslog showed much greater (40%+) efficiency than all the others.

Although not as feature-rich and popular as syslog-ng, rsyslog has all the features needed for enterprise-class logging, such as throttling, flood protection, configurable queue (de)serialisation for fast and reliable log queueing, custom extensible log rotation, easily configurable and extensible log processing, etc. As a matter of fact, we were so impressed by its overall scoring that we selected it as the VMware vSphere syslogd, where it plays a key part in the enterprise VMware vSphere monitoring layer. All you need to do in order to get that “enterprise” syslogd is install a Linux-flavoured OS. If by some chance rsyslog is not present, install the rsyslog package as well, using your favourite package-installation tool. Here is how to do it for Ubuntu and RedHat:
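# Ubuntu/Debian
sudo apt-get update && sudo apt-get install rsyslog

# RedHat/CentOS/Fedora
sudo yum install rsyslog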

Setup and Configuration

The next step is configuring rsyslogd to collect, process and store messages. This could be an article of its own, but luckily rsyslog has good documentation, along with example configurations for the most common cases. Log filters and actions are the basic terminology: a filter specifies which messages to match, while the action specifies what to do with them (e.g. store them in a file). Here is a basic configuration example to get you started with log storage and retention, which, although utterly important, are not the focus of this post.
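For illustration, here is a minimal sketch of such a configuration (the port, paths and template name are assumptions – adjust them to your environment):

# /etc/rsyslog.conf (excerpt)
# Accept logs from remote hosts over UDP and TCP on port 514
module(load="imudp")
input(type="imudp" port="514")
module(load="imtcp")
input(type="imtcp" port="514")

# Filter + action: store messages from each remote host in a file of its own
$template RemoteLogs,"/var/log/remote/%HOSTNAME%.log"
*.* ?RemoteLogs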

Now that we have set up a robust syslogd for collecting and relaying our logs, let’s configure the ESXi hosts to send their logs to our remote syslogd (rsyslog). Here is the VMware article on the matter.
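In short, it boils down to a few esxcli commands on each ESXi host (a sketch – replace the host name and port with those of your rsyslog machine):

# Point the ESXi syslog service at the remote rsyslog over TCP
esxcli system syslog config set --loghost='tcp://rsyslog.example.com:514'
esxcli system syslog reload

# Allow outbound syslog traffic through the ESXi firewall
esxcli network firewall ruleset set --ruleset-id=syslog --enabled=true
esxcli network firewall refresh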

Finally, you may also want to relay all collected logs to a log-analysis tool or database that can visualise the health of your entire IaaS from the logs you feed in, and even predict failures with integrated ML algorithms. Such tools include VMware’s Log Insight, the free plan (limitations apply) of Splunk, etc. To do that, you first need to add a line to the rsyslog configuration instructing it not only to store the received logs, but also to relay them to the remote log analyser. Here is an example rsyslog.conf line that relays all messages to a remote syslogd or log analyser:
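# Relay everything to the remote analyser; @@ means TCP, a single @ means UDP
*.* @@loganalyser.example.com:514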

This is it. Now you have added a very capable, yet low-cost, monitoring layer to your infrastructure.

— About the author: Doichin Yordanov is an R&D engineer with years of experience in scalable enterprise infrastructure software. He has been the technical lead of the VMware vSphere Log Collection and Monitoring team.


How to Configure Applications Running on Virtual Machines

Using Docker containers is now a standard way of deploying and managing applications. With containers, transferring a particular configuration or startup parameters to an application is practically trivial. But what if your app runs on a Virtual Machine? How do you pass arguments to it if it is running on a VM’s guest OS? There are still many apps out there that either haven’t been containerised, or simply require special permissions, etc., and are still running on Virtual Machines. Passing startup parameters to such apps can be hard, because it requires a tool with OS-level access. Of course, Puppet, Chef or Ansible will handle the matter with ease, but those demand specific knowledge.

In this post we outline a much simpler solution using a specific example to ensure clarity. The described solution is for AWS cloud and VMware vSphere based infrastructure.

The main steps are as follows:

  1. Provision a VM that contains your application and assign metadata (Tags) to it;
  2. Read the metadata from within the VM with a simple script;
  3. Start the app using the metadata as startup parameters/arguments.

Below is a more detailed description of the proposed solution.

We’re working under the assumption that we have an application that requires a few parameters at startup. This app has to be deployed for each of our clients, and one of the startup parameters is the clientID, unique for each and every client. We have a VM template (AMI in Amazon terms) and we create a new VM (EC2) instance for each client. The question is: how do we pass the startup parameter (in our case the clientID) to the app?

In AWS we are going to use the Amazon Metadata Service in combination with the AWS Tags API.

The Amazon Metadata Service is available inside each AWS EC2 instance. It exposes diverse data, which you can query using tools like curl on Linux. Some of the information you can get includes the EC2 instanceID, the region where the instance is running, and so on. For the full list of properties, see the AWS docs.
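For example, reading the instanceID is a one-liner (the 169.254.169.254 address is the same on every EC2 instance):

curl --silent http://169.254.169.254/latest/meta-data/instance-id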

Tags in AWS are key/value pairs and you can map them directly to the application startup parameters. Basically, you need to create as many tags as your application has startup parameters.

We said the app expects a parameter called “clientID”, so you can attach a tag with key=clientID and the client’s identifier as the value when a new EC2 instance is created. Assigning the tags is the first part of the solution, because now you have the application startup parameters attached to your instances. Next, we need to read these tags and pass them to our application as arguments.
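Creating such a tag takes a single AWS CLI call (the instance ID and the value below are placeholders):

# Attach the clientID tag to the newly created EC2 instance
aws ec2 create-tags --resources i-0123456789abcdef0 --tags Key=clientID,Value=acme-corp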

For the second part of our solution we are going to create a simple script, which will read the instance tags and then start the application, passing them as arguments.

In the example below we’re using Bash, which depends on the AWS CLI and the jq utility. The AWS CLI requires AWS credentials in order to operate. For maximum security, those credentials should only be allowed to read tags.

The script retrieves the EC2 instanceID and EC2 region from the metadata service like this:

Reading the region:

region=$(curl --silent http://169.254.169.254/latest/dynamic/instance-identity/document | jq -r .region)

Reading the instanceID:

# ec2metadata comes with Ubuntu's cloud-utils package; alternatively, query the metadata service with curl as above
instanceId=$(ec2metadata --instance-id)

Reading a tag:

tagValue=$(aws ec2 describe-tags --filters "Name=resource-id,Values=${instanceId}" --region ${region} | jq -r ".Tags | .[] | select(.Key == \"clientID\") | .Value")

The last step is to start your app and pass the actual parameters:

/pathToMyApp --clientID "${tagValue}"
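Putting it all together, the whole startup script is just a few lines (a sketch – /pathToMyApp and the clientID tag key are the assumptions from above):

#!/bin/bash
# Read the instance identity from the metadata service
region=$(curl --silent http://169.254.169.254/latest/dynamic/instance-identity/document | jq -r .region)
instanceId=$(curl --silent http://169.254.169.254/latest/meta-data/instance-id)

# Look up the clientID tag attached to this instance
tagValue=$(aws ec2 describe-tags --filters "Name=resource-id,Values=${instanceId}" "Name=key,Values=clientID" --region ${region} | jq -r ".Tags[0].Value")

# Start the application with the tag value as a startup argument
exec /pathToMyApp --clientID "${tagValue}"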

This approach is also applicable to Virtual Machines running on VMware vSphere infrastructure. It is possible to provide information to the running guest OS in the form of key/value pairs using the official VMware vSphere API. One can make such API calls against both vCenter Server and VMware ESXi. In addition, VMware supports and provides bindings for this API in a number of different programming languages.

The examples in this post use the VMware vSphere API Python Bindings available on GitHub.

The basic assumption is that you know how to get a reference to the VM on which your app is running. You have to modify the ‘extraConfig’ property of the VM object. The property is a member of the VM’s ConfigSpec and is of type OptionValue[]. For more information, have a look at the vSphere API Reference.
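If you need a starting point for obtaining that reference, here is a minimal pyVmomi sketch (the host, credentials and VM name are placeholders; certificate verification is disabled only for brevity):

import ssl
from pyVim.connect import SmartConnect
from pyVmomi import vim

# Connect to vCenter Server or a standalone ESXi host
ctx = ssl._create_unverified_context()  # do NOT disable verification in production
si = SmartConnect(host='vcenter.example.com', user='administrator@vsphere.local', pwd='secret', sslContext=ctx)

# Find the VM by name in the inventory
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(content.rootFolder, [vim.VirtualMachine], True)
vm = next(v for v in view.view if v.name == 'my-client-vm')

With the vm reference at hand, building the ConfigSpec looks like this: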

# Build a ConfigSpec that adds one key/value pair to the VM's extraConfig
myKey = 'guestinfo.clientId'
myValue = 'client identifier goes here'
cspec = vim.vm.ConfigSpec()
cspec.extraConfig = [vim.option.OptionValue(key=myKey, value=myValue)]

Next, call ReconfigVM_Task on the VM MORef like this:

task = vm.ReconfigVM_Task(spec=cspec)

Note the key prefix ‘guestinfo.’! It is important to add this to the key, otherwise it can’t be read by the guest OS.

If this is executed on a powered-off Virtual Machine, the key/value pair is saved in the VM’s .vmx file and survives restarts. If the info is added while the VM is running, it will not be available after a reboot.

So, how do we read the property inside the Virtual Machine? We need the VMware Tools provided by VMware or, in the case of a Linux guest OS, the open-vm-tools package, whose vmware-rpctool command reads the info:

vmware-rpctool "info-get guestinfo.clientId"

All of these operations are already conveniently scripted on GitHub. Note that this script assumes ESXi/vCenter Server running on ‘localhost’ and does not use SSL verification, but it is still a great starting point.

Now you have everything you need to extract the application parameters from a script running inside the VM. The final step is to start your app and pass the parameters.
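Inside the guest, the equivalent of the AWS script above boils down to two lines (again a sketch, reusing the hypothetical /pathToMyApp from before):

# Read the value published through the VM's extraConfig
clientId=$(vmware-rpctool "info-get guestinfo.clientId")

# Start the application with it
exec /pathToMyApp --clientID "${clientId}"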

This is how you can fully automate the deployment of applications packed as VMs.


VMware Cloud on AWS – A Bridge Between Private and Public Cloud

What are Hybrid Clouds?

In an IT world realigning around the Cloud, hybrid infrastructures or “Hybrid Clouds” are a way for companies to tap into the potential of both private and public infrastructures and get the best of both worlds – security, scalability, flexibility and cost-effectiveness. They allow companies to share the computing workload of their data centres with public Clouds run by a handful of big infrastructure corporations, such as Amazon (Amazon Web Services), Microsoft (Azure), Google (Google Cloud Platform), IBM (IBM Cloud) and others. Hybrid infrastructures are a cost-effective, highly available way for organisations to extend their data centres’ capacity, migrate data and applications to the Cloud and closer to customers, make use of new Cloud-native capabilities, and create backup and disaster recovery solutions. A simple way to view hybrid computing is as having your company’s data reside both in the Cloud and on premises.

Moving workloads between different clouds

Effectively moving workloads between different clouds is a notoriously tedious and slow process. It involves accounting for the virtual machines’ networking and storage configurations, with their associated security policies, while converting the machines from one format to another. And this is just one of many challenges. Moving workloads from the public Cloud back to the private one is just as difficult, considering management dependencies and proprietary APIs.

VMware and AWS

VMware software is entrenched in the data centres of enterprise customers (governments and big companies) around the world. Enterprises that build and operate private clouds commonly use VMware’s cloud infrastructure software suite, and its server virtualisation software is practically ubiquitous. But with companies wanting to leverage the scale and capabilities of the Amazon public cloud, VMware realised the benefit of building a bridge to AWS. After the partnership was announced in 2016, with the VMware and AWS architectures being as different from each other as they are, it took more than a year to launch a solution and another six months (until VMworld 2018) for it to be fully globalised.

To launch VMware on Amazon’s cloud infrastructure, AWS engineers had to change how they actually architect their data centres – a massive effort, necessary to make sure that the Amazon Cloud infrastructure would be able to support the arguably best-in-class hypervisor, VMware’s ESXi. To virtualise its infrastructure, Amazon traditionally used the open-source Xen hypervisor, which is incompatible with VMware’s. Over the last year and a half, the company has been transitioning to a new custom distribution of KVM – a different open-source hypervisor, with which compatibility won’t be an issue. In addition, practically everything else in the AWS architecture has had to be modified as well – network, physical-server provisioning, security, etc.

VMware-AWS Hybrid Cloud

The basic premise is that the VMware-operated, AWS-based service allows organisations that have on-site vSphere private infrastructures to migrate and extend them to the AWS Cloud (running on Amazon EC2 bare-metal infrastructure), using the same software and methods to manage them. VMware-based workloads can now successfully run on the AWS Cloud, with applications deployed and managed across on-premises and public environments with guaranteed scalability, security and cost-effectiveness. Companies can take a hybrid approach to cloud adoption, consolidating and extending the capacity of their data centres while also modernising, simplifying and optimising their disaster recovery solutions.

AWS infrastructure and platform capabilities (Amazon S3, AWS Lambda, Elastic Load Balancing, Amazon Kinesis, Amazon RDS, etc.) can be natively integrated, allowing organisations to quickly and easily innovate their enterprise applications. What they need to be mindful of when selecting which capabilities to use, however, is that not all of them are available on the VMware stack. This could become an issue, should they ever decide to migrate their workloads from the public cloud back to the private one.

Organisations can also simplify their Hybrid IT operations with VMware Cloud on AWS and leverage the power, speed and scale of the AWS cloud infrastructure to enhance their competitiveness and pace of innovation. They can use the same VMware Cloud Foundation technologies (NSX, vSAN, vSphere, vCenter Server) on the AWS Cloud without purchasing new custom hardware, modifying operating models or rewriting applications. Workload portability and VM compatibility are provided automatically.

All AWS services, including databases, analytics, IoT, compute, security, deployments, mobile and application services, can be leveraged with VMware Cloud on AWS, with a promise of secure, predictable, low-latency connectivity.