vRO scriptable task to identify VMtools

One of the articles that gets the most hits on my blog is an article about how to get VMtools status from PowerShell.  I figured I would share a similar method using vRO.  If you are wondering, the PowerShell method takes 10 minutes while the vRO method takes 10 seconds.  It’s pretty impressive.

First you create a scriptable task and get all virtual machines (this will be across all vCenters connected to your vRO instance)

//get list of all VMs
var vms = System.getModule("com.vmware.library.vc.vm").getAllVMs();

Now the variable vms holds an array of VC:VirtualMachine containing all VMs in your environment.

We now want to run through all VMs one at a time using for each and check for two things:

  • Tools not running
  • Machine is powered on

for each (var vm in vms) {
    if (vm.guest.toolsStatus == VcVirtualMachineToolsStatus.toolsNotRunning && vm.runtime.powerState.value === "poweredOn") {
        //do your reporting or action here
        System.log(vm.name + " tools is not running");
    }
}

The VcVirtualMachineToolsStatus scripting class exposes a number of status values you can check against, including the following:

  • VcVirtualMachineToolsStatus.toolsNotInstalled
  • VcVirtualMachineToolsStatus.toolsNotRunning
  • VcVirtualMachineToolsStatus.toolsOk
  • VcVirtualMachineToolsStatus.toolsOld

As seen here:

Capture

These can be used to identify almost any status of tools.   You can then push your results into an array to use in vRO or vRA.  The full script can be found here: tools
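
As a rough sketch of pushing the results into an array, the check can be wrapped in a helper that returns the matching names.  The function name findToolsNotRunning and the sample data are made up for illustration, and plain objects stand in for VC:VirtualMachine so it runs outside vRO.  It also assumes toolsStatus exposes a .value string the same way powerState does; inside vRO you can keep comparing against VcVirtualMachineToolsStatus.toolsNotRunning instead.

```javascript
// Sketch: collect the names of powered-on VMs whose tools are not running.
// Plain objects stand in for VC:VirtualMachine (hypothetical sample data).
function findToolsNotRunning(vms) {
    var results = [];
    for (var i = 0; i < vms.length; i++) {
        var vm = vms[i];
        // skip entries that carry no guest information in this sample shape
        if (!vm.guest || !vm.guest.toolsStatus) {
            continue;
        }
        if (vm.guest.toolsStatus.value === "toolsNotRunning" &&
            vm.runtime.powerState.value === "poweredOn") {
            results.push(vm.name);
        }
    }
    return results;
}

// Hypothetical VMs, not from a real environment
var sampleVms = [
    { name: "web01", guest: { toolsStatus: { value: "toolsOk" } },
      runtime: { powerState: { value: "poweredOn" } } },
    { name: "db01", guest: { toolsStatus: { value: "toolsNotRunning" } },
      runtime: { powerState: { value: "poweredOn" } } },
    { name: "old01", guest: { toolsStatus: { value: "toolsNotRunning" } },
      runtime: { powerState: { value: "poweredOff" } } }
];

var flagged = findToolsNotRunning(sampleVms);
```

The returned array could then be bound to an output parameter of the scriptable task and handed to the next workflow element.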



			

vRO How to get information from Plugins

When I first started with vRO I struggled with how to use the elements that the plugins discover, because I was used to PowerShell objects.  The easiest way to explain how to mine this information is with an example.  Say I want to generate some basic information on all my datastores from all my connected vCenters: name, capacity, free space and percent free space.

 

We start by creating a basic workflow and dragging over a scriptable task:

script

Edit the scriptable task and you will be presented with:

Capture

This allows you to review the elements provided by the plugins to vRO.    Let’s do a quick search for datastores using the magnifying glass.  We locate a number of methods, attributes and scripting classes.

Capture

Inside here it’s important to understand the difference between a method, an attribute and a scripting class:

  • Method – returns a value, often an object with its own attributes
  • Attribute – a specific data element on an object
  • Scripting Class – a type of object along with its attributes and available methods

For this section we are going to choose VcPlugin.getAllDatastores() and choose to go to selection, which will change our top pane to:

Capture

 

This tells us that this method will return an array of VcDatastore objects.   To see the individual elements in VcDatastore click on the blue VcDatastore link.

Capture

Each of the individual attributes is listed, and you can click them for additional information.  Let’s create an array of VcDatastore objects:

var datastores = VcPlugin.getAllDatastores();

Now let’s identify the values I want

  • Name
  • Capacity
  • Free Space
  • Percent free

Browsing the available values in the list I locate name, which is the datastore name.   Let’s loop through the objects and write all the names out to the log.

var datastores = VcPlugin.getAllDatastores();

for each (var datastore in datastores) {
    System.log("Datastore : " + datastore.name);
}

Output on my home lab looks like this:

Capture

It worked.  Now we have to locate the capacity of the datastore.   As you browse the available fields you will notice that some fields, like info and summary, return objects with additional fields.  You can click the blue link to learn more about the available information.  For example, summary returns a VcDatastoreSummary object which, if clicked, has a ton of values:

Capture

Including the two fields I want: capacity and freeSpace.  Let’s make sure they are correct with some easy code inserted inside our loop:

System.log("Capacity : " + datastore.summary.capacity + " Free space : " + datastore.summary.freeSpace);

The output from the whole thing combined looks like this:

Capture

As you can see I have my required information.  The raw byte values are not really human readable, so I want to create a function at the top to convert them into GB, like this:

function convert_to_gb(size)
{
    //bytes -> KB -> MB -> GB
    var gbsize = size/1024/1024/1024;
    return gbsize;
}

And let’s add it to our output inline:

System.log("Capacity : " + convert_to_gb(datastore.summary.capacity) + " Free space : " + convert_to_gb(datastore.summary.freeSpace));

Now my output is a lot more readable:

Capture

But wait, I hate all that decimal point mess, so let’s just round it up:

System.log("Capacity : " + Math.ceil(convert_to_gb(datastore.summary.capacity)) + " Free space : " + Math.ceil(convert_to_gb(datastore.summary.freeSpace)));

Now the output looks much better:

Capture

Now we just need the percent of free space.  This one is not built in but it’s easy math (freeSpace / capacity * 100 = percent free).  Let’s do it inline:

System.log("Percent free : " + Math.ceil((datastore.summary.freeSpace/datastore.summary.capacity)*100)); 

The output has everything we need.

Capture

Now I understand that outputting this to the log does not help you much.  But from here you can feed this information into arrays or objects to be passed to additional workflow items.  I hope it helps you understand how to work with the methods provided.  If you want the whole script download it here: datastore
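
To give an idea of what feeding this into an array of objects could look like, here is a minimal sketch that builds one report entry per datastore.  The function name summarizeDatastores, the field names in the report and the sample datastore are all my own inventions for illustration; plain objects mimic the name, summary.capacity and summary.freeSpace fields used above so the code runs outside vRO.

```javascript
// Sketch: turn datastore objects into a simple report array.
var BYTES_PER_GB = 1024 * 1024 * 1024;

function summarizeDatastores(datastores) {
    var report = [];
    for (var i = 0; i < datastores.length; i++) {
        var ds = datastores[i];
        var capacity = ds.summary.capacity;
        var free = ds.summary.freeSpace;
        report.push({
            name: ds.name,
            // same rounding-up approach as the log lines above
            capacityGb: Math.ceil(capacity / BYTES_PER_GB),
            freeGb: Math.ceil(free / BYTES_PER_GB),
            // guard against dividing by zero on an empty or unreported datastore
            percentFree: capacity > 0 ? Math.ceil((free / capacity) * 100) : 0
        });
    }
    return report;
}

// Hypothetical datastore: 500 GB capacity with 125 GB free
var sampleDatastores = [
    { name: "datastore1",
      summary: { capacity: 500 * BYTES_PER_GB, freeSpace: 125 * BYTES_PER_GB } }
];

var report = summarizeDatastores(sampleDatastores);
```

Inside vRO you would pass in the result of VcPlugin.getAllDatastores() and bind the report to an output parameter for the next workflow item.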

Deep Dive Multi-nic vMotion

What is vMotion?

Most of the IT world has heard of vMotion.  If this is new to you, it’s the feature that really put VMware on the map.  It’s the ability to move a virtual machine workload between different compute nodes without interruption of the guest operating system, allowing you to migrate off failing hardware or update hardware without interruption to the customer’s machine.  Quite simply it is awesome.

How does vMotion work?

I can provide a basic overview; a lot of the details are intellectual property controlled by VMware.  Normal vMotion takes advantage of the following things:

  • Shared storage between cluster members (same LUNs or volumes)
  • Shared networking between cluster members (same VLANs)

The major portion of any virtual machine is data at rest on the shared storage.  The only portions of a virtual machine not on the storage are the execution state and active memory.   vMotion creates a copy of these states (active memory and execution state, called the shadow VM) and transfers it over the network.   When both copies are almost in sync VMware stuns the virtual machine for a very brief moment (typically well under a second) to transfer the workload to the other compute node.   Once transferred to the new compute node the host sends out a RARP notification (often described as a gratuitous ARP) to update the physical switches with the virtual machine’s new location.  Memory and execution state are transferred over the vMotion interface.

What is storage vMotion?

Locking into shared storage became a problem for a lot of larger customers.   VMware addressed this issue by providing storage vMotion.  Storage vMotion allows a guest operating system to move between similar compute without shared storage between them, or between different storage on the same compute.   The only common requirements are networking and a similar execution environment (CPU instructions).   The process is similar except during the final stun both active state and final disk changes are moved.   SvMotion has two types of data: active and cold.   Active data are files that are readable and writable.  Cold data is any data that is not currently writable.   Some examples of cold data are powered off virtual machines or parent files of the currently active snapshot (only the active snapshot is considered active).

In 5.5, storage vMotion data that is cold is moved across the management network while active data uses the vMotion network.  (If your management network and vMotion network share the same subnet then the lowest-numbered vmk nic will be used for all vMotions, which is always management.  Always separate your vMotion and management traffic with VLANs.)

In 6 the cold migration data is moved across the NFC protocol link (designated as Provisioning traffic in vSphere 6).  NFC uses the management network unless you have a designated NFC link (it can be the vMotion interface).  So as a design consideration, create a designated NFC vmkernel nic to avoid having management used.  vMotion is used for all hot data.

Storage vMotion can be offloaded to the array when the array supports VAAI and the movement is on the same array.

What is multi-nic vMotion?

Multi-nic vMotion is the practice of using multiple nics to transfer vMotion data.  Why would you need more nics?

  • Really large memory VM’s (memory and execution state have to cross the wire)
  • Large storage vMotion jobs without shared storage (that will be across the network)
  • Long distance vMotion (vMotion across larger distance than traditional datacenter)

If multi-nic vMotion is configured correctly any vMotion job will be load balanced between all available links, thus increasing the bandwidth available to transfer data.   Even a single machine vMotion can take advantage of the multiple links.  This can really help the speed of vMotions.  Multi-nic vMotion does have a cost: if you are moving a really large virtual machine you could saturate your links.    The easiest way to understand this is with an overly simple graphic.

Autodeploy

For the sake of this explanation we have two ESXi hosts, each connected via two 10Gbps links to the same network switch.   Both are configured to use both links for vMotion.  We initiate a vMotion between the source and destination.  The virtual machine is very large so the movement requires both links and load balances traffic on both links.  Let’s follow the movement:

  • vMotion is negotiated between both sides
  • Let’s assume that my vMotion requires 7Gbps of traffic on each link for a total of 14Gbps (not really possible but numbers used for example)
  • Source starts using both links and throws 14Gbps at the destination
  • Destination has multi-nic so it can receive 14Gbps without any major issues

This plan makes a pretty big assumption: that the source and destination are both able to allocate 14Gbps for the vMotion without affecting the current workload.   This is a really bad assumption, and it is why VMware introduced Network I/O Control (NIOC).  NIOC provides a method for controlling outbound traffic across links when under contention.   Essentially you give each traffic type a share value (0-100) and during contention all traffic types that are active get their calculated share.   For example if I allocated the following:

  • Management 20
  • vMotion 20
  • VM 60

Assume that my ESXi host is only using management and VM traffic during a time of contention on a single 10Gbps link.  I would get:

  • Total shares (Management 20 + VM 60 = 80)
  • Allocated bandwidth per share (10 / 80 = 0.125)
    • Management allocated 2.5Gbps (0.125*20)
    • VM allocated 7.5Gbps (0.125*60)
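
The share math above can be sketched in a few lines.  This is only my illustration of the arithmetic, not how ESXi implements NIOC; the function name allocateByShares and the traffic type keys are made up for the example.

```javascript
// Sketch: split one link's bandwidth among the traffic types that are
// currently active, in proportion to their configured shares.
function allocateByShares(linkGbps, shares, activeTypes) {
    var totalShares = 0;
    var i;
    for (i = 0; i < activeTypes.length; i++) {
        totalShares += shares[activeTypes[i]];
    }
    var perShare = linkGbps / totalShares;
    var allocation = {};
    for (i = 0; i < activeTypes.length; i++) {
        allocation[activeTypes[i]] = perShare * shares[activeTypes[i]];
    }
    return allocation;
}

// Share values from the example above
var shares = { management: 20, vmotion: 20, vm: 60 };

// Only management and VM traffic are active on a 10Gbps link
var twoActive = allocateByShares(10, shares, ["management", "vm"]);

// All three traffic types are active on the same link
var allActive = allocateByShares(10, shares, ["management", "vmotion", "vm"]);
```

With only management and VM traffic active this gives 2.5Gbps and 7.5Gbps, matching the numbers above.  Once vMotion becomes active as well, the same 10Gbps is split across 100 shares, so management and vMotion get 2Gbps each and VM traffic gets 6Gbps.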

This is calculated per link not system wide.    This works really well to control traffic on the source but fails to protect the destination.   NIOC has no way to control incoming traffic.  For example:

magic

Let’s assume that the destination host is very busy and only has 1Gbps per link not in use, while the source has 10Gbps available per link for the vMotion.   The source initiates the vMotion and floods the destination with 14Gbps of traffic.   Now packets are getting dropped for every type of traffic on the destination ESXi host.  This creates a critical problem.   You cannot control all sources of network traffic into your host.   In order to combat this issue VMware provided limits on network traffic.  This allows you to identify types of traffic and have ESXi throttle that traffic when it becomes too much.   This overloading is not unique to multi-nic vMotion but can be complicated quickly by the load multiple nics can provide.

 

How do I set up Multi-nic vMotion?

It is very much like iSCSI connections: you set up each vmkernel interface with its own IP address and bind it to a single uplink.  So if you have two uplinks you need two IP addresses and two vmkernel interfaces for vMotion, each bound to a single uplink with no failover.  If an uplink is removed, the vMotion vmkernel interface bound to it will not be used.

Should I use Multi-nic vMotion?

This is a great question and the answer is it depends.   I personally think the configuration needed to implement multi-nic vMotion is minimal, but it could be a major problem in larger shops.   All of the problems with multi-nic vMotion are also present with standard vMotion.   You really should consider NIOC and potentially limits for any design that is not grossly oversized.   If you plan on using multi-nic vMotion I think you need NIOC and limits, or at least a limit on vMotion traffic.    Let me know what you think and your experience with this feature.

 

NSX Controller forever deploying never working

I ran into an issue with NSX in the home lab where a new NSX controller was deploying and a power outage interrupted the deployment.  This left me in a state of deploying forever.  After waiting a day and being in the same state I removed the inoperable virtual machine from vCenter.  The issue persisted in NSX.

bad show

As you can see controller-18 is forever deploying.  The NSX manager command line also showed it as deploying, so it’s a database issue somewhere.  Since the deployment action was still in place it was impossible to remove the controller, and the cluster health was bad with only two controllers.   I don’t have any magic method for working with the NSX manager database (I assume it’s postgres but I really don’t know) other than the API.  So off to the API I went.   First I wanted to query for all controllers to make sure I had the correct name (ID field in picture above).   So I set up my REST connection for:

https://IP_of_Manager/api/2.0/vdn/controller

no-controller

The query returned that the ID is indeed controller-18.   Once I knew the controller ID it was a simple DELETE method with the right URL:

Capture

Removal via

https://nsx_manager_ip/api/2.0/vdn/controller/controller-18?forceRemoval=True
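
If you would rather script the two calls than click through a REST client, the requests can be described roughly like this.  The helper name buildControllerRequest is something I made up; it only builds the method and URL shown in this article and does not send anything.  Authentication and the HTTP client itself are left out because they depend on your tooling.

```javascript
// Sketch: describe the list and delete requests used in this article.
// Nothing is sent; the result is a plain { method, url } object.
function buildControllerRequest(managerHost, controllerId, force) {
    var url = "https://" + managerHost + "/api/2.0/vdn/controller";
    if (!controllerId) {
        // no ID given: query all controllers
        return { method: "GET", url: url };
    }
    url += "/" + encodeURIComponent(controllerId);
    if (force) {
        // same query string as the removal URL above
        url += "?forceRemoval=True";
    }
    return { method: "DELETE", url: url };
}

// "nsx_manager_ip" is the placeholder used above, not a real host
var listRequest = buildControllerRequest("nsx_manager_ip");
var deleteRequest = buildControllerRequest("nsx_manager_ip", "controller-18", true);
```

The delete request comes out as the same URL shown above, so the sketch is just a convenient way to avoid typos in the controller ID.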

 

After this command I queried again to confirm it was gone:

after

Since 18 was my last controller it returned nothing.   Hopefully if you have a stuck deploying NSX controller this article will help you remove it.

A way to check DEM health with vRO

I have been doing a lot of vRA so expect more articles.  One concern I had was that with a distributed architecture it’s hard to identify failed DEM workers outside the vRA web GUI.  I opened a case with VMware BCS (Business Critical Support; it’s awesome and worth every penny) and my engineer Adam provided a custom solution for monitoring the DEMs.  I want to be clear this is not supported by VMware but it does work on the 6.x versions.  I have not tested on 7 yet.   Adam created a vRO plugin that emails you when the DEMs have failed, but since it’s vRO you could open a ticket in your ticketing system or do almost anything.   I wanted to share this awesome script and the work Adam did to help me solve a problem.  Download here:

https://flowgrab.com/project/view.xhtml?id=6632669e-e721-45e3-9d0a-ac373d039f2c&download=true&download_id=DownloadTask-12705390063324520

He also runs a blog here http://scriptdeez.com/  which seems to be expired right now… I hope he resolves soon.

 

Perfect deployments of OS with automation

I have spent the last few years working in enterprise shops and enjoying the challenges they bring.   I find a number of my peers are hired for a single use case or implementation and then leave.  Staying with an infrastructure past a single implementation allows me to enjoy all that brownfield IT has to offer.   It’s a completely different challenge.   Almost everyone I talk to and everywhere I work is trying to solve the same basic problem: do more with less and more automation.  Everyone wants the Amazon easy button without the security or off-premises challenges of AWS.   In order to make it into the cloud they need organizational and operational change.  The first place almost everyone focuses is operating system deployments.   There are a number of models available and I thought I would share some of my thoughts on them.

Cloning 

This model has been made available by VMware.  It’s a combination of creating a golden template and some guest customization.  It’s very easy to manage and produces very similar results every time during provisioning. You have to focus on core shared elements or create a template for each use.  It does have some challenges:

  • How much of our software should we load on to it?  Security software, monitoring agents etc.  How can we identify only core shared elements?
  • It does not scale to lots of different templates.  Keeping application templates for every application kills you.  Imagine updating 100 templates monthly and ensuring with the application teams that they are not broken
  • It is a virtual-only solution, making physical machine builds manual or a different process
  • It’s a provisioning-only process; it has no idea of state after initial implementation

It’s a provisioning only process

This is a big problem for me with a lot of provisioning solutions, not just cloning.  They do initial provisioning and not the steady state of the operating system.  This lack of life cycle management does not solve my brownfield issues.  Sure, you have an awesome, initially consistent implementation, but five minutes later you are out of sync with the initial template.   This problem has led me to configuration management in almost every shop I have worked in.   I wish that everywhere I worked was a Netflix with a re-deploy the micro-service if failed model.  The truth is none of the shops I have worked in have that model.   I have monolithic multi-tier applications that are not going away this year or in the future.

Do I have a life cycle problem or provisioning problem?

Yes, both.   I do not believe that the days of fire and forget operating systems are available to us anymore.   Every server is under a constant state of change, from attackers to patches.  Everything changes.   Changes bring outages when assumptions are made about the configuration of servers.  Early in my career I could not count the number of outages that were caused by incorrect DNS settings or hosts files.   These are simple configuration items that were expected to be correct but were found after an outage to have been changed.    ITIL would have us believe it’s all about change management: we need a CAB and approvals to avoid these issues.   While I am all about documented processes and procedures, I have not found that most hosts file changes get done via CAB; they get changed ad hoc or during an outage.   We have to be able to provision, configure and ensure the configuration stays.

Configuration management and provisioning

Take a look at this scenario:

  • Provisioning agent clones, provisions, duplicates a base operating system
  • Provisioning agent does initial configuration of OS (IP address, sysprep etc..)
  • Provisioning agent, based upon customer selection, provides some unique information to configuration management that enables the understanding of server role (this is a SQL server, this is Apache etc.)
  • Provisioning agent installs configuration management agent
  • Configuration management agent checks in with configuration management system and changes all settings (both base settings and server role settings)
  • Configuration management agent continues to ensure that role and base settings are correct for the life of the server
  • Server administrator / application administrator etc uses configuration management agent to adjust settings

This model provides for initial configuration and consistent life cycle management.  It does mean your configuration management agent does the heavy lifting instead of your provisioning agent.
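
To make the "continues to ensure" part concrete, here is a toy sketch of what a configuration management agent does on each run: compare the desired settings with what is actually on the server and correct anything that drifted.  Every name in it (findDrift, converge, the settings and their values) is hypothetical, and real tools are far more involved; this only illustrates the idea.

```javascript
// Sketch: detect and correct configuration drift against a desired state.
function findDrift(desired, actual) {
    var drift = [];
    for (var key in desired) {
        if (desired.hasOwnProperty(key) && actual[key] !== desired[key]) {
            drift.push({ setting: key, expected: desired[key], found: actual[key] });
        }
    }
    return drift;
}

// Apply the desired value for every drifted setting and report how many changed.
function converge(desired, actual) {
    var drift = findDrift(desired, actual);
    for (var i = 0; i < drift.length; i++) {
        actual[drift[i].setting] = drift[i].expected;
    }
    return drift.length;
}

// Hypothetical base settings and a server whose DNS entry was changed ad hoc
var baseSettings = { dns1: "10.0.0.10", ntp: "ntp.example.com" };
var server = { dns1: "10.0.0.99", ntp: "ntp.example.com", hostname: "sql01" };

var firstRun = converge(baseSettings, server);   // corrects the DNS entry
var secondRun = converge(baseSettings, server);  // nothing left to correct
```

The second run changing nothing is the important property: the agent can run for the life of the server and only touches settings that have moved away from the desired state.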

What about physical?

The model above also works for physical.  You have to move away from cloning and back into provisioning an operating system from PXE boot but it works very well.  Now you can provision both physical and virtual from the same cloud agent using consistent life cycle management.

What is the challenge?

For me the challenge has been that whenever I discuss configuration management it gets confused with compliance management.   I believe that configuration management can and should be used for compliance management but it’s not the primary role.   Compliance is about meeting security standards.  Configuration is about ensuring configuration settings are correct and, if not, correcting them.   I can identify compliance issues and apply the resolution via configuration management.   I can also use the configuration management engine to identify things that are out of compliance and change them to meet compliance.

vRO get all VM’s

I have been spending less time than I would like in vRO but I wanted to share some of my findings in a brief format.  Here is the code in a scriptable task that can get all virtual machines across all vCenters connected to your vRO instance.

var vCenters = VcPlugin.allSdkConnections;

for each (var vCenter in vCenters) {
    System.log(vCenter.name);
    var clusters = vCenter.getAllClusterComputeResources();
    for each (var cluster in clusters) {
        System.log(cluster.name);
        //note: allVirtualMachines returns every VM in this vCenter, not just this cluster
        var vms = vCenter.allVirtualMachines;
        for each (var vm in vms) {
            System.log(vm.name);
            //do your per vm action here
        }
    }
}

There are better ways to gather each virtual machine but I wanted to demonstrate how to walk down the layers (you can just get the VMs without getting clusters; I’ll show that at the bottom).  This code will be very familiar to PowerCLI users who do this type of action all the time.   I have included lots of system logging to help you understand the walk; feel free to remove it.  Some highlights are as follows:

  • create an instance of the vCenter sdk called vCenters
    • From this you can call almost any vCenter sdk exposed object
  • identify vCenters one at a time into vCenter
  • identify clusters one at a time into cluster
  • identify vm’s one at a time into vm
  • Take some action on each vm

You can of course shorten this code with:

var vCenters = VcPlugin.allSdkConnections;

for each (var vCenter in vCenters) {
    var vms = vCenter.allVirtualMachines;
    for each (var vm in vms) {
        System.log(vm.name);
        //do your per vm action here
    }
}

See how that is shorter.  It’s a pretty cool feature.  One thing to remember is that the data returned into vms is an array of VC:VirtualMachine objects, not a list of text names.  Each object carries many properties, and vm.name reads just one of them (the name field of a single vm entity).
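
Because each entry is an object, you can pull out whatever you need and build your own array from it.  Here is a small sketch that collects "vCenter/VM" names; the function name collectVmNames and the sample connections are made up, and plain objects stand in for the SDK connections so it runs outside vRO.

```javascript
// Sketch: walk every connection and collect "<vCenter>/<vm>" strings.
function collectVmNames(connections) {
    var names = [];
    for (var i = 0; i < connections.length; i++) {
        var vms = connections[i].allVirtualMachines;
        for (var j = 0; j < vms.length; j++) {
            // each vm is an object; name is just one of its properties
            names.push(connections[i].name + "/" + vms[j].name);
        }
    }
    return names;
}

// Hypothetical connections standing in for VcPlugin.allSdkConnections
var sampleConnections = [
    { name: "vcenter-a", allVirtualMachines: [{ name: "web01" }, { name: "db01" }] },
    { name: "vcenter-b", allVirtualMachines: [{ name: "app01" }] }
];

var vmNames = collectVmNames(sampleConnections);
```

Inside vRO you would pass VcPlugin.allSdkConnections instead of the sample data.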

Enjoy and let me know if I can help.