Wednesday, November 26, 2014

Creating a vROps Capacity Dashboard for your Virtual Desktop Infrastructure!

As a part of the GUEST BLOGGERS initiative on vXpress, here is the second post by Anand Vaneswaran. In this article he gives us the dope on using vRealize Operations custom dashboards to showcase the capacity of virtual desktops managed by Horizon View and running on the vSphere platform.

In this post, he takes the concept of capacity management in vROps, mostly used with server infrastructure, and applies it to virtual desktops. The result is a 360-degree view into the capacity of your Virtual Desktop Infrastructure using vRealize Operations Manager. Here is what Anand has to say:

In my previous post, I provided instructions on constructing a high-level “at-a-glance” VDI dashboard in vRealize Operations for Horizon, one that would aid in troubleshooting scenarios. In the second of this three-part blog series, I will be talking about constructing a custom dashboard that takes a holistic view of the vSphere HA clusters that run my VDI workloads, in an effort to understand current capacity. The ultimate objective is to put myself in a better position not only to understand my current capacity, but also to identify trends in these stats that help me forecast future capacity. In this example, I’m going to try to gain information on the following:

·       Total number of running hosts
·       Total number of running VMs
·       VM-LUN densities
·       Usable RAM capacity (in an N+1 cluster configuration)
·       vCPU to pCPU density (in an N+1 cluster configuration)
·       Total disk space used in percentage


You can either follow my lead and recreate this dashboard step-by-step, or simply use this as a guide and create a dashboard of your own for the most important capacity metrics you care about. In my environment, I have five (5) clusters comprising full-clone VDI machines and three (3) clusters comprising linked-clone VDI machines. I have decided to incorporate eight (8) “Generic Scoreboard” widgets in a two-column custom dashboard. I’m going to populate each of these “Generic Scoreboard” widgets with the relevant stats described above.




Once my widgets have been imported, I will rearrange my dashboard so that full-clone clusters occupy the left side of the screen and linked-clone clusters occupy the right side. Now, as part of this exercise I determined that I needed to create super metrics to calculate the following:

·       VM-LUN densities
·       Usable RAM capacity (in an N+1 cluster configuration)
·       vCPU to pCPU density (in an N+1 cluster configuration)
·       Total disk space used in percentage

With that being said, let’s begin! The first super metric I will create will be called SM – Cluster LUN Density. I’m going to design my super metric with the following formula:

sum(This Resource:Deployed|Count Distinct VM)/sum(This Resource:Summary|Total Number of Datastores)




In this super metric I will find out how many VMs reside in my datastores on average. The objective is to make sure I’m abiding by the recommended configuration maximums for the number of virtual machines residing on a single VMFS volume.
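To make the arithmetic concrete, here is a minimal Python sketch of the same calculation. The input numbers are hypothetical stand-ins for the vROps metrics named in the formula:

```python
# Sketch of the VM-to-LUN density super metric, with illustrative
# (hypothetical) inputs rather than live vROps values.
distinct_vms = 480   # Deployed | Count Distinct VM, summed for the cluster
datastores = 12      # Summary | Total Number of Datastores

# Average number of VMs residing on each datastore in the cluster
lun_density = distinct_vms / datastores
print(lun_density)   # 40.0
```

If this number creeps toward your configuration maximum per VMFS volume, it is a signal to add datastores before adding more desktops.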

The next super metric I will create is called SM – Cluster N+1 RAM Usable. I want to calculate the usable RAM in a cluster in an N+1 configuration. The formula is as follows:

(((sum(This Resource:Memory|Usable Memory (KB))/sum(This Resource:Summary|Number of Running Hosts))*.80)*(sum(This Resource:Summary|Number of Running Hosts)-1))/1048576




Okay, so clearly there is a lot going on in this formula. Allow me to try to break it down and explain what is happening under the hood. I’m calculating this stat for an entire cluster. So what I will do is take the usable memory metric (installed) under the Cluster Compute Resource Kind. Then I will divide that number by the total number of running hosts to give me the average usable memory per host. But hang on, there are two caveats here that I need to take into consideration if I want an accurate representation of the true overall usage in my environment:

1)     I don’t think I want my hosts running at more than 80 percent capacity when it comes to RAM utilization. I always want to leave a little buffer, so my utilization factor will be 80 percent, or .8.
2)     I always want to account for the failure of a single host (in some environments, you might want to factor in the failure of two hosts) in my cluster design so that compute capabilities for running VMs are not compromised in the event of a host failure. I’ll want to incorporate this N+1 cluster configuration design in my formula.

So, I will take my overall usable, or installed, memory (in KB) for the cluster, divide that by the number of running hosts in said cluster, then multiply that result by the .8 utilization factor to arrive at a number – let’s call it x – the amount of real usable memory I have per host. Next, I’m going to multiply x by the total number of hosts minus 1, which will give me y. This takes into account my N+1 configuration. Finally, I’m going to take y, still in KB, and divide it by 1,048,576 (1024 x 1024) to convert it to GB and get my final result, z.
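The step-by-step breakdown above can be sketched in a few lines of Python. The host count and memory figures below are hypothetical, chosen only to illustrate the arithmetic:

```python
# Sketch of the N+1 usable-RAM super metric, with illustrative numbers:
# 8 running hosts with 256 GB installed each, expressed in KB as vROps does.
usable_memory_kb = 8 * 256 * 1024 * 1024  # Memory | Usable Memory (KB), cluster-wide
running_hosts = 8                         # Summary | Number of Running Hosts
utilization_factor = 0.8                  # leave a 20 percent buffer per host

per_host_kb = usable_memory_kb / running_hosts
x = per_host_kb * utilization_factor      # real usable memory per host
y = x * (running_hosts - 1)               # N+1: assume one host has failed
z = y / (1024 * 1024)                     # KB -> GB

print(round(z, 1))  # 1433.6 GB of usable RAM for the cluster
```

If z ever drops below the RAM actually consumed by running desktops, the cluster can no longer absorb a host failure at the target utilization.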

The next super metric I will create is called SM – Cluster N+1 vCPU to Core Ratio. The formula is as follows:

sum(This Resource:Summary|Number of vCPUs on Powered On VMs)/((sum(This Resource:CPU Usage|Provisioned CPU Cores)/sum(This Resource:Summary|Total Number of Hosts))*(sum(This Resource:Summary|Total Number of Hosts)-1))



In this formula, I want to know my vCPU to physical core ratio. Now, it’s great to know this detail under normal operational circumstances when things are fine and dandy, but what would happen in the event of a host failure? How would that affect the vCPU-to-pCPU ratio? To that end, I want to incorporate this condition into my super metric. My formula finds the overall number of vCPUs on my powered-on VMs and divides that number by my total number of hosts minus 1 (for N+1), multiplied by the number of physical cores per host.
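As a quick sanity check, the same calculation in Python, again with hypothetical inputs (1,200 vCPUs across 8 hosts of 32 cores each):

```python
# Sketch of the N+1 vCPU-to-core super metric, with illustrative numbers.
vcpus_powered_on = 1200   # Summary | Number of vCPUs on Powered On VMs
provisioned_cores = 256   # CPU Usage | Provisioned CPU Cores (8 hosts x 32 cores)
total_hosts = 8           # Summary | Total Number of Hosts

cores_per_host = provisioned_cores / total_hosts
surviving_cores = cores_per_host * (total_hosts - 1)  # cores left after one host fails
ratio = vcpus_powered_on / surviving_cores

print(round(ratio, 2))  # ~5.36 vCPUs per physical core under N+1
```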

The next super metric I will create is called SM – Cluster HD Percent Used (Datastore Cluster). This is for my full clone VDI Clusters, which make use of the datastore clusters feature. The formula is as follows:

sum(This Resource:Capacity|Used Space (GB))/sum(This Resource:Capacity|Total Capacity (GB)) * 100



This formula is fairly self-explanatory. I’m taking the total space used for that datastore cluster and dividing that by the total capacity of that datastore cluster. This is going to give me a number greater than 0 and less than 1, so I’m going to multiply this number by 100 to give me a percentage output.

Once I have created the super metrics I want, I will attach them to a package called SM – Cluster SuperMetrics.




The next step would be to tie this package to current Cluster resources as well as Cluster resources that will be discovered in the future. Navigate to Environment > Environment Overview > Resource Kinds > Cluster Compute Resource. Shift-select the resources you want to edit, and click on Edit Resource.


Click the checkbox to enable “Super Metric Package”, and from the drop-down select SM – Cluster SuperMetrics.



To ensure that this SuperMetric package is automatically attached to future Clusters that are discovered, navigate to Environment > Configuration > Resource Kind Defaults. Click on Cluster Compute Resource, and on the right pane select SM – Cluster SuperMetrics as the Super Metric Package.




Now that we have created our super metrics and attached the super metric package to the appropriate resources, we are ready to begin editing our “Generic Scoreboard” widgets. I will show you how to edit two widgets (one for a full-clone cluster and one for a linked-clone cluster) with the appropriate data and show their output. We will then replicate the same procedure for every unique full-clone and linked-clone cluster. Here is an example of what the widget for a full-clone cluster should look like:



And here’s an example of what a widget for a linked-clone cluster should look like:




Once we replicate the same process and account for all of our clusters, our end-state dashboard should resemble something like this:



And we are done. A few takeaways from this lesson:

·      We delved into the concept of super metrics in this tutorial. Super metrics are awesome resources that give you the ability to manipulate metrics and display just the data you want. In our examples we created some fairly involved formulas, but a very simple example of why a super metric can be particularly useful is memory: vRealize Operations Manager displays memory metrics in KB, but how do we get it to display GB? Super metrics are your solution here.

·       Obviously, every environment is configured differently and therefore behaves differently, so you will want to tailor the dashboards and widgets according to your environment needs, but at the very least the above examples can be a good starting point to build your own widgets/dashboards.

In my next tutorial, I will walk through the steps for creating a high-level “at-a-glance” VDI dashboard that your operations command center team can monitor. In most organizations, IT issues are categorized by severity and then assigned to the appropriate parties by a central team that runs point on issue resolution, coordinating with different departments. What happens if a Severity 1 issue afflicts your VDI environment? How are these folks supposed to know what to look for before placing that phone call to you? This upcoming dashboard will make it very easy. Stay tuned!!

Thanks once again to Anand for sharing his experiences using vROps to monitor virtual desktop environments. Please leave your comments, thoughts or questions if you have any. To be a guest blogger at vXpress, see this page.


SHARE & SPREAD THE KNOWLEDGE :-)