Monday, August 21, 2017

Demystifying vRealize Operations Data Collection

Around 3 years ago I wrote an article explaining the data collection process of vRealize Operations Manager. In that article, I wrote about the path the data follows from source to destination. Fast forward 3 years and I still get questions about the vROps data collection process and its granularity.

People have different notions about this, so let me share the secret sauce:

Out of the box, vRealize Operations Manager is configured to collect data from the source every 5 (FIVE) minutes. The collector wakes up every 5 minutes and fetches the last 15 samples, each covering a 20-second interval, from the source.


15 Data Points x 20 Seconds = 300 Seconds (5-minute collection cycle)


Once the 15 samples are in, we average them to come up with the single value that is saved in the FSDB (File System Database) of vROps. In parallel, operations such as threshold checking and computed metric calculation take place.
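To make the roll-up concrete, here is a minimal sketch of the averaging step in Python. It is an illustration only, not the actual vROps collector code: the function name `rollup` and the CPU usage values are made up for the example.

```python
from statistics import mean

SAMPLE_INTERVAL_SECONDS = 20  # granularity of the samples at the source
SAMPLES_PER_CYCLE = 15        # 15 x 20 s = 300 s = one 5-minute cycle


def rollup(samples):
    """Average one collection cycle of 20-second samples into one data point.

    Hypothetical helper for illustration; it only mirrors the averaging
    described in the text.
    """
    if len(samples) != SAMPLES_PER_CYCLE:
        raise ValueError(f"expected {SAMPLES_PER_CYCLE} samples, got {len(samples)}")
    return mean(samples)


# Made-up CPU usage (%) samples for one 5-minute cycle, with one short spike.
cpu_usage = [10, 12, 11, 95, 13, 12, 10, 11, 12, 13, 11, 10, 12, 11, 12]

data_point = rollup(cpu_usage)
print(f"Value stored for this cycle: {data_point}")
```

Note how the single 95% spike is smoothed away: only the average of the 15 samples is kept for the cycle.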

As we all know, you can further transform this data in views and super metrics by using functions such as Min, Max, Standard Deviation etc. All these functions run on the data saved in the FSDB, which is the 5-minute data point explained above.

These transformations are NOT run on the 20-second samples. The 20-second samples are in fact used only to calculate the 5-minute data points and are dropped as soon as that calculation is done. This reduces the storage requirement drastically.
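A small Python sketch shows why this distinction matters for a function like Max. The sample values are invented, and the code only mimics the averaging described in this post; it is not how vROps is implemented internally.

```python
from statistics import mean

# Two made-up 5-minute cycles of 20-second samples (15 samples each).
cycles = [
    [10.0] * 14 + [100.0],  # mostly idle, one 20-second spike
    [12.0] * 15,            # steady
]

# What gets stored: one averaged data point per cycle.
stored_points = [mean(cycle) for cycle in cycles]

# Max as a view or super metric sees it: computed over the stored points.
max_of_stored = max(stored_points)

# The peak of the raw samples, which is no longer available once they are dropped.
raw_peak = max(sample for cycle in cycles for sample in cycle)

print(f"Stored 5-minute points: {stored_points}")
print(f"Max over stored points: {max_of_stored}")
print(f"Peak in raw 20-second samples: {raw_peak}")
```

In this example Max reports 16.0 even though one raw sample reached 100.0, because the function only ever sees the 5-minute averages.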

Also, it is possible to change this default collection cycle from 5 minutes down to 1 minute. However, it is not recommended unless you understand the overall impact on compute and storage of the increased collection and processing.
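As a rough feel for what the shorter cycle changes, the arithmetic can be sketched in Python as below. It assumes the source samples stay at 20 seconds regardless of the cycle length (as noted in the replies in the comments) and counts only data points per metric per day; it does not model actual disk or CPU usage, and the function names are made up for the example.

```python
SAMPLE_INTERVAL_SECONDS = 20  # source sample granularity (assumed fixed)


def samples_per_cycle(cycle_minutes):
    """Number of 20-second samples averaged into one stored data point."""
    return (cycle_minutes * 60) // SAMPLE_INTERVAL_SECONDS


def data_points_per_day(cycle_minutes):
    """Number of data points stored per metric per day."""
    return (24 * 60) // cycle_minutes


for minutes in (5, 1):
    print(
        f"{minutes}-minute cycle: {samples_per_cycle(minutes)} samples per data point, "
        f"{data_points_per_day(minutes)} data points per metric per day"
    )
```

Going from 5 minutes to 1 minute means each stored value is an average of 3 samples instead of 15, and five times as many data points are stored and processed per metric.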

Hope this helps!!




7 comments:

  1. Hello,

    Thanks for the article.

    But I am confused about the statement that the collection cycle can be set to 1 minute.

    I tried that for a specific vSphere object without success.
    See also:
    https://communities.vmware.com/thread/565824

    Best regards
    Markus

    Replies
    1. I tested that in my vROps environment running 6.6 and I was able to set all objects in the adapter to 1 minute without any issues. I will look at the community article.

  2. Hello,

    could you please describe how data collection works for a 1-minute interval?

    Thanks a lot.

    Regards

    Martin

    Replies
    1. This blog has the steps. It's working fine for me. - https://blogs.vmware.com/management/2016/06/customizing-vrops-creating-relationships-and-changing-collection-intervals.html

    2. Hello Sunny,

      it's not a problem to set it up; I would say that part is working as expected. My issue is that I'm seeing missing data points during certain events (a VM restarted by vSphere HA). VMware support told me that a 1-minute collection interval is not a recommended configuration and that it can sometimes lead to missing data points. If I were seeing random missing data points I would believe that is the cause, but currently I'm seeing them only when a VM is restarted by vSphere HA. What I would like to achieve is to monitor VM availability, and with a 5-minute collection interval this kind of issue is probably never visible. A VM gets restarted in less than 2 minutes most of the time, and if the averaging of values works as you described and the value is 0 or 1, the expected result in this case will be 1, meaning the VM was available (even though it was powered off during the event). On the other hand, with a 1-minute collection interval I'm seeing missing data points, so neither 0 nor 1.

      What I would like to know is whether every collection interval (1, 5, 10 minutes) still uses 15 collection samples, or whether the samples are always based on 20 seconds.

      Thanks a lot.

      With regards

      Martin

    3. The data samples are based on 20 seconds, since vCenter collects at that frequency. Hence, in one minute there would be three 20-second data points.

      On the issue of the VM restarting due to HA, I believe vCenter loses access to the VM, as it gets disconnected from vCenter during an HA event. In a disconnected state, vCenter has no way to collect that data for vROps, and hence you see the missing data points. This is my assumption, but GSS might be able to validate it.

      By the way, for measuring uptime you might want to see this blog if you haven't already: http://virtual-red-dot.info/vm-availability-monitoring/

    4. Perfect, thanks for the explanation.

      That makes more sense to me; I will have to look at it further.

      Iwan's blog post was my starting point, a really useful source for VM availability monitoring, but even he mentions a limitation within the 5-minute interval. And that's something I don't understand. Is the PowerOn metric collected or calculated in a different way? He states that it doesn't matter if the VM is up for 4:59; what matters is the 5th minute.
