Wednesday, April 29, 2015

Part 14: Can I deploy vROps Cluster Nodes Across Two Sites?

YOU CAN, BUT YOU MAY NOT!!

This might be the weirdest way to begin a blog post, but I wanted to put the word out loud and clear! Since the release of vRealize Operations 6.0, vROps can be deployed in a cluster architecture with multiple nodes, which brings resiliency and scalability to the solution: you scale out simply by adding a node to an existing cluster. I have discussed this architecture and its benefits in an earlier article.



In that article I also highlighted that you should deploy a vRealize Operations Manager cluster within a single site, as that is a requirement for deploying the solution. Since then, there have been a number of occasions when I have been asked this question, as people want to use the cluster architecture to their advantage and deploy the solution across sites to ensure that the cluster stays alive even if one of the sites fails completely.

A cross-site cluster deployment will not work if the cluster has more than 2 nodes. vROps distributes data randomly across the nodes in the cluster and stores a copy of it on some other node, and there is no way to guarantee that the two copies always end up in different sites (you cannot control that). Imagine a 4-node cluster where N1 and N2 are in Site A while N3 and N4 are in Site B. A metric collected by N1 can be replicated to N2, N3 or N4, and you have no control over which one. If the copy lands on N2 and Site A fails, both copies are gone and you lose that data. Hence this cannot be a cross-site solution.
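The 4-node example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: every piece of data has exactly one primary and one copy, and any pair of distinct nodes is a possible placement. The function names and the site map are hypothetical, and this is not how vROps itself decides placement.

```python
import itertools

# Hypothetical layout from the example: two nodes per site.
SITE_OF = {"N1": "A", "N2": "A", "N3": "B", "N4": "B"}


def replica_placements(nodes):
    """Every (primary, copy) pair the cluster could pick (assumed: any two distinct nodes)."""
    return list(itertools.permutations(nodes, 2))


def lost_on_site_failure(site_of, failed_site):
    """Placements where both the primary and its copy sit in the failed site."""
    return [
        (primary, copy)
        for primary, copy in replica_placements(sorted(site_of))
        if site_of[primary] == failed_site and site_of[copy] == failed_site
    ]


placements = replica_placements(sorted(SITE_OF))
lost = lost_on_site_failure(SITE_OF, "A")
print(f"{len(lost)} of {len(placements)} placements lose data if Site A fails: {lost}")

# With only two nodes, one per site, the copy is always in the other site.
two_node = {"N1": "A", "N2": "B"}
print("2-node cluster, placements lost if Site A fails:", lost_on_site_failure(two_node, "A"))
```

In this sketch, 2 of the 12 possible placements in the 4-node layout keep both copies in Site A, while the 2-node layout has none. That is the reason a 2-node cluster is the only case that could survive a site failure at all.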


While I say that a 2-node cluster would work, I MUST HIGHLIGHT that it is not supported by VMware, hence you should ideally not deploy a vROps cluster across sites.

I have had arguments where the definition of a site was quite vague, and I have been told that the sites are within the same campus and so on. VMware did not define a site in the first release of vROps 6.0. With the release of vROps 6.0.1, the definition was again missing from the documentation; however, this was later highlighted in the RELEASE NOTES of 6.0.1 as a documentation bug. Let's have a look at what the release notes say:



The technical reason behind the requirement of less than 1 ms latency and at least 1GB bandwidth is how Gemfire works. The in-memory database used in the analytics layer of vROps runs on a product called Gemfire. Gemfire cannot withstand latency of more than a millisecond; anything higher might result in failure of the cluster nodes or corruption of data.
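If you want a rough idea of whether two locations even come close to that latency figure, a quick check can be sketched as below. This is an approximation under assumptions: it times a TCP connect, which is roughly one network round trip plus some overhead, and it is not a VMware tool. The function names, host name and port in the usage note are hypothetical, and the 1 ms limit is simply the figure quoted above.

```python
import socket
import time

MAX_LATENCY_MS = 1.0  # limit quoted in this post; adjust to the release notes you follow


def tcp_connect_ms(host, port, timeout=2.0):
    """Time one TCP connect to host:port in milliseconds (roughly one round trip)."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0


def within_latency_budget(samples_ms, limit_ms=MAX_LATENCY_MS):
    """True only if every sample stays below the limit; one slow sample fails the check."""
    return bool(samples_ms) and max(samples_ms) < limit_ms
```

For example, from one node you could collect `samples = [tcp_connect_ms("vrops-node2.example.local", 443) for _ in range(20)]` and pass the list to `within_latency_budget(samples)`. The check uses the worst sample rather than the average, since occasional spikes are what hurt a latency-sensitive cluster.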

If you wish to provide HA for the vROps solution, deploy the nodes in the same site. For remote sites, use remote collectors to capture data. For Disaster Recovery, use the backup and restore solution. While Site Recovery Manager sounds like another way to provide DR, it is not certified to work with vROps at this moment. I think SRM would be a practical way to provide disaster recovery capabilities, and the day it is certified you will no longer have to worry about deploying cross-site vROps clusters. I must also add that VMware should look at making cross-site clusters a practical solution as well, since this would help the Metro-Cluster type of deployments, which are popular in Near-DR scenarios.

Let's wait and watch as to what is in store for the future!!

Till then: please deploy what is supported.


Share & Spread the Knowledge!!




5 comments:

  1. Hi Sunny,

    as vROps 6.1 was released recently and the backend database was changed from Gemfire to Cassandra, I'm wondering if a deployment of a two-node vROps cluster across two data centers is now officially supported, since the release notes no longer mention the restriction to deploy within the same data center?

    cheers,

    Ronny

  2. Hey Ronny,

    This remains the same. There is still no support for deployment across sites.

    Regards
    Sunny
