Thursday, March 3, 2016

vROps Backup - DO NOT quiesce the nodes

vRealize Operations Manager is quickly becoming the backbone of your SDDC, and it is important that you always have a sound backup methodology for vROps so that you can recover from a disaster.

vRealize Operations Manager cannot be backed up through a regular agent-based backup. VMware supports only a specific way to back up the appliance: you need to use the vStorage APIs for Data Protection (VADP). This backup methodology leverages the vSphere snapshot mechanism, which allows the base disk of a virtual machine to be backed up while current I/O is written to a redo log. Once the backup is completed, the redo log is committed to the base disk and you have a successful backup.

It is important to remember that VMware provides some basic guidelines for backing up vROps. It is highly recommended that you follow them, both to avoid outages on vROps and to ensure a proper restore in case of a disaster.

At this time I would like to point you to this document.

Quoting from this document:

To minimize vRealize Operations Manager downtime and data loss in an event of failure, back up on a regular basis. In this way, if your system fails, you can recover it by restoring to the last full or incremental backup.
You can backup and restore vRealize Operations Manager single or multi-node clusters by using vSphere Data Protection or other backup tools. You can perform full, differential, and incremental backups and restores of virtual machines.

All nodes are backed up and restored at the same time. You cannot back up and restore individual nodes.

Be aware of these prerequisites when you back up vRealize Operations Manager systems by using any tool:

Disable quiescing.
Verify that all nodes are powered on and are accessible while the backup is taking place.

Be aware of these guidelines when you back up vRealize Operations Manager systems by using any tool:

Use a resolvable host name and a static IP address for all nodes.
Back up the entire virtual machine. You must back up all VMDK files that are part of the virtual appliance.
Do not stop the cluster while performing the backup.
Do not perform backup while dynamic threshold (DT) calculations are running because this might lead to performance issues or loss of nodes.
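
The prerequisites and guidelines quoted above lend themselves to a simple pre-flight check before a backup window opens. The following is only a minimal sketch in Python: `NodeState`, `backup_blockers`, and every field name are hypothetical and not part of any VMware tool, and the values are assumed to be gathered by the operator or by existing monitoring.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeState:
    """Hypothetical record of one vROps node's state, filled in by the operator."""
    name: str
    powered_on: bool
    reachable: bool
    static_ip: bool
    hostname_resolves: bool


def backup_blockers(nodes: List[NodeState], quiescing_enabled: bool,
                    dt_running: bool, cluster_online: bool) -> List[str]:
    """Return the reasons a backup should not start; an empty list means go."""
    problems = []
    if quiescing_enabled:
        problems.append("quiescing is enabled in the backup policy")
    if dt_running:
        problems.append("dynamic threshold (DT) calculations are running")
    if not cluster_online:
        problems.append("cluster is stopped; keep it running during the backup")
    for node in nodes:
        if not (node.powered_on and node.reachable):
            problems.append(f"{node.name}: not powered on or not accessible")
        if not (node.static_ip and node.hostname_resolves):
            problems.append(f"{node.name}: needs a static IP and a resolvable host name")
    return problems
```

Because all nodes are backed up together, the check is meant to run over the whole cluster, and a single blocker on any node should postpone the job.
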

It is important to note that quiescing should be disabled so that you do not kill the GemFire layer, which is an in-memory database in vRealize Operations. A few days back I wrote about a white paper which helps you configure the backup of the entire vCloud Suite using NetBackup.
I would highly recommend you follow the principles defined in this white paper to have a successful backup and more importantly a successful restore of vRealize Operations Manager.
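
To make the "non-quiesced" part concrete, here is a minimal sketch of what such a snapshot request looks like at the vSphere API level, written for pyVmomi. It assumes each item in `vms` is a `vim.VirtualMachine` object that has already been looked up through a connected session; the function name and the snapshot name are made up for illustration. A VADP-based backup product issues the equivalent request itself, so this is not a replacement for your backup tool.

```python
def snapshot_nodes_without_quiescing(vms, name="vrops-backup"):
    """Request a non-quiesced, no-memory snapshot of every vROps node.

    `vms` is assumed to be an iterable of pyVmomi vim.VirtualMachine objects.
    Returns the list of vSphere task objects; the caller must wait on them.
    """
    tasks = []
    for vm in vms:
        task = vm.CreateSnapshot_Task(
            name=name,
            description="Non-quiesced snapshot for vROps backup",
            memory=False,   # do not capture guest memory
            quiesce=False,  # do not quiesce the guest file system
        )
        tasks.append(task)
    return tasks
```

The important detail is `quiesce=False`. The sketch loops over all nodes because they need to be backed up together rather than individually.
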

In case your backup software does not have a way to create a backup policy with a non-quiesced snapshot, you can use the steps mentioned in the following guide to disable quiescing at the virtual machine level.

Special thanks to my colleague Dean Ravenscroft for pointing out this limitation of some backup software and the workaround.


  1. Can we back up only the vROps customization?

    1. You can back up dashboards if you want, but in most cases the customized ones are tied to a resource UUID, which would change with a fresh install of vROps, and hence the custom exports would be unusable.