This has come up in a number of discussions with customers and a few VMware field employees, so I wanted to blog about it and use this as a standard answer going forward :-)
Use Case - An administrator wants to disable alerts on an ESXi host which has been put into maintenance mode in vCenter. This is to avoid any alerts on this ESXi host inside of vROps, while the admin wants to continue to collect the metrics on this ESXi host.
Goal - The goal is to do this automatically without any manual changes in vROps. As soon as a host goes into maintenance in vCenter, vROps should know this and should stop alerting on the host in vROps.
Solution - This can be achieved by using a custom group and a policy, with a one-time configuration.
1- Create a new policy in vROps named "Hosts in maintenance policy". This policy can be created under the default policy.
Go to Administration -> Policies -> Policy Library
2- Select the default policy and click on the + symbol to add a new policy.
3- Give it a name and description as shown below.
4- Click on Alerts and Symptom Definitions and filter the list of alerts to show only host system alerts. We want a filtered list so that we can disable these in one go.
5- Now press CTRL + A on the keyboard to select all of them. You can also click on Actions -> Select All.
6- Click on Actions -> State -> Disable
7- Click on Save and now you can see the new policy under your default policy.
8- Create a new custom group named "Hosts in Maintenance". Use the following criteria to dynamically add members to this custom group based on an ESXi host property which vROps collects every 5 minutes.
Click on Environment -> Custom Groups -> Click on the + Sign to add a new custom group.
Make sure to select the policy "Hosts in maintenance policy" which we created earlier.
9- Click on Preview to see if you are getting results. If you have any hosts in maintenance mode in your environment, you will see results like me :-)
10 - Finally go into Administration -> Policies -> Active Policies and set the newly created policy to priority rank 1.
Now, as soon as you put a host into maintenance mode in vCenter, vROps will discover the change within the next 5 minutes and add the host to the custom group. Once the host is part of the group, the maintenance policy with all alerts disabled is applied to it, and you will not see any alerts on it in vROps for as long as it is in maintenance. Once it comes out of maintenance, it is moved out of the group again. Once configured, none of this requires any manual intervention.
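To make the flow above concrete, here is a small Python sketch of the logic. It is only a simplified model of what the custom group and the policy ranking do, not how vROps implements it. The property key "runtime|maintenanceState", the value "inMaintenance", the numeric priority given to the default policy and the host names are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Assumed property key and value; check the names your vROps version exposes.
MAINTENANCE_KEY = "runtime|maintenanceState"
IN_MAINTENANCE = "inMaintenance"


@dataclass
class Policy:
    name: str
    priority: int              # 1 is the highest priority in this model
    host_alerts_enabled: bool


@dataclass
class Host:
    name: str
    properties: dict = field(default_factory=dict)


def in_maintenance_group(host: Host) -> bool:
    """Dynamic membership rule: the collected property says 'in maintenance'."""
    return host.properties.get(MAINTENANCE_KEY) == IN_MAINTENANCE


def effective_policy(host: Host, default: Policy, maintenance: Policy) -> Policy:
    """Pick the highest-priority policy among those that apply to the host."""
    candidates = [default]
    if in_maintenance_group(host):
        candidates.append(maintenance)
    return min(candidates, key=lambda p: p.priority)


# Hypothetical policies and hosts, for illustration only.
default_policy = Policy("Default Policy", priority=2, host_alerts_enabled=True)
maintenance_policy = Policy("Hosts in maintenance policy", priority=1,
                            host_alerts_enabled=False)

esx01 = Host("esx01", {MAINTENANCE_KEY: IN_MAINTENANCE})
esx02 = Host("esx02", {MAINTENANCE_KEY: "notInMaintenance"})

for host in (esx01, esx02):
    policy = effective_policy(host, default_policy, maintenance_policy)
    print(host.name, "->", policy.name,
          "| host alerts enabled:", policy.host_alerts_enabled)
```

Because membership is re-evaluated from the property each time, flipping the property back on esx01 returns it to the default policy without anyone touching the group.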
Do note that if you add any new host-related alerts in the future, you will need to make sure that they are also disabled in this policy.
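If you keep a list of your host alert definitions somewhere, the check described above boils down to a set difference. Here is a minimal Python sketch; the alert names are hypothetical and the two lists are assumed to have been gathered by hand or exported by whatever means you prefer.

```python
def still_enabled(host_alerts, disabled_in_policy):
    """Return host alert names that the maintenance policy has not disabled."""
    return sorted(set(host_alerts) - set(disabled_in_policy))


# Hypothetical alert names, for illustration only.
host_alerts = [
    "Host CPU contention",
    "Host memory contention",
    "New custom host alert",
]
disabled_in_policy = ["Host CPU contention", "Host memory contention"]

# Anything printed here still needs to be disabled in the maintenance policy.
print(still_enabled(host_alerts, disabled_in_policy))
```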
Hope this helps...
Share and spread the knowledge..
Great stuff... Thanks for this post.
Anything similar to this, or at least partially automated, for handling HPE OneView alerts? I presume a Maintenance Schedule would be needed because there is not always HPE maintenance when a host is put into maintenance mode.
It would be pretty sweet to have HPE components upstream of a host in maintenance mode (in a vROps maintenance schedule) disabled IF symptoms were triggered from that host. If from non-mm hosts, then send all HPE alerts.
I see Blue Medora's management pack can see these relationships.
- Child Relationships on Hosts, Parents, and Ports
Thanks
One option to let vROps know the HPE gear is under maintenance would be to have admins set a Custom Attribute on the ESXi Host System when starting maintenance. Something like an attribute "HPE maintenance" and a value of "yes". This attribute could be used as the criterion for an HPE Maintenance custom group in the same way as you did for maintenance mode in this article.
I see that there are some object relationships already in place without having the Blue Medora pack. I can see in the child direction (in the advanced Object Relationship widget), Host System -> HPE Server HW -> HPE Server Profile -> HPE Connection -> HPE Downlink Port. There are also some relationships going in the parent direction.
So my issue is really just to get a Custom Group to have hosts in maintenance mode AND the HPE components like in the path shown above. That would be a great start...
Thanks
Solved: The missing piece was a Super Metric to read from multiple adapter types and identify parent and child objects of host systems in maintenance mode.
Walkthrough here for a similar setup with VSAN objects:
https://vman.ch/vrops-put-host-system-children-and-parents-in-alert-blackout-when-host-is-in-maintenance-mode/