Friday, January 31, 2014

Part 5 - Architecting vSphere Networks: A Scoop From my vForum Prezo!

This is the fifth part of the series of articles I have been writing about “Architecting vSphere Environments”. For those who have not read the other parts, I would highly recommend you go through them to understand the context of this entire series and to pick up the key learnings I have gained during the design & implementation of VMware vSphere infrastructures for VMware customers. Here is the list:-

Maintaining the context from my previous articles, I will continue to stress areas which are mostly ignored, either due to high complexity or due to a lack of skill sets. As the title of this article suggests, I will speak about key considerations around designing networks for a vSphere infrastructure. In my humble opinion, a single blog article cannot do justice to the plethora of networking gotchas you need to be aware of, hence I would expect the reader of this article to have a basic understanding of the concepts of networking in a virtual environment. Needless to say, you should also understand traditional physical networks.

Let me make the task simpler for both of us by putting up a slide here which talks about the 4 major things you need to remember. These are my Do’s & Don’ts and cover the areas which are most often mis-configured and mis-designed in the environments I have seen. Let’s have a look at the slide first and then we will discuss these options in detail.

Network Layout

The first part of the slide speaks about laying out the vSphere virtual networks appropriately. The best and the worst part about doing this is that there is no right or wrong way. This is one of the areas of vSphere which is entirely dependent on your Requirements & Constraints. While you need to ensure that you meet the requirements, you cannot step out of the boundaries created by the constraints. Let us look at the key considerations you need to worry about.

  • One Role Per PortGroup – Using each port group for a single role is the ideal scenario when designing networks for a vSphere environment. In essence, you should have one VMkernel (Management) PortGroup each for the Management Network, Secondary Management Network (if available), FT, vMotion & vSphere Replication (this option is not yet active in vSphere, although it can be seen through the vSphere Web Client). This ensures that you assign an individual VMK ID to each function, which not only keeps your network settings simple but also makes them easy to troubleshoot. For Virtual Machines, use separate PortGroups for easy identification and VLAN tagging. That said, with the VLAN trunk option available on a dvSwitch you can use a single PortGroup for multiple segments (rarely used).
  • Use of VLANs – The use of VLANs is not uncommon in a vSphere infrastructure. This is the easiest method of running multiple network segments on a single physical wire. I have seen very few environments that do not use VLANs, and in most such cases the reason is a lack of networking knowledge. I have seen servers with up to 22 NIC ports and the same number of network cables going out of the server, which of course is INSANE. The simple reason behind this crazy setup was that VLANs were not used. Last but not least, don’t forget to trunk the VLANs on the physical switch ports and define the same VLAN IDs on the virtual port groups. Using VLAN 1 (the native VLAN) is another crime which should never be committed :-)
  • Standardize VMkernel interfaces across ESXi hosts – This one is the simplest but the most ignored best practice of all. Each VMK defined on a host should be identical to the VMK defined on the other hosts in the same ESXi cluster. For example, if vmk0 is Management on one host, it should be Management on all the other hosts in the cluster. The same applies to all the VMkernel interfaces. This ensures that your network configuration is easily readable. Another benefit is seen when you are writing scripts to manage the network settings of your ESXi cluster. Try to follow this practice for the IP addressing of management functions as well. It makes your life easy when you are troubleshooting during unplanned downtime.
  • Redundant NICs, a dedicated FT network, appropriate load balancing policies & appropriate NIC failover ordering are a few other configurations which can make or break the networking stack of a vSphere infrastructure.
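As a quick way to audit the layout described above, the VMkernel interfaces and port group VLAN tags on each host can be checked from the ESXi Shell. A minimal sketch, assuming ESXi 5.x with standard switches (the port group name and VLAN ID below are examples, not from this article):

```shell
# List all VMkernel interfaces on the host -- vmk numbering, port group
# and MAC for each. Run this on every host in the cluster and compare;
# the vmk-to-role mapping should be identical across hosts.
esxcli network ip interface list

# Show the port groups on the standard switch along with their VLAN IDs,
# to confirm the tags match what is trunked on the physical switch port.
esxcli network vswitch standard portgroup list

# Example: correct the VLAN ID on a port group (name and ID are examples).
esxcli network vswitch standard portgroup set --portgroup-name="vMotion" --vlan-id=20
```

Because these are plain esxcli calls, the same checks are easy to script across a cluster, which is exactly where the standardized vmk numbering pays off.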

Enterprise Plus Licensing – Do you use a dvSwitch? 

If you are one of the many who have Enterprise Plus licensing on vSphere but are not using the Distributed Virtual Switch, you, my friend, are wasting the investment made in the vSphere platform. The reason vSphere is the number one hypervisor in the world is the amazing set of features that comes with the platform, and the Distributed Virtual Switch, a.k.a. DVS, is definitely one of them. In my experience, 6 out of every 10 Enterprise Plus licensed customers I see are still running on the VSS (Virtual Standard Switch). Although a VSS might meet your requirements, it does not have the intelligence of a DVS. While managing an environment with a DVS is simple due to its centralized control plane, features like health check & backup/restore are priceless. Health Check detects mis-configurations such as VLAN, MTU & teaming mismatches against the physical network, and the network rollback feature automatically reverts bad changes, which reduces the risk of human error — by the way, the single biggest reason for most of the issues you will see in any IT infrastructure. The DVS also allows you to back up its configuration and restore it in case you need to replicate environments or rebuild from scratch.
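For reference, each host’s view of a Distributed Virtual Switch can be inspected from the ESXi Shell; the health check and backup/restore features themselves are driven from the vSphere Web Client, not the host CLI. A quick sketch, assuming ESXi 5.1 or later:

```shell
# Show the distributed switch(es) this host participates in, including
# the uplinks assigned to the host and the host-side proxy switch details.
# Useful for confirming every host in the cluster sees the same DVS
# configuration before and after a migration from the VSS.
esxcli network vswitch dvs vmware list
```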

While I am pushing all my customers towards the dvSwitch, I would highly recommend you plan a move as well in case you are still on the VSS. While you are free to move all the PortGroups to the dvSwitch, some people prefer to keep the management networks (Heartbeat/FT/vMotion etc.) on the VSS while they move the Virtual Machine PortGroups, or data traffic, to the DVS. I completely support this philosophy as well, especially when you do not have any redundancy for your vCenter Server. It allows you to play around with the network settings of an ESXi host by logging into it directly with the vSphere Client, even if the vCenter Server is down and not recoverable. That is a bad situation to be in, but certainly not an unrecoverable one.

Read the following knowledge base article from VMware, which is a great source for clearly laying out the differences between Standard & Distributed Virtual Switches - Overview of vNetwork Distributed Switch concepts.

Convergence is the way forward - Move to 10G Networks

I know for a fact that many might challenge this thought. However, in my experience across the number of projects I have worked on, customers with a converged infrastructure are more at ease and have greater peace of mind compared to those who have their network all over the place.

While I say this, I would only encourage you to make this move if you have planned IT budgets for it, as the transformation can be overkill — in most cases your old hardware would go to waste. With 10G, you have the option of either using the convergence and management platform available from the hardware vendors, or using features like the Distributed Virtual Switch / Network I/O Control to manage the bandwidth requirements of your workloads.

Here are a couple of articles which I wrote around using 10G cards for vSphere networking. I would urge you to read them to see how things become simple with the implementation of converged networks.

Some More Quick Tips Around vSphere Networking
  • Be a miser while provisioning vSwitches, PortGroups or, for that matter, any virtual hardware. Please remember that they might appear to be FREE, but they do use CPU cycles and memory, since they are nothing but pieces of code which run as soon as you create a new object. The idea is not to stop you from provisioning, but to make sure that you do not provision what you do not require. This also helps when you are trying to troubleshoot issues.
  • vMotion has been around for a while, but the options to configure vMotion keep evolving. Use these options to ensure that you have fast vMotion networks which will allow you to load balance clusters, evacuate hosts & run maintenance tasks way faster than you have been doing. Use Multi-NIC vMotion to do this. There are a number of articles on Multi-NIC vMotion by Frank Denneman which I refer to. Go use them, they are awesome :-)
  • This is another important tip, organizational and operational in nature. You need to involve your networking team while choosing, designing & implementing your vSphere infrastructure. The more the networking experts are involved, the better your networking stack will support your virtual infrastructure. Also, as I see convergence in the datacenter, I also see convergence in the IT teams. Networking & storage teams are getting trained on vSphere, and Wintel/VMware teams are getting trained on the networking and storage platforms. This change is welcome, and my suggestion to you would be to follow it ASAP so that you, your teams and your organization are able to cope with this paradigm shift.
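On the Multi-NIC vMotion tip above, here is a rough sketch of the host-side setup from the ESXi Shell, assuming ESXi 5.1 with standard switches; the port group names, IP address and vmnic names are examples, not from this article:

```shell
# Add a second vMotion VMkernel interface on its own port group,
# then give it a static IP on the vMotion subnet.
esxcli network ip interface add --interface-name=vmk2 --portgroup-name="vMotion-02"
esxcli network ip interface ipv4 set --interface-name=vmk2 --ipv4=192.168.50.12 --netmask=255.255.255.0 --type=static

# Pin each vMotion port group to a different active uplink, with the
# other NIC as standby, so both links carry vMotion traffic in parallel.
esxcli network vswitch standard portgroup policy failover set --portgroup-name="vMotion-01" --active-uplinks=vmnic2 --standby-uplinks=vmnic3
esxcli network vswitch standard portgroup policy failover set --portgroup-name="vMotion-02" --active-uplinks=vmnic3 --standby-uplinks=vmnic2

# Enable vMotion on the new interface (esxcli tagging works on 5.1+;
# on ESXi 5.0 use: vim-cmd hostsvc/vmotion/vnic_set vmk2).
esxcli network ip interface tag add --interface-name=vmk2 --tagname=VMotion
```

The key design point is the inverted active/standby order on the two port groups: it keeps redundancy while letting vMotion stripe traffic across both physical NICs.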

With this, I will close this article. Hope this helps you make the right choices while choosing the network configuration for your infrastructure. Please share your comments, thoughts & feedback around this series.

Share & Spread the Knowledge!!

Follow on Twitter -     @Sunny_Dua

LinkedIn - duasunny
