VMware vSphere - What is the correct way to right-size a virtual machine? (Situation 1: Sizing for an RFP Response)
Working in the field with many VMware customers and partners, I am often asked how to size virtual machines. I answer this question in different ways, because the topic is quite subjective and the right approach changes with the situation.
Disclaimer:- These sizing methodologies are based on experience and best practices, so there is no absolute right or wrong here. They are scenario-based; use them as a guideline and not as a rule of thumb.
For ease of understanding, let me break the discussion down into different situations. In this article I will cover the first situation.
Situation 1: Sizing for an RFP response
This section is for consultants who respond to RFPs whose scope includes Server Virtualization and Consolidation. Usually in an RFI or RFP situation, the amount of information given by the customer is limited. On the basis of this limited information, the customer expects you to size and price the target virtual infrastructure. This situation can be divided into 2 scenarios:-
Scenario A - The inventory of existing servers is available. Such an inventory includes data points like Server Count, CPU / Cores per Server, Provisioned Memory, Provisioned Storage etc. This data may or may not include the utilization data for these servers. Usually customers pull this inventory out of an existing tool in the infrastructure such as ITM, SCOM etc.
Let's see an example which will help us size a single server with this information:-
The current physical server has the following data available:-
Server Name - WEBROLE1
CPU Specs - 2 CPU, Dual core, 2.4 GHz
Memory - 8 GB
HDD - 500 GB
OS - Windows 2003 Standard Edition
Utilization data (Assuming it is available)
Avg CPU Utilization - 5%
Avg Memory Utilization - 8%
Disk Utilization - 47%
Now let's look at the average utilization in real terms:-
Total CPU MHz available on the server = 2 CPU x 2 Cores x 2400 MHz = 9600 MHz
Average used in MHz = 5% of 9600 MHz = 480 MHz
Total Memory available on the server = 8192 MB
Average used in MB = 8% of 8192 MB = 656 MB (655.36 MB, rounded up)
Total Disk available on the server = 500 GB
Used in GB = 47% of 500 GB = 235 GB
So if we express the utilization in absolute terms, we get the following figures:-
CPU = 480 MHz
Memory = 656 MB
Disk = 235 GB (Actual Used)
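The conversion from percentages to absolute figures is simple enough to script when you have an inventory of hundreds of servers. Below is a minimal Python sketch of the arithmetic above. The function name `absolute_usage` and its parameters are hypothetical names chosen for this illustration, and rounding up is an assumption that matches the 656 MB figure in the example.

```python
import math

def absolute_usage(sockets, cores_per_socket, core_mhz, mem_mb, disk_gb,
                   cpu_pct, mem_pct, disk_pct):
    """Convert utilization percentages into absolute MHz / MB / GB figures."""
    total_mhz = sockets * cores_per_socket * core_mhz
    return {
        # Round up so the sized value never falls below the measured usage.
        "cpu_mhz": math.ceil(total_mhz * cpu_pct / 100),
        "mem_mb": math.ceil(mem_mb * mem_pct / 100),
        "disk_gb": math.ceil(disk_gb * disk_pct / 100),
    }

# WEBROLE1 from the example: 2 CPU x dual core x 2.4 GHz, 8 GB RAM, 500 GB disk
webrole1 = absolute_usage(2, 2, 2400, 8192, 500,
                          cpu_pct=5, mem_pct=8, disk_pct=47)
print(webrole1)
```

Running the same function over every row of the customer's inventory gives the per-server figures that the rest of this scenario builds on.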
Is this enough to size the target machine? The answer is a BIG NO.
Though I said there are no THUMB RULES, there is one RULE which we should always follow:-
"Never Size on Average Utilization" - "Always Size on Peak Utilization"
Hence, if all we have is the average utilization data, the target virtual machine's CPU & Memory size CANNOT be derived from the values described above.
In such a scenario we either use the PEAK UTILIZATION values, if given by the customer, or we make assumptions and size the VM on that basis.
Assuming the values given in the example above are PEAK utilization values, the target utilization of the virtual machine would be:-
CPU = 480 MHz
Memory = 656 MB
Now, it is always a best practice to add a buffer to the Peak Utilization values. Hence add a buffer of 25% to the Peak Utilization Values:
CPU = 480 MHz + 25% = 600 MHz
Memory = 656 MB + 25% = 820 MB
There are 2 reasons why we added this 25%:
i) The peak utilization data is a single peak point collected. There could be multiple peak points across business cycles which we need to address, hence a buffer is always good.
ii) It's good to have some head room for situations where the memory utilization shoots up due to a misbehaving service, process, application etc.
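The buffer step can be written as a one-line helper. This is a small sketch only; `with_buffer` is a hypothetical name, and the 25% default simply mirrors the value used in this example.

```python
def with_buffer(peak_value, buffer_pct=25):
    """Return the peak value increased by buffer_pct percent."""
    return peak_value * (100 + buffer_pct) / 100

cpu_mhz = with_buffer(480)  # peak CPU usage in MHz
mem_mb = with_buffer(656)   # peak memory usage in MB
print(cpu_mhz, mem_mb)
```

Keeping the buffer as a parameter makes it easy to restate the sizing later if the customer asks for a different head room.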
Now let's consider the other factors which might influence this sizing:-
i) We should look at the OS requirement as well, to ensure that we do not go below it while right-sizing the virtual machine. This example uses 2 GB of RAM as the minimum for Windows 2003 Standard (confirm the figure against Microsoft's published requirements for the OS in question), hence the vRAM for the above example would be set to 2 GB, well below the 8 GB on the physical server.
ii) For the number of vCPUs, we see that we need only 600 MHz of processing power to meet the workload's need. However, we should also ensure that we meet the minimum OS requirement for clock speed. At the same time, the decision to introduce vSMP (more than one vCPU) in a virtual machine is a tricky one. Keep the number of vCPUs as low as possible, so that we do not introduce "CPU Ready Time" on the virtual machine. When the number of vCPUs on an ESXi server is more than the number of underlying hardware cores, time sharing is introduced. During contention, vCPUs then have to wait to be scheduled on a physical core, and that wait shows up as CPU Ready time. For more on this read Performance Best Practices for VMware vSphere 5.0. This will help you understand why more is not always better. So in the above case, since the server role was using multiple cores in the physical world, I would give it 2 vCPUs. Since this is the RFP stage, you might not have the privilege of speaking to an application owner, hence please clearly state your assumptions and reasons for taking this value.
Hence, with the above example, we were able to size the target virtual machine's CPU and Memory requirement. The Disk requirement is an open factor, and most RFPs will have information on how the customer wants to handle the storage space of the servers during and after Virtualization. Using the above method you can size all the available physical machines to derive the total CPU and Memory requirements to run the workloads post Virtualization.
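The two adjustments above (the OS memory minimum and the vCPU decision) can be sketched as a clamp on the buffered values. All names here are hypothetical. The sketch also assumes that the target host's cores run at the same 2400 MHz as the source server, and it treats the "2 vCPUs because the role already used multiple cores" decision as an explicit `min_vcpus` input, since that is a judgment call and not something the arithmetic produces.

```python
import math

def apply_floors(buffered_cpu_mhz, buffered_mem_mb, core_mhz,
                 os_min_mem_mb, min_vcpus=1):
    """Clamp a buffered requirement to OS and design minimums.

    min_vcpus is a design decision supplied by the consultant and should be
    recorded in the RFP assumptions.
    """
    vcpus_by_clock = math.ceil(buffered_cpu_mhz / core_mhz)
    vcpus = max(min_vcpus, vcpus_by_clock)
    vram_mb = max(os_min_mem_mb, math.ceil(buffered_mem_mb))
    return vcpus, vram_mb

# WEBROLE1: 600 MHz and 820 MB after the 25% buffer, 2 GB memory floor
vcpus, vram_mb = apply_floors(600, 820, core_mhz=2400,
                              os_min_mem_mb=2048, min_vcpus=2)
print(vcpus, vram_mb)
```

Note that the clock arithmetic alone asks for a single vCPU here; the second vCPU comes entirely from the stated assumption.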
Phewww.....
Coming back to where we started! What if you don't have the Peak Utilization? How do you size in such a case?
The answer is simple: increase your buffer from 25% to 35%. The additional 10% caters for the peaks in the workload. As always, mention this clearly in the assumptions.
I know what you are thinking... What if the customer does not give you the utilization data at all? This might sound like a GREY area, but trust me, I encounter such cases very regularly. In such situations, you need to take the industry assumption that any physical server which is assessed has an average utilization of 2% to 15% (this will hold good for 95% of physical servers). Hence I would recommend that you take the highest value, i.e. 15%, and add 35 percentage points on top of it (to size for Peaks + Head Room) as explained above. Based on this you would end up sizing for 50% utilization. Though this is on the higher side, you will have your chance to slim this number down during the due diligence phase of the engagement.
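The three fallbacks described so far can be collected into one small decision helper. This is a sketch under my reading of the text, not a formal method: with peak or average data the buffer is applied as a relative increase (as in 480 MHz + 25% = 600 MHz), while in the no-data case the 35% is added as percentage points so that the result is the 50% quoted above. The function name is hypothetical.

```python
def planning_utilization_pct(peak_pct=None, avg_pct=None):
    """Utilization (as % of physical capacity) to size the VM on.

    peak known:      peak + 25% buffer
    only avg known:  avg + 35% buffer (25% head room + 10% for peaks)
    nothing known:   assume 15% average, add 35 points -> 50%
    """
    if peak_pct is not None:
        return peak_pct * 125 / 100
    if avg_pct is not None:
        return avg_pct * 135 / 100
    return 15 + 35

# WEBROLE1 with no utilization data at all: size on 50% of 9600 MHz
total_mhz = 2 * 2 * 2400
no_data_mhz = total_mhz * planning_utilization_pct() / 100
print(no_data_mhz)
```

Whichever branch you end up in, state it in the assumptions section of the response.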
I guess the last possibility for an RFP is a scenario where the customer does not give you any data at all. The statement or the end goal of the customer might just say that they want to Virtualize 1000 physical servers. That brings us to Scenario B.
Let's understand this by using an example:-
XYZ LTD. has floated an RFP with a goal to virtualize 1000 Physical servers in its datacenter. The total usable capacity of the storage being used is 10 TB.
With this information, we need to find out the compute requirements of the virtual machines running in the end state. Let's see how we break down this problem:-
First we have to assume the sizes of the virtual machines which might come up in the end state. As an assumption, we divide the virtual machines in the environment into 3 categories based on size:-
Table - 1
Virtual Machine Type | # of vCPUs | vRAM (GB) |
Large | 8 | 16 |
Medium | 4 | 8 |
Small | 2 | 4 |
After this we should break down the environment into 4 different categories, if they are not specified by the customer:-
Table - 2
Workload Type | Category | Percentage |
Platinum | Business Critical Workloads | 10% |
Gold | Production Workloads | 40% |
Silver | Infrastructure Workloads | 20% |
Bronze | Test/Dev/UAT | 30% |
Based on the table above if we divide the 1000 servers into the defined categories, we will get the following values:-
Table - 3
Workload Type | Category | # of Servers |
Platinum | Business Critical Workloads | 100 |
Gold | Production Workloads | 400 |
Silver | Infrastructure Workloads | 200 |
Bronze | Test/Dev/UAT | 300 |
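Going from Table - 2 to Table - 3 is a straight percentage split. Here is a short Python sketch; the dictionary and function names are hypothetical, and whole-number percentages are used so that the arithmetic stays exact.

```python
TOTAL_SERVERS = 1000

# Table - 2: share of each workload type, in percent (assumptions)
WORKLOAD_SHARE_PCT = {"Platinum": 10, "Gold": 40, "Silver": 20, "Bronze": 30}

def split_servers(total, share_pct):
    """Split a server count across workload types by whole-number percentages."""
    if sum(share_pct.values()) != 100:
        raise ValueError("workload shares must add up to 100%")
    # Integer division; for totals that do not divide evenly, any remainder
    # has to be assigned to a category by hand.
    return {name: total * pct // 100 for name, pct in share_pct.items()}

servers_by_type = split_servers(TOTAL_SERVERS, WORKLOAD_SHARE_PCT)
print(servers_by_type)
```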
Now these tables should go into your assumptions, and the numbers can absolutely change based on the data given to you by the customer. If I have to define the workload types, here are some examples of each:-
Platinum - Production Databases, ERP, SAP, Email & Collaboration, other customer apps etc.
Gold - Web Servers, Applications (packaged or custom), vCenter, CRM etc.
Silver - Directory Services, Anti Virus, DHCP, DNS, Security, Patch Management etc.
Bronze - Test, UAT & Dev environments.
The next task is to map a Virtual Machine Type to each Workload Type. In simpler terms, what would be the sizes of the Platinum, Gold, Silver and Bronze virtual machines? We need to make some assumptions here to derive these numbers. The assumptions I have taken are listed in Table - 4. Your table might look different if you have some information from the customer about the distribution of sizes across workloads.
Table - 4
Workload Size Percentage
Workload Type | Large | Medium | Small |
Platinum | 50% | 25% | 25% |
Gold | 30% | 35% | 35% |
Silver | 20% | 40% | 40% |
Bronze | 30% | 20% | 50% |
This will give you the numbers you are looking for. Let's calculate for these 1000 servers: first the number of servers of each size in each category, and from that how many vCPUs and how much vRAM each category needs.
Table - 5
Workload Type | Large Servers | Medium Servers | Small Servers | Total Servers |
Platinum | 50 | 25 | 25 | 100 |
Gold | 120 | 140 | 140 | 400 |
Silver | 40 | 80 | 80 | 200 |
Bronze | 90 | 60 | 150 | 300 |
Total Servers | 300 | 305 | 395 | 1000 |
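Table - 5 is Table - 3 multiplied through by the percentages in Table - 4. A minimal sketch of that step follows, with hypothetical names and the tables typed in as plain dictionaries.

```python
# Table - 3: servers per workload type
SERVERS_BY_TYPE = {"Platinum": 100, "Gold": 400, "Silver": 200, "Bronze": 300}

# Table - 4: size mix per workload type, in percent (assumptions)
SIZE_MIX_PCT = {
    "Platinum": {"Large": 50, "Medium": 25, "Small": 25},
    "Gold":     {"Large": 30, "Medium": 35, "Small": 35},
    "Silver":   {"Large": 20, "Medium": 40, "Small": 40},
    "Bronze":   {"Large": 30, "Medium": 20, "Small": 50},
}

def servers_by_size(servers_by_type, size_mix_pct):
    """Apply each workload type's size mix to its server count."""
    return {
        workload: {size: servers_by_type[workload] * pct // 100
                   for size, pct in mix.items()}
        for workload, mix in size_mix_pct.items()
    }

table5 = servers_by_size(SERVERS_BY_TYPE, SIZE_MIX_PCT)
size_totals = {size: sum(row[size] for row in table5.values())
               for size in ("Large", "Medium", "Small")}
print(table5)
print(size_totals)
```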
Since we now know the number of servers of each size per category, here is the total vCPU & vRAM requirement, using the VM sizes from Table - 1:-
Workload Type | Total vCPUs | Total vRAM (GB) |
Platinum | 550 | 1100 |
Gold | 1800 | 3600 |
Silver | 800 | 1600 |
Bronze | 1260 | 2520 |
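These totals come from multiplying the server counts in Table - 5 by the VM sizes in Table - 1. A short sketch of that multiplication, with hypothetical names:

```python
# Table - 1: (vCPUs, vRAM in GB) per VM size
VM_SIZES = {"Large": (8, 16), "Medium": (4, 8), "Small": (2, 4)}

# Table - 5: number of servers of each size per workload type
TABLE5 = {
    "Platinum": {"Large": 50,  "Medium": 25,  "Small": 25},
    "Gold":     {"Large": 120, "Medium": 140, "Small": 140},
    "Silver":   {"Large": 40,  "Medium": 80,  "Small": 80},
    "Bronze":   {"Large": 90,  "Medium": 60,  "Small": 150},
}

def totals_for(counts):
    """Return (total vCPUs, total vRAM GB) for one workload type."""
    vcpus = sum(n * VM_SIZES[size][0] for size, n in counts.items())
    vram_gb = sum(n * VM_SIZES[size][1] for size, n in counts.items())
    return vcpus, vram_gb

demand = {workload: totals_for(counts) for workload, counts in TABLE5.items()}
print(demand)
```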
Now come the biggest questions: how many vCPUs per Core? Can we over-commit memory? Have you not faced such questions before? I bet you have. Here is a guideline which I tell customers to follow:-
Table - 6
Workload Type | vCPU to Core Ratio | Memory Ratio |
Platinum | 1.75 vCPU for 1 Core | 1 GB vRAM for 1 GB pRAM |
Gold | 2.5 vCPU for 1 Core | 1 GB vRAM for 1 GB pRAM |
Silver | 3 vCPU for 1 Core | 1.4 GB vRAM for 1 GB pRAM |
Bronze | 4 vCPU for 1 Core | 1.6 GB vRAM for 1 GB pRAM |
So let's see how many physical Cores and how much physical RAM we need to Virtualize XYZ LTD's physical servers (fractions are rounded up):-
Workload Type | Physical Cores | Physical RAM (GB) |
Platinum | 315 | 1100 |
Gold | 720 | 3600 |
Silver | 267 | 1143 |
Bronze | 315 | 1575 |
Total | 1617 | 7418 |
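The physical figures are the virtual totals divided by the ratios in Table - 6 and rounded up. The sketch below uses `fractions.Fraction` so that ratios like 1.6 are handled exactly before rounding up; the names are hypothetical.

```python
import math
from fractions import Fraction

# (total vCPUs, total vRAM GB) per workload type
DEMAND = {
    "Platinum": (550, 1100),
    "Gold": (1800, 3600),
    "Silver": (800, 1600),
    "Bronze": (1260, 2520),
}

# Table - 6: (vCPUs per core, GB vRAM per GB pRAM), kept as strings so that
# Fraction parses them exactly
RATIOS = {
    "Platinum": ("1.75", "1"),
    "Gold": ("2.5", "1"),
    "Silver": ("3", "1.4"),
    "Bronze": ("4", "1.6"),
}

def physical_need(demand, ratios):
    """Return (physical cores, physical RAM GB) per workload type, rounded up."""
    need = {}
    for workload, (vcpus, vram_gb) in demand.items():
        cpu_ratio, mem_ratio = (Fraction(r) for r in ratios[workload])
        need[workload] = (math.ceil(vcpus / cpu_ratio),
                          math.ceil(vram_gb / mem_ratio))
    return need

need = physical_need(DEMAND, RATIOS)
total_cores = sum(cores for cores, _ in need.values())
total_ram_gb = sum(ram for _, ram in need.values())
print(need, total_cores, total_ram_gb)
```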
Now if your target server is a Blade with 2 CPUs of 10 cores each (20 cores) and 96 GB of RAM, let's see how many such blades you would need to consolidate these 1000 Physical servers.
Total Number of Servers required = 1617 / 20 = 81 Servers approx. (80.85, rounded up)
The total physical RAM we get is = 81 x 96 = 7776 GB, which covers the 7418 GB required.
Hence to Virtualize these 1000 Physical workloads, we would need 81 ESXi servers, giving us a handsome consolidation ratio of roughly 12:1 (1000 / 81 = 12.3 approximately). Voila.. we are done here...
Though Licensing is a different topic, you would need 162 vSphere 5.0 Enterprise Plus licenses (81 servers x 2 CPUs) to license these ESXi servers.
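The final host count is worth scripting so you can quickly try different blade configurations. This is a sketch with hypothetical names. It sizes on whichever of cores or RAM needs more hosts, and it assumes one license per physical CPU, which is what the 162 figure above implies.

```python
import math

TOTAL_CORES = 1617        # physical cores required
TOTAL_RAM_GB = 7418       # physical RAM required
PHYSICAL_SERVERS = 1000   # workloads being virtualized

SOCKETS_PER_HOST = 2
CORES_PER_SOCKET = 10
RAM_PER_HOST_GB = 96

cores_per_host = SOCKETS_PER_HOST * CORES_PER_SOCKET
hosts_for_cpu = math.ceil(TOTAL_CORES / cores_per_host)
hosts_for_ram = math.ceil(TOTAL_RAM_GB / RAM_PER_HOST_GB)

# The binding constraint decides the host count (CPU in this example).
hosts = max(hosts_for_cpu, hosts_for_ram)
installed_ram_gb = hosts * RAM_PER_HOST_GB
consolidation = PHYSICAL_SERVERS / hosts
licenses = hosts * SOCKETS_PER_HOST  # assumption: one license per CPU

print(hosts, installed_ram_gb, round(consolidation, 1), licenses)
```

This count carries no spare capacity for host failures or maintenance; if the RFP asks for that, add it on top and say so in your assumptions.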
I hope this helps you size your solutions correctly and respond to those RFPs with more accurate and sensible information. I will talk about other situations in my upcoming articles. Till then..
Happy Virtualizing......
Regards
Sunny Dua