vXpress: VMware vSphere - What is the correct way to right size a virtual machine?? (Situation 1: Sizing for an RFP Response)

Being in the field and working with a lot of VMware customers and partners, I often get asked how to size virtual machines. My answer varies, because the topic is quite subjective and the right approach changes with the situation.

Disclaimer: These sizing methodologies are based on experience and best practices, so there is no single right or wrong answer. Everything here is scenario based; use it as a guideline and not as a rule of thumb.

For ease of understanding, let me break the discussion down into different situations. In this article I will cover the first one.

Situation 1: Sizing for an RFP response:

This section is for consultants who respond to RFPs whose scope includes server virtualization and consolidation. In an RFI or RFP situation the customer usually gives you limited information, yet expects you to size and price the target virtual infrastructure on that basis. This situation can be divided into 2 scenarios:

Scenario A - The inventory of existing servers is available. Such an inventory includes data points like server count, CPUs and cores per server, provisioned memory, provisioned storage etc. It may or may not include utilization data for these servers. Customers usually pull this inventory from an existing tool in the infrastructure, such as ITM or SCOM.

Let's look at an example which will help us size a single server with this information:

The current physical server has the following data available:

Server Name - WEBROLE1

CPU Specs - 2 CPU, Dual core, 2.4 GHz

Memory - 8 GB

HDD - 500 GB

OS - Windows 2003 Standard Edition

Utilization data (Assuming it is available)

Avg CPU Utilization - 5%

Avg Memory Utilization - 8%

Disk Utilization - 47%

Now let's look at the average utilization in real terms:

Total CPU MHz available on the server = 2 CPU x 2 Cores x 2400 MHz = 9600 MHz

Average used in MHz = 5% of 9600 MHz = 480 MHz

Total Memory available on the server = 8192 MB

Average used in MB = 8% of 8192 MB = 656 MB (rounded up)

Total Disk available on the server = 500 GB

Used in GB = 47% of 500 GB = 235 GB

So if we look at the utilization, we get the following specs:

CPU = 480 MHz

Memory = 656 MB

Disk = 235 GB (Actual Used)
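
The conversion from percentages to absolute numbers can be sketched in a few lines of Python. This is only an illustration of the arithmetic above; the variable names and the share() helper are mine, not part of any tool.

```python
# Inventory values for WEBROLE1, copied from the example above.
sockets = 2
cores_per_socket = 2
clock_mhz = 2400
memory_mb = 8 * 1024
disk_gb = 500

# Utilization percentages from the example.
avg_cpu_pct = 5
avg_mem_pct = 8
disk_used_pct = 47


def share(total, pct):
    """Return pct percent of total (hypothetical helper)."""
    return total * pct / 100


total_cpu_mhz = sockets * cores_per_socket * clock_mhz
used_cpu_mhz = share(total_cpu_mhz, avg_cpu_pct)
used_mem_mb = share(memory_mb, avg_mem_pct)
used_disk_gb = share(disk_gb, disk_used_pct)

print(total_cpu_mhz)  # 9600
print(used_cpu_mhz)   # 480.0
print(used_mem_mb)    # 655.36, which the article rounds up to 656
print(used_disk_gb)   # 235.0
```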

Is this enough to size the target machine? The answer is a BIG NO.

Though I said there are no THUMB RULES, there is one RULE which we should always consider.

"Never Size on Average Utilization" - "Always Size on Peak Utilization"

Hence, if all we have is the average utilization data, the CPU and memory size of the target virtual machine CANNOT be derived from the values described above.

In such a scenario we either use the PEAK UTILIZATION values, if the customer has given them, or we go into assumption mode and size the VM on stated assumptions.

Assuming the values given in the example above are PEAK utilization values, the target utilization of the virtual machine would be:

CPU = 480 MHz

Memory = 656 MB

Now, it is always a best practice to add a buffer to the Peak Utilization values. Hence add a buffer of 25% to the Peak Utilization Values:

CPU = 480 MHz + 25% = 600 MHz

Memory = 656 MB + 25% = 820 MB
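
As a small sketch of the buffer step, assuming the buffer is applied as a percentage on top of the peak value (which is what the numbers above do). The function name with_buffer is hypothetical.

```python
def with_buffer(peak, buffer_pct=25):
    """Add buffer_pct percent of headroom on top of a peak value."""
    return peak * (100 + buffer_pct) / 100


cpu_mhz = with_buffer(480)  # peak CPU in MHz from the example
mem_mb = with_buffer(656)   # peak memory in MB from the example

print(cpu_mhz)  # 600.0
print(mem_mb)   # 820.0
```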

There are 2 reasons why we added this 25%:

i) The peak utilization data is a single peak point that was collected. There could be multiple peak points across business cycles which we need to address, hence a buffer is always good.

ii) It's good to have some headroom for situations where the memory utilization shoots up due to a misbehaving service, process, application etc.

Now let's consider the other factors which might influence this sizing:

i) We should also check the OS requirement, so that right sizing does not take the virtual machine below it. In this case, for Windows 2003 Standard, Microsoft recommends a minimum of 2 GB of RAM, so the vRAM for the above example comes to 2 GB: up from the calculated 820 MB, and still well below the 8 GB of the physical server.

ii) For the number of vCPUs, we see that we need only 600 MHz of processing power to satisfy the workload, but we should still make sure the minimum OS requirement for clock speed is met. At the same time, the decision to introduce vSMP (more than one vCPU) in a virtual machine is a tricky one. Keep the number of vCPUs as low as possible, so that we do not introduce "CPU Ready Time" on the virtual machine. When the number of vCPUs on an ESXi server is more than the number of underlying hardware cores, time sharing is introduced, and during contention this time sharing can lead to CPU Ready time while CPU threads wait to be scheduled. For more on this read Performance Best Practices for VMware vSphere 5.0; it will help you understand why more is not always better. In the above case, since the server role was using multiple cores in the physical world, I would give it 2 vCPUs. Since this is the RFP stage, you might not have the privilege of speaking to an application owner, so please clearly state your assumptions and your reasons for taking this value.
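
The two checks above can be sketched as follows. Treat this as a rough illustration only: the 2048 MB floor is the figure used in this article, the 2400 MHz target core speed is my assumption (same clock as the source server), and both function names are made up for the example.

```python
import math


def size_vram_mb(buffered_peak_mb, os_floor_mb):
    """Never size memory below the OS floor."""
    return max(buffered_peak_mb, os_floor_mb)


def min_vcpus_for_clock(required_mhz, target_core_mhz):
    """Smallest vCPU count whose combined clock covers the requirement."""
    return max(1, math.ceil(required_mhz / target_core_mhz))


vram_mb = size_vram_mb(820, 2048)          # 820 MB from the buffer step
min_vcpus = min_vcpus_for_clock(600, 2400)  # 600 MHz from the buffer step

print(vram_mb)    # 2048
print(min_vcpus)  # 1
```

Note that the clock calculation only gives a lower bound of 1 vCPU. The 2 vCPUs chosen in the text are a judgment call based on the workload having used multiple cores on the physical server, and that is exactly the kind of decision that belongs in your assumptions.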

With the above example, we were able to size the CPU and memory requirement of the target virtual machine. The disk requirement is an open factor, and most RFPs will state how the customer wants to handle the storage space of the servers during and after virtualization. Using the above method you can size every available physical machine and derive the total CPU and memory required to run the workloads post virtualization.

Phewww.....

Coming back to where we started!! What if you don't have the Peak Utilization? How do you size in such a case?

The answer is simple: increase your buffer window from 25% to 35%. The additional 10% caters for peaks in the workload. As always, mention this clearly in the assumptions.

I know what you are thinking... What if the customer does not give you the utilization data at all? This might sound like a GREY area, but trust me, I have encountered such cases very regularly. In such situations, take the industry assumption that any assessed physical server has an average utilization of 2% to 15% (this will hold good for 95% of physical servers). I would recommend that you take the highest value, i.e. 15%, and add the 35% buffer (to size for peaks + headroom) as explained above. That gives you a sizing figure of 50% utilization of the provisioned capacity. Though this sounds on the higher side, you will have your chance to slim this number down during the due diligence phase of any engagement.
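
To keep the three cases apart, here is a minimal sketch of the fallback rules. It assumes the 15% and 35% are simply added, since that is how the 50% figure is reached; the constant and function names are hypothetical, and applying the result to the WEBROLE1 CPU capacity is my own illustration, not a number from the example.

```python
PEAK_BUFFER_PCT = 25     # peak utilization is known
AVERAGE_BUFFER_PCT = 35  # only average utilization is known
ASSUMED_AVG_PCT = 15     # no utilization data: top of the 2% to 15% range


def buffer_pct(has_peak_data):
    """Pick the buffer depending on whether peak data is available."""
    return PEAK_BUFFER_PCT if has_peak_data else AVERAGE_BUFFER_PCT


def sizing_pct_without_data():
    """Assumed average plus buffer, added as percentage points."""
    return ASSUMED_AVG_PCT + AVERAGE_BUFFER_PCT


print(buffer_pct(True))           # 25
print(buffer_pct(False))          # 35
print(sizing_pct_without_data())  # 50

# Illustration only: 50% of the 9600 MHz provisioned on WEBROLE1.
print(9600 * sizing_pct_without_data() / 100)  # 4800.0
```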

I guess the last possibility for an RFP is a scenario where the customer does not give you any data at all. The statement or end goal might just say that the customer wants to virtualize 1000 physical servers. That brings us to Scenario B.

Scenario B - I don't like to be in this scenario, but you can't really help it if a customer expects sizing and pricing based only on the count of servers.

Let's understand this by using an example:

XYZ LTD. has floated an RFP with a goal to virtualize 1000 physical servers in its datacenter. The total usable capacity of the storage being used is 10 TB.

With this information, we need to find the compute requirements of the virtual machines running in the end state. Let's see how we break this down:

First we have to assume the sizes of the virtual machines which might come up in the end state. As an assumption, we divide the virtual machines in the environment into 3 categories based on size:

Table - 1

Virtual Machine Type	# of vCPUs	vRAM (GB)
Large	8	16
Medium	4	8
Small	2	4

After this, if the customer has not already done so, we should break the environment down into 4 workload categories:

Table - 2

Workload Type	Category	Percentage
Platinum	Business Critical Workloads	10%
Gold	Production Workloads	40%
Silver	Infrastructure Workloads	20%
Bronze	Test/Dev/UAT	30%

Based on the table above if we divide the 1000 servers into the defined categories, we will get the following values:-

Table - 3

Workload Type	Category	# of Servers
Platinum	Business Critical Workloads	100
Gold	Production Workloads	400
Silver	Infrastructure Workloads	200
Bronze	Test/Dev/UAT	300

These tables should go into your assumptions, and the numbers can absolutely change based on the data given to you by the customer. To define the workload types, here are some examples of each:

Platinum - Production Databases, ERP, SAP, Email & Collaboration, Other Customer apps etc.

Gold - Web Servers, Applications (packaged or custom), vCenter, CRM etc.

Silver - Directory Services, Anti Virus, DHCP, DNS, Security, Patch Management etc.

Bronze - Test, UAT & Dev environments.

The next task is to map a Virtual Machine Type to each Workload Type. In simpler terms, what sizes will the Platinum, Gold, Silver and Bronze virtual machines be? We need to make some assumptions to derive these numbers. The assumptions I have taken are listed in Table - 4. Your table might look different if the customer has given you some information about the distribution of sizes across workloads.

Table - 4

	Workload Size Percentage
Workload Type	Large	Medium	Small
Platinum	50%	25%	25%
Gold	30%	35%	35%
Silver	20%	40%	40%
Bronze	30%	20%	50%

This will give you the numbers you are looking for. Let's calculate for these 1000 servers and see how many servers of each size fall into each category mentioned above.

Table - 5

	Number of Servers by Size
Workload Type	Large Servers	Medium Servers	Small Servers	Total Servers
Platinum	50	25	25	100
Gold	120	140	140	400
Silver	40	80	80	200
Bronze	90	60	150	300
Total Servers	300	305	395	1000
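
Tables 2 to 5 can be reproduced with a short script, which is handy when the customer changes the server count or the percentages. This is a sketch with names of my own choosing. It uses integer division, which is exact for these inputs; with other percentages you would need to decide how to round.

```python
total_servers = 1000

# Table 2: share of each workload type, in percent.
workload_share = {"Platinum": 10, "Gold": 40, "Silver": 20, "Bronze": 30}

# Table 4: size mix within each workload type, in percent.
size_mix = {
    "Platinum": {"Large": 50, "Medium": 25, "Small": 25},
    "Gold": {"Large": 30, "Medium": 35, "Small": 35},
    "Silver": {"Large": 20, "Medium": 40, "Small": 40},
    "Bronze": {"Large": 30, "Medium": 20, "Small": 50},
}

# Table 3: servers per workload type.
servers_per_workload = {
    workload: total_servers * pct // 100
    for workload, pct in workload_share.items()
}

# Table 5: servers per workload type and size.
servers_by_size = {
    workload: {
        size: servers_per_workload[workload] * pct // 100
        for size, pct in mix.items()
    }
    for workload, mix in size_mix.items()
}

# Column totals of Table 5.
size_totals = {
    size: sum(counts[size] for counts in servers_by_size.values())
    for size in ("Large", "Medium", "Small")
}

print(servers_per_workload)
print(servers_by_size["Gold"])  # {'Large': 120, 'Medium': 140, 'Small': 140}
print(size_totals)              # {'Large': 300, 'Medium': 305, 'Small': 395}
```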

Since we now know the number of servers of each size per category, here is the total vCPU and vRAM requirement, using the VM sizes from Table - 1.

Workload Type	Total vCPUs	Total vRAM (GB)
Platinum	550	1100
Gold	1800	3600
Silver	800	1600
Bronze	1260	2520
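
For anyone wondering where these totals come from: each server count from Table - 5 is multiplied by the vCPU and vRAM of its size from Table - 1, and the results are summed per workload type. A minimal sketch, with dictionary names that are my own:

```python
# Table 1: (vCPUs, vRAM in GB) per virtual machine type.
vm_types = {"Large": (8, 16), "Medium": (4, 8), "Small": (2, 4)}

# Table 5: number of servers per workload type and size.
servers_by_size = {
    "Platinum": {"Large": 50, "Medium": 25, "Small": 25},
    "Gold": {"Large": 120, "Medium": 140, "Small": 140},
    "Silver": {"Large": 40, "Medium": 80, "Small": 80},
    "Bronze": {"Large": 90, "Medium": 60, "Small": 150},
}

totals = {}
for workload, counts in servers_by_size.items():
    vcpus = sum(n * vm_types[size][0] for size, n in counts.items())
    vram_gb = sum(n * vm_types[size][1] for size, n in counts.items())
    totals[workload] = (vcpus, vram_gb)

for workload, (vcpus, vram_gb) in totals.items():
    print(workload, vcpus, vram_gb)
# Platinum 550 1100
# Gold 1800 3600
# Silver 800 1600
# Bronze 1260 2520
```

For example, Platinum is 50 x 8 + 25 x 4 + 25 x 2 = 550 vCPUs and 50 x 16 + 25 x 8 + 25 x 4 = 1100 GB of vRAM.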

Now come the biggest questions: how many vCPUs per core? Can we over-commit memory? Have you not faced such questions before? I bet you have. Here is a guideline which I tell customers to follow:

Table - 6

Workload Type	vCPU to Core Ratio	Memory Ratio
Platinum	1.75 vCPU for 1 Core	1 GB vRAM for 1 GB pRAM
Gold	2.5 vCPU for 1 Core	1 GB vRAM for 1 GB pRAM
Silver	3 vCPU for 1 Core	1.4 GB vRAM for 1 GB pRAM
Bronze	4 vCPU for 1 Core	1.6 GB vRAM for 1 GB pRAM

So let's see how many physical cores and how much physical RAM we need to virtualize XYZ LTD's physical servers. Each total is divided by its ratio from Table - 6 and rounded up:

Workload Type	Physical Cores	Physical RAM (GB)
Platinum	315	1100
Gold	720	3600
Silver	267	1143
Bronze	315	1575
Total	1617	7418
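
The division by the Table - 6 ratios can be sketched like this. I use Fraction here only to keep the divisions exact before rounding up; that is my choice for the sketch, and the names are hypothetical.

```python
import math
from fractions import Fraction

# Total (vCPUs, vRAM in GB) per workload type, from the table above.
totals = {
    "Platinum": (550, 1100),
    "Gold": (1800, 3600),
    "Silver": (800, 1600),
    "Bronze": (1260, 2520),
}

# Table 6: (vCPUs per core, GB of vRAM per GB of pRAM).
ratios = {
    "Platinum": (Fraction("1.75"), Fraction("1")),
    "Gold": (Fraction("2.5"), Fraction("1")),
    "Silver": (Fraction("3"), Fraction("1.4")),
    "Bronze": (Fraction("4"), Fraction("1.6")),
}

physical = {}
for workload, (vcpus, vram_gb) in totals.items():
    cpu_ratio, mem_ratio = ratios[workload]
    cores = math.ceil(vcpus / cpu_ratio)     # round up to whole cores
    pram_gb = math.ceil(vram_gb / mem_ratio)  # round up to whole GB
    physical[workload] = (cores, pram_gb)

total_cores = sum(cores for cores, _ in physical.values())
total_pram_gb = sum(pram for _, pram in physical.values())

print(physical)
print(total_cores)    # 1617
print(total_pram_gb)  # 7418
```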

Now if your target server is a blade with 2 CPUs of 10 cores each and 96 GB of RAM, let's see how many such blades you would need to consolidate these 1000 physical servers.

Total Number of Servers required = 1617 / 20 = 81 Servers approx.
The total physical RAM we get is = 81 x 96 = 7776 GB, which covers the 7418 GB required

Hence, to virtualize these 1000 physical workloads we would need 81 ESXi servers, giving us a handsome consolidation ratio of roughly 12:1 (1000 / 81 = 12.3 approximately). Voila.. we are done here...

Licensing is a different topic, but you would need 162 vSphere 5.0 Enterprise Plus licenses (81 servers x 2 CPUs) to license these ESXi servers.
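
The host count, RAM check, consolidation ratio and license count can be pulled together in one short sketch. Two things here are my assumptions rather than steps from the text: the explicit check that memory does not need more hosts than CPU, and the one-license-per-CPU-socket rule, which is simply what matches the 162 figure.

```python
import math

required_cores = 1617
required_ram_gb = 7418
physical_servers = 1000

# Target blade from the example.
sockets_per_host = 2
cores_per_socket = 10
ram_per_host_gb = 96

cores_per_host = sockets_per_host * cores_per_socket

# Size for whichever resource needs more hosts.
hosts_for_cpu = math.ceil(required_cores / cores_per_host)
hosts_for_ram = math.ceil(required_ram_gb / ram_per_host_gb)
hosts = max(hosts_for_cpu, hosts_for_ram)

total_ram_gb = hosts * ram_per_host_gb
vms_per_host = physical_servers / hosts
licenses = hosts * sockets_per_host  # assumes one license per CPU socket

print(hosts_for_cpu, hosts_for_ram)  # 81 78
print(hosts)                         # 81
print(total_ram_gb)                  # 7776
print(round(vms_per_host, 1))        # 12.3
print(licenses)                      # 162
```

In this example CPU is the constraining resource. With a different blade or different ratios, memory could become the constraint, which is why the sketch takes the larger of the two host counts.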

I hope this helps you size your solutions correctly and respond to those RFPs with more accurate and sensible information. I will talk about other situations in my upcoming articles. Till then..

Happy Virtualizing......

Regards
Sunny Dua

13 comments:

Arun Pandey, June 30, 2012 at 9:57 AM
Sunny, this is an excellent post, very well written.

--Arun
pravin utekar, June 30, 2012 at 11:43 AM
Hi Sunny very helpful post .For the entry level person like me this knowledge is very useful.Thanks for sharing your thoughts. Keep sharing.

Thanks
PravinU.
Sunny, July 1, 2012 at 12:50 AM
Thanks @arunp @pravin...
Anonymous, July 1, 2012 at 9:38 AM
its a lovely post sunny.

one suggestion based on my experience, i would not do much overcommitment for the memory specially for business critical Apps like SAP, Oracle Business suites etc
The cases we have worked on, post deployment customer encounters higher response time. Most of such Apps are memory driven and needs it to the fullest when Apps process the punched data to the DB. Apps can still take load of compute (CPU) overcommitment but not memory.
This may not be the case in every environment, just shared my experience.

this is a wonderful sizing article for beginners. Kudos!!
Sunny, July 1, 2012 at 10:19 AM
Thanks @Anonymous. I do agree to your point that applications such as SAP, Oracle E-business Suites, Databases etc are memory hungry. For the same reason I have mentioned the following in Table 6.

For Platinum & Gold Workloads there is no memory over commitment at all. And I have always seen customers categorizing such workloads as more important or Platinum/Gold. So I do agree to what you said. Though ESXi has some wonderful memory saving techniques such as TPS & Memory Compression, its still important to size such workloads for memory as you would in physical.

Thanks once again... Would appreciate you leave your name on the posts... :-)
amitrathod, July 5, 2012 at 11:07 PM
Really Good and helpful Sunny. Way to go. Very well written. Kudos :-)

>Amit<
Shyfur, January 12, 2013 at 8:22 PM
Its really a helpful for beginner like me. Thanks for this post. But unfortunately I did not understand some calculation. It would be very much helpful for if you share your thoughts.

1. How did get (Platinum 550 1100) for Total vCpu and vRam as per table 1? What is the calculation?

2. Same question for (Platinum 1.75 vCPU for 1 Core 1 GB vRAM for 1 GB pRAM) this calculation.

3. Same again. How did you get 315 cores.

I know you are laughing. But bro I dont understand the calculation have done. I know you are busy. But I hope you will guide me.

Thanks in advanced.
Sunny, January 13, 2013 at 10:14 PM
Shyfur,

The above method shown and numbers taken are mostly assumptions, since the theme of this article is based on RFP responses. In most of such cases you would not have any data from the customers. In case you do not, you can use the standards which I have mentioned in the article.

Also, Core to vCPU ratios are something which come from experience. These are not thumb rules but guidelines.

All the Scenarios I have defined have assumptions based on the situation and some data is from experience.
Vijay Sangarramu, March 18, 2016 at 9:43 PM
Hi Sunny,

Thanks for the wonderful post for the beginners to understand more in detail about the vm size calculation.

It would be great if you could explain how you decide 1.75 vCPU per core for plantinum servers, 2.5 vcpu for gold servers, 3 for bronze servers, 4 for silver servers as per your post.

what makes the difference these numbers towards the core.
my, September 23, 2016 at 1:56 AM
Hi Sunny, can i know ur reference to make this article? thx b4


Saturday, June 30, 2012
