Saturday, June 30, 2012

VMware vSphere - What is the correct way to right size a virtual machine?? (Situation 1: Sizing for an RFP Response)


Being in the field and working with a lot of VMware customers and partners, I often get asked how to size virtual machines. I answer this question in different ways, because the answer is quite subjective and the approach changes with the situation.

Disclaimer: These sizing methodologies are based on experience and best practices, so there is no single right or wrong answer. They are scenario based; use them as a guideline and not as a rule of thumb.

For ease of understanding, let me break down the discussions into different situations. In this article I will speak about the first situation. 

Situation 1: Sizing for an RFP response:

This section is for consultants who respond to RFPs whose scope includes server virtualization and consolidation. Usually in an RFI or RFP situation, the amount of information given by customers is limited. On the basis of this limited information, customers expect you to size and price the target virtual infrastructure. This situation can be divided into 2 scenarios:-

Scenario A - The inventory of existing servers is available. Such an inventory includes data points like server count, CPUs / cores per server, provisioned memory, provisioned storage etc. This data may or may not include utilization data for these servers. Usually customers pull out this inventory using an existing tool in the infrastructure such as ITM, SCOM etc.

Let's see an example which will help us size a single server with this information:-

The current physical server has the following data available:-

Server Name - WEBROLE1
CPU Specs - 2 CPU, Dual core, 2.4 GHz
Memory - 8 GB
HDD - 500 GB
OS - Windows 2003 Standard Edition 

Utilization data (Assuming it is available)

Avg CPU Utilization - 5%
Avg Memory Utilization - 8%
Disk Utilization - 47%

Now let's look at the average utilization in real terms:-

Total CPU MHz available on the server = 2 CPU x 2 Cores x 2400 MHz = 9600 MHz
Average used in MHz = 5% of 9600 MHz = 480 MHz

Total Memory available on the server = 8192 MB
Average used in MB = 8% of 8192 MB = 656 MB (rounded up)

Total Disk available on the server = 500 GB
Used in GB = 47% of 500 GB = 235 GB

So now if you look at the utilization, we get the following specs:

CPU = 480 MHz
Memory = 656 MB
Disk = 235 GB (Actual Used)
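
The conversion from percentages to absolute numbers can be sketched in a few lines of Python. This is only an illustration of the arithmetic above; the function name `used_capacity` is made up for this example, and rounding up to the next whole unit is an assumption that matches the 656 MB figure.

```python
import math

def used_capacity(total, utilization_pct):
    """Convert a utilization percentage into absolute units, rounded up."""
    return math.ceil(total * utilization_pct / 100)

# WEBROLE1: 2 CPUs x 2 cores x 2400 MHz, 8 GB RAM, 500 GB disk
total_cpu_mhz = 2 * 2 * 2400
cpu_mhz = used_capacity(total_cpu_mhz, 5)    # 5% average CPU utilization
memory_mb = used_capacity(8 * 1024, 8)       # 8% average memory utilization
disk_gb = used_capacity(500, 47)             # 47% disk utilization

print(cpu_mhz, memory_mb, disk_gb)  # 480 656 235
```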

Is this enough to size the target machine??  The answer is a BIG NO.

Though I said there are no THUMB RULES, there is one RULE which we should always consider.
"Never Size on Average Utilization" - "Always Size on Peak Utilization"

Hence if we only have the average utilization data, the target virtual machine CPU & Memory size CAN NOT be derived using the values described above.

In such a scenario we will either use the PEAK UTILIZATION values, if given by the customer, or else go into assumption mode and size the VM on clearly stated assumptions.

Assuming the values given in the example above are PEAK Utilization values, the target utilization of the virtual machine would be:-


CPU = 480 MHz
Memory = 656 MB

Now, it is always a best practice to add a buffer to the Peak Utilization values. Hence add a buffer of 25% to the Peak Utilization Values:

CPU = 480 MHz + 25%    = 600 MHz

Memory = 656 MB + 25% = 820 MB

There are 2 reasons why we added this 25%:

i) The peak utilization data is a single peak point collected, however there could be multiple peak points across business cycles which we need to address, hence a buffer is always good.

ii) It's good to have some head room for situations where the memory utilization shoots up due to a misbehaving service, process, application etc.

Now let's consider the other factors which might influence this sizing:-
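
As a small sketch of the buffer step, assuming the peak values from the example (480 MHz and 656 MB) and the 25% buffer discussed above; `with_buffer` is a hypothetical helper name used only here:

```python
import math

def with_buffer(peak_value, buffer_pct=25):
    """Add a percentage buffer on top of a peak value, rounded up."""
    return math.ceil(peak_value * (100 + buffer_pct) / 100)

cpu_target_mhz = with_buffer(480)   # 480 MHz + 25%
mem_target_mb = with_buffer(656)    # 656 MB + 25%

print(cpu_target_mhz, mem_target_mb)  # 600 820
```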

i) We should look at the OS requirement as well, to ensure that we do not go below it while right sizing the virtual machine. In this case, for Windows 2003 Standard, Microsoft recommends a minimum of 2 GB of RAM, hence the vRAM for the above example would be raised from 820 MB to a minimum of 2 GB.

ii) For the number of vCPUs in the above example, we see that we need only 600 MHz of processing power to satisfy the workload, however we should ensure that we meet the minimum OS requirement when deriving the required clock speed. At the same time, the decision to introduce vSMP (more than one vCPU) in a virtual machine is a tricky one. Keep the number of vCPUs as low as possible so that we do not introduce "CPU Ready Time" on the virtual machine. When the number of vCPUs on an ESXi server is more than the number of underlying hardware cores, time sharing is introduced. This time sharing can lead to CPU Ready time, because vCPUs have to wait to be scheduled on a physical core during contention. For more on this, read Performance Best Practices for VMware vSphere 5.0. This will help you understand why more is not always better. In the above case, since the server role was using multiple cores in the physical world, I would give it 2 vCPUs. Since this is an RFP stage, you might not have the privilege of speaking to an application owner, so please clearly state your assumptions and your reasons for taking this value.

Hence, with the above example, we were able to size the target virtual machine's CPU and memory requirement. The disk requirement is an open factor, and most RFPs will have information on how the customer wants to handle the storage space of the servers during and after virtualization. Using the above method you can size all the physical machines in the inventory to derive the total CPU and memory requirements to run the workloads post virtualization.
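
The two adjustments above (OS memory floor and a deliberately small vCPU count) can be sketched as follows. This is an illustration only: `size_vm` is a hypothetical helper, the 2400 MHz target core speed is an assumption (the article does not state the clock speed of the target hosts), and the floor of 2 vCPUs simply encodes the judgment call made for this server.

```python
import math

def size_vm(required_mhz, required_mb, os_min_mb, core_mhz, min_vcpus=1):
    """Return (vCPUs, vRAM in MB) for one VM.

    vRAM never drops below the OS minimum; vCPUs are the smallest count
    that covers the required MHz, but never fewer than min_vcpus.
    """
    vram_mb = max(required_mb, os_min_mb)
    vcpus = max(min_vcpus, math.ceil(required_mhz / core_mhz))
    return vcpus, vram_mb

# 600 MHz and 820 MB come from the buffered peak values in the example.
# core_mhz=2400 is an assumed target core speed, not a figure from the RFP.
vcpus, vram_mb = size_vm(600, 820, os_min_mb=2048, core_mhz=2400, min_vcpus=2)

print(vcpus, vram_mb)  # 2 2048
```

On clock speed alone a single vCPU would be enough here, which is why the choice of 2 vCPUs has to be listed as an assumption in the RFP response.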


Phewww.....

Coming back to where we started!! What if you don't have the Peak Utilization? How do you size in such a case??

The answer is simple: increase your buffer window from 25% to 35%. The additional 10% is there to cater for peaks in the workload. As always, mention this clearly in the assumptions.

I know what you are thinking... What if the customer does not give you the utilization data at all? This might sound like a GREY area, but trust me, I have encountered such cases very regularly. In such situations, you need to take the industry assumption that any physical server which is assessed has an average utilization of 2% to 15% (this holds good for 95% of physical servers). Hence I would recommend that you take the highest value, i.e. 15%, and add a buffer of 35 percentage points (to size for peaks + head room) as explained above. Based on this you would end up with a 50% utilization. Though this sounds on the higher side, you will have chances to slim this number down during the due diligence phase of any engagement.
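
The three cases (peak data, average data only, no data) can be put side by side in a short sketch. It is one possible reading of the rules above, not a formula: `planning_demand` is a hypothetical name, the 25% and 35% buffers are applied on top of the measured value as in the worked example, and the no-data case uses the assumed 50% of installed capacity (15% + 35%).

```python
import math

def planning_demand(capacity, peak_pct=None, avg_pct=None):
    """Estimate the demand to size for, in the same unit as capacity."""
    if peak_pct is not None:
        # Peak data available: peak + 25% buffer
        return math.ceil(capacity * peak_pct * 125 / 10_000)
    if avg_pct is not None:
        # Only average data: average + 35% buffer
        return math.ceil(capacity * avg_pct * 135 / 10_000)
    # No utilization data: assume 15% average + 35 points = 50% of capacity
    return math.ceil(capacity * 50 / 100)

cpu_capacity_mhz = 9600  # WEBROLE1: 2 CPUs x 2 cores x 2400 MHz
print(planning_demand(cpu_capacity_mhz, peak_pct=5))  # 600
print(planning_demand(cpu_capacity_mhz, avg_pct=5))   # 648
print(planning_demand(cpu_capacity_mhz))              # 4800
```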

I guess the last possibility for an RFP is a scenario where the customer does not give you any data at all. The statement or end goal of the customer might just say that he wants to virtualize 1000 physical servers. That brings us to Scenario B.

Scenario B - I don't like to be in this scenario, but you can't really help it if a customer expects you to give sizing and pricing based only on the count of servers he has.

Let's understand this by using an example:-

XYZ LTD. has floated an RFP with a goal to virtualize 1000 physical servers in its datacenter. The total usable capacity of the storage being used is 10 TB.

With this information, we need to find out the compute requirements of the virtual machines running in the end state. Let's see how we break down this issue:-

First we have to assume the sizes of the virtual machines which might come up in the end state. As an assumption, we divide the virtual machines in this environment into 3 categories based on size:-

Table - 1

Virtual Machine Type    # of vCPUs    vRAM (GB)
Large                   8             16
Medium                  4             8
Small                   2             4

After this we should look to break down the environment into 4 different categories if not mentioned by the customer:-

Table - 2

Workload Type    Category                       Percentage
Platinum         Business Critical Workloads    10%
Gold             Production Workloads           40%
Silver           Infrastructure Workloads       20%
Bronze           Test/Dev/UAT                   30%

Based on the table above if we divide the 1000 servers into the defined categories, we will get the following values:-

Table - 3

Workload Type    Category                       # of Servers
Platinum         Business Critical Workloads    100
Gold             Production Workloads           400
Silver           Infrastructure Workloads       200
Bronze           Test/Dev/UAT                   300
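
A minimal sketch of how Table - 3 follows from Table - 2, assuming the percentages above; the names `TIER_SHARE_PCT` and `servers_per_tier` are only for illustration:

```python
TOTAL_SERVERS = 1000

# Assumed split of the environment from Table - 2 (percent per workload type)
TIER_SHARE_PCT = {"Platinum": 10, "Gold": 40, "Silver": 20, "Bronze": 30}

def servers_per_tier(total, share_pct):
    """Split a server count across workload types by percentage."""
    if sum(share_pct.values()) != 100:
        raise ValueError("tier percentages must add up to 100")
    # Integer division; exact here because 1000 divides evenly
    return {tier: total * pct // 100 for tier, pct in share_pct.items()}

tier_counts = servers_per_tier(TOTAL_SERVERS, TIER_SHARE_PCT)
print(tier_counts)
# {'Platinum': 100, 'Gold': 400, 'Silver': 200, 'Bronze': 300}
```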

Now these tables should go into your assumptions, and the numbers can absolutely change based on the data given to you by the customer. If I have to define the workload types, here are some examples of each:-

Platinum - Production Databases, ERP, SAP, Email & Collaboration, other customer apps etc.

Gold - Web Servers, Applications (packaged or custom), vCenter, CRM etc.

Silver - Directory Services, Anti Virus, DHCP, DNS, Security, Patch Management etc.

Bronze - Test, UAT & Dev environments.


The next task is to map a Virtual Machine Type to each Workload Type. In simpler terms, what would be the size of the Platinum, Gold, Silver or Bronze virtual machines? We need to take some assumptions here to derive these numbers. The assumptions I have taken are listed in Table - 4. Your table might look different if you have some information from the customer about the distribution of sizes across workloads.

Table - 4

                 Workload Size Percentage
Workload Type    Large    Medium    Small
Platinum         50%      25%       25%
Gold             30%      35%       35%
Silver           20%      40%       40%
Bronze           30%      20%       50%


This will give you the numbers you are looking for. Let's calculate for these 1000 servers and see how many vCPUs and how much vRAM we need for each category mentioned above.


Table - 5

                 Number of Servers by Size
Workload Type    Large Servers    Medium Servers    Small Servers    Total Servers
Platinum         50               25                25               100
Gold             120              140               140              400
Silver           40               80                80               200
Bronze           90               60                150              300
Total Servers    300              305               395              1000
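
The step from Table - 3 and Table - 4 to Table - 5 can be sketched like this. The server counts and size percentages are the assumed values from those tables; the variable and function names are hypothetical.

```python
# Servers per workload type (Table - 3)
TIER_COUNTS = {"Platinum": 100, "Gold": 400, "Silver": 200, "Bronze": 300}

# Assumed size mix per workload type in percent (Table - 4)
SIZE_MIX_PCT = {
    "Platinum": {"Large": 50, "Medium": 25, "Small": 25},
    "Gold":     {"Large": 30, "Medium": 35, "Small": 35},
    "Silver":   {"Large": 20, "Medium": 40, "Small": 40},
    "Bronze":   {"Large": 30, "Medium": 20, "Small": 50},
}

def split_by_size(tier_counts, size_mix_pct):
    """Return {tier: {size: server count}} using integer percentages."""
    return {
        tier: {size: count * pct // 100 for size, pct in size_mix_pct[tier].items()}
        for tier, count in tier_counts.items()
    }

servers = split_by_size(TIER_COUNTS, SIZE_MIX_PCT)
size_totals = {
    size: sum(servers[tier][size] for tier in servers)
    for size in ("Large", "Medium", "Small")
}

print(servers["Gold"])   # {'Large': 120, 'Medium': 140, 'Small': 140}
print(size_totals)       # {'Large': 300, 'Medium': 305, 'Small': 395}
```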

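Since we now know the number of servers of each size per category, here is the total vCPU & vRAM requirement, using the VM sizes from Table - 1 (for example, Platinum = 50 x 8 + 25 x 4 + 25 x 2 = 550 vCPUs).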



Workload Type    Total vCPUs    Total vRAM (GB)
Platinum         550            1100
Gold             1800           3600
Silver           800            1600
Bronze           1260           2520
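
For readers who want to retrace these totals, here is a small sketch that multiplies the server counts from Table - 5 by the VM sizes from Table - 1. The names `VM_SIZES`, `SERVERS` and `tier_totals` are illustrative only.

```python
# (vCPUs, vRAM in GB) per VM size, from Table - 1
VM_SIZES = {"Large": (8, 16), "Medium": (4, 8), "Small": (2, 4)}

# Server counts per workload type and size, from Table - 5
SERVERS = {
    "Platinum": {"Large": 50,  "Medium": 25,  "Small": 25},
    "Gold":     {"Large": 120, "Medium": 140, "Small": 140},
    "Silver":   {"Large": 40,  "Medium": 80,  "Small": 80},
    "Bronze":   {"Large": 90,  "Medium": 60,  "Small": 150},
}

def tier_totals(counts):
    """Return (total vCPUs, total vRAM in GB) for one workload type."""
    vcpus = sum(n * VM_SIZES[size][0] for size, n in counts.items())
    vram_gb = sum(n * VM_SIZES[size][1] for size, n in counts.items())
    return vcpus, vram_gb

totals = {tier: tier_totals(counts) for tier, counts in SERVERS.items()}
for tier, (vcpus, vram_gb) in totals.items():
    print(tier, vcpus, vram_gb)
# Platinum 550 1100
# Gold 1800 3600
# Silver 800 1600
# Bronze 1260 2520
```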


Now come the biggest questions: how many vCPUs per core? Can we over-commit memory? Have you not faced such questions before? I bet you have. Here is a guideline which I tell customers to follow:-

Table - 6

Workload Type    vCPU to Core Ratio      Memory Ratio
Platinum         1.75 vCPU for 1 Core    1 GB vRAM for 1 GB pRAM
Gold             2.5 vCPU for 1 Core     1 GB vRAM for 1 GB pRAM
Silver           3 vCPU for 1 Core       1.4 GB vRAM for 1 GB pRAM
Bronze           4 vCPU for 1 Core       1.6 GB vRAM for 1 GB pRAM



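So let's see how many physical cores and how much physical RAM we need to virtualize XYZ LTD's physical servers. Each value is the total from the previous table divided by the ratio in Table - 6 and rounded up (for example, Platinum cores = 550 / 1.75 = 315):-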



Workload Type    Physical Cores    Physical RAM (GB)
Platinum         315               1100
Gold             720               3600
Silver           267               1143
Bronze           315               1575
Total            1617              7418
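
The division by the over-commitment ratios can be sketched as below, assuming the ratios from Table - 6 and rounding each result up to a whole core or GB. `Fraction` is used only to keep ratios such as 1.6 exact; the dictionary and function names are hypothetical.

```python
import math
from fractions import Fraction

# (total vCPUs, total vRAM in GB) per workload type
DEMAND = {
    "Platinum": (550, 1100),
    "Gold": (1800, 3600),
    "Silver": (800, 1600),
    "Bronze": (1260, 2520),
}

# (vCPUs per core, GB vRAM per GB pRAM) per workload type, from Table - 6
RATIOS = {
    "Platinum": ("1.75", "1"),
    "Gold": ("2.5", "1"),
    "Silver": ("3", "1.4"),
    "Bronze": ("4", "1.6"),
}

def physical_need(tier):
    """Return (physical cores, physical RAM in GB), each rounded up."""
    vcpus, vram_gb = DEMAND[tier]
    cpu_ratio, mem_ratio = (Fraction(r) for r in RATIOS[tier])
    return math.ceil(vcpus / cpu_ratio), math.ceil(vram_gb / mem_ratio)

physical = {tier: physical_need(tier) for tier in DEMAND}
total_cores = sum(cores for cores, _ in physical.values())
total_ram_gb = sum(ram for _, ram in physical.values())

print(physical["Silver"])         # (267, 1143)
print(total_cores, total_ram_gb)  # 1617 7418
```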



Now if your target server is a blade with 2 CPUs of 10 cores each and 96 GB of RAM, let's see how many such blades you would need to consolidate these 1000 physical servers.

Total Number of Servers required = 1617 / 20 = 81 Servers approx.
The total physical RAM we get is = 81 x 96 = 7776 GB, which covers the 7418 GB required.

Hence to virtualize these 1000 physical workloads, we would need 81 ESXi servers, giving us a handsome consolidation ratio of roughly 12:1 (1000 / 81 = 12.3). Voila.. we are done here...

Though licensing is a different topic, you would need 162 vSphere 5.0 Enterprise Plus licenses (81 servers x 2 CPUs each) to license these ESXi servers.
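
To close the loop, here is a sketch of the host count, consolidation ratio and license count, assuming the 2 x 10 core, 96 GB blade from the example and one license per physical CPU. `hosts_needed` is a hypothetical helper; it checks both the core and the RAM requirement and takes the larger of the two.

```python
import math

def hosts_needed(cores, ram_gb, cores_per_host, ram_per_host_gb):
    """Hosts required to satisfy both the core and the RAM requirement."""
    by_cpu = math.ceil(cores / cores_per_host)
    by_ram = math.ceil(ram_gb / ram_per_host_gb)
    return max(by_cpu, by_ram)

CPUS_PER_HOST = 2
CORES_PER_CPU = 10

hosts = hosts_needed(1617, 7418,
                     cores_per_host=CPUS_PER_HOST * CORES_PER_CPU,
                     ram_per_host_gb=96)
total_ram_gb = hosts * 96
consolidation_ratio = round(1000 / hosts, 1)   # physical servers per host
licenses = hosts * CPUS_PER_HOST               # one license per physical CPU

print(hosts, total_ram_gb, consolidation_ratio, licenses)
# 81 7776 12.3 162
```

In this example the core count drives the number of hosts; with a different blade specification the RAM requirement could become the limiting factor instead.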

I hope this helps you size your solutions correctly and respond to those RFPs with more accurate and sensible information. I will talk about other situations in my upcoming articles. Till then..

Happy Virtualizing......

Regards
Sunny Dua

13 comments:

  1. Sunny, this is an excellent post, very well written.

    --Arun

  2. Hi Sunny very helpful post .For the entry level person like me this knowledge is very useful.Thanks for sharing your thoughts. Keep sharing.



    Thanks
    PravinU.

  3. its a lovely post sunny.

    one suggestion based on my experience, i would not do much overcommitment for the memory specially for business critical Apps like SAP, Oracle Business suites etc
    The cases we have worked on, post deployment customer encounters higher response time. Most of such Apps are memory driven and needs it to the fullest when Apps process the punched data to the DB. Apps can still take load of compute (CPU) overcommitment but not memory.
    This may not be the case in every environment, just shared my experience.

    this is a wonderful sizing article for beginners. Kudos!!

  4. Thanks @Anonymous. I do agree to your point that applications such as SAP, Oracle E-business Suites, Databases etc are memory hungry. For the same reason I have mentioned the following in Table 6.

    For Platinum & Gold Workloads there is no memory over commitment at all. And I have always seen customers categorizing such workloads as more important or Platinum/Gold. So I do agree to what you said. Though ESXi has some wonderful memory saving techniques such as TPS & Memory Compression, its still important to size such workloads for memory as you would in physical.

    Thanks once again... Would appreciate you leave your name on the posts... :-)

  5. Really Good and helpful Sunny. Way to go. Very well written. Kudos :-)

    >Amit<

  6. Its really a helpful for beginner like me. Thanks for this post. But unfortunately I did not understand some calculation. It would be very much helpful for if you share your thoughts.

    1. How did get (Platinum 550 1100) for Total vCpu and vRam as per table 1? What is the calculation?

    2. Same question for (Platinum 1.75 vCPU for 1 Core 1 GB vRAM for 1 GB pRAM) this calculation.

    3. Same again. How did you get 315 cores.

    I know you are laughing. But bro I dont understand the calculation have done. I know you are busy. But I hope you will guide me.

    Thanks in advanced.

  7. Shyfur,

    The above method shown and numbers taken are mostly assumptions, since the theme of this article is based on RFP responses. In most of such cases you would not have any data from the customers. In case you do not, you can use the standards which I have mentioned in the article.

    Also, Core to vCPU ratios are something which come from experience. These are not thumb rules but guidelines.

    All the Scenarios I have defined have assumptions based on the situation and some data is from experience.

  8. This comment has been removed by the author.

    1. Hey Vishal,

      Have you removed your question?? Lemme know if you need help with that..

      Regards
      Sunny

  9. Hi Sunny,

    Thanks for the wonderful post for the beginners to understand more in detail about the vm size calculation.

    It would be great if you could explain how you decide 1.75 vCPU per core for plantinum servers, 2.5 vcpu for gold servers, 3 for bronze servers, 4 for silver servers as per your post.

    what makes the difference these numbers towards the core.

    1. Hi Vijay,
      These are the over-commitment ratios which I am referring to: vCPUs to physical cores. These are just guiding principles; you can have different consolidation ratios based on what kind of workloads you are virtualizing.

  10. Hi Sunny, can i know ur reference to make this article? thx b4
