Here’s the truth – I’ve made a lot of VMware mistakes.
There, I said it. Now here are 3 tips to help avoid costly downtime and outages caused by beginners performing a vSphere installation and setup for the first time.
Before we get started, I want to share my new guide to the top step-by-step vSphere video training.
vSphere for Beginners
Tip #1 – Design (No Shooting from the Hip)
Nothing is more painful than having to rebuild a vSphere cloud already being used to support 100s of VMs because it was never properly designed.
My guess is these environments grow out of a single free ESXi host installed on an old re-purposed server dug up from the bone-yard for an evaluation, but somehow it sprawled into hosting VMs for business-critical applications or databases.
Does this mistake sound familiar?
Without a design plan, this will happen to you, too!
So start with a design [and a budget] so you do not find yourself robbing graves to build your first cloud.
The plan should include:
1. Purpose (scope). What is the purpose of the environment: POC, Test, Dev, UAT, Prod, etc.?
2. Budget. Especially if it will be used for Prod, you should not use hardware that is out-of-warranty. This is an absolute no-no!
- If you plan to scale beyond one host, purchase licenses and forget about the free stuff. You don’t want to be wasting time trying to figure out problems without having technical support.
- Don’t try to use hardware that is not on the HCL. This will come back and bite you big time when there is a bug causing problems and you can’t upgrade because the hardware is not supported. This goes for the server, storage, and network hardware.
3. Design. Have a whiteboard session and include storage and network groups. vSphere touches all these technologies and they need to be in on the design so they are following best practices to support virtualization. Collaboration on the design is key to long-term success, scalability, and performance.
4. Documentation. Document your design and do Visio drawings of servers, networks, and storage configurations. Also, create build checklists and include details such as IP addresses, VLANs, ports, and naming conventions.
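A build checklist is easier to trust if it can be checked automatically. Here is a minimal sketch of that idea in Python. The naming convention (site-role-number, such as `nyc-esx-01`), the `BuildEntry` fields, and the `validate_entry` helper are all assumptions made up for illustration, so swap in whatever your own design document specifies.

```python
import ipaddress
import re
from dataclasses import dataclass

# Hypothetical naming convention for illustration: site-role-number, e.g. "nyc-esx-01".
HOSTNAME_PATTERN = re.compile(r"^[a-z]{3}-[a-z]{3,5}-\d{2}$")


@dataclass
class BuildEntry:
    """One row of a build checklist (fields are illustrative, not a standard)."""
    hostname: str
    mgmt_ip: str
    vlan_id: int


def validate_entry(entry: BuildEntry) -> list[str]:
    """Return a list of problems found in one checklist entry (empty if none)."""
    problems = []
    if not HOSTNAME_PATTERN.match(entry.hostname):
        problems.append(f"hostname '{entry.hostname}' breaks the naming convention")
    try:
        ipaddress.ip_address(entry.mgmt_ip)
    except ValueError:
        problems.append(f"'{entry.mgmt_ip}' is not a valid IP address")
    # 802.1Q VLAN IDs run from 1 to 4094 (0 and 4095 are reserved).
    if not 1 <= entry.vlan_id <= 4094:
        problems.append(f"VLAN {entry.vlan_id} is outside 1-4094")
    return problems
```

Running every row of the checklist through a check like this before the build starts catches fat-fingered IPs and VLANs while they are still cheap to fix.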
The biggest mistake I often see happening is the server team, or IT department, shooting from the hip and just doing a vSphere installation and setup from scraps, and then suffering from the pain caused by gridlock.
Tip #2 – Have a Capacity Management Plan (Contain Sprawl and Trend Utilization)
The second mistake happens in part due to the value-add virtualization offers which is the ability to leverage hardware.
Virtualization allows us to take full advantage of powerful devices by allocating more virtual resources than the hardware physically has (over-subscription).
Virtualizing is more efficient, but on the downside it leads to overusing the hardware, which causes poor performance on the servers, storage, and networks.
Without a capacity management plan to control sprawl and back-fill capacity as it is used, your cloud will run out of resources and the performance of existing VMs will suffer.
And once the pain begins, there is no way to reduce it except to power off VM servers to free up resources or add more physical capacity. This, however, presents a temptation to dig up graves and bring old hardware back online while new hardware is being haggled over.
Furthermore, once old servers are brought back online they may never be decommissioned and risk costly crashes and data loss.
Avoid this issue by having a capacity management plan and budget to keep your cloud growing based on demand trends and utilization.
A basic vSphere Capacity Management Plan should:
1. Trend Demand and Utilization. This is normally done by recording CPU, memory, and storage utilization over time and trending your demand so you can add resources before your thresholds are breached.
2. Monitor Thresholds. Thresholds should be determined and monitored so more capacity can be added once they are passed. This gives you time to get quotes, make purchases, and set up the added capacity before it is needed for new projects.
3. CAPEX Budget. CAPEX is an area of major contention in environments suffering from over-subscription.
Beware of the myth that VMware saves money and provides unlimited server resources. This myth is false and can bury you.
Plan your budget to include regular CAPEX for new hardware and costly licenses and support.
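To make the trending idea concrete, here is a rough sketch in Python that fits a straight line to monthly utilization samples and estimates how many months remain before a threshold is reached. The function name, the monthly sampling interval, and the 80% threshold in the usage note are assumptions for illustration only. Real demand is rarely this linear, so treat the result as a prompt to start getting quotes, not as a forecast.

```python
from typing import Optional, Sequence


def months_until_threshold(samples: Sequence[float], threshold: float) -> Optional[float]:
    """Estimate months until utilization reaches `threshold`.

    `samples` are utilization percentages, one per month, oldest first.
    Returns 0.0 if the latest sample is already at or over the threshold,
    and None if the trend is flat or falling (no breach projected).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")

    # Ordinary least-squares fit of utilization against month index 0..n-1.
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x

    if samples[-1] >= threshold:
        return 0.0
    if slope <= 0:
        return None

    # Month index where the fitted line crosses the threshold,
    # expressed relative to the most recent sample.
    breach_index = (threshold - intercept) / slope
    return max(0.0, breach_index - (n - 1))
```

For example, memory utilization of 50, 55, 60, and 65 percent over four months with an 80% threshold projects about three months of headroom, which is not much time once purchasing and setup are factored in.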
Tip #3 – Develop VM Provisioning Standards (Test and Document)
Even the best design and capacity management plans cannot fix a performance problem caused by a badly provisioned virtual server configuration.
One shoe does not fit all sizes.
Even if the goal is to commoditize or streamline your build process so there are not 50 different builds, you still need to tweak VM configs to fit or you will have problems or performance issues later on.
This means one basic VM template cannot be used for every type of server build.
1. Do your own testing. Test what works best for your applications and document it.
2. Enforce your standards. Once you figure out what works best, stay true to your standards and enforce them. Don’t skimp on resources, and don’t over-provision. Your standard should be balanced and should follow best practices for resource allocation and configuration.
I recently posted a checklist for troubleshooting VM performance issues, and this VM best practices checklist goes hand in hand.
- Did you split the VM’s VMDK volumes across separate datastores?
- Did you create separate vControllers for each VMDK (especially if it is a database VM server)?
- Did you configure ESXi, Windows, or the OS to properly manage the swap file?
- Is VMware Tools updated and properly configured?
- Are you using 1:1 memory and CPU ratios (don’t overuse physical resources)?
- Are you rightsizing and not over-provisioning memory or vCPUs (remember it’s not a physical server)?
- Are you taking advantage of templates to deploy VMs?
- Are you keeping Gold templates updated (patches, agents & middleware)?
- Are you monitoring network and storage utilization?
- Are you avoiding file-level backups or antivirus scans on the local VM OS?
- Are you splitting your management network traffic from your data and storage network traffic?
And finally, did you customize the VM configuration to the best practices for the type of virtual server being built, and then clone it as many times as needed to avoid configuration drift [fat fingering errors]?
Download the VM Best Practices Checklist.
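The ratio question in the checklist comes down to simple arithmetic: total what has been allocated to VMs and divide by what the host physically has. Here is a minimal Python sketch of that calculation. The `Host` and `VM` classes and their fields are hypothetical stand-ins for whatever inventory data you export, not part of any VMware API.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Host:
    """Physical capacity of one host (illustrative fields)."""
    physical_cores: int
    physical_memory_gb: int


@dataclass
class VM:
    """Resources allocated to one VM (illustrative fields)."""
    vcpus: int
    memory_gb: int


def allocation_ratios(host: Host, vms: Iterable[VM]) -> Tuple[float, float]:
    """Return (vCPU-to-core ratio, allocated-to-physical memory ratio).

    A value of 1.0 means a 1:1 allocation; anything above 1.0 means
    the host is over-subscribed for that resource.
    """
    vm_list = list(vms)
    vcpu_ratio = sum(vm.vcpus for vm in vm_list) / host.physical_cores
    memory_ratio = sum(vm.memory_gb for vm in vm_list) / host.physical_memory_gb
    return vcpu_ratio, memory_ratio
```

A host with 16 cores and 256 GB of memory running three VMs sized 4 vCPU/64 GB, 8 vCPU/128 GB, and 8 vCPU/128 GB comes out at 1.25 for both ratios, meaning it is already over a 1:1 allocation.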
Only do what works best for your environment.
A common VMware beginner mistake is trying to use every trick you read about.
Different types of servers require varying resource configurations, and by doing your research you will find there are 100s of best practices for database, web, and application servers on the web. Even I post a few.
But be warned, best practices differ depending on the environment [and use cases], so evaluate them and only use what works best for your applications.
Summarizing these Tips:
Have a solid vSphere cloud design that includes participation from other stakeholders. Getting network and storage teams involved in the beginning avoids battles later on over ownership. Leaving them out is one of the most common vSphere beginner mistakes.
Have a capacity management plan so you avoid reusing old hardware once it’s retired. Trend growth and budget for more demand.
And have standards for your virtual server builds so your team follows best practices for the type of servers being deployed. Test and develop your own best practices and document them in your own checklist.
Check out these other posts I’ve written on the same topic:
- vSphere Pre-installation Checklist
- Mentoring new virtualization staff
- Spreading your VMware admin resources too thin
- How to keep a VMware project on track with Kanban
- Protecting against data loss
- Monitoring vSphere
- Uptime Monitoring
These tips are all to help avoid mistakes beginners make when installing and managing a vSphere Cloud for the first time.
Do you have a tip or mistake you’d like to share?