To V or Not to V is the Question?

Buzz Word

Has virtualization become a buzzword in your strategic plan? Take some advice from someone who has spent years supporting VMware in three different enterprise environments – think before you V.

If all you plan is a small experimental cluster of virtual servers, it’s no big deal. The software is even free. VMware ESXi is the best way to go. But, remember, it’s only a seed for future growth. Growth will evolve into a nest of unexpected problems if you don’t think before you start to roll out ESXi hosts for your DEV, QA, and PROD environments.

V is not for Vendetta

Cynical as this may sound, it’s more truth than fiction! Five months from the day the first ESXi host is deployed in your datacenter you will have users complaining about poor performance, and your smartest system administrators will begin many long hours of research to solve riddles for virtualization backups, migrations, storage, and other strange unknown glitches that are attributed to the “V” word.

Fix for SCSI Errors Filling Up Logs

If you are dabbling, as I said earlier, go for it. Virtual servers are fun to work with – that is – until you get too many on one host and you need to reboot the host, but you can’t because a DEV server has somehow become a production server and is running a business-critical application that absolutely cannot be bounced! Now you are caught between complaining users and worried managers, not knowing for sure whether the fix applied by your system administrator will stop the SCSI errors that are filling up the logs and killing performance. All you need to do is bounce the host to see if changing the queue depth does the job. It should, because your system administrator found the ESX command by Googling “Fix for SCSI Errors Filling Up Logs” and reading through 50,000 threads about SCSI errors and VMware.

I am not trying to be sarcastic; I am trying to make you think! The incident about the SCSI error is a real situation that has played out in two of the three VMware virtual environments I have worked in. Google “SCSI error” + VMware and see for yourself.
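The reboot trap above can be spotted before it bites. Here is a minimal sketch, not a VMware tool: it asks whether every VM on one host would fit on the remaining hosts if that host had to be bounced. The Host class, the can_evacuate function, the host and VM names, and the memory figures are all made up for illustration, and only memory is considered (no CPU, storage, or network).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical inventory record: one host and the VMs placed on it."""
    name: str
    memory_gb: int  # physical RAM in the host
    vm_memory_gb: dict = field(default_factory=dict)  # VM name -> configured RAM (GB)

    @property
    def free_gb(self) -> int:
        return self.memory_gb - sum(self.vm_memory_gb.values())


def can_evacuate(target: Host, others: list[Host]) -> bool:
    """Return True if every VM on `target` fits on the other hosts.

    Uses first-fit placement, largest VM first. This is a rough heuristic:
    it can answer False even when a cleverer placement exists.
    """
    free = {h.name: h.free_gb for h in others}
    vms = sorted(target.vm_memory_gb.items(), key=lambda kv: kv[1], reverse=True)
    for _vm, need in vms:
        dest = next((name for name, gb in free.items() if gb >= need), None)
        if dest is None:
            return False  # no remaining host has room for this VM
        free[dest] -= need
    return True


# Illustrative numbers only.
esx01 = Host("esx01", 64, {"dev-app": 16, "qa-db": 24})
esx02 = Host("esx02", 64, {"prod-web": 40})
esx03 = Host("esx03", 64, {"prod-db": 56})
print(can_evacuate(esx01, [esx02, esx03]))  # False: dev-app has nowhere to go
```

If a check like this says False for any host, you already have too many VMs per host to reboot one without an outage, whatever the DEV server has turned into.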

Ignorance and Arrogance

The two biggest problems I have noticed since I started supporting VMware are ignorance and arrogance. Here’s what I mean.

“I” is for Ignorance

First, just about every virtual environment starts off with ignorant managers who do not understand what’s required to build a successful virtual environment. They just know everyone else is doing it and it’s supposed to save a lot of money. Unfortunately, all the future savings will go out the window with support and license costs, and lost labor. Support and license costs are easy enough to understand, but what do I mean by lost labor? Well, lost labor is the many hours, days, and weeks that your top system administrators spend trying to solve VM riddles. Even when you purchase expensive support, the problems don’t go away. In most cases, someone just finds a work-around, because the real fix for the root cause is purchasing a new SAN or more powerful hardware, all because someone didn’t bother to check the HCL (hardware compatibility list).

“A” is for Arrogance

Oh, then there are the problems associated with arrogance. Here’s how this one plays out. You and the stakeholders decide to deploy a virtual infrastructure to save money, right? You start with the basic package of three hosts and a virtual center, which quickly grows to 16 and then 32 hosts (bells and whistles included: vMotion, HA, DRS). Then you schedule a 30-minute meeting with your systems team and pick your best system administrator to lead the “Virtualization Project” (maybe even send him/her to training). He/she has built a ton of servers and even has an MCSE or RHCE. VMware is easy to set up. Just load the hosts, install the virtual center, and start building and P2Ving servers into the environment. It’s so easy, anybody can do it, right?

A week later, wow! Everyone thinks you have the VMware god working for you. Pretty soon the VMware god gets vMotion running and HA working. Next, scripts are running to reboot and auto-configure VMs. Another 30-minute meeting is scheduled to plan out migrating servers into the virtual infrastructure; this plan is called the “Server Consolidation” Project. The system administrator has become a VMware expert in two months and this is his/her virtual environment… Quoting from the good book, “Pride comes before the fall”.

I am a firm believer that history repeats itself, so here’s what I predict happens next. Beneath the surface of your virtual world, trouble is brewing. At first the problems are small and your VMware expert can figure them out. Then the problems get bigger and, as any normal person would do, the expert doesn’t tell anyone, because they’ve gotten used to being called the VMware god and they don’t want anyone to know they are really mortal. Soon emails start trickling into your inbox with you on the “CC:” line. The subject reads “why is my VM so slow logging in?”. Soon it’s no longer a trickle: you are now on the “To:” line from all the users, and the VMware god is at the end of a list that includes everyone in the GAL and your boss. The cat’s out of the bag. There’s a problem with VMware, everyone now knows the VMware god is mortal, and they come looking for you. No, this is not a Steven Spielberg movie; it’s real-world stuff…

How Do You Measure Success – Invisible

The worst thing that could’ve happened to your virtual infrastructure has happened. Technically things are fixable but what will not be fixed any time soon is the bad user experience that has affected hundreds or even thousands of users who work on servers that were virtualized. All your effort to sell virtualization to the business units has just been destroyed and users don’t want virtual servers anymore. Managers even demand that all their applications be moved off your virtual infrastructure because they can’t handle bad performance and complaining workers any longer.

Rule of thumb: The measurement of a successful VMware deployment is not how many VMs can be hosted per ESX host, it’s how many users can be satisfactorily serviced without them knowing they are using virtual technology. Virtualization should be invisible. Once users start noticing footprints in the snow, it’s over before a shot is fired.

Do It Right the First Time

If to V or not to V is still the question, then here are my recommendations. Don’t even set up a single host, even with the free version of ESXi, if you don’t plan to do it right. It will just snowball from there into a monster with unexpected results, and in the end users will think all virtual environments are the same – crap!

If I haven’t deterred you and you still want to virtualize, then here’s the right way to do it. (Pardon my arrogance).

First, work closely with vendors. Listen to them and let them guide you. Yes, they will try to sell you better equipment, but not because they want to take advantage of you. They can’t tell you this, but they know you will have problems if you buy the cheapest equipment. My guess is you will still be tempted to buy the cheap stuff, and later you will blame the vendor when the users start emailing you about their bad experience. As I stated earlier, by then it will be too late, and you are the one to blame.

Second, for those who do listen to their vendor’s advice, the next item is even more important. Hire an experienced VMware administrator. Don’t try to turn a system or network administrator into a VMware administrator. VMware administrators are specialists, just like a DBA. Here’s why: VMware is an infrastructure technology, not a server technology. Sure, a system administrator can become a skilled VMware administrator; I did. But here’s why you need someone from the outside: they will be objective about things like storage, networking, and quality. Existing internal staff will cut corners because they always do; you just don’t know about it. And chances are that you will trust an outsider’s discretion more than a buddy’s or subordinate’s. If it’s done right, the users will never know the difference; if it’s done wrong, everyone all the way up to the CEO will be sending you an email.

Third, virtualization requires total collaboration with storage, networking, developers, operations, and business unit managers. If you can’t enlist them to work together as the “Virtualization Team”, forget it. There’s nothing more frustrating than trying to figure out a problem when storage administrators will not let you verify they are using “Best Practices” for zoning your ESX host server storage fabric, or when a network administrator will not do what you ask even after you explain why certain VLANs need to be trunked on all four data ports for the same host. The list of people hassles goes on…
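The VLAN argument is easier to win with evidence in hand. Below is a small sketch, assuming you have already collected (by whatever means your network team allows) the set of VLAN IDs trunked to each data port on a host. The function name missing_vlans, the port names, and the VLAN IDs are made up for illustration; nothing here talks to a switch or to VMware.

```python
from __future__ import annotations


def missing_vlans(uplinks: dict[str, set[int]]) -> dict[str, set[int]]:
    """Map each uplink to the VLAN IDs it lacks.

    The required set is taken to be the union of VLANs seen on any uplink
    of the host, on the assumption that every data port should carry all
    of them.
    """
    required: set[int] = set().union(*uplinks.values())
    gaps = {port: required - vlans for port, vlans in uplinks.items()}
    # Only report ports that are actually missing something.
    return {port: lacking for port, lacking in gaps.items() if lacking}


# Illustrative data for one host with four data ports.
host_uplinks = {
    "vmnic0": {10, 20, 30},
    "vmnic1": {10, 20, 30},
    "vmnic2": {10, 20},      # VLAN 30 was never trunked to this port
    "vmnic3": {10, 20, 30},
}
print(missing_vlans(host_uplinks))  # {'vmnic2': {30}}
```

A port that comes back in this report is one where a VM can lose its network the moment traffic fails over to it, which is exactly the kind of glitch that later gets blamed on the “V” word.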

Fourth, and this is more important than food and water: please, heed my warning. After everyone has done the best they can to make sure your virtual infrastructure will work and be a success, do not come out of left field with your own ideas and change everything after weeks and months of planning. Don’t change the SAN from tier 2 to tier 3. Don’t buy a cheaper model of server. Don’t redo how the servers will be backed up. Don’t change anything… The time for making changes is during the planning process or when an unexpected problem surfaces. The problem should not be you throwing a wrench into the mix at the last minute. Please!

Finally, virtualization is a great technology that will eventually find its way into your data center whether you want it or not. The choice you face is simple: let it evolve into a monster that everyone will hate, or plan and architect its deployment into something nobody knows exists. Remember, virtualization should be transparent to the users. When they know it’s there, someone hasn’t done their job right.

Conclusion!

To virtualize or not to virtualize is the question? My answer to this question is in the form of a question, “How important are people to you?” If their satisfaction is the most important thing on your list and if happy people pecking away on the keyboard is your top priority then make sure when you begin to assemble your virtualization team and you start scheduling 30-minute meetings, everyone knows from the first day until the project is completed that the objective is “virtualization should be invisible to the user”.
