Eric Sloof, instructor and blogger for NTPro.nl, gave a great presentation on advanced troubleshooting on vSphere.
Eric showed that you can use esxtop for troubleshooting on almost every level. He covered a lot of ground; below you’ll find the things I managed to write down during his talk.
CPU Ready Time. The interval of the chart is important: the measured ready time has to be divided by the sample time to get a percentage. He talked about %RDY times and explained that a high value isn’t always a problem. The different scheduling mechanisms were also covered.
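To make the "divide by the sample time" step concrete, here is a minimal sketch. The function name and the 20-second default are assumptions for illustration (20 seconds is the sample interval of the real-time performance charts); use whatever interval your chart actually covers.

```python
def cpu_ready_percent(ready_ms: float, interval_s: float = 20.0) -> float:
    """Convert a CPU ready value in milliseconds into a percentage of the sample interval.

    ready_ms   -- ready time reported for one sample (milliseconds)
    interval_s -- length of that sample in seconds (assumed default: 20 s real-time chart)
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # The interval is converted to milliseconds so both values use the same unit.
    return ready_ms * 100.0 / (interval_s * 1000.0)


# The same 1000 ms of ready time means very different things per interval:
realtime = cpu_ready_percent(1000)                   # 20 s sample
five_minute = cpu_ready_percent(1000, interval_s=300)  # 300 s sample
print(f"real-time chart: {realtime:.2f}%")
print(f"5-minute chart:  {five_minute:.2f}%")
```

The same raw number is a noticeable share of a 20-second sample and a negligible share of a 5-minute one, which is why the chart interval matters.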
Too many vCPUs on a virtual machine. One of the most important things, I think, was the mantra: “Only add CPUs when it is necessary. First troubleshoot, then add.”
Transparent page sharing reclaims memory by consolidating redundant pages with identical content. When you boot a Windows VM it will zero out all memory blocks. ESX doesn’t know what memory is free within the virtual machine.
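The idea of consolidating identical pages can be illustrated with a toy sketch. This is not how ESX implements it internally; it only shows the principle of hashing page contents, confirming a match with a full comparison, and keeping one backing copy. All names here are made up for the example.

```python
import hashlib

PAGE_SIZE = 4096  # bytes per page in this toy example


def share_pages(pages):
    """Deduplicate pages with identical content.

    Returns (store, table):
      store -- digest -> list of distinct page contents with that digest
      table -- one (digest, index) entry per guest page, pointing into store
    """
    store = {}
    table = []
    for page in pages:
        digest = hashlib.sha256(page).digest()
        bucket = store.setdefault(digest, [])
        for i, candidate in enumerate(bucket):
            if candidate == page:  # full compare guards against hash collisions
                table.append((digest, i))
                break
        else:
            # No identical page seen yet: this one needs its own backing copy.
            bucket.append(page)
            table.append((digest, len(bucket) - 1))
    return store, table


# Six zeroed pages (like a freshly booted Windows guest) plus two distinct pages.
zero_page = bytes(PAGE_SIZE)
guest_pages = [zero_page] * 6 + [b"\x01" * PAGE_SIZE, b"\x02" * PAGE_SIZE]

store, table = share_pages(guest_pages)
backing_pages = sum(len(bucket) for bucket in store.values())
print(f"guest pages: {len(table)}, backing pages needed: {backing_pages}")
```

Zeroed pages are the easiest case for sharing, which is why a freshly booted Windows VM shares so well at first.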
Ballooning claims the memory inside the VM that isn’t in use, so the host can reclaim it. Ballooning is OK as long as it doesn’t cause the guest to swap to disk.
Don’t use VM memory limits. Size the machine according to its needs.
Be careful with memory reservations, as they are used to calculate the slot size in an HA cluster that uses the ‘number of host failures allowed’ admission control policy.
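A rough sketch of why one large reservation hurts. It assumes the commonly described slot rule: the memory slot is the largest memory reservation plus overhead among powered-on VMs, and the CPU slot is the largest CPU reservation, or a default when nobody reserves CPU. The class, the function names, the 32 MHz default and the host numbers are assumptions for illustration; check the defaults for your vSphere version.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    mem_reservation_mb: int = 0
    mem_overhead_mb: int = 0
    cpu_reservation_mhz: int = 0


def slot_size(vms, default_cpu_mhz=32):
    """Return (cpu_slot_mhz, mem_slot_mb) for a list of powered-on VMs."""
    if not vms:
        raise ValueError("need at least one powered-on VM")
    mem_slot = max(vm.mem_reservation_mb + vm.mem_overhead_mb for vm in vms)
    largest_cpu = max(vm.cpu_reservation_mhz for vm in vms)
    cpu_slot = largest_cpu if largest_cpu > 0 else default_cpu_mhz  # assumed default
    return cpu_slot, mem_slot


def slots_per_host(host_cpu_mhz, host_mem_mb, cpu_slot, mem_slot):
    """Number of slots a host can hold: the scarcer resource decides."""
    if cpu_slot <= 0 or mem_slot <= 0:
        raise ValueError("slot sizes must be positive")
    return min(host_cpu_mhz // cpu_slot, host_mem_mb // mem_slot)


# Ten VMs without reservations, 100 MB overhead each (hypothetical numbers).
vms = [VM(f"vm{i}", mem_overhead_mb=100) for i in range(10)]
before = slots_per_host(20000, 65536, *slot_size(vms))

# One VM with an 8 GB memory reservation changes the slot size for everyone.
vms.append(VM("big-db", mem_reservation_mb=8192, mem_overhead_mb=100))
after = slots_per_host(20000, 65536, *slot_size(vms))

print(f"slots per host before: {before}, after: {after}")
```

In this example a single reservation shrinks the number of slots per host from hundreds to a handful, so HA admission control will refuse to power on VMs much earlier.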
Memory compression is used to compress memory pages before the host falls back to swapping.
As a last resort, memory will be swapped. Swapping to SSD costs 12%, to FC 69% and to SATA 83%. So, if you have to swap, make sure it is to SSD.
Check the storage latency with esxtop. It shouldn’t be above 20 ms; if it is, you have a serious disk performance bottleneck.
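The 20 ms rule of thumb is easy to apply to a handful of readings. The sketch below assumes you have noted latency values in milliseconds per device, for example from the DAVG/cmd column in esxtop; the device names and numbers are made up, and the function is only an illustration.

```python
LATENCY_THRESHOLD_MS = 20.0  # rule of thumb from the talk


def slow_devices(samples, threshold_ms=LATENCY_THRESHOLD_MS):
    """Return device -> average latency (ms) for devices averaging above the threshold.

    samples -- dict mapping a device name to a list of latency readings in ms
    """
    result = {}
    for device, readings in samples.items():
        if not readings:
            continue  # nothing measured for this device
        average = sum(readings) / len(readings)
        if average > threshold_ms:
            result[device] = average
    return result


# Hypothetical readings collected over three esxtop refreshes.
samples = {
    "datastore-a": [4.0, 6.0, 5.0],
    "datastore-b": [18.0, 35.0, 40.0],
}

flagged = slow_devices(samples)
for device, average in flagged.items():
    print(f"{device}: {average:.1f} ms average, above {LATENCY_THRESHOLD_MS:.0f} ms")
```

Averaging over a few samples avoids raising the alarm on a single spike.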
When a lot of virtual machines have snapshots the storage becomes very slow. This has to do with metadata locking on the VMFS volume.
Besides esxtop, vscsiStats (www.VMdamentals.com) collects and reports counters on storage activity. Its output can be turned into a graph.
Misalignment is (as we all know) a killer for performance. Check with the storage vendor what is best for your environment.
Network stack and load-based teaming. Make sure that switches are stacked, or at least configure them with PortFast.
If you want to know more about advanced vSphere troubleshooting, check out the Performance Troubleshooting guide on the VMware site.
A short list of troubleshooting tools:
- Veeam Monitor
- VMTurbo Watchdog
- Quest vFoglight
- VKernel Capacity Analyzer
- VESI VMware Community PowerPack
- VMware Health Check Analyzer
- Graph-VM (Bouke Groenescheij) http://www.jume.nl
- esxplot perfmon
- RVTools (Rob de Veij) http://www.robware.net
Originally Posted at VMGuru.