VMware vSphere vCenter Server 4.0 User Manual

There are many disadvantages to using such an operating system on a NUMA platform. The high latency of remote memory accesses can leave the processors under-utilized, constantly waiting for data to be transferred to the local node, and the NUMA connection can become a bottleneck for applications with high memory-bandwidth demands.

Furthermore, performance on such a system can be highly variable. For example, an application might have its memory located locally on one benchmarking run, while a subsequent run happens to place all of that memory on a remote node. This variability can make capacity planning difficult. Finally, processor clocks might not be synchronized between multiple nodes, so applications that read the clock directly might behave incorrectly.

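
The effect of remote accesses on both average latency and run-to-run variability can be illustrated with a simple weighted-average model: if a fraction f of an application's memory accesses are local, the average access latency is f * t_local + (1 - f) * t_remote. The sketch below assumes accesses are spread uniformly over the application's memory, and the latency values (LOCAL_NS, REMOTE_NS) and the function name effective_latency_ns are hypothetical; real latencies depend on the platform and are not given in this guide.

```python
# Hypothetical latencies for illustration only; real values depend on the platform.
LOCAL_NS = 100.0   # assumed latency of a local memory access
REMOTE_NS = 160.0  # assumed latency of a remote memory access


def effective_latency_ns(local_fraction, local_ns=LOCAL_NS, remote_ns=REMOTE_NS):
    """Average access latency when `local_fraction` of accesses hit local memory."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


if __name__ == "__main__":
    # Same application, two benchmarking runs on a NUMA-unaware OS:
    # run 1 happens to get all-local memory, run 2 all-remote memory.
    for run, frac in (("run 1", 1.0), ("run 2", 0.0)):
        print(f"{run}: {effective_latency_ns(frac):.0f} ns")
```

With these assumed values, the run whose memory happens to be entirely remote sees a 60 percent higher average latency (160 ns versus 100 ns) than the all-local run, even though the application itself did not change.
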
Some high-end UNIX systems provide support for NUMA optimizations in their compilers and programming libraries. This support requires software developers to tune and recompile their programs for optimal performance, and optimizations for one system are not guaranteed to work well on the next generation of the same system. Other systems have allowed an administrator to explicitly decide on which node an application should run. While this might be acceptable for certain applications that require 100 percent of their memory to be local, it creates an administrative burden and can lead to imbalance between nodes when workloads change.

Ideally, the system software provides transparent NUMA support, so that applications can benefit immediately without modifications. The system should maximize the use of local memory and schedule programs intelligently without requiring constant administrator intervention. Finally, it must respond well to changing conditions without compromising fairness or performance.

How ESX/ESXi NUMA Scheduling Works

ESX/ESXi uses a sophisticated NUMA scheduler to dynamically balance processor load and memory locality.

1. Each virtual machine managed by the NUMA scheduler is assigned a home node. A home node is one of the system's NUMA nodes containing processors and local memory, as indicated by the System Resource Allocation Table (SRAT).
2. When memory is allocated to a virtual machine, the ESX/ESXi host preferentially allocates it from the home node.
3. The NUMA scheduler can dynamically change a virtual machine's home node to respond to changes in system load. The scheduler might migrate a virtual machine to a new home node to reduce processor load imbalance. Because this might cause more of its memory to be remote, the scheduler might migrate the virtual machine's memory dynamically to its new home node to improve memory locality. The NUMA scheduler might also swap virtual machines between nodes when this improves overall memory locality.
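
The three steps above can be sketched as a toy model. This is a hypothetical illustration, not VMware's implementation: the names (ToyNumaScheduler, admit, allocate, migrate_memory, rebalance, power_off), the choice of the least-loaded node as home node, the load metric (vCPUs per physical core), and the rule "move one virtual machine if that narrows the load gap" are all assumptions. Swapping virtual machines between nodes is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    cores: int      # physical cores on this NUMA node
    free_mb: int    # free local memory on this node


@dataclass(eq=False)  # compare VMs by identity
class VM:
    name: str
    vcpus: int
    home: int = -1  # home node id; -1 until admitted
    pages: dict = field(default_factory=dict)  # node id -> MB resident there


class ToyNumaScheduler:
    """Hypothetical toy model of home nodes, local allocation, and rebalancing."""

    def __init__(self, nodes):
        self.nodes = {n.node_id: n for n in nodes}
        self.vms = []

    def load(self, node_id):
        # Assumed load metric: vCPUs homed on the node per physical core.
        vcpus = sum(vm.vcpus for vm in self.vms if vm.home == node_id)
        return vcpus / self.nodes[node_id].cores

    def admit(self, vm):
        # Step 1: assign a home node (assumption: the least-loaded node).
        vm.home = min(self.nodes, key=self.load)
        self.vms.append(vm)

    def allocate(self, vm, mb):
        # Step 2: take memory from the home node first; spill over only if it is full.
        if mb > sum(n.free_mb for n in self.nodes.values()):
            raise MemoryError("not enough free memory on any node")
        for node_id in [vm.home] + [n for n in self.nodes if n != vm.home]:
            take = min(mb, self.nodes[node_id].free_mb)
            self.nodes[node_id].free_mb -= take
            if take:
                vm.pages[node_id] = vm.pages.get(node_id, 0) + take
            mb -= take
            if mb == 0:
                break

    def migrate_memory(self, vm):
        # Pull remote memory onto the (new) home node as far as free space allows.
        home = self.nodes[vm.home]
        for node_id in list(vm.pages):
            if node_id == vm.home:
                continue
            move = min(vm.pages[node_id], home.free_mb)
            if move == 0:
                continue
            vm.pages[node_id] -= move
            self.nodes[node_id].free_mb += move
            home.free_mb -= move
            vm.pages[vm.home] = vm.pages.get(vm.home, 0) + move
            if vm.pages[node_id] == 0:
                del vm.pages[node_id]

    def rebalance(self):
        # Step 3: move one VM from the busiest to the idlest node if that narrows the gap.
        busiest = max(self.nodes, key=self.load)
        idlest = min(self.nodes, key=self.load)
        gap = self.load(busiest) - self.load(idlest)
        for vm in sorted((v for v in self.vms if v.home == busiest), key=lambda v: v.vcpus):
            new_busy = self.load(busiest) - vm.vcpus / self.nodes[busiest].cores
            new_idle = self.load(idlest) + vm.vcpus / self.nodes[idlest].cores
            if abs(new_busy - new_idle) < gap:
                vm.home = idlest
                self.migrate_memory(vm)  # improve locality after the move
                return vm
        return None

    def power_off(self, vm):
        # Free the VM's memory on every node and stop tracking it.
        for node_id, mb in vm.pages.items():
            self.nodes[node_id].free_mb += mb
        vm.pages.clear()
        self.vms = [v for v in self.vms if v is not vm]


if __name__ == "__main__":
    sched = ToyNumaScheduler([Node(0, cores=4, free_mb=8192), Node(1, cores=4, free_mb=8192)])
    vms = [VM(name, vcpus=2) for name in "abcd"]
    for vm in vms:
        sched.admit(vm)
        sched.allocate(vm, 1024)
    print([vm.home for vm in vms])   # [0, 1, 0, 1]
    sched.power_off(vms[1])          # workload change leaves node 1 idle
    sched.power_off(vms[3])
    moved = sched.rebalance()
    print(moved.name, moved.home, moved.pages)  # a 1 {1: 1024}
```

In the usage example, four 2-vCPU virtual machines alternate between two 4-core nodes. After two of them power off, node 1 is idle, so rebalance moves "a" to node 1 and then migrates its 1024 MB there, mirroring the "change home node, then improve locality" sequence in step 3.
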

Some virtual machines are not managed by the ESX/ESXi NUMA scheduler. For example, if you manually set the processor affinity for a virtual machine, the NUMA scheduler might not be able to manage it. Virtual machines that have more virtual processors than the number of physical processor cores available on a single hardware node cannot be managed automatically. Virtual machines that are not managed by the NUMA scheduler still run correctly; however, they do not benefit from ESX/ESXi NUMA optimizations.

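
The two cases named above can be expressed as a simple eligibility check. This is a hypothetical sketch: VMConfig, cpu_affinity, and numa_managed are illustrative names, not an ESX/ESXi API. Because the text only says that a virtual machine with manual affinity might not be manageable, the sketch conservatively treats any manual affinity as unmanaged.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class VMConfig:
    name: str
    vcpus: int
    # Physical CPUs the VM is manually pinned to; None means no manual affinity.
    cpu_affinity: Optional[FrozenSet[int]] = None


def numa_managed(vm: VMConfig, cores_per_node: int) -> bool:
    """Hypothetical check for the two cases described in the text."""
    if vm.cpu_affinity is not None:
        # Manual affinity: the scheduler might not manage the VM; treated as unmanaged here.
        return False
    # A VM wider than one node's physical cores cannot be managed automatically.
    return vm.vcpus <= cores_per_node


if __name__ == "__main__":
    for vm in (VMConfig("web", 2), VMConfig("db", 8), VMConfig("pinned", 2, frozenset({0, 1}))):
        print(vm.name, numa_managed(vm, cores_per_node=4))
```

On a host with 4 cores per node, this reports "web" as managed and both "db" (8 vCPUs) and "pinned" (manual affinity) as unmanaged. cores_per_node is passed in rather than hard-coded because it depends on the host hardware.
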
The NUMA scheduling and memory placement policies in ESX/ESXi manage the virtual machines under their control transparently, so that administrators do not need to balance virtual machines between nodes explicitly.

The optimizations work regardless of the type of guest operating system. ESX/ESXi provides NUMA support even to virtual machines whose guest operating systems do not support NUMA hardware, such as Windows NT 4.0. As a result, you can take advantage of new hardware even with legacy operating systems.

vSphere Resource Management Guide

74

VMware, Inc.
