HP XC System 3.x Software User Manual

NOTE: The --nodelist=nodelist option is particularly useful for
determining problematic nodes. If you use this option and the --nnodes=n
option, the --nnodes=n option is ignored.

The --queue LSF_queue option specifies the LSF queue for the performance
health tests.

test
    Indicates the test to perform. The following tests are available:

    cpu
        Tests CPU core performance using the Linpack benchmark.

    cpu_usage
        Tests CPU core usage. All CPU cores should be idle during the
        test. This test reports a node if it is using more than 10 percent
        (by default) of its CPU cores. The head node is excluded from this
        test.

    memory
        Uses the streams benchmark to test memory performance.

    memory_usage
        Tests memory usage. This test reports a node that uses more than
        25 percent (by default) of its memory.

    network_stress
        Tests network performance under stress using the Pallas
        benchmark's Alltoall, Allgather, and Allreduce tests. Run these
        tests on a large number of nodes for the most accurate results.
        The default number of nodes is 4, which is also the minimum value
        that should be used. The --all_group option allows you to select
        the node grouping size.

    network_bidirectional
        Tests network performance between pairs of nodes using the Pallas
        benchmark's Exchange test.

    network_unidirectional
        Tests network performance between pairs of nodes using the HP MPI
        ping_pong_ring test.

NOTE: Except for the network_stress and network_bidirectional tests,
these tests only apply to systems that install LSF-HPC incorporated with
SLURM. The network_stress and network_bidirectional tests also function
under Standard LSF.
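
The reporting rule described for the cpu_usage and memory_usage tests can be sketched in a few lines of Python. This is an illustration only and does not show how ovp is implemented: the function name, the dictionary input, and the node names are hypothetical. Only the default thresholds (10 percent for CPU cores, 25 percent for memory) come from the test descriptions.

```python
# Illustrative sketch only; not part of ovp.
# Default thresholds, in percent, as stated in the test descriptions.
CPU_USAGE_DEFAULT = 10.0
MEMORY_USAGE_DEFAULT = 25.0


def nodes_over_threshold(usage_by_node, threshold):
    """Return the sorted names of nodes whose usage is strictly above threshold."""
    return sorted(node for node, pct in usage_by_node.items() if pct > threshold)


# Hypothetical measurements: node name -> percent in use.
cpu_sample = {"n1": 2.5, "n2": 41.0, "n3": 10.0}
mem_sample = {"n1": 12.0, "n2": 30.5, "n3": 25.1}

cpu_flagged = nodes_over_threshold(cpu_sample, CPU_USAGE_DEFAULT)
mem_flagged = nodes_over_threshold(mem_sample, MEMORY_USAGE_DEFAULT)
print("cpu_usage would report:", cpu_flagged)
print("memory_usage would report:", mem_flagged)
```

The comparison is strict because the descriptions say "more than", so in this sketch a node at exactly the threshold is not reported.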

You can list the available tests with the ovp -l command:

$ ovp -l
Test list for perf_health:
    cpu_usage
    memory_usage
    cpu
    memory
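
If you want to use this listing from a script, one approach is to parse the captured text. The following Python sketch assumes that the output has the shape shown above: a header line that ends in a colon, followed by one test name per line. The function name and the sample string are illustrative, and the real output may contain more entries or differ in layout.

```python
def parse_test_list(text):
    """Return (suite, tests) from text shaped like the 'ovp -l' listing."""
    suite = None
    tests = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("$"):
            continue  # skip blank lines and an echoed prompt line
        if line.endswith(":"):
            # For example, "Test list for perf_health:" yields "perf_health".
            suite = line[:-1].split()[-1]
        else:
            tests.append(line)
    return suite, tests


# Sample text copied from the listing above (assumed shape).
sample = """\
Test list for perf_health:
cpu_usage
memory_usage
cpu
memory
"""

suite, tests = parse_test_list(sample)
print(suite, tests)
```

On a live system the text would come from running ovp -l, for example through Python's subprocess module, instead of from the hard-coded sample.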

7.6 Running Performance Health Tests
