Kernel overhead and idle time in SMP virtual guest
Andrej Podzimek
2010-06-24 06:25:10 UTC
Hello,

This is what mpstat typically looks like on a heavily loaded system building OS/Net with dmake configured for 16 concurrent tasks:

CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0  561   0   16  383   94 171   13   13   82   0   521  12  43  0  45
  1  667   0   22  107    2 150   14   12   83   0   571  15  35  0  50
  2  655   0   15  166  108 107   14   12  159   0   564  15  42  0  43
  3  643   0   17  109    5 151   15   12   82   0   542  15  35  0  50
  4  683   0   17  102    2 139   15   12   79   0   570  15  35  0  50
  5  675   0   18  101    2 140   14   12   79   0   569  15  34  0  50
  6  714   0   14  100    2 129   15   12   78   0   584  16  34  0  51
  7  622   0   23   93    3 127   14   11   73   0   521  15  34  0  51

Yes, the amount of idle and system time is *incredible*. Something must be wrong. More than 20 running processes are reported by 'top' most of the time, but at most 4 to 6 are on CPU at any given moment. :-(

This happens on a 64-bit VirtualBox (3.2.4 r62467) guest running on a Linux host (4-core Core i7 (8 threads)). Setting a lower number of CPUs (such as 4) does not help. Uniprocessor guests seem to work normally (with IO APIC disabled).

I've read about bugs affecting the IO APIC performance under VirtualBox, but all the reports say they are only relevant for 32-bit guests. This is a 64-bit one.

I migrated the machine into a QEMU-KVM environment and performance got much better, but still far from ideal. Instead of 15:35:50 (usr:sys:idl), I got something like 70:30:0 (usr:sys:idl). (So there was no inexplicable idle time. But still quite a lot of kernel overhead.)
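
The 15:35:50 figure is roughly the per-CPU average of the usr, sys and idl columns in the mpstat sample above. A minimal Python sketch of that arithmetic follows. The function name average_columns and the inline MPSTAT string are illustrative assumptions, not part of any tool, and the sketch assumes a single mpstat interval with one header line.

```python
# mpstat sample copied from the post above (one row per virtual CPU).
MPSTAT = """\
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 561 0 16 383 94 171 13 13 82 0 521 12 43 0 45
1 667 0 22 107 2 150 14 12 83 0 571 15 35 0 50
2 655 0 15 166 108 107 14 12 159 0 564 15 42 0 43
3 643 0 17 109 5 151 15 12 82 0 542 15 35 0 50
4 683 0 17 102 2 139 15 12 79 0 570 15 35 0 50
5 675 0 18 101 2 140 14 12 79 0 569 15 34 0 50
6 714 0 14 100 2 129 15 12 78 0 584 16 34 0 51
7 622 0 23 93 3 127 14 11 73 0 521 15 34 0 51
"""


def average_columns(text, columns=("usr", "sys", "wt", "idl")):
    """Average the named mpstat columns over all CPU rows.

    Hypothetical helper: expects exactly one header line followed by
    one whitespace-separated row per CPU.
    """
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    averages = {}
    for name in columns:
        idx = header.index(name)  # locate the column by its header name
        values = [int(row[idx]) for row in rows]
        averages[name] = sum(values) / len(values)
    return averages


averages = average_columns(MPSTAT)
for name, value in averages.items():
    print(f"{name}: {value:.2f}")
```

For this sample the averages are 14.75 (usr), 36.50 (sys), 0.00 (wt) and 48.75 (idl), so about half of the guest's CPU capacity is idle and most of the busy time is spent in the kernel.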

Is this a known issue? Is there a solution? Could I diagnose it with DTrace somehow? A piece of advice would be very helpful.

Andrej


P. S.
Unfortunately, OpenSolaris guests with IDE drives larger than 128 GiB cannot currently run under QEMU-KVM. This is a description of the bug:
http://www.neuhalfen.name/2009/08/05/OpenSolaris_KVM_and_large_IDE_drives/
http://www.neuhalfen.name/2009/08/06/OpenSolaris_KVM_and_large_IDE_drives_II/

Virtual SATA drives don't suffer from this issue, but QEMU cannot emulate SATA (so far). That's why I'd like to find out why the performance of OpenSolaris under VirtualBox is so poor (and possibly find a workaround).
j***@public.gmane.org
2010-06-24 22:46:37 UTC
Post by Andrej Podzimek
Yes, the amount of idle and system time is *incredible*. Something
must be wrong. More than 20 running processes are reported by 'top'
most of the time, but at most 4 to 6 are on CPU at any given moment.
What kind of workload are you running? What are these 20 running
processes doing?
Post by Andrej Podzimek
This happens on a 64-bit VirtualBox (3.2.4 r62467) guest running on a
Linux host (4-core Core i7 (8 threads)). Setting a lower number of
CPUs (such as 4) does not help. Uniprocessor guests seem to work
normally (with IO APIC disabled).
I've read about bugs affecting the IO APIC performance under
VirtualBox, but all the reports say they are only relevant for 32-bit
guests. This is a 64-bit one.
I migrated the machine into a QEMU-KVM environment and performance got
much better, but still far from ideal. Instead of 15:35:50
(usr:sys:idl), I got something like 70:30:0 (usr:sys:idl). (So there
was no inexplicable idle time. But still quite a lot of kernel
overhead.)
Hard to say. Some virtualized environments have bugs that confuse the
operating system.
Post by Andrej Podzimek
Is this a known issue? Is there a solution? Could I diagnose it with
DTrace somehow? A piece of advice would be very helpful.
You can use DTrace to profile the kernel.

# dtrace -n 'profile-1997hz { @a[stack(50)] = count();} END { trunc(@a, 30); }'

The above will show you the 30 most common kernel stacks. That might be
a start.

-j
Andrej Podzimek
2010-06-25 03:29:02 UTC
Post by j***@public.gmane.org
Post by Andrej Podzimek
Yes, the amount of idle and system time is *incredible*. Something
must be wrong. More than 20 running processes are reported by 'top'
most of the time, but at most 4 to 6 are on CPU at any given moment.
What kind of workload are you running? What are these 20 running
processes doing?
It is a full nightly build of OS/Net. Most of those running processes are compilers (cc, gcc), and dmake also consumes a considerable amount of time. There is no intensive disk activity and no anonymous paging. The VM has 3 GB of memory, which seems to be sufficient for the build.
Post by j***@public.gmane.org
Post by Andrej Podzimek
This happens on a 64-bit VirtualBox (3.2.4 r62467) guest running on a
Linux host (4-core Core i7 (8 threads)). Setting a lower number of
CPUs (such as 4) does not help. Uniprocessor guests seem to work
normally (with IO APIC disabled).
I've read about bugs affecting the IO APIC performance under
VirtualBox, but all the reports say they are only relevant for 32-bit
guests. This is a 64-bit one.
I migrated the machine into a QEMU-KVM environment and performance got
much better, but still far from ideal. Instead of 15:35:50
(usr:sys:idl), I got something like 70:30:0 (usr:sys:idl). (So there
was no inexplicable idle time. But still quite a lot of kernel
overhead.)
Hard to say. Some virtualized environments have bugs that confuse the
operating system.
Sure, but I'm definitely not the only one who runs OpenSolaris on an SMP virtual machine. So either Google would be full of reports on this (which is not the case), or there must be something wrong only with my system (either host or guest). Just trying to figure out what it is...
Post by j***@public.gmane.org
Post by Andrej Podzimek
Is this a known issue? Is there a solution? Could I diagnose it with
DTrace somehow? A piece of advice would be very helpful.
You can use DTrace to profile the kernel.
The above will show you the 30 most common kernel stacks. That might be
a start.
# dtrace -n 'profile-1997hz { @a[stack(50)] = count();} END { trunc(@a, 30); }'
dtrace: description 'profile-1997hz ' matched 2 probes
dtrace: processing aborted: Abort due to systemic unresponsiveness

:-( Houston, we've had a problem...

Andrej
