[20161105]
NetBSD 7.0/xen scheduling mystery, and how to fix it with processor sets
Today I needed to do some number crunching using a home-brewed
C program. To do some manual load balancing, I fired
up some Amazon AWS instances (which run on Xen) with NetBSD 7.0.
In this case, the system was assigned two CPUs, as dmesg shows:
# dmesg | grep cpu
vcpu0 at hypervisor0: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, id 0x306e4
vcpu1 at hypervisor0: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, id 0x306e4
I started two instances of my program, with the intent to
have each one use one CPU. Which is not what
happened! Here is what I observed, and how I fixed things for now.
I was looking at top(1) to see that everything was running fine,
and noticed funny WCPU and CPU values:
PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND
2791 root 25 0 8816K 964K RUN/0 16:10 54.20% 54.20% myprog
2845 root 26 0 8816K 964K RUN/0 17:10 47.90% 47.90% myprog
I expected something like WCPU and CPU being around 100%, assuming
that each process was bound to its own CPU. The values I actually
saw (and listed above) suggested that both programs were fighting
for the same CPU. Huh?!
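A quick way to check that suspicion is to add up the WCPU column for the
processes in question: a sum near 100% points to one shared CPU, a sum near
200% to two. Here is a minimal sketch, assuming top(1) process lines in the
column layout shown above (WCPU in the ninth field). The function name
sum_wcpu is made up for illustration.

```shell
# sum_wcpu NAME: read top(1)-style process lines on stdin and print the
# total WCPU of all lines whose last field (COMMAND) equals NAME.
# Assumption: the column layout shown above, where WCPU is field 9.
sum_wcpu() {
    awk -v name="$1" '
        $NF == name { sub(/%/, "", $9); total += $9 }   # strip "%" and add
        END { printf "%.1f\n", total }
    '
}
```

Fed the two myprog lines above, this prints 102.1, i.e. both processes
together get roughly one CPU's worth of time.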
top's CPU state shows:
load averages: 2.15, 2.07, 1.82; up 0+00:45:19 18:00:55
27 processes: 2 runnable, 23 sleeping, 2 on CPU
CPU states: 50.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 50.0% idle
Memory: 119M Act, 7940K Exec, 101M File, 3546M Free
Which is not too useful. Typing "1" in top(1) lists the actual per-CPU usage
instead:
load averages: 2.14, 2.08, 1.83; up 0+00:45:56 18:01:32
27 processes: 4 runnable, 21 sleeping, 2 on CPU
CPU0 states: 100% user, 0.0% nice, 0.0% system, 0.0% interrupt, 0.0% idle
CPU1 states: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
Memory: 119M Act, 7940K Exec, 101M File, 3546M Free
This confirmed my suspicion that both processes were running on one
CPU, and that the other one was idling. Bad! But how to fix it?
One option is to kick your operating system out of the window,
but I still like NetBSD, so here's another solution:
NetBSD lets you create "processor sets",
assign CPUs to them, and then bind processes to
those sets. Let's have a look!
Processor sets are manipulated using the
psrset(8) utility. By default all CPUs are in the same (system) processor set:
# psrset
system processor set 0: processor(s) 0 1
The first step is to create a new processor set:
# psrset -c
1
# psrset
system processor set 0: processor(s) 0 1
user processor set 1: empty
Next, assign one CPU to the new set:
# psrset -a 1 1
# psrset
system processor set 0: processor(s) 0
user processor set 1: processor(s) 1
Last, find out what the process IDs of my two (running) processes are,
and assign them to the two processor sets:
# ps -u
USER PID %CPU %MEM VSZ RSS TTY STAT STARTED TIME COMMAND
root 2791 52.0 0.0 8816 964 pts/4 R+ 5:28PM 22:57.80 myprog
root 2845 50.0 0.0 8816 964 pts/2 R+ 5:26PM 23:33.97 myprog
#
# psrset -b 0 2791
# psrset -b 1 2845
Note that this was done with the two processes running;
there is no need to stop and restart them!
The effect of the commands is immediate, as can be seen in top(1):
load averages: 2.02, 2.05, 1.94; up 0+00:59:32 18:15:08
27 processes: 1 runnable, 24 sleeping, 2 on CPU
CPU0 states: 100% user, 0.0% nice, 0.0% system, 0.0% interrupt, 0.0% idle
CPU1 states: 100% user, 0.0% nice, 0.0% system, 0.0% interrupt, 0.0% idle
Memory: 119M Act, 7940K Exec, 101M File, 3546M Free
Swap:
PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND
2845 root 25 0 8816K 964K CPU/1 26:14 100% 100% myprog
2791 root 25 0 8816K 964K RUN/0 25:40 100% 100% myprog
Things are as expected now, with each program being bound to its
own CPU.
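The manual steps above can be collected into a small helper. This is only a
sketch: the function name plan_psrset is made up, it merely prints the
psrset(8) commands instead of running them, and it assumes that
"psrset -c" hands out set IDs 1, 2, ... in order (as it did above) and that
there is one CPU per process.

```shell
# plan_psrset PID...: print (not run) the psrset commands that put each
# PID on a CPU of its own, mirroring the manual steps above.
# Assumptions: "psrset -c" returns set IDs 1, 2, ... in order, CPU n can
# be moved into set n, and there are at least as many CPUs as PIDs.
plan_psrset() {
    # One extra set (and CPU) for every PID after the first;
    # the first PID stays in system set 0 on CPU 0.
    n=1
    while [ "$n" -lt "$#" ]; do
        echo "psrset -c"
        echo "psrset -a $n $n"
        n=$((n + 1))
    done
    # Bind the PIDs to sets 0, 1, 2, ... in the order given.
    n=0
    for pid in "$@"; do
        echo "psrset -b $n $pid"
        n=$((n + 1))
    done
}
```

Running "plan_psrset 2791 2845" prints exactly the four commands used above.
Review the output first, then run the commands as root.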
Why this didn't happen by default is left as an exercise for the reader.
Hints that may help:
# uname -a
NetBSD foo.eu-west-1.compute.internal 7.0 NetBSD 7.0 (XEN3_DOMU.201509250726Z) amd64
# dmesg
...
hypervisor0 at mainbus0: Xen version 4.2.amazon
VIRQ_DEBUG interrupt using event channel 3
vcpu0 at hypervisor0: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, id 0x306e4
vcpu1 at hypervisor0: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, id 0x306e4
AWS Instance type: c3.large
AMI ID: NetBSD-x86_64-7.0-201511211930Z-20151121-1142 (ami-ac983ddf)
[Tags: amazon, aws, psrset, scheduler, smp, xen]