Discussion:
Syscalls
przemolicc-IjDXvh/
2012-07-06 20:22:03 UTC
Permalink
Hello,

we have been observing some high syscalls (much higher then usually) during fixed part of week.
DTrace shows that they mostly comes from Oracle:
...
uname exece 6788165162
oracle close 6904279356
dtrace ioctl 7987232206
oracle munmap 13857698724
emagent fork1 14217339236
oracle resolvepath 14438549232
oracle semsys 16447265715
tnslsnr fork1 18617374490
oracle stat 23129632013
oracle exece 23859088254
oracle pollsys 28102955920
oracle pread 32159134471
oracle mmap 35380305339
oracle times 35475706300
oracle open 42314061704
oracle memcntl 45026502910
oracle pwrite 51648592719
oracle read 72939284043
oracle write 95484269587
oracle ioctl 306634639800

TOTAL: 1060934672587

Using sar we can observe that system spends much more time in syscalls
(column 'sys') but number of syscalls is not higher then usually (column 'scall/s'):

server:/export/home/przemol>sar -f /var/adm/sa/sa04 -s 12:00 -e 15:30

SunOS server 5.10 Generic_147440-19 sun4v 07/04/2012

12:00:03 %usr %sys %wio %idle
12:10:03 42 8 0 50
12:20:03 37 7 0 56
12:30:03 37 8 0 55
12:40:03 37 10 0 53
12:50:04 40 15 0 45
13:00:03 37 10 0 52
13:10:04 35 10 0 55
13:20:04 38 8 0 55
13:30:04 35 7 0 58
13:40:04 38 9 0 53
13:50:03 37 21 0 42
14:00:03 36 14 0 50
14:10:04 33 10 0 57
14:20:04 36 21 0 42
14:30:04 36 32 0 32
14:40:04 28 15 0 57
14:50:03 32 12 0 57
15:00:03 30 12 0 57
15:10:03 32 8 0 60
15:20:03 31 6 0 64

Average 35 12 0 52

server:/export/home/przemol>sar -c -f /var/adm/sa/sa04 -s 12:00 -e 15:30

SunOS server 5.10 Generic_147440-19 sun4v 07/04/2012

12:00:03 scall/s sread/s swrit/s fork/s exec/s rchar/s wchar/s
12:10:03 107602 6567 4948 27.82 18.74 478579 4290812
12:20:03 96872 6411 4549 25.64 17.55 6949437 4295743
12:30:03 99273 6139 4792 25.69 17.76 723126 895552
12:40:03 95908 5950 4129 25.47 17.61 4020240 3585954
12:50:04 99792 5685 5319 24.52 16.69 4579288 4915466
13:00:03 90723 4927 3553 24.85 17.34 4490451 3330185
13:10:04 94744 6207 3688 25.22 17.40 6691959 3530345
13:20:04 93971 6105 4111 24.88 17.05 1220270 4051683
13:30:04 87887 6189 4418 24.45 17.15 3026237 4946326
13:40:04 92151 6032 4269 24.58 17.12 4982878 3595013
13:50:03 80146 5360 4101 23.22 16.10 448459 4052298
14:00:03 72998 4488 2142 23.69 16.69 4678268 3586545
14:10:04 76433 4031 2385 24.82 17.31 6913550 6157899
14:20:04 72935 4462 1994 23.49 16.25 3090017 5321098
14:30:04 72516 4124 1993 23.46 16.37 2942463 6993712
14:40:04 72057 3048 2344 23.95 16.77 3042674 2668180
14:50:03 73552 3370 2506 24.93 17.13 6398090 2906262
15:00:03 70869 3111 2049 25.39 17.55 5858628 3745978
15:10:03 80258 3643 2467 47.57 28.89 7126361 3621600
15:20:03 73211 3519 2693 24.66 16.99 4550096 2973429

Average 85197 4969 3423 25.91 17.72 4110499 3973336

Can I conclude that system spends much more time processing more or less the same number
of syscalls and time spent on one (statistically) syscall takes longer time ?
What is your opinion ?

Regards
Przemek
Mike Gerdts
2012-07-06 20:44:04 UTC
Permalink
Post by przemolicc-IjDXvh/
Hello,
we have been observing some high syscalls (much higher then usually) during fixed part of week.
...
uname exece 6788165162
oracle close 6904279356
dtrace ioctl 7987232206
oracle munmap 13857698724
emagent fork1 14217339236
oracle resolvepath 14438549232
oracle semsys 16447265715
tnslsnr fork1 18617374490
oracle stat 23129632013
oracle exece 23859088254
oracle pollsys 28102955920
oracle pread 32159134471
oracle mmap 35380305339
oracle times 35475706300
oracle open 42314061704
oracle memcntl 45026502910
oracle pwrite 51648592719
oracle read 72939284043
oracle write 95484269587
oracle ioctl 306634639800
TOTAL: 1060934672587
What was the dtrace script you used and over what period? How many
cpus in the system (according to psrinfo | wc -l)? Those numbers look
suspiciously high and make me think something is wrong with the dtrace
script.
Post by przemolicc-IjDXvh/
Using sar we can observe that system spends much more time in syscalls
server:/export/home/przemol>sar -f /var/adm/sa/sa04 -s 12:00 -e 15:30
SunOS server 5.10 Generic_147440-19 sun4v 07/04/2012
12:00:03 %usr %sys %wio %idle
12:10:03 42 8 0 50
12:20:03 37 7 0 56
12:30:03 37 8 0 55
12:40:03 37 10 0 53
12:50:04 40 15 0 45
13:00:03 37 10 0 52
13:10:04 35 10 0 55
13:20:04 38 8 0 55
13:30:04 35 7 0 58
13:40:04 38 9 0 53
13:50:03 37 21 0 42
14:00:03 36 14 0 50
14:10:04 33 10 0 57
14:20:04 36 21 0 42
14:30:04 36 32 0 32
14:40:04 28 15 0 57
14:50:03 32 12 0 57
15:00:03 30 12 0 57
15:10:03 32 8 0 60
15:20:03 31 6 0 64
Average 35 12 0 52
server:/export/home/przemol>sar -c -f /var/adm/sa/sa04 -s 12:00 -e 15:30
SunOS server 5.10 Generic_147440-19 sun4v 07/04/2012
12:00:03 scall/s sread/s swrit/s fork/s exec/s rchar/s wchar/s
12:10:03 107602 6567 4948 27.82 18.74 478579 4290812
12:20:03 96872 6411 4549 25.64 17.55 6949437 4295743
12:30:03 99273 6139 4792 25.69 17.76 723126 895552
12:40:03 95908 5950 4129 25.47 17.61 4020240 3585954
12:50:04 99792 5685 5319 24.52 16.69 4579288 4915466
13:00:03 90723 4927 3553 24.85 17.34 4490451 3330185
13:10:04 94744 6207 3688 25.22 17.40 6691959 3530345
13:20:04 93971 6105 4111 24.88 17.05 1220270 4051683
13:30:04 87887 6189 4418 24.45 17.15 3026237 4946326
13:40:04 92151 6032 4269 24.58 17.12 4982878 3595013
13:50:03 80146 5360 4101 23.22 16.10 448459 4052298
14:00:03 72998 4488 2142 23.69 16.69 4678268 3586545
14:10:04 76433 4031 2385 24.82 17.31 6913550 6157899
14:20:04 72935 4462 1994 23.49 16.25 3090017 5321098
14:30:04 72516 4124 1993 23.46 16.37 2942463 6993712
14:40:04 72057 3048 2344 23.95 16.77 3042674 2668180
14:50:03 73552 3370 2506 24.93 17.13 6398090 2906262
15:00:03 70869 3111 2049 25.39 17.55 5858628 3745978
15:10:03 80258 3643 2467 47.57 28.89 7126361 3621600
15:20:03 73211 3519 2693 24.66 16.99 4550096 2973429
Average 85197 4969 3423 25.91 17.72 4110499 3973336
It's averaging 85197 system calls per second. At that rate, 1
trillion system calls (1,060,934,672,587) would take around 144 CPU
days of system time. Seems kinda unlikely.

Perhaps the 1 trillion number is nanoseconds spent in the system
calls. That would turn into about 29 minutes of system time. That
could be, but whether it is bad depends a lot on the system and the
duration.
Post by przemolicc-IjDXvh/
Can I conclude that system spends much more time processing more or less the same number
of syscalls and time spent on one (statistically) syscall takes longer time ?
What is your opinion ?
Regards
Przemek
_______________________________________________
perf-discuss mailing list
--
Mike Gerdts
http://mgerdts.blogspot.com/
przemolicc-IjDXvh/
2012-07-06 21:44:39 UTC
Permalink
Post by Mike Gerdts
Post by przemolicc-IjDXvh/
Hello,
we have been observing some high syscalls (much higher then usually) during fixed part of week.
...
uname exece 6788165162
oracle close 6904279356
dtrace ioctl 7987232206
oracle munmap 13857698724
emagent fork1 14217339236
oracle resolvepath 14438549232
oracle semsys 16447265715
tnslsnr fork1 18617374490
oracle stat 23129632013
oracle exece 23859088254
oracle pollsys 28102955920
oracle pread 32159134471
oracle mmap 35380305339
oracle times 35475706300
oracle open 42314061704
oracle memcntl 45026502910
oracle pwrite 51648592719
oracle read 72939284043
oracle write 95484269587
oracle ioctl 306634639800
TOTAL: 1060934672587
What was the dtrace script you used and over what period?
#!/usr/sbin/dtrace -s

#pragma D option dynvarsize=64m

syscall:::entry
{
self->start = vtimestamp;
}
syscall:::return
/(int)arg0 != -1 && self->start/
{
this->time = vtimestamp - self->start;
@Time[execname,probefunc] = sum(this->time);
@TimeT["Total:"] = sum(this->time);
self->start = 0;
}
profile:::tick-5m
{
trunc (@Time, 20);
trunc (@TimeT, 20);
printa(@Time);
printa("\n TOTAL:%***@d", @TimeT);
clear(@Time);
clear(@TimeT);
exit(0);
}
END
{
clear (@Time);
clear (@TimeT);
}
Post by Mike Gerdts
How many
cpus in the system (according to psrinfo | wc -l)?
96
This is T2+ based server.
Post by Mike Gerdts
Those numbers look
suspiciously high and make me think something is wrong with the dtrace
script.
Post by przemolicc-IjDXvh/
Using sar we can observe that system spends much more time in syscalls
server:/export/home/przemol>sar -f /var/adm/sa/sa04 -s 12:00 -e 15:30
SunOS server 5.10 Generic_147440-19 sun4v 07/04/2012
12:00:03 %usr %sys %wio %idle
12:10:03 42 8 0 50
12:20:03 37 7 0 56
12:30:03 37 8 0 55
12:40:03 37 10 0 53
12:50:04 40 15 0 45
13:00:03 37 10 0 52
13:10:04 35 10 0 55
13:20:04 38 8 0 55
13:30:04 35 7 0 58
13:40:04 38 9 0 53
13:50:03 37 21 0 42
14:00:03 36 14 0 50
14:10:04 33 10 0 57
14:20:04 36 21 0 42
14:30:04 36 32 0 32
14:40:04 28 15 0 57
14:50:03 32 12 0 57
15:00:03 30 12 0 57
15:10:03 32 8 0 60
15:20:03 31 6 0 64
Average 35 12 0 52
server:/export/home/przemol>sar -c -f /var/adm/sa/sa04 -s 12:00 -e 15:30
SunOS server 5.10 Generic_147440-19 sun4v 07/04/2012
12:00:03 scall/s sread/s swrit/s fork/s exec/s rchar/s wchar/s
12:10:03 107602 6567 4948 27.82 18.74 478579 4290812
12:20:03 96872 6411 4549 25.64 17.55 6949437 4295743
12:30:03 99273 6139 4792 25.69 17.76 723126 895552
12:40:03 95908 5950 4129 25.47 17.61 4020240 3585954
12:50:04 99792 5685 5319 24.52 16.69 4579288 4915466
13:00:03 90723 4927 3553 24.85 17.34 4490451 3330185
13:10:04 94744 6207 3688 25.22 17.40 6691959 3530345
13:20:04 93971 6105 4111 24.88 17.05 1220270 4051683
13:30:04 87887 6189 4418 24.45 17.15 3026237 4946326
13:40:04 92151 6032 4269 24.58 17.12 4982878 3595013
13:50:03 80146 5360 4101 23.22 16.10 448459 4052298
14:00:03 72998 4488 2142 23.69 16.69 4678268 3586545
14:10:04 76433 4031 2385 24.82 17.31 6913550 6157899
14:20:04 72935 4462 1994 23.49 16.25 3090017 5321098
14:30:04 72516 4124 1993 23.46 16.37 2942463 6993712
14:40:04 72057 3048 2344 23.95 16.77 3042674 2668180
14:50:03 73552 3370 2506 24.93 17.13 6398090 2906262
15:00:03 70869 3111 2049 25.39 17.55 5858628 3745978
15:10:03 80258 3643 2467 47.57 28.89 7126361 3621600
15:20:03 73211 3519 2693 24.66 16.99 4550096 2973429
Average 85197 4969 3423 25.91 17.72 4110499 3973336
It's averaging 85197 system calls per second. At that rate, 1
trillion system calls (1,060,934,672,587) would take around 144 CPU
days of system time. Seems kinda unlikely.
Perhaps the 1 trillion number is nanoseconds spent in the system
calls. That would turn into about 29 minutes of system time. That
could be, but whether it is bad depends a lot on the system and the
duration.
Post by przemolicc-IjDXvh/
Can I conclude that system spends much more time processing more or less the same number
of syscalls and time spent on one (statistically) syscall takes longer time ?
What is your opinion ?
Regards
Przemek
_______________________________________________
perf-discuss mailing list
--
Mike Gerdts
http://mgerdts.blogspot.com/
przemolicc-IjDXvh/
2012-07-11 13:52:37 UTC
Permalink
Hello,

I have setup dtrace script which is run every 10 minutes:
0,10,20,30,40,50 * * * * /root/bin/sum.sh
(the dtrace script is below).
It runs for 5 min. and saves result in a file (so 5 out of 10 min is recorded) and records the following data:
dtrace ioctl 6265410485
uname exece 6460327273
oracle semsys 7842510331
oracle munmap 9508009506
oracle resolvepath 9757628007
tnslsnr fork1 13248716831
emagent fork1 14051647077
oracle stat 15720295909
oracle exece 16459326038
oracle pollsys 19301262569
oracle mmap 20975464662
oracle times 23059663461
oracle pwrite 25491331888
oracle open 30196916271
oracle memcntl 31122380062
oracle pread 40277198389
oracle read 64166456399
oracle write 67988846272
oracle ioctl 180151590889

TOTAL: 770149631798

When I grep for TOTAL in these log files I get:
2012-07-10-13:00: TOTAL: 900883959215
2012-07-10-13:10: TOTAL: 824011512465
2012-07-10-13:20: TOTAL: 839473060028
2012-07-10-13:30: TOTAL: 880047194365
2012-07-10-13:40: TOTAL: 5236046549
2012-07-10-13:50: TOTAL: 949408615917
2012-07-10-14:00: TOTAL: 957321356582
2012-07-10-14:10: TOTAL: 805430788624
2012-07-10-14:20: TOTAL: 678869217982
2012-07-10-14:30: TOTAL: 710839026697
2012-07-10-14:40: TOTAL: 816557964466
2012-07-10-14:50: TOTAL: 880681668064
2012-07-10-15:00: TOTAL: 986025103759

While running 'sar' for the _same_ time period it says:
13:00:04 38 10 0 52
13:10:04 36 10 0 54
13:20:04 36 11 0 53
13:30:03 35 12 0 53
13:40:04 34 13 0 53
13:50:04 35 20 0 45
14:00:04 36 23 0 41
14:10:04 29 9 0 62
14:20:04 30 9 0 61
14:30:03 26 8 0 67
14:40:04 27 8 0 65
14:50:04 32 9 0 59
15:00:04 31 9 0 60

So 'sar' data shows increase in syscall time while dtrace data doesn't.
Can anybody explain me that ? Of course it might be that something is happening in theses five
minutes period which is not recorded by dtrace. But this is typical OLTP database server with many Oracles
running on it and such behaviour is rather unrealistic ...

Best regards
Przemek
Post by przemolicc-IjDXvh/
Post by Mike Gerdts
Post by przemolicc-IjDXvh/
Hello,
we have been observing some high syscalls (much higher then usually) during fixed part of week.
...
uname exece 6788165162
oracle close 6904279356
dtrace ioctl 7987232206
oracle munmap 13857698724
emagent fork1 14217339236
oracle resolvepath 14438549232
oracle semsys 16447265715
tnslsnr fork1 18617374490
oracle stat 23129632013
oracle exece 23859088254
oracle pollsys 28102955920
oracle pread 32159134471
oracle mmap 35380305339
oracle times 35475706300
oracle open 42314061704
oracle memcntl 45026502910
oracle pwrite 51648592719
oracle read 72939284043
oracle write 95484269587
oracle ioctl 306634639800
TOTAL: 1060934672587
What was the dtrace script you used and over what period?
#!/usr/sbin/dtrace -s
#pragma D option dynvarsize=64m
syscall:::entry
{
self->start = vtimestamp;
}
syscall:::return
/(int)arg0 != -1 && self->start/
{
this->time = vtimestamp - self->start;
@Time[execname,probefunc] = sum(this->time);
@TimeT["Total:"] = sum(this->time);
self->start = 0;
}
profile:::tick-5m
{
exit(0);
}
END
{
}
Post by Mike Gerdts
How many
cpus in the system (according to psrinfo | wc -l)?
96
This is T2+ based server.
Post by Mike Gerdts
Those numbers look
suspiciously high and make me think something is wrong with the dtrace
script.
Post by przemolicc-IjDXvh/
Using sar we can observe that system spends much more time in syscalls
server:/export/home/przemol>sar -f /var/adm/sa/sa04 -s 12:00 -e 15:30
SunOS server 5.10 Generic_147440-19 sun4v 07/04/2012
12:00:03 %usr %sys %wio %idle
12:10:03 42 8 0 50
12:20:03 37 7 0 56
12:30:03 37 8 0 55
12:40:03 37 10 0 53
12:50:04 40 15 0 45
13:00:03 37 10 0 52
13:10:04 35 10 0 55
13:20:04 38 8 0 55
13:30:04 35 7 0 58
13:40:04 38 9 0 53
13:50:03 37 21 0 42
14:00:03 36 14 0 50
14:10:04 33 10 0 57
14:20:04 36 21 0 42
14:30:04 36 32 0 32
14:40:04 28 15 0 57
14:50:03 32 12 0 57
15:00:03 30 12 0 57
15:10:03 32 8 0 60
15:20:03 31 6 0 64
Average 35 12 0 52
server:/export/home/przemol>sar -c -f /var/adm/sa/sa04 -s 12:00 -e 15:30
SunOS server 5.10 Generic_147440-19 sun4v 07/04/2012
12:00:03 scall/s sread/s swrit/s fork/s exec/s rchar/s wchar/s
12:10:03 107602 6567 4948 27.82 18.74 478579 4290812
12:20:03 96872 6411 4549 25.64 17.55 6949437 4295743
12:30:03 99273 6139 4792 25.69 17.76 723126 895552
12:40:03 95908 5950 4129 25.47 17.61 4020240 3585954
12:50:04 99792 5685 5319 24.52 16.69 4579288 4915466
13:00:03 90723 4927 3553 24.85 17.34 4490451 3330185
13:10:04 94744 6207 3688 25.22 17.40 6691959 3530345
13:20:04 93971 6105 4111 24.88 17.05 1220270 4051683
13:30:04 87887 6189 4418 24.45 17.15 3026237 4946326
13:40:04 92151 6032 4269 24.58 17.12 4982878 3595013
13:50:03 80146 5360 4101 23.22 16.10 448459 4052298
14:00:03 72998 4488 2142 23.69 16.69 4678268 3586545
14:10:04 76433 4031 2385 24.82 17.31 6913550 6157899
14:20:04 72935 4462 1994 23.49 16.25 3090017 5321098
14:30:04 72516 4124 1993 23.46 16.37 2942463 6993712
14:40:04 72057 3048 2344 23.95 16.77 3042674 2668180
14:50:03 73552 3370 2506 24.93 17.13 6398090 2906262
15:00:03 70869 3111 2049 25.39 17.55 5858628 3745978
15:10:03 80258 3643 2467 47.57 28.89 7126361 3621600
15:20:03 73211 3519 2693 24.66 16.99 4550096 2973429
Average 85197 4969 3423 25.91 17.72 4110499 3973336
It's averaging 85197 system calls per second. At that rate, 1
trillion system calls (1,060,934,672,587) would take around 144 CPU
days of system time. Seems kinda unlikely.
Perhaps the 1 trillion number is nanoseconds spent in the system
calls. That would turn into about 29 minutes of system time. That
could be, but whether it is bad depends a lot on the system and the
duration.
Post by przemolicc-IjDXvh/
Can I conclude that system spends much more time processing more or less the same number
of syscalls and time spent on one (statistically) syscall takes longer time ?
What is your opinion ?
Regards
Przemek
_______________________________________________
perf-discuss mailing list
--
Mike Gerdts
http://mgerdts.blogspot.com/
_______________________________________________
perf-discuss mailing list
--
Regards
Przemyslaw Bak (przemol)
--
http://przemol.blogspot.com/
Loading...