Discussion:
Syscall for posix_spawn?
ольга крыжановская
2010-07-12 03:46:34 UTC
Perf team, has anyone been interested in implementing posix_spawn()
as a system call to improve performance? posix_spawn() is currently
implemented as a sequence of vforkx() and execve() with many
intermediate system calls around it. From looking at it and at the
kernel code of vforkx() and execve(), it might be beneficial to turn
the whole thing into one single system call.

Olga
--
, _ _ ,
{ \/`o;====- Olga Kryzhanovska -====;o`\/ }
.----'-/`-/ olga.kryzhanovska-***@public.gmane.org \-`\-'----.
`'-..-| / http://twitter.com/fleyta \ |-..-'`
/\/\ Solaris/BSD//C/C++ programmer /\/\
`--` `--`
Phil Harman
2010-07-12 08:28:14 UTC
This was one of my pet peeves during my 20 years at Sun. The main reason I wanted spawn(2) was for multithreaded apps needing to start new processes. The issue was that all LWPs need to be quiesced before even a vfork(2), which meant that something as simple as just calling system(3) might stop the world. The standard response from the gurus was ETOOHARD, followed by a good dose of FUD about how fundamental fork/exec was to UNIX, and that we couldn't be sure a spawned process would be exactly the same, "think of all the implicit inheritance". The proposed workaround was always the same: "your multithreaded app should use a fork helper co-process to do its spawning for it".

I salute you and wish you well!
Post by ольга крыжановская
Perf team, has anyone been interested in implementing posix_spawn()
as a system call to improve performance? posix_spawn() is currently
implemented as a sequence of vforkx() and execve() with many
intermediate system calls around it. From looking at it and at the
kernel code of vforkx() and execve(), it might be beneficial to turn
the whole thing into one single system call.
Olga
_______________________________________________
perf-discuss mailing list
ольга крыжановская
2010-07-12 12:05:21 UTC
I don't want spawn(2), I want posix_spawn(2). The latter is more
powerful: it would reduce the number of system calls and the
possibilities for race conditions, and it would execute faster than
the long bla(), bla(), bla(), ..., vfork(), exec() sequence that
applications and shells have to use today.

Olga
Post by Phil Harman
This was one of my pet peeves during my 20 years at Sun. The main reason I wanted spawn(2) was for multithreaded apps needing to start new processes. The issue was that all LWPs need to be quiesced before even a vfork(2), which meant that something as simple as just calling system(3) might stop the world. The standard response from the gurus was ETOOHARD, followed by a good dose of FUD about how fundamental fork/exec was to UNIX, and that we couldn't be sure a spawned process would be exactly the same, "think of all the implicit inheritance". The proposed workaround was always the same: "your multithreaded app should use a fork helper co-process to do its spawning for it".
I salute you and wish you well!
Post by ольга крыжановская
Perf team, has anyone been interested in implementing posix_spawn()
as a system call to improve performance? posix_spawn() is currently
implemented as a sequence of vforkx() and execve() with many
intermediate system calls around it. From looking at it and at the
kernel code of vforkx() and execve(), it might be beneficial to turn
the whole thing into one single system call.
Olga
Phil Harman
2010-07-12 13:56:53 UTC
I was merely saying we need a spawn() equivalent of fork()/exec() which
doesn't suffer the "stop all LWPs" limitation of vfork(). It would
certainly make sense for such a syscall to support a faster
implementation of posix_spawn(3), but the latter doesn't have to be a
syscall per se.

Phil

p.s. I hadn't noticed that Roger was on the Cc list when I replied. I'd
just like to make it clear that Roger could never be accused of an
ETOOHARD response to any challenge. Indeed, it is Roger we have to thank
for so many things (e.g. the unified process model) that others considered
ETOOHARD!
Post by ольга крыжановская
I don't want spawn(2), I want posix_spawn(2). The latter is more
powerful: it would reduce the number of system calls and the
possibilities for race conditions, and it would execute faster than
the long bla(), bla(), bla(), ..., vfork(), exec() sequence that
applications and shells have to use today.
Olga
Post by Phil Harman
This was one of my pet peeves during my 20 years at Sun. The main reason I wanted spawn(2) was for multithreaded apps needing to start new processes. The issue was that all LWPs need to be quiesced before even a vfork(2), which meant that something as simple as just calling system(3) might stop the world. The standard response from the gurus was ETOOHARD, followed by a good dose of FUD about how fundamental fork/exec was to UNIX, and that we couldn't be sure a spawned process would be exactly the same, "think of all the implicit inheritance". The proposed workaround was always the same: "your multithreaded app should use a fork helper co-process to do its spawning for it".
I salute you and wish you well!
Post by ольга крыжановская
Perf team, has anyone been interested in implementing posix_spawn()
as a system call to improve performance? posix_spawn() is currently
implemented as a sequence of vforkx() and execve() with many
intermediate system calls around it. From looking at it and at the
kernel code of vforkx() and execve(), it might be beneficial to turn
the whole thing into one single system call.
Olga
ольга крыжановская
2010-07-13 23:41:40 UTC
Bart Smaalders
2010-07-14 00:18:52 UTC
Post by ольга крыжановская
posix_spawn() is a Swiss army knife which can replace almost all the
system calls needed to execute an external utility.
The bla() refers to signal, I/O, process group, terminal ownership and
all the other tasks a shell has to do to start an external utility. These
are 20+ system calls.
Olga
So?

You still have to do all the same work in terms of setting all those
behaviors - one size doesn't fit all. As a result, the real
savings here is the elimination of 20 system call traps - not
exactly a lot of time compared to the cost of spawning a process.

- Bart
--
Bart Smaalders Solaris Kernel Performance
***@oracle.com http://blogs.sun.com/barts
"You will contribute more with mercurial than with thunderbird."
ольга крыжановская
2010-07-13 23:50:39 UTC
My concern is not threading, not yet, not until ksh93's multithreading
support lands in Solaris.
My concern is that fork and exec do double work, or fork does jobs
which are then undone by exec again, but I can only speak for the BSD
implementation of fork and exec, not Solaris.
A posix_spawn system call would eliminate much of this madness.

Olga

On Wed, Jul 14, 2010 at 1:41 AM, Nicolas Williams wrote:
of order 150 ns. Given the general expense in creating a new process by
any means, this would not be my first place to look for performance
wins.
But the overhead of vfork() is probably much higher when the parent has
multiple threads, since they must all get stopped first, and then later
they must all be made runnable again. Java, Firefox, Thunderbird, all
come to mind as programs that might benefit from a native posix_spawn(2),
but first we need to see them using posix_spawn(3C).
Since you're the third person to assert that the overhead of stopping
lwps is prohibitive, would you please explain how it's possible to
safely fork a process if all of the lwps are still running?
I didn't say prohibitive, but clearly having to stop N threads, some or
all of which are possibly running on other CPUs (xcalls), is more
expensive than not having to at all. Whether that's prohibitive, or
whether there exist apps for which even the small cost of
stopping/restarting the parent would add up to a lot of CPU time, I
don't know and leave to others to answer.
As to your question, posix_spawn(2) wouldn't fork at all under the
covers: as a syscall it'd not have to. It'd create a new process as a
child of the parent, but it'd not use the parent's address space at all.
There'd still be some amount of locking to copy the file descriptor
table and so on, of course, but nothing quite like the copying or
borrowing of the parent's address space (almost certainly the most
expensive part of fork(2)/vfork(2)).
The vfork[x](2) code that's in posix_spawn right now already sets the
isfork1 bit, which tells cfork to be less restrictive about just where
exactly the process is stopped.
Surely all threads must be stopped by vfork() (the manpage says "[t]he
parent process is suspended while the child is using its resources").
Regardless of where the stopping points might be, surely a solution that
doesn't require stopping the parent's threads at all would be faster.
Correctness > performance.
Indeed. But please tell us where I proposed anything that might
compromise our ability to be correct. You seem to have missed the point
that, if implemented as a system call, posix_spawn(2) can avoid
altogether the need to "fork" in the traditional sense.
Nico
--
_______________________________________________
on-discuss mailing list
http://mail.opensolaris.org/mailman/listinfo/on-discuss
ольга крыжановская
2010-07-14 00:14:25 UTC
Nicolas, I know that posix_spawn exists now in Solaris.
But I measured that it has no benefit over the traditional vfork and
exec sequence of calls; the authors of posix_spawn designed it to
be an improvement if *implemented* in the kernel. The lack of a
performance difference between posix_spawn and vfork,exec, and the
*very* bad performance of vfork,exec on Solaris compared to Linux on
the same hardware - where 2.6.25.20-0.5-default can spawn utilities
almost 70% faster than OpenSolaris B134 - has led to my request to find
someone to try and improve the situation.

Olga
Post by ольга крыжановская
My concern is not threading, not yet, not until ksh93's multithreading
support lands in Solaris.
My concern is that fork and exec do double work, or fork does jobs
which are then undone by exec again, but I can only speak for the BSD
implementation of fork and exec, not Solaris.
A posix_spawn system call would eliminate much of this madness.
Olga,
posix_spawn(3C) exists _now_ in Solaris, and it uses vforkx(2).
vforkx(2), like vfork(2), doesn't copy/COW the parent's address space,
but it does have to stop the parent's threads as a trade-off. The
address space COW/copy of traditional fork is almost certainly the most
expensive part of forking, which is precisely why posix_spawn(3C) uses
vforkx(2) under the covers.
A posix_spawn(2) system call would be able to save only the work of
stopping, and subsequently resuming the parent process' threads, and a
few syscall traps for the fcntl(2) and a few other system calls that
posix_spawn(3C) has to make.
The question is: is there a real problem here? Or is this just an
imperfection that has caught your eye but which might not actually be a
real problem?
Again, I could see that Java apps, Firefox and a handful of other
applications could benefit from a posix_spawn(2), but I suspect these
apps spawn children rarely enough that a user-land posix_spawn() is not
a problem.
My advice is: worry about real problems.
Nico
Bart Smaalders
2010-07-14 00:20:46 UTC
Post by ольга крыжановская
Nicolas, I know that posix_spawn exists now in Solaris.
But I measured that it has no benefit over the traditional vfork and
exec sequence of calls; the authors of posix_spawn designed it to
be an improvement if *implemented* in the kernel. The lack of a
performance difference between posix_spawn and vfork,exec, and the
*very* bad performance of vfork,exec on Solaris compared to Linux on
the same hardware - where 2.6.25.20-0.5-default can spawn utilities
almost 70% faster than OpenSolaris B134 - has led to my request to find
someone to try and improve the situation.
What does DTrace tell you about where the time is being spent?
Instead of asserting where the problem is in email, how about
doing some investigation?

- Bart
ольга крыжановская
2010-07-14 02:13:39 UTC
I am the wrong person to do such an investigation - I am no Solaris
kernel internals expert.

Olga
Post by Bart Smaalders
Post by ольга крыжановская
Nicolas, I know that posix_spawn exists now in Solaris.
But I measured that it has no benefit over the traditional vfork and
exec sequence of calls; the authors of posix_spawn designed it to
be an improvement if *implemented* in the kernel. The lack of a
performance difference between posix_spawn and vfork,exec, and the
*very* bad performance of vfork,exec on Solaris compared to Linux on
the same hardware - where 2.6.25.20-0.5-default can spawn utilities
almost 70% faster than OpenSolaris B134 - has led to my request to find
someone to try and improve the situation.
What does DTrace tell you about where the time is being spent?
Instead of asserting where the problem is in email, how about
doing some investigation?
- Bart
Bart Smaalders
2010-07-14 02:32:07 UTC
Post by ольга крыжановская
I am the wrong person to do such an investigation - I am no Solaris
kernel internals expert.
Post by Bart Smaalders
...
What does DTrace tell you about where the time is being spent?
Instead of asserting where the problem is in email, how about
doing some investigation?
- Bart
In that case, your suggestion that we should improve process fork
rates is noted, while your suggestion that reducing the number of
system calls is the best place to start is given less weight :-).

Seriously, re-implementing spawn as a system call is a lot of work,
and would result in the need to maintain dual code paths (spawn &
fork/exec) that would need the exact same behavior in terms of
inheritance of characteristics, etc. This is a significant ongoing
maintenance burden, and the kernel folks are generally loath to take
it on unless there are clear advantages, not just better shell script
performance.

There are other, more generically useful performance improvements
being made in the VM system that will help this; given our limited
resources we'd rather focus on projects that have good paybacks in
multiple areas.

- Bart
ольга крыжановская
2010-07-14 03:09:22 UTC
Reducing the *number* of system calls is not the solution. The problem
is that vfork and exec don't know that they are called in a sequence.
These and the surrounding system calls all work independently of each
other but touch the same or related data pools. IMO there is a lot of
room for optimisation, such as less locking, fewer touched pages, the
kernel having the authority and *knowledge* over the *order* in which
things are best done, and *consolidation* if a single call - this is
posix_spawn - passes all the data in one step.

I agree that Solaris would have to maintain one code path for
vfork,exec and another for posix_spawn, but IMO it is worth the work
because modern shells are moving to posix_spawn; ksh93 was the
first and the rest, like bash and tcsh, are following now. Shell
script performance is still very important - if I understand the
comments of Sun sustaining, it is mission critical - and any
optimisation is welcome and important for business. ksh93 tackled part
of the problem by introducing many POSIX utilities as shell builtins,
but not all utilities can or should be provided as builtins and not
all scripts use ksh.

I doubt that there is something to tune in the VM system to improve
vfork,exec performance: vfork looks straightforward, and the problems
with exec - such as the number of cross calls to other CPUs
representing a bottleneck for scalability on machines with many CPUs -
require a totally new architecture for the way Solaris manages
virtual memory. This is therefore unrealistic and not going to happen
any time soon, if ever.

Olga
Post by Bart Smaalders
Post by ольга крыжановская
I am the wrong person to do such an investigation - I am no Solaris
kernel internals expert.
Post by Bart Smaalders
...
What does DTrace tell you about where the time is being spent?
Instead of asserting where the problem is in email, how about
doing some investigation?
- Bart
In that case, your suggestion that we should improve process fork
rates is noted, while your suggestion that reducing the number of
system calls is the best place to start is given less weight :-).
Seriously, re-implementing spawn as a system call is a lot of work,
and would result in the need to maintain dual code paths (spawn & fork/exec)
that would need the exact same behavior in terms of
inheritance of characteristics, etc. This is a significant ongoing
maintenance burden, and the kernel folks are generally loath to take
it on unless there are clear advantages, not just better shell script
performance.
There are other, more generically useful performance improvements
being made in the VM system that will help this; given our limited
resources we'd rather focus on projects that have good paybacks in
multiple areas.
- Bart
j***@public.gmane.org
2010-07-14 17:53:20 UTC
j***@public.gmane.org
2010-07-14 18:28:17 UTC
One possible improvement that comes to mind would be a private vforkx()
flag to request that threads in the parent other than the one calling
vforkx() not be stopped. The child would have to be extremely careful
to not call malloc() or do anything that might disturb the state of the
parent beyond the current stack frame, which is why such a flag would
have to be private and undocumented. But this should not be considered
until someone first shows that posix_spawn() performance is a problem
and that the parent thread stop/resume overhead is the biggest ticket
item.
It's significantly more complicated than that. You'd have to ensure
that there are no race conditions between the atfork handlers, rtld,
libc, and any application specific locking. The vfork manual is already
pretty clear that all bets are off should you use this in a
multi-threaded application:

     The use of vfork() or vforkx() in multithreaded applications,
     however, is unsafe due to race conditions that can cause the
     child process to become deadlocked and consequently block both
     the child and parent process from execution indefinitely.

A solution that creates a new process without a call to fork is the best
idea I've heard so far. Unfortunately, as Bart has already observed,
it's a lot of work for a gain that has yet to be quantified.

-j
ольга крыжановская
2010-07-14 22:25:50 UTC
James, ksh93 is getting thread support this year. The current ksh93u
alpha already has much of the code changes required for thread support,
and at the end of the ksh93u development cycle there will be a build
switch to choose between thread and no-thread support. The switch will
be ON by default for ksh93v.

Olga
Post by j***@public.gmane.org
One possible improvement that comes to mind would be a private vforkx()
flag to request that threads in the parent other than the one calling
vforkx() not be stopped. The child would have to be extremely careful
to not call malloc() or do anything that might disturb the state of the
parent beyond the current stack frame, which is why such a flag would
have to be private and undocumented. But this should not be considered
until someone first shows that posix_spawn() performance is a problem
and that the parent thread stop/resume overhead is the biggest ticket
item.
It's significantly more complicated than that. You'd have to ensure
that there are no race conditions between the atfork handlers, rtld,
libc, and any application specific locking. The vfork manual is already
pretty clear that all bets are off should you use this in a
The use of vfork() or vforkx() in multithreaded applications,
however, is unsafe due to race conditions that can cause the
child process to become deadlocked and consequently block both
the child and parent process from execution indefinitely.
A solution that creates a new process without a call to fork is the best
idea I've heard so far. Unfortunately, as Bart has already observed,
it's a lot of work for a gain that has yet to be quantified.
Plus, the original poster was apparently mostly concerned about ksh93's
performance, and making the heavy MT case work faster won't help that
application a bit, because ksh93 isn't making use of threads.
I think having the original poster come up with a test case of interest
(something that runs significantly faster on "all other platforms") and
then gathering basic data on that test case (where does a profiler say
the system is spending its time?) would be a much better plan than
speculating wildly on which magic new system call will make everything
run better.
ольга крыжановская
2010-07-14 22:45:54 UTC
The posix_spawn implementation in libc uses vfork. How is the POSIX
requirement that posix_spawn must be safe in multithreaded
applications met if it relies on vfork, which is *not* reliable in
multithreaded applications?

Olga
Post by j***@public.gmane.org
One possible improvement that comes to mind would be a private vforkx()
flag to request that threads in the parent other than the one calling
vforkx() not be stopped. The child would have to be extremely careful
to not call malloc() or do anything that might disturb the state of the
parent beyond the current stack frame, which is why such a flag would
have to be private and undocumented. But this should not be considered
until someone first shows that posix_spawn() performance is a problem
and that the parent thread stop/resume overhead is the biggest ticket
item.
It's significantly more complicated than that. You'd have to ensure
that there are no race conditions between the atfork handlers, rtld,
libc, and any application specific locking. The vfork manual is already
pretty clear that all bets are off should you use this in a
The use of vfork() or vforkx() in multithreaded applications,
however, is unsafe due to race conditions that can cause the
child process to become deadlocked and consequently block both
the child and parent process from execution indefinitely.
A solution that creates a new process without a call to fork is the best
idea I've heard so far. Unfortunately, as Bart has already observed,
it's a lot of work for a gain that has yet to be quantified.
-j
ольга крыжановская
2010-07-14 23:04:07 UTC
Thank you for the explanation.

Olga
Post by ольга крыжановская
The posix_spawn implementation in libc uses vfork. How is the POSIX
requirement that posix_spawn must be safe in multithreaded
applications met if it relies on vfork, which is *not* reliable in
multithreaded applications?
vfork()/vforkx() can be used in thread-unsafe ways. Therefore they are
marked as MT-Unsafe.
They can also be used in thread-safe ways. posix_spawn(3C) does use
vforkx() in a thread-safe way. How? By not doing anything to interfere
with the state of the parent[*] while the child is still executing in the
borrowed address space.
[*] Well, posix_spawn() does make use of stack space on the thread in
which it was called, but unless there's code in the parent that
makes use of uninitialized local variables, the parent won't be
affected by this. And the function call frame of posix_spawn() may
be modified, but not in any way that will cause any errors in the
parent.
ольга крыжановская
2010-07-14 23:27:06 UTC
A fairly simple - but not perfectly accurate - test case would be to
run ksh93 on Linux and Solaris and let it call /bin/true in a loop.

The numbers which follow are for identical ksh93 versions built with
the same gcc version, with /bin/true being a simple main() { return 0; }:

1. Timing values for Solaris B134 on bare metal hardware:
time ksh -c 'integer i ; for (( i=0 ; i < 100000 ; i++ )) ; do /bin/true ; done'

real 9m29.75s
user 0m8.55s
sys 2m46.89s

2. Timing values for Linux on the same hardware:
time ksh -c 'integer i ; for (( i=0 ; i < 100000 ; i++ )) ; do /bin/true ; done'

real 1m26.964s
user 0m12.314s
sys 0m42.831s

Olga
Post by j***@public.gmane.org
One possible improvement that comes to mind would be a private vforkx()
flag to request that threads in the parent other than the one calling
vforkx() not be stopped. The child would have to be extremely careful
to not call malloc() or do anything that might disturb the state of the
parent beyond the current stack frame, which is why such a flag would
have to be private and undocumented. But this should not be considered
until someone first shows that posix_spawn() performance is a problem
and that the parent thread stop/resume overhead is the biggest ticket
item.
It's significantly more complicated than that. You'd have to ensure
that there are no race conditions between the atfork handlers, rtld,
libc, and any application specific locking. The vfork manual is already
pretty clear that all bets are off should you use this in a
The use of vfork() or vforkx() in multithreaded applications,
however, is unsafe due to race conditions that can cause the
child process to become deadlocked and consequently block both
the child and parent process from execution indefinitely.
A solution that creates a new process without a call to fork is the best
idea I've heard so far. Unfortunately, as Bart has already observed,
it's a lot of work for a gain that has yet to be quantified.
Plus, the original poster was apparently mostly concerned about ksh93's
performance, and making the heavy MT case work faster won't help that
application a bit, because ksh93 isn't making use of threads.
I think having the original poster come up with a test case of interest
(something that runs significantly faster on "all other platforms") and
then gathering basic data on that test case (where does a profiler say
the system is spending its time?) would be a much better plan than
speculating wildly on which magic new system call will make everything
run better.
j***@public.gmane.org
2010-07-15 01:42:48 UTC
Post by ольга крыжановская
A fairly simple - but not perfectly accurate - test case would be to
run ksh93 on Linux and Solaris and let it call /bin/true in a loop.
The numbers which follow are for identical ksh93 versions built with
time ksh -c 'integer i ; for (( i=0 ; i < 100000 ; i++ )) ; do /bin/true ; done'
real 9m29.75s
user 0m8.55s
sys 2m46.89s
The real time number here doesn't look right. In particular, you're
only using < 3m of CPU time, but you're spending almost 9.5m wall time
to complete the operation.

When I run this command on my system, I get the following timing data:

real 53.750666106
user 14.188122722
sys 30.753105797

This still isn't as fast as Linux, but it's not 9 minutes.

When I trussed this, I got the following output:

syscall               seconds    calls  errors
_exit                    .000   100001
read                     .000        1
open                     .833   200009  100002
close                    .229   100007
time                     .000        1
brk                      .000        8
stat                     .000       12       3
lseek                    .000        1
getpid                   .291   200004
getuid                   .000        2
fstat                    .000        8
access                   .000        1
getsid                   .153   100000
getgid                   .000        2
sysi86                   .169   100001
ioctl                    .407   200004
execve                  8.471   100002
umask                    .000        2
fcntl                    .000        1
readlink                 .000        1
sigprocmask              .145   100000
sigaction                .187   100003
sigfillset               .000        1
getcontext               .213   100002
setcontext               .221    99999
setustack               2.645   100002
waitid                  1.171   399786  299786
mmap                    1.823   600017
mmapobj                 1.843   100006
getrlimit                .176   100002
memcntl                  .932   300011       1
sysconfig                .142   100008
sysinfo                  .165   100003
vforkx                  1.531   100000
yield                    .000        1
lwp_sigmask              .934   600001
lwp_private              .148   100002
schedctl                 .147   100001
resolvepath             1.606   300010
stat64                   .763   200002
                     --------   ------    ----
sys totals:            25.359  4699925  399792
usr time:              16.526
elapsed:              222.140

Keep in mind that the timing from truss has a lot of probe effect, so
don't consider these numbers the actual speed. However, the system call
that's using the most time here is execve, not vfork. The wall time is
still much higher than the CPU time, so I would investigate where this
process is waiting.

A couple of possibilities:

1. The swap reservations are done differently on Linux than Solaris. It
might be the case that you're running out of memory, or waiting for swap
to become available.

2. Perhaps ksh is waiting for the wrong thing to complete here?

I would dtrace for what's causing your processes to go off CPU and
aggregate for those stacks that are consuming the most time waiting.
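As a starting point, an off-CPU aggregation along those lines might look like the following DTrace sketch (the `sched` provider probes are standard; the `execname` predicate is illustrative and would need to match the actual workload):

```
sched:::off-cpu
/execname == "ksh"/
{
        self->ts = timestamp;
}

sched:::on-cpu
/self->ts/
{
        /* charge the blocked interval to the stack that went to sleep */
        @off[ustack()] = sum(timestamp - self->ts);
        self->ts = 0;
}
```

Sorting the resulting aggregation by total nanoseconds waited should point at whatever the shell (or its children) is actually blocked on.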

HTH,

-j
Bob Friesenhahn
2010-07-15 14:12:24 UTC
Permalink
Post by ольга крыжановская
A fairly simple - but not perfectly accurate - test case would be to
run ksh93 on Linux and Solaris and let it call /bin/true in a loop.
The numbers which follow are for identical ksh93 versions built with
Note that Solaris application startup is pretty complex, so there are
compile/link options which may substantially alter how quickly a
program executes. This is more true of large programs than tiny ones.
It may be that the execution time of /bin/true has not been optimized.

Bob
--
Bob Friesenhahn
bfriesen-***@public.gmane.org, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
j***@public.gmane.org
2010-07-15 01:17:16 UTC
Permalink
I think having the original poster come up with a test case of interest
(something that runs significantly faster on "all other platforms") and
then gathering basic data on that test case (where does a profiler say
the system is spending its time?) would be a much better plan than
speculating wildly on which magic new system call will make everything
run better.
It sounds like you, Nico, Bart, and I are all in agreement on this
point. :)

-j
j***@public.gmane.org
2010-07-14 18:24:10 UTC
Permalink
As to your question, posix_spawn(2) wouldn't fork at all under the
covers: as a syscall it'd not have to. It'd create a new process as a
child of the parent, but it'd not use the parent's address space at all.
There'd still be some amount of locking to copy the file descriptor
table and so on, of course, but nothing quite like the copying or
borrowing of the parent's address space (almost certainly the most
expensive part of fork(2)/vfork(2)).
I'd like to see data for this assertion too. The code that's executed
for vfork performs very few address space manipulations while borrowing
the parent's address space. All it does today is clear and restore
watched pages. Most processes don't have watchpoints on their pages.
"Borrowing the parent's address space" means stopping all of the
parent's threads, at least for vfork() because that's what the
documented semantics of it are. I've explained this elsewhere in this
thread. I'm sorry you thought that I meant that vfork() does expensive
address space manipulations.
So back to Bart's point about the maintenance burden of two separate but
nearly identical code paths. Without any data, this discussion is very
silly.

-j
Phil Harman
2010-07-14 19:13:18 UTC
Permalink
Post by j***@public.gmane.org
As to your question, posix_spawn(2) wouldn't fork at all under the
covers: as a syscall it'd not have to. It'd create a new process as a
child of the parent, but it'd not use the parent's address space at all.
There'd still be some amount of locking to copy the file descriptor
table and so on, of course, but nothing quite like the copying or
borrowing of the parent's address space (almost certainly the most
expensive part of fork(2)/vfork(2)).
I'd like to see data for this assertion too. The code that's executed
for vfork performs very few address space manipulations while borrowing
the parent's address space. All it does today is clear and restore
watched pages. Most processes don't have watchpoints on their pages.
"Borrowing the parent's address space" means stopping all of the
parent's threads, at least for vfork() because that's what the
documented semantics of it are. I've explained this elsewhere in this
thread. I'm sorry you thought that I meant that vfork() does expensive
address space manipulations.
So back to Bart's point about the maintenance burden of two separate but
nearly identical code paths. Without any data, this discussion is very
silly.
It is now eight years since libMicro started to highlight some of the
major gaps between Solaris and Linux (of which fork/exec was a
significant gap). In response someone at Sun even coined the aphorism
"If Linux is faster, it's a Solaris bug!" (mea culpa).

If you want data, it would be trivial to compare fork, fork/exec and
posix_spawn performance on Solaris and Linux on current hardware. I'd
expect that you'd find Linux is still considerably faster than Solaris
in the area of process creation (sometimes because Solaris is safer,
more scalable, contains lanolin, etc., and sometimes because it just
could do with some TLC).

I don't see that the two code paths would necessarily be that different.
Actually, I'd hope that a lot of the code path would be the same (isn't
that one of the important reasons we use functions in C?). I'd also expect
that fork/exec performance would benefit from such an exercise.