Discussion:
64-bit vs 32-bit applications
Kishore Kumar Pusukuri
2010-08-17 01:58:16 UTC
Hi,
I am surprised by the performance of some 64-bit multi-threaded applications on my AMD Opteron machine. For most of the applications, the 32-bit version performs almost the same as the 64-bit version. However, for a couple of applications, the 32-bit version performs better (running time around 76 secs) than the 64-bit version (running time around 96 secs). Could anyone help me find the reason behind this, please?


$ ldd program-64 (64-bit version)
libpthread.so.1 => /lib/64/libpthread.so.1
libstdc++.so.6 => /usr/lib/64/libstdc++.so.6
libm.so.2 => /lib/64/libm.so.2
libgcc_s.so.1 => /usr/lib/64/libgcc_s.so.1
libc.so.1 => /lib/64/libc.so.1

$ ldd program-32 (32-bit version)
libpthread.so.1 => /lib/libpthread.so.1
libstdc++.so.6 => /usr/lib/libstdc++.so.6
libm.so.2 => /lib/libm.so.2
libgcc_s.so.1 => /usr/lib/libgcc_s.so.1
libc.so.1 => /lib/libc.so.1
--
This message posted from opensolaris.org
Jim Mauro
2010-08-17 02:23:30 UTC
Typically such performance disparities are due to
changes in the memory footprint with the 64-bit code,
and resulting cache miss rates being higher with the
64-bit code than the 32-bit code.

You need to use the Studio tools to profile the code,
and/or cputrack to measure cache hit rates.

Thanks,
/jim
_______________________________________________
perf-discuss mailing list
Johan Hartzenberg
2010-08-17 11:59:43 UTC
Further to what Jim mentioned, 64-bit-ness does not imply faster
execution. On the contrary, it means that pointers and addresses take more
bits to store and to move across the bus than in the 32-bit version, to do
the same work. For example, to sort a list of words (text) the CPU compares
byte codes, which do not benefit from 64-bit registers. But every pointer
the program stores or follows is a 64-bit address instead of a 32-bit
address, which puts more pressure on the caches and the memory bus on every
such operation, without any benefit at all (the data operated on still fits
into 32-bit registers).

If, however, your application uses the features of 64-bit programs, then the
advantages can outweigh these overheads. For the most part this means
programs that need to address more than 4 GB of data in RAM.

Cheers,
_Johan
--
Any sufficiently advanced technology is indistinguishable from magic.
Arthur C. Clarke

My blog: http://initialprogramload.blogspot.com
Rayson Ho
2010-08-17 12:09:00 UTC
Post by Johan Hartzenberg
If however your application use the features of 64-bit programs then the
advantages can be more than these overheads.  For the most part this means
programs that require access to more than 4 GB of data in RAM.
A lot of applications get a boost in performance in 64-bit because
they get 16 general purpose registers (vs. 8 in 32-bit mode), but this
is true for x64 vs x86 only.

So profiling the app to find out where it spends most of its time is
the best starting point.

Rayson
Darryl Gove
2010-08-17 16:50:46 UTC
Post by Rayson Ho
A lot of applications get a boost in performance in 64-bit because
they get 16 general purpose registers (vs. 8 in 32-bit mode), but this
is true for x64 vs x86 only.
So profile the app and find out where it is spending most time in is
the best starting point.
Exactly.

So when you go from 32-bit to 64-bit code:

- longs and pointers become 64 bits in size. This increases the memory
footprint of the application, which can make the app slower.

On x86 you also get:

- An improved ABI where values are passed in registers rather than on
the stack.

- An increased number of registers.

Hence, in general, x86-64 code will run faster.

In contrast 32-bit SPARC code uses v8plus which is basically the
advantages of the v9 (64-bit) instruction set with 32-bit pointers and
longs. On SPARC going from 32-bit to 64-bit is nearly always a slight
performance loss due to the increased memory footprint.


As others have said, profile the code to find where the time is spent.
You can profile on the cache miss performance counters. If you're on
Solaris, you can use spot in Studio to generate a report
(http://cooltools.sunsource.net/spot/)

Regards,

Darryl.
--
Darryl Gove
Compiler Performance Engineering
Blog : http://blogs.sun.com/d/
Books: http://my.safaribooksonline.com/9780321711441
http://my.safaribooksonline.com/9780768681390
http://my.safaribooksonline.com/0595352510