Intel vs Gnu compiler output quality

My experience has generally been that, for CPU-intensive tasks, the
Intel compiler produces code that is about as fast as that produced by
the Gnu compiler.

However, on this simple Shootout entry, Intel seems to be 4.5 times
faster:

http://shootout.alioth.debian.org/gp...lang=icpp&id=3

Any idea why?
Jun 27 '08 #1
On Mon, 23 Jun 2008 03:36:58 -0700, jh*****@gmail.com wrote:
> My experience has generally been that, for CPU-intensive tasks, the
> Intel compiler produces code that is about as fast as that produced by
> the Gnu compiler.
>
> However, on this simple Shootout entry, Intel seems to be 4.5 times
> faster:
>
> http://shootout.alioth.debian.org/gp...lang=icpp&id=3
>
> Any idea why?

Have you profiled the code? My guess would be that the bulk of the CPU
time is spent in the trig functions.
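
One rough way to check that guess without a full profiler is to time the library-call terms separately from the plain arithmetic terms. The sketch below is only an illustration: the helper names (seconds_for, library_terms, arithmetic_terms) are made up for this post, the 2.0 / 3.0 base is an assumed value, and the split of terms is a simplification of the benchmark loop, not the benchmark itself.

```cpp
#include <chrono>
#include <cmath>

// Hypothetical helper: runs a callable once and returns the wall-clock
// seconds it took.
template <typename F>
double seconds_for(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Sums only terms that go through the math library (sin, cos, pow, sqrt).
// The base 2.0 / 3.0 is an assumption made for this illustration.
double library_terms(int n) {
    double sum = 0.0;
    for (int k = 1; k <= n; ++k) {
        const double kd = double(k);
        sum += std::sin(kd) + std::cos(kd) + std::pow(2.0 / 3.0, kd) + std::sqrt(kd);
    }
    return sum;
}

// Sums only terms built from multiplications and divisions.
double arithmetic_terms(int n) {
    double sum = 0.0;
    for (int k = 1; k <= n; ++k) {
        const double kd = double(k);
        sum += 1.0 / kd + 1.0 / (kd * kd) + 1.0 / (kd * kd + kd);
    }
    return sum;
}
```

Calling seconds_for on a lambda that stores library_terms(n) into a variable you print afterwards, and the same for arithmetic_terms(n), gives a crude split for each compiler. Make sure the results are actually used, otherwise the optimizer may drop the loops entirely.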

There are a host of possible explanations: maybe the Intel trig
functions are faster (but do they compute to the same level of
accuracy?). Are the optimization levels really comparable? Does it make
a difference whether IEEE floating-point compliance is enforced (e.g.
the GCC -ffast-math flag can make quite a difference)? Short of
analysing the generated assembler it's probably impossible to say.

In the past I've noticed that ICC seemed to be more aggressive at
vectorization, although recent versions of GCC do a better job (the
benchmarks don't specify which version of GCC was used) - and I'm not
sure if this is relevant here (you can test this: I think both compilers
will tell you what they vectorise if you ask them nicely).
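
To try that, something like the following pair of loops could be fed to both compilers with their vectorization reports switched on. This is a sketch with made-up function names, not code from the benchmark. As far as I know, GCC 4.x takes -ftree-vectorize -ftree-vectorizer-verbose=2 (later releases use -fopt-info-vec) and ICC of that era takes -vec-report, but the spellings have changed between versions, so check the manual for yours.

```cpp
#include <cstddef>
#include <vector>

// Element-wise 1/(x*x): independent iterations, unit stride, no function
// calls. This is a typical candidate for auto-vectorization.
std::vector<double> reciprocal_squares(const std::vector<double>& x) {
    std::vector<double> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        out[i] = 1.0 / (x[i] * x[i]);
    }
    return out;
}

// A running sum is a floating-point reduction. Vectorizing it reorders the
// additions, so a compiler will generally only do it when relaxed
// floating-point rules are allowed (for GCC, e.g. with -ffast-math).
double running_sum(const std::vector<double>& x) {
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sum += x[i];
    }
    return sum;
}
```

If the report says the first loop was vectorized but the second was not, the difference between the two compilers may come down to their default floating-point model and not to the vectorizer itself.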

In any case, this sort of benchmark is highly artificial and probably
quite irrelevant to real-life program performance. FWIW I too have found
very little to choose between ICC and GCC over a fair variety of real-
world numerically-intensive tasks (although I've also found later
versions of ICC on Linux to be unusably buggy).

Regards,

--
Lionel B
Jun 27 '08 #2
jh*****@gmail.com wrote:
> My experience has generally been that, for CPU-intensive tasks, the
> Intel compiler produces code that is about as fast as that produced by
> the Gnu compiler.
>
> However, on this simple Shootout entry, Intel seems to be 4.5 times
> faster:
>
> http://shootout.alioth.debian.org/gp...lang=icpp&id=3
>
> Any idea why?

because the Intel icc/icpc does magical
optimizations on this code and loads the
FPU stack (on x86) from ST(0) up to ST(6)
in the process, whereas g++ (4.3)
doesn't have the vigor to go further up
than ST(2).

It follows that the gcc code has to do
many more fldl/fildl and fst/fstp transfers
to and from the L1 cache, which isn't bad
but is not even close to FPU register fiddling.

That's it, basically.

Regards

M.
Jun 27 '08 #3
On Mon, 23 Jun 2008 18:04:49 +0200, Mirco Wahab wrote:
> jh*****@gmail.com wrote:
>> My experience has generally been that, for CPU-intensive tasks, the
>> Intel compiler produces code that is about as fast as that produced by
>> the Gnu compiler.
>>
>> However, on this simple Shootout entry, Intel seems to be 4.5 times
>> faster:
>>
>> http://shootout.alioth.debian.org/gp4/benchmark.php?test=partialsums&lang=icpp&id=3
>>
>> Any idea why?
>
> because the Intel icc/icpc does magical optimizations on this code and
> loads the FPU stack (on x86) from ST(0) up to ST(6) in the process,
> whereas g++ (4.3) doesn't have the vigor to go further up than
> ST(2).
>
> It follows that the gcc code has to do many more fldl/fildl and
> fst/fstp transfers to and from the L1 cache, which isn't bad but is
> not even close to FPU register fiddling.

That all sounds very impressive... could you possibly explain what it
means, roughly, to a non-assembler/microprocessor architecture expert?
Also, what about on x86_64?

> That's it, basically.

I'll quibble that "basically" ;-)

--
Lionel B
Jun 27 '08 #4
Lionel B wrote:
> On Mon, 23 Jun 2008 18:04:49 +0200, Mirco Wahab wrote:
>> It follows that the gcc code has to do many more fldl/fildl and
>> fst/fstp transfers to and from the L1 cache, which isn't bad but is
>> not even close to FPU register fiddling.
>
> That all sounds very impressive... could you possibly explain what it
> means, roughly, to a non-assembler/microprocessor architecture expert?
> Also, what about on x86_64?

Shouldn't sound very impressive imho. The central part of said
benchmark is the following loop:
21:
for (int k = 1; k <= n; ++k, pot = -pot) {
kd = double(k);
kd2 = kd * kd;
kd3 = kd * kd2;

sink = std::sin(kd);
cosk = std::cos(kd);

res1 += std::pow(dt, kd);
res2 += 1.0 / std::sqrt(kd);
res3 += 1.0 / (kd2 + kd);
res4 += 1.0 / (kd3 * sink * sink);
res5 += 1.0 / (kd3 * cosk * cosk);
res6 += 1.0 / kd;
res7 += 1.0 / kd2;
res8 += pot / kd;
res9 += pot / (2.0 * kd - 1.0);
}
39:
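
For anyone who wants to compile that loop in isolation with both compilers, here is a self-contained sketch. The loop body is the one quoted above; everything around it is my assumption and not taken from the benchmark source: the PartialSums struct, the function name partial_sums, the value dt = 2.0 / 3.0, the starting sign pot = 1.0, and the zero-initialised sums.

```cpp
#include <cmath>

// Hypothetical container for the nine running sums (not from the benchmark).
struct PartialSums {
    double res1 = 0.0, res2 = 0.0, res3 = 0.0;
    double res4 = 0.0, res5 = 0.0, res6 = 0.0;
    double res7 = 0.0, res8 = 0.0, res9 = 0.0;
};

PartialSums partial_sums(int n) {
    const double dt = 2.0 / 3.0;  // assumption: base of the power series
    double pot = 1.0;             // assumption: alternating sign starts at +1
    PartialSums s;

    for (int k = 1; k <= n; ++k, pot = -pot) {
        const double kd = double(k);
        const double kd2 = kd * kd;
        const double kd3 = kd * kd2;

        const double sink = std::sin(kd);
        const double cosk = std::cos(kd);

        s.res1 += std::pow(dt, kd);
        s.res2 += 1.0 / std::sqrt(kd);
        s.res3 += 1.0 / (kd2 + kd);
        s.res4 += 1.0 / (kd3 * sink * sink);
        s.res5 += 1.0 / (kd3 * cosk * cosk);
        s.res6 += 1.0 / kd;
        s.res7 += 1.0 / kd2;
        s.res8 += pot / kd;
        s.res9 += pot / (2.0 * kd - 1.0);
    }
    return s;
}
```

Compiling this to assembler with each compiler (-S on both, as far as I know) should show how many of kd, kd2, kd3, sink and cosk stay in FPU registers across the nine updates.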

What one may see is a bunch of operands (kd, kd2, etc.)
that are used all along the computation of the 9
different terms. To me, it looks like the Intel
compiler counts the occurrences of these operands,
puts the "best" four or five into the upper fpu
registers (x86) (ST[3] ... ST[7]), and does the
increments of the res[1-9] terms entirely out of
these fpu registers.

Example:
;;; res4 += 1.0 / (kd3 * sink * sink);
;;; res5 += 1.0 / (kd3 * cosk * cosk);
;;; res6 += 1.0 / kd;
;;; res7 += 1.0 / kd2;
gives:
fdiv %st, %st(2) #36.27
fdiv %st, %st(1) #32.34
fxch %st(1) #32.13
faddp %st, %st(6) #32.13
fldl 112(%esp) #33.34
fxch %st(6) #32.13
fstpl 80(%esp) #32.13
fld %st(4) #33.34
fmul %st(6), %st #33.34
fmulp %st, %st(6) #33.41
fdiv %st, %st(5) #33.41
fldl 96(%esp) #33.13
faddp %st, %st(6) #33.13
fxch %st(5) #33.13
fstpl 96(%esp) #33.13
fldl 104(%esp) #34.34
fmul %st, %st(4) #34.34
fmulp %st, %st(4) #34.41
fxch %st(3) #34.41
fdivr %st(4), %st #34.41
[snipped]

One can immediately see that the operations use and store values
across (almost) the full fpu register set %st(0) .. %st(6).
Even the last register, %st(7), is used (elsewhere). A lot of
'fxch' operations are used too; fxch is handled by '(fpu-) register
renaming' and costs 0 cycles on newer x86. This is necessary to throw
out operands that are no longer used: they are 'renamed' from %st(7)
to %st (which is the 'top of stack'). To the application, the x86 fpu
is a stack and can only be used like a stack, except for 'renaming'.
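
If the stack picture is unfamiliar, a toy model may help. The class below is a deliberately simplified sketch of the x87 register stack, written for this post: it covers only push (fld), exchange (fxch) and add-and-pop (faddp), and ignores tag words, overflow and precision. The class name and its methods are made up. It only shows why fxch is needed to get at anything other than the top of the stack.

```cpp
#include <array>
#include <cstddef>
#include <utility>

// Toy model of the eight-entry x87 register stack (not a real emulator).
class X87StackModel {
public:
    // fld: push a value. The old st(0) becomes st(1), and so on.
    void fld(double value) {
        top_ = (top_ + 7) % 8;  // the top index decrements modulo 8 on a push
        regs_[top_] = value;
    }

    // Read st(i), counted from the current top of stack.
    double st(std::size_t i) const { return regs_[(top_ + i) % 8]; }

    // fxch %st(i): swap st(0) with st(i). No data is pushed or popped.
    void fxch(std::size_t i) { std::swap(regs_[top_], regs_[(top_ + i) % 8]); }

    // faddp %st, %st(i): st(i) += st(0), then pop st(0) off the stack.
    void faddp(std::size_t i) {
        regs_[(top_ + i) % 8] += regs_[top_];
        top_ = (top_ + 1) % 8;  // the top index increments modulo 8 on a pop
    }

private:
    std::array<double, 8> regs_{};
    std::size_t top_ = 0;
};
```

Pushing 1.0, 2.0 and 3.0 leaves 3.0 in st(0) and 1.0 in st(2). After fxch(2) the 1.0 is on top, and faddp(1) then adds it into the 2.0 and pops, leaving 3.0 in both st(0) and st(1). The compiler has to plan every operand's position this way, which is why the listing above is full of fxch.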

x86_64 doesn't make a difference here. Only SSE would, which
isn't involved.

regards

M.
Jun 27 '08 #5
On Mon, 23 Jun 2008 19:03:03 +0200, Mirco Wahab wrote:
> Lionel B wrote:
>> On Mon, 23 Jun 2008 18:04:49 +0200, Mirco Wahab wrote:
>>> It follows that the gcc code has to do many more fldl/fildl and
>>> fst/fstp transfers to and from the L1 cache, which isn't bad but is
>>> not even close to FPU register fiddling.
>>
>> That all sounds very impressive... could you possibly explain what it
>> means, roughly, to a non-assembler/microprocessor architecture expert?
>> Also, what about on x86_64?
>
> Shouldn't sound very impressive imho. The central part of said benchmark
> is the following loop:
> [...]

Thanks,

--
Lionel B
Jun 27 '08 #6
