slow complex<double>'s

/*
While writing a C++ version of the Mandelbrot benchmark over at the
"The Great Computer Language Shootout"...
http://shootout.alioth.debian.org/gp...lbrot&lang=all

...I've come across the issue that complex<double>'s seem quite slow
unless compiled with -ffast-math. Of course doing that results in
incorrect answers because of rounding issues. The speed difference for
the program below is between 5x-8x depending on the version of g++. It
is also about 5 times slower than the corresponding gcc version at...

http://shootout.alioth.debian.org/gp...&lang=gcc&id=2

...I'd be interested in learning the reason for the speed difference.
Sure, the C version is slightly more optimized, but I was thinking that
the C++ code should only be 20-50% slower, not 750% slower like I get
with g++-4.1.0pre021006 (g++ 3.4.2 is a factor of 5 slower when
compiling with "-O3" vs. "-O3 -ffast-math"). Does it have something to
do with temporaries not being optimized away, or somesuch? A
limitation of the x87 instruction set? Is it inherent in the way the
C++ Standard requires complex<double>'s to be calculated? My bad coding
style? Limitations imposed by g++?

Curious,

Greg Buchholz
*/

// Takes an integer argument "n" on the command line and generates a
// PBM bitmap of the Mandelbrot set on stdout.
// see also: ( http://sleepingsquirrel.org/cpp/mandelbrot.cpp.html )

#include <cstdlib>   // std::atoi
#include <iostream>
#include <complex>

int main (int argc, char **argv)
{
    char bit_num = 0, byte_acc = 0;
    const int iter = 50;
    const double limit_sqr = 2.0 * 2.0;

    std::ios_base::sync_with_stdio(false);
    if (argc < 2) { std::cerr << "usage: pass the image size n as the first argument\n"; return 1; }
    int n = std::atoi(argv[1]);

    std::cout << "P4\n" << n << " " << n << std::endl;

    for(int y=0; y<n; ++y)
        for(int x=0; x<n; ++x)
        {
            std::complex<double> Z(0.0,0.0);
            std::complex<double> C(2*(double)x/n - 1.5, 2*(double)y/n - 1.0);

            // iterate Z = Z*Z + C until it escapes or iter is reached
            for (int i=0; i<iter and norm(Z) <= limit_sqr; ++i) Z = Z*Z + C;

            // pack one bit per pixel: 1 = point did not escape
            byte_acc = (byte_acc << 1) | ((norm(Z) > limit_sqr) ? 0x00:0x01);

            if(++bit_num == 8){ std::cout << byte_acc; bit_num = byte_acc = 0; }
            else if(x == n-1) { byte_acc <<= (8-n%8);
                                std::cout << byte_acc;
                                bit_num = byte_acc = 0; }
        }
}

Mar 5 '06 #1
In article <1141588925.900212.137230@z34g2000cwc.googlegroups.com>,
sl**************@yahoo.com says...

[ ... ]
...I'd be interested in learning the reason for the speed difference.
Sure, the C version is slightly more optimized, but I was thinking that
the C++ code should only be 20-50% slower, not 750% slower like I get
with g++-4.1.0pre021006 (g++ 3.4.2 is a factor of 5 slower when
compiling with "-O3" vs. "-O3 -ffast-math"). Does it have something to
do with temporaries not being optimized away, or somesuch? A
limitation of the x87 instruction set? Is it inherent in the way the
C++ Standard requires complex<double>'s to be calculated? My bad coding
style? Limitations imposed by g++?
[ ... ]
for (int i=0; i<iter and norm(Z) <= limit_sqr; ++i) Z = Z*Z + C;


Hmm...try this:

for (int i=0; i<iter and norm(Z) <= limit_sqr; ++i) {
Z *= Z; Z += C; }

No guarantee, but I think it's worth a shot.

--
Later,
Jerry.

The universe is a figment of its own imagination.
Mar 5 '06 #2

Jerry Coffin wrote:
Hmm...try this:

for (int i=0; i<iter and norm(Z) <= limit_sqr; ++i) {
Z *= Z; Z += C; }

No guarantee, but I think it's worth a shot.


Tried it. No speed improvement on gcc-3.4.2 or gcc-4.1.0pre021006.

Greg Buchholz

Mar 6 '06 #3

Greg Buchholz wrote:
Jerry Coffin wrote:
Hmm...try this:

for (int i=0; i<iter and norm(Z) <= limit_sqr; ++i) {
Z *= Z; Z += C; }

No guarantee, but I think it's worth a shot.


Tried it. No speed improvement on gcc-3.4.2 or gcc-4.1.0pre021006.

Greg Buchholz


Profile your code and find out what's causing the slowdown...
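
Before reaching for a real profiler, a crude wall-clock measurement can already show whether the inner loop is where the time goes. The following is only a sketch under assumptions: it needs a C++11 compiler for <chrono>, and the helper names count_with_norm and time_ms are made up for illustration, they are not from the thread.

```cpp
#include <cassert>
#include <chrono>
#include <complex>

// Hypothetical helper: counts the points of an n x n grid that have not
// escaped after `iter` iterations, using std::norm in the loop condition
// exactly as the posted program does.
int count_with_norm(int n, int iter, double limit_sqr)
{
    assert(n > 0);
    int inside = 0;
    for (int y = 0; y < n; ++y)
        for (int x = 0; x < n; ++x) {
            std::complex<double> Z(0.0, 0.0);
            std::complex<double> C(2.0 * x / n - 1.5, 2.0 * y / n - 1.0);
            for (int i = 0; i < iter && std::norm(Z) <= limit_sqr; ++i)
                Z = Z * Z + C;
            if (std::norm(Z) <= limit_sqr) ++inside;
        }
    return inside;
}

// Hypothetical helper: runs f(n, iter, limit_sqr) once, stores its return
// value in `result`, and returns the elapsed wall-clock time in milliseconds.
template <class F>
double time_ms(F f, int n, int iter, double limit_sqr, int& result)
{
    const auto start = std::chrono::steady_clock::now();
    result = f(n, iter, limit_sqr);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling time_ms(count_with_norm, n, 50, 4.0, result) once per variant of the loop (and once per set of compiler flags) gives numbers that can be compared directly; the returned count doubles as a check that two variants still compute the same picture.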

Mar 6 '06 #4

Greg Buchholz wrote:
/*
While writing a C++ version of the Mandelbrot benchmark over at the
"The Great Computer Language Shootout"...
http://shootout.alioth.debian.org/gp...lbrot&lang=all

...I've come across the issue that complex<double>'s seem quite slow
unless compiled with -ffast-math. Of course doing that results in
incorrect answers because of rounding issues. The speed difference for
the program below is between 5x-8x depending on the version of g++. It
is also about 5 times slower than the corresponding gcc version at...

http://shootout.alioth.debian.org/gp...&lang=gcc&id=2

...I'd be interested in learning the reason for the speed difference.
Sure, the C version is slightly more optimized, but I was thinking that
the C++ code should only be 20-50% slower, not 750% slower like I get
with g++-4.1.0pre021006 (g++ 3.4.2 is a factor of 5 slower when
compiling with "-O3" vs. "-O3 -ffast-math"). Does it have something to
do with temporaries not being optimized away, or somesuch? A
limitation of the x87 instruction set? Is it inherent in the way the
C++ Standard requires complex<double>'s to be calculated? My bad coding
style? Limitations imposed by g++?

Curious,

Greg Buchholz
*/ [snip]
for (int i=0; i<iter and norm(Z) <= limit_sqr; ++i) Z = Z*Z + C;

byte_acc = (byte_acc << 1) | ((norm(Z) > limit_sqr) ? 0x00:0x01);

[snip]

Could it be that a large share of the time is spent calculating "norm"?
I doubt that the C version does so, and it should not be necessary. Some
simpler test should fit the bill (or at least avoid calling norm all the
time).
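
One way to avoid calling norm at all is to carry the squared real and imaginary parts along, since the update step needs them anyway. This is a sketch of that idea on plain doubles, not the shootout's C code; the function name stays_bounded is hypothetical.

```cpp
#include <cassert>

// Illustrative helper: returns true if c = (cr, ci) is still within the
// escape radius after at most `iter` iterations of z = z*z + c.
bool stays_bounded(double cr, double ci, int iter, double limit_sqr)
{
    assert(iter >= 0);
    double zr = 0.0, zi = 0.0;
    double zr2 = 0.0, zi2 = 0.0;   // zr*zr and zi*zi, reused by test and update
    for (int i = 0; i < iter && zr2 + zi2 <= limit_sqr; ++i) {
        zi = 2.0 * zr * zi + ci;   // imaginary part of z*z + c (uses the old zr)
        zr = zr2 - zi2 + cr;       // real part of z*z + c (uses the old squares)
        zr2 = zr * zr;
        zi2 = zi * zi;
    }
    return zr2 + zi2 <= limit_sqr;
}
```

The escape test then costs one addition and one comparison per iteration, because both squares are computed exactly once and shared with the next update.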

/Peter

Mar 6 '06 #5
In article <1141602729.936509.269620@e56g2000cwe.googlegroups.com>,
sl**************@yahoo.com says...

Jerry Coffin wrote:
Hmm...try this:

for (int i=0; i<iter and norm(Z) <= limit_sqr; ++i) {
Z *= Z; Z += C; }

No guarantee, but I think it's worth a shot.


Tried it. No speed improvement on gcc-3.4.2 or gcc-4.1.0pre021006.


Well, that takes out the possibility that seemed most
obvious to me. Depending on your bent, your next step
would be either a profiler or examining the code the
compiler's producing. For large chunks of code, the
former works well, but for small amounts that you want to
examine in maximum detail the latter can be useful as
well.

--
Later,
Jerry.

The universe is a figment of its own imagination.
Mar 6 '06 #6
Greg Buchholz wrote:
...I've come across the issue that complex<double>'s seem quite slow
unless compiled with -ffast-math. Of course doing that results in
incorrect answers because of rounding issues. The speed difference for
the program below is between 5x-8x depending on the version of g++.


Looks like the problem can be solved by manually inlining the
definition of "norm"...

//manually inlining "norm" results in a 5x-7x speedup on g++
for(int i=0; i<iter and
(Z.real()*Z.real() + Z.imag()*Z.imag()) <= limit_sqr; ++i)
Z = Z*Z + C;

...For some reason g++ must not have been able to inline it (or does so
only after common subexpression elimination or somesuch).
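
The same workaround can be wrapped in a small function so the loop stays readable. This is a sketch only: norm_sqr and escape_iterations are hypothetical names, and whether it reproduces the speedup depends on the compiler and library version, which the thread does not pin down beyond the g++ versions mentioned.

```cpp
#include <cassert>
#include <complex>

// Hypothetical helper: re^2 + im^2 written out by hand instead of
// going through the library's norm().
inline double norm_sqr(const std::complex<double>& z)
{
    return z.real() * z.real() + z.imag() * z.imag();
}

// Returns how many iterations of Z = Z*Z + C ran before Z left the
// escape radius (or `iter` if it never did).
int escape_iterations(const std::complex<double>& C, int iter, double limit_sqr)
{
    assert(iter >= 0);
    std::complex<double> Z(0.0, 0.0);
    int i = 0;
    for (; i < iter && norm_sqr(Z) <= limit_sqr; ++i)
        Z = Z * Z + C;
    return i;
}
```

For example, C = 1 + i gives Z = 1 + i after the first step and Z = 1 + 3i after the second, at which point the squared magnitude is 10 and the loop stops.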

Greg Buchholz

Mar 6 '06 #7

"Greg Buchholz" <sl**************@yahoo.com> wrote in message
news:11**********************@z34g2000cwc.googlegroups.com...
[snip]

------------------------------------------------
Hi Greg,
I have had similar problems when using the <complex> library for
Microsoft VC6. It ran at about half the expected speed. After looking
through the header file I saw that the C++ structure was somewhat
involved, with a base class and several derived classes. I ended up
writing my own very simple complex class, looking like this:

namespace std
{
template <class Tc>
class ppcomplex
{
public:
    Tc re;
    Tc im;

    ppcomplex() { re = 0; im = 0; }
    ppcomplex(const Tc& r, const Tc& i) : re(r), im(i) {}
    ppcomplex(const Tc& r) : re(r), im((Tc)0) {}

    Tc real() const { return re; }
    Tc imag() const { return im; }

    Tc real(const Tc& x) { return (re = x); }
    Tc imag(const Tc& x) { return (im = x); }

    // the usual copy constructor and assignment operators
    ppcomplex(const ppcomplex<Tc>& z)
    { this->re = z.re; this->im = z.im; }
    ppcomplex<Tc>& operator =(const ppcomplex<Tc>& y) {
        if(this != &y)
        { this->re = y.re; this->im = y.im; }
        return *this; }
    ppcomplex<Tc>& operator =(const Tc& r)
    { this->re = r; this->im = (Tc)0; return *this; }

    // etc --- etc --- etc more stuff here

    // updating by a real constant

    ppcomplex<Tc>& operator +=(const Tc& y)
    { re += y; return *this; }

    // more stuff

}; // end of class ppcomplex
} // end of namespace std

This ran twice as fast! So maybe you have the same problem, i.e. your
complex class is just too complicated?

Regards....Bill

Mar 6 '06 #8
Bill Shortall <pe***@cminet.net> wrote:
namespace std
{
template <class Tc>
class ppcomplex
{


You are not allowed to introduce your own names to namespace std. IIRC,
you are only allowed to add specializations of the standard template
classes, when specializing on user-defined classes.
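
A version of the same idea that stays out of namespace std could look like the sketch below. The namespace name pp is an arbitrary choice for illustration, and only the operations the Mandelbrot loop needs are filled in; the parts elided in the quoted class are not reconstructed here.

```cpp
#include <cassert>

namespace pp   // hypothetical user namespace, chosen for illustration
{
    template <class T>
    struct ppcomplex
    {
        T re;
        T im;

        ppcomplex() : re(0), im(0) {}
        ppcomplex(const T& r, const T& i = T(0)) : re(r), im(i) {}

        T real() const { return re; }
        T imag() const { return im; }

        ppcomplex& operator+=(const ppcomplex& y)
        { re += y.re; im += y.im; return *this; }

        // Safe for z *= z: the new real part is computed into a
        // temporary before either member is overwritten.
        ppcomplex& operator*=(const ppcomplex& y)
        {
            const T r = re * y.re - im * y.im;
            im = re * y.im + im * y.re;
            re = r;
            return *this;
        }
    };

    // Squared magnitude, found through argument-dependent lookup.
    template <class T>
    T norm(const ppcomplex<T>& z) { return z.re * z.re + z.im * z.im; }

    // Small usage check: one Mandelbrot step starting from Z = C = 1 + i.
    inline void ppcomplex_demo()
    {
        ppcomplex<double> z(1.0, 1.0);
        z *= z;                               // (1 + i)^2 = 2i
        z += ppcomplex<double>(1.0, 1.0);     // 2i + (1 + i) = 1 + 3i
        assert(norm(z) == 10.0);
    }
}
```

Because norm lives in the same namespace as the class, an unqualified call such as norm(Z) still resolves, so the loop in the original program would only need its two declarations changed.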

--
Marcus Kwok
Mar 6 '06 #9

"Marcus Kwok" <ri******@gehennom.net.invalid> wrote in message
news:du**********@news-int.gatech.edu...
Bill Shortall <pe***@cminet.net> wrote:
namespace std
{
template <class Tc>
class ppcomplex
{


You are not allowed to introduce your own names to namespace std. IIRC,
you are only allowed to add specializations of the standard template
classes, when specializing on user-defined classes.

--
Marcus Kwok


OK -- change ppcomplex to complex,
call its header file <complex>, and
remove the old one.
Mar 7 '06 #10
