Optimization with local vs. global arrays

The execution speed of the following code is dramatically faster if I
declare some arrays globally rather than locally. That is

FOO a[10], b[10], c[10];

void bar() {
...
}

runs much faster (up to 33%) than

void bar() {
FOO a[10], b[10], c[10];
...
}

There is considerable work being performed in the ... section.
This is on a Linux Itanium II system, compiled both with the Intel C++
compiler (V9.1) with interprocedural optimization enabled, and with the
GNU C V 3.3.5 compiler with -O3 optimization. (The performance change is
more dramatic with the Intel Compiler.) I tried declaring the local
FOO arrays static with

static FOO a[10], b[10], c[10];

which helped with the GNU compiler but was actually worse with the Intel
compiler. I also tried

FOO d[30];
FOO *a = d, *b = d+10, *c = d+20;

with a local d array, but that had no effect.

Is this just a compiler issue, or am I missing something? I want to avoid
the external arrays, obviously, but that code compiled by the Intel
compiler gives the fastest execution speed by far. I'd like to get the
equivalent performance with something less dangerous than global arrays.
Apr 12 '07 #1

"Jim West" <eg***********@yahoo.com> wrote in message
news:9F*******************@newsfe20.lga...
> The execution speed of the following code is dramatically faster if I
> declare some arrays globally rather than locally.
Faster processor?
Apr 12 '07 #2
Jim West wrote:
> The execution speed of the following code is dramatically faster if I
> declare some arrays globally rather than locally.
It's not especially surprising that the local arrays, which may be
pushed on the stack with each invocation of bar, would be slower than
the global arrays. If you want something "safer" you could try moving
the arrays to a namespace.
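
A minimal sketch of the namespace idea, assuming a plain three-float stand-in for FOO. The namespace name `bar_scratch`, the arithmetic inside `bar()`, and the `demo()` helper are all hypothetical; the real work in bar() is elided in the original post.

```cpp
#include <cassert>

// Stand-in for the poster's FOO, which has not been shown at this point.
struct FOO {
    float x, y, z;
};

// Hypothetical namespace name. It keeps the scratch arrays out of the
// global namespace, so short names like a, b and c can be reused elsewhere.
namespace bar_scratch {
    FOO a[10], b[10], c[10];   // static storage: set up once, not on every call
}

// Placeholder arithmetic; the real body of bar() is elided in the thread.
float bar() {
    using namespace bar_scratch;
    float sum = 0.0f;
    for (int i = 0; i < 10; ++i) {
        a[i].x = static_cast<float>(i);
        b[i].x = 2.0f * a[i].x;
        c[i].x = a[i].x + b[i].x;
        sum += c[i].x;
    }
    return sum;
}

// Small self-check of the placeholder arithmetic.
void demo() {
    assert(bar() == 135.0f);                // 3 * (0 + 1 + ... + 9)
    assert(bar_scratch::c[9].x == 27.0f);   // arrays persist after the call
}
```

The arrays still have static storage duration, so this only addresses name clashes; bar() remains non-reentrant, exactly as with plain globals.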

Mark
Apr 12 '07 #3
On 2007-04-12, GeekBoy <ne*@nerdy.com> wrote:
> Faster processor?
No, all are run on the same system, OS etc. It is compiled with
the Intel compiler using

icc -O3 -ip -c foo.cc

and with the GNU compiler using

g++ -O3 -c foo.cc
Apr 12 '07 #4
On 2007-04-12, Mark P <us****@fall2005REMOVE.fastmailCAPS.fm> wrote:
> It's not especially surprising that the local arrays, which may be
> pushed on the stack with each invocation of bar, would be slower than
> the global arrays. If you want something "safer" you could try moving
> the arrays to a namespace.
OK, I had thought that the time needed to push the small arrays on the
stack (FOO isn't a very large class) would be small compared to the
heavy number crunching I do in the bar() routine. Guess not!

The namespace solution is what I needed, since some of the array names
are reused throughout the code. Seems obvious once it was pointed out.
:)

Thanks for the help.
Apr 12 '07 #5
Jim West wrote:
> The execution speed of the following code is dramatically faster if I
> declare some arrays globally rather than locally. That is
>
> FOO a[10], b[10], c[10];
>
> void bar() {
> ...
> }
>
> runs much faster (up to 33%) than
>
> void bar() {
> FOO a[10], b[10], c[10];
> ...
> }
What is a FOO?

Does it require construction?

Do you call bar() in a loop?

--
Ian Collins.
Apr 12 '07 #6
On 2007-04-12, Ian Collins <ia******@hotmail.com> wrote:
> Jim West wrote:
>> The execution speed of the following code is dramatically faster if I
>> declare some arrays globally rather than locally. That is
>>
>> FOO a[10], b[10], c[10];
>>
>> void bar() {
>> ...
>> }
>>
>> runs much faster (up to 33%) than
>>
>> void bar() {
>> FOO a[10], b[10], c[10];
>> ...
>> }
>
> What is a FOO?
>
> Does it require construction?
>
> Do you call bar() in a loop?

FOO is actually a three-dimensional space vector:

class FOO {
    float x_, y_, z_;   // data members named to match the initializer lists
public:
    FOO() : x_(0), y_(0), z_(0) { }
    FOO(float x, float y, float z) : x_(x), y_(y), z_(z) { }
    inline FOO& operator+=(const FOO& a);
    /* Many more inline operators and member functions included */
};

bar() is called many times in a loop.
Apr 12 '07 #7
Jim West wrote:
> On 2007-04-12, Ian Collins <ia******@hotmail.com> wrote:
>> Jim West wrote:
>>> The execution speed of the following code is dramatically faster if I
>>> declare some arrays globally rather than locally. That is
>>>
>>> FOO a[10], b[10], c[10];
>>>
>>> void bar() {
>>> ...
>>> }
>>>
>>> runs much faster (up to 33%) than
>>>
>>> void bar() {
>>> FOO a[10], b[10], c[10];
>>> ...
>>> }
>>
>> What is a FOO?
>>
>> Does it require construction?
>>
>> Do you call bar() in a loop?
>
> FOO is actually a three-dimensional space vector:
>
> class FOO {
>     float x_, y_, z_;   // data members named to match the initializer lists
> public:
>     FOO() : x_(0), y_(0), z_(0) { }
>     FOO(float x, float y, float z) : x_(x), y_(y), z_(z) { }
>     inline FOO& operator+=(const FOO& a);
>     /* Many more inline operators and member functions included */
> };
>
> bar() is called many times in a loop.
So there's your reason - FOO() gets called 30 times for each call of bar().
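
The count of 30 can be checked with a small sketch. The stand-in FOO below adds a static call counter purely for illustration; `default_ctor_calls`, `bar_local()`, `count_constructions()` and `demo()` are hypothetical names, not part of the poster's library.

```cpp
#include <cassert>

// Stand-in for FOO with a call counter added purely for illustration.
struct FOO {
    static int default_ctor_calls;   // hypothetical instrumentation
    float x, y, z;
    FOO() : x(0), y(0), z(0) { ++default_ctor_calls; }
};
int FOO::default_ctor_calls = 0;

// Local arrays: every element is default-constructed on each call.
void bar_local() {
    FOO a[10], b[10], c[10];
    (void)a; (void)b; (void)c;   // the real work is elided in the thread
}

// Returns how many default constructions `calls` invocations trigger.
int count_constructions(int calls) {
    FOO::default_ctor_calls = 0;
    for (int i = 0; i < calls; ++i) {
        bar_local();
    }
    return FOO::default_ctor_calls;
}

void demo() {
    assert(count_constructions(1) == 30);      // 3 arrays * 10 elements
    assert(count_constructions(100) == 3000);  // grows with every call
}
```

With the arrays at namespace scope instead, the same 30 constructions would happen once at program start rather than once per call.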

--
Ian Collins.
Apr 12 '07 #8

"Ian Collins" <ia******@hotmail.com> wrote in message
news:58*************@mid.individual.net...
> Jim West wrote:
>> The execution speed of the following code is dramatically faster if I
>> declare some arrays globally rather than locally. That is
>>
>> FOO a[10], b[10], c[10];
>>
>> void bar() {
>> ...
>> }
>>
>> runs much faster (up to 33%) than
>>
>> void bar() {
>> FOO a[10], b[10], c[10];
>> ...
>> }
>
> What is a FOO?
Foobar is a universal variable understood to represent whatever is being
discussed.
It's usually used in examples that illustrate concepts and ideas in computer
science.
For instance, a computer science professor may be discussing different file
formats. In this case, he would call the generic-example file foo or foobar,
then list the extensions associated with the file formats (e.g. foobar.txt,
foobar.gif, foobar.exe, foobar.tar).

When foo or foobar is used, everyone understands that these are just
examples, and they don't really exist.
Programmers and administrators also use foo and foobar in a similar context.
Files or programs named with foo or foobar are understood not to be
permanent and will be changed or deleted at any time.
Foo, bar, and the compound foobar were commonly used at MIT, Stanford and
the Helsinki University of Technology, Finland. Other generic variables are
used other places, but only these three are considered universal.

> Does it require construction?
>
> Do you call bar() in a loop?
>
> --
> Ian Collins.

Apr 12 '07 #9
GeekBoy wrote:
> "Ian Collins" <ia******@hotmail.com> wrote in message
> news:58*************@mid.individual.net...
>>
>> What is a FOO?
>
> When foo or foobar is used, everyone understands that these are just
> examples, and they don't really exist.
Not in this case, if you read the OP's reply.
>
> Foo, bar, and the compound foobar were commonly used at MIT, Stanford and
> the Helsinki University of Technology, Finland. Other generic variables are
> used other places, but only these three are considered universal.
If you haven't done so already, research the origin of the term.
>>
>> --
>> Ian Collins.
*Please* don't quote signatures.

--
Ian Collins.
Apr 12 '07 #10
Jim West wrote:
> The execution speed of the following code is dramatically faster if I
> declare some arrays globally rather than locally. That is
>
> FOO a[10], b[10], c[10];
>
> void bar() {
> ...
> }
>
> runs much faster (up to 33%) than
>
> void bar() {
> FOO a[10], b[10], c[10];
> ...
> }
If, as seems to be the case, the culprit is the constructor for FOO, you
can just remove it. Since the version with the global array apparently
works correctly, the rest of the code isn't relying on having the FOO
values initialized on entry into bar.

--

-- Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com)
Author of "The Standard C++ Library Extensions: a Tutorial and
Reference." (www.petebecker.com/tr1book)
Apr 12 '07 #11

"Pete Becker" <pe**@versatilecoding.com> wrote in message
news:Qa******************************@giganews.com...
> Jim West wrote:
>> The execution speed of the following code is dramatically faster if I
>> declare some arrays globally rather than locally. That is
>>
>> FOO a[10], b[10], c[10];
>>
>> void bar() {
>> ...
>> }
>>
>> runs much faster (up to 33%) than
>>
>> void bar() {
>> FOO a[10], b[10], c[10];
>> ...
>> }
>
> If, as seems to be the case, the culprit is the constructor for FOO, you
> can just remove it. Since the version with the global array apparently
> works correctly, the rest of the code isn't relying on having the FOO
> values initialized on entry into bar.
In another part of the thread, he shows two constructors, one default and
one parameterized. He can't remove the default constructor because the
array definitions require one, and the compiler can't create one if there's
a parameterized version, right? (Of course, he may be able to remove the
parameterized version as well; I don't know.) But won't the
compiler-generated constructor perform the same default initializations, and
still get called 30 times for each call to bar? Or am I misunderstanding
something here?

In any case, what I might do in his case is declare the arrays in the
function which calls bar, prior to the loop that [apparently] calls bar
repeatedly, and simply pass pointers to those arrays to bar.
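
A rough sketch of that caller-owns-the-arrays approach, assuming bar() only needs three arrays of 10 elements each. The pointer-based signature of `bar()`, the counter, and `constructions_for()` are hypothetical illustrations; the real interface of bar() is not shown in the thread.

```cpp
#include <cassert>

// Stand-in for FOO with a call counter added purely for illustration.
struct FOO {
    static int default_ctor_calls;   // hypothetical instrumentation
    float x, y, z;
    FOO() : x(0), y(0), z(0) { ++default_ctor_calls; }
};
int FOO::default_ctor_calls = 0;

// bar() no longer owns its scratch arrays; the caller supplies them.
// Each pointer is assumed to address at least 10 elements.
void bar(FOO* a, FOO* b, FOO* c) {
    for (int i = 0; i < 10; ++i) {
        c[i].x = a[i].x + b[i].x;    // placeholder for the real work
    }
}

// Hypothetical caller: constructs the arrays once, then loops over bar().
int constructions_for(int calls) {
    FOO::default_ctor_calls = 0;
    FOO a[10], b[10], c[10];         // 30 constructions, before the loop
    for (int i = 0; i < calls; ++i) {
        bar(a, b, c);
    }
    return FOO::default_ctor_calls;
}

void demo() {
    assert(constructions_for(1) == 30);
    assert(constructions_for(1000) == 30);   // independent of the call count
}
```

The construction cost is paid once per caller rather than once per call, at the price of every caller having to provide the storage.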

-Howard


Apr 12 '07 #12
In article <hP*********************@bgtnsc04-news.ops.worldnet.att.net>,
Howard <al*****@hotmail.com> wrote:
> "Pete Becker" <pe**@versatilecoding.com> wrote in message
> news:Qa******************************@giganews.com...
>> Jim West wrote:
>>> The execution speed of the following code is dramatically faster if I
>>> declare some arrays globally rather than locally. That is
>>>
>>> FOO a[10], b[10], c[10];
>>>
>>> void bar() {
>>> ...
>>> }
>>>
>>> runs much faster (up to 33%) than
>>>
>>> void bar() {
>>> FOO a[10], b[10], c[10];
>>> ...
>>> }
>>
>> If, as seems to be the case, the culprit is the constructor for FOO, you
>> can just remove it. Since the version with the global array apparently
>> works correctly, the rest of the code isn't relying on having the FOO
>> values initialized on entry into bar.
>
> In another part of the thread, he shows two constructors, one default and
> one parameterized. He can't remove the default constructor because the
> array definitions require one, and the compiler can't create one if there's
> a parameterized version, right? (Of course, he may be able to remove the
> parameterized version as well; I don't know.) But won't the
> compiler-generated constructor perform the same default initializations, and
> still get called 30 times for each call to bar? Or am I misunderstanding
> something here?
>
> In any case, what I might do in his case is declare the arrays in the
> function which calls bar, prior to the loop that [apparently] calls bar
> repeatedly, and simply pass pointers to those arrays to bar.
That was added later in the thread. Pete's point was probably
(even in light of any new info) that, since he is worried
about speed, the global arrays (even if tossed into a named
or unnamed namespace) are zero-initialized first, because they are
static, before any other initialization on them occurs.

BTW, this can be different from having statics inside the function.
Actually, a bit of this, including the timing of the zeroing of the
globals, is up to the compiler [system].

Lastly to Jim, if this is utterly crucial, there is probably
some other solution possible even faster than the globals,
but we probably don't know enough about what you're doing
to offer that at this point.
--
Greg Comeau / 4.3.9 with C++0xisms now in beta!
Comeau C/C++ ONLINE == http://www.comeaucomputing.com/tryitout
World Class Compilers: Breathtaking C++, Amazing C99, Fabulous C90.
Comeau C/C++ with Dinkumware's Libraries... Have you tried it?
Apr 12 '07 #13
Howard wrote:
> "Pete Becker" <pe**@versatilecoding.com> wrote in message
> news:Qa******************************@giganews.com...
>> Jim West wrote:
>>> The execution speed of the following code is dramatically faster if I
>>> declare some arrays globally rather than locally. That is
>>>
>>> FOO a[10], b[10], c[10];
>>>
>>> void bar() {
>>> ...
>>> }
>>>
>>> runs much faster (up to 33%) than
>>>
>>> void bar() {
>>> FOO a[10], b[10], c[10];
>>> ...
>>> }
>>
>> If, as seems to be the case, the culprit is the constructor for FOO, you
>> can just remove it. Since the version with the global array apparently
>> works correctly, the rest of the code isn't relying on having the FOO
>> values initialized on entry into bar.
>
> In another part of the thread, he shows two constructors, one default and
> one parameterized. He can't remove the default constructor because the
> array definitions require one, and the compiler can't create one if there's
> a parameterized version, right?
Shrug. Write an empty default constructor, or refactor. I'm not
particularly interested in getting into implementation details. The
point is that the initialization apparently isn't actually required, so
doesn't belong in the class.

> (Of course, he may be able to remove the
> parameterized version as well; I don't know.) But won't the
> compiler-generated constructor perform the same default initializations, and
> still get called 30 times for each call to bar? Or am I misunderstanding
> something here?
The compiler-generated constructor uses the default initializer for each
of the float fields, and that initializer does nothing. Any reasonable
compiler will generate no code.
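
The distinction can be stated with a type trait. This is a sketch only: it relies on `std::is_trivially_default_constructible` from `<type_traits>`, a C++11 facility that postdates this thread, and the three struct names are hypothetical.

```cpp
#include <cassert>
#include <type_traits>

// Zeroing constructor, as in the FOO shown earlier in the thread.
struct ZeroingCtor {
    float x, y, z;
    ZeroingCtor() : x(0), y(0), z(0) {}
};

// User-written but empty default constructor: the floats are left
// uninitialized when an object is default-initialized.
struct EmptyCtor {
    float x, y, z;
    EmptyCtor() {}
};

// No constructors declared: the compiler supplies the default constructor.
struct ImplicitCtor {
    float x, y, z;
};

// "Trivial" is the language-level statement that default-initialization
// performs no action at all, so there is nothing to generate code for.
static_assert(std::is_trivially_default_constructible<ImplicitCtor>::value,
              "implicit default constructor of a float-only struct is trivial");
// A user-provided constructor is never trivial in this formal sense, even
// when its body is empty; optimizers usually still reduce it to nothing.
static_assert(!std::is_trivially_default_constructible<EmptyCtor>::value,
              "user-provided empty constructor is not formally trivial");
static_assert(!std::is_trivially_default_constructible<ZeroingCtor>::value,
              "zeroing constructor is not trivial");

bool implicit_ctor_is_trivial() {
    return std::is_trivially_default_constructible<ImplicitCtor>::value;
}

void demo() {
    assert(implicit_ctor_is_trivial());
}
```

Whether the empty constructor costs anything in practice is up to the optimizer, so measuring remains the only reliable check.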
>
> In any case, what I might do in his case is declare the arrays in the
> function which calls bar, prior to the loop that [apparently] calls bar
> repeatedly, and simply pass pointers to those arrays to bar.
But that unnecessarily increases coupling because every caller now has
to provide scratch data for bar.

--

-- Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com)
Author of "The Standard C++ Library Extensions: a Tutorial and
Reference." (www.petebecker.com/tr1book)
Apr 12 '07 #14
Greg Comeau wrote:
>
> That was added later in the thread. Pete's point was probably
> (even in light of any new info) that, since he is worried
> about speed, the global arrays (even if tossed into a named
> or unnamed namespace) are zero-initialized first, because they are
> static, before any other initialization on them occurs.
Not quite. My point was that the global arrays demonstrate that
initialization is irrelevant, because on calls to bar after the first
one they have whatever junk was left over from the previous call. If
that works, then skip the initialization entirely.
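
A small sketch of the leftover-values point, assuming a stand-in FOO with the zeroing constructor. The namespace `bar_scratch` and the one-argument `bar()` are hypothetical and exist only to expose what the array holds on entry.

```cpp
#include <cassert>

// Stand-in for FOO with the zeroing default constructor from the thread.
struct FOO {
    float x, y, z;
    FOO() : x(0), y(0), z(0) {}
};

namespace bar_scratch {   // hypothetical name
    FOO a[10];            // constructed once, before main() runs
}

// Returns what a[0].x held on entry, then overwrites it.
float bar(float new_value) {
    const float seen_on_entry = bar_scratch::a[0].x;
    bar_scratch::a[0].x = new_value;
    return seen_on_entry;
}

void demo() {
    assert(bar(1.5f) == 0.0f);   // first call sees the constructor's zero
    assert(bar(2.5f) == 1.5f);   // second call sees the previous call's value
}
```

Only the first call ever observes zeros, so code that works with the global arrays cannot be depending on zeroed scratch data at entry.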

--

-- Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com)
Author of "The Standard C++ Library Extensions: a Tutorial and
Reference." (www.petebecker.com/tr1book)
Apr 12 '07 #15

"Pete Becker" <pe**@versatilecoding.com> wrote in message
news:6Z******************************@giganews.com...
>>
>> In any case, what I might do in his case is declare the arrays in the
>> function which calls bar, prior to the loop that [apparently] calls bar
>> repeatedly, and simply pass pointers to those arrays to bar.
>
> But that unnecessarily increases coupling because every caller now has to
> provide scratch data for bar.
True. I hadn't thought of that. I guess without knowing more details of
how bar is called and what it's used for, it's tough to come up with the
"best" solution.

-Howard

Apr 12 '07 #16
On 2007-04-12, Howard <al*****@hotmail.com> wrote:
>
> "Pete Becker" <pe**@versatilecoding.com> wrote in message
> news:6Z******************************@giganews.com...
>>>
>>> In any case, what I might do in his case is declare the arrays in the
>>> function which calls bar, prior to the loop that [apparently] calls bar
>>> repeatedly, and simply pass pointers to those arrays to bar.
>>
>> But that unnecessarily increases coupling because every caller now has to
>> provide scratch data for bar.
>
> True. I hadn't thought of that. I guess without knowing more details of
> how bar is called and what it's used for, it's tough to come up with the
> "best" solution.
Following up on my situation... I unfortunately can't post to newsgroups
at work, but I followed the discussions on Google Groups. While the FOO
class need not be initialized in this particular case, it is part of a
numerical library that my group has put together over the years, and any
changes to the default constructor would potentially break a lot of
code. Initializing all components to zero seemed like a Good Idea(tm)
when I first wrote it a long time ago, but in hindsight it was clearly a
mistake. In any event, I copied the header to this particular file and
changed FOO to FOO2 everywhere and changed the default constructor to

FOO2() { };

This did not give any improvement, surprisingly. (At least with the
Intel compiler...I forgot to try g++.)

Allocating the arrays in the calling routines would work, but bar() is
indeed called in several different places and would require more editing
than I want to do at this point.

So, my solution is to use the namespace idea that Mark P (I believe)
suggested (already implemented). I need to protect them because their
real names are

FOO Observation_Points[10], Source_Points[10], Basis_Functions[10];

names which get reused in a lot of different places throughout the code.
If I get in the habit of declaring things globally but not protected by
namespaces I'm afraid it would only be a matter of time before bar7()
calls bar8() and changes values unexpectedly.

Finally, I think that this change is going to be sufficient. This is a
computational electromagnetics code, and the complex math involved
within bar() is quite intensive (it performs a four-dimensional numerical
quadrature to yield an electromagnetic field), and should overwhelm
any more minor tweaking that is possible. (Famous last words...I thought
the same about the constructor!)

Thanks for the help everyone. I really appreciate it, and I have learned
quite a bit about the cost of constructors that seem really, really
simple on the surface!

- Jim
Apr 13 '07 #17
On 2007-04-13, Jim West <eg***********@yahoo.com> wrote:
> code. Initializing all components to zero seemed like a Good Idea(tm)
> when I first wrote it a long time ago, but in hindsight it was clearly a
> mistake. In any event, I copied the header to this particular file and
> changed FOO to FOO2 everywhere and changed the default constructor to
>
> FOO2() { };
>
> This did not give any improvement, surprisingly.
I take that back. I mis-read 33.8 seconds as 38 seconds (I must be
really tired). 33.8 is only slightly slower than I get with the
global arrays. I'll have to re-consider changing the class default
constructor and see how many things really do break.
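
One way to make such comparisons less error-prone is to time both variants in the same run. The following is only a sketch: it uses `std::chrono::steady_clock` (C++11, which postdates this thread), and `bar_with_locals()`, `seconds_for()` and the placeholder arithmetic are hypothetical stand-ins for the real bar().

```cpp
#include <cassert>
#include <chrono>

struct FOO  { float x, y, z; FOO()  : x(0), y(0), z(0) {} };  // zeroing
struct FOO2 { float x, y, z; FOO2() {} };                     // empty ctor

// Placeholder for bar(): local scratch arrays of element type T.
// Only the x members are written and read, so FOO2's uninitialized
// y and z are never touched.
template <typename T>
float bar_with_locals() {
    T a[10], b[10], c[10];
    float sum = 0.0f;
    for (int i = 0; i < 10; ++i) {
        a[i].x = static_cast<float>(i);
        b[i].x = 2.0f * a[i].x;
        c[i].x = a[i].x + b[i].x;
        sum += c[i].x;
    }
    return sum;
}

// Wall-clock seconds spent in `calls` invocations of bar_with_locals<T>().
template <typename T>
double seconds_for(int calls) {
    typedef std::chrono::steady_clock steady;
    volatile float sink = 0.0f;   // keeps the calls from being optimized away
    const steady::time_point start = steady::now();
    for (int i = 0; i < calls; ++i) {
        sink = bar_with_locals<T>();
    }
    const std::chrono::duration<double> elapsed = steady::now() - start;
    (void)sink;
    return elapsed.count();
}

void demo() {
    assert(bar_with_locals<FOO>() == 135.0f);    // same result either way
    assert(bar_with_locals<FOO2>() == 135.0f);
}
```

Calling `seconds_for<FOO>(n)` and `seconds_for<FOO2>(n)` with the same n gives two numbers from one build and one run; the placeholder body is far lighter than the real quadrature, so the ratio here says little about the real code.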

I think I'll go to bed now before I hurt myself.
Apr 13 '07 #18
Jim West wrote:
> On 2007-04-13, Jim West <eg***********@yahoo.com> wrote:
>> code. Initializing all components to zero seemed like a Good Idea(tm)
>> when I first wrote it a long time ago, but in hindsight it was clearly a
>> mistake. In any event, I copied the header to this particular file and
>> changed FOO to FOO2 everywhere and changed the default constructor to
>>
>> FOO2() { };
>>
>> This did not give any improvement, surprisingly.
>
> I take that back. I mis-read 33.8 seconds as 38 seconds (I must be
> really tired). 33.8 is only slightly slower than I get with the
> global arrays. I'll have to re-consider changing the class default
> constructor and see how many things really do break.
>
> I think I'll go to bed now before I hurt myself.
Benchmarking sure is fun!

--

-- Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com)
Author of "The Standard C++ Library Extensions: a Tutorial and
Reference." (www.petebecker.com/tr1book)
Apr 13 '07 #19
Jim West wrote:
> On 2007-04-13, Jim West <eg***********@yahoo.com> wrote:
>> code. Initializing all components to zero seemed like a Good Idea(tm)
>> when I first wrote it a long time ago, but in hindsight it was clearly a
>> mistake. In any event, I copied the header to this particular file and
>> changed FOO to FOO2 everywhere and changed the default constructor to
>>
>> FOO2() { };
>>
>> This did not give any improvement, surprisingly.
>
> I take that back. I mis-read 33.8 seconds as 38 seconds (I must be
> really tired). 33.8 is only slightly slower than I get with the
> global arrays. I'll have to re-consider changing the class default
> constructor and see how many things really do break.
I suggest you investigate profilers for your platform/tools. I'm
guessing your platform is Linux; if so, Sun Studio has some excellent
profiling and analysis tools you could use.

--
Ian Collins.
Apr 13 '07 #20
