
are arrays contiguous in memory?

P: n/a
I looked at the addresses in an 'array<>' during debug and noticed that the
addresses were contiguous. Is this guaranteed, or just something it does if
it can?

[==P==]

PS = VS C++.NET 2005 Express using /clr:pure syntax
Dec 18 '05 #1
38 Replies


P: n/a
I guess that if you lock the array, the array will be contiguous so you can use a
pointer, as that is the only safe way you can access the array through a
pointer :-)
"Peteroid" <pe************@msn.com> a écrit dans le message de
news:%2***************@TK2MSFTNGP12.phx.gbl...
I looked at the addresses in an 'array<>' during debug and noticed that the addresses were contiguous. Is this guaranteed, or just something it does if it can?

[==P==]

PS = VS C++.NET 2005 Express using clr:/pure syntax

Dec 18 '05 #2

P: n/a
Peteroid wrote:
I looked at the addresses in an 'array<>' during debug and noticed
that the addresses were contiguous. Is this guaranteed, or just
something it does if it can?


Managed arrays are always contiguous, but you can only observe/take
advantage of that fact by getting a pinning pointer into the array.
Naturally, if it's an array of reference type, then the contiguous array
will contain only pointers to the contained objects, which in all likelihood
will not be contiguous.
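For example, a minimal sketch of what that looks like in C++/CLI (NativeSum here is a hypothetical unmanaged routine, shown only to illustrate the pinning pattern):

// Sketch only: pin a managed array so native code can see its contiguous elements.
int NativeSum(const int* p, int n);        // hypothetical unmanaged function

int PassToNative(array<int>^ data)
{
    pin_ptr<int> pinned = &data[0];        // pins the array; the GC won't move it while 'pinned' is in scope
    const int* raw = pinned;               // 'raw' now points at data->Length contiguous ints
    int sum = NativeSum(raw, data->Length);
    return sum;                            // the array is unpinned when 'pinned' goes out of scope
}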

-cd
Dec 18 '05 #3

P: n/a

"Peteroid" <pe************@msn.com> wrote in message
news:%2***************@TK2MSFTNGP12.phx.gbl...
I looked at the addresses in an 'array<>' during debug and noticed that the
addresses were contiguous. Is this guaranteed, or just something it does if
it can?

[==P==]

PS = VS C++.NET 2005 Express using clr:/pure syntax


Yes.

Willy.
Dec 18 '05 #4

P: n/a

"Carl Daniel [VC++ MVP]" <cp*****************************@mvps.org.nospam >
wrote in message news:Og**************@TK2MSFTNGP14.phx.gbl...
Peteroid wrote:
I looked at the addresses in an 'array<>' during debug and noticed
that the addresses were contiguous. Is this guaranteed, or just
something it does if it can?


Managed arrays are always contiguous, but you can only observe/take
advantage of that fact by getting a pinning pointer into the array.
Naturally, if it's an array of reference type, then the contiguous array
will contain only pointers to the contained objects, which in all
likelihood will not be contiguous.

-cd


Not sure what you mean by "take advantage"; in managed code there is no
need for pinning pointers unless you need to pass the array or one of its
elements to unmanaged code. But pinned or not, managed arrays are always
contiguous, just like unmanaged arrays.

Willy.
Dec 18 '05 #5

P: n/a
Willy Denoyette [MVP] wrote:
Not sure what you mean by "take advantage"; in managed code there
is no need for pinning pointers unless you need to pass the array or one
of its elements to unmanaged code. But pinned or not, managed arrays
are always contiguous, just like unmanaged arrays.


take-advantage-of as in use pointer arithmetic to access array elements.
Not of much (any?) value unless you're writing unmanaged code.

--cd
Dec 19 '05 #6

P: n/a
It could be very useful to use pointer arithmetic, for performance reasons,
in managed code, at least in C#. I guess it is the same in managed
C++/CLI.
For example:
http://msdn.microsoft.com/library/de...rp11152001.asp
--
Un saludo
Rodrigo Corral González [MVP]

FAQ de microsoft.public.es.vc++
http://rcorral.mvps.org


Dec 19 '05 #7

P: n/a
Yes, arrays are ref types and the GC always allocates memory in contiguous
blocks.

--
Regards,
Nish [VC++ MVP]
"Peteroid" <pe************@msn.com> wrote in message
news:%2***************@TK2MSFTNGP12.phx.gbl...
I looked at the addresses in an 'array<>' during debug and noticed that the
addresses were contiguous. Is this guaranteed, or just something it does if
it can?

[==P==]

PS = VS C++.NET 2005 Express using clr:/pure syntax

Dec 19 '05 #8

P: n/a
>>Managed arrays are always contiguous, but you can only observe/take
advantage of that fact by getting a pinning pointer into the array. <<

An interior pointer would suffice too, Carl.

--
Regards,
Nish [VC++ MVP]
"Carl Daniel [VC++ MVP]" <cp*****************************@mvps.org.nospam >
wrote in message news:Og**************@TK2MSFTNGP14.phx.gbl...
Peteroid wrote:
I looked at the addresses in an 'array<>' during debug and noticed
that the addresses were contiguous. Is this guaranteed, or just
something it does if it can?


Managed arrays are always contiguous, but you can only observe/take
advantage of that fact by getting a pinning pointer into the array.
Naturally, if it's an array of reference type, then the contiguous array
will contain only pointers to the contained objects, which in all
likelihood will not be contiguous.

-cd

Dec 19 '05 #9

P: n/a

"Rodrigo Corral [MVP]" <co*********@hotmail.com> wrote in message
news:ei*************@TK2MSFTNGP15.phx.gbl...
It could be very useful to use pointer arithmetic, for performance
reasons, in managed code, at least in C#. I guess it is the same in
managed C++/CLI.
For example:
http://msdn.microsoft.com/library/de...rp11152001.asp
--
Un saludo
Rodrigo Corral González [MVP]

FAQ de microsoft.public.es.vc++
http://rcorral.mvps.org

Beware that the sample in the article uses pointer arithmetic (C# unsafe
code) to directly access an unmanaged array (a bitmap is unmanaged data!!),
so there is no need to pin here.
I see no reason whatsoever to use this to access managed arrays.

Willy.

Dec 19 '05 #10

P: n/a
Do you guys know any "arrays" that aren't contiguous?

Willy.

"Nishant Sivakumar" <ni**@nospam.asianetindia.com> wrote in message
news:%2*****************@TK2MSFTNGP15.phx.gbl...
Yes, arrays are ref types and the GC always allocates memory in contiguous
blocks.

--
Regards,
Nish [VC++ MVP]
"Peteroid" <pe************@msn.com> wrote in message
news:%2***************@TK2MSFTNGP12.phx.gbl...
I looked at the addresses in an 'array<>' during debug and noticed that
the addresses were contiguous. Is this guaranteed, or just something it
does if it can?

[==P==]

PS = VS C++.NET 2005 Express using clr:/pure syntax


Dec 19 '05 #11

P: n/a

"Carl Daniel [VC++ MVP]" <cp*****************************@mvps.org.nospam >
wrote in message news:e9**************@TK2MSFTNGP15.phx.gbl...
Willy Denoyette [MVP] wrote:
Not sure what you mean by "take advantage"; in managed code there
is no need for pinning pointers unless you need to pass the array or one
of its elements to unmanaged code. But pinned or not, managed arrays
are always contiguous, just like unmanaged arrays.


take-advantage-of as in use pointer arithmetic to access array elements.
Not of much (any?) value unless you're writing unmanaged code.

--cd


Carl,

Sorry to be dense, but when talking about advantage, you mean advantage in
terms of what and compared to what? I don't see any advantage to using pointer
arithmetic to access "managed" array elements over, let's say, indexed access.
Am I missing something?

Willy.

Dec 19 '05 #12

P: n/a
Willy Denoyette [MVP] wrote:
Carl,

Sorry to be dense, but when talking about advantage, you mean
advantage in terms of what and compared to what? I don't see any
advantage to using pointer arithmetic to access "managed" array
elements over, let's say, indexed access. Am I missing something?


It's a figure of speech - don't put too much weight on it!

The only advantage would be to use it with existing code that does pointer
arithmetic when accessing an array.

In real terms, there is no advantage, since in C, a[n] is equivalent to
*(a+n) by definition.

-cd
Dec 19 '05 #13

P: n/a
Willy Denoyette [MVP] wrote:
Do you guys know any "arrays" that aren't contiguous?


Matrix arithmetic libraries typically supply some kind of sparse array
that's not contiguous, but those are definitely special purpose things. It
would seem very silly to make a general purpose array non-contiguous.

-cd
Dec 19 '05 #14

P: n/a

"Carl Daniel [VC++ MVP]" <cp*****************************@mvps.org.nospam >
wrote in message news:%2****************@TK2MSFTNGP14.phx.gbl...
Willy Denoyette [MVP] wrote:
Carl,

Sorry to be dense, but when talking about advantage, you mean
advantage in terms of what and compared to what? I don't see any
advantage to using pointer arithmetic to access "managed" array
elements over, let's say, indexed access. Am I missing something?


It's a figure of speech - don't put too much weight on it!

The only advantage would be to use it with existing code that does pointer
arithmetic when accessing an array.

In real terms, there is no advantage, since in C, a[n] is equivalent to
*(a+n) by definition.

-cd


Carl,

Agreed, all I want is to make sure people don't start to believe that using
pointer arithmetic has performance advantages when accessing array
elements. I know a lot of C# (and some C++) developers believe that there
is such an advantage when using unsafe constructs, but it's most often not the
case; worse, they start to pin arrays for long periods of time, effectively
disturbing GC activity, which results in reduced overall performance.

Willy.
Dec 19 '05 #15

P: n/a
Willy Denoyette [MVP] wrote:
Agreed, all I want is to make sure people don't start to believe that
using pointer arithmetic has performance advantages when accessing
arrays (elements). I know a lot of C# (and some C++) developers believe
that there is such advantage when using unsafe constructs, but it's
most often not the case, worse, they start to pin arrays for a long
period of time, effectively disturbing the GC activity which results in
reduced overall performance.


There is one performance benefit for using pointers over indexing for CLR
arrays. When indexing a CLR array, there is a bounds check. That check is
eliminated when using a pointer to iterate over the array.

Of course, that's a trade off between safety and performance. One should be
very careful when making that decision. Because the JIT compiler has the
ability to eliminate bounds checks in some cases, switching to pointers is
not always a worthwhile exercise. Though, it is definitely useful in some
cases where bitmap manipulation is done. This is probably the only reason
why interior pointers exist in the new C++. If it wasn't for this case, I
probably would have succeeded at keeping them out of the language entirely.
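As an illustration only, a rough C++/CLI sketch of iterating a managed array through an interior pointer (no pinning, and no per-element bounds check inside the loop):

// Sketch: interior_ptr keeps tracking the array even if the GC relocates it.
int Sum(array<int>^ data)
{
    int total = 0;
    interior_ptr<int> p = &data[0];
    interior_ptr<int> end = p + data->Length;
    while (p != end)
    {
        total += *p;    // dereference only; no index and no bounds check per element
        ++p;
    }
    return total;
}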

--
Brandon Bray, Visual C++ Compiler http://blogs.msdn.com/branbray/
Bugs? Suggestions? Feedback? http://msdn.microsoft.com/productfeedback/
Dec 19 '05 #16

P: n/a
Willy Denoyette [MVP] wrote:
Do you guys know any "arrays" that aren't contiguous?


A double-linked list?

Arnaud
MVP - VC
Dec 19 '05 #17

P: n/a

"Arnaud Debaene" <ad******@club-internet.fr> wrote in message
news:Oa**************@TK2MSFTNGP09.phx.gbl...
Willy Denoyette [MVP] wrote:
Do you guys know any "arrays" that aren't contiguous?


A double-linked list?

Arnaud
MVP - VC


Why a doubly linked list and not just a linked list?
Anyway, it's not what I would call an array (or a vector).

Willy.
Dec 19 '05 #18

P: n/a

"Brandon Bray [MSFT]" <br******@online.microsoft.com> wrote in message
news:%2****************@TK2MSFTNGP10.phx.gbl...
Willy Denoyette [MVP] wrote:
Agreed, all I want is to make sure people don't start to believe that
using pointer arithmetic has performance advantages when accessing
arrays (elements). I know a lot of C# (and some C++) developers believe
that there is such advantage when using unsafe constructs, but it's
most often not the case, worse, they start to pin arrays for a long
period of time, effectively disturbing the GC activity which results in
reduced overall performance.
There is one performance benefit for using pointers over indexing for CLR
arrays. When indexing a CLR array, there is a bounds check. That check is
eliminated when using a pointer to iterate over the array.

That's true, though the overhead is real small (something like a cmp and a
jmp) and only important to consider for some operations on arrays.
Of course, that's a trade off between safety and performance. One should
be very careful when making that decision. Because the JIT compiler has
the ability to eliminate bounds checks in some cases, switching to
pointers is not always a worthwhile exercise. Though, it is definitely
useful in some cases where bitmap manipulation is done. This is probably
the only reason why interior pointers exist in the new C++. If it wasn't
for this case, I probably would have succeeded at keeping them out of the
language entirely.

Well, as I said in another message, a Bitmap (assuming we are talking about
GDI/GDI+) is stored in unmanaged (GDI+ allocated) memory, and accessing it
directly using pointers is a big perf. win, but that's because accessing it
using the GDI+ methods is extremely slow.
--
Brandon Bray, Visual C++ Compiler http://blogs.msdn.com/branbray/
Bugs? Suggestions? Feedback? http://msdn.microsoft.com/productfeedback/


Willy.
Dec 19 '05 #19

P: n/a
Wow, you ask a simple question..... : )

As I understand it, the 'old' STL containers make a distinction in this
regard. Lists are linked lists, and so are not necessarily contiguous in
memory. Vectors are the STL equivalent of contiguous lists. The reason for
both is that vectors are better at lookup performance (randomly accessible,
and therefore a fixed lookup time); but they suffer when being deleted from
or resized, since this is often done by moving the entire end of the list, or
by allocating a bigger vector somewhere else and copying everything over,
respectively. Lists are worse at lookup performance since they require
serial lookup (following the links from the head, which takes a variable
length of time based on 'distance' from the head), but they benefit when it
comes to deletion and resizing, since an element can be removed without
moving anything else, and resizing just means finding a large enough memory
space anywhere for the added element.
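For instance, a rough standard C++ sketch of the trade-off (illustrative only):

#include <iterator>
#include <list>
#include <vector>

void demo()
{
    std::vector<int> v(1000, 0);
    std::list<int> l(1000, 0);

    int a = v[500];                        // O(1): contiguous storage allows direct indexing
    std::list<int>::iterator it = l.begin();
    std::advance(it, 500);                 // O(n): must follow 500 links to get there
    int b = *it;

    l.erase(it);                           // cheap once found: unlink one node, nothing else moves
    v.erase(v.begin() + 500);              // O(n): all trailing elements shift down one slot
    (void)a; (void)b;                      // silence unused-variable warnings
}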

My opinion is this: with a faster computer and more memory it really doesn't
matter which method is used! : )

[==P==]

"Willy Denoyette [MVP]" <wi*************@telenet.be> wrote in message
news:um**************@tk2msftngp13.phx.gbl...

"Arnaud Debaene" <ad******@club-internet.fr> wrote in message
news:Oa**************@TK2MSFTNGP09.phx.gbl...
Willy Denoyette [MVP] wrote:
Do you guys know any "arrays" that aren't contiguous?


A double-linked list?

Arnaud
MVP - VC


Why a double linked list and not a linked list?
Anyway, it's not what I would call an array (or a vector).

Willy.

Dec 19 '05 #20

P: n/a
Peter Oliphant wrote:
Wow, you ask a simple question..... : )

As I understand it, the 'old' STL containers make a distinction in
this regard. List's are linked lists, and so are not necessarily
contiguous in memory. Vector's are the STL equivalent of contiguous
lists. The reason for both is that vector's are better at lookup
performance (randomly accessible and therefore a fixed length of
lookup time); but suffer when being deleted from or resized, since
this is often done by moving the entire end of the list and
allocating a bigger vector somewhere else and copying over,
respectively. List's are worse at lookup performance since they
require serial lookup (linking thorugh the list from the header,
which is takes a variable length of time based on 'distance' from
header), but benefit when it comes to deletion and resizing since an
element can be removed without moving anything else, and resizing
just means finding a large enough memory space anywhere for the added
element.

And there is also an intermediate "mix" between the two: the deque.

My opinion is this: with a faster computer and more memory it really
doesn't matter which method is used! : )

This is of course basically wrong: processor speed has never changed
anything about algorithmic complexity. When manipulating great numbers of elements
(several tens of thousands), the difference between logarithmic and linear
complexity is as important today as it was 10 years ago.

Arnaud
MVP - VC
Dec 19 '05 #21

P: n/a
> This is of course basically wrong: processor speed has never changed
anything about algorithmic complexity. When manipulating great numbers of
elements (several tens of thousands), the difference between logarithmic
and linear complexity is as important today as it was 10 years ago.
Since in my day I've had to write 'clever' code to combat slow computer
speed, I have to disagree with any blanket statement along the lines of
"processor speed has never changed anything about algorithmic complexity".

For example, this is very algorithmically simple:

int Square( int X )
{
    return X * X ;
}

This however, is not:

int Square( int X )
{
    if ( X < 0 ) { X = -X ; }
    int Y = 1 ;
    int DY = 3 ;
    for ( int i = 1 ; i < X ; i++ )
    {
        Y += DY ;
        DY += 2 ;
    }
    return Y ;
}

These methods both do the same thing - the loop just accumulates the first X
odd numbers, 1 + 3 + 5 + ..., which sum to X * X (if you don't believe me, test them!).
But, in the 'old days' the former method would have been the wrong way to go
for small values of X, since in those days people counted clock cycles,
and the multiply instruction took about 50-100 times longer to execute than the add
instruction.

But since computers, through pipelining and other methods, have gotten adds and
multiplies down to the same 1-cycle speed, nowadays the algorithmically
simpler method is the way to go. That's because today the algorithmically
simpler method executes faster than the 'clever' one (and as a side benefit
saves memory, is more intuitive, is less prone to error, and can be
programmed in less time).

So, theoretically you are correct. In practice, you're not-so-correct... : )

[==P==]

PS - A third method would be to create a lookup table by pre-calculating all
the squares and looking them up, which works even for large values of X. This is
even more algorithmically complex, as it requires a setup phase. But look-up
tables were the norm until a few years ago when computers got fast enough to
do things 'on the fly'. This is why we can entertain 'procedural textures'
in 3D worlds, something that would have been algorithmic suicide to attempt
when computers were slower and more inefficient when dealing with
graphics...
"Arnaud Debaene" <ad******@club-internet.fr> wrote in message
news:OM**************@TK2MSFTNGP12.phx.gbl...
Peter Oliphant wrote:
Wow, you ask a simple question..... : )

As I understand it, the 'old' STL containers make a distinction in
this regard. List's are linked lists, and so are not necessarily
contiguous in memory. Vector's are the STL equivalent of contiguous
lists. The reason for both is that vector's are better at lookup
performance (randomly accessible and therefore a fixed length of
lookup time); but suffer when being deleted from or resized, since
this is often done by moving the entire end of the list and
allocating a bigger vector somewhere else and copying over,
respectively. List's are worse at lookup performance since they
require serial lookup (linking thorugh the list from the header,
which is takes a variable length of time based on 'distance' from
header), but benefit when it comes to deletion and resizing since an
element can be removed without moving anything else, and resizing
just means finding a large enough memory space anywhere for the added
element.

And there is also an intermediate "mix" between the two: the deque.

My opinion is this: with a faster computer and more memory it really
doesn't matter which method is used! : )

This is of course basically wrong : processor speed has never changed
anything to algorithm complexity. When manipulating great number of
elements (several tens of thousands), the difference between logarithmic
and linear complexity is as important today as it was 10 years ago.

Arnaud
MVP - VC

Dec 19 '05 #22

P: n/a
One thing that is missing from this discussion is the importance of cache effects when iterating. If your accesses are scattered all over memory, as they might be with a linked list that has been sorted, then your cache miss rate will be significantly higher than if you iterate over a contiguous block of memory. This effect is drastic because L1 cache can be well over 20 times faster than RAM.

Another thing here is the fact that modern compilers have extremely good optimizations for looping over contiguous arrays because the compiler knows that it is working with contiguous memory.

That said, depending on what you are doing, it is not always worth it to sacrifice an order of complexity on an algorithm just to get these benefits. The two things I describe above are only constant-factor improvements, while dispersed-memory containers (such as linked lists) can sometimes give you order-N or greater improvements over contiguous arrays.

We kind of got off topic from the original question but this is a good discussion!

-Alex

-----Original Message-----
From: Arnaud Debaene
Posted At: Monday, December 19, 2005 1:38 PM
Posted To: microsoft.public.dotnet.languages.vc
Conversation: are arrays contiguous in memory?
Subject: Re: are arrays contiguous in memory?
Peter Oliphant wrote:
Wow, you ask a simple question..... : )

As I understand it, the 'old' STL containers make a distinction in
this regard. List's are linked lists, and so are not necessarily
contiguous in memory. Vector's are the STL equivalent of contiguous
lists. The reason for both is that vector's are better at lookup
performance (randomly accessible and therefore a fixed length of
lookup time); but suffer when being deleted from or resized, since
this is often done by moving the entire end of the list and
allocating a bigger vector somewhere else and copying over,
respectively. List's are worse at lookup performance since they
require serial lookup (linking thorugh the list from the header,
which is takes a variable length of time based on 'distance' from
header), but benefit when it comes to deletion and resizing since an
element can be removed without moving anything else, and resizing
just means finding a large enough memory space anywhere for the added
element.

And there is also an intermediate "mix" between the two: the deque.

My opinion is this: with a faster computer and more memory it really
doesn't matter which method is used! : )

This is of course basically wrong : processor speed has never changed
anything to algorithm complexity. When manipulating great number of elements
(several tens of thousands), the difference between logarithmic and linear
complexity is as important today as it was 10 years ago.

Arnaud
MVP - VC
Dec 19 '05 #23

P: n/a
> These methods both do the same thing (if you don't believe me, test
them!). But, in the 'old days' the former method would have been the wrong
way to go for small values of X, since in those days people counted
clock cycles, and the multiply instruction took about 50-100 times longer
to execute than the add instruction.

But since computers, through piping and other methods, have gotten adds
and mutliplies down to the same 1-cycle speed, nowadays the
algorithmically simpler method is the way to go. That's because today this
algorithmically simpler method executes faster than the 'clever' one (and
as a side benefit saves memory, is more intuitive, is less prone to error,
and can be programmed in less time).

So, theoretically you are correct. In practice, you're not-so-correct...
: )


This argument is entirely irrelevant.

Arnaud has correctly pointed out that lookups with lists are of O(n) whereas
lookups with vectors are of constant order. Advances in processor speeds
don't change this. Advances in floating point arithmetic don't change this.
He has correctly challenged your statement that "with a faster computer and
more memory it really doesn't matter which method is used!". Indeed, for
large amounts of data, it matters a lot.

Best regards,

Brian
Dec 20 '05 #24

P: n/a
"Brandon Bray [MSFT]" <br******@online.microsoft.com> wrote
Of course, that's a trade off between safety and performance. One should
be very careful when making that decision. Because the JIT compiler has
the ability to eliminate bounds checks in some cases, switching to
pointers is not always a worthwhile exercise. Though, it is definitely
useful in some cases where bitmap manipulation is done. This is probably
the only reason why interior pointers exist in the new C++. If it wasn't
for this case, I probably would have succeeded at keeping them out of the
language entirely.

Ok, maybe I got too many islays today. But you can't be serious about that,
can you?

Am I missing something or do you say (plain) pointers are (and have always
been)
useless? What about by-ref params in managed signatures?

-hg
Dec 20 '05 #25

P: n/a
Willy Denoyette [MVP] wrote:
That's true, though the overhead is real small (something like a cmp and a
jmp) and only important to consider for some operations on arrays.


In most cases it is small, but if you have 200 million bytes to iterate
through in a large scanned image (let's say to apply a lookup table to
each pixel), it's better to go back to unmanaged. It's not that hard to
make such operations pretty safe. So it's good to know that a pinned
array is contiguous. In an average algorithm with 100 items in an array
the performance most likely doesn't matter and it's better to be on the
safe side.
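Roughly what that could look like in C++/CLI, as an illustration only (this assumes the image bytes live in a managed array; the names are hypothetical):

// Sketch: apply a 256-entry lookup table to every byte of a managed array
// through pinning pointers, so the inner loop has no bounds checks.
void ApplyLut(array<System::Byte>^ pixels, array<System::Byte>^ lut)
{
    pin_ptr<System::Byte> pinPixels = &pixels[0];
    pin_ptr<System::Byte> pinLut = &lut[0];
    System::Byte* p = pinPixels;             // raw pointer into the pinned pixel data
    System::Byte* end = p + pixels->Length;
    System::Byte* table = pinLut;
    for (; p != end; ++p)
        *p = table[*p];
    // both arrays are unpinned again when the pin_ptrs go out of scope
}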

Tom
Dec 20 '05 #26

P: n/a
Peter Oliphant wrote:
This is of course basically wrong : processor speed has never changed
anything to algorithm complexity. When manipulating great number of
elements (several tens of thousands), the difference between logarithmic
and linear complexity is as important today as it was 10 years ago.

Since in my days I've had to write 'clever' code to combat slow computer
speed, I have to disagree with any blanketing statement along the lines of
"processor speed has never changed anything to algorithm complexity".

For example, this is very algorithmically simple:

int Square( int X )
{
return X * X ;
}

This however, is not:

int Square( int X )
{
if ( X < 0 ) { X = -X ; }
int Y = 1 ;
int DY = 3 ;
for ( int i = 1 ; i < X ; i++ )
{
Y += DY ;
DY += 2 ;
}
return Y ;
}

These methods both do the same thing (if you don't believe me, test them!).
But, in the 'old days' the former method would have been the wrong way to go
for small values of X since in those the days people counted clock cycles,
and the multiply instruction took about 50-100 longers to do than the add
instruction.

[snip]

Peter:

Cute, but actually your function does not work for X=0. A correct (and
likely more efficient) version of the same idea is:

int Square( int X )
{
    if ( X < 0 ) { X = -X ; }
    int Y = 0 ;
    for ( int i = 1 ; i < X ; i++ )
    {
        Y += i ;
    }
    return Y + Y + X ;
}

A simple way to see that this is correct is to observe that Y is a sum
of (X-1) terms whose average is X/2 (average of first and last term).

David Wilkinson
Dec 20 '05 #27

P: n/a
> He has correctly challenged your statement that "with a faster computer
and more memory it really doesn't matter which method is used!". Indeed,
for large amounts of data, it matters a lot.


Except this statement is taken out of context. My ORIGINAL statement was
made in an obviously jovial manner (note smiley at end that has been omitted
since), it was about the SPECIFIC problem regarding contiguous memory (the
topic of this thread), and the 'challenge' was then made in the GENERAL
case. That's not fair... : )

To be serious, we are talking about two different things here. I'm
talking about how processor speed makes a big difference in algorithmic
design choice in most real world applications. You guys are talking about
computer science. Apples and ibms - er - oranges...

[==P==]
"Brian Muth" <bm***@mvps.org> wrote in message
news:ul**************@TK2MSFTNGP14.phx.gbl...
These methods both do the same thing (if you don't believe me, test
them!). But, in the 'old days' the former method would have been the
wrong way to go for small values of X since in those the days people
counted clock cycles, and the multiply instruction took about 50-100
longers to do than the add instruction.

But since computers, through piping and other methods, have gotten adds
and mutliplies down to the same 1-cycle speed, nowadays the
algorithmically simpler method is the way to go. That's because today
this algorithmically simpler method executes faster than the 'clever' one
(and as a side benefit saves memory, is more intuitive, is less prone to
error, and can be programmed in less time).

So, theoretically you are correct. In practice, you're not-so-correct...
: )


This argument is entirely irrelevant.

Arnaud has correctly pointed out that lookups with lists are of O(n)
whereas lookups with vectors are of constant order. Advances in processor
speeds don't change this. Advances in floating point arithmetic don't
change this. He has correctly challenged your statement that "with a
faster computer and more memory it really doesn't matter which method is
used!". Indeed, for large amounts of data, it matters a lot.

Best regards,

Brian

Dec 20 '05 #28

P: n/a
I agree that pin is not needed.
The point in the article is that you get better performance using pointer
arithmetic, so in some scenarios using pointer arithmetic is valuable.

Carl Daniel wrote:

"Not of much (any?) value unless you're writing unmanaged code."

I wanted to show that pointer arithmetic is valuable even in managed code.

--
Un saludo
Rodrigo Corral González [MVP]

FAQ de microsoft.public.es.vc++
http://rcorral.mvps.org
Dec 20 '05 #29

P: n/a
Dang, you're right! I needed to add the line at the top:

if ( X == 0 ) { return 0 ; }

And your Square( ) method is cool too! : )

[==P==]

"David Wilkinson" <no******@effisols.com> wrote in message
news:O2*************@TK2MSFTNGP15.phx.gbl...
Peter Oliphant wrote:
This is of course basically wrong : processor speed has never changed
anything to algorithm complexity. When manipulating great number of
elements (several tens of thousands), the difference between logarithmic
and linear complexity is as important today as it was 10 years ago.

Since in my days I've had to write 'clever' code to combat slow computer
speed, I have to disagree with any blanketing statement along the lines
of "processor speed has never changed anything to algorithm complexity".

For example, this is very algorithmically simple:

int Square( int X )
{
return X * X ;
}

This however, is not:

int Square( int X )
{
if ( X < 0 ) { X = -X ; }
int Y = 1 ;
int DY = 3 ;
for ( int i = 1 ; i < X ; i++ )
{
Y += DY ;
DY += 2 ;
}
return Y ;
}

These methods both do the same thing (if you don't believe me, test
them!). But, in the 'old days' the former method would have been the
wrong way to go for small values of X since in those the days people
counted clock cycles, and the multiply instruction took about 50-100
longers to do than the add instruction.

[snip]

Peter:

Cute, but actually your function does not work for X=0. A correct (and
likely more efficient) version of the same idea is:

int Square( int X )
{
if ( X < 0 ) { X = -X ; }
int Y = 0 ;
for ( int i = 1 ; i < X ; i++ )
{
Y += i ;
}
return Y + Y + X ;
}

A simple way to see that this is correct is to observe that Y is a sum of
(X-1) terms whose average is X/2 (average of first and last term).

David Wilkinson

Dec 20 '05 #30

P: n/a

"Tamas Demjen" <td*****@yahoo.com> wrote in message
news:OS*************@TK2MSFTNGP10.phx.gbl...
Willy Denoyette [MVP] wrote:
That's true, though the overhead is real small (something like a cmp and
a jmp) and only important to consider for some operations on arrays.


In most cases it is small, but if you have 200 million bytes to iterate
through in a large scanned image (let's say to apply a lookup table to
each pixel), it's better to go back to unmanaged. It's not that hard to
make such operations pretty safe. So it's good to know that a pinned array
is contiguous. In an average algorithm with 100 items in an array the
performance most likely doesn't matter and it's better to be on the safe
side.

Tom


Tom,

It all depends on the time you spend per pixel (applying the lookup table): say
each pixel requires 8 instructions, then agreed, the two-instruction
overhead is important; however, if it takes 40 instructions per pixel (not to
forget the high number of L1 and L2 cache misses, typical for such large
arrays), the overhead is just negligible. Note that you are talking about a
pinned array, but I'm pretty sure your scanned image will reside in
unmanaged heap memory, so pinning isn't applicable here. Also, if you are
building your lookup table in the native heap and you are looking for the
highest performance, I would go for pure native code and forget about
managed code.
Note, as I said in another reply, arrays are always contagious (pinned or
not), if they aren't they are no arrays.

Willy.
Dec 20 '05 #31

P: n/a
emphasis mine:
Note, as I said in another reply, arrays are always **contagious** (pinned
or not), if they aren't they are no arrays.
AHHH! That might explain why I think I'm getting the flu! LOL

[==P==]

PS - I'm the KING of typos, so don't feel too bad... : )

"Willy Denoyette [MVP]" <wi*************@telenet.be> wrote in message
news:%2****************@TK2MSFTNGP09.phx.gbl...
"Tamas Demjen" <td*****@yahoo.com> wrote in message
news:OS*************@TK2MSFTNGP10.phx.gbl...
Willy Denoyette [MVP] wrote:
That's true, though the overhead is real small (something like a cmp and
a jmp) and only important to consider for some operations on arrays.


In most cases it is small, but if you have 200 million bytes to iterate
through in a large scanned image (let's say to apply a lookup table to
each pixel), it's better to go back to unmanaged. It's not that hard to
make such operations pretty safe. So it's good to know that a pinned
array is contiguous. In an average algorithm with 100 items in an array
the performance most likely doesn't matter and it's better to be on the
safe side.

Tom


Tom,

All depends on the time you spend to build the lookup table (per pixel),
say that each pixel requires 8 instructions, then agreed, the two
instructions overhead is important, however if it takes 40 instructions
per pixel (not to forget the high number of L1 and L2 cache misses,
typical for such large arrays), the overhead is just negligible. Note that
you are talking about a pinned array, but I'm pretty sure your scanned
image will reside in unmanaged heap memory, so pinning isn't applicable
here. Also, if you are building your lookup table in the native heap and
you are looking for the highest performance, I would go for pure native
code and forget about managed code.
Note, as I said in another reply, arrays are always contagious (pinned or
not), if they aren't they are no arrays.

Willy.

Dec 20 '05 #32

P: n/a

"Rodrigo Corral [MVP]" <co*********@hotmail.com> wrote in message
news:%2***************@TK2MSFTNGP09.phx.gbl...
I agree that pin is not needed.
The point in the article is that you get better performance using pointer
arithmetic, so in some scenarios to use pointer arithmetic is valuable.


Yes, better performance, but compared to what? What is done in this article's
code (C#) is "compensating" for the inherently slow GDI+ access to image frame
buffers by using native pointer accesses (by the way, there is no other way
to access native heap memory from managed code). IMO if you feel that you
need to resort to "unsafe" constructs like this (in C#) you have probably
chosen the wrong language :-).

Willy.


Dec 20 '05 #33

P: n/a
Willy Denoyette [MVP] wrote:
All depends on the time you spend to build the lookup table (per pixel), say
that each pixel requires 8 instructions, then agreed, the two instructions
overhead is important


An extra array bounds check matters with very simple algorithms (such
as applying a lookup table, FIR filter, or FFT). Even the difference
between pointer and index arithmetic matters. array[i++] requires two
registers, while *p_array++ requires only one. The Intel processors have
a very unfortunate design with especially few registers, and in a triple
nested loop it may make a significant difference. Often if you can
eliminate a register at the price of some extra calculation (such as * 2
or + Const), overall you may still gain speed. I wish the CPU had at
least twice as many registers -- image processing could be 25% faster.
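For example, here is roughly what the two access patterns look like in plain C++ (hypothetical function names, illustration only):

#include <cstddef>

// Indexed form: the compiler keeps a base pointer plus an index register per array.
void ApplyLutIndexed(unsigned char* img, const unsigned char* lut, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        img[i] = lut[img[i]];
}

// Pointer form: a single running pointer, no separate index register.
void ApplyLutPointer(unsigned char* img, const unsigned char* lut, std::size_t n)
{
    for (unsigned char* p = img, *end = img + n; p != end; ++p)
        *p = lut[*p];
}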

On the other hand, optimizing compilers are so good these days that hand
optimizing the assembly code will rarely be any faster than writing it
in C++. FIR filtering is one of the cases when a carefully hand
optimized assembly code is twice as fast as the best C code.

However, what matters by far the most is to optimize (reduce) the cache
misses. If you can organize the loops so that you minimize cache misses,
you can sometimes be twice as fast, even if such organization requires
quite a bit of extra calculations and a couple of additional loop
nesting levels.

Tom
Dec 20 '05 #34

P: n/a

"Peter Oliphant" <po*******@RoundTripInc.com> wrote in message
news:Oe****************@TK2MSFTNGP12.phx.gbl...
emphasis mine:
Note, as I said in another reply, arrays are always **contagious** (pinned
or not), if they aren't they are no arrays.


AHHH! That might explain why I think I'm getting the flu! LOL

[==P==]

PS - I'm the KING of typos, so don't feel too bad... : )


Now I'm disappointed I thought I was the KING :-((.

Willy.


Dec 20 '05 #35

P: n/a
Tom,

"Tamas Demjen" <td*****@yahoo.com> wrote in message
news:ec**************@tk2msftngp13.phx.gbl...
Willy Denoyette [MVP] wrote:
All depends on the time you spend to build the lookup table (per pixel),
say that each pixel requires 8 instructions, then agreed, the two
instructions overhead is important
An extra array bounds checking matters with very simple algorithms (such
as applying a lookup table, FIR filter, or FFT). Even the difference
between pointer and index arithmetics matters. array[i++] requires two
registers, while *p_array++ requires only one. The Intel processors have a
very unforunate design with especially few registers, and in a triple
nested loop it may make a significant difference. Often if you can
eliminate a register at the price of some extra calculation (such as * 2
or + Const), overall you may still gain speed. I wish the CPU had at least
twice as many registers -- image processing could be 25% faster.

I assume we are still talking about managed code, are we? Well, here you are
at the mercy of the JIT compiler to do the right thing, and believe me, while
it does a great job, it will not (yet) be able to optimize this the way you
expect. That's why I said, if you ever find out that there is a performance
issue (after careful measurement/profiling), you have probably chosen the
wrong language/platform (by platform I mean the CLR).
But again, don't underestimate the impact of cache misses when accessing
large data types (nothing kills your throughput faster than a ton of L1/L2
cache misses), and don't overestimate the importance of register
allocations; the on-chip cache (L1) is extremely fast as well.
I've run a lot of benchmarks lately, using 32-bit and 64-bit native and
managed code on both X86 and X64, and believe me, the results were on par
despite the fact that X64 (AMD Athlon64 dual core) has a much larger
register set (and SSE/SSE2).
On the other hand, optimizing compilers are so good these days that hand
optimizing the assembly code will rarely be any faster than writing it in
C++. FIR filtering is one of the cases when a carefully hand optimized
assembly code is twice as fast as the best C code.
Well, I'm sure you could prove your claim by running some microbenchmarks,
but is this representative of a real world application, where a small
'mistake' in your design might ruin all you did in your highly optimized ASM
code? I could be wrong, but I don't think so. Anyway, we are still talking
about managed code, and this is not the problem domain targeted by the
CLR.
However, what matters by far the most is to optimize (reduce) the cache
misses. If you can organize the loops so that you minimize cache misses,
you can sometimes be twice as fast, even if such organization requires
quite a bit of extra calculations and a couple of additional loop nesting
levels.

How would you reduce the cache misses when you have to deal with large (>
1MB) arrays? The cache lines are restricted in size and number, and the
caches are shared resources (beware the multi-core rage!); a few context
switches (your process is not the only player) might ruin the cache, you
know.

Willy.


Dec 20 '05 #36

P: n/a
Holger Grund wrote:
Ok, maybe I got too many islays today. But you can't be serious about
that, can you?
For a while, we were planning on exposing interior pointers only for the
purpose of iterating over arrays. Effectively, array would have an iterator
nested type and that's it. Other than iterating over arrays, interior
pointers have very little use. The only other uses they have are
intermediate stage towards pinning, and having something to talk about for
how tracking references work.

Admittedly, the motivation behind avoiding interior pointers in the new
syntax was the tendency to overuse __gc in the old syntax. Much of that was
due to the compiler issuing diagnostics that went away by adding __gc to
pointers everywhere. Many programmers didn't know what they were doing other
than making the compiler happy.
Am I missing something or do you say (plain) pointers are (and have
always been) useless? What about by-ref params in managed signatures?


Interior pointers do have very little use in managed programming. Ref
parameters do have a purpose, and I never advocated leaving those out. In
fact, we spent a lot of time discussing how to go even further and support
C# out parameters in syntax.

--
Brandon Bray, Visual C++ Compiler http://blogs.msdn.com/branbray/
Bugs? Suggestions? Feedback? http://msdn.microsoft.com/productfeedback/
Dec 20 '05 #37

P: n/a
Willy Denoyette [MVP] wrote:
How would you reduce the cache misses when you have to deal with large (>
1Mb) array's?


By reading and/or writing to the memory in the optimal order, even if
the algorithm would normally require a random (or non-linear) order.
Example: break enormously large images into tiles. Group read/write
those pixels that fit in the cache. For example, read lines, not
columns. It gets tricky when you have to turn an image by 90 degrees,
because you have to involve columns. The correctly written code can be
twice as fast as the incorrect one, due to the enormous number of cache
misses involved while reading/writing columns instead of rows.
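As a rough illustration in plain C++ (a hypothetical row-major width*height buffer):

#include <cstddef>

// Row-major traversal: walks memory sequentially, so each cache line is fully used.
long long SumByRows(const unsigned char* img, std::size_t width, std::size_t height)
{
    long long sum = 0;
    for (std::size_t y = 0; y < height; ++y)
        for (std::size_t x = 0; x < width; ++x)
            sum += img[y * width + x];
    return sum;
}

// Column-major traversal of the same buffer: consecutive reads are 'width' bytes apart,
// so on a large image nearly every access misses the cache.
long long SumByColumns(const unsigned char* img, std::size_t width, std::size_t height)
{
    long long sum = 0;
    for (std::size_t x = 0; x < width; ++x)
        for (std::size_t y = 0; y < height; ++y)
            sum += img[y * width + x];
    return sum;
}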

Tom
Dec 20 '05 #38

P: n/a

"Tamas Demjen" <td*****@yahoo.com> wrote in message
news:eB**************@TK2MSFTNGP10.phx.gbl...
Willy Denoyette [MVP] wrote:
How would you reduce the cache misses when you have to deal with large (>
1Mb) array's?


By reading and/or writing to the memory in the optimal order, even if the
algorithm would normally require a random (or non-linear) order. Example:
break enormously large images into tiles. Group read/write those pixels
that fit in the cache. For example, read lines, not columns. It gets
tricky when you have to turn an image by 90 degrees, because you have to
involve columns. The correctly written code can be twice as fast as the
incorrect one, due to the enormous number of cache misses involved while
reading/writing columns instead of rows.


I see, but that's my point: you might need some assembly code to do this,
and you won't do this from managed code (at least not that part). That's exactly
why we won't see a managed version of DirectX any time soon - note I'm not
talking about the managed DirectX wrapper!

Willy.
Dec 20 '05 #39
