Bytes IT Community

Framework 2.0 array redim unsatisfactory performance

Hello,

I was just testing VB.Net performance on the .NET Framework 2.0 when I ran into this problem.
The trivial code attached below executed hundreds, if not thousands, of times faster in VB 6.0 than in the .NET environment under VS 2005 Beta 2. Does anyone have any idea whether this will be addressed in the final release?

Thanks,
Tomasz

Public Sub Main()
    Dim foos() As Long

    Dim n As Long
    n = 100000
    Dim i As Long
    For i = 1 To n
        ReDim Preserve foos(i)
        foos(i) = i
    Next i
    Debug.Print(foos(1000))
End Sub
Nov 21 '05 #1
19 Replies


Tom,
| Does anyone
| have any idea whether this will be addressed in the final release?
No, it will not be addressed per se: each time you do a ReDim Preserve, a brand-new array is allocated and every element is copied from the existing array into the new one!

NOTE: Long in VB.NET is a 64-bit value; I suspect you mean Integer, which is now 32-bit (VB6's Long was 32-bit). If you are using Long thinking you have 32-bit values, you are actually copying & using twice as much memory as you need to!
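To see why the loop above is so slow, here is a language-neutral sketch (Python used as pseudocode; the copy-counting model is an assumption for illustration, not the CLR's actual implementation) of what growing an array one slot at a time costs: every resize copies all existing elements, so total work is quadratic in n.

```python
# Model of "ReDim Preserve foos(i)" in a loop: each resize allocates a new
# array and copies all existing elements into it (quadratic total work).
def grow_one_at_a_time(n):
    copies = 0
    arr = []
    for i in range(1, n + 1):
        copies += len(arr)      # every existing element is copied on resize
        arr = arr[:] + [i]      # "new array" with one extra slot
    return copies

n = 1000
print(grow_one_at_a_time(n))    # n*(n-1)/2 = 499500 element copies
```

For the original n = 100,000 that is roughly 5 billion element copies, which is where the VB.NET slowdown comes from.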

I would recommend that instead of ReDim Preserve you consider using System.Collections.Generic.List(Of T), as it will dynamically allocate memory on an as-needed basis (it starts with a buffer of 16 elements, then doubles it each time it needs more space). List(Of T).Count reflects the number of elements currently in the collection, while List(Of T).Capacity reflects the number of elements that can be in the collection (in other words, the size of the buffer). When Count reaches Capacity, the buffer is increased...
For details on List(Of T) see:
http://msdn2.microsoft.com/library/6...us,vs.80).aspx

Something like (from memory):

Dim foos As New List(Of Integer)

Dim n As Integer = 100000

For index As Integer = 1 To n
    foos.Add(index)
Next

NOTE: In .NET, arrays & collections use base-0 indexing instead of base-1 indexing, which means that foos(0) = 1.

NOTE: "Double the size of the buffer" is the general algorithm that .NET 1.0 & 1.1 use; .NET 2.0 may or may not change the algorithm for increasing the size of the buffer...
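The start-at-16-and-double behavior described above can be sketched as follows (Python used as neutral pseudocode; the growth model follows this post's description and is an assumption, not the documented CLR algorithm):

```python
# Simulate a doubling buffer starting at capacity 16, as described above.
# Count how many times the buffer is reallocated and how many elements
# are copied in total while adding n items one at a time.
def doubling_cost(n, initial=16):
    capacity, count = initial, 0
    reallocs, copies = 0, 0
    for _ in range(n):
        if count == capacity:
            reallocs += 1
            copies += count      # old buffer's contents copied over
            capacity *= 2
        count += 1
    return reallocs, copies, capacity

print(doubling_cost(100000))     # (13, 131056, 131072)
```

Compared with resizing by one element at a time, total copies stay linear in n, which is the amortized constant-time append behavior being described.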

Hope this helps
Jay

Nov 21 '05 #2

Jay,

Whether the Integer or Long type is used of course makes a difference. However, my tests show that the difference is not that significant, and good old VB 6.0 still performs way better.

The reason I am testing arrays is that I need a dynamic data structure with as little memory overhead as possible.

My tests show that the generics-based List<> uses at least 5 additional bytes per entry. Not a surprise; most likely it is implemented as a singly-linked list. In most applications that is not a concern, but in my case it is.

Since dynamic arrays (ReDim) are not available in C#, I was just wondering how efficient this VB structure was before starting to think about alternative approaches in C++.

Thanks,

Tomasz
Nov 21 '05 #3

Tom,

Did you read what Jay wrote?

| As each time you do the ReDim Preserve a brand new array is defined &
| every element is copied from the existing array into the new array!

| The reason why I test Array is because I need a dynamic data structure
| with as small memory overhead as possible.

You are creating 100,000 arrays in memory, each one larger than the last. In addition, because it all happens inside a routine, there is probably not much opportunity for the GC (Garbage Collector) to run while the process is going, and therefore the memory used will grow and grow.

Normally this would (could) be done in VB.Net as

\\\
Public Sub Main()
    Dim foos As New ArrayList 'or any other IList
    For n As Integer = 1 To 100000
        foos.Add(n)
    Next
    Debug.Print(foos(1000).ToString)
End Sub
///

There are a lot of alternatives, however, which is why (I assume) Jay suggested looking at System.Collections.Generic.List(Of T). To help you find things quickly, here is, for instance, this link:

http://msdn.microsoft.com/library/de...ctiontypes.asp

I hope this helps,

Cor

Nov 21 '05 #4

Tom,
> My tests show that generic based List<> uses at least 5 additional bytes for
> each entry. Not a surprise, most likely it is implemented as single-linked
> list.


No, List(Of T) is backed by a simple array, it's not a linked list.
There's no per-item overhead, not sure how your tests could indicate
that.

Mattias

--
Mattias Sjögren [MVP] mattias @ mvps.org
http://www.msjogren.net/dotnet/ | http://www.dotnetinterop.com
Please reply only to the newsgroup.
Nov 21 '05 #5

Doh!
| NOTE: Long in VB. NET is a 640Bit value, I suspect you mean Integer which is
That should be a 64-bit value.

Jay

Nov 21 '05 #6

Tom,
| My tests show that generic based List<> uses at least 5 additional bytes for
| each entry. Not a surprise, most likely it is implemented as single-linked
| list. In most applications it is not a concern, but in my cases it is.

List(Of T) is implemented as an Array not a single-linked list, per:

http://msdn2.microsoft.com/library/6...us,vs.80).aspx

<quote>
Implements the System.Collections.Generic.IList<> interface using an array
whose size is dynamically increased as required.

</quote>

Note it states "using an array whose size is dynamically increased"; this is the doubling action I was referring to.

How are you determining "at least 5 additional bytes for each entry"?

Are you testing against a List(Of Integer) & not a List(Of Object), where each element would be boxed?

Are you looking at Capacity compared to Count? Capacity needs to be larger, as this is where the performance comes in: rather than allocating a new array & copying all the elements for each element added, the buffer grows in larger steps.

Yes, there is some overhead compared to an array; however, I would expect it to be 4 to 12 bytes per instance of List(Of T), not per element.

FWIW: List(Of T) is the generic version of ArrayList. A couple of advantages
of List(Of T) are that it avoids the "boxing penalty" that ArrayList has,
plus it ensures type safety.
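The boxing penalty can be pictured with a rough analogy in Python (an assumption for illustration only, not the CLR itself): `array('i')` packs raw 4-byte integers into one buffer the way a List(Of Integer) stores unboxed values, while a plain Python list stores one pointer per element to a separately allocated object, much like an ArrayList holding boxed Integers.

```python
import sys
from array import array

n = 100000
boxed = list(range(n))          # one pointer per element, each int a separate object
unboxed = array('i', range(n))  # raw 4-byte values packed into one buffer

assert unboxed.itemsize == 4
# The pointer array alone (not even counting the boxed objects it points to)
# is already bigger than the entire packed buffer.
print(sys.getsizeof(boxed) > sys.getsizeof(unboxed))  # True on 64-bit CPython
```

The per-element cost of boxing is a pointer plus an object header for every value, which is why the generic List(Of T) wins for value types.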

Hope this helps
Jay
Nov 21 '05 #7

Hi,

I simply created a List(Of Integer) with one million Integers (4 bytes each) and checked process memory utilization before and after.
Yes, I agree this is a very approximate method, but I just wanted to get a general idea.

Tomasz
Nov 21 '05 #8

> No, List(Of T) is backed by a simple array, it's not a linked list.
> There's no per-item overhead, not sure how your tests could indicate
> that.


Thanks. I just did more research and figured out that the underlying data structure is actually more complicated, and that it is designed for performance rather than memory usage.

Tomasz
Nov 21 '05 #9

> You are creating 100000 arrays in memory which are growing all the time.
> In addition, because that it is in your case inside a routine, probably
> with not much interaction from the GC because the process is going on
> (Garbage Collector) and therefore the used memory will grow and grow
>
> Normally this would (could) be done in VB.Net as


Och, yes, I understand this by now.
I just wanted to bring it to people's attention, since this array usage pattern produces way better results in VB 6.0, which means it probably could be done better. That is really it.

Down the road I am probably not going to use the .Net data structures available out of the box, since memory usage is at stake.
For prototyping and initial testing, however, they are good enough.

Tomasz
Nov 21 '05 #10

Tom,
| Yes, I agree that this is a very aproximate method, but I just wanted to
| have a general idea.
I would consider it a highly inaccurate method! :-|
| I simply created a List(Of Integer) with one milion Integers (4byte) and
| check process memory utilization before and after that.

How specifically did you check "process memory utilization"?

Did you use Task Manager or Performance counters?

Did you use the .NET Performance counters or Win32?

Did you take into account garbage (memory) that the GC (garbage collector)
has not yet collected?

To "accurately" see process memory utilization, you need a tool such as CLR Profiler to see what memory is allocated to which objects, plus what memory is currently garbage.
Remember that the GC generally does not return memory to the OS unless the OS requests it (because other memory-hungry apps are asking for it). This causes "process memory utilization" to appear larger than it really is. One really needs to look at the .NET memory performance counters; specifically, the ".NET CLR Memory" performance object's various heap-size counters are a good place to start.
| I simply created a List(Of Integer) with one milion Integers (4byte) and
Did you create the List(Of Integer) with the default capacity, or did you set the initial capacity?

' Create a list with the default capacity
Dim list1 As New List(Of Integer)()

' Create a list with a specific capacity
Dim list2 As New List(Of Integer)(1000000)

Using the second list will result in significantly fewer reallocations than the first, which generally will reduce the "process memory utilization".
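As a rough sketch of why presetting the capacity helps (Python used as neutral pseudocode; the start-at-16-then-double growth model follows the description earlier in this thread and is an assumption, not the documented CLR algorithm):

```python
# Count buffer allocations needed to hold n elements, with and without
# a preset initial capacity, under a start-at-16-then-double growth model.
def allocations(n, initial_capacity=16):
    capacity = initial_capacity
    allocs = 1                     # the initial buffer
    count = 0
    for _ in range(n):
        if count == capacity:
            capacity *= 2
            allocs += 1            # a brand-new, larger buffer each doubling
        count += 1
    return allocs

print(allocations(1000000))             # default capacity: 17 buffers
print(allocations(1000000, 1000000))    # preset capacity: 1 buffer
```

Fewer buffer allocations means less garbage for the GC and less transient "process memory utilization" while the list fills.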

Hope this helps
Jay

Nov 21 '05 #11

Tom,
| I just wanted to bring this to the attention, since this array usage
pattern
| produces way better results in VB 6.0, which means that this probably
could
| be done better. That is really it.
VB 6.0 arrays are based on the COM SafeArray structure. VB.NET arrays are based on the (size-wise) immutable System.Array structure.

COM is based on deterministic memory management, .NET is based on
non-deterministic memory management (a GC or Garbage Collector).

I don't remember specifically, but I thought a SafeArray could be resized without always needing to move the array itself. As I've pointed out, a System.Array always needs to be moved (reallocated) to be resized.

Also, I suspect you are comparing the Debug build under VS.NET, rather than the Release build outside of VS.NET. When you run the Release build outside of VS.NET, the compiler & more specifically the JIT (Just-In-Time) compiler are able to make more optimizations, closing the perceived performance gap.

It sounds like you really are attempting to understand how this new platform (.NET) works while comparing it to VB6. Personally, I would recommend *not* comparing the two on specifics. That would be like comparing the original VW Beetle with a modern SUV: yes, they both have 4 wheels & move along roads, but that's about it.
The following articles discuss how to write .NET apps (based on .NET
assumptions) that perform well:

http://msdn.microsoft.com/library/de...anagedapps.asp

http://msdn.microsoft.com/library/de...l/scalenet.asp

http://msdn.microsoft.com/library/de...nethowto13.asp

Plus a plethora of other articles, post if you would like more articles...

Hope this helps
Jay

Nov 21 '05 #12

Jay,

I just noticed something I thought was interesting and decided to write
about it, thinking that someone migrating from VB 6.0 may experience this
issue, regardless of what the root cause is and whether these environments
should be compared or not.

As I said, the only reason why I tried "array redim" was because I was
hoping to be nicely surprised - and I was not.

What I am looking for is a dynamic data structure with the least memory overhead.
It seems that at some point I will have to develop my own. But that is OK; it by no means diminishes the value of the .Net Framework.

To check process memory utilization I use:
Process.GetCurrentProcess().WorkingSet64

Peace,
Tomasz
Nov 21 '05 #13

Tom,
| Process.GetCurrentProcess().WorkingSet64
WorkingSet64 is the working set, i.e. the total amount of physical memory in use; this is by no means the amount of memory the GC has allocated. In other words, it includes objects that have not yet been garbage collected, plus the space for objects that were previously allocated but have already been collected.

As I stated, the GC over-allocates physical memory as needed & will only return it when the OS requests it because another app is asking for more memory... In other words, the working set includes both allocated memory & free memory. You need to use the .NET performance counters or CLR Profiler to see how much memory is currently allocated.

| What I am looking for is dynamic data structure which has the least memory
| overhead.
That would be List(Of T) with the "capacity" constructor set to the expected number of elements. For example, if you know there are going to be 50 to 75 elements in the list, but no more than 100, I would pass 50 or 75 to the constructor. With 50, there would be at most a single reallocation.

If you're not sure what the limit is going to be, but once the list is filled it is "constant", then I would recommend calling List(Of T).TrimExcess when you finish filling the list.

http://msdn2.microsoft.com/library/m...us,vs.80).aspx

Hope this helps
Jay

Nov 21 '05 #14

Jay,

I am aware of GC issues. However, for the purpose of a simple collection test it simply does not matter, since no object is released for GC and no memory is deallocated until the application exits. Of course, that is not true for the "array redim" test, but I do not even attempt to test its memory efficiency. There also might be an exception for a Hashtable with too small an initial capacity, but I would not bet on it.

But if you would like to continue, I am wondering what memory overhead your tests show. Let's say List(Of Integer) with one million elements.
My tests show that static arrays are surprisingly efficient, with virtually no overhead. Dynamic structures, however, are a whole different story.

Cheers,
Tomasz
Nov 21 '05 #15

Tom,
| I am aware of GC issues.

IMHO, no, you are not aware of the GC issues per se; otherwise we would NOT be having this discussion. :-|

| However, for the purpose of simple collection test
| it simply does not matter, since no object is released for GC,
Yes, it does matter! A number of objects are allocated in both your original sample & my modified List(Of T) sample! These objects are implementation details of ReDim, ReDim Preserve, and List(Of T). ReDim has a significantly higher object-allocation overhead than List(Of T). Hashtable & Dictionary(Of K, T) would have higher overhead than List(Of T), but significantly lower than ReDim. For Integers, Hashtable will have significantly higher object-allocation overhead than Dictionary(Of K, T), as Hashtable will box all the integers going in. Generics, such as Dictionary(Of K, T), eliminate this boxing penalty, which is why List(Of T) should be preferred over ArrayList.

Remember that System.Array in .NET is a full-fledged object. When you do a ReDim Preserve, a new object is allocated, the old object is copied to the new object, then the old object is discarded. ReDim simply allocates a new object & discards the old object. Which means that 99,999 objects were released for GC!

In your original sample 100,000 objects are allocated! You can use CLR Profiler to confirm the fact; unfortunately your sample takes a long time to run.

Public Sub Main()
    Dim foos() As Long

    Dim n As Long
    n = 100000
    Dim i As Long
    For i = 1 To n
        ReDim Preserve foos(i)
        foos(i) = i
    Next i
    Debug.Print(foos(1000))
End Sub

If you change your sample to use List(Of T)

Public Sub Main()
    Dim foos As New List(Of Integer)

    Dim n As Integer = 100000

    For index As Integer = 1 To n
        foos.Add(index)
    Next
End Sub

Remember List(Of T) is implemented internally as an Array, not a linked
list.

Then I would estimate there would be only about 15 objects allocated: the List(Of T) itself, plus 14 generations of buffers. Remember the buffer (the internal array) starts out at 16 & doubles... that means that 13 objects were released for GC.

Generation - Capacity
1 - 16
2 - 32
3 - 64
4 - 128
5 - 256
6 - 512
7 - 1,024
8 - 2,048
9 - 4,096
10 - 8,192
11 - 16,384
12 - 32,768
13 - 65,536
14 - 131,072
15 - 262,144
16 - 524,288
17 - 1,048,576

Of course if you start out with a capacity of 100,000, then there would only
be 2 objects allocated. The List(Of T) itself & its internal buffer.
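Jay's generation table follows directly from doubling a 16-element buffer, i.e. capacity(g) = 16 * 2^(g-1). A quick check (Python used as neutral pseudocode; the helper name is hypothetical):

```python
# Verify the generation/capacity table: each generation doubles the buffer,
# starting from 16, until the capacity can hold the requested element count.
def generations_needed(n, initial=16):
    capacity, gens = initial, 1
    while capacity < n:
        capacity *= 2
        gens += 1
    return gens, capacity

print(generations_needed(100000))    # (14, 131072): matches generation 14
print(generations_needed(1000000))   # (17, 1048576): matches generation 17
```

So 100,000 elements need 14 buffer generations and one million elements need 17, exactly as the table lists.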

Again, one could use CLR Profiler to verify; I'll see if I have CLR Profiler loaded on my VS 2005 VPC & report back later...

| since no object is released for GC,
I would expect that to be true if we were talking about LinkedList(Of T); however, we are talking about Array & List(Of T), which are both backed by an array of values.

Hope this helps
Jay

Nov 21 '05 #16

Jay,

Again, I am not talking about "array redim" anymore.
My understanding was that List(Of T) internally creates a new array of double the previous size but, and this is important, does not copy the data over; rather, it uses some simple calculations to figure out where to look for the data when returning values by index. Well, if that is not how List(Of T) works, then it might be worth trying to develop one that does work that way.
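For what it's worth, the structure Tom describes here - grow by allocating a new block and locate elements by index arithmetic, never copying old data - can be sketched like this (Python used as neutral pseudocode; the class, block size, and API are hypothetical):

```python
class ChunkedList:
    """Append-only list stored as fixed-size blocks.

    Growing allocates a new block; existing data is never copied.
    Indexing uses divmod arithmetic to find the right block and offset.
    """
    BLOCK = 4096

    def __init__(self):
        self._blocks = []
        self._count = 0

    def append(self, value):
        if self._count % self.BLOCK == 0:
            self._blocks.append([None] * self.BLOCK)  # new block, no copying
        block, offset = divmod(self._count, self.BLOCK)
        self._blocks[block][offset] = value
        self._count += 1

    def __getitem__(self, index):
        if not 0 <= index < self._count:
            raise IndexError(index)
        block, offset = divmod(index, self.BLOCK)
        return self._blocks[block][offset]

    def __len__(self):
        return self._count

cl = ChunkedList()
for i in range(10000):
    cl.append(i)
print(cl[1000], len(cl))   # 1000 10000
```

The trade-off versus a doubling array is wasted space of at most one block (instead of slack proportional to n) and one extra indirection on every element access.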

What I am really trying to determine is whether my problem - potentially hundreds of large data structures, created and released during calculations and data transformations - is solvable under the .Net Framework. I hope it is, although List(Of T) may not be good enough.

All right, I just did two more tests: filling a List(Of Integer) with the initial capacity defined, and undefined.
Process.GetCurrentProcess().WorkingSet64 showed much different values, but GC.GetTotalMemory(True) showed almost identical ones!
I guess I am going to write my own custom collection.

Thanks,
Tomasz
"Jay B. Harlow [MVP - Outlook]" <Ja************@msn.com> wrote in message
news:OL**************@TK2MSFTNGP10.phx.gbl...
Tom,
| I am aware of GC issues.

IMHO No you are not aware of the GC issues per se, otherwise we would NOT
be
having this discussion. :-|

| However, for the purpose of simple collection test
| it simply does not matter, since no object is released for GC,
Yes it does matter! As there are a number of objects allocated in both
your
original sample & my modified List(Of T) sample! These objects are
implementation details of Redim, Redim Preserve, and List(Of T). Redim has
a
significantly higher object allocated overhead then List(Of T). HashTable
&
Dictionary(Of K, T) would have higher overhead then List(Of T), but
significantly lower then Redim. For Integers Hashtable will have
significantly higher object allocated overhead then Dictionary(Of K,T) as
Hashtable will box all the integers going in. Generics, such as
Dictionary(Of K,T), eliminate this boxing penalty. Which is why List(Of T)
should be preferred over ArrayList.
Remember that System.Array in .NET is a full fledged object. When you do a
Redim Preserve a new object is allocated, the old object is copied to the
new object, then the old object is discard. ReDim simply allocates a new
object & discards the old object. Which means that 99,999 objects were
released for GC!

In your original sample 100000 objects are allocated! You can use CLR
Profiler to confirm the fact, unfortunately your sample takes a long time
to
run.

Public Sub Main()
Dim foos() As Long

Dim n As Long
n = 100000
Dim i As Long
For i = 1 To n
ReDim Preserve foos(i)
foos(i) = i
Next i
Debug.Print(foos(1000))
End Sub

If you change your sample to use List(Of T)

Public Sub Main()
Dim foos As List(Of Integer)

Dim n As Integer = 100000

For index As Integer = 1 To n
foos.Add(index)
Next

End Sub

Remember List(Of T) is implemented internally as an Array, not a linked
list.

Then I would estimate there would only about 15 objects allocated. The
List(Of T) itself, plus 14 generations of buffers. Remember the buffer
(the
internal array) starts out at 16 & doubles... that means that 13 objects
were released for GC.

Generation - Capacity
1 - 16
2 - 32
3 - 64
4 - 128
5 - 256
6 - 512
7 - 1,024
8 - 2,048
9 - 4,096
10 - 8,192
11 - 16,384
12 - 32,768
13 - 65,536
14 - 131,072
15 - 262,144
16 - 524,288
17 - 1,048,576

Of course if you start out with a capacity of 100,000, then there would
only
be 2 objects allocated. The List(Of T) itself & its internal buffer.

Again, one could use CLR Profiler to verify. I'll see if I have CLR
Profiler loaded on my VS 2005 VPC & report back later...

| since no object is released for GC,
I would expect that to be true if we were talking about LinkedList(Of T);
however, we are talking about Array & List(Of T), which are both based on
an array of values.

Hope this helps
Jay

"Tom Jastrzebski" <to*@tom.com> wrote in message
news:nu*************@newssvr33.news.prodigy.com...
| Jay,
|
| I am aware of GC issues. However, for the purpose of a simple collection
| test it simply does not matter, since no object is released for GC, and no
| memory is deallocated until the application exits. Of course, that is not
| true for the "array redim" test, but I do not even attempt to test its
| memory efficiency. There also might be an exception for Hashtable with too
| small an initial capacity, but I would not bet on this.
|
| But, if you would like to continue, I am wondering what memory overhead
| your tests show. Let's say List(Of Integer) with one million elements.
| My tests show that static arrays are surprisingly efficient, with
| virtually no overhead. Dynamic structures, however, are a whole different
| story.
|
| Cheers,
| Tomasz
|
|
| "Jay B. Harlow [MVP - Outlook]" <Ja************@msn.com> wrote in
message
| news:OD**************@TK2MSFTNGP15.phx.gbl...
| > Tom,
| > | Process.GetCurrentProcess().WorkingSet64
| > WorkingSet64 is the Working Set, or total amount of physical memory in
| > use; this is by no means the amount of memory the GC has allocated. In
| > other words, it includes objects that have not yet been garbage
| > collected, plus the space for objects that were previously allocated
| > but have already been collected.
| >
| > As I stated, the GC over-allocates physical memory as needed & will
| > only return it when the OS requests it because another app is asking
| > for more memory... In other words, the Working Set includes both
| > allocated memory & free memory. You need to use the .NET performance
| > counters or CLR Profiler to see how much memory is currently allocated.
| >
| > | What I am looking for is a dynamic data structure which has the least
| > | memory overhead.
| > That would be List(Of T) with the "capacity" constructor set to the
| > expected number of elements. For example, if you know there are going
| > to be 50 to 75 elements in the list, but no more than 100, I would pass
| > 50 or 75 to the constructor. With 50, there would be at most a single
| > reallocation.
| >
| > If you're not sure what the limit is going to be, but once the list is
| > filled it's "constant", then I would recommend using
| > List(Of T).TrimExcess when I've finished filling the list.
| >
| > http://msdn2.microsoft.com/library/m...us,vs.80).aspx
| >
| > Hope this helps
| > Jay
| >
| > "Tom Jastrzebski" <to*@tom.com> wrote in message
| > news:qN*************@newssvr30.news.prodigy.com...
| > | Jay,
| > |
| > | I just noticed something I thought was interesting and decided to
| > | write about it, thinking that someone migrating from VB 6.0 may
| > | experience this issue, regardless of what the root cause is and
| > | whether these environments should be compared or not.
| > |
| > | As I said, the only reason why I tried "array redim" was because I
| > | was hoping to be nicely surprised - and I was not.
| > |
| > | What I am looking for is a dynamic data structure which has the least
| > | memory overhead.
| > | It seems like at some point I will have to develop my own. But that
| > | is OK; it by no means diminishes the value of the .Net Framework.
| > |
| > | To check process memory utilization I use:
| > | Process.GetCurrentProcess().WorkingSet64
| > |
| > | Peace,
| > | Tomasz
| > |
| > |
| >
| >
|
|

Nov 21 '05 #17

P: n/a
Tom,
| My understanding was that List(Of T) internally creates a new array of
| double the previous size but, and this is important, does not copy the
| data over, but rather uses some simple calculations to figure out where
| to look for data when it comes to returning values by index
It doesn't. It maintains a single array & copies the previous data over.

Remember List(Of T):
<quote>
Implements the System.Collections.Generic.IList<> interface using an array
whose size is dynamically increased as required
</quote>
http://msdn2.microsoft.com/library/6...us,vs.80).aspx

Notice "an array whose size is dynamically increased as required", which
IMHO clearly refers to a single array; it's singular.

You can use Reflector or ILDASM to see how List(Of T) is implemented.
Using CLR Profiler clearly demonstrates the old arrays being discarded...

FWIW: You can download CLR Profiler at
http://blogs.msdn.com/brianjo/archiv...28/433395.aspx

| Well, if it is not how
| List(Of T) works then it might be worth trying to develop one which works
| that way.
That feels like a micro and/or a premature optimization. I would only
*really* be concerned with the algorithm used when profiling demonstrated
that the use of List(Of T) *truly* adversely affected the performance of my
program! Remember the 80/20 rule: that is, 80% of the execution time of
your program is spent in 20% of your code. I will optimize (worry about
performance, memory consumption) the 20% once that 20% has been identified
& proven to be a performance problem via profiling (CLR Profiler is one
profiling tool). The use of List(Of T) may well be outside this 20% of your
code; prematurely optimizing it is possibly a waste of time.

For info on the 80/20 rule & optimizing only the 20% see Martin Fowler's
article "Yet Another Optimization Article" at
http://martinfowler.com/ieeeSoftware...timization.pdf

However! I agree that in some cases it may be useful to have an alternative
algorithm, such as when profiling demonstrates a performance and/or memory
problem. Personally, a "List(Of T)" that supported both algorithms & others
(via a Strategy Pattern perhaps) might be more beneficial here, as your
code could continue to use the same "List(Of T)" and you would just change
the algorithm that the "List(Of T)" used; however, the juice may not be
worth the squeeze. The CollectionBase I use basically works this way: my
code is written to my CollectionBase, and my CollectionBase allows me to
change which collection (ArrayList, SortedList, other) is used.

http://groups-beta.google.com/group/...9915007fd9b8be
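The wrapper described above, where client code talks to one collection facade and the backing container can be swapped, is essentially the Strategy Pattern. A minimal sketch follows (in Python for brevity; all names here are invented for illustration and this is not the CollectionBase referred to above):

```python
from collections import deque

class SwappableList:
    """Minimal collection facade: client code calls add()/len()/indexing
    on this class, while the backing container is a pluggable strategy."""

    def __init__(self, backing_factory=list):
        self._items = backing_factory()

    def add(self, value):
        self._items.append(value)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]

col = SwappableList()        # default strategy: a plain dynamic array
col.add(42)

fifo = SwappableList(deque)  # swapped backing store; call sites unchanged
fifo.add(42)
```

The point is the one Jay makes: code written against the facade never changes when the growth or storage strategy behind it does.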

I am not done with my CollectionBase(Of T) yet...
If you are interested in alternative generic collections, have you checked
out the Power Collections?

http://www.wintellect.com/powercollections/

| Process.GetCurrentProcess().WorkingSet64 showed much different values, but
| GC.GetTotalMemory(true) almost identical!
Which is what I would expect! WorkingSet is the OS physical memory, while
GetTotalMemory is GC-allocated memory.

Hope this helps
Jay

"Tom Jastrzebski" <to*@tom.com> wrote in message
news:C9***************@newssvr19.news.prodigy.com...
| Jay,
|
| Again, I am not talking about "array redim" anymore.
| My understanding was that List(Of T) internally creates a new array of
| double the previous size but, and this is important, does not copy the
| data over, but rather uses some simple calculations to figure out where
| to look for data when it comes to returning values by index. Well, if it
| is not how List(Of T) works then it might be worth trying to develop one
| which works that way.
|
| What I am really trying to determine is whether my problem - potentially
| hundreds of large data structures, created and released during
| calculations and data transformations - is solvable under the .Net
| Framework. I hope it is, although List(Of T) may not be good enough.
|
| All right, I just did two more tests: filling in List(Of Integer) with
| initial capacity defined, and undefined.
| Process.GetCurrentProcess().WorkingSet64 showed much different values, but
| GC.GetTotalMemory(true) almost identical!
| I guess I am going to write my custom collection.
|
| Thanks,
| Tomasz
|
|
| "Jay B. Harlow [MVP - Outlook]" <Ja************@msn.com> wrote in message
| news:OL**************@TK2MSFTNGP10.phx.gbl...
| > Tom,
| > | I am aware of GC issues.
| >
| > IMHO No you are not aware of the GC issues per se, otherwise we would
| > NOT be having this discussion. :-|
| >
| > [remainder of quoted message trimmed - it repeats the reply above
| > verbatim]
Nov 21 '05 #18

> Remember List(Of T):
> <quote>
> Implements the System.Collections.Generic.IList<> interface using an array
> whose size is dynamically increased as required
> </quote>
> http://msdn2.microsoft.com/library/6...us,vs.80).aspx
>
> Notice "an array whose size is dynamically increased as required", which
> IMHO clearly refers to a single array; it's singular.


I see what they say, but if this had been true there would not have been
that much memory deallocated, I think.

A test with a List(Of Integer) containing 1 million elements shows that
when the initial capacity is specified, 92.6% of memory is used by objects,
while when the initial capacity is not specified, as much as 50% of memory
gets deallocated as the List(Of T) grows.

Furthermore, the fact that it is almost exactly 50% makes me think that
List(Of T) really copies the array into one twice as big when needed - a
model using the geometric sum produces precisely this result.

Tomasz

//C# example
long totalMem0 = Process.GetCurrentProcess().WorkingSet64;
long usedMem0 = GC.GetTotalMemory(true);
int n = 1000000;
List<int> li = new List<int>(n);
for (int i = 0; i < n; i++) {
    li.Add(i);
}
long totalMem1 = Process.GetCurrentProcess().WorkingSet64;
long usedMem1 = GC.GetTotalMemory(true);
Debug.WriteLine("% mem used: " + ((double)(usedMem1 - usedMem0)) /
    (totalMem1 - totalMem0) * 100);
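The 50% figure is exactly what the geometric model predicts: with doubling, the discarded buffers sum to just under the size of the final live buffer, so about half of everything ever allocated ends up deallocated. A quick numeric check of that claim (in Python for brevity; illustrative only):

```python
n = 1_000_000
sizes = [16]
while sizes[-1] < n:
    sizes.append(sizes[-1] * 2)   # each growth copies into a buffer twice as big

final = sizes[-1]                 # 1048576 -- the one buffer still live
discarded = sum(sizes[:-1])       # 1048560 -- everything released to the GC
print(discarded / (final + discarded))  # ~0.49999, the observed "almost exactly 50%"
```

The geometric sum 16 + 32 + ... + 524288 equals the final capacity minus 16, which is why the deallocated share lands so close to one half.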
Nov 21 '05 #19

Tom,
| Furthermore, the fact that it is almost exactly 50% makes me think that
| List(Of T) really copies the array into one twice as big when needed - a
| model using the geometric sum produces precisely this result.
Ding, ding, ding. Confetti falls from the roof!! Flashing lights go off...

Jackpot!!! That is what I've been saying! Each time List(Of T) needs to
expand its buffer (the internal array) it doubles its size & copies the
values from the old array to the new array. If you set the initial
capacity, this doubling may not occur as often.

| //C# example
You do realize this is a VB.NET group? VB.NET samples would be more easily
followed by other readers of this discussion.

Jay

Nov 21 '05 #20
