Bytes IT Community

Array.Clear vs List<>.Clear

This seems inconsistent and more than a little bizarre.

Array.Clear sets all elements of the array to their default values (0,
null, whatever), whereas List<>.Clear removes all items from the list.
That part makes a reasonable amount of sense, as you can't actually take
items away from an Array. However, there doesn't seem to be a way to
perform the same operation in one fell swoop on a List<>.

For example:

byte[] byteArr = new byte[10];

....things happen and bytes get set...

Array.Clear(byteArr, 0, 10);

Now all the bytes are set to 0.
But if you use a List<byte>:

List<byte> byteList = new List<byte>(new byte[10]);

....things happen and bytes get set...

There's no way to reset all the bytes, so you're forced to iterate over
the list. Now, I'm sure that the performance hit of having to run a for
loop across the list isn't incredible. But aside from the apparent
inconsistency, I have to wonder if there isn't some mechanism to do the
same thing to a generic List.

Lee Crabtree
Oct 4 '07 #1
35 Replies


IMO, the Array.Clear() behavior is incorrect, as it doesn't follow the
normal intent of Clear(), and should throw a NotSupportedException. Re
the generic list - I can't think of many scenarios when I would *want*
to do this on a list - however, looping (using indexer, not foreach)
is a reasonable option and won't generally be too slow. You could
simply Clear() and AddRange(new T[count]), but this unnecessarily
creates the array.
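
A quick sketch of both options Marc mentions (looping via the indexer vs. Clear() plus AddRange), assuming a List<byte> as in the original post:

```csharp
using System;
using System.Collections.Generic;

class ResetDemo
{
    static void Main()
    {
        List<byte> list = new List<byte>(new byte[10]);
        list[3] = 42;

        // Option 1: loop with the indexer, resetting each element in place.
        for (int i = 0; i < list.Count; i++)
            list[i] = default(byte);

        // Option 2: Clear() then AddRange - same result, but it allocates a
        // temporary array just to refill the list.
        int count = list.Count;
        list.Clear();
        list.AddRange(new byte[count]);

        Console.WriteLine(list.Count);   // still 10
        Console.WriteLine(list[3]);      // back to 0
    }
}
```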

Marc

Oct 4 '07 #2

Hello, Lee!

What's the point of using a List<T> like an Array?

List<T> internally maintains an array of type T, so when you call
List<T>.Clear, all internal array items will be set to 0,
and the List<T> size will be reset.
--
With best regards, Vadym Stetsiak.
Blog: http://vadmyst.blogspot.com

You wrote on Thu, 04 Oct 2007 10:17:20 -0500:

LC> This seems inconsistent and more than a little bizarre.
<snip>
Oct 4 '07 #3

Lee,

There isn't a mechanism to do this, but writing your own is easy:

public static void ClearList<T>(List<T> list, int index, int length)
{
    // Compute the end bound up front; comparing index < (index + length)
    // directly would never terminate, since index grows with the bound.
    int end = index + length;

    for (; index < end; ++index)
    {
        // Set the item in the list to the default value.
        list[index] = default(T);
    }
}
--
- Nicholas Paldino [.NET/C# MVP]
- mv*@spam.guard.caspershouse.com

"Lee Crabtree" <lc*******@goisi.comwrote in message
news:%2****************@TK2MSFTNGP02.phx.gbl...
This seems inconsistent and more than a little bizarre.

Array.Clear sets all elements of the array to their default values (0,
null, whatever), whereas List<>.Clear removes all items from the list.
That part makes a reasonable amount of sense, as you can't actually take
items away from an Array. However, there doesn't seem to be a way to
perform the same operation in one fell swoop on a List<>.

For example:

byte[] byteArr = new byte[10];

...things happen and bytes get set...

Array.Clear(byteArr, 0, 10);

Now all the bytes are set to 0.
But if you use a List<byte>:

List<bytebyteList = new List<byte>(new byte[10]);

...things happen and bytes get set...

There's no way to reset all the bytes, so you're forced to iterate over
the list. Now, I'm sure that the performance hit of having to run a for
loop across the list isn't incredible. But aside from the apparent
inconsistency, I have to wonder if there isn't some mechanism to do the
same thing to a generic List.

Lee Crabtree

Oct 4 '07 #4

> IMO, the Array.Clear() behavior is incorrect...

To clarify, I'm talking about its implementation on IList / IList<T>.

What Array / T[] does on its own behalf is up to itself.

Marc

Oct 4 '07 #5

The major reason I use a List<T> in a place where an Array would
normally be more common is that the list length has to be extended or
shortened at certain times, and recreating the array and then copying
elements across seems unnecessarily ridiculous.

Lee Crabtree

Vadym Stetsiak wrote:
> Hello, Lee!
>
> What's the point of using a List<T> like an Array?
<snip>

Oct 4 '07 #6

That's more or less what I wound up doing anyway, but curiosity got the
better of me, and there's rarely harm in asking.

Lee Crabtree

Nicholas Paldino [.NET/C# MVP] wrote:
> There isn't a mechanism to do this, but writing your own is easy:
<snip>

Oct 4 '07 #7


"Lee Crabtree" <lc*******@goisi.comha scritto nel messaggio
news:ul**************@TK2MSFTNGP06.phx.gbl...
The major reason I use a List<Tin a place where an Array would normally
be more common is that the line length has to be extended or shortened at
certain times, and recreating the array then copying elements across seems
unnecessarily ridiculous.
Well, that's what you are going to get with List<Tanyway, copying and more
(ridiculous) copying.
As Vadym said, the List<Tinternally manages the list as an Array,[], so
every time you add one element over the List<T>.Capacity, the it will create
a new array, copy the elements, add the new item and then destroying the old
array.
The unfortunate thing with List<Tis that the capacity does grow up one by
one. I'd like to see something like "expand it by 10%", so that the next Add
would not incur reallocations.
So if you know how much you must expand, you sould set the List<T>.Capacity
in advance to the max you need as you cannot shrink a List<Tcapacity.
The performance of simple System.Array could result better.
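
The presizing advice above can be sketched as follows (the capacity value of 5000 is just an illustrative choice):

```csharp
using System;
using System.Collections.Generic;

class CapacityDemo
{
    static void Main()
    {
        // Presize the list so that Adds up to 5000 elements never
        // trigger an internal array reallocation.
        List<byte> list = new List<byte>(5000);
        Console.WriteLine(list.Capacity); // 5000
        Console.WriteLine(list.Count);    // 0 - capacity is not length

        for (int i = 0; i < 5000; i++)
            list.Add((byte)i);

        Console.WriteLine(list.Capacity); // still 5000: no growth occurred
    }
}
```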
Oct 4 '07 #8

Okay, this defies what I would have expected, but a few quick tests make
it look as though chucking the whole list out and re-newing it is almost
twice as fast, viz:

using System;
using System.Collections.Generic;
using System.Diagnostics;

namespace ListDefaultingTest
{
    class Program
    {
        static void Main(string[] args)
        {
            Stopwatch testTime = new Stopwatch();
            List<byte> testList = new List<byte>(new byte[5000]);

            testTime.Start();

            for (int i = 0; i < testList.Count; i++)
                testList[i] = 0;

            testTime.Stop();

            Console.WriteLine(testTime.Elapsed);

            testTime.Reset();
            testTime.Start();

            testList = new List<byte>(new byte[5000]);

            testTime.Stop();

            Console.WriteLine(testTime.Elapsed);

            Console.ReadLine();
        }
    }
}

Running this on my box (2.6 GHz dual-core) gives between 450 and 650 ms
for the first version, and the second gives between 250 and 400 ms.

The List reference being fast makes sense, as there isn't an amazing
amount of work in generating a reference type, but I was pretty shocked
that creating a great big value array was still faster. Whether this
has to do with the speed of accessing List elements or the speed of
contiguous memory allocation (arrays ARE contiguous, aren't they?) I
don't know, but this was more than a little weird, to my mind.

Lee Crabtree

Nicholas Paldino [.NET/C# MVP] wrote:
> There isn't a mechanism to do this, but writing your own is easy:
<snip>

Oct 4 '07 #9

On Oct 4, 11:54 am, "Laura T." <LT_s...@yahoo.com> wrote:
<snip>
> The unfortunate thing with List<T> is that the capacity grows one by
> one. I'd like to see something like "expand it by 10%", so that the next
> Add would not incur reallocations.
It *doubles* the internal array size whenever you add an item that
would grow the array, not one by one, starting at whatever your
initial capacity was (4 is the default).
Oct 4 '07 #10

Lee Crabtree wrote:
> Okay, this defies what I would have expected, but a few quick tests make
> it look as though chucking the whole list out and re-newing it is almost
> twice as fast, viz:
> [...]
> Running this on my box (2.6 GHz dual-core) gives between 450 and 650 ms
> for the first version, and the second gives between 250 and 400 ms.
<snip>
Well, my first caveat would be to note the variance in your timings. If
your tests show differences for the same code of almost 100% (250 to 400
ms), that suggests that there are external factors with much more
significance than the actual implementation.

With that in mind, your results aren't all that surprising to me, as it's
almost always going to be slower to access individual elements than to
do things to an entire block of memory at a time. Your second section of
code allows the compiler and run-time to do the latter, and so it seems
reasonably expected that it would be faster.

Pete
Oct 4 '07 #11

On Oct 4, 1:59 pm, Peter Duniho <NpOeStPe...@NnOwSlPiAnMk.com> wrote:
<snip>
Well... List.Clear actually calls Array.Clear internally, which is an
internal framework function. I wouldn't be surprised if it eventually
calls some sort of memset variant (which is really fast).

I would say that calling List.Clear() is going to be slightly faster
than an iterative version (but since her test didn't measure a
List<T>.Clear call, it is unclear... pardon the pun).

Really, I think the main advantage to throwing out the old list is the
fact that you can't shrink a list's size once it has been grown, and
every time it grows the array size doubles. So if you have a
list that is 4 megabytes, the next growth will be to 8 megs, even if
you only want 4 megs + 2 items.

Oct 4 '07 #12

Laura T. wrote:
> Well, that's what you are going to get with List<T> anyway: copying and
> more (ridiculous) copying.

I wouldn't call the copying "ridiculous", and it is required in any case
with _any_ array implementation. The only way to avoid it would be to
use some linked data structure (e.g. LinkedList).

> As Vadym said, List<T> internally manages the list as an array (T[]),
> so every time you add one element over List<T>.Capacity, it will
> create a new array, copy the elements, add the new item, and then
> discard the old array.

This would be required when managing a dynamically-sized Array as well,
except you'd have to do it yourself.

> The unfortunate thing with List<T> is that the capacity grows one
> by one. I'd like to see something like "expand it by 10%", so that the
> next Add would not incur reallocations.

That's not correct. The List<T> does NOT grow one by one. The
default size starts out at 4 elements, and each time you exceed the
capacity of the list, the newly allocated storage has double the
previous capacity.

> So if you know how much you must expand, you should set
> List<T>.Capacity in advance to the max you need, as you cannot shrink a
> List<T> capacity.

Yes, you definitely should provide whatever information you have, if you
have it. If you know what the eventual required capacity will be, it is
better to provide that.

> A simple System.Array could end up performing better.

I don't see how, since the underlying behavior is basically the same.

For what it's worth, I wrote a short program to test the performance and
behavior of the List<T> class. It confirms that with each new
allocation, the capacity is doubled. It also shows that, at least on a
reasonably fast computer (Core 2 Duo 2.33 GHz), it takes less than 100 ms
to add 1 million integers in sequence to a List<int>.

It's true that for smaller lists there are more allocations, and so the
copying is a larger part of the overall time cost. But then, for
smaller lists, the total time cost is significantly lower anyway, so
that proportional difference isn't really a big deal.

I really don't think "all that copying" is a significant performance factor.

Pete
Oct 4 '07 #13

"Doug Semler" <do********@gmail.comwrote in message
news:11**********************@w3g2000hsg.googlegro ups.com...
On Oct 4, 11:54 am, "Laura T." <LT_s...@yahoo.comwrote:
>"Lee Crabtree" <lcrabt...@goisi.comha scritto nel
messaggionews:ul**************@TK2MSFTNGP06.phx.g bl...
The major reason I use a List<Tin a place where an Array would
normally
be more common is that the line length has to be extended or shortened
at
certain times, and recreating the array then copying elements across
seems
unnecessarily ridiculous.

Well, that's what you are going to get with List<Tanyway, copying and
more
(ridiculous) copying.
As Vadym said, the List<Tinternally manages the list as an Array,[], so
every time you add one element over the List<T>.Capacity, the it will
create
a new array, copy the elements, add the new item and then destroying the
old
array.
The unfortunate thing with List<Tis that the capacity does grow up one
by
one. I'd like to see something like "expand it by 10%", so that the next
Add
would not incur reallocations.

It *doubles* the internal array size whenever you add an item that
would grow the array, not one by one. Starting at whatever your
initial capacity was (4 is default).

No, Laura is right: a new array is created with a size which is equal to the
previous size + 1 (size is an int) whenever there is an overflow.
Consider the following sample:

List<byte> testList = new List<byte>();
for (int i = 0; i < 8; i++)
    testList.Add((byte)i);

Here the List starts with an array of size 0; at the first insert a new
array gets created with a size of 1 (int), which means that we can add 4
bytes before the array overflows; when that happens a new array gets created
with a size of 2, and again 4 bytes can be added.
For the above sample that means creating 3 arrays :-(

Willy.

Oct 4 '07 #14

"Willy Denoyette [MVP]" <wi*************@telenet.bewrote in message
news:%2****************@TK2MSFTNGP06.phx.gbl...
"Doug Semler" <do********@gmail.comwrote in message
news:11**********************@w3g2000hsg.googlegro ups.com...
>On Oct 4, 11:54 am, "Laura T." <LT_s...@yahoo.comwrote:
>>"Lee Crabtree" <lcrabt...@goisi.comha scritto nel
messaggionews:ul**************@TK2MSFTNGP06.phx. gbl...

The major reason I use a List<Tin a place where an Array would
normally
be more common is that the line length has to be extended or shortened
at
certain times, and recreating the array then copying elements across
seems
unnecessarily ridiculous.

Well, that's what you are going to get with List<Tanyway, copying and
more
(ridiculous) copying.
As Vadym said, the List<Tinternally manages the list as an Array,[],
so
every time you add one element over the List<T>.Capacity, the it will
create
a new array, copy the elements, add the new item and then destroying the
old
array.
The unfortunate thing with List<Tis that the capacity does grow up one
by
one. I'd like to see something like "expand it by 10%", so that the next
Add
would not incur reallocations.

It *doubles* the internal array size whenever you add an item that
would grow the array, not one by one. Starting at whatever your
initial capacity was (4 is default).


No Laura is right, a new array is created with a size which is equal to
the previous size + 1 (size is an int) whenever there is an overflow.
Consider following sample:

List<bytetestList = new List<byte>();
for(int i = 0; i < 8; i++)
testList.Add((byte)i);
Here the List starts with an array of size 0, at the first insert a new
array gets created with a size of 1 (int), that means that we can add 4
bytes before the array overflow, when that happens a new array gets
created with a size of 2, and again 4 bytes can be added.
For the above sample that means creating 3 array's :-(

Willy.

Oct 4 '07 #15

On Oct 4, 2:28 pm, "Willy Denoyette [MVP]" <willy.denoye...@telenet.be> wrote:
<snip>
> No, Laura is right: a new array is created with a size which is equal to the
> previous size + 1 (size is an int) whenever there is an overflow.
<snip>
If I put Console.Write(testList.Capacity) after every create/add I get:
0
4
4
4
4
8
8
8
8
The list doubles. Reflect over the Add and EnsureCapacity methods. What
happens is the current Count (size) is compared to the capacity
(items.Length). EnsureCapacity is then called with size+1. HOWEVER,
EnsureCapacity DOUBLES the list size.
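
The growth policy described above can be sketched as a small stand-alone function. This is a hedged approximation of the behavior, not the actual BCL source, and the name Grow is mine:

```csharp
using System;

class GrowthSketch
{
    // Approximation of the growth policy described above: an empty list's
    // first growth goes to capacity 4, and any later growth doubles it.
    static int Grow(int count, int capacity)
    {
        if (count < capacity) return capacity;   // room left: no reallocation
        return capacity == 0 ? 4 : capacity * 2; // grow: 0 -> 4, then double
    }

    static void Main()
    {
        int capacity = 0;
        for (int count = 0; count < 9; count++)
        {
            capacity = Grow(count, capacity);
            Console.Write(capacity + " ");
        }
        // Prints: 4 4 4 4 8 8 8 8 16 - the capacity after each of nine Adds
    }
}
```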
Oct 4 '07 #16

"Willy Denoyette [MVP]" <wi*************@telenet.bewrote in message
news:%2****************@TK2MSFTNGP06.phx.gbl...
"Doug Semler" <do********@gmail.comwrote in message
news:11**********************@w3g2000hsg.googlegro ups.com...
>On Oct 4, 11:54 am, "Laura T." <LT_s...@yahoo.comwrote:
>>"Lee Crabtree" <lcrabt...@goisi.comha scritto nel
messaggionews:ul**************@TK2MSFTNGP06.phx. gbl...

The major reason I use a List<Tin a place where an Array would
normally
be more common is that the line length has to be extended or shortened
at
certain times, and recreating the array then copying elements across
seems
unnecessarily ridiculous.

Well, that's what you are going to get with List<Tanyway, copying and
more
(ridiculous) copying.
As Vadym said, the List<Tinternally manages the list as an Array,[],
so
every time you add one element over the List<T>.Capacity, the it will
create
a new array, copy the elements, add the new item and then destroying the
old
array.
The unfortunate thing with List<Tis that the capacity does grow up one
by
one. I'd like to see something like "expand it by 10%", so that the next
Add
would not incur reallocations.

It *doubles* the internal array size whenever you add an item that
would grow the array, not one by one. Starting at whatever your
initial capacity was (4 is default).


No Laura is right, a new array is created with a size which is equal to
the previous size + 1 (size is an int) whenever there is an overflow.
Consider following sample:

List<bytetestList = new List<byte>();
for(int i = 0; i < 8; i++)
testList.Add((byte)i);
Here the List starts with an array of size 0, at the first insert a new
array gets created with a size of 1 (int), that means that we can add 4
bytes before the array overflow, when that happens a new array gets
created with a size of 2, and again 4 bytes can be added.
For the above sample that means creating 3 array's :-(

Willy.

I see I'm not (completely) right, of course; the point is that the default
size is zero, and it is doubled when there is an overflow.

So in the previous sample, the sequence is:
array size 0
add 1 byte - create array with size 1 (int), say n
add byte
add byte
add byte
add byte - create array with size n = n * 2; from now on the new array is
doubled in size.
...

Sorry for the confusion.
Willy.

Oct 4 '07 #17

Laura T. <LT*****@yahoo.com> wrote:

<snip>
> The unfortunate thing with List<T> is that the capacity grows one by
> one. I'd like to see something like "expand it by 10%", so that the next
> Add would not incur reallocations.
Nope, it doubles the capacity each time, as Doug said. Here's code to show that:

using System;
using System.Collections.Generic;

public class ConvertFile
{
    public static void Main(string[] args)
    {
        List<string> list = new List<string>();

        int previousCapacity = -1;
        for (int i = 0; i < 100000; i++)
        {
            list.Add("");
            if (list.Capacity != previousCapacity)
            {
                Console.WriteLine(list.Capacity);
                previousCapacity = list.Capacity;
            }
        }
    }
}

Output:

4
8
16
32
64
128
256
512
1024
2048
4096
8192
16384
32768
65536
131072

It would perhaps be nice to be able to specify the scaling factor, but 2 is reasonable for most situations.
--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Oct 4 '07 #18

"Doug Semler" <do********@gmail.comwrote in message
news:11**********************@d55g2000hsg.googlegr oups.com...
On Oct 4, 2:28 pm, "Willy Denoyette [MVP]"
<willy.denoye...@telenet.bewrote:
>"Doug Semler" <dougsem...@gmail.comwrote in message

news:11**********************@w3g2000hsg.googlegr oups.com...


On Oct 4, 11:54 am, "Laura T." <LT_s...@yahoo.comwrote:
"Lee Crabtree" <lcrabt...@goisi.comha scritto nel
messaggionews:ul**************@TK2MSFTNGP06.phx.g bl...
The major reason I use a List<Tin a place where an Array would
normally
be more common is that the line length has to be extended or
shortened
at
certain times, and recreating the array then copying elements across
seems
unnecessarily ridiculous.
>Well, that's what you are going to get with List<Tanyway, copying
and
more
(ridiculous) copying.
As Vadym said, the List<Tinternally manages the list as an Array,[],
so
every time you add one element over the List<T>.Capacity, the it will
create
a new array, copy the elements, add the new item and then destroying
the
old
array.
The unfortunate thing with List<Tis that the capacity does grow up
one
by
one. I'd like to see something like "expand it by 10%", so that the
next
Add
would not incur reallocations.
It *doubles* the internal array size whenever you add an item that
would grow the array, not one by one. Starting at whatever your
initial capacity was (4 is default).

No Laura is right, a new array is created with a size which is equal to
the
previous size + 1 (size is an int) whenever there is an overflow.
Consider following sample:

List<bytetestList = new List<byte>();
for(int i = 0; i < 8; i++)
testList.Add((byte)i);
Here the List starts with an array of size 0, at the first insert a new
array gets created with a size of 1 (int), that means that we can add 4
bytes before the array overflow, when that happens a new array gets
created
with a size of 2, and again 4 bytes can be added.
For the above sample that means creating 3 array's :-(

Willy

If I put console.Write(testList.Capacity) after every create/add i
get:
0
4
4
4
4
8
8
8
8
The list doubles....Reflect the Add and EnsureCapacity method. What
happens is the current Count (size) is compared to the capacity
(items.Length). EnsureCapacity is then called on size+1. HOWEVER,
EnsureCapacity DOUBLES the list size.


That's right, a new array is created with a size which is twice the previous
array size.
Actually, my point was that the default was not 4 but 0 (call it
nitpicking). That is:
List<byte> testList = new List<byte>();
starts with an empty array, and at the very first insert a new array gets
created with a Capacity = 4 * element size, which for a byte type is an
array of 1 int. From then on, a new array with twice the size is created
when the current array gets filled completely.
Willy.
Oct 4 '07 #19

"Jon Skeet [C# MVP]" <sk***@pobox.comwrote in message
news:MP*********************@msnews.microsoft.com. ..
Laura T. <LT*****@yahoo.comwrote:

<snip>
>The unfortunate thing with List<Tis that the capacity does grow up one
by
one. I'd like to see something like "expand it by 10%", so that the next
Add
would not incur reallocations.

Nope, it doubles the capacity each time, as Doug said. Here's code to show
that:

using System;
using System.Collections.Generic;

public class ConvertFile
{
public static void Main(string[] args)
{
List<stringlist = new List<string>();

int previousCapacity=-1;
for (int i=0; i < 100000; i++)
{
list.Add("");
if (list.Capacity != previousCapacity)
{
Console.WriteLine (list.Capacity);
previousCapacity = list.Capacity;
}
}
}
}

Output:

4
8
16
32
64
128
256
512
1024
2048
4096
8192
16384
32768
65536
131072

It would perhaps be nice to be able to specify the scaling factor, but 2
is reasonable for most situations.

I heard this scaling factor would change, or the algorithm would be adapted,
such that at a certain threshold the scaling factor would be reduced.
Actually, there are some issues with this on 64-bit: remember the object
size limit is still 2GB even on 64-bit, so once you arrive at 1GB, the next
insert will fail with an OOM exception; starting with a capacity which is
large enough is not always possible, or even wanted.

Willy.

Oct 4 '07 #20

"Doug Semler" <do********@gmail.comschrieb im Newsbeitrag
news:11**********************@50g2000hsm.googlegro ups.com...
Well... List.clear actually calls internally array.clear which is an
internal framework function. I wouldn't be surprised if it didn't
just eventually call some sort of memset variant (which is really
fast).
I suppose not, because List.Clear doesn't have to clear the items. It only
sets the length to zero.
>
I would say that calling List.Clear() is going to be slightly faster
than an iterative version (but since her test didn't measure a
List<T>.Clear call it is unclear..pardon the pun).
After List.Clear the list would have to be refilled, because it results in a
list with length zero. Not a list with same length and all Elements zero.
>
Really, I think the main advantage to throwing out the old list is the
fact that you can't shrink a list size once it has been grown....and
that every time it grows the array size doubles....So if you have a
list that is 4 megabytes, the next growth will be to 8 megs. even if
you only want 4Megs+2 items....
This also doesn't apply, if the List remains it size.

Christof

Oct 5 '07 #21

Christof Nordiek wrote:
"Doug Semler" <do********@gmail.comschrieb im Newsbeitrag
news:11**********************@50g2000hsm.googlegro ups.com...
>Well... List.clear actually calls internally array.clear which is an
internal framework function. I wouldn't be surprised if it didn't
just eventually call some sort of memset variant (which is really
fast).

I suppose not, because List.Clear doesn't have to clear the items. It
only sets the length to zero.
Why doesn't List.Clear() have to clear the items? If it doesn't,
reference types will still be referenced and can't be garbage collected.

I know that if I clear my List<>, I surely expect it to actually let go
of any references it's kept. It has to actually clear the internal
storage for this to happen. Either with explicit code to clear the
existing storage, or simply reallocating a new version of the storage.
Either way though, it has to do more than just reset the length.
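
A sketch of this reference-release behavior. GC timing is not strictly deterministic, so this is illustrative rather than a guaranteed test, and the `FillAndClear` helper is mine, not from the thread:

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

class ClearReleasesReferences
{
    // NoInlining keeps the strong local reference scoped to this method.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static WeakReference FillAndClear(List<object> list)
    {
        object item = new object();
        list.Add(item);
        WeakReference weak = new WeakReference(item);
        list.Clear(); // must wipe the internal slot, or 'item' stays reachable
        return weak;
    }

    static void Main()
    {
        List<object> list = new List<object>();
        WeakReference weak = FillAndClear(list);
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        Console.WriteLine(weak.IsAlive); // the list no longer keeps the item alive
        GC.KeepAlive(list); // the list itself stays alive; only Clear freed the item
    }
}
```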
>I would say that calling List.Clear() is going to be slightly faster
than an iterative version (but since her test didn't measure a
List<T>.Clear call it is unclear..pardon the pun).

After List.Clear the list would have to be refilled, because it results
in a list with length zero. Not a list with same length and all Elements
zero.
But the length is only "zero" to the client code using the List<> class.
As far as the underlying implementation goes, it's still a valid array.

Pete
Oct 5 '07 #22

"Christof Nordiek" <cn@nospam.dewrote in message
news:eZ**************@TK2MSFTNGP03.phx.gbl...
"Doug Semler" <do********@gmail.comschrieb im Newsbeitrag
news:11**********************@50g2000hsm.googlegro ups.com...
>Well... List.clear actually calls internally array.clear which is an
internal framework function. I wouldn't be surprised if it didn't
just eventually call some sort of memset variant (which is really
fast).

I suppose not, because List.Clear doesn't have to clear the items. It only
sets the length to zero.

But List.clear doesn't do that. All it does is dereference whatever is held
in the list.

It's actually more important on 64 bit machines, where references are twice
as long as on 32 bit... wouldn't a List<object> throw an OOM exception twice
as fast on that platform as on 32 bit, just because of the "doubling of size
2GB memory allocation" issue? I'll have to test this Monday...

--
Doug Semler, MCPD
a.a. #705, BAAWA. EAC Guardian of the Horn of the IPU (pbuhh).
The answer is 42; DNRC o-
Gur Hfrarg unf orpbzr fb shyy bs penc gurfr qnlf, abbar rira
erpbtavmrf fvzcyr guvatf yvxr ebg13 nalzber. Fnq, vfa'g vg?

Oct 6 '07 #23

"Doug Semler" <do********@gmail.comwrote in message
news:e3**************@TK2MSFTNGP05.phx.gbl...
"Christof Nordiek" <cn@nospam.dewrote in message
news:eZ**************@TK2MSFTNGP03.phx.gbl...
>"Doug Semler" <do********@gmail.comschrieb im Newsbeitrag
news:11**********************@50g2000hsm.googlegr oups.com...
>>Well... List.clear actually calls internally array.clear which is an
internal framework function. I wouldn't be surprised if it didn't
just eventually call some sort of memset variant (which is really
fast).

I suppose not, because List.Clear doesn't have to clear the items. It
only sets the length to zero.


But List.clear doesn't do that. All it does is dereference whatever is
held in the list.

It's actually more important on 64 bit machines, where references are twice
as long as on 32 bit... wouldn't a List<object> throw an OOM exception
twice as fast on that platform as on 32 bit, just because of the "doubling
of size 2GB memory allocation" issue? I'll have to test this
Monday...
On 64 bit Windows this throws twice as fast; references are 8 bytes now,
which means that you can't have more than ~268 million references in the
List.
Note that you may run out of physical memory long before you've reached
this number; the minimum object size on 64 bit is 24 bytes, which means that
a minimum of 6GB is needed to store the 268M objects.
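
The arithmetic behind those two figures, assuming the 8-byte reference size and 24-byte minimum object size quoted above:

```csharp
using System;

class SizeMath
{
    static void Main()
    {
        long maxObjectBytes = 1L << 31;           // 2GB per-object limit
        long refSize = 8;                         // reference size on 64 bit
        long maxRefs = maxObjectBytes / refSize;  // max references one array can hold
        long minObjectSize = 24;                  // minimum object size on 64 bit
        long heapNeeded = maxRefs * minObjectSize; // if every reference is distinct

        Console.WriteLine(maxRefs);    // ~268 million
        Console.WriteLine(heapNeeded); // ~6GB
    }
}
```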

Willy.

Oct 6 '07 #24

Willy Denoyette [MVP] <wi*************@telenet.bewrote:

<snip>
Note that you may have run out of physical memory long before you've reached
this number, the minimum object size on 64 bit is 24 bytes, which means that
a minimum of 6GB is needed to store the 268M objects.
Only if each reference is to a different object. You may well have a
list which only has references to 10 *distinct* objects, but still has
a length in the millions.

I'm not saying this is a common scenario, of course :)

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Oct 6 '07 #25

"Jon Skeet [C# MVP]" <sk***@pobox.comwrote in message
news:MP*********************@msnews.microsoft.com. ..
Willy Denoyette [MVP] <wi*************@telenet.bewrote:

<snip>
>Note that you may have run out of physical memory long before you've
reached
this number, the minimum object size on 64 bit is 24 bytes, which means
that
a minimum of 6GB is needed to store the 268M objects.

Only if each reference is to a different object. You may well have a
list which only has references to 10 *distinct* objects, but still has
a length in the millions.
Absolutely right.
I'm not saying this is a common scenario, of course :)
I would suggest a design revision if I had to deal with this :)

Willy.
Oct 6 '07 #26


"Peter Duniho" <Np*********@NnOwSlPiAnMk.comwrote in message
news:13*************@corp.supernews.com...
Christof Nordiek wrote:
>"Doug Semler" <do********@gmail.comschrieb im Newsbeitrag
news:11**********************@50g2000hsm.googlegr oups.com...
>>Well... List.clear actually calls internally array.clear which is an
internal framework function. I wouldn't be surprised if it didn't
just eventually call some sort of memset variant (which is really
fast).

I suppose not, because List.Clear doesn't have to clear the items. It
only sets the length to zero.

Why doesn't List.Clear() have to clear the items? If it doesn't,
reference types will still be referenced and can't be garbage collected.
It doesn't need to zero each pointer, just make the whole mess unreachable.
It does that by clearing its reference to the entire array of pointers --
O(1), not O(N).
Oct 8 '07 #27


"Jon Skeet [C# MVP]" <sk***@pobox.comwrote in message
news:MP*********************@msnews.microsoft.com. ..
Willy Denoyette [MVP] <wi*************@telenet.bewrote:

<snip>
>Note that you may have run out of physical memory long before you've
reached
this number, the minimum object size on 64 bit is 24 bytes, which means
that
a minimum of 6GB is needed to store the 268M objects.

Only if each reference is to a different object. You may well have a
list which only has references to 10 *distinct* objects, but still has
a length in the millions.

I'm not saying this is a common scenario, of course :)
Sounds like Flyweight pattern to me, so maybe not all that uncommon.
>
--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

Oct 8 '07 #28


"Willy Denoyette [MVP]" <wi*************@telenet.bewrote in message
news:%2****************@TK2MSFTNGP03.phx.gbl...
"Doug Semler" <do********@gmail.comwrote in message
news:e3**************@TK2MSFTNGP05.phx.gbl...
>"Christof Nordiek" <cn@nospam.dewrote in message
news:eZ**************@TK2MSFTNGP03.phx.gbl...
>>"Doug Semler" <do********@gmail.comschrieb im Newsbeitrag
news:11**********************@50g2000hsm.googleg roups.com...

Well... List.clear actually calls internally array.clear which is an
internal framework function. I wouldn't be surprised if it didn't
just eventually call some sort of memset variant (which is really
fast).

I suppose not, because List.Clear doesn't have to clear the items. It
only sets the length to zero.


But List.clear doesn't do that. All it does is dereference whatever is
held in the list.

It's actually more important on 64 bit machines, where references are
twice as long as on 32 bit... wouldn't a List<object> throw an OOM
exception twice as fast on that platform as on 32 bit, just because of the
"doubling of size 2GB memory allocation" issue? I'll have to test
this Monday...
The 2GB limit doesn't apply to x64.
Oct 8 '07 #29

But List.clear doesn't do that. All it does is dereference whatever is
held in the list.
I would hope it doesn't *dereference* any list entries. I think I know what
you meant (cease referencing), but that isn't what "dereference" means.
Oct 8 '07 #30

On Oct 8, 3:37 pm, "Ben Voigt [C++ MVP]" <r...@nospam.nospamwrote:
Why doesn't List.Clear() have to clear the items? If it doesn't,
reference types will still be referenced and can't be garbage collected.

It doesn't need to zero each pointer, just make the whole mess unreachable.
It does that by clearing its reference to the entire array of pointers --
O(1), not O(N).
It doesn't do that though - it clears the existing array (and keeps
it), rather than just "forgetting" about the array.
It *could* either create a new array with the same capacity or
possibly just reset the capacity to 0, either way forgetting about the
old array, but it happens not to: it calls Array.Clear (specifying its
current size as the number of items to wipe out - so calling
List<T>.Clear twice in a row is quicker the second time).
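
A simplified sketch of the pattern Jon describes (not the actual framework source, just a minimal illustration of "wipe the used slots, keep the array"):

```csharp
using System;

class MiniList<T>
{
    private T[] _items = new T[4];
    private int _size;

    public int Count { get { return _size; } }
    public int Capacity { get { return _items.Length; } }

    public void Add(T item)
    {
        if (_size == _items.Length)
            Array.Resize(ref _items, _items.Length * 2); // doubling growth
        _items[_size++] = item;
    }

    public void Clear()
    {
        // Wipe only the used slots so the GC can reclaim referenced objects,
        // but keep the backing array (capacity is retained).
        Array.Clear(_items, 0, _size);
        _size = 0; // a second Clear in a row wipes 0 slots, hence quicker
    }
}

class Demo
{
    static void Main()
    {
        MiniList<string> list = new MiniList<string>();
        for (int i = 0; i < 10; i++)
            list.Add("x");
        int capacityBefore = list.Capacity;
        list.Clear();
        Console.WriteLine(list.Count);                      // emptied
        Console.WriteLine(list.Capacity == capacityBefore); // capacity kept
    }
}
```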

Jon

Oct 8 '07 #31

On Oct 8, 3:38 pm, "Ben Voigt [C++ MVP]" <r...@nospam.nospamwrote:
Only if each reference is to a different object. You may well have a
list which only has references to 10 *distinct* objects, but still has
a length in the millions.
I'm not saying this is a common scenario, of course :)

Sounds like Flyweight pattern to me, so maybe not all that uncommon.
Using flyweights isn't uncommon - I'd suggest that having lists with
millions of such entries is reasonably uncommon though.

Jon

Oct 8 '07 #32

On Oct 8, 3:39 pm, "Ben Voigt [C++ MVP]" <r...@nospam.nospamwrote:
It's actually more important on 64 bit machines, where references are
twice as long as on 32 bit... wouldn't a List<object> throw an OOM
exception twice as fast on that platform as on 32 bit, just because of the
"doubling of size 2GB memory allocation" issue? I'll have to test
this Monday...

The 2GB limit doesn't apply to x64.
Yes it does, on a per-object basis. See
http://blogs.msdn.com/joshwil/archiv...10/450202.aspx

<quote>
First some background; in the 2.0 version of the .Net runtime (CLR) we
made a conscious design decision to keep the maximum object size
allowed in the GC Heap at 2GB, even on the 64-bit version of the
runtime.
</quote>

Jon

Oct 8 '07 #33

Ben Voigt [C++ MVP] wrote:
>>It's actually more important on 64 bit machines, where references are
twice as long as on 32 bit... wouldn't a List<object> throw an OOM
exception twice as fast on that platform as on 32 bit, just because of the
"doubling of size 2GB memory allocation" issue? I'll have to test
this Monday...

The 2GB limit doesn't apply to x64.
My understanding is that even on 64-bit, .NET has a 2GB limitation, or
at least a 2^32 element count limitation.

Has this changed? Is there a true 64-bit implementation of .NET
available now?

Pete
Oct 8 '07 #34


"Peter Duniho" <Np*********@NnOwSlPiAnMk.comwrote in message
news:13*************@corp.supernews.com...
Ben Voigt [C++ MVP] wrote:
>>>It's actually more important on 64 bit machines, where references are
twice as long as on 32 bit... wouldn't a List<object> throw an OOM
exception twice as fast on that platform as on 32 bit, just because of
the "doubling of size 2GB memory allocation" issue? I'll have to
test this Monday...

The 2GB limit doesn't apply to x64.

My understanding is that even on 64-bit, .NET has a 2GB limitation, or at
least a 2^32 element count limitation.
Well, 2^32 elements would be considerably better than the 2GB virtual
address limit... but the information Jon posted shows there is still a
limit, although this time a more artificial one.
>
Has this changed? Is there a true 64-bit implementation of .NET available
now?

Pete

Oct 8 '07 #35

"Peter Duniho" <Np*********@NnOwSlPiAnMk.comwrote in message
news:13*************@corp.supernews.com...
Ben Voigt [C++ MVP] wrote:
>>>It's actually more important on 64 bit machines, where references are
twice as long as on 32 bit... wouldn't a List<object> throw an OOM
exception twice as fast on that platform as on 32 bit, just because of
the "doubling of size 2GB memory allocation" issue? I'll have to
test this Monday...

The 2GB limit doesn't apply to x64.

My understanding is that even on 64-bit, .NET has a 2GB limitation, or at
least a 2^32 element count limitation.

Has this changed? Is there a true 64-bit implementation of .NET available
now?

Pete

No, the (current) object size limit is not expressed in elements, it's
simply 2^31 bytes (~2GB). That means that the largest array in .NET is
currently 2^31 bytes, or (2^31)/4 ints, or (2^31)/8 longs, etc...
A true 64 bit version has been available since V2, but the bitness has
nothing to do with the object size limits imposed by the CLR.
Willy.

Oct 8 '07 #36
