473,320 Members | 1,713 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,320 software developers and data experts.

.NET 2.0 performance bug in ArrayList.Sort

I have come across with what appears to be a significant performance bug in
..NET 2.0 ArrayList.Sort method when compared with Array.Sort on the same
data. Same data on the same CPU gets sorted a lot faster with both methods
using .NET 1.1, that's why I am pretty sure its a (rather serious) bug. Below
you can find C# test case that should allow you to reproduce this error, to
run it you will need to put 2 data files into current directory where
executable is or just change filename pathes accordingly, the data files can
be obtained from here:

fast_data.txt: http://majestic12.co.uk/files/other/.../fast_data.txt
slow_data.txt: http://majestic12.co.uk/files/other/.../slow_data.txt

The data are strings (URLs) of about similar size in bytes and number.

The following are the console outputs from code on the same machine (AMD x2
3800, 2 GB RAM, XP Pro SP2):

VS2003 .NET 1.1 (with SP1) run:

-----------------------------------------------------------
Loaded 29974 lines from file slow_data.txt
Time to sort strings in ArrayList is: 250 msec
Time to sort strings in string[] array is: 234 msec
Loaded 31688 lines from file fast_data.txt
Time to sort strings in ArrayList is: 250 msec
Time to sort strings in string[] array is: 250 msec
-----------------------------------------------------------

Note that sorting times are almost exactly the same here, so all good in
..NET 1.1 .

-----------------------------------------------------------
VS2005 .NET 2.0 run:

Loaded 29974 lines from file slow_data.txt
Time to sort strings in ArrayList is: 1531 msec
Time to sort strings in string[] array is: 187 msec
Loaded 31688 lines from file fast_data.txt
Time to sort strings in ArrayList is: 703 msec
Time to sort strings in string[] array is: 171 msec
Press ENTER to exit
-----------------------------------------------------------

Notice that on the same machine with the same data sorting times are MUCH
slower in ArrayList.Sort, and particularly for the "slow_data.txt" file,
Array.Sort times are actually better, so I am not complaining there, but
clearly ArrayList sorts are seriously flawed - this appears to be data
dependent and by that I don't mean size of the data or number of strings: I
have got lots of such data files and about every 10th of them is 10-20 times
slower than the other even though it has got about the same number of lines
and bytesize.

Note: I am aware of boxing/unboxing overheads when dealing with ArrayLists,
however in this case the slowdown is really bad comparing to .NET 1.1 and it
appears to be data dependent - I am getting it on about 10% of my dataset
from which I have selected 2 examples (slow and fast) to demonstrate that its
a very serious performance issue.

Here is the code that should allow you to replicate the problem:

//////////////////////////////////////////////////////////////////////////
using System;
using System.Collections;
using System.IO;

/*
This is a test case of what appears to be a major performance issue in
ArrayList.Sort method for strings
in .NET 2.0 - it appears to be data dependant as some similarly sized
data files have got a lot less
performance penalty when sorting them using Array.Sort method on
string[] array.

The kicker: both versions run fast in .NET 1.1

Author: Alex Chudnovsky <al***@majestic12.co.uk>
Date: 28 Apr 2006
*/
namespace Majestic12
{
/// <summary>
/// Tests sorting performance of Array.Sort of string[] versus
ArrayList.Sort of the same strings
/// It appears that in .NET 2.0 in some cases ArrayList will take a LOT
more time to do the sorting
/// </summary>
class SlowArrayListSortTest
{
static void Main(string[] args)
{
// load strings from file: assumed to be in the same place as
the executable
TestFile("slow_data.txt"); // <--- this data file has got 10
times slower
TestFile("fast_data.txt"); // <--- more reasonable 2 times slower

Console.WriteLine("Press ENTER to exit");
Console.ReadLine();
}

static void TestFile(string sFile)
{
FileStream oFS=File.OpenRead(sFile);

ArrayList oLines=new ArrayList();

StreamReader oSR=new StreamReader(oFS);

while(oSR.BaseStream.Position<oSR.BaseStream.Lengt h)
{
oLines.Add(oSR.ReadLine());
}

oFS.Close();

Console.WriteLine("Loaded {0} lines from file
{1}",oLines.Count,sFile);

// now copy same strings into string array for speed comparisons
string[] sLines=new string[oLines.Count];

for(int i=0;i<sLines.Length;i++)
sLines[i]=(string)oLines[i];

DateTime oTime=DateTime.Now;
oLines.Sort();
Console.WriteLine("Time to sort strings in ArrayList is: {0}
msec",(DateTime.Now.Ticks-oTime.Ticks)/TimeSpan.TicksPerMillisecond);

oTime=DateTime.Now;
Array.Sort(sLines);
Console.WriteLine("Time to sort strings in string[] array is:
{0} msec",(DateTime.Now.Ticks-oTime.Ticks)/TimeSpan.TicksPerMillisecond);
}
}
}

//////////////////////////////////////////////////////////////////////////
Apr 28 '06 #1
48 4390
Alex Chudnovsky wrote:
I have come across with what appears to be a significant performance
bug in .NET 2.0 ArrayList.Sort method when compared with Array.Sort
on the same data. Same data on the same CPU gets sorted a lot faster
with both methods using .NET 1.1, that's why I am pretty sure its a
(rather serious) bug. Below you can find C# test case that should
allow you to reproduce this error, to run it you will need to put 2
data files into current directory where executable is or just change
filename pathes accordingly, the data files can be obtained from here:

fast_data.txt:
http://majestic12.co.uk/files/other/.../fast_data.txt
slow_data.txt:
http://majestic12.co.uk/files/other/.../slow_data.txt

The data are strings (URLs) of about similar size in bytes and number.

The following are the console outputs from code on the same machine
(AMD x2 3800, 2 GB RAM, XP Pro SP2):

VS2003 .NET 1.1 (with SP1) run:

-----------------------------------------------------------
Loaded 29974 lines from file slow_data.txt
Time to sort strings in ArrayList is: 250 msec
Time to sort strings in string[] array is: 234 msec
Loaded 31688 lines from file fast_data.txt
Time to sort strings in ArrayList is: 250 msec
Time to sort strings in string[] array is: 250 msec
-----------------------------------------------------------

Note that sorting times are almost exactly the same here, so all good
in .NET 1.1 .

-----------------------------------------------------------
VS2005 .NET 2.0 run:

Loaded 29974 lines from file slow_data.txt
Time to sort strings in ArrayList is: 1531 msec
Time to sort strings in string[] array is: 187 msec
Loaded 31688 lines from file fast_data.txt
Time to sort strings in ArrayList is: 703 msec
Time to sort strings in string[] array is: 171 msec
Press ENTER to exit
-----------------------------------------------------------

Notice that on the same machine with the same data sorting times are
MUCH slower in ArrayList.Sort, and particularly for the
"slow_data.txt" file, Array.Sort times are actually better, so I am
not complaining there, but clearly ArrayList sorts are seriously
flawed - this appears to be data dependent and by that I don't mean
size of the data or number of strings: I have got lots of such data
files and about every 10th of them is 10-20 times slower than the
other even though it has got about the same number of lines and
bytesize.

Note: I am aware of boxing/unboxing overheads when dealing with
ArrayLists, however in this case the slowdown is really bad comparing
to .NET 1.1 and it appears to be data dependent - I am getting it on
about 10% of my dataset from which I have selected 2 examples (slow
and fast) to demonstrate that its a very serious performance issue.

Here is the code that should allow you to replicate the problem:

//////////////////////////////////////////////////////////////////////
//// using System;
using System.Collections;
using System.IO;

/*
This is a test case of what appears to be a major performance
issue in ArrayList.Sort method for strings
in .NET 2.0 - it appears to be data dependant as some similarly
sized data files have got a lot less
performance penalty when sorting them using Array.Sort method on
string[] array.

The kicker: both versions run fast in .NET 1.1

Author: Alex Chudnovsky <al***@majestic12.co.uk>
Date: 28 Apr 2006
*/
namespace Majestic12
{
/// <summary>
/// Tests sorting performance of Array.Sort of string[] versus
ArrayList.Sort of the same strings
/// It appears that in .NET 2.0 in some cases ArrayList will take
a LOT more time to do the sorting
/// </summary>
class SlowArrayListSortTest
{
static void Main(string[] args)
{
// load strings from file: assumed to be in the same
place as the executable
TestFile("slow_data.txt"); // <--- this data file has got
10 times slower
TestFile("fast_data.txt"); // <--- more reasonable 2
times slower

Console.WriteLine("Press ENTER to exit");
Console.ReadLine();
}

static void TestFile(string sFile)
{
FileStream oFS=File.OpenRead(sFile);

ArrayList oLines=new ArrayList();

StreamReader oSR=new StreamReader(oFS);

while(oSR.BaseStream.Position<oSR.BaseStream.Lengt h)
{
oLines.Add(oSR.ReadLine());
}

oFS.Close();

Console.WriteLine("Loaded {0} lines from file
{1}",oLines.Count,sFile);

// now copy same strings into string array for speed
comparisons string[] sLines=new string[oLines.Count];

for(int i=0;i<sLines.Length;i++)
sLines[i]=(string)oLines[i];

DateTime oTime=DateTime.Now;
oLines.Sort();
Console.WriteLine("Time to sort strings in ArrayList is:
{0}
msec",(DateTime.Now.Ticks-oTime.Ticks)/TimeSpan.TicksPerMillisecond);

oTime=DateTime.Now;
Array.Sort(sLines);
Console.WriteLine("Time to sort strings in string[] array
is: {0}
msec",(DateTime.Now.Ticks-oTime.Ticks)/TimeSpan.TicksPerMillisecond);
} }
}

//////////////////////////////////////////////////////////////////////
////


Strange, as ArrayList.Sort uses Array.Sort() :D (use reflector, and
see for yourself). So it's not the sort that's flawed, as it's
precisely the same routine that gets executed.

The only difference is that ArrayList.Sort() uses a default comparer,
Array.Sort() doesn't. I think this is where the problem occurs.

I.o.w.: your sorts aren't exactly the same: the array sort is sorting
an array of strings. The arraylist sort is sorting an array of objects
using a comparer.

FB

--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------
Apr 29 '06 #2
"Frans Bouma [C# MVP]" wrote:
I.o.w.: your sorts aren't exactly the same: the array sort is sorting
an array of strings. The arraylist sort is sorting an array of objects
using a comparer.
I am aware of that - I mentioned boxing/unboxing overheads. The issues are:
a) its a LOT slower than in .NET 1.1 <-- this alone makes it unacceptable
b) its DRAMATICALLY slower with some data
Strange, as ArrayList.Sort uses Array.Sort() :D (use reflector, and
see for yourself). So it's not the sort that's flawed, as it's
precisely the same routine that gets executed.


I can't say for sure what's flawed but _something_ certainly is, its no so
much slower performance of ArrayList.Sort that bothers me, its much slower
performance than same code for .NET 1.1, I added List<string> test and it
runs just fine, the problem appears to be solely in ArrayList - there is
absolutely no reason why sorting should take many times over version that is
present in .NET 1.1, I actually have got some much worse case data scenarios
that take even much much longer, this is an issue since legacy .NET 1.1 won't
use List<string> and thus require a rewrite. If performance hit was moderate
it could have been ignored, but the performance hit appears to be _DATA_
dependent, if anyone from Microsoft is interested to track this further then
I will be happy to assist.

Here is link to the source code that was mangled by this forum:
http://majestic12.co.uk/files/other/...bug/program.cs

Of course I did workaround it by using List<string>, but I don't like that
"solution" since:
a) it prevents me from being able to do legacy .NET 1.1 builds or involve
conditional defines
b) legacy compiled stuff without source will be hit by unexplained slowdowns

regards,

Alex
Apr 29 '06 #3
Alex Chudnovsky <Al************@discussions.microsoft.com> wrote:
"Frans Bouma [C# MVP]" wrote:
I.o.w.: your sorts aren't exactly the same: the array sort is sorting
an array of strings. The arraylist sort is sorting an array of objects
using a comparer.


I am aware of that - I mentioned boxing/unboxing overheads.


Those won't be relevant with strings - there's no boxing/unboxing
involved with reference types.

(I haven't read the rest of this thread - just thought I'd mention that
quickly now.)

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Apr 29 '06 #4
> Those won't be relevant with strings - there's no boxing/unboxing
involved with reference types.


Fair enough, I stand corrected. The main issue however is still here: there
should not be any reason why same ArrayList sorts a lot slower in .NET 2.0
build than when it does in .NET 1.1 build with everything else being the same
(machine, data etc). Generic string list and Array.Sort actually sort faster
in .NET 2.0, but ArrayList seems to be rather badly affected, at least for
the data I provided.

It would be good if someone tried the sample code (see link above) and
posted their timings just to see if this is replicated on other machines
(maybe its AMD Athlon x2 that's only affected, something I feel rather
unlikely, but you never know).

Apr 29 '06 #5
Alex,
It would be good if someone tried the sample code (see link above)
and posted their timings just to see if this is replicated on other
machines (maybe its AMD Athlon x2 that's only affected, something I
feel rather unlikely, but you never know).

just tried your code, one time with a big file (3MB) and once with a
small file (just several KB):

Result 1, Big File. This is an average result, even though the measured
milliseconds are not always the same (about +/- 40 msec in execution time):

Loaded 70072 lines from file slow_data.txt
Time to sort strings in ArrayList is: 406 msec
Time to sort strings in Generic Lst is: 406 msec
Time to sort strings in string[] array is: 406 msec
Loaded 70072 lines from file fast_data.txt
Time to sort strings in ArrayList is: 406 msec
Time to sort strings in Generic Lst is: 406 msec
Time to sort strings in string[] array is: 453 msec
Press ENTER to exit
Result 2, Small file. Here it is really depending, on the fast data, I
also got the result, where the ArrayList had 15msec and the Generic List
got 31 msec... On the slow data, the Generic List seems to be quite
constant, whereas the ArrayList is between 70 and 180 msec.
One example output:

Loaded 5835 lines from file slow_data.txt
Time to sort strings in ArrayList is: 78 msec
Time to sort strings in Generic Lst is: 93 msec
Time to sort strings in string[] array is: 62 msec
Loaded 5835 lines from file fast_data.txt
Time to sort strings in ArrayList is: 31 msec
Time to sort strings in Generic Lst is: 15 msec
Time to sort strings in string[] array is: 31 msec
Press ENTER to exit
Finally, I thought, that there might be problem with the startup and I
switch the line oLines.Sort with the one of oGLines.Sort and it
happened, that now the oGLines.Sort semms to be slower... so I think it
is a matter of Startup (most of the time might arise at a first
initialization of Array.Sort())... maybe some internal static variable
or whatever...

And additionally, I tried some really big file (some 18MB) and it
happened, that all three sorting algorithms merely were the same... so
for me the Sort() method performs well, it must be some matter of startup.

long answer and a lot of assumptions, but I hope to help!
Markus

btw.: I have tested it on a PIV Centrino Laptop.
Apr 29 '06 #6
Thanks for that Marcus - but from what I can see you have not tried it with
my data that I linked to, specifically the following that seems slower than
usual: http://majestic12.co.uk/files/other/.../slow_data.txt

I have got thousands of such data files and about 10% of them exhibit the
problem, the rest get sorted in ArrayList pretty fast, however SOME are much
slower, yet if Array.Sort is used then they are all fast, hence my assertion
that its data dependent and that's why I provided links to actual data to
help replicate it, please try it and post here results, thanks!

The issue appears to be really data dependent, which is probably why it may
have escaped scrutiny during normal tests.

regards

Alex
Apr 29 '06 #7
Alex Chudnovsky <Al************@discussions.microsoft.com> wrote:
Thanks for that Marcus - but from what I can see you have not tried it with
my data that I linked to, specifically the following that seems slower than
usual: http://majestic12.co.uk/files/other/.../slow_data.txt

I have got thousands of such data files and about 10% of them exhibit the
problem, the rest get sorted in ArrayList pretty fast, however SOME are much
slower, yet if Array.Sort is used then they are all fast, hence my assertion
that its data dependent and that's why I provided links to actual data to
help replicate it, please try it and post here results, thanks!

The issue appears to be really data dependent, which is probably why it may
have escaped scrutiny during normal tests.


I believe I've read before that the ArrayList.Sort pivoting strategy
has changed between versions, which makes some data slower and some
data faster. I think that if it's faster "on average" and the speed-up
of some is comparable to the slow-down of others, that it's a
reasonable change - it sounds like you're just unlucky in your data.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Apr 29 '06 #8
> I believe I've read before that the ArrayList.Sort pivoting strategy
has changed between versions, which makes some data slower and some
data faster. I think that if it's faster "on average" and the speed-up
of some is comparable to the slow-down of others, that it's a
reasonable change - it sounds like you're just unlucky in your data.


I'd love to see some pointers showing that its the case.

IMO its totally unacceptable to have such an unjustifiable slowdown, as I am
actually getting more than 10 times slower sorts. The only reason I noticed
them because code that run fast under .NET 1.1 actually was like a dog under
..NET 2, and even though 90% of data sets I have get sorted fast, the
remaining 10% were killing performance so much that I had to avoid using
ArrayList. I totally appreciate that some data takes more time to sort,
however the different between sorting calls should not be that dramatic.

The "luck" should not be factor between ArrayList.Sort and List<string>.Sort
and Array.Sort - some overheads - yes, but not THAT much, clearly something
is seriously wrong.

I would really like to see some independent confirmation of timings: please
use my data files as supplied above.

regards,

Alex
Apr 29 '06 #9
Ok, I got the real good test for you, this is the actual slowdown that I
experienced in my app, the test before was not as obvious as it is now.

Source code:

Data file (you need to use it, don't use your data files as error seems to
be data dependent!):

-----------------------------------------------
Running under .NET v2.0.50727.42
Loaded 29972 lines from file slow_data.txt
Time to sort strings in Generic List is: 171 msec
Time to sort strings in ArrayList is: 33343 msec
Time to sort strings in string[] array is: 171 msec
Press ENTER to exit
-----------------------------------------------

Notice ArrayList sorting time? Same data but its actually 200 (!) TIMES
slower!

Here is same code (apart from Generic List that is only prevent in .NET 2.0)
run on .NET 1.1:

-----------------------------------------------
Loaded 29972 lines from file slow_data.txt
Time to sort strings in ArrayList is: 187 msec
Time to sort strings in string[] array is: 187 msec
Press ENTER to exit
-----------------------------------------------

Notice that all works fine in .NET 1.1, its same machine, same everything.

So, I would appreciate anyone to try the code above and data file to run on
their PC to verify if you get same much slower performance from ArrayLists
for _that_ data file (ignore all other files you may have).

This is serious stuff - 200 times slowdown for 10% of my real life data
cases meant overall slowdown of around 20 times: this is a killer for legacy
..NET 1.1 apps that may mysteriously start using much more CPU.
Apr 29 '06 #10
My bad forgot to repost links (source and data were updated, so please
download it again):

Source code: http://majestic12.co.uk/files/other/...bug/program.cs

Data file (you need to use it, don't use your data files as error seems to
be data dependent!):
http://majestic12.co.uk/files/other/.../slow_data.txt
Apr 29 '06 #11
From
http://msdn.microsoft.com/netframewo...ngsinNET20.asp
try using ArrayList.Sort(StringComparer.Ordinal) ..

with your data ... in above example ...

Running under .NET v2.0.50727.42
Loaded 29972 lines from file C:\slow_data.txt
Time to sort strings in Generic List is: 281 msec
Time to sort strings in ArrayList is: 64281 msec
Time to sort strings in ArrayList with Ordinal is: 140 msec
Time to sort strings in string[] array is: 265 msec
Press ENTER to exit

code added was ..

//bad data is already sorted (worst case) so just resort it again

oTime = DateTime.Now;

oLines.Sort(StringComparer.Ordinal);

Console.WriteLine("Time to sort strings in ArrayList with Ordinal is: {0}
msec", (DateTime.Now.Ticks - oTime.Ticks) / TimeSpan.TicksPerMillisecond);

Cheers,

Greg Young

Visual C# MVP
"Alex Chudnovsky" <Al************@discussions.microsoft.com> wrote in
message news:0D**********************************@microsof t.com...
My bad forgot to repost links (source and data were updated, so please
download it again):

Source code: http://majestic12.co.uk/files/other/...bug/program.cs

Data file (you need to use it, don't use your data files as error seems to
be data dependent!):
http://majestic12.co.uk/files/other/.../slow_data.txt

Apr 29 '06 #12
Ah they weren't completely sorted .. still huge gain ...

Running under .NET v2.0.50727.42
Loaded 29972 lines from file C:\slow_data.txt
Time to sort strings in Generic List is: 296 msec
Time to sort strings in ArrayList with Ordinal is: 8593 msec
Time to sort strings in ArrayList is: 1046 msec
Time to sort strings in string[] array is: 281 msec
Press ENTER to exit

Cheers,

Greg
"Greg Young" <Dr*************@hotmail.com> wrote in message
news:up****************@TK2MSFTNGP03.phx.gbl...
From
http://msdn.microsoft.com/netframewo...ngsinNET20.asp
try using ArrayList.Sort(StringComparer.Ordinal) ..

with your data ... in above example ...

Running under .NET v2.0.50727.42
Loaded 29972 lines from file C:\slow_data.txt
Time to sort strings in Generic List is: 281 msec
Time to sort strings in ArrayList is: 64281 msec
Time to sort strings in ArrayList with Ordinal is: 140 msec
Time to sort strings in string[] array is: 265 msec
Press ENTER to exit

code added was ..

//bad data is already sorted (worst case) so just resort it again

oTime = DateTime.Now;

oLines.Sort(StringComparer.Ordinal);

Console.WriteLine("Time to sort strings in ArrayList with Ordinal is: {0}
msec", (DateTime.Now.Ticks - oTime.Ticks) / TimeSpan.TicksPerMillisecond);

Cheers,

Greg Young

Visual C# MVP
"Alex Chudnovsky" <Al************@discussions.microsoft.com> wrote in
message news:0D**********************************@microsof t.com...
My bad forgot to repost links (source and data were updated, so please
download it again):

Source code: http://majestic12.co.uk/files/other/...bug/program.cs

Data file (you need to use it, don't use your data files as error seems
to
be data dependent!):
http://majestic12.co.uk/files/other/.../slow_data.txt


Apr 29 '06 #13
But Greg, I know I can change code and make it fast - I just use List
(string) or just string[] and then Array.Sort, the issue is that in .NET 2.0
ArrayList.Sort in some cases (as shown above) will sort many many times
slower than same code in .NET 1.1, of course rewriting code will solve the
problem, but I think its a very serious issue that deserves investigating -
if you do not have source code then some .NET apps that run fine in .NET 1.1
will run very slow under .NET 2.0, I can appreciate some slowdown but 200 (!)
slower is just way too slow.

So, my question is not really how to work around this - I already did, but
to get some attention from Microsoft folk to see exactly _WHY_ in this
situation ArrayList.Sort was sooooooooooooo slow - it is clearly a bug of
some kind that needs to be addressed.
Apr 30 '06 #14
I just escalated things based upon the following code ...

object[] Items = oLines.ToArray();

object[] Items2 = oLines.ToArray();

DateTime oTime = DateTime.Now;

oTime = DateTime.Now;

Array.Sort(Items, 0, Items.Length, Comparer.Default);

Console.WriteLine("Time to sort with Default Comparer: {0} msec",
(DateTime.Now.Ticks - oTime.Ticks) / TimeSpan.TicksPerMillisecond);

oTime = DateTime.Now;

Array.Sort(Items2, 0, Items.Length, null);

Console.WriteLine("Time to sort with Null Comparer: {0} msec",
(DateTime.Now.Ticks - oTime.Ticks) / TimeSpan.TicksPerMillisecond);

oTime = DateTime.Now;

"Alex Chudnovsky" <Al************@discussions.microsoft.com> wrote in
message news:0D**********************************@microsof t.com...
But Greg, I know I can change code and make it fast - I just use List
(string) or just string[] and then Array.Sort, the issue is that in .NET
2.0
ArrayList.Sort in some cases (as shown above) will sort many many times
slower than same code in .NET 1.1, of course rewriting code will solve the
problem, but I think its a very serious issue that deserves
investigating -
if you do not have source code then some .NET apps that run fine in .NET
1.1
will run very slow under .NET 2.0, I can appreciate some slowdown but 200
(!)
slower is just way too slow.

So, my question is not really how to work around this - I already did, but
to get some attention from Microsoft folk to see exactly _WHY_ in this
situation ArrayList.Sort was sooooooooooooo slow - it is clearly a bug of
some kind that needs to be addressed.

Apr 30 '06 #15
I added your code and here is the output I get:

-----------------------------------------------
Running under .NET v2.0.50727.42
Loaded 29972 lines from file slow_data.txt
Time to sort strings in Generic List is: 187 msec
Time to sort strings in ArrayList is: 36609 msec
Time to sort strings in string[] array is: 156 msec
Time to sort with Default Comparer: 156 msec
Time to sort with Null Comparer: 156 msec
Press ENTER to exit
-----------------------------------------------

All good apart from ArrayList.Sort - I did some profiling and it seems that
sorting is repeated many many times, sadly I do not have source code to see
what's exactly wrong - I am very curious to find out because there has got to
be some reason for sorting to run _that_ slow - personally I am fine with it
as I am aware of the problem and have workaround, the only reason I posted is
to try to help others who may get gobsmacked by the slow performance of the
code that run just fine under .NET 1.1.
Apr 30 '06 #16
Where did you add the clones?

After the arraylist sort by chance :)

Cheers,

Greg
"Alex Chudnovsky" <Al************@discussions.microsoft.com> wrote in
message news:9F**********************************@microsof t.com...
I added your code and here is the output I get:

-----------------------------------------------
Running under .NET v2.0.50727.42
Loaded 29972 lines from file slow_data.txt
Time to sort strings in Generic List is: 187 msec
Time to sort strings in ArrayList is: 36609 msec
Time to sort strings in string[] array is: 156 msec
Time to sort with Default Comparer: 156 msec
Time to sort with Null Comparer: 156 msec
Press ENTER to exit
-----------------------------------------------

All good apart from ArrayList.Sort - I did some profiling and it seems
that
sorting is repeated many many times, sadly I do not have source code to
see
what's exactly wrong - I am very curious to find out because there has got
to
be some reason for sorting to run _that_ slow - personally I am fine with
it
as I am aware of the problem and have workaround, the only reason I posted
is
to try to help others who may get gobsmacked by the slow performance of
the
code that run just fine under .NET 1.1.

Apr 30 '06 #17
"Greg Young" wrote:
Where did you add the clones?
After the arraylist sort by chance :)


You got me, I should not work till that late (almost 3am here in the UK) :(

I now put ToArray() before sorting or ArrayList takes place, and here is the
output:

------------------------------------------------------
Running under .NET v2.0.50727.42
Loaded 29972 lines from file slow_data.txt
Time to sort strings in Generic List is: 187 msec
Time to sort strings in ArrayList is: 36562 msec
Time to sort strings in string[] array is: 171 msec
Time to sort with Default Comparer: 36703 msec
Time to sort with Null Comparer: 156 msec
Press ENTER to exit
------------------------------------------------------

So, it seems default comparer has got the issue of some kind. I am very
tempted to split data into chunks and try to find out if its specific strings
that result in slow performance, but too tired to do it now, at the moment I
am satisfied that its not just a weird issue experienced by me alone :)

Apr 30 '06 #18
Alex,
Thanks for that Marcus - but from what I can see you have not tried
it with my data that I linked to, specifically the following that
seems slower than usual:
http://majestic12.co.uk/files/other/.../slow_data.txt

with your data, the ArrayList.Sort() is really much slower, even though
mine seems to be not that bad:

-----------------------------------------------
Running under .NET v2.0.50727.42
Loaded 29989 lines from file slow_data.txt
Time to sort strings in Generic List is: 203 msec
Time to sort strings in ArrayList is: 5218 msec
Time to sort strings in string[] array is: 171 msec
Press ENTER to exit
-----------------------------------------------

What I think, the problem lies in the string comparison, as many of your
strings have the same lets say 20-30 letters at the beginning, an the
real comparison between two strings can only be found after comparing
the first letters (being the same). In Framework 2.0 they have done a
lot with globalization and localization etc., maybe deep inside there,
they use another comparer or have rewritten the comparer to work
correctly with more settings and more cultures. Maybe you try to write
an own comparer, if it is slow or fast does not matter in the first
case, but just to make sure, if the ArrayList.Sort or the
Comparer.Compare method is the problem. If the latter, you can implement
an own Comparer, if the first, you might implement your own sorting
algorithm (which is not really fine)... and the last solution could be:
ArrayList.ToString(), the sort the string-array, which is fast on both
framework versions, and convert back to an ArrayList (ugly, but it might
be ok with execution time).
Finally, what came to my mind (not a direct solution to the speed, but):

Can you change your application to sort the text-file once (e.g. at
system startup or at button click of a user), then write the sorted text
to a file, and then you only have to read the file and it is sorted...
it might prevent you from more sleepless night ;-)..

have a good day!
Markus
Apr 30 '06 #19
Alex Chudnovsky <Al************@discussions.microsoft.com> wrote:

<snip>
So, it seems default comparer has got the issue of some kind. I am very
tempted to split data into chunks and try to find out if its specific strings
that result in slow performance, but too tired to do it now, at the moment I
am satisfied that its not just a weird issue experienced by me alone :)


Just to narrow things slightly (maybe - I'm not sure I've followed
everything) - I don't think it's the actual comparer that's at fault.

I've written a couple of "proxying" classes which just count the number
of comparisons which are done, proxying to a different implementation.
It also lets you proxy from a non-generic comparer to a generic one.
(The code is below.)

If you use the non-generic Array.Sort with Comparer.Default, it takes a
long time and *lots* of comparisons. If you use the Array.Sort<T> with
an IComparer<string> which proxies to Comparer.Default, it's fast and
uses 1/100th of the comparisons.

Not sure how much help all this is, but I would definitely suggest
opening a bug on the MSDN Product Feedback Center.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Apr 30 '06 #20
Jon Skeet [C# MVP] wrote:
Alex Chudnovsky <Al************@discussions.microsoft.com> wrote:
Thanks for that Marcus - but from what I can see you have not tried
it with my data that I linked to, specifically the following that
seems slower than usual:
http://majestic12.co.uk/files/other/.../slow_data.txt

I have got thousands of such data files and about 10% of them
exhibit the problem, the rest get sorted in ArrayList pretty fast,
however SOME are much slower, yet if Array.Sort is used then they
are all fast, hence my assertion that its data dependent and that's
why I provided links to actual data to help replicate it, please
try it and post here results, thanks!

The issue appears to be really data dependent, which is probably
why it may have escaped scrutiny during normal tests.


I believe I've read before that the ArrayList.Sort pivoting strategy
has changed between versions, which makes some data slower and some
data faster. I think that if it's faster "on average" and the
speed-up of some is comparable to the slow-down of others, that it's
a reasonable change - it sounds like you're just unlucky in your data.


Though that's IMHO irrelevant here, as ArrayList.Sort simply calls
Array.Sort(items... ) under the hood, which means that it should
resolve in teh same results as a call to Array.Sort()

(with the exception ofcourse that the call to Array.Sort in this
particular example isn't equal to the ArrayList.Sort due to the
differences in parameters and therefore can't be compared.)

FB

--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------
Apr 30 '06 #21
Alex Chudnovsky wrote:
"Frans Bouma [C# MVP]" wrote:
I.o.w.: your sorts aren't exactly the same: the array sort is
sorting an array of strings. The arraylist sort is sorting an array
of objects using a comparer.


I am aware of that - I mentioned boxing/unboxing overheads. The
issues are: a) its a LOT slower than in .NET 1.1 <-- this alone
makes it unacceptable b) its DRAMATICALLY slower with some data


NO. There's no boxing involved but you ignore the fact that the
Array.Sort() call YOU make is completely different from the
Array.Sort() call the ArrayList.Sort() makes as the ArrayList uses a
general comparer, not a string specific comparer!
Strange, as ArrayList.Sort uses Array.Sort() :D (use reflector, and
see for yourself). So it's not the sort that's flawed, as it's
precisely the same routine that gets executed.


I can't say for sure what's flawed but something certainly is, its no
so much slower performance of ArrayList.Sort that bothers me, its
much slower performance than same code for .NET 1.1,


This is logical, as the comparers for strings etc. have changed in
..NET 2.0.

I've a tight schedule so I won't run your code, though if you pass a
string specific comparer to the ArrayList.Sort call, you'll likely see
that the results are vastly improving.

That's what I was trying to illustrate. Your calls show different
results, and that's understandable because they're actually doing
completely different things, however you insist that they're the same.
Well, they're not.

FB

ps: majestic12... the old amiga group?
--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------
Apr 30 '06 #22
Frans Bouma [C# MVP] <pe******************@xs4all.nl> wrote:
I believe I've read before that the ArrayList.Sort pivoting strategy
has changed between versions, which makes some data slower and some
data faster. I think that if it's faster "on average" and the
speed-up of some is comparable to the slow-down of others, that it's
a reasonable change - it sounds like you're just unlucky in your data.


Though that's IMHO irrelevant here, as ArrayList.Sort simply calls
Array.Sort(items... ) under the hood, which means that it should
resolve in teh same results as a call to Array.Sort()

(with the exception ofcourse that the call to Array.Sort in this
particular example isn't equal to the ArrayList.Sort due to the
differences in parameters and therefore can't be compared.)


Yup - unless you provide the same parameters (the important one being
the IComparer), at which point you get the same results - if you use
Comparer.Default, you get far, far more comparisons.

For there to be more comparisons, of course, presumably the comparisons
must be returning different results, although I can't see why. Hmm.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Apr 30 '06 #23
Alex,

I made this test to see what it was with equal objects (of course in VB in
my case)

Your file is sorted. If I turn that file around I get a quit normal figurs.
However as it is already sorted, it takes an extreme long time. This I call
a bug.

\\\
Public Class Test
Public Shared Sub Main()
Dim b As New ArrayList
Dim sr As New IO.StreamReader("C:\slow_data.txt")
Dim line As String = sr.ReadLine
Do While line IsNot Nothing
b.Add(line)
line = sr.ReadLine
Loop
Dim alist As New ArrayList
' For i As Integer = 0 To b.Count - 1
For i As Integer = b.Count - 1 To 0 Step -1
alist.Add(b(i))
Next
Dim afixed(alist.Count - 1) As Object
alist.CopyTo(afixed)
Dim start As Integer = Environment.TickCount
alist.Sort()
Dim AListtime As Integer = Environment.TickCount - start
start = Environment.TickCount
Array.Sort(afixed)
Dim Afixedtime As Integer = Environment.TickCount - start
MessageBox.Show("Arraylist = " _
& AListtime & vbCrLf & "fixed = " & Afixedtime)
End Sub
End Class
///

Cor

"Alex Chudnovsky" <Alex Ch********@discussions.microsoft.com> schreef in
bericht news:C5**********************************@microsof t.com...
I have come across with what appears to be a significant performance bug in
.NET 2.0 ArrayList.Sort method when compared with Array.Sort on the same
data. Same data on the same CPU gets sorted a lot faster with both methods
using .NET 1.1, that's why I am pretty sure its a (rather serious) bug.
Below
you can find C# test case that should allow you to reproduce this error,
to
run it you will need to put 2 data files into current directory where
executable is or just change filename pathes accordingly, the data files
can
be obtained from here:

fast_data.txt:
http://majestic12.co.uk/files/other/.../fast_data.txt
slow_data.txt:
http://majestic12.co.uk/files/other/.../slow_data.txt

The data are strings (URLs) of about similar size in bytes and number.

The following are the console outputs from code on the same machine (AMD
x2
3800, 2 GB RAM, XP Pro SP2):

VS2003 .NET 1.1 (with SP1) run:

-----------------------------------------------------------
Loaded 29974 lines from file slow_data.txt
Time to sort strings in ArrayList is: 250 msec
Time to sort strings in string[] array is: 234 msec
Loaded 31688 lines from file fast_data.txt
Time to sort strings in ArrayList is: 250 msec
Time to sort strings in string[] array is: 250 msec
-----------------------------------------------------------

Note that sorting times are almost exactly the same here, so all good in
.NET 1.1 .

-----------------------------------------------------------
VS2005 .NET 2.0 run:

Loaded 29974 lines from file slow_data.txt
Time to sort strings in ArrayList is: 1531 msec
Time to sort strings in string[] array is: 187 msec
Loaded 31688 lines from file fast_data.txt
Time to sort strings in ArrayList is: 703 msec
Time to sort strings in string[] array is: 171 msec
Press ENTER to exit
-----------------------------------------------------------

Notice that on the same machine with the same data sorting times are MUCH
slower in ArrayList.Sort, and particularly for the "slow_data.txt" file,
Array.Sort times are actually better, so I am not complaining there, but
clearly ArrayList sorts are seriously flawed - this appears to be data
dependent and by that I don't mean size of the data or number of strings:
I
have got lots of such data files and about every 10th of them is 10-20
times
slower than the other even though it has got about the same number of
lines
and bytesize.

Note: I am aware of boxing/unboxing overheads when dealing with
ArrayLists,
however in this case the slowdown is really bad comparing to .NET 1.1 and
it
appears to be data dependent - I am getting it on about 10% of my dataset
from which I have selected 2 examples (slow and fast) to demonstrate that
its
a very serious performance issue.

Here is the code that should allow you to replicate the problem:

//////////////////////////////////////////////////////////////////////////
using System;
using System.Collections;
using System.IO;

/*
This is a test case of what appears to be a major performance issue in
ArrayList.Sort method for strings
in .NET 2.0 - it appears to be data dependant as some similarly sized
data files have got a lot less
performance penalty when sorting them using Array.Sort method on
string[] array.

The kicker: both versions run fast in .NET 1.1

Author: Alex Chudnovsky <al***@majestic12.co.uk>
Date: 28 Apr 2006
*/
namespace Majestic12
{
/// <summary>
/// Tests sorting performance of Array.Sort of string[] versus
ArrayList.Sort of the same strings
/// It appears that in .NET 2.0 in some cases ArrayList will take a LOT
more time to do the sorting
/// </summary>
class SlowArrayListSortTest
{
static void Main(string[] args)
{
// load strings from file: assumed to be in the same place as
the executable
TestFile("slow_data.txt"); // <--- this data file has got 10
times slower
TestFile("fast_data.txt"); // <--- more reasonable 2 times
slower

Console.WriteLine("Press ENTER to exit");
Console.ReadLine();
}

static void TestFile(string sFile)
{
FileStream oFS=File.OpenRead(sFile);

ArrayList oLines=new ArrayList();

StreamReader oSR=new StreamReader(oFS);

while(oSR.BaseStream.Position<oSR.BaseStream.Lengt h)
{
oLines.Add(oSR.ReadLine());
}

oFS.Close();

Console.WriteLine("Loaded {0} lines from file
{1}",oLines.Count,sFile);

// now copy same strings into string array for speed
comparisons
string[] sLines=new string[oLines.Count];

for(int i=0;i<sLines.Length;i++)
sLines[i]=(string)oLines[i];

DateTime oTime=DateTime.Now;
oLines.Sort();
Console.WriteLine("Time to sort strings in ArrayList is: {0}
msec",(DateTime.Now.Ticks-oTime.Ticks)/TimeSpan.TicksPerMillisecond);

oTime=DateTime.Now;
Array.Sort(sLines);
Console.WriteLine("Time to sort strings in string[] array is:
{0} msec",(DateTime.Now.Ticks-oTime.Ticks)/TimeSpan.TicksPerMillisecond);
}
}
}

//////////////////////////////////////////////////////////////////////////

Apr 30 '06 #24
Thanks for the feedback!

The files that I sort are indeed partially sorted - in some cases they _may_
be fully sorted (but I don't know that so I have to sort them), and of course
I can and did change my sorting code to avoid this bug, that's not the
problem: I already have workaround (use Array.Sort), the only reason I posted
this is for the benefit of others in hope to replicate this issue and bring
Microsoft's attention to get it fixed: there is just no excuse for sorter to
take so much more time than in .NET 1.1, hence it affects legacy software
that may not be changed due to lack of source, thus fix is clearly needed in
the .NET runtime.

Frans Bouma:
Your calls show different results, and that's understandable because
they're actually doing completely different things, however you insist that
they're the same. Well, they're not.


Try running same code under .NET 1.1 - you will see that ArrayList.Sort
performance there just fine, however it is a dog in .NET 2.0, thus I _am_
doing the _same_ things by executing software that was well debugged and
worked fine under .NET 1.1, however it was slowed down to crawl (which is
what made me investigate) under .NET 2.0. Workaround is not an issue - I
already have done it before I posted test case, my motivations for doing it
explained above.

Markus: my applications deals with billions of urls that need to be sorted,
those files are partially sorted elsewhere, and in some cases _may_ be fully
sorted, I however do not have such guarantee thus need to do another sort
(this will go away but not immediately), in any case I said many times
workaround is NOT an issue: I already have one, thanks!

I actually think I know where the bug may be: on first sight it appears that
its Comparer.Default that's at fault here: with it even Array.Sort takes long
time, however I think the issue may be in the actual sorting algorithm that
involves custom comparer in principle: I have changed code to do Array.Sort
with my own comparer and the sorting is still slow: number of comparisons
made is 147,408,713 <--- this is way too many and is primary reason for
slowdown: excessive comparisons. Updated source code for this test can be had
here:
http://majestic12.co.uk/files/other/...bug/program.cs

I think this allows to narrow down problem to the actual issue in Array.Sort
with custom Comparer - perhaps ArrayList was changed to use Default Comparer
rather than just do a call to sort like it (probably) did in .NET 1.1, which
is what made ArrayList slow, but in reality problem is deeper. The
interesting thing is that Array.Sort with null comparer is not affected, this
leads me to believe that there are ought to be 2 sorting paths, one of which
(no comparer present) is correct, however the other one is not correct for
cases when data is partially or fully sorted. _BOTH_ code paths should use
the same logic, the only difference between them should be call to supplied
custom Comprarer. Indeed, if one with null comparer does job very fast then
its no excuse for the other one to fail so miserably in this case. Hence, the
problem is in the runtime code.

I am inclined to post a simplified test case as a .NET 2 bug in Array.Sort
with custom comparer provided (and by extention all ArrayList Sorts). Does
anyone disagree with my conclusions and next action, and can someone point me
in the best direction where this bug report will actually be taken advantage
of?
Apr 30 '06 #25
I've located what is happening. When you call Array.Sort(string) it is
actually calling Array<string>.Sort(string).

This uses the ArraySortHelper<string>.Default.Sort method.

On the other hand, ArrayList is calling Array.Sort which in turn seems
to be using Array+SortObjectHelper.QuickSort which is where the slow
down is occuring.

The only differerce to these two calls is that Array+SortObjectHelper
makes three extra comparisons to decide upon whether the left, middle,
or right value should be used as a pivot.

Tried a test running the same quick sort object helper uses but without
the call to GetPivotValue and it runs in the same time the other sorts
did (140ms compared to 13,281ms).

I then tried a test running the comparisons used in get pivot but always
returning the middle value as the pivot (the same way ArraySortHelper
works) and the method still ran at around 140ms.

Seems to have something with it's choice of pivot as if you only ever
pivot around the middle index each time it works as you would expect
(this is infact the same behavior as ArraySortHelper).

The string comparisons have no effect as both sort methods are using the
same comparisons.

The only difference I found was this choice of pivot. Leaving everything
else in there, but having it always use the middle index as the pivot
produced the expected results.

Must be something about the pattern of your data that just happens to
cause it to fall into an O(n2) pattern.

Was looking at the pattern here:
http://www.cs.dartmouth.edu/~doug/mdmspe.pdf

But the quicksort used doesn't seem to quite be median of three, and
your data doesn't follow that pattern. Strangly the bad data in this
case is infact the more sorted data. There is one spike though, this
element is near the start and is moved to near the end, could this be
causing the quicksort to break down?
Apr 30 '06 #26
Chris Chilvers wrote:
Must be something about the pattern of your data that just happens to
cause it to fall into an O(n2) pattern.


I agree that its data dependent and indeed ironically the slowdown seems
to happen with partially or fully sorted data (one might naively assume
it would make secondary sorting faster, not slower): in my case I do not
know in advance if data is fully or partially sorted, thus have to sort
again, however this is not really relevant - more importantly if it
works fast in .NET 1.1 then it should not be much slower in .NET 2.0.

If _all_ other .NET sorting methods (and its quicksort everywhere,
right?) were producing the same slow result on the same data, then I
could have accepted it, however there is really no justification for
such poor performance that appears to happen only if you specify
non-null comparer: this seems to lead to another code path (rightfully
so, since it allows to avoid check for custom comparer inside loop), but
that codepath is not as optimal as the other one.

In effect I assert here that the following code will use different code
paths and one of them is broken:

1) Array.Sort(Items,0,Items.Length,null);

// specify any comparer even your own and performance is much slower
// due to more comparisons
2) Array.Sort(Items,0,Items.Length,Comparer.Default);

Note here: we can stop talking about ArrayList since the problem seems
to be located in Array.Sort with non-null comparer, which I presume is
not null in case of ArrayList.Sort, but since we pass string[] directly
to Array.Sort here it means that your theory about
Array+SortObjectHelper.QuickSort does not (or should not?) apply.

I see no reason why mere presense of custom comparer (so long as it
compares in the same way) should make any difference on sorting.

I know that source code of .NET 1.1 is available and can be checked, but
can the same be done for .NET 2.0, anyone got any pointers?
regards,

Alex
Apr 30 '06 #27
On Sun, 30 Apr 2006 16:52:41 +0100, Alex Chudnovsky
<al***@majestic12.co.uk> wrote:

again, however this is not really relevant - more importantly if it
works fast in .NET 1.1 then it should not be much slower in .NET 2.0.
This seems to be a limitation to the quick sort algorithm in general.
Might be the old algorithm they were using worked slower on average with
other datasets? You're just unluck to now have a dataset that the new
version doesn't like. You could have had a dataset that sorted slow in
the 1.1 implementation and now sorts fast.

My only trouble with this, is if the new version is on average faster
why does the generic methods still sort using an old algorithm and
didn't use the new one?

If _all_ other .NET sorting methods (and its quicksort everywhere,
right?) were producing the same slow result on the same data, then I
could have accepted it, however there is really no justification for
such poor performance that appears to happen only if you specify
non-null comparer: this seems to lead to another code path (rightfully
so, since it allows to avoid check for custom comparer inside loop), but
that codepath is not as optimal as the other one.

In effect I assert here that the following code will use different code
paths and one of them is broken:

1) Array.Sort(Items,0,Items.Length,null);

// specify any comparer even your own and performance is much slower
// due to more comparisons
2) Array.Sort(Items,0,Items.Length,Comparer.Default);
not quite Array.Sort(string[] items) actually calls:
Array<string>.Sort(string[] items)
which in turn calls
Array<string>.Sort(items, 0, items.Length-1, Comparer<string>.Default)

Using Comparer.Default stops the call to Array.Sort(string[] items) from
being changed to the generic Array<string>.Sort(string[] items)

Hence why it is using a different code path as Array<string>.Sort and
Array.Sort use different quick sort algorithims.

Array.Sort uses Array+SortObjectHelper to sort the list
Array<T>.Sort uses ArraySortHelper to sort the list

The performance difference between the two lies in their choice of pivot
value.

ArraySortHelper just takes, (left + ((right - left) / 2), thus always
the middle value.

Array+SortObjectHelper compares the left value, middle value and right
value to decide which to use (median of three?).
I see no reason why mere presense of custom comparer (so long as it
compares in the same way) should make any difference on sorting.
It isn't directly, its how the compiler deals with generic types and
automagicaly making a call to Array<T> when it recognises the
parameters.

I know that source code of .NET 1.1 is available and can be checked, but
can the same be done for .NET 2.0, anyone got any pointers?


Reflector can look at both .NET 1.1 and .NET 2.0
Apr 30 '06 #28

Its getting hotter, or "better" and "better" if you like :)

I know some of you were not happy about two things in my original test case:
a) that my strings were already sorted (why sort again?)
b) possible changes in string comparisons in .NET 2

Ok, lets look at more generic example, using Array.Sort on integers in a
self contained test that does not rely on external data, here is output
from new test that only uses Array.Sort on integer array with numbers
that were generated in the test itself:

-----------------------------------------------
Running under .NET v2.0.50727.42
Sorting 1000000 elements, already sorted: NO
Time to sort with null: 46 msec
Time to sort with default comparer: 46 msec
Time to sort with custom comparer: 9734 msec, compares: 20572897
-----------------------------------------------

Note: already sorted here means that numbers were actually sorted but in
reverse order, ie descending, where as out sort rearranges them to be in
ascending order.

See how long it took to sort them in case of custom defined comparer
which actually does nothing else but calls method of Comparer.Default?

Now, here is indeed very interesting output from when you run it under
..NET 1.1:

-----------------------------------------------
Running under .NET v1.1.4322.2032
Sorting 1000000 elements, already sorted: NO
Time to sort with null comparer: 203 msec
Time to sort with default comparer: 218 msec
Time to sort with custom comparer: 4531 msec, compares: 19000036
-----------------------------------------------

Its also slow, albeit not as slow, in .NET 1.1!!! 8-/

This actually explains why I had some fairly slow running sorts with
custom comparers. Profiler appears to show in both cases that most of
time is spend in Array+SorterGenericArray.QuickSort(int left, int
right), or more specifically inside children of that method that I can't
see, but its not actually the custom comparison function that accounts
for less than 10% of the sorting time.

The key here is not the comparisons per se - but actual number of calls
to them: its very very very high, why would it be so unless that special
"generic array sort" code path is flawed?

Full code (includes earlier string test) here:
http://majestic12.co.uk/files/other/...bug/program.cs

Comment out #define DOTNET2 if you want it to run under .NET 1.1.

regards,

Alex
Apr 30 '06 #29
Chris Chilvers wrote:
The performance difference between the two lies in their choice of pivot
value.

ArraySortHelper just takes, (left + ((right - left) / 2), thus always
the middle value.
Array+SortObjectHelper compares the left value, middle value and right
value to decide which to use (median of three?).


But why have different choice of pivot values? It seems that
ArraySortHelper's way of doing things is better performance wise. IMO, a
call to a Sort routine should yield same performance for same data: just
because its generic object should not mean that potentially performance
of application can be degraded so badly.

I just posted message that seems to be fairly close to what you were
saying about those different sorters.

regards,

Alex
Apr 30 '06 #30
Interesting, I added a call to the generic compariter as well:

oTime = DateTime.Now;
Array.Sort(iData, 0, iData.Length, new
ComparerTest<int>());
Console.WriteLine("Time to sort with custom<int>
comparer: {0} msec, compares: {1}",
(DateTime.Now.Ticks - oTime.Ticks) /
TimeSpan.TicksPerMillisecond, ComparerTest<int>.iComparisonsNum);
CheckSortedOrder(iData);
and:
class ComparerTest<T> : IComparer<T> {
public static int iComparisonsNum = 0;

public int Compare(T a, T b) {
iComparisonsNum++;
return Comparer<T>.Default.Compare(a, b);
}
}
The results were:

Running under .NET v2.0.50727.42
Sorting 1000000 elements, already sorted: NO
Time to sort with null comparer: 46 msec
Time to sort with default comparer: 62 msec
Time to sort with custom<int> comparer: 796 msec, compares: 19000036
Time to sort with custom comparer: 9437 msec, compares: 20572897
Apr 30 '06 #31

Apparently relevant source code that is available here:
http://dotnet.di.unipi.it/content/ss...cs-source.html

Shows interesting internal call:

[MethodImplAttribute(MethodImplOptions.InternalCall )]
private static extern int TrySZSort(Array keys, Array items, int
left,int right);

It is presumably very fast and is only called by if Comparer is Default
or null.

Perhaps what we are having here is very poor performance of .NET code
versus very highly optimised sorting routine that is called in most
cases, but not when comparer is present.

What I think may have changed in .NET 2.0 is that this high performance
method was not called in all cases where it was in .NET 1.1 :-/
regards,

Alex
Apr 30 '06 #32
>-----------------------------------------------
Running under .NET v2.0.50727.42
Sorting 1000000 elements, already sorted: NO
Time to sort with null: 46 msec
Time to sort with default comparer: 46 msec
Time to sort with custom comparer: 9734 msec, compares: 20572897
-----------------------------------------------


I've located it, it seems with primative arrays it seems to call
TrySZSort

Running under .NET v2.0.50727.42
Sorting 1000000 elements, already sorted: YES
Time to sort with null comparer: 46 msec
Time to sort with default comparer: 46 msec
Time to sort with custom<int> comparer: 328 msec, compares: 0
Time to sort with custom comparer: 8937 msec, compares: 20572880
Time to sort with TrySZSort: 62 msec, retval = True
And the code:

oTime = DateTime.Now;
bool could_sort = (bool)
typeof(Array).InvokeMember("TrySZSort",
System.Reflection.BindingFlags.Static |
System.Reflection.BindingFlags.InvokeMethod |
System.Reflection.BindingFlags.Public |
System.Reflection.BindingFlags.NonPublic,
null, null, new object[] { iData, null, 0,
iData.Length - 1 });
Console.WriteLine("Time to sort with TrySZSort: {0}
msec, could_sort = {1}",
(DateTime.Now.Ticks - oTime.Ticks) /
TimeSpan.TicksPerMillisecond, could_sort);
if (could_sort) Checkcould_sortOrder(iData);

So now we have three code paths,

Array.Sort -> Array+SorterObjectArray
Array<T>.Sort -> ArraySortHelper<T>
Array.TrySZSort

Array.Sort will call TrySZSort so long as:
comparer != null && comparer != Comparer.Default

TrySZSort must be using native code and looking at the raw memory values
to avoid having to box it into an object.

From
http://discuss.develop.com/archives/...F=P&S=&P=33045

An SZArray is just an array with only one dimension and the lower bound
is 0.
Apr 30 '06 #33
Chris Chilvers wrote:
So now we have three code paths,

Array.Sort -> Array+SorterObjectArray
Array<T>.Sort -> ArraySortHelper<T>
Array.TrySZSort


There are seems to be 3 code paths in Array.Sort (non generic version):

1) TrySZSort - fastest, native code
2) Array+SorterObjectArray: slow .NET, but still not the worst case as
no GetValue/SetValue functions are used: only called if .NET can get
what appears to be pointer to that array of some kind, using the
following code: Object[] objKeys = keys as Object[];
3) Array+SorterGenericArray: last resort - slowest .NET version that
seems to be called in our case

It may well be that the reason it happens in ArrayList.Sort due to
non-Default comparer used which forces it to use slowest path, IMO, no
big reason for change of behavior to that present in .NET 1.1

All in all I am now getting better understanding how this sorting is
done, not sure now if I should submit a bug or not :-/

Alex
Apr 30 '06 #34
On Sun, 30 Apr 2006 18:55:10 +0100, Alex Chudnovsky
<al***@majestic12.co.uk> wrote:
Chris Chilvers wrote:
So now we have three code paths,

Array.Sort -> Array+SorterObjectArray
Array<T>.Sort -> ArraySortHelper<T>
Array.TrySZSort


There are seems to be 3 code paths in Array.Sort (non generic version):

1) TrySZSort - fastest, native code
2) Array+SorterObjectArray: slow .NET, but still not the worst case as
no GetValue/SetValue functions are used: only called if .NET can get
what appears to be pointer to that array of some kind, using the
following code: Object[] objKeys = keys as Object[];
3) Array+SorterGenericArray: last resort - slowest .NET version that
seems to be called in our case

It may well be that the reason it happens in ArrayList.Sort due to
non-Default comparer used which forces it to use slowest path, IMO, no
big reason for change of behavior to that present in .NET 1.1

All in all I am now getting better understanding how this sorting is
done, not sure now if I should submit a bug or not :-/

Alex


The trouble is implementing sorting whilst keeping it generic.
Unfortunatly the best sorting method often depends upon the data and how
sorted the data is already.

My only question is the choice of which value should be for the pivot.
This seems to be what slows it down the most with your original data.
Why don't both sorts use the same method of choosing a pivot? At least
your sort would then be consitantly slow :)
Apr 30 '06 #35
Chris Chilvers wrote:
My only question is the choice of which value should be for the pivot.
This seems to be what slows it down the most with your original data.


But the issue does not appear to be the pivot - its that native call is
not made for some reason in ArrayList.Sort in .NET 2.0 (at least for
strings): perhaps that secret native sorting method is also smarter at
pivots, we don't really know (and I am not desperate enough to get into
disassembly).

Its tempting for me to leave it as is :-/

regards,

Alex
Apr 30 '06 #36
On Sun, 30 Apr 2006 19:30:03 +0100, Alex Chudnovsky
<al***@majestic12.co.uk> wrote:
Chris Chilvers wrote:
My only question is the choice of which value should be for the pivot.
This seems to be what slows it down the most with your original data.


But the issue does not appear to be the pivot - its that native call is
not made for some reason in ArrayList.Sort in .NET 2.0 (at least for
strings): perhaps that secret native sorting method is also smarter at
pivots, we don't really know (and I am not desperate enough to get into
disassembly).

Its tempting for me to leave it as is :-/

regards,

Alex


That would probaly be because the string is not a primative object. I'd
guess that the TrySZSort is capable of sorting primatives such as
short/int/long/char/etc.
Apr 30 '06 #37
Chris Chilvers wrote:
That would probaly be because the string is not a primative object. I'd
guess that the TrySZSort is capable of sorting primatives such as
short/int/long/char/etc.


TrySZSort seems to work on strings: that's why (I think) sorting is fast
when you use Array.Sort on string array, what I think happens is that
ArrayList.Sort in .NET will specify Comparer that is not Default, this
makes Array.Sort method to avoid using TrySZSort because it can't run
with custom comparers, thus forcing down the route of much slower native
..NET implementation, and it seems that its the slowest (Generic sorter)
that actually runs rather than 2nd slowest (Object sorter).

It also raises question as to why would TrySZSort be _that_ much faster,
sure optimal use of registers etc, but perhaps it has got some
algorithmic changes that make it faster than the code that we can see?

The issue is kind of still here: strings loaded into ArrayList will
appear to get very different sorting treatment in .NET 2.0 rather than
that in .NET 1.1. If that's the true reason then I wonder whether it was
totally necessary to specify non-default comparer in ArrayList. It still
feels like we have not reached total clarity as to what happens inside :-/

Alex
Apr 30 '06 #38
I am not sure where you are coming up with this ..
not quite Array.Sort(string[] items) actually calls:
Array<string>.Sort(string[] items)
It seems it is being sorted by an unmanaged method called TrySZSort before
trying a managed sort.

Here is the relevant reflectorred code.

if ((length > 1) && (((comparer != Comparer.Default) && (comparer !=
null)) || !Array.TrySZSort(keys, items, index, (index + length) - 1)))
{
object[] objArray1 = keys as object[];
object[] objArray2 = null;
if (objArray1 != null)
{
objArray2 = items as object[];
}
if ((objArray1 != null) && ((items == null) || (objArray2 !=
null)))
{
Array.SorterObjectArray array1 = new
Array.SorterObjectArray(objArray1, objArray2, comparer);
array1.QuickSort(index, (index + length) - 1);
}
else
{
Array.SorterGenericArray array2 = new
Array.SorterGenericArray(keys, items, comparer);
array2.QuickSort(index, (index + length) - 1);
}
}
If the comparer is null or default it should be falling out to the unmanaged
code..

What is really interesting (and I included it with the test program I sent
in) is the call ...

oLines.Sort(0, oLines.Count - 1, null);

arraylist.sort calls .. Array.Sort(this._items, index, count, comparer);

Thus producing the exact same array.sort call I showed earlier which was
fast, yet it comes out as slow ...

Cheers,

Greg

"Chris Chilvers" <ke****@dynafus.com> wrote in message
news:94********************************@4ax.com... On Sun, 30 Apr 2006 16:52:41 +0100, Alex Chudnovsky
<al***@majestic12.co.uk> wrote:

again, however this is not really relevant - more importantly if it
works fast in .NET 1.1 then it should not be much slower in .NET 2.0.


This seems to be a limitation to the quick sort algorithm in general.
Might be the old algorithm they were using worked slower on average with
other datasets? You're just unluck to now have a dataset that the new
version doesn't like. You could have had a dataset that sorted slow in
the 1.1 implementation and now sorts fast.

My only trouble with this, is if the new version is on average faster
why does the generic methods still sort using an old algorithm and
didn't use the new one?

If _all_ other .NET sorting methods (and its quicksort everywhere,
right?) were producing the same slow result on the same data, then I
could have accepted it, however there is really no justification for
such poor performance that appears to happen only if you specify
non-null comparer: this seems to lead to another code path (rightfully
so, since it allows to avoid check for custom comparer inside loop), but
that codepath is not as optimal as the other one.

In effect I assert here that the following code will use different code
paths and one of them is broken:

1) Array.Sort(Items,0,Items.Length,null);

// specify any comparer even your own and performance is much slower
// due to more comparisons
2) Array.Sort(Items,0,Items.Length,Comparer.Default);


not quite Array.Sort(string[] items) actually calls:
Array<string>.Sort(string[] items)
which in turn calls
Array<string>.Sort(items, 0, items.Length-1, Comparer<string>.Default)

Using Comparer.Default stops the call to Array.Sort(string[] items) from
being changed to the generic Array<string>.Sort(string[] items)

Hence why it is using a different code path as Array<string>.Sort and
Array.Sort use different quick sort algorithims.

Array.Sort uses Array+SortObjectHelper to sort the list
Array<T>.Sort uses ArraySortHelper to sort the list

The performance difference between the two lies in their choice of pivot
value.

ArraySortHelper just takes, (left + ((right - left) / 2), thus always
the middle value.

Array+SortObjectHelper compares the left value, middle value and right
value to decide which to use (median of three?).
I see no reason why mere presense of custom comparer (so long as it
compares in the same way) should make any difference on sorting.


It isn't directly, its how the compiler deals with generic types and
automagicaly making a call to Array<T> when it recognises the
parameters.

I know that source code of .NET 1.1 is available and can be checked, but
can the same be done for .NET 2.0, anyone got any pointers?


Reflector can look at both .NET 1.1 and .NET 2.0

Apr 30 '06 #39
Hey guys I have already submitted a bug report based upon this ..

the exact behavior I submitted is what you are getting at now ...

Array.Sort(array,0,length, null) //fast
Array.Sort(array,0,length,DefaultComparer)//slow
ArrayList.Sort(0,length,null) //should generate same call as first (fast
generates slow call)

Cheers,

Greg

"Alex Chudnovsky" <al***@majestic12.co.uk> wrote in message
news:OP*************@TK2MSFTNGP02.phx.gbl...
Chris Chilvers wrote:
That would probaly be because the string is not a primative object. I'd
guess that the TrySZSort is capable of sorting primatives such as
short/int/long/char/etc.


TrySZSort seems to work on strings: that's why (I think) sorting is fast
when you use Array.Sort on string array, what I think happens is that
ArrayList.Sort in .NET will specify Comparer that is not Default, this
makes Array.Sort method to avoid using TrySZSort because it can't run with
custom comparers, thus forcing down the route of much slower native .NET
implementation, and it seems that its the slowest (Generic sorter) that
actually runs rather than 2nd slowest (Object sorter).

It also raises question as to why would TrySZSort be _that_ much faster,
sure optimal use of registers etc, but perhaps it has got some algorithmic
changes that make it faster than the code that we can see?

The issue is kind of still here: strings loaded into ArrayList will appear
to get very different sorting treatment in .NET 2.0 rather than that in
.NET 1.1. If that's the true reason then I wonder whether it was totally
necessary to specify non-default comparer in ArrayList. It still feels
like we have not reached total clarity as to what happens inside :-/

Alex

Apr 30 '06 #40
The StringComaparer did have an improvement Frans as I pointed out earlier
(64 seconds to 8 seconds here) .. that is however not the major problem. I
have placed a bug report on this already.
"Frans Bouma [C# MVP]" <pe******************@xs4all.nl> wrote in message
news:xn***************@news.microsoft.com...
Alex Chudnovsky wrote:
"Frans Bouma [C# MVP]" wrote:
> I.o.w.: your sorts aren't exactly the same: the array sort is
> sorting an array of strings. The arraylist sort is sorting an array
> of objects using a comparer.


I am aware of that - I mentioned boxing/unboxing overheads. The
issues are: a) its a LOT slower than in .NET 1.1 <-- this alone
makes it unacceptable b) its DRAMATICALLY slower with some data


NO. There's no boxing involved but you ignore the fact that the
Array.Sort() call YOU make is completely different from the
Array.Sort() call the ArrayList.Sort() makes as the ArrayList uses a
general comparer, not a string specific comparer!
> Strange, as ArrayList.Sort uses Array.Sort() :D (use reflector, and
> see for yourself). So it's not the sort that's flawed, as it's
> precisely the same routine that gets executed.


I can't say for sure what's flawed but something certainly is, its no
so much slower performance of ArrayList.Sort that bothers me, its
much slower performance than same code for .NET 1.1,


This is logical, as the comparers for strings etc. have changed in
.NET 2.0.

I've a tight schedule so I won't run your code, though if you pass a
string specific comparer to the ArrayList.Sort call, you'll likely see
that the results are vastly improving.

That's what I was trying to illustrate. Your calls show different
results, and that's understandable because they're actually doing
completely different things, however you insist that they're the same.
Well, they're not.

FB

ps: majestic12... the old amiga group?
--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------

Apr 30 '06 #41
Greg Young wrote:
Hey guys I have already submitted a bug report based upon this ..

the exact behavior I submitted is what you are getting at now ...

Array.Sort(array,0,length, null) //fast
Array.Sort(array,0,length,DefaultComparer)//slow
ArrayList.Sort(0,length,null) //should generate same call as first (fast
generates slow call)


Okay, so what would happen now - bug will be checked and you will get
some kind of feedback on if/how/when it will be fixed? If so please
share it, for now I am satisfied with available workarounds.

cheers,

Alex
Apr 30 '06 #42
On Sun, 30 Apr 2006 16:40:06 -0400, "Greg Young"
<Dr*************@hotmail.com> wrote:
Array.Sort(array,0,length, null) //fast
ArrayList.Sort(0,length,null) //should generate same call as first (fast
generates slow call)

These two do not call the same method

The same as:
Array.Sort(string[] items)
Array.Sort(string[] items, 0, items.length-1, Comparer.Default)

Are not the same methods

the compiler sees Array.Sort(string[]) and realises it can instead call
the generic version Array.Sort<string>(...)

The same happens when you use the comparer as null. Because null does
not have a type it can match null to IComparer<string>, thus can then
call Array.Sort<string>(string[], int, int, IComparator<string) instead
of Array.Sort(string, int, int, IComparator)

Try calling:
Array.Sort(array, 0, length, (IComparer)null)
now we're giving null an explicit type so that we for it to use the
non-generic version of Sort.

I've tried calling TrySZSort with a string array and it cannot sort it.
And Array.Sort with a string uses ArrayObjectSorter.

==========================

// check null comparer here
oTime = DateTime.Now;
Array.Sort(Items2, 0, Items.Length, (IComparer)null);
Console.WriteLine("Time to sort with Null Comparer: {0}
msec",
(DateTime.Now.Ticks - oTime.Ticks) /
TimeSpan.TicksPerMillisecond);

Time to sort with Null Comparer: 29859 msec
With both Array.Sort and Array.Sort<T> you really have to becarful which
version is being called, as even if you don't specify a generic, the
compiler can infer it from the parameters.

Thus if you had the generic method:

void Push<T>(IList<T> list, params T items[])
....

IList<int> items = new IList<int>();

Push<int>(items, 1, 2 3);

OR

Push(items, 1, 2 3)

In the second instance the compiler sees that all the parameters match
for the required type <T> thus it infers the call to Push<int>, even if
there is also a non generic method call Push as long as the signature
isn't an exact match.

To demonstrate:
static void Main(string[] args) {
Test<int>(1);
Test(1);
Test((object)1);
Test<string>("abc");
Test("abc");
Test((object)"abc");
Console.WriteLine("Press ENTER to exit");
Console.ReadLine();
}

static void Test<T>(T value) {
Console.WriteLine("Test<T>(T), value = {0}", value);
}

static void Test(object value) {
Console.WriteLine("Test(object), value = {0}", value);
}

static void Test(string value) {
Console.WriteLine("Test(string), value = {0}", value);
}
OUTPUT:
Test<T>(T), value = 1
Test<T>(T), value = 1
Test(object), value = 1
Test<T>(T), value = abc
Test(string), value = abc
Test(object), value = abc
Press ENTER to exit
May 1 '06 #43
On Sun, 30 Apr 2006 16:10:21 -0400, "Greg Young"
<Dr*************@hotmail.com> wrote:
I am not sure where you are coming up with this ..
not quite Array.Sort(string[] items) actually calls:
Array<string>.Sort(string[] items)


Referer to 20.6.4 of
http://download.microsoft.com/downlo...2.doc#01000021
May 1 '06 #44
I understand type inference ...

Array<string>.Sort(string[] items) ..

Correct me if I am wrong but the last time I checked, array was not
implemented as a generic type.

Cheers,

Greg

20.6.4: When a generic method is called without specifying type arguments, a
type inference process attempts to infer type arguments for the call. The
presence of type inference allows a more convenient syntax to be used for
calling a generic method, and allows the programmer to avoid specifying
redundant type information. For example, given the method declaration:

class Util
{
static Random rand = new Random();

static public T Choose<T>(T first, T second) {
return (rand.Next(2) == 0)? first: second;
}
}

it is possible to invoke the Choose method without explicitly specifying a
type argument:

int i = Util.Choose(5, 213); // Calls Choose<int>

string s = Util.Choose("foo", "bar"); // Calls Choose<string>

Chris Chilvers" <ke****@dynafus.com> wrote in message
news:5u********************************@4ax.com...
On Sun, 30 Apr 2006 16:10:21 -0400, "Greg Young"
<Dr*************@hotmail.com> wrote:
I am not sure where you are coming up with this ..
not quite Array.Sort(string[] items) actually calls:
Array<string>.Sort(string[] items)


Referer to 20.6.4 of
http://download.microsoft.com/downlo...2.doc#01000021

May 1 '06 #45
Sorry I said that wrong ... Array.Sort is not implemented as a generic
method in this case.

also the array being passed (from an arraylist) will be object [] not string
[]

Cheers,

Greg
"Greg Young" <Dr*************@hotmail.com> wrote in message
news:O9**************@TK2MSFTNGP02.phx.gbl...
I understand type inference ...

Array<string>.Sort(string[] items) ..

Correct me if I am wrong but the last time I checked, array was not
implemented as a generic type.

Cheers,

Greg

20.6.4: When a generic method is called without specifying type arguments,
a type inference process attempts to infer type arguments for the call.
The presence of type inference allows a more convenient syntax to be used
for calling a generic method, and allows the programmer to avoid
specifying redundant type information. For example, given the method
declaration:

class Util
{
static Random rand = new Random();

static public T Choose<T>(T first, T second) {
return (rand.Next(2) == 0)? first: second;
}
}

it is possible to invoke the Choose method without explicitly specifying a
type argument:

int i = Util.Choose(5, 213); // Calls Choose<int>

string s = Util.Choose("foo", "bar"); // Calls Choose<string>

Chris Chilvers" <ke****@dynafus.com> wrote in message
news:5u********************************@4ax.com...
On Sun, 30 Apr 2006 16:10:21 -0400, "Greg Young"
<Dr*************@hotmail.com> wrote:
I am not sure where you are coming up with this ..

not quite Array.Sort(string[] items) actually calls:
Array<string>.Sort(string[] items)


Referer to 20.6.4 of
http://download.microsoft.com/downlo...2.doc#01000021


May 1 '06 #46
Chris ..

The two have identical signatures ...

ArrayList.Sort in turn calls ...

reflectorred :
public virtual void Sort(int index, int count, IComparer comparer)
{
if ((index < 0) || (count < 0))
{
throw new ArgumentOutOfRangeException((index < 0) ? "index" :
"count", Environment.GetResourceString("ArgumentOutOfRange_ NeedNonNegNum"));
}
if ((this._size - index) < count)
{
throw new
ArgumentException(Environment.GetResourceString("A rgument_InvalidOffLen"));
}
Array.Sort(this._items, index, count, comparer);
this._version++;
}
equates to ...
Array.Sort(object [], int, int, null)
original call was
Array.Sort(array,0,length, null) //fast
which is
Array.Sort(object [], int, int, null)

in both cases the array is of type object [].

Greg

"Chris Chilvers" <ke****@dynafus.com> wrote in message
news:2a********************************@4ax.com... On Sun, 30 Apr 2006 16:40:06 -0400, "Greg Young"
<Dr*************@hotmail.com> wrote:
Array.Sort(array,0,length, null) //fast
ArrayList.Sort(0,length,null) //should generate same call as first (fast
generates slow call)

These two do not call the same method

The same as:
Array.Sort(string[] items)
Array.Sort(string[] items, 0, items.length-1, Comparer.Default)

Are not the same methods

the compiler sees Array.Sort(string[]) and realises it can instead call
the generic version Array.Sort<string>(...)

The same happens when you use the comparer as null. Because null does
not have a type it can match null to IComparer<string>, thus can then
call Array.Sort<string>(string[], int, int, IComparator<string) instead
of Array.Sort(string, int, int, IComparator)

Try calling:
Array.Sort(array, 0, length, (IComparer)null)
now we're giving null an explicit type so that we for it to use the
non-generic version of Sort.

I've tried calling TrySZSort with a string array and it cannot sort it.
And Array.Sort with a string uses ArrayObjectSorter.

==========================

// check null comparer here
oTime = DateTime.Now;
Array.Sort(Items2, 0, Items.Length, (IComparer)null);
Console.WriteLine("Time to sort with Null Comparer: {0}
msec",
(DateTime.Now.Ticks - oTime.Ticks) /
TimeSpan.TicksPerMillisecond);

Time to sort with Null Comparer: 29859 msec
With both Array.Sort and Array.Sort<T> you really have to becarful which
version is being called, as even if you don't specify a generic, the
compiler can infer it from the parameters.

Thus if you had the generic method:

void Push<T>(IList<T> list, params T items[])
...

IList<int> items = new IList<int>();

Push<int>(items, 1, 2 3);

OR

Push(items, 1, 2 3)

In the second instance the compiler sees that all the parameters match
for the required type <T> thus it infers the call to Push<int>, even if
there is also a non generic method call Push as long as the signature
isn't an exact match.

To demonstrate:
static void Main(string[] args) {
Test<int>(1);
Test(1);
Test((object)1);
Test<string>("abc");
Test("abc");
Test((object)"abc");
Console.WriteLine("Press ENTER to exit");
Console.ReadLine();
}

static void Test<T>(T value) {
Console.WriteLine("Test<T>(T), value = {0}", value);
}

static void Test(object value) {
Console.WriteLine("Test(object), value = {0}", value);
}

static void Test(string value) {
Console.WriteLine("Test(string), value = {0}", value);
}
OUTPUT:
Test<T>(T), value = 1
Test<T>(T), value = 1
Test(object), value = 1
Test<T>(T), value = abc
Test(string), value = abc
Test(object), value = abc
Press ENTER to exit

May 1 '06 #47
Greg Young wrote:
The StringComaparer did have an improvement Frans as I pointed out
earlier (64 seconds to 8 seconds here) .. that is however not the
major problem. I have placed a bug report on this already.
As both use Array.Sort(), I'd be surprised they will find this a bug.
As I said: you're not comparing 2 equal things.

FB


"Frans Bouma [C# MVP]" <pe******************@xs4all.nl> wrote in
message news:xn***************@news.microsoft.com...
Alex Chudnovsky wrote:
"Frans Bouma [C# MVP]" wrote:

> I.o.w.: your sorts aren't exactly the same: the array sort is
> sorting an array of strings. The arraylist sort is sorting an array >> > of objects using a comparer.
I am aware of that - I mentioned boxing/unboxing overheads. The
issues are: a) its a LOT slower than in .NET 1.1 <-- this alone
makes it unacceptable b) its DRAMATICALLY slower with some data


NO. There's no boxing involved but you ignore the fact that the
Array.Sort() call YOU make is completely different from the
Array.Sort() call the ArrayList.Sort() makes as the ArrayList uses a
general comparer, not a string specific comparer!
> Strange, as ArrayList.Sort uses Array.Sort() :D (use reflector, and >> > see for yourself). So it's not the sort that's flawed, as
it's >> > precisely the same routine that gets executed.
I can't say for sure what's flawed but something certainly is, its

no >> so much slower performance of ArrayList.Sort that bothers me,
its >> much slower performance than same code for .NET 1.1,

This is logical, as the comparers for strings etc. have changed in
.NET 2.0.

I've a tight schedule so I won't run your code, though if you pass a
string specific comparer to the ArrayList.Sort call, you'll likely
see that the results are vastly improving.

That's what I was trying to illustrate. Your calls show different
results, and that's understandable because they're actually doing
completely different things, however you insist that they're the
same. Well, they're not.

--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------
May 1 '06 #48
After reading some suggestions about the possibility of differring call
which I at first shrugged off then accepted I went and started munging
through the IL in question ..
slow ...
IL_0083: ldsfld class [mscorlib]System.Collections.Comparer
[mscorlib]System.Collections.Comparer::Default

IL_0088: call void [mscorlib]System.Array::Sort(class
[mscorlib]System.Array,

int32,

int32,

class [mscorlib]System.Collections.IComparer)

fast

IL_00c9: ldnull

IL_00ca: call void [mscorlib]System.Array::Sort<object>(!!0[],

int32,

int32,

class [mscorlib]System.Collections.Generic.IComparer`1<!!0>)

Changes to USE a generic?!?! given the null? This is very interesting ...

I saw the following ..

if (comparer == null)
{
comparer = Comparer.Default;
}
in reflector and figured that would be picked up later ... but I guess the
compiler realized it...

if we flip a little class such as ..
public class TestComparer : IComparer<object>

{

public int Compare(object x, object y)

{

return ((IComparable) x).CompareTo(y);

}

}

We can then force the first slow array method to use the generic version.
and it speeds up from 60 seconds to 300 ms

Now for the kicker ... we have ArrayList.Sort

The one with a parameter always uses the slow call

IL_0000: ldarg.0

IL_0001: ldc.i4.0

IL_0002: ldarg.0

IL_0003: callvirt instance int32 System.Collections.ArrayList::get_Count()

IL_0008: ldsfld class System.Collections.Comparer
System.Collections.Comparer::Default

IL_000d: callvirt instance void System.Collections.ArrayList::Sort(int32,

int32,

class System.Collections.IComparer)

IL_0012: ret

Parameterless

IL_004b: ldarg.3

IL_004c: call void System.Array::Sort(class System.Array,

int32,

int32,

class System.Collections.IComparer)

is the ArrayLists parameterless sort (which if it passed null would use the
fast version) and would be just as correct as passing Comparer.Default
(since that is the default anyways:))

The arraylist.Sort however is doomed to always be extremely slow as there is
no way for us to force this to happen :(
I am also questioning a bit whether there is another bug somewhere deeper
.... 60 seconds vs 250 ms is not a small problem

So after all of this I would say tht yes we have a bug since passing null in
ArrayList.Sort() instead of Comparer.Default would give us a net gain of
about 19,900% on my machine.

Cheers,

Greg

"Chris Chilvers" <ke****@dynafus.com> wrote in message
news:2a********************************@4ax.com...
On Sun, 30 Apr 2006 16:40:06 -0400, "Greg Young"
<Dr*************@hotmail.com> wrote:
Array.Sort(array,0,length, null) //fast
ArrayList.Sort(0,length,null) //should generate same call as first (fast
generates slow call)

These two do not call the same method

The same as:
Array.Sort(string[] items)
Array.Sort(string[] items, 0, items.length-1, Comparer.Default)

Are not the same methods

the compiler sees Array.Sort(string[]) and realises it can instead call
the generic version Array.Sort<string>(...)

The same happens when you use the comparer as null. Because null does
not have a type it can match null to IComparer<string>, thus can then
call Array.Sort<string>(string[], int, int, IComparator<string) instead
of Array.Sort(string, int, int, IComparator)

Try calling:
Array.Sort(array, 0, length, (IComparer)null)
now we're giving null an explicit type so that we for it to use the
non-generic version of Sort.

I've tried calling TrySZSort with a string array and it cannot sort it.
And Array.Sort with a string uses ArrayObjectSorter.

==========================

// check null comparer here
oTime = DateTime.Now;
Array.Sort(Items2, 0, Items.Length, (IComparer)null);
Console.WriteLine("Time to sort with Null Comparer: {0}
msec",
(DateTime.Now.Ticks - oTime.Ticks) /
TimeSpan.TicksPerMillisecond);

Time to sort with Null Comparer: 29859 msec
With both Array.Sort and Array.Sort<T> you really have to becarful which
version is being called, as even if you don't specify a generic, the
compiler can infer it from the parameters.

Thus if you had the generic method:

void Push<T>(IList<T> list, params T items[])
...

IList<int> items = new IList<int>();

Push<int>(items, 1, 2 3);

OR

Push(items, 1, 2 3)

In the second instance the compiler sees that all the parameters match
for the required type <T> thus it infers the call to Push<int>, even if
there is also a non generic method call Push as long as the signature
isn't an exact match.

To demonstrate:
static void Main(string[] args) {
Test<int>(1);
Test(1);
Test((object)1);
Test<string>("abc");
Test("abc");
Test((object)"abc");
Console.WriteLine("Press ENTER to exit");
Console.ReadLine();
}

static void Test<T>(T value) {
Console.WriteLine("Test<T>(T), value = {0}", value);
}

static void Test(object value) {
Console.WriteLine("Test(object), value = {0}", value);
}

static void Test(string value) {
Console.WriteLine("Test(string), value = {0}", value);
}
OUTPUT:
Test<T>(T), value = 1
Test<T>(T), value = 1
Test(object), value = 1
Test<T>(T), value = abc
Test(string), value = abc
Test(object), value = abc
Press ENTER to exit

May 1 '06 #49

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

4
by: herbert | last post by:
1) I created a class Person whose objects are stored in collections. Using IComparable allows to define one sort criteria like name. Best practice: How can I add more sort criterias like age, zip,...
0
by: DolphinDB | last post by:
Tired of spending countless mintues downsampling your data? Look no further! In this article, you’ll learn how to efficiently downsample 6.48 billion high-frequency records to 61 million...
0
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
1
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
0
by: Vimpel783 | last post by:
Hello! Guys, I found this code on the Internet, but I need to modify it a little. It works well, the problem is this: Data is sent from only one cell, in this case B5, but it is necessary that data...
0
by: ArrayDB | last post by:
The error message I've encountered is; ERROR:root:Error generating model response: exception: access violation writing 0x0000000000005140, which seems to be indicative of an access violation...
1
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
1
by: Defcon1945 | last post by:
I'm trying to learn Python using Pycharm but import shutil doesn't work
0
by: Faith0G | last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome former...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.