473,545 Members | 721 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

.NET 2.0 performance bug in ArrayList.Sort

I have come across with what appears to be a significant performance bug in
..NET 2.0 ArrayList.Sort method when compared with Array.Sort on the same
data. Same data on the same CPU gets sorted a lot faster with both methods
using .NET 1.1, that's why I am pretty sure its a (rather serious) bug. Below
you can find C# test case that should allow you to reproduce this error, to
run it you will need to put 2 data files into current directory where
executable is or just change filename pathes accordingly, the data files can
be obtained from here:

fast_data.txt: http://majestic12.co.uk/files/other/.../fast_data.txt
slow_data.txt: http://majestic12.co.uk/files/other/.../slow_data.txt

The data are strings (URLs) of about similar size in bytes and number.

The following are the console outputs from code on the same machine (AMD x2
3800, 2 GB RAM, XP Pro SP2):

VS2003 .NET 1.1 (with SP1) run:

-----------------------------------------------------------
Loaded 29974 lines from file slow_data.txt
Time to sort strings in ArrayList is: 250 msec
Time to sort strings in string[] array is: 234 msec
Loaded 31688 lines from file fast_data.txt
Time to sort strings in ArrayList is: 250 msec
Time to sort strings in string[] array is: 250 msec
-----------------------------------------------------------

Note that sorting times are almost exactly the same here, so all good in
..NET 1.1 .

-----------------------------------------------------------
VS2005 .NET 2.0 run:

Loaded 29974 lines from file slow_data.txt
Time to sort strings in ArrayList is: 1531 msec
Time to sort strings in string[] array is: 187 msec
Loaded 31688 lines from file fast_data.txt
Time to sort strings in ArrayList is: 703 msec
Time to sort strings in string[] array is: 171 msec
Press ENTER to exit
-----------------------------------------------------------

Notice that on the same machine with the same data sorting times are MUCH
slower in ArrayList.Sort, and particularly for the "slow_data. txt" file,
Array.Sort times are actually better, so I am not complaining there, but
clearly ArrayList sorts are seriously flawed - this appears to be data
dependent and by that I don't mean size of the data or number of strings: I
have got lots of such data files and about every 10th of them is 10-20 times
slower than the other even though it has got about the same number of lines
and bytesize.

Note: I am aware of boxing/unboxing overheads when dealing with ArrayLists,
however in this case the slowdown is really bad comparing to .NET 1.1 and it
appears to be data dependent - I am getting it on about 10% of my dataset
from which I have selected 2 examples (slow and fast) to demonstrate that its
a very serious performance issue.

Here is the code that should allow you to replicate the problem:

//////////////////////////////////////////////////////////////////////////
using System;
using System.Collecti ons;
using System.IO;

/*
This is a test case of what appears to be a major performance issue in
ArrayList.Sort method for strings
in .NET 2.0 - it appears to be data dependant as some similarly sized
data files have got a lot less
performance penalty when sorting them using Array.Sort method on
string[] array.

The kicker: both versions run fast in .NET 1.1

Author: Alex Chudnovsky <al***@majestic 12.co.uk>
Date: 28 Apr 2006
*/
namespace Majestic12
{
/// <summary>
/// Tests sorting performance of Array.Sort of string[] versus
ArrayList.Sort of the same strings
/// It appears that in .NET 2.0 in some cases ArrayList will take a LOT
more time to do the sorting
/// </summary>
class SlowArrayListSo rtTest
{
static void Main(string[] args)
{
// load strings from file: assumed to be in the same place as
the executable
TestFile("slow_ data.txt"); // <--- this data file has got 10
times slower
TestFile("fast_ data.txt"); // <--- more reasonable 2 times slower

Console.WriteLi ne("Press ENTER to exit");
Console.ReadLin e();
}

static void TestFile(string sFile)
{
FileStream oFS=File.OpenRe ad(sFile);

ArrayList oLines=new ArrayList();

StreamReader oSR=new StreamReader(oF S);

while(oSR.BaseS tream.Position< oSR.BaseStream. Length)
{
oLines.Add(oSR. ReadLine());
}

oFS.Close();

Console.WriteLi ne("Loaded {0} lines from file
{1}",oLines.Cou nt,sFile);

// now copy same strings into string array for speed comparisons
string[] sLines=new string[oLines.Count];

for(int i=0;i<sLines.Le ngth;i++)
sLines[i]=(string)oLines[i];

DateTime oTime=DateTime. Now;
oLines.Sort();
Console.WriteLi ne("Time to sort strings in ArrayList is: {0}
msec",(DateTime .Now.Ticks-oTime.Ticks)/TimeSpan.TicksP erMillisecond);

oTime=DateTime. Now;
Array.Sort(sLin es);
Console.WriteLi ne("Time to sort strings in string[] array is:
{0} msec",(DateTime .Now.Ticks-oTime.Ticks)/TimeSpan.TicksP erMillisecond);
}
}
}

//////////////////////////////////////////////////////////////////////////
Apr 28 '06 #1
48 4412
Alex Chudnovsky wrote:
I have come across with what appears to be a significant performance
bug in .NET 2.0 ArrayList.Sort method when compared with Array.Sort
on the same data. Same data on the same CPU gets sorted a lot faster
with both methods using .NET 1.1, that's why I am pretty sure its a
(rather serious) bug. Below you can find C# test case that should
allow you to reproduce this error, to run it you will need to put 2
data files into current directory where executable is or just change
filename pathes accordingly, the data files can be obtained from here:

fast_data.txt:
http://majestic12.co.uk/files/other/.../fast_data.txt
slow_data.txt:
http://majestic12.co.uk/files/other/.../slow_data.txt

The data are strings (URLs) of about similar size in bytes and number.

The following are the console outputs from code on the same machine
(AMD x2 3800, 2 GB RAM, XP Pro SP2):

VS2003 .NET 1.1 (with SP1) run:

-----------------------------------------------------------
Loaded 29974 lines from file slow_data.txt
Time to sort strings in ArrayList is: 250 msec
Time to sort strings in string[] array is: 234 msec
Loaded 31688 lines from file fast_data.txt
Time to sort strings in ArrayList is: 250 msec
Time to sort strings in string[] array is: 250 msec
-----------------------------------------------------------

Note that sorting times are almost exactly the same here, so all good
in .NET 1.1 .

-----------------------------------------------------------
VS2005 .NET 2.0 run:

Loaded 29974 lines from file slow_data.txt
Time to sort strings in ArrayList is: 1531 msec
Time to sort strings in string[] array is: 187 msec
Loaded 31688 lines from file fast_data.txt
Time to sort strings in ArrayList is: 703 msec
Time to sort strings in string[] array is: 171 msec
Press ENTER to exit
-----------------------------------------------------------

Notice that on the same machine with the same data sorting times are
MUCH slower in ArrayList.Sort, and particularly for the
"slow_data. txt" file, Array.Sort times are actually better, so I am
not complaining there, but clearly ArrayList sorts are seriously
flawed - this appears to be data dependent and by that I don't mean
size of the data or number of strings: I have got lots of such data
files and about every 10th of them is 10-20 times slower than the
other even though it has got about the same number of lines and
bytesize.

Note: I am aware of boxing/unboxing overheads when dealing with
ArrayLists, however in this case the slowdown is really bad comparing
to .NET 1.1 and it appears to be data dependent - I am getting it on
about 10% of my dataset from which I have selected 2 examples (slow
and fast) to demonstrate that its a very serious performance issue.

Here is the code that should allow you to replicate the problem:

//////////////////////////////////////////////////////////////////////
//// using System;
using System.Collecti ons;
using System.IO;

/*
This is a test case of what appears to be a major performance
issue in ArrayList.Sort method for strings
in .NET 2.0 - it appears to be data dependant as some similarly
sized data files have got a lot less
performance penalty when sorting them using Array.Sort method on
string[] array.

The kicker: both versions run fast in .NET 1.1

Author: Alex Chudnovsky <al***@majestic 12.co.uk>
Date: 28 Apr 2006
*/
namespace Majestic12
{
/// <summary>
/// Tests sorting performance of Array.Sort of string[] versus
ArrayList.Sort of the same strings
/// It appears that in .NET 2.0 in some cases ArrayList will take
a LOT more time to do the sorting
/// </summary>
class SlowArrayListSo rtTest
{
static void Main(string[] args)
{
// load strings from file: assumed to be in the same
place as the executable
TestFile("slow_ data.txt"); // <--- this data file has got
10 times slower
TestFile("fast_ data.txt"); // <--- more reasonable 2
times slower

Console.WriteLi ne("Press ENTER to exit");
Console.ReadLin e();
}

static void TestFile(string sFile)
{
FileStream oFS=File.OpenRe ad(sFile);

ArrayList oLines=new ArrayList();

StreamReader oSR=new StreamReader(oF S);

while(oSR.BaseS tream.Position< oSR.BaseStream. Length)
{
oLines.Add(oSR. ReadLine());
}

oFS.Close();

Console.WriteLi ne("Loaded {0} lines from file
{1}",oLines.Cou nt,sFile);

// now copy same strings into string array for speed
comparisons string[] sLines=new string[oLines.Count];

for(int i=0;i<sLines.Le ngth;i++)
sLines[i]=(string)oLines[i];

DateTime oTime=DateTime. Now;
oLines.Sort();
Console.WriteLi ne("Time to sort strings in ArrayList is:
{0}
msec",(DateTime .Now.Ticks-oTime.Ticks)/TimeSpan.TicksP erMillisecond);

oTime=DateTime. Now;
Array.Sort(sLin es);
Console.WriteLi ne("Time to sort strings in string[] array
is: {0}
msec",(DateTime .Now.Ticks-oTime.Ticks)/TimeSpan.TicksP erMillisecond);
} }
}

//////////////////////////////////////////////////////////////////////
////


Strange, as ArrayList.Sort uses Array.Sort() :D (use reflector, and
see for yourself). So it's not the sort that's flawed, as it's
precisely the same routine that gets executed.

The only difference is that ArrayList.Sort( ) uses a default comparer,
Array.Sort() doesn't. I think this is where the problem occurs.

I.o.w.: your sorts aren't exactly the same: the array sort is sorting
an array of strings. The arraylist sort is sorting an array of objects
using a comparer.

FB

--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------
Apr 29 '06 #2
"Frans Bouma [C# MVP]" wrote:
I.o.w.: your sorts aren't exactly the same: the array sort is sorting
an array of strings. The arraylist sort is sorting an array of objects
using a comparer.
I am aware of that - I mentioned boxing/unboxing overheads. The issues are:
a) its a LOT slower than in .NET 1.1 <-- this alone makes it unacceptable
b) its DRAMATICALLY slower with some data
Strange, as ArrayList.Sort uses Array.Sort() :D (use reflector, and
see for yourself). So it's not the sort that's flawed, as it's
precisely the same routine that gets executed.


I can't say for sure what's flawed but _something_ certainly is, its no so
much slower performance of ArrayList.Sort that bothers me, its much slower
performance than same code for .NET 1.1, I added List<string> test and it
runs just fine, the problem appears to be solely in ArrayList - there is
absolutely no reason why sorting should take many times over version that is
present in .NET 1.1, I actually have got some much worse case data scenarios
that take even much much longer, this is an issue since legacy .NET 1.1 won't
use List<string> and thus require a rewrite. If performance hit was moderate
it could have been ignored, but the performance hit appears to be _DATA_
dependent, if anyone from Microsoft is interested to track this further then
I will be happy to assist.

Here is link to the source code that was mangled by this forum:
http://majestic12.co.uk/files/other/...bug/program.cs

Of course I did workaround it by using List<string>, but I don't like that
"solution" since:
a) it prevents me from being able to do legacy .NET 1.1 builds or involve
conditional defines
b) legacy compiled stuff without source will be hit by unexplained slowdowns

regards,

Alex
Apr 29 '06 #3
Alex Chudnovsky <Al************ @discussions.mi crosoft.com> wrote:
"Frans Bouma [C# MVP]" wrote:
I.o.w.: your sorts aren't exactly the same: the array sort is sorting
an array of strings. The arraylist sort is sorting an array of objects
using a comparer.


I am aware of that - I mentioned boxing/unboxing overheads.


Those won't be relevant with strings - there's no boxing/unboxing
involved with reference types.

(I haven't read the rest of this thread - just thought I'd mention that
quickly now.)

--
Jon Skeet - <sk***@pobox.co m>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Apr 29 '06 #4
> Those won't be relevant with strings - there's no boxing/unboxing
involved with reference types.


Fair enough, I stand corrected. The main issue however is still here: there
should not be any reason why same ArrayList sorts a lot slower in .NET 2.0
build than when it does in .NET 1.1 build with everything else being the same
(machine, data etc). Generic string list and Array.Sort actually sort faster
in .NET 2.0, but ArrayList seems to be rather badly affected, at least for
the data I provided.

It would be good if someone tried the sample code (see link above) and
posted their timings just to see if this is replicated on other machines
(maybe its AMD Athlon x2 that's only affected, something I feel rather
unlikely, but you never know).

Apr 29 '06 #5
Alex,
It would be good if someone tried the sample code (see link above)
and posted their timings just to see if this is replicated on other
machines (maybe its AMD Athlon x2 that's only affected, something I
feel rather unlikely, but you never know).

just tried your code, one time with a big file (3MB) and once with a
small file (just several KB):

Result 1, Big File. This is an average result, even though the measured
milliseconds are not always the same (about +/- 40 msec in execution time):

Loaded 70072 lines from file slow_data.txt
Time to sort strings in ArrayList is: 406 msec
Time to sort strings in Generic Lst is: 406 msec
Time to sort strings in string[] array is: 406 msec
Loaded 70072 lines from file fast_data.txt
Time to sort strings in ArrayList is: 406 msec
Time to sort strings in Generic Lst is: 406 msec
Time to sort strings in string[] array is: 453 msec
Press ENTER to exit
Result 2, Small file. Here it is really depending, on the fast data, I
also got the result, where the ArrayList had 15msec and the Generic List
got 31 msec... On the slow data, the Generic List seems to be quite
constant, whereas the ArrayList is between 70 and 180 msec.
One example output:

Loaded 5835 lines from file slow_data.txt
Time to sort strings in ArrayList is: 78 msec
Time to sort strings in Generic Lst is: 93 msec
Time to sort strings in string[] array is: 62 msec
Loaded 5835 lines from file fast_data.txt
Time to sort strings in ArrayList is: 31 msec
Time to sort strings in Generic Lst is: 15 msec
Time to sort strings in string[] array is: 31 msec
Press ENTER to exit
Finally, I thought, that there might be problem with the startup and I
switch the line oLines.Sort with the one of oGLines.Sort and it
happened, that now the oGLines.Sort semms to be slower... so I think it
is a matter of Startup (most of the time might arise at a first
initialization of Array.Sort()).. . maybe some internal static variable
or whatever...

And additionally, I tried some really big file (some 18MB) and it
happened, that all three sorting algorithms merely were the same... so
for me the Sort() method performs well, it must be some matter of startup.

long answer and a lot of assumptions, but I hope to help!
Markus

btw.: I have tested it on a PIV Centrino Laptop.
Apr 29 '06 #6
Thanks for that Marcus - but from what I can see you have not tried it with
my data that I linked to, specifically the following that seems slower than
usual: http://majestic12.co.uk/files/other/.../slow_data.txt

I have got thousands of such data files and about 10% of them exhibit the
problem, the rest get sorted in ArrayList pretty fast, however SOME are much
slower, yet if Array.Sort is used then they are all fast, hence my assertion
that its data dependent and that's why I provided links to actual data to
help replicate it, please try it and post here results, thanks!

The issue appears to be really data dependent, which is probably why it may
have escaped scrutiny during normal tests.

regards

Alex
Apr 29 '06 #7
Alex Chudnovsky <Al************ @discussions.mi crosoft.com> wrote:
Thanks for that Marcus - but from what I can see you have not tried it with
my data that I linked to, specifically the following that seems slower than
usual: http://majestic12.co.uk/files/other/.../slow_data.txt

I have got thousands of such data files and about 10% of them exhibit the
problem, the rest get sorted in ArrayList pretty fast, however SOME are much
slower, yet if Array.Sort is used then they are all fast, hence my assertion
that its data dependent and that's why I provided links to actual data to
help replicate it, please try it and post here results, thanks!

The issue appears to be really data dependent, which is probably why it may
have escaped scrutiny during normal tests.


I believe I've read before that the ArrayList.Sort pivoting strategy
has changed between versions, which makes some data slower and some
data faster. I think that if it's faster "on average" and the speed-up
of some is comparable to the slow-down of others, that it's a
reasonable change - it sounds like you're just unlucky in your data.

--
Jon Skeet - <sk***@pobox.co m>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Apr 29 '06 #8
> I believe I've read before that the ArrayList.Sort pivoting strategy
has changed between versions, which makes some data slower and some
data faster. I think that if it's faster "on average" and the speed-up
of some is comparable to the slow-down of others, that it's a
reasonable change - it sounds like you're just unlucky in your data.


I'd love to see some pointers showing that its the case.

IMO its totally unacceptable to have such an unjustifiable slowdown, as I am
actually getting more than 10 times slower sorts. The only reason I noticed
them because code that run fast under .NET 1.1 actually was like a dog under
..NET 2, and even though 90% of data sets I have get sorted fast, the
remaining 10% were killing performance so much that I had to avoid using
ArrayList. I totally appreciate that some data takes more time to sort,
however the different between sorting calls should not be that dramatic.

The "luck" should not be factor between ArrayList.Sort and List<string>.So rt
and Array.Sort - some overheads - yes, but not THAT much, clearly something
is seriously wrong.

I would really like to see some independent confirmation of timings: please
use my data files as supplied above.

regards,

Alex
Apr 29 '06 #9
Ok, I got the real good test for you, this is the actual slowdown that I
experienced in my app, the test before was not as obvious as it is now.

Source code:

Data file (you need to use it, don't use your data files as error seems to
be data dependent!):

-----------------------------------------------
Running under .NET v2.0.50727.42
Loaded 29972 lines from file slow_data.txt
Time to sort strings in Generic List is: 171 msec
Time to sort strings in ArrayList is: 33343 msec
Time to sort strings in string[] array is: 171 msec
Press ENTER to exit
-----------------------------------------------

Notice ArrayList sorting time? Same data but its actually 200 (!) TIMES
slower!

Here is same code (apart from Generic List that is only prevent in .NET 2.0)
run on .NET 1.1:

-----------------------------------------------
Loaded 29972 lines from file slow_data.txt
Time to sort strings in ArrayList is: 187 msec
Time to sort strings in string[] array is: 187 msec
Press ENTER to exit
-----------------------------------------------

Notice that all works fine in .NET 1.1, its same machine, same everything.

So, I would appreciate anyone to try the code above and data file to run on
their PC to verify if you get same much slower performance from ArrayLists
for _that_ data file (ignore all other files you may have).

This is serious stuff - 200 times slowdown for 10% of my real life data
cases meant overall slowdown of around 20 times: this is a killer for legacy
..NET 1.1 apps that may mysteriously start using much more CPU.
Apr 29 '06 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

4
482
by: herbert | last post by:
1) I created a class Person whose objects are stored in collections. Using IComparable allows to define one sort criteria like name. Best practice: How can I add more sort criterias like age, zip, ...? I don't care whether the original collection is sorted or I get a copied collection sorted. 2) VB.NET: When is an ArrayList faster than...
0
7465
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main...
0
7656
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. ...
0
7752
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the...
1
5325
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes...
0
4944
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert...
0
3449
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in...
0
3441
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
1013
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
0
701
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.