473,738 Members | 11,192 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

How to best extract a list of identical keys in a sorted ArrayList with BinarySearch ?

Guy
Is there a better way to search identical elements in a sorted array
list than the following:

iIndex = Array.BinarySea rch( m_Array, 0, m_Array.Count,
aSearchedObject );

aFoundObject= m_Array[iIndex];

m_ResultArray.A dd ( aFoundObject);

iIndex++;
while ( ( m_Array[iIndex].CompareTo( aSearchedObject ) ) && ( iIndex
< m_Array.Count ) )
{
m_ResultArray.A dd ( aFoundObject);

Jun 1 '07 #1
8 2594
Guy
Ok, bad key strucked:

Is there a better way to search identical elements in a sorted array
list than the following:

iIndex = Array.BinarySea rch( m_Array, 0, m_Array.Count,
aSearchedObject );
aFoundObject= m_Array[iIndex];
m_ResultArray.A dd ( aFoundObject);
iIndex++;
while ( ( m_Array[iIndex].CompareTo( aSearchedObject ) ) && ( iIndex
< m_Array.Count ) )
{
m_ResultArray.A dd ( aFoundObject);
iIndex++
}

Is there a garanty that the first BinarySearch, will strike the first
items in the sorted list ?

Example the search object is :
ID, Make, Model, Year ( Ex: 1234, Buick, Regal, 1998 )
The array contains 5 Buick, Regal, 1998 with 5 distincts ID (they have
different engines configurations) . I assume they are consecutive in
the sorted array.
My IComparable compares two object using this:
(Make+Model+Yea r).ToHashCode.C omparesTo ( the other object hascode)

I need to extract the 5 Buicks ID from the list as fast as possible.

Thanks.
Jun 1 '07 #2

"Guy" <gu****@yahoo.c omwrote in message
news:11******** **************@ p47g2000hsd.goo glegroups.com.. .
Ok, bad key strucked:

Is there a better way to search identical elements in a sorted array
list than the following:

iIndex = Array.BinarySea rch( m_Array, 0, m_Array.Count,
aSearchedObject );
aFoundObject= m_Array[iIndex];
m_ResultArray.A dd ( aFoundObject);
iIndex++;
while ( ( m_Array[iIndex].CompareTo( aSearchedObject ) ) && ( iIndex
< m_Array.Count ) )
{
m_ResultArray.A dd ( aFoundObject);
iIndex++
}

Is there a garanty that the first BinarySearch, will strike the first
items in the sorted list ?
Probably not... checking Reflector.

No, there is no such guarantee. BinarySearch will return the index for the
first item it encounters which compares equal, and since it doesn't make a
sequential pass through the array, that is not necessarily the lowest index
of all matching items.
>
Example the search object is :
ID, Make, Model, Year ( Ex: 1234, Buick, Regal, 1998 )
The array contains 5 Buick, Regal, 1998 with 5 distincts ID (they have
different engines configurations) . I assume they are consecutive in
the sorted array.
My IComparable compares two object using this:
(Make+Model+Yea r).ToHashCode.C omparesTo ( the other object hascode)

I need to extract the 5 Buicks ID from the list as fast as possible.

Thanks.


Jun 1 '07 #3
On Jun 1, 7:48 am, Guy <guh...@yahoo.c omwrote:
Ok, bad key strucked:

Is there a better way to search identical elements in a sorted array
list than the following:

iIndex = Array.BinarySea rch( m_Array, 0, m_Array.Count,
aSearchedObject );

aFoundObject= m_Array[iIndex];

m_ResultArray.A dd ( aFoundObject);

iIndex++;
while ( ( m_Array[iIndex].CompareTo( aSearchedObject ) ) && ( iIndex
< m_Array.Count ) )
{
m_ResultArray.A dd ( aFoundObject);
iIndex++
}

Is there a garanty that the first BinarySearch, will strike the first
items in the sorted list ?

Example the search object is :
ID, Make, Model, Year ( Ex: 1234, Buick, Regal, 1998 )
The array contains 5 Buick, Regal, 1998 with 5 distincts ID (they have
different engines configurations) . I assume they are consecutive in
the sorted array.
My IComparable compares two object using this:
(Make+Model+Yea r).ToHashCode.C omparesTo ( the other object hascode)

I need to extract the 5 Buicks ID from the list as fast as possible.
Use BinarySearch to find an entry. Work backward from the entry to
find the first entry. Work forward from the entry that BinarySearch
found to find the last entry. Done.

There is no guarantee that a binary search will find the first entry,
since it doesn't examine all of the elements, it can't possibly. Only
a linear search will guarantee to find the first matching entry.

How long is this ArrayList, by the way?

If the ArrayList is truly huge, and this search is a common operation,
consider the following: trade memory for speed. Use a SortedList of
objects, where each object in the main list contains the key and a
list of items with that key. When you need all items with a certain
key, hash into the SortedList (no searching required) and grab the
list of equivalent items. Lookups will be as fast as they can possibly
be.

You can also create an enumeration over the SortedList that returns a
deep traversal of all items in the collection, in order.

However, the whole thing will use significantly more memory.

Jun 1 '07 #4
On Fri, 01 Jun 2007 10:00:53 -0700, Bruce Wood <br*******@cana da.com>
wrote:
Use BinarySearch to find an entry. Work backward from the entry to
find the first entry. Work forward from the entry that BinarySearch
found to find the last entry. Done.

There is no guarantee that a binary search will find the first entry,
since it doesn't examine all of the elements, it can't possibly. Only
a linear search will guarantee to find the first matching entry.
Well, sure a binary search could find the first entry. It could do
exactly what you propose the OP do, which is to upon finding a matching
element, work its way back until it knows where the first matching one
is. It all just depends on how the behavior is defined. Microsoft
decided to go with the faster-but-ambiguous "first match found" design,
but they could just as easily have defined the behavior to always return
the "first match in array" result instead.
How long is this ArrayList, by the way?

If the ArrayList is truly huge, and this search is a common operation,
consider the following: trade memory for speed. Use a SortedList of
objects, where each object in the main list contains the key and a
list of items with that key. When you need all items with a certain
key, hash into the SortedList (no searching required) and grab the
list of equivalent items. Lookups will be as fast as they can possibly
be.
I'm not exactly clear on what you propose here. First of all, according
to the docs, SortedList does not allow duplicate keys. So the question of
finding the first of a given key is moot.

Second, hashing only works if you've stored the items according to the
hash value in the first place. I see no advantage in applying some sort
of hash value to a SortedList. Now, a Hashtable would be different and of
course much faster. But then that's true generally and is a completely
different issue.

Finally, as near as I can tell the implementation of SortedList involves a
separate array containing just the keys and indexes into the array
containing the actual value data. This means that every time something is
added to, or removed from the collection, the key index array has to be
adjusted, shifting all the items after the insertion or removal. This is
probably okay if the array is built once and then persists, but it could
get very expensive if there's a lot of data and the array is in constant
flux. Note that this issue is worst in the very situation in which you're
suggesting one use SortedList: "If the ArrayList is truly huge".

And of course, given that you can't just "hash into the SortedList",
searching a SortedList is going to take the same time as searching a
sorted ArrayList, since both would require a binary search for optimal
results.

All of the above is with respect to the non-generic SortedList. There is
also a generic SortedList<that would be useful in some situations, but
which has similar problems (in particular, the requirement that keys be
unique). It's not clear from the documentation whether it's a balanced
binary tree or not, but if it's not then it would also have the problem
that the worst-case scenario for searching is considerably worse than for
a binary search.

If constant-order searching is desired, then a hashtable-based data
structure needs to be used, like Hashtable or the generic Dictionary<>.

Pete
Jun 1 '07 #5
On Jun 1, 11:00 am, "Peter Duniho" <NpOeStPe...@nn owslpianmk.com>
wrote:
On Fri, 01 Jun 2007 10:00:53 -0700, Bruce Wood <brucew...@cana da.com>
wrote:
Use BinarySearch to find an entry. Work backward from the entry to
find the first entry. Work forward from the entry that BinarySearch
found to find the last entry. Done.
There is no guarantee that a binary search will find the first entry,
since it doesn't examine all of the elements, it can't possibly. Only
a linear search will guarantee to find the first matching entry.

Well, sure a binary search could find the first entry. It could do
exactly what you propose the OP do, which is to upon finding a matching
element, work its way back until it knows where the first matching one
is. It all just depends on how the behavior is defined. Microsoft
decided to go with the faster-but-ambiguous "first match found" design,
but they could just as easily have defined the behavior to always return
the "first match in array" result instead.
True.
How long is this ArrayList, by the way?
If the ArrayList is truly huge, and this search is a common operation,
consider the following: trade memory for speed. Use a SortedList of
objects, where each object in the main list contains the key and a
list of items with that key. When you need all items with a certain
key, hash into the SortedList (no searching required) and grab the
list of equivalent items. Lookups will be as fast as they can possibly
be.

I'm not exactly clear on what you propose here. First of all, according
to the docs, SortedList does not allow duplicate keys. So the question of
finding the first of a given key is moot.
You stopped reading at "SortedList ". I'm proposing the classic "list
of lists" solution: the sorted list contains one entry for each key,
where that entry contains all of the items that match that key.
Second, hashing only works if you've stored the items according to the
hash value in the first place. I see no advantage in applying some sort
of hash value to a SortedList. Now, a Hashtable would be different and of
course much faster. But then that's true generally and is a completely
different issue.

Finally, as near as I can tell the implementation of SortedList involves a
separate array containing just the keys and indexes into the array
containing the actual value data. This means that every time something is
added to, or removed from the collection, the key index array has to be
adjusted, shifting all the items after the insertion or removal. This is
probably okay if the array is built once and then persists, but it could
get very expensive if there's a lot of data and the array is in constant
flux. Note that this issue is worst in the very situation in which you're
suggesting one use SortedList: "If the ArrayList is truly huge".

And of course, given that you can't just "hash into the SortedList",
searching a SortedList is going to take the same time as searching a
sorted ArrayList, since both would require a binary search for optimal
results.
I have never looked at the underlying implementation for a SortedList.
If you are correct, then I'm appalled that it does not combine a list
and a hash table. If the indexer on key has to do a binary search then
that's a terrible waste of time. I'd rather trade memory for more
speed, personally.
All of the above is with respect to the non-generic SortedList. There is
also a generic SortedList<that would be useful in some situations, but
which has similar problems (in particular, the requirement that keys be
unique).
....which isn't a problem at all: you misunderstood my post.
It's not clear from the documentation whether it's a balanced
binary tree or not, but if it's not then it would also have the problem
that the worst-case scenario for searching is considerably worse than for
a binary search.

If constant-order searching is desired, then a hashtable-based data
structure needs to be used, like Hashtable or the generic Dictionary<>.
Yes, and if SortedList truly is simply an ordered list with a key
indexer that searches by binary search or some other such method, then
I wouldn't propose it.

In that case the OP should use a hash table and, if ordered access is
required for some reason, either maintain a parallel sorted list or
sort it on an as-needed basis.

I still can't believe that SortedList doesn't use a dual structure
internally. Yuck.

Jun 1 '07 #6
On Fri, 01 Jun 2007 16:36:02 -0700, Bruce Wood <br*******@cana da.com>
wrote:
You stopped reading at "SortedList ". I'm proposing the classic "list
of lists" solution: the sorted list contains one entry for each key,
where that entry contains all of the items that match that key.
I didn't stop reading. I simply got distracted by the implication that
you could use some sort of hashed access to the SortedList. :)
[...]
I have never looked at the underlying implementation for a SortedList.
If you are correct, then I'm appalled that it does not combine a list
and a hash table. If the indexer on key has to do a binary search then
that's a terrible waste of time. I'd rather trade memory for more
speed, personally.
Well, there's a Hashtable class. I presume that SortedList doesn't do
hashing, because someone who wanted hashing would just use the Hashtable
class instead, or possibly would use the two class together to achieve
hashing with sorting. Hashing and sorting are so fundamentally different
that I just don't see a simple collection class actually implementing
both. Not that it couldn't, just that it would make the interface so
complicated, merging two completely different behaviors into a single
class.
[...]
>If constant-order searching is desired, then a hashtable-based data
structure needs to be used, like Hashtable or the generic Dictionary<>.

Yes, and if SortedList truly is simply an ordered list with a key
indexer that searches by binary search or some other such method, then
I wouldn't propose it.
From the documentation of SortedList.Item :

Retrieving the value of this property is an O(log n)
operation, where n is Count

Sure looks like a binary search to me. :)
In that case the OP should use a hash table and, if ordered access is
required for some reason, either maintain a parallel sorted list or
sort it on an as-needed basis.
Well, IMHO it really depends on the performance needs. A binary search is
actually pretty fast, even on very large data. The main problem with it
is that it requires that the data be kept in sorted order in the first
place, which can be expensive. But if you already have the requirement
that the data be sorted, I see no need to switch to hashing to find data.
Even if you assume that the binary search is slower than a lookup by hash
value, it's not going to be *much* slower (certainly not an order of
magnitude or anything like that).

In addition, hashing of course has the trade-off of memory requirements
versus collisions. With any large data set, unless you have a really huge
hash table, you're going to have a fair number of collisions, which of
course you have to search linearly through before getting the item you
really want. That linear search can easily consume the same time as a
simple binary search on sorted data would, depending on the number of
collisions for that hashed value.

Hashing works well when the data isn't sorted in the first place and you
have no other need to sort it, but if the data needs to be sorted anyway,
there's really no reason to not just use a binary search.
I still can't believe that SortedList doesn't use a dual structure
internally. Yuck.
Well, it does. It has an index array and the value array. :) It's just
that neither of those data structures are a hash table. :)

Pete
Jun 2 '07 #7
On Jun 1, 5:32 pm, "Peter Duniho" <NpOeStPe...@nn owslpianmk.com>
wrote:
On Fri, 01 Jun 2007 16:36:02 -0700, Bruce Wood <brucew...@cana da.com>
wrote:
You stopped reading at "SortedList ". I'm proposing the classic "list
of lists" solution: the sorted list contains one entry for each key,
where that entry contains all of the items that match that key.

I didn't stop reading. I simply got distracted by the implication that
you could use some sort of hashed access to the SortedList. :)
[...]
I have never looked at the underlying implementation for a SortedList.
If you are correct, then I'm appalled that it does not combine a list
and a hash table. If the indexer on key has to do a binary search then
that's a terrible waste of time. I'd rather trade memory for more
speed, personally.

Well, there's a Hashtable class. I presume that SortedList doesn't do
hashing, because someone who wanted hashing would just use the Hashtable
class instead, or possibly would use the two class together to achieve
hashing with sorting. Hashing and sorting are so fundamentally different
that I just don't see a simple collection class actually implementing
both. Not that it couldn't, just that it would make the interface so
complicated, merging two completely different behaviors into a single
class.
Oh, I don't know. I think the interface would look exactly like
SortedList, except that it would find things faster. :-)
If constant-order searching is desired, then a hashtable-based data
structure needs to be used, like Hashtable or the generic Dictionary<>.
Yes, and if SortedList truly is simply an ordered list with a key
indexer that searches by binary search or some other such method, then
I wouldn't propose it.

From the documentation of SortedList.Item :

Retrieving the value of this property is an O(log n)
operation, where n is Count

Sure looks like a binary search to me. :)
Yup. That's pretty conclusive. I should read the doc more closely,
huh? :-)
In that case the OP should use a hash table and, if ordered access is
required for some reason, either maintain a parallel sorted list or
sort it on an as-needed basis.

Well, IMHO it really depends on the performance needs. A binary search is
actually pretty fast, even on very large data. The main problem with it
is that it requires that the data be kept in sorted order in the first
place, which can be expensive. But if you already have the requirement
that the data be sorted, I see no need to switch to hashing to find data.
Even if you assume that the binary search is slower than a lookup by hash
value, it's not going to be *much* slower (certainly not an order of
magnitude or anything like that).
Agreed.
In addition, hashing of course has the trade-off of memory requirements
versus collisions. With any large data set, unless you have a really huge
hash table, you're going to have a fair number of collisions, which of
course you have to search linearly through before getting the item you
really want. That linear search can easily consume the same time as a
simple binary search on sorted data would, depending on the number of
collisions for that hashed value.

Hashing works well when the data isn't sorted in the first place and you
have no other need to sort it, but if the data needs to be sorted anyway,
there's really no reason to not just use a binary search.
Agreed.

I had just assumed that SortedList provided the fastest access
possible but, as you pointed out, the gain in speed wouldn't be that
much, and the amount of memory consumed wouldn't be worth it.

So, we agree: the OP should use a Hashtable (or Dictionary<>) if
sorting isn't important, and a SortedList if sorting is required.

Jun 2 '07 #8
On Fri, 01 Jun 2007 20:13:40 -0700, Bruce Wood <br*******@cana da.com>
wrote:
[...]
So, we agree: the OP should use a Hashtable (or Dictionary<>) if
sorting isn't important, and a SortedList if sorting is required.
Yes, vehemently agree. :) (Personally, I'm in love with the generics, so
I would recommend SortedList<if the data needs to be ordered generally,
and Dictionary<if not :) ).
Jun 2 '07 #9

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

4
2517
by: Robert Zurer | last post by:
Assuming that I have created a strongly typed collection and overridden the appropriate methods, i.e. this, Add, Insert etc., so that a sort order is maintained, it's still very possible for a property of the 'ListMemberObject' which is instrumental in the sort to be modified unbeknownst to the list thereby defeating the sort. One way to keep the list sorted is to add a 'ParentList' property to the ListMemberObject and notify that...
5
3838
by: Gerrit Beuze | last post by:
Hi all, Using C# 1.1: I need a fast way of determining whether a string is in a list of approx 150 keyword like strings. in two versions: one case-sensitive, the other one case-insenstive. I thought of loading the keywords in a Hastable as Keys with null values, and then use Hastable.ContainsKey(string) but Hashtable does not seem to support case-insentive keys.
10
11618
by: Ryan Graham | last post by:
I totally bombed this question in an interview so I'm posting my answer here for comments and suggestions... perhaps (god help me) I'm just not that bright, but this works and seems to be fairly efficent. The idea was simple, insert an integer into a list that has already been sorted. private int _list; .... public void Insert(int value) { int tempArray;
4
1527
by: Mike Dole | last post by:
I have an arraylist like the one with the Guitar Class sample in Q316302. Dim MycolliCol as arraylist Private Sub FillArray() MyColliCol.Add(New Colli(1, "STUK", 0)) MyColliCol.Add(New Colli(16, "ROL", 1)) MyColliCol.Add(New Colli(17, "BOS", 2)) MyColliCol.Add(New Colli(18, "DOOS", 0)) MyColliCol.Add(New Colli(19, "PAK", 0))
33
1302
by: Masahiro Ito | last post by:
I am trying to do an if statement that could have many different matches. In sql I would do it like: SELECT * FROM Customer WHERE Staff IN (10, 15, 23, 14) In VB, I would do: If Staff.Count = 10 or Staff.Count = 15 or Staff.Count = 23 or
9
2873
by: Paul Nations | last post by:
I've got arraylists of simple classes bound to controls. I need to search through those arraylists to set the correct SelectedItem in the control. The code looks like: Public Class DegreeMaintenance Private arrCipCodes As New ArrayList 'populate reader with data With rdr Do While .Read arrCipCodes.Add(New CipCode(.GetString(0), .GetString(1)))
5
17644
by: Someone | last post by:
I wish to use a sorted list to store lists of strings. I am unsure as to how things work. I want to do something like: SortedList list = new SortedList(); for(...loop for string...) { if( ! list.contains(key)) { list.add("keyval", new ArrayList(string)); else list.GetValue().add(string);
4
1855
by: shrishjain | last post by:
Hi All, I need a type where I can store my items in sorted order. And I want to keep adding items to it, and want it to remain sorted. Is there any type in .net which I can make use of. I see there is SortedList<key, value> for hash tables, but could find anything for a sorted list. Currently I am using List<string> and whenever I add an item, I need to
10
6770
by: pamelafluente | last post by:
Hi I have a sorted list with several thousands items. In my case, but this is not important, objects are stored only in Keys, Values are all Nothing. Several of the stored objects (might be a large number) have to be removed (say when ToBeRemoved = true). class SomeObj ToBeRemoved be a boolean field end class
0
8788
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
9476
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
1
9263
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
9208
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
1
6751
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
6053
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
4570
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
0
4825
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
3
2193
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.