473,387 Members | 1,532 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,387 software developers and data experts.

How to best extract a list of identical keys in a sorted ArrayList with BinarySearch ?

Guy
Is there a better way to search identical elements in a sorted array
list than the following:

iIndex = Array.BinarySearch( m_Array, 0, m_Array.Count,
aSearchedObject );

aFoundObject= m_Array[iIndex];

m_ResultArray.Add ( aFoundObject);

iIndex++;
while ( ( m_Array[iIndex].CompareTo( aSearchedObject) ) && ( iIndex
< m_Array.Count ) )
{
m_ResultArray.Add ( aFoundObject);

Jun 1 '07 #1
8 2573
Guy
Ok, bad key strucked:

Is there a better way to search identical elements in a sorted array
list than the following:

iIndex = Array.BinarySearch( m_Array, 0, m_Array.Count,
aSearchedObject );
aFoundObject= m_Array[iIndex];
m_ResultArray.Add ( aFoundObject);
iIndex++;
while ( ( m_Array[iIndex].CompareTo( aSearchedObject) ) && ( iIndex
< m_Array.Count ) )
{
m_ResultArray.Add ( aFoundObject);
iIndex++
}

Is there a garanty that the first BinarySearch, will strike the first
items in the sorted list ?

Example the search object is :
ID, Make, Model, Year ( Ex: 1234, Buick, Regal, 1998 )
The array contains 5 Buick, Regal, 1998 with 5 distincts ID (they have
different engines configurations). I assume they are consecutive in
the sorted array.
My IComparable compares two object using this:
(Make+Model+Year).ToHashCode.ComparesTo ( the other object hascode)

I need to extract the 5 Buicks ID from the list as fast as possible.

Thanks.
Jun 1 '07 #2

"Guy" <gu****@yahoo.comwrote in message
news:11**********************@p47g2000hsd.googlegr oups.com...
Ok, bad key strucked:

Is there a better way to search identical elements in a sorted array
list than the following:

iIndex = Array.BinarySearch( m_Array, 0, m_Array.Count,
aSearchedObject );
aFoundObject= m_Array[iIndex];
m_ResultArray.Add ( aFoundObject);
iIndex++;
while ( ( m_Array[iIndex].CompareTo( aSearchedObject) ) && ( iIndex
< m_Array.Count ) )
{
m_ResultArray.Add ( aFoundObject);
iIndex++
}

Is there a garanty that the first BinarySearch, will strike the first
items in the sorted list ?
Probably not... checking Reflector.

No, there is no such guarantee. BinarySearch will return the index for the
first item it encounters which compares equal, and since it doesn't make a
sequential pass through the array, that is not necessarily the lowest index
of all matching items.
>
Example the search object is :
ID, Make, Model, Year ( Ex: 1234, Buick, Regal, 1998 )
The array contains 5 Buick, Regal, 1998 with 5 distincts ID (they have
different engines configurations). I assume they are consecutive in
the sorted array.
My IComparable compares two object using this:
(Make+Model+Year).ToHashCode.ComparesTo ( the other object hascode)

I need to extract the 5 Buicks ID from the list as fast as possible.

Thanks.


Jun 1 '07 #3
On Jun 1, 7:48 am, Guy <guh...@yahoo.comwrote:
Ok, bad key strucked:

Is there a better way to search identical elements in a sorted array
list than the following:

iIndex = Array.BinarySearch( m_Array, 0, m_Array.Count,
aSearchedObject );

aFoundObject= m_Array[iIndex];

m_ResultArray.Add ( aFoundObject);

iIndex++;
while ( ( m_Array[iIndex].CompareTo( aSearchedObject) ) && ( iIndex
< m_Array.Count ) )
{
m_ResultArray.Add ( aFoundObject);
iIndex++
}

Is there a garanty that the first BinarySearch, will strike the first
items in the sorted list ?

Example the search object is :
ID, Make, Model, Year ( Ex: 1234, Buick, Regal, 1998 )
The array contains 5 Buick, Regal, 1998 with 5 distincts ID (they have
different engines configurations). I assume they are consecutive in
the sorted array.
My IComparable compares two object using this:
(Make+Model+Year).ToHashCode.ComparesTo ( the other object hascode)

I need to extract the 5 Buicks ID from the list as fast as possible.
Use BinarySearch to find an entry. Work backward from the entry to
find the first entry. Work forward from the entry that BinarySearch
found to find the last entry. Done.

There is no guarantee that a binary search will find the first entry,
since it doesn't examine all of the elements, it can't possibly. Only
a linear search will guarantee to find the first matching entry.

How long is this ArrayList, by the way?

If the ArrayList is truly huge, and this search is a common operation,
consider the following: trade memory for speed. Use a SortedList of
objects, where each object in the main list contains the key and a
list of items with that key. When you need all items with a certain
key, hash into the SortedList (no searching required) and grab the
list of equivalent items. Lookups will be as fast as they can possibly
be.

You can also create an enumeration over the SortedList that returns a
deep traversal of all items in the collection, in order.

However, the whole thing will use significantly more memory.

Jun 1 '07 #4
On Fri, 01 Jun 2007 10:00:53 -0700, Bruce Wood <br*******@canada.com>
wrote:
Use BinarySearch to find an entry. Work backward from the entry to
find the first entry. Work forward from the entry that BinarySearch
found to find the last entry. Done.

There is no guarantee that a binary search will find the first entry,
since it doesn't examine all of the elements, it can't possibly. Only
a linear search will guarantee to find the first matching entry.
Well, sure a binary search could find the first entry. It could do
exactly what you propose the OP do, which is to upon finding a matching
element, work its way back until it knows where the first matching one
is. It all just depends on how the behavior is defined. Microsoft
decided to go with the faster-but-ambiguous "first match found" design,
but they could just as easily have defined the behavior to always return
the "first match in array" result instead.
How long is this ArrayList, by the way?

If the ArrayList is truly huge, and this search is a common operation,
consider the following: trade memory for speed. Use a SortedList of
objects, where each object in the main list contains the key and a
list of items with that key. When you need all items with a certain
key, hash into the SortedList (no searching required) and grab the
list of equivalent items. Lookups will be as fast as they can possibly
be.
I'm not exactly clear on what you propose here. First of all, according
to the docs, SortedList does not allow duplicate keys. So the question of
finding the first of a given key is moot.

Second, hashing only works if you've stored the items according to the
hash value in the first place. I see no advantage in applying some sort
of hash value to a SortedList. Now, a Hashtable would be different and of
course much faster. But then that's true generally and is a completely
different issue.

Finally, as near as I can tell the implementation of SortedList involves a
separate array containing just the keys and indexes into the array
containing the actual value data. This means that every time something is
added to, or removed from the collection, the key index array has to be
adjusted, shifting all the items after the insertion or removal. This is
probably okay if the array is built once and then persists, but it could
get very expensive if there's a lot of data and the array is in constant
flux. Note that this issue is worst in the very situation in which you're
suggesting one use SortedList: "If the ArrayList is truly huge".

And of course, given that you can't just "hash into the SortedList",
searching a SortedList is going to take the same time as searching a
sorted ArrayList, since both would require a binary search for optimal
results.

All of the above is with respect to the non-generic SortedList. There is
also a generic SortedList<that would be useful in some situations, but
which has similar problems (in particular, the requirement that keys be
unique). It's not clear from the documentation whether it's a balanced
binary tree or not, but if it's not then it would also have the problem
that the worst-case scenario for searching is considerably worse than for
a binary search.

If constant-order searching is desired, then a hashtable-based data
structure needs to be used, like Hashtable or the generic Dictionary<>.

Pete
Jun 1 '07 #5
On Jun 1, 11:00 am, "Peter Duniho" <NpOeStPe...@nnowslpianmk.com>
wrote:
On Fri, 01 Jun 2007 10:00:53 -0700, Bruce Wood <brucew...@canada.com>
wrote:
Use BinarySearch to find an entry. Work backward from the entry to
find the first entry. Work forward from the entry that BinarySearch
found to find the last entry. Done.
There is no guarantee that a binary search will find the first entry,
since it doesn't examine all of the elements, it can't possibly. Only
a linear search will guarantee to find the first matching entry.

Well, sure a binary search could find the first entry. It could do
exactly what you propose the OP do, which is to upon finding a matching
element, work its way back until it knows where the first matching one
is. It all just depends on how the behavior is defined. Microsoft
decided to go with the faster-but-ambiguous "first match found" design,
but they could just as easily have defined the behavior to always return
the "first match in array" result instead.
True.
How long is this ArrayList, by the way?
If the ArrayList is truly huge, and this search is a common operation,
consider the following: trade memory for speed. Use a SortedList of
objects, where each object in the main list contains the key and a
list of items with that key. When you need all items with a certain
key, hash into the SortedList (no searching required) and grab the
list of equivalent items. Lookups will be as fast as they can possibly
be.

I'm not exactly clear on what you propose here. First of all, according
to the docs, SortedList does not allow duplicate keys. So the question of
finding the first of a given key is moot.
You stopped reading at "SortedList". I'm proposing the classic "list
of lists" solution: the sorted list contains one entry for each key,
where that entry contains all of the items that match that key.
Second, hashing only works if you've stored the items according to the
hash value in the first place. I see no advantage in applying some sort
of hash value to a SortedList. Now, a Hashtable would be different and of
course much faster. But then that's true generally and is a completely
different issue.

Finally, as near as I can tell the implementation of SortedList involves a
separate array containing just the keys and indexes into the array
containing the actual value data. This means that every time something is
added to, or removed from the collection, the key index array has to be
adjusted, shifting all the items after the insertion or removal. This is
probably okay if the array is built once and then persists, but it could
get very expensive if there's a lot of data and the array is in constant
flux. Note that this issue is worst in the very situation in which you're
suggesting one use SortedList: "If the ArrayList is truly huge".

And of course, given that you can't just "hash into the SortedList",
searching a SortedList is going to take the same time as searching a
sorted ArrayList, since both would require a binary search for optimal
results.
I have never looked at the underlying implementation for a SortedList.
If you are correct, then I'm appalled that it does not combine a list
and a hash table. If the indexer on key has to do a binary search then
that's a terrible waste of time. I'd rather trade memory for more
speed, personally.
All of the above is with respect to the non-generic SortedList. There is
also a generic SortedList<that would be useful in some situations, but
which has similar problems (in particular, the requirement that keys be
unique).
....which isn't a problem at all: you misunderstood my post.
It's not clear from the documentation whether it's a balanced
binary tree or not, but if it's not then it would also have the problem
that the worst-case scenario for searching is considerably worse than for
a binary search.

If constant-order searching is desired, then a hashtable-based data
structure needs to be used, like Hashtable or the generic Dictionary<>.
Yes, and if SortedList truly is simply an ordered list with a key
indexer that searches by binary search or some other such method, then
I wouldn't propose it.

In that case the OP should use a hash table and, if ordered access is
required for some reason, either maintain a parallel sorted list or
sort it on an as-needed basis.

I still can't believe that SortedList doesn't use a dual structure
internally. Yuck.

Jun 1 '07 #6
On Fri, 01 Jun 2007 16:36:02 -0700, Bruce Wood <br*******@canada.com>
wrote:
You stopped reading at "SortedList". I'm proposing the classic "list
of lists" solution: the sorted list contains one entry for each key,
where that entry contains all of the items that match that key.
I didn't stop reading. I simply got distracted by the implication that
you could use some sort of hashed access to the SortedList. :)
[...]
I have never looked at the underlying implementation for a SortedList.
If you are correct, then I'm appalled that it does not combine a list
and a hash table. If the indexer on key has to do a binary search then
that's a terrible waste of time. I'd rather trade memory for more
speed, personally.
Well, there's a Hashtable class. I presume that SortedList doesn't do
hashing, because someone who wanted hashing would just use the Hashtable
class instead, or possibly would use the two class together to achieve
hashing with sorting. Hashing and sorting are so fundamentally different
that I just don't see a simple collection class actually implementing
both. Not that it couldn't, just that it would make the interface so
complicated, merging two completely different behaviors into a single
class.
[...]
>If constant-order searching is desired, then a hashtable-based data
structure needs to be used, like Hashtable or the generic Dictionary<>.

Yes, and if SortedList truly is simply an ordered list with a key
indexer that searches by binary search or some other such method, then
I wouldn't propose it.
From the documentation of SortedList.Item:

Retrieving the value of this property is an O(log n)
operation, where n is Count

Sure looks like a binary search to me. :)
In that case the OP should use a hash table and, if ordered access is
required for some reason, either maintain a parallel sorted list or
sort it on an as-needed basis.
Well, IMHO it really depends on the performance needs. A binary search is
actually pretty fast, even on very large data. The main problem with it
is that it requires that the data be kept in sorted order in the first
place, which can be expensive. But if you already have the requirement
that the data be sorted, I see no need to switch to hashing to find data.
Even if you assume that the binary search is slower than a lookup by hash
value, it's not going to be *much* slower (certainly not an order of
magnitude or anything like that).

In addition, hashing of course has the trade-off of memory requirements
versus collisions. With any large data set, unless you have a really huge
hash table, you're going to have a fair number of collisions, which of
course you have to search linearly through before getting the item you
really want. That linear search can easily consume the same time as a
simple binary search on sorted data would, depending on the number of
collisions for that hashed value.

Hashing works well when the data isn't sorted in the first place and you
have no other need to sort it, but if the data needs to be sorted anyway,
there's really no reason to not just use a binary search.
I still can't believe that SortedList doesn't use a dual structure
internally. Yuck.
Well, it does. It has an index array and the value array. :) It's just
that neither of those data structures are a hash table. :)

Pete
Jun 2 '07 #7
On Jun 1, 5:32 pm, "Peter Duniho" <NpOeStPe...@nnowslpianmk.com>
wrote:
On Fri, 01 Jun 2007 16:36:02 -0700, Bruce Wood <brucew...@canada.com>
wrote:
You stopped reading at "SortedList". I'm proposing the classic "list
of lists" solution: the sorted list contains one entry for each key,
where that entry contains all of the items that match that key.

I didn't stop reading. I simply got distracted by the implication that
you could use some sort of hashed access to the SortedList. :)
[...]
I have never looked at the underlying implementation for a SortedList.
If you are correct, then I'm appalled that it does not combine a list
and a hash table. If the indexer on key has to do a binary search then
that's a terrible waste of time. I'd rather trade memory for more
speed, personally.

Well, there's a Hashtable class. I presume that SortedList doesn't do
hashing, because someone who wanted hashing would just use the Hashtable
class instead, or possibly would use the two class together to achieve
hashing with sorting. Hashing and sorting are so fundamentally different
that I just don't see a simple collection class actually implementing
both. Not that it couldn't, just that it would make the interface so
complicated, merging two completely different behaviors into a single
class.
Oh, I don't know. I think the interface would look exactly like
SortedList, except that it would find things faster. :-)
If constant-order searching is desired, then a hashtable-based data
structure needs to be used, like Hashtable or the generic Dictionary<>.
Yes, and if SortedList truly is simply an ordered list with a key
indexer that searches by binary search or some other such method, then
I wouldn't propose it.

From the documentation of SortedList.Item:

Retrieving the value of this property is an O(log n)
operation, where n is Count

Sure looks like a binary search to me. :)
Yup. That's pretty conclusive. I should read the doc more closely,
huh? :-)
In that case the OP should use a hash table and, if ordered access is
required for some reason, either maintain a parallel sorted list or
sort it on an as-needed basis.

Well, IMHO it really depends on the performance needs. A binary search is
actually pretty fast, even on very large data. The main problem with it
is that it requires that the data be kept in sorted order in the first
place, which can be expensive. But if you already have the requirement
that the data be sorted, I see no need to switch to hashing to find data.
Even if you assume that the binary search is slower than a lookup by hash
value, it's not going to be *much* slower (certainly not an order of
magnitude or anything like that).
Agreed.
In addition, hashing of course has the trade-off of memory requirements
versus collisions. With any large data set, unless you have a really huge
hash table, you're going to have a fair number of collisions, which of
course you have to search linearly through before getting the item you
really want. That linear search can easily consume the same time as a
simple binary search on sorted data would, depending on the number of
collisions for that hashed value.

Hashing works well when the data isn't sorted in the first place and you
have no other need to sort it, but if the data needs to be sorted anyway,
there's really no reason to not just use a binary search.
Agreed.

I had just assumed that SortedList provided the fastest access
possible but, as you pointed out, the gain in speed wouldn't be that
much, and the amount of memory consumed wouldn't be worth it.

So, we agree: the OP should use a Hashtable (or Dictionary<>) if
sorting isn't important, and a SortedList if sorting is required.

Jun 2 '07 #8
On Fri, 01 Jun 2007 20:13:40 -0700, Bruce Wood <br*******@canada.com>
wrote:
[...]
So, we agree: the OP should use a Hashtable (or Dictionary<>) if
sorting isn't important, and a SortedList if sorting is required.
Yes, vehemently agree. :) (Personally, I'm in love with the generics, so
I would recommend SortedList<if the data needs to be ordered generally,
and Dictionary<if not :) ).
Jun 2 '07 #9

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

4
by: Robert Zurer | last post by:
Assuming that I have created a strongly typed collection and overridden the appropriate methods, i.e. this, Add, Insert etc., so that a sort order is maintained, it's still very possible for a...
5
by: Gerrit Beuze | last post by:
Hi all, Using C# 1.1: I need a fast way of determining whether a string is in a list of approx 150 keyword like strings. in two versions: one case-sensitive, the other one case-insenstive. I...
10
by: Ryan Graham | last post by:
I totally bombed this question in an interview so I'm posting my answer here for comments and suggestions... perhaps (god help me) I'm just not that bright, but this works and seems to be fairly...
4
by: Mike Dole | last post by:
I have an arraylist like the one with the Guitar Class sample in Q316302. Dim MycolliCol as arraylist Private Sub FillArray() MyColliCol.Add(New Colli(1, "STUK", 0)) MyColliCol.Add(New...
33
by: Masahiro Ito | last post by:
I am trying to do an if statement that could have many different matches. In sql I would do it like: SELECT * FROM Customer WHERE Staff IN (10, 15, 23, 14) In VB, I would do: If...
9
by: Paul Nations | last post by:
I've got arraylists of simple classes bound to controls. I need to search through those arraylists to set the correct SelectedItem in the control. The code looks like: Public Class...
5
by: Someone | last post by:
I wish to use a sorted list to store lists of strings. I am unsure as to how things work. I want to do something like: SortedList list = new SortedList(); for(...loop for string...) { if( !...
4
by: shrishjain | last post by:
Hi All, I need a type where I can store my items in sorted order. And I want to keep adding items to it, and want it to remain sorted. Is there any type in .net which I can make use of. I see...
10
by: pamelafluente | last post by:
Hi I have a sorted list with several thousands items. In my case, but this is not important, objects are stored only in Keys, Values are all Nothing. Several of the stored objects (might be a...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.