473,416 Members | 1,543 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,416 software developers and data experts.

Search algorithm to use

If I have an array of data that I know to be sorted in increasing
order, and the array is less than 50 elements, and I want to find the
first element greater than a certain value, is a simple linear search
the best here(for loop from beginning to end of array)? It seems like
some other search algorithm like binary or whatever would be of no
benefit on this data set.

Nov 14 '05 #1
28 3118
joshc wrote:
If I have an array of data that I know to be sorted in increasing
order, and the array is less than 50 elements, and I want to find the
first element greater than a certain value, is a simple linear search
the best here(for loop from beginning to end of array)? It seems like
some other search algorithm like binary or whatever would be of no
benefit on this data set.


Alternatives
- hash table - insert n slements O(n); search O(1)
- quick sort - sort n elements O(n log n); search O(log n)

A hash table is close to optimal.

gtoomey
Nov 14 '05 #2
joshc wrote:
If I have an array of data that I know to be sorted in increasing
order, and the array is less than 50 elements, and I want to find the
first element greater than a certain value, is a simple linear search
the best here(for loop from beginning to end of array)? It seems like
some other search algorithm like binary or whatever would be of no
benefit on this data set.


Questions of efficiency are really outside the bounds
of the C language as such; there are many implementations
of C and the timing characteristics of a given operation
differ from one implementation to the next.

Still, for such a small array it seems likely (not
certain, of course) that the difference between a linear
search and a binary search will be negligible. The way
to find out, of course, is to implement the search both
ways and measure. Even then, of course, the result will
only be valid for the implementation at hand and will not
necessarily hold for others.

--
Eric Sosman
es*****@acm-dot-org.invalid
Nov 14 '05 #3
I realize this, but still, regardless, there are algorithms that are in
general better than others. Perhaps you could suggest a more general
programming newsgroup for this type of question? I remember coming
across one some time ago but am having trouble finding it. It was a
programming group in general.
Eric Sosman wrote:
joshc wrote:
If I have an array of data that I know to be sorted in increasing
order, and the array is less than 50 elements, and I want to find the first element greater than a certain value, is a simple linear search the best here(for loop from beginning to end of array)? It seems like some other search algorithm like binary or whatever would be of no
benefit on this data set.


Questions of efficiency are really outside the bounds
of the C language as such; there are many implementations
of C and the timing characteristics of a given operation
differ from one implementation to the next.

Still, for such a small array it seems likely (not
certain, of course) that the difference between a linear
search and a binary search will be negligible. The way
to find out, of course, is to implement the search both
ways and measure. Even then, of course, the result will
only be valid for the implementation at hand and will not
necessarily hold for others.

--
Eric Sosman
es*****@acm-dot-org.invalid


Nov 14 '05 #4
joshc wrote:
I realize this, but still, regardless, there are algorithms that are in general better than others. Perhaps you could suggest a more general
programming newsgroup for this type of question? I remember coming
across one some time ago but am having trouble finding it. It was a
programming group in general.


A long time ago I looked for some sort of generic "algorithms" group,
but the closest I found was for (primarily 3D) graphics algorithms. I
think you are thinking of comp.programming and that may be the best
spot.

In general the way to figure out how good an algorithm is, is using
"big O" This is just a way of saying, if you have an array of n items,
how many more loops will it take to find the answer as n increases. If
it is exponential then the algorithm isn't so good ( O(n^2) .) If it is
linear ( O(n) ) then it's pretty normal. And if it is logarithmic (
O(n^(1/2)) ) then that's pretty much as good as it gets. The only
modifying factor is that a more complex algorithm will take longer for
each iteration so for a small enough n, an O(n^2) can win over an
O(n^(1/2)).

If you do a brute force loop, this will be linear. If there are ten
items, it will at max take 10 loops. If there are 20 then 20, and if 30
then 30.
A binary search is able to cut off half the work each time so until you
double the size of the array there won't be any change in time. 1 item
will take 1 loop. 2 will take 2, 3 will take 2, 4 will take 3, 5->3,
6->3, 7->3, 8->4, etc. So this is a logorithmic and pretty much as good
as it gets. However it is harder to code and requires the computer to
do some extra stuff each loop so for a short enough array, it probably
isn't worth it.

-Chris

Nov 14 '05 #5
Thanks for your reply Chris. I am familar with big O notation from my
algorithms class :). I am going to use a simple linear search due to
this being an embedded environment and concern over code size.

Nov 14 '05 #6
joshc wrote:
I realize this, but still, regardless, there are algorithms that are in
general better than others. Perhaps you could suggest a more general
programming newsgroup for this type of question? I remember coming
across one some time ago but am having trouble finding it. It was a
programming group in general.


Chris Williams mentioned comp.programming, which I've
seen mentioned before but am not personally familiar with.
Given the small size ("less than 50 elements") of your
problem, though, I'll stick by my previous answer: the
difference between O(N) linear search and O(ln N) binary
search will probably be so small as to be difficult to
measure accurately. Keep in mind two things about big-Oh:
It discards all constant factors, and it's only useful for
"sufficiently large" N. For small N, the constant factors
and even the lower-order terms may be more important than
the asymptotic behavior.

Stepping back just a bit, permit me to question the
wisdom of fretting over this decision at an early stage in
the program development. Do you have any a priori reason
to believe the time for these searches will be significant?
If you need to perform ten million such searches per second
that might be the case -- but if the program is going to do
anything other than searching it seems quite likely that the
other activities will have far more effect on the running
time than the searches will. A friend of mine used to speak
of picking the bottle caps off the beach so the sand would
be nice and clean around the rotting whale carcases ...

Jackson's Laws of Program Optimization:

First Law: Don't Do It.

Second Law (for experts only): Don't Do It Yet.

--
Eric Sosman
es*****@acm-dot-org.invalid
Nov 14 '05 #7
*** Rude topposting fixed ***

joshc wrote:
Eric Sosman wrote:
joshc wrote:
If I have an array of data that I know to be sorted in increasing
order, and the array is less than 50 elements, and I want to find
the first element greater than a certain value, is a simple linear
search the best here(for loop from beginning to end of array)? It
seems like some other search algorithm like binary or whatever
would be of no benefit on this data set.


Questions of efficiency are really outside the bounds
of the C language as such; there are many implementations
of C and the timing characteristics of a given operation
differ from one implementation to the next.

Still, for such a small array it seems likely (not
certain, of course) that the difference between a linear
search and a binary search will be negligible. The way
to find out, of course, is to implement the search both
ways and measure. Even then, of course, the result will
only be valid for the implementation at hand and will not
necessarily hold for others.


I realize this, but still, regardless, there are algorithms that
are in general better than others. Perhaps you could suggest a
more general programming newsgroup for this type of question? I
remember coming across one some time ago but am having trouble
finding it. It was a programming group in general.


comp.programming would be suitable.

Please don't toppost. Your answer belongs after (or intermixed
with) the material to which you reply, with non-germane stuff
snipped out.

As long as you only have 50 elements the complexity of anything
more elegant than a simple linear search suggest you shouldn't
bother. You can even avoid any need for sorting with a linear
search, except that you will have to check the entire array rather
than just values up to a threshold (one half, on average).

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson

Nov 14 '05 #8
Gregory Toomey <no****@bigpond.com> wrote:
joshc wrote:
If I have an array of data that I know to be sorted in increasing
order, and the array is less than 50 elements, and I want to find the
first element greater than a certain value, is a simple linear search
the best here(for loop from beginning to end of array)? It seems like
some other search algorithm like binary or whatever would be of no
benefit on this data set.
Alternatives
- hash table - insert n slements O(n); search O(1)
- quick sort - sort n elements O(n log n); search O(log n)


It's already known to be sorted.
A hash table is close to optimal.


For fewer than 50 elements?

Richard
Nov 14 '05 #9
On 15 Feb 2005 19:27:59 -0800, in comp.lang.c , "joshc"
<jo********@gmail.com> wrote:
I realize this, but still, regardless, there are algorithms that are in
general better than others. Perhaps you could suggest a more general
programming newsgroup for this type of question? I


comp.programming?

--
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>
CLC readme: <http://www.ungerhu.com/jxh/clc.welcome.txt>

----== Posted via Newsfeeds.Com - Unlimited-Uncensored-Secure Usenet News==----
http://www.newsfeeds.com The #1 Newsgroup Service in the World! 120,000+ Newsgroups
----= East and West-Coast Server Farms - Total Privacy via Encryption =----
Nov 14 '05 #10
On 15 Feb 2005 18:35:49 -0800, in comp.lang.c , "joshc"
<jo********@gmail.com> wrote:
If I have an array of data that I know to be sorted in increasing
order, and the array is less than 50 elements, and I want to find the
first element greater than a certain value, is a simple linear search
the best here(for loop from beginning to end of array)? It seems like
some other search algorithm like binary or whatever would be of no
benefit on this data set.


for such a small data set, unless the comparison of elements to your target
is horribly expensive or you loop repeatedly on the search, a linear search
will be perfectly adequate. For advice on general algos try
comp.programming.
--
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>
CLC readme: <http://www.ungerhu.com/jxh/clc.welcome.txt>

----== Posted via Newsfeeds.Com - Unlimited-Uncensored-Secure Usenet News==----
http://www.newsfeeds.com The #1 Newsgroup Service in the World! 120,000+ Newsgroups
----= East and West-Coast Server Farms - Total Privacy via Encryption =----
Nov 14 '05 #11
On 15 Feb 2005 19:27:59 -0800, in comp.lang.c , "joshc"
<jo********@gmail.com> wrote:
I realize this, but still, regardless, there are algorithms that are in
general better than others. Perhaps you could suggest a more general
programming newsgroup for this type of question? I


comp.programming?

--
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>
CLC readme: <http://www.ungerhu.com/jxh/clc.welcome.txt>

----== Posted via Newsfeeds.Com - Unlimited-Uncensored-Secure Usenet News==----
http://www.newsfeeds.com The #1 Newsgroup Service in the World! 120,000+ Newsgroups
----= East and West-Coast Server Farms - Total Privacy via Encryption =----
Nov 14 '05 #12
On 15 Feb 2005 19:27:59 -0800, in comp.lang.c , "joshc"
<jo********@gmail.com> wrote:
I realize this, but still, regardless, there are algorithms that are in
general better than others. Perhaps you could suggest a more general
programming newsgroup for this type of question? I


comp.programming?

--
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>
CLC readme: <http://www.ungerhu.com/jxh/clc.welcome.txt>

----== Posted via Newsfeeds.Com - Unlimited-Uncensored-Secure Usenet News==----
http://www.newsfeeds.com The #1 Newsgroup Service in the World! 120,000+ Newsgroups
----= East and West-Coast Server Farms - Total Privacy via Encryption =----
Nov 14 '05 #13
On Tue, 15 Feb 2005 20:46:00 -0800, Chris Williams wrote:

....
In general the way to figure out how good an algorithm is, is using
"big O" This is just a way of saying, if you have an array of n items,
how many more loops will it take to find the answer as n increases. If
it is exponential then the algorithm isn't so good ( O(n^2) .)
That is polynomiual, exponential woul be something like O(2^n). E.g. an
algorithm that searched every bit patter of n bits.
If it is
linear ( O(n) ) then it's pretty normal. And if it is logarithmic (
O(n^(1/2)) )
That is also polynomial although a better behaved one. Logarithmic is
O(log(n))
then that's pretty much as good as it gets.


There are O(1) algorithms. Hash table lookups are (or can be) effectively
O(1).

Lawrence
Nov 14 '05 #14


joshc wrote:
If I have an array of data that I know to be sorted in increasing
order, and the array is less than 50 elements, and I want to find the
first element greater than a certain value, is a simple linear search
the best here(for loop from beginning to end of array)? It seems like
some other search algorithm like binary or whatever would be of no
benefit on this data set.


You need to better define this procedure.

What is this "certain value"? Is this certain value guaranteed to
be in the array or not? If it isn't then search algrothm's may
be useless to find the first element after a certain value.

And what is "first element greater than a certain value". Greater
in element number or value? For example, what if there are two
or more "certain values" in the array. Do you want the element which
has a larger value or do you want the next element which might
contain a value equal to the "certain value"?
--
Al Bowers
Tampa, Fl USA
mailto: xa******@myrapidsys.com (remove the x to send email)
http://www.geocities.com/abowers822/

Nov 14 '05 #15
Mr. CBFalconer, I'm pretty new to the newsgroups but how did I toppost?
I am using google groups so to avoid their buginess I clicked the show
options->reply button on the most recent post. Is that not what I am
*supposed* to do?

Nov 14 '05 #16
Al Bowers wrote:
You need to better define this procedure.

What is this "certain value"? Is this certain value guaranteed to
be in the array or not? If it isn't then search algrothm's may
be useless to find the first element after a certain value.

And what is "first element greater than a certain value". Greater
in element number or value? For example, what if there are two
or more "certain values" in the array. Do you want the element which
has a larger value or do you want the next element which might
contain a value equal to the "certain value"?


Alright, everyone, no need to waste anymore time on this thread. In all
honesty I don't know whether the search is taking up a significant
amount of time, but I do know that some functions that perform the
linear search as part of their processing are accounting for most of
the execution time(I profiled it). I doubt it's the search though since
there are some multiplications and divisions in the functions.

I realize it was probably a stupid question to ask since the search is
probably not even what's slowing my code down.

Nov 14 '05 #17

Lawrence Kirby wrote:
On Tue, 15 Feb 2005 20:46:00 -0800, Chris Williams wrote:
If
it is exponential then the algorithm isn't so good ( O(n^2) .)
That is polynomiual, exponential woul be something like O(2^n). E.g.

an algorithm that searched every bit patter of n bits.
Hm. Had thought exponential growth referred to _any_ curve which
increased "steepness" as you went along the x axis.
If it is
linear ( O(n) ) then it's pretty normal. And if it is logarithmic (
O(n^(1/2)) )


That is also polynomial although a better behaved one. Logarithmic is
O(log(n))


=\
then that's pretty much as good as it gets.


There are O(1) algorithms. Hash table lookups are (or can be)

effectively O(1).


That makes sense. And in the example of the OP's array, if there was
some sort of pattern to the data you might be able to formulate an 0(1)
mathematical formula with which to access the array.

Thank you,
Chris

Nov 14 '05 #18
joshc wrote:

Mr. CBFalconer, I'm pretty new to the newsgroups but how did I toppost?
I am using google groups so to avoid their buginess I clicked the show
options->reply button on the most recent post. Is that not what I am
*supposed* to do?


Do that on the particular message you wish to reply to. Then
delete the quoted portions that are not germane to your reply.
Then put your reply AFTER (or intermixed with) the portions you are
replying to. That way the result stands by itself and reads
correctly.

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
Nov 14 '05 #19
"joshc" <jo********@gmail.com> writes:
If I have an array of data that I know to be sorted in increasing
order, and the array is less than 50 elements, and I want to find the
first element greater than a certain value, is a simple linear search
the best here(for loop from beginning to end of array)? It seems like
some other search algorithm like binary or whatever would be of no
benefit on this data set.


Gosh folks...

A linear search will take about 25 compares on the average.

A binary search will take about 6 compares on the average.

Roughly speaking, a reasonable guess is that the binary search will
run about 3 times as fast as the linear search. Whether that makes
any perceptible difference in the application or not is another
question altogether.

Programming a binary search should be something any competent
programmer can do almost as fast as he/she can write it down:

/* Return the smallest index of an element greater than
* the supplied argument x. If there is no such element,
* return the index where the element would be if there
* were one (which will be 'how_many' if how_many < 0
* is true).
*
* NEEDS: argument 'v' array be sorted in increasing
* order. NOT CHECKED.
*
* Disclaimer: not tested or compiled in any way.
*/

int
first_larger( SomeType v[], int how_many, SomeType x ){
int i = 0, j = how_many;

if( i >= j || v[0] > x ) return 0;

while( i + 1 != j ){
int m = (i + j) / 2;
if( v[m] <= x ) i = m;
else j = m;
}

return j;
}

And, just a few meta-comments.

The comp.lang.c group isn't normally the place to ask this kind of
question; comp.programming is better.

If you asked this question because you got assigned it as homework,
then look over the comments and the code until you think you see how
it works, then stop looking at it and try to write the code yourself.
If you don't you're cheating yourself (and besides which karma will
get you :).
Nov 14 '05 #20
On 16 Feb 2005 19:04:00 -0800, Tim Rentsch <tx*@alumnus.caltech.edu>
wrote:
"joshc" <jo********@gmail.com> writes:
If I have an array of data that I know to be sorted in increasing
order, and the array is less than 50 elements, and I want to find the
first element greater than a certain value, is a simple linear search
the best here(for loop from beginning to end of array)? It seems like
some other search algorithm like binary or whatever would be of no
benefit on this data set.
Gosh folks...

A linear search will take about 25 compares on the average.

A binary search will take about 6 compares on the average.

Roughly speaking, a reasonable guess is that the binary search will
run about 3 times as fast as the linear search. Whether that makes
any perceptible difference in the application or not is another
question altogether.


This isn't right. First of all you need two compares in the binary
search loop, one for the element compare, and one for the loop test,
so the number of compares is about 12. Secondly the binary search
loop requires an addition and a divide by 2. On the other hand the
linear compare loop only requires a compare.
Programming a binary search should be something any competent
programmer can do almost as fast as he/she can write it down:
I haven't checked your code; however coding a binary search completely
correctly is notoriously trickier than it seems.

/* Return the smallest index of an element greater than
* the supplied argument x. If there is no such element,
* return the index where the element would be if there
* were one (which will be 'how_many' if how_many < 0
* is true).
*
* NEEDS: argument 'v' array be sorted in increasing
* order. NOT CHECKED.
*
* Disclaimer: not tested or compiled in any way.
*/

int
first_larger( SomeType v[], int how_many, SomeType x ){
int i = 0, j = how_many;

if( i >= j || v[0] > x ) return 0;

while( i + 1 != j ){
int m = (i + j) / 2;
if( v[m] <= x ) i = m;
else j = m;
}

return j;
}

And, just a few meta-comments.

The comp.lang.c group isn't normally the place to ask this kind of
question; comp.programming is better.
True.

If you asked this question because you got assigned it as homework,
then look over the comments and the code until you think you see how
it works, then stop looking at it and try to write the code yourself.
If you don't you're cheating yourself (and besides which karma will
get you :).


IMO it doesn't sound like a homework question; rather it sounds like
the sort of question that an inexperienced programmer would ask.

Richard Harter, cr*@tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
All my life I wanted to be someone;
I guess I should have been more specific.
Nov 14 '05 #21
cr*@tiac.net (Richard Harter) writes:
On 16 Feb 2005 19:04:00 -0800, Tim Rentsch <tx*@alumnus.caltech.edu>
wrote:
"joshc" <jo********@gmail.com> writes:
If I have an array of data that I know to be sorted in increasing
order, and the array is less than 50 elements, and I want to find the
first element greater than a certain value, is a simple linear search
the best here(for loop from beginning to end of array)? It seems like
some other search algorithm like binary or whatever would be of no
benefit on this data set.


Gosh folks...

A linear search will take about 25 compares on the average.

A binary search will take about 6 compares on the average.

Roughly speaking, a reasonable guess is that the binary search will
run about 3 times as fast as the linear search. Whether that makes
any perceptible difference in the application or not is another
question altogether.


This isn't right. First of all you need two compares in the binary
search loop, one for the element compare, and one for the loop test,
so the number of compares is about 12. Secondly the binary search
loop requires an addition and a divide by 2. On the other hand the
linear compare loop only requires a compare.


The linear search needs two compares per loop iteration also - one to
test the value, and one to check the loop index.

The binary search needs an add and a divide by 2 (usually a shift);
also one assignment. The linear search needs an increment. Since
25/6 > 4, I think "about 3 as fast" is a fair statement, especially
since it was given as a "reasonable guess".

Programming a binary search should be something any competent
programmer can do almost as fast as he/she can write it down:


I haven't checked your code; however coding a binary search completely
correctly is notoriously trickier than it seems.


I used to think that too before I learned how to write a binary search
that is both simple and solid. I suspect the idea that writing a
binary search is tricky was much of the motivation of the OP; that's
part of why I wrote this.

Incidentally, looking over the code, it still looks right to me. The
comment, however, has an error - should read 'how_many >= 0'. And for
those who notice nits, "increasing" should be "non-decreasing"
instead.

/* Return the smallest index of an element greater than
* the supplied argument x. If there is no such element,
* return the index where the element would be if there
* were one (which will be 'how_many' if how_many < 0
* is true).
*
* NEEDS: argument 'v' array be sorted in increasing
* order. NOT CHECKED.
*
* Disclaimer: not tested or compiled in any way.
*/

int
first_larger( SomeType v[], int how_many, SomeType x ){
int i = 0, j = how_many;

if( i >= j || v[0] > x ) return 0;

while( i + 1 != j ){
int m = (i + j) / 2;
if( v[m] <= x ) i = m;
else j = m;
}

return j;
}

Nov 14 '05 #22
On Wed, 16 Feb 2005 05:17:17 -0800, Chris Williams wrote:

Lawrence Kirby wrote:
On Tue, 15 Feb 2005 20:46:00 -0800, Chris Williams wrote:
> If
> it is exponential then the algorithm isn't so good ( O(n^2) .)


That is polynomiual, exponential woul be something like O(2^n). E.g.

an
algorithm that searched every bit patter of n bits.


Hm. Had thought exponential growth referred to _any_ curve which
increased "steepness" as you went along the x axis.


Exponential typically refers to somthing of the form X^N where ^ is the
power operator. It is sometimes loosely used to refer to something that
grows faster than polynomial (i.e. N^X for arbirary large X). For example
2^N will exceed N^X with sufficiently large N, no matter how large you
make X.

Lawrence
Nov 14 '05 #23

In article <kf*************@alumnus.caltech.edu>, Tim Rentsch <tx*@alumnus.caltech.edu> writes:
cr*@tiac.net (Richard Harter) writes:
On 16 Feb 2005 19:04:00 -0800, Tim Rentsch <tx*@alumnus.caltech.edu>
wrote:
"joshc" <jo********@gmail.com> writes:

> If I have an array of data that I know to be sorted in increasing
> order, and the array is less than 50 elements, and I want to find the
> first element greater than a certain value, is a simple linear search
> the best here(for loop from beginning to end of array)?

A linear search will take about 25 compares on the average.

A binary search will take about 6 compares on the average.

Roughly speaking, a reasonable guess is that the binary search will
run about 3 times as fast as the linear search. Whether that makes
any perceptible difference in the application or not is another
question altogether.

However, a linear search will correctly predict nearly all the
branches, while a binary search will mispredict half of them, on
average.

Unless comparison time dominates, I wouldn't call your guess
"reasonable", for that reason alone - and there are others.
(Consider we know nothing of the size of items and the memory-
management features of the platform, so we can't guess whether the
binary search will tend to cause more cache misses, for example.)
This isn't right. First of all you need two compares in the binary
search loop, one for the element compare, and one for the loop test,
so the number of compares is about 12. Secondly the binary search
loop requires an addition and a divide by 2. On the other hand the
linear compare loop only requires a compare.


The linear search needs two compares per loop iteration also - one to
test the value, and one to check the loop index.


Any decent optimizer will unroll the loop, so loop index comparisons
will only add N/O comparisons, where O is the unrolling factor.

And if loop index comparisons take about as long as value comparisons,
then the latter must be very quick - so you fall foul of the branch-
misprediction penalty of binary search.

And this is why premature optimization is a poor idea.

Now, in this case, the OP did follow up by stating that profiling had
indicated his search function was a hot spot; and since that's the
case, he might indeed want to try various alternatives. If compar-
isons are very expensive, the binary search might even help - though
I suspect he'd be better off revisiting his data structure, as
various people suggested.

--
Michael Wojcik mi************@microfocus.com

Every allegiance to some community eventually involves such a fetish,
which functions as the disavowal of its founding crime: is not 'America'
the fetish of an infinitely open space enabling every individual to
pursue happiness in his or her own way? -- Slavoj Zizek
Nov 14 '05 #24
On 17 Feb 2005 02:30:11 -0800, Tim Rentsch <tx*@alumnus.caltech.edu>
wrote:
cr*@tiac.net (Richard Harter) writes:
On 16 Feb 2005 19:04:00 -0800, Tim Rentsch <tx*@alumnus.caltech.edu>
wrote:
>"joshc" <jo********@gmail.com> writes:
>
>> If I have an array of data that I know to be sorted in increasing
>> order, and the array is less than 50 elements, and I want to find the
>> first element greater than a certain value, is a simple linear search
>> the best here(for loop from beginning to end of array)? It seems like
>> some other search algorithm like binary or whatever would be of no
>> benefit on this data set.
>
>Gosh folks...
>
>A linear search will take about 25 compares on the average.
>
>A binary search will take about 6 compares on the average.
>
>Roughly speaking, a reasonable guess is that the binary search will
>run about 3 times as fast as the linear search. Whether that makes
>any perceptible difference in the application or not is another
>question altogether.
This isn't right. First of all you need two compares in the binary
search loop, one for the element compare, and one for the loop test,
so the number of compares is about 12. Secondly the binary search
loop requires an addition and a divide by 2. On the other hand the
linear compare loop only requires a compare.


The linear search needs two compares per loop iteration also - one to
test the value, and one to check the loop index.


One comparison suffices if you use a sentinel. The following code
fragment is illustrative (but unchecked).

if (data[0] >= testval) return 0;
ptr = data + data_length;
while (*--ptr >= testval);
return ptr-data;

One decrement, one compare per element.

The binary search needs an add and a divide by 2 (usually a shift);
also one assignment. The linear search needs an increment. Since
25/6 > 4, I think "about 3 as fast" is a fair statement, especially
since it was given as a "reasonable guess".
It's a reasonable guess if you don't do the arithmetic and count
operations, and if you've never benchmarked the two approaches. For a
linear search the costs per iteration are

1 data compare & branch
1 increment/decrement

For the binary search the costs per iteration are:

1 data compare & branch
1/2 branch (then clause)
1 index compare and branch
1 addition
1 assignment
1 divide by 2
1 addition by 1 (avoidable)

If we just count operations that's 2 for a linear search and 6.5 for a
binary search which is just about a wash. However the code for the
linear search is a better candidate for optimization. Some years ago
I ran benchmarks on just this question; the break even point was
somewhere around 50. I expect that the results would be about the
same on more modern machines.


>Programming a binary search should be something any competent
>programmer can do almost as fast as he/she can write it down:
I haven't checked your code; however coding a binary search completely
correctly is notoriously trickier than it seems.


I used to think that too before I learned how to write a binary search
that is both simple and solid. I suspect the idea that writing a
binary search is tricky was much of the motivation of the OP; that's
part of why I wrote this.


It's not just me or you; errors in writing binary searches are (or
were) common. IIRC Bentley makes just that observation in
"Programming Pearls" and, of course, Dijkstra had some pungent
comments on the subject.

Incidentally, looking over the code, it still looks right to me. The
comment, however, has an error - should read 'how_many >= 0'. And for
those who notice nits, "increasing" should be "non-decreasing"
instead.
Well you see, "looks right" is not quite as reassuring as "proven
right". I expect it probably is - most of us reuse validated code
schemas for these little exercises.

> /* Return the smallest index of an element greater than
> * the supplied argument x. If there is no such element,
> * return the index where the element would be if there
> * were one (which will be 'how_many' if how_many < 0
> * is true).
> *
> * NEEDS: argument 'v' array be sorted in increasing
> * order. NOT CHECKED.
> *
> * Disclaimer: not tested or compiled in any way.
> */
>
> int
> first_larger( SomeType v[], int how_many, SomeType x ){
> int i = 0, j = how_many;
>
> if( i >= j || v[0] > x ) return 0;
>
> while( i + 1 != j ){
> int m = (i + j) / 2;
> if( v[m] <= x ) i = m;
> else j = m;
> }
>
> return j;
> }
>


Richard Harter, cr*@tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
All my life I wanted to be someone;
I guess I should have been more specific.
Nov 14 '05 #25
cr*@tiac.net (Richard Harter) writes:
On 17 Feb 2005 02:30:11 -0800, Tim Rentsch <tx*@alumnus.caltech.edu>
wrote:
cr*@tiac.net (Richard Harter) writes:
On 16 Feb 2005 19:04:00 -0800, Tim Rentsch <tx*@alumnus.caltech.edu>
wrote:

>"joshc" <jo********@gmail.com> writes:
>
>> If I have an array of data that I know to be sorted in increasing
>> order, and the array is less than 50 elements, and I want to find the
>> first element greater than a certain value, is a simple linear search
>> the best here(for loop from beginning to end of array)? It seems like
>> some other search algorithm like binary or whatever would be of no
>> benefit on this data set.
>
>Gosh folks...
>
>A linear search will take about 25 compares on the average.
>
>A binary search will take about 6 compares on the average.
>
>Roughly speaking, a reasonable guess is that the binary search will
>run about 3 times as fast as the linear search. Whether that makes
>any perceptible difference in the application or not is another
>question altogether.

This isn't right. First of all you need two compares in the binary
search loop, one for the element compare, and one for the loop test,
so the number of compares is about 12. Secondly the binary search
loop requires an addition and a divide by 2. On the other hand the
linear compare loop only requires a compare.


The linear search needs two compares per loop iteration also - one to
test the value, and one to check the loop index.


One comparison suffices if you use a sentinel. The following code
fragment is illustrative (but unchecked).

if (data[0] >= testval) return 0;
ptr = data + data_length;
while (*--ptr >= testval);
return ptr-data;

One decrement, one compare per element.

The binary search needs an add and a divide by 2 (usually a shift);
also one assignment. The linear search needs an increment. Since
25/6 > 4, I think "about 3 as fast" is a fair statement, especially
since it was given as a "reasonable guess".


It's a reasonable guess if you don't do the arithmetic and count
operations, and if you've never benchmarked the two approaches. For a
linear search the costs per iteration are

1 data compare & branch
1 increment/decrement

For the binary search the costs per iteration are:

1 data compare & branch
1/2 branch (then clause)
1 index compare and branch
1 addition
1 assignment
1 divide by 2
1 addition by 1 (avoidable)

If we just count operations that's 2 for a linear search and 6.5 for a
binary search which is just about a wash. However the code for the
linear search is a better candidate for optimization. Some years ago
I ran benchmarks on just this question; the break even point was
somewhere around 50. I expect that the results would be about the
same on more modern machines.


>Programming a binary search should be something any competent
>programmer can do almost as fast as he/she can write it down:

I haven't checked your code; however coding a binary search completely
correctly is notoriously trickier than it seems.


I used to think that too before I learned how to write a binary search
that is both simple and solid. I suspect the idea that writing a
binary search is tricky was much of the motivation of the OP; that's
part of why I wrote this.


It's not just me or you; errors in writing binary searches are (or
were) common. IIRC Bentley makes just that observation in
"Programming Pearls" and, of course, Dijkstra had some pungent
comments on the subject.

Incidentally, looking over the code, it still looks right to me. The
comment, however, has an error - should read 'how_many >= 0'. And for
those who notice nits, "increasing" should be "non-decreasing"
instead.


Well you see, "looks right" is not quite as reassuring as "proven
right". I expect it probably is - most of us reuse validated code
schemas for these little exercises.

> /* Return the smallest index of an element greater than
> * the supplied argument x. If there is no such element,
> * return the index where the element would be if there
> * were one (which will be 'how_many' if how_many < 0
> * is true).
> *
> * NEEDS: argument 'v' array be sorted in increasing
> * order. NOT CHECKED.
> *
> * Disclaimer: not tested or compiled in any way.
> */
>
> int
> first_larger( SomeType v[], int how_many, SomeType x ){
> int i = 0, j = how_many;
>
> if( i >= j || v[0] > x ) return 0;
>
> while( i + 1 != j ){
> int m = (i + j) / 2;
> if( v[m] <= x ) i = m;
> else j = m;
> }
>
> return j;
> }
>


Actually I think we are mostly in agreement here, despite the apparent
differences in the postings. There are some different assumptions and
the language used may have been misleading in places. Let's see if
those can be straightened out.

The phrase "is a reasonable guess" (for the performance factor of
three) probably is better put as "is a reasonable back of the envelope
estimate". That's meant with the usual errors bars and ignoring of
(unknown or unspecified) second-order effects.

I admit, the estimate was skewed toward the case where the element
compare was "expensive" relative to other operations. That assumption
is reasonably good if the elements being compared are strings; fair
if the elements being compared are floating point or mixed-mode; poor
if the elements being compared are int's. I still think it's a fair
assumption for calculating a BOTE estimate, since it's standard in
comparing algorithm performance; but you're right that it's not a
good assumption when the elements being compared are int's.

You're also right that, for int arrays of 50 elements, a straight
linear search (not with sentinel) runs only a little slower than a
binary search like the one I illustrated. Out of curiosity I ran a
few benchmarks, and the simple linear search was only 25% slower than
the simple binary search. (In case anyone was wondering, the platform
here is an x86 architecture, with gcc -O2 doing the compiles.)

Using a sentinel on the linear search is nice technique, and
definitely an improvement. (Incidentally, the code snippet posted has
two errors - the two >= comparisons should be > comparisons, and the
return value should be 'ptr-data + 1' rather than 'ptr-data'.) A
linear search using the sentinel technique ran roughly 1.7 times as
fast as a simple linear search, or 40% faster than a simple binary
search. Completely unrolling the linear search -- which is reasonable
for only 50 elements -- gives a 1.2x speedup relative to the sentinel
linear, or a factor of almost 2.1 over the original simple linear
search. Nice!

I will grant you that many programmers will get binary search wrong
(and that this observation may have been made by messieurs Bentley and
Dijkstra). However, I believe this result obtains not because binary
search is inherently difficult, but because it is practiced wrongly by
less competent programmers, and because the myth of it being difficult
is perpetuated by instructors who don't teach it very well. When I
said the binary search code I wrote "looks right" I meant it looked
like there were no clerical errors; I'm sure the invariants were
right, as I was sure they were when I first posted it. It's become
second nature for me to make sure loop invariants are right, and
anyone who adopts this discipline can program binary searches with
confidence. (Come to think of it, asking a candidate to program a
simple binary search might make a good interview question.)

Fortunately for the poor binary search, it still has something to
offer in response to the performance of linear/sentinel search and
unrolled linear search in our 'int array of 50 element' benchmarks.

Combining two levels of binary comparison and then doing a simple
linear search (that is, not using the sentinel technique) runs a tad
faster (roughly 10%) than a linear search with sentinel .

The function

uint
binary_sentinel( int v[], uint n, int x ){
/** NEEDS: n > 0 **/
if( v[n/2] > x ){
uint i = 0;
while( v[i] <= x ) i++;
return i;
} else {
while( v[--n] > x ) {}
return n+1;
}
}

which combines a single binary split with a linear/sentinel search,
outperforms even the fully unrolled linear search. (Here also the
speedup is roughly 10%.)

Finally, if we compare fully unrolled linear search and fully unrolled
binary search -- both reasonable on 50 element arrays if the search is
on the critical path for performance -- the binary search is 2.3 times
as fast as the linear search. The fully unrolled binary search also
has the nice property that it can work on arrays of any size up to 50
(ok, 63), rather than just a single fixed size of 50.

That all fit with what you were saying?
Nov 14 '05 #26
mw*****@newsguy.com (Michael Wojcik) writes:
In article <kf*************@alumnus.caltech.edu>, Tim Rentsch <tx*@alumnus.caltech.edu> writes:
cr*@tiac.net (Richard Harter) writes:
On 16 Feb 2005 19:04:00 -0800, Tim Rentsch <tx*@alumnus.caltech.edu>
wrote:
>"joshc" <jo********@gmail.com> writes:
>
>> If I have an array of data that I know to be sorted in increasing
>> order, and the array is less than 50 elements, and I want to find the
>> first element greater than a certain value, is a simple linear search
>> the best here(for loop from beginning to end of array)?
>
>A linear search will take about 25 compares on the average.
>
>A binary search will take about 6 compares on the average.
>
>Roughly speaking, a reasonable guess is that the binary search will
>run about 3 times as fast as the linear search. Whether that makes
>any perceptible difference in the application or not is another
>question altogether.
However, a linear search will correctly predict nearly all the
branches, while a binary search will mispredict half of them, on
average.

Unless comparison time dominates, I wouldn't call your guess
"reasonable", for that reason alone - and there are others.
(Consider we know nothing of the size of items and the memory-
management features of the platform, so we can't guess whether the
binary search will tend to cause more cache misses, for example.)
This isn't right. First of all you need two compares in the binary
search loop, one for the element compare, and one for the loop test,
so the number of compares is about 12. Secondly the binary search
loop requires an addition and a divide by 2. On the other hand the
linear compare loop only requires a compare.


The linear search needs two compares per loop iteration also - one to
test the value, and one to check the loop index.


Any decent optimizer will unroll the loop, so loop index comparisons
will only add N/O comparisons, where O is the unrolling factor.

And if loop index comparisons take about as long as value comparisons,
then the latter must be very quick - so you fall foul of the branch-
misprediction penalty of binary search.

And this is why premature optimization is a poor idea.

Now, in this case, the OP did follow up by stating that profiling had
indicated his search function was a hot spot; and since that's the
case, he might indeed want to try various alternatives. If compar-
isons are very expensive, the binary search might even help - though
I suspect he'd be better off revisiting his data structure, as
various people suggested.


I have the sense you think I was trying to provide some sort of
detailed performance analysis. I wasn't. There was no optimization -
premature or otherwise - in my posting. There were only observations,
no recommendation.

What I was trying to do is give a not-very-deep, but helpful, answer
to a question about whether binary searching would be an improvement
over linear searching. The answer I gave was intended to address the
question asked, as it was stated.

I was also trying to illustrate a way of thinking that would likely to
be helpful to someone thinking about whether it might be worth some
effort trying to improve the performance of this particular function.
I don't specialize in performance enhancement; but engineers that I
know who do (and who are good at what they do) say that big gains in
performance are almost always due to algorithmic improvements rather
than fussing with low-level machine details. I think that view is
borne out here; please see my response to Richard Harter's posting
for details.
Nov 14 '05 #27
On 23 Feb 2005 02:41:17 -0800, Tim Rentsch <tx*@alumnus.caltech.edu>
wrote:


Actually I think we are mostly in agreement here, despite the apparent
differences in the postings. There are some different assumptions and
the language used may have been misleading in places. Let's see if
those can be straightened out.
Agreed. (Mostly :-))
The phrase "is a reasonable guess" (for the performance factor of
three) probably is better put as "is a reasonable back of the envelope
estimate". That's meant with the usual errors bars and ignoring of
(unknown or unspecified) second-order effects.
Yes and no. The problem here is that we are comparing algorithms with
different cost functions on a restricted range. If we just use number
of iterations for our reasonable guess we're ignoring cost per
iteration. The "reasonable guess" will be off by some constant
factor. This doesn't matter for large n; big O dominants. It's a
different matter for a small restricted range of n. Still, it's a
reasonable guess.

I admit, the estimate was skewed toward the case where the element
compare was "expensive" relative to other operations. That assumption
is reasonably good if the elements being compared are strings; fair
if the elements being compared are floating point or mixed-mode; poor
if the elements being compared are int's. I still think it's a fair
assumption for calculating a BOTE estimate, since it's standard in
comparing algorithm performance; but you're right that it's not a
good assumption when the elements being compared are int's.
Granted. As a caveat comparison costs on strings are distribution
dependent. For quasi-uniform distributions comparison costs for
binary searches will be less than those of linear searches. Surprise!

You're also right that, for int arrays of 50 elements, a straight
linear search (not with sentinel) runs only a little slower than a
binary search like the one I illustrated. Out of curiosity I ran a
few benchmarks, and the simple linear search was only 25% slower than
the simple binary search. (In case anyone was wondering, the platform
here is an x86 architecture, with gcc -O2 doing the compiles.)

Using a sentinel on the linear search is nice technique, and
definitely an improvement. (Incidentally, the code snippet posted has
two errors - the two >= comparisons should be > comparisons, and the
return value should be 'ptr-data + 1' rather than 'ptr-data'.)
Oops.
A
linear search using the sentinel technique ran roughly 1.7 times as
fast as a simple linear search, or 40% faster than a simple binary
search. Completely unrolling the linear search -- which is reasonable
for only 50 elements -- gives a 1.2x speedup relative to the sentinel
linear, or a factor of almost 2.1 over the original simple linear
search. Nice!

I will grant you that many programmers will get binary search wrong
(and that this observation may have been made by messieurs Bentley and
Dijkstra). However, I believe this result obtains not because binary
search is inherently difficult, but because it is practiced wrongly by
less competent programmers, and because the myth of it being difficult
is perpetuated by instructors who don't teach it very well. When I
said the binary search code I wrote "looks right" I meant it looked
like there were no clerical errors; I'm sure the invariants were
right, as I was sure they were when I first posted it. It's become
second nature for me to make sure loop invariants are right, and
anyone who adopts this discipline can program binary searches with
confidence. (Come to think of it, asking a candidate to program a
simple binary search might make a good interview question.)
Ah, okay. I agree, programming a simple binary search would make a
good interview question.

Fortunately for the poor binary search, it still has something to
offer in response to the performance of linear/sentinel search and
unrolled linear search in our 'int array of 50 element' benchmarks.

Combining two levels of binary comparison and then doing a simple
linear search (that is, not using the sentinel technique) runs a tad
faster (roughly 10%) than a linear search with sentinel .

The function

uint
binary_sentinel( int v[], uint n, int x ){
/** NEEDS: n > 0 **/
if( v[n/2] > x ){
uint i = 0;
while( v[i] <= x ) i++;
return i;
} else {
while( v[--n] > x ) {}
return n+1;
}
}

which combines a single binary split with a linear/sentinel search,
outperforms even the fully unrolled linear search. (Here also the
speedup is roughly 10%.)
Finally, if we compare fully unrolled linear search and fully unrolled
binary search -- both reasonable on 50 element arrays if the search is
on the critical path for performance -- the binary search is 2.3 times
as fast as the linear search. The fully unrolled binary search also
has the nice property that it can work on arrays of any size up to 50
(ok, 63), rather than just a single fixed size of 50.
You can use a switch for the unrolled linear search. Still, I like
the unrolled binary search.
That all fit with what you were saying?


Pretty much. Whether or not optimization is premature, successful
optimization requires a full assessment of alternatives, a full
understanding of the data being operated upon, and careful
measurement.
Richard Harter, cr*@tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
All my life I wanted to be someone;
I guess I should have been more specific.
Nov 14 '05 #28
cr*@tiac.net (Richard Harter) writes:
On 23 Feb 2005 02:41:17 -0800, Tim Rentsch <tx*@alumnus.caltech.edu>
wrote:
[snip]

Finally, if we compare fully unrolled linear search and fully unrolled
binary search -- both reasonable on 50 element arrays if the search is
on the critical path for performance -- the binary search is 2.3 times
as fast as the linear search. The fully unrolled binary search also
has the nice property that it can work on arrays of any size up to 50
(ok, 63), rather than just a single fixed size of 50.
You can use a switch for the unrolled linear search.


I tried code that used a 'switch' to vector into an unrolled linear
search. On this problem at least, the simple loop-using-sentinel ran
faster, by about 20%.

Still, I like the unrolled binary search.


I have an interesting story about that. I tried two versions of the
unrolled binary search - one version like this:

if( v[3] > x )
if( v[1] > x ) return v[0] > x ? 0 : 1;
else return v[2] > x ? 2 : 3;
else
if( v[5] > x ) return v[4] > x ? 4 : 5;
else return v[6] > x ? 6 : 7;

(only of course moreso, to cover all 50 cases), where the number of
elements in the array must match exactly the unrolling in the code;
and another version like this:

if( n < 4 || v[3] > x )
if( n < 2 || v[1] > x ) return n < 1 || v[0] > x ? 0 : 1;
else return n < 3 || v[2] > x ? 2 : 3;
else
if( n < 6 || v[5] > x ) return n < 5 || v[4] > x ? 4 : 5;
else return n < 7 || v[6] > x ? 6 : 7;

where the number of array elements is checked in parallel construction
with the tests of the values, so now any array size up to the unrolled
size will work. To my surprise the second version actually ran faster
than the first - not by much, only a few percent, but it was nice to
get the more general functionality and have it be "better than free".

Incidentally, I have no explanation for why this might be so. The
code in the second case was roughly twice as many instructions down
each path. I guess branch prediction was doing its thing.

That all fit with what you were saying?


Pretty much. Whether or not optimization is premature, successful
optimization requires a full assessment of alternatives, a full
understanding of the data being operated upon, and careful
measurement.


Roger that.
Nov 14 '05 #29

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

10
by: pembed2003 | last post by:
Hi all, I asked this question in the C group but no one seems to be interested in answering it. :-( Basically, I wrote a search and replace function so I can do: char source = "abcd?1234?x";...
1
by: Dave Townsend | last post by:
Hi, Can anybody help me with the following piece of code? The purpose behind the code is to parse HTML files, strip out the tags and return the text between tags. This is part of a larger...
3
by: Johann Blake | last post by:
This aticle presents factual evidence that Google's PageRank, inbound links and keywords are irrelevent to your placement in the search results. It presents several case studies that show...
60
by: Julie | last post by:
What is the *fastest* way in .NET to search large on-disk text files (100+ MB) for a given string. The files are unindexed and unsorted, and for the purposes of my immediate requirements, can't...
8
by: Ben Fidge | last post by:
Hi I'm working on a site which requires the users to specify a hotel at which they're staying in London. The complete list of hotels comes to something like 1600 records. Each record consists of...
4
by: Dameon | last post by:
Hi All, I have a process where I'd like to search the contents of a file(in a dir) for all occurences (or the count of) of a given string. My goal is to focus more on performance, as some of the...
9
by: Rick | last post by:
I have a large list of objects where each object has a unique (non-overlapping) date range. The list is sorted chronologically. What is the most efficient way to search this list for a single...
14
by: S | last post by:
Any idea on how I would be able to do a search within C# that does ranges or words For example I want to search for Chicken in the string string s1 = "This is Great Chicken";
6
Kelicula
by: Kelicula | last post by:
Why?: One commonly used algorithm is the binary search. If you don't already know it, you should read on. Very helpful. Saves much CPU. Reduces computations exponentially. When searching...
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...
0
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.