question on std::search_n

ma740988

I've been perusing for some time now - I suppose some of the algorithms
in Josuttis. 'Today', I do alot of funny math on data resident within
memory. As part of this. I find myself using std::search_n alot to
find a particular value in a range of memory. I'm fairly pleased with
std::search_n - in that my benchmark ( below ) shows on average an 8
second performance gain comparable to a for loop. The question.
Since I haven't quite gotten my head around most of the algorithms, is
there an alternative to std::search_n that I'm perhaps overlooking?
Ironically I didn't see a lot of use on the web with regards to
std::search_n, nonetheless, hopefully my question is not too 'open
ended'. Source code for bechmark - executed on a PC

# include <iostream>
# include <algorithm>
# include <ctime>

using namespace std;

int main()
{
int const buffer_len( 0x100000 );
int *ptr_mem = new int [ buffer_len ];
std::fill ( ptr_mem, ptr_mem + buffer_len, 0xA5 );

int *ptr_mem_end = ptr_mem + buffer_len;

std::clock_t Start, End;
double Elapsed;
int count(0);
int const num_test(1000);

Start = std::clock();
for ( int idx(0); idx < num_test; ++idx )
{
if ( std::search_n (
ptr_mem,
ptr_mem_end,
buffer_len,
0xA5 ) == ptr_mem_end )
{
++count;
break;
}
}
if ( !count )
{
End = std::clock();
Elapsed = static_cast<double>( ( End - Start ) / CLOCKS_PER_SEC );

std::cout << " Elapsed - std::search_n " << Elapsed << std::endl;
}

count = 0;
Start = std::clock();
for ( int jdx(0); jdx < num_test; ++jdx )
{
for ( int idx(0); idx < buffer_len; ++idx )
{
if ( ptr_mem [idx ] != 0xA5 )
{
++count;
break;
}
}
}
if ( !count )
{
End = std::clock();
Elapsed = static_cast<double>( ( End - Start ) / CLOCKS_PER_SEC );

std::cout << " Elapsed - for loop " << Elapsed << std::endl;
}
delete [] ptr_mem; // try to clean up your mess

}

/*
Elapsed - std::search_n 30
Elapsed - for loop 38

Press any key to continue
*/

Mar 8 '06 #1

Subscribe Post Reply

3218

Dietmar Kuehl

ma740988 wrote:

Since I haven't quite gotten my head around most of the algorithms, is
there an alternative to std::search_n that I'm perhaps overlooking?

The first question is: what do you intend to do? 'search_n()' locates
the first consecutive sequence of the specified length matching the
specified object. There is pretty rare use for this and the example
you gave is not a situation where I would apply 'search_n()': I would
apply 'find()' with an inverted predicate to locate the first position
where the condition does not hold (this is probably what is essentially
happening inside 'search_n()' in you situation).
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
<http://www.eai-systems.com> - Efficient Artificial Intelligence

Mar 8 '06 #2

ma740988

Dietmar Kuehl wrote:

ma740988 wrote:
Since I haven't quite gotten my head around most of the algorithms, is
there an alternative to std::search_n that I'm perhaps overlooking?
The first question is: what do you intend to do?

Dietmar, Lets assume I transmitted (we'll ignore the protocol used -
just assume you got data somehow ) 4 MiB of data to you. For
simplicity*, lets also assume that this data is all '0xA5'. Your job
then is to ensure that ( as part of what I call 'communication check' )
you received 4 MiB worth of 0xA5's.

That's the gist in a nutshell.

Worse case scenario: Within the 4 MiB of data, each 1 MiB boundary
will have a different value. For instance:

Value Range
0xA5 - 0x000000 - 0x100000
0xA6 - 0x100000 - 0x200000
0xA7 - 0x200000 - 0x300000
0xA8 - 0x300000 - 0x400000

You'll know in advance that the received data will change on 1 MiB
boundaries. That's because I'll tell you prior to sending the data -
what 'mode' I'm in.
The mode is irrelevant though. The important thing surrounds an
approach that'll walk through the memory finding the specific values.

The program shown highlights my current approach to the - simple case.
4 MiB worth of 0xA5's.
'search_n()' locates
the first consecutive sequence of the specified length matching the
specified object. There is pretty rare use for this and the example
you gave is not a situation where I would apply 'search_n()': I would
apply 'find()' with an inverted predicate to locate the first position
where the condition does not hold (this is probably what is essentially
happening inside 'search_n()' in you situation).

An inverted predicated. Sounds like a not1 .. I'll look into a find
approach. The trouble with alot of these algorithms is the boundary
for me - today appears muddled. search versus find. We'll they're
synonymous, but I wonder sometimes if it would have been better to just
'provide the better of the two' as part of the standard. In any event
....

Mar 8 '06 #3

Dietmar Kuehl

ma740988 wrote:

Dietmar, Lets assume I transmitted (we'll ignore the protocol used -
just assume you got data somehow ) 4 MiB of data to you. For
simplicity*, lets also assume that this data is all '0xA5'. Your job
then is to ensure that ( as part of what I call 'communication check' )
you received 4 MiB worth of 0xA5's.
This would be a pretty stupid transmission and I would certainly
represent the received data differently, e.g. using a sequence of
(value, count) pairs. In any case, assuming I got a container with
the data in the form you described, I would use one of two approaches
depending on the characteristics of the container:

I would use 'std::find()' to locate the first object not matching the
condition and 'std::distance()' to determine the number of matching
elements:

if (n != std::distance(c.begin(),
std::find_if(c.begin(), c.end(),
std::bind2nd(std::not_equal_to<char>(), 0xA5))))
...

If the call to 'std::distance()' is not bundled with the call to
'std::find_if()', you would get the position of the first sequence,
too. Note, however, that the sequence is traversed twice in this
situation if the container does not provide random access.
The trouble with alot of these algorithms is the boundary
for me - today appears muddled. search versus find. We'll they're
synonymous,
In some sense 'std::search_n()' is a generalization of 'std::find()':
'std::find()' is identical to a 'std::search_n()' with a count of '1'.
but I wonder sometimes if it would have been better to just
'provide the better of the two' as part of the standard.

Which one is the better one? The simple one which does not work for
everybody? ...or the more general one which requires users in need
of a simple solution to pass extra arguments? Do you really want
to pass on all those arguments for the most general algorithms when
you only need a simple algorithm? Also, do you accept the possible
slowdown due to using a more general algorithm which cannot take
advantage of specific properties? ... or that the general algorithm
may be inapplicable because it requires more specific input e.g. a
forward iterator ('std::search_n()') instead of an input iterator
('std::find()')?
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
<http://www.eai-systems.com> - Efficient Artificial Intelligence

Mar 8 '06 #4

ma740988

> This would be a pretty stupid transmission and I would certainly

represent the received data differently, e.g. using a sequence of
(value, count) pairs. You lost me there. You're told in advance about the sequence in a
header. You get header/data, header/data - at all times. The header
tells you what to do with the junk you receive/may have received
(assuming tranmittal errors).

In any case, assuming I got a container with the data in the form you described, I would use one of two approaches
depending on the characteristics of the container:

I would use 'std::find()' to locate the first object not matching the
condition and 'std::distance()' to determine the number of matching
elements:

if (n != std::distance(c.begin(),
std::find_if(c.begin(), c.end(),
std::bind2nd(std::not_equal_to<char>(), 0xA5))))
... Ahh!! That's right!! I was missing the bind2nd part. Actually, I was
trying out a combination of count_if(find_if ... )
If the call to 'std::distance()' is not bundled with the call to
'std::find_if()', you would get the position of the first sequence,
too. Note, however, that the sequence is traversed twice in this
situation if the container does not provide random access.

I'm a little ignorant in this area so bear with me. Just so I
understand this: Assume our container is std::list and given 4
elements. Pictorially:

[ begin ][ begin + 1 ][ begin + 2][ begin + 3]

So find_if will run from begin to 1 past begin + 3. Now why is there a
need to traverse twice? All the sequence needs to do is run from being
to 1 past begin + 3 to find a matching value.

The trouble with alot of these algorithms is the boundary
for me - today appears muddled. search versus find. We'll they're
synonymous,

In some sense 'std::search_n()' is a generalization of 'std::find()':
'std::find()' is identical to a 'std::search_n()' with a count of '1'.
but I wonder sometimes if it would have been better to just
'provide the better of the two' as part of the standard.

Which one is the better one? The simple one which does not work for
everybody? ...or the more general one which requires users in need
of a simple solution to pass extra arguments? Do you really want
to pass on all those arguments for the most general algorithms when
you only need a simple algorithm? Also, do you accept the possible
slowdown due to using a more general algorithm which cannot take
advantage of specific properties? ... or that the general algorithm
may be inapplicable because it requires more specific input e.g. a
forward iterator ('std::search_n()') instead of an input iterator
('std::find()')?

Ok. the devil is in the details, I suppose. :) I need to back up and
review once again distinction between the input versus forward
iterators

Thanks for the help

Mar 8 '06 #5

Dietmar Kuehl

ma740988 wrote:

This would be a pretty stupid transmission and I would certainly
represent the received data differently, e.g. using a sequence of
(value, count) pairs. You lost me there. You're told in advance about the sequence in a
header. You get header/data, header/data - at all times. The header
tells you what to do with the junk you receive/may have received
(assuming tranmittal errors).

All I'm saying is that I would read the input and check immediately
the data I receive without ever putting it into any form of container
or sequence (other than the input sequence you might obtain using
'std::istreambuf_iterator<char>'). Especially if the sequences are
huge, this would be much more memory conservative.

If the call to 'std::distance()' is not bundled with the call to
'std::find_if()', you would get the position of the first sequence,
too. Note, however, that the sequence is traversed twice in this
situation if the container does not provide random access.

I'm a little ignorant in this area so bear with me. Just so I
understand this: Assume our container is std::list and given 4
elements. Pictorially:

[ begin ][ begin + 1 ][ begin + 2][ begin + 3]

So find_if will run from begin to 1 past begin + 3. Now why is there a
need to traverse twice? All the sequence needs to do is run from being
to 1 past begin + 3 to find a matching value.

The sequence is traversed once for 'std::find_if()' to determine the
end and then another time for 'std::distance()' to count the number
of elements. Duplicate traversal of 1M list nodes could be more
expensive than bundling the counting with the finding and thus
requiring only one traversal.
Ok. the devil is in the details, I suppose. :)

I wouldn't call the difference between a forward iterator and an
input iterator to be much more severe than just a "detail". But
then, I'm pretty focused on the concepts involved in generic
programming and thus consider things a big deal which most others
don't really care about...
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
<http://www.eai-systems.com> - Efficient Artificial Intelligence

Mar 8 '06 #6

ma740988

All I'm saying is that I would read the input and check immediately
the data I receive without ever putting it into any form of container
or sequence (other than the input sequence you might obtain using
'std::istreambuf_iterator<char>'). Especially if the sequences are
huge, this would be much more memory conservative.
I see. That's exactly what I doing though. I know where start and end
of the data is ( based on what the header told me ). I konw in memory
where the data resides. From there I just check to ensure that what
the 'sender' claimed he sent matched what I received.
My initial post showed me allocating memory and filling it with A5's.
In my real system, obviously the receiver doesn't need to do all that.
That said, those steps are omitted. The receiver just _walks_ across
the memory - validating the data.

The sequence is traversed once for 'std::find_if()' to determine the
end and then another time for 'std::distance()' to count the number
of elements. Duplicate traversal of 1M list nodes could be more
expensive than bundling the counting with the finding and thus
requiring only one traversal.

Got it. Thanks alot.

Mar 8 '06 #7

question on std::search_n

Similar topics