473,385 Members | 1,569 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,385 software developers and data experts.

question on std::search_n


I've been perusing for some time now - I suppose some of the algorithms
in Josuttis. 'Today', I do alot of funny math on data resident within
memory. As part of this. I find myself using std::search_n alot to
find a particular value in a range of memory. I'm fairly pleased with
std::search_n - in that my benchmark ( below ) shows on average an 8
second performance gain comparable to a for loop. The question.
Since I haven't quite gotten my head around most of the algorithms, is
there an alternative to std::search_n that I'm perhaps overlooking?
Ironically I didn't see a lot of use on the web with regards to
std::search_n, nonetheless, hopefully my question is not too 'open
ended'. Source code for bechmark - executed on a PC

# include <iostream>
# include <algorithm>
# include <ctime>

using namespace std;

int main()
{
int const buffer_len( 0x100000 );
int *ptr_mem = new int [ buffer_len ];
std::fill ( ptr_mem, ptr_mem + buffer_len, 0xA5 );

int *ptr_mem_end = ptr_mem + buffer_len;

std::clock_t Start, End;
double Elapsed;
int count(0);
int const num_test(1000);

Start = std::clock();
for ( int idx(0); idx < num_test; ++idx )
{
if ( std::search_n (
ptr_mem,
ptr_mem_end,
buffer_len,
0xA5 ) == ptr_mem_end )
{
++count;
break;
}
}
if ( !count )
{
End = std::clock();
Elapsed = static_cast<double>( ( End - Start ) / CLOCKS_PER_SEC );

std::cout << " Elapsed - std::search_n " << Elapsed << std::endl;
}

count = 0;
Start = std::clock();
for ( int jdx(0); jdx < num_test; ++jdx )
{
for ( int idx(0); idx < buffer_len; ++idx )
{
if ( ptr_mem [idx ] != 0xA5 )
{
++count;
break;
}
}
}
if ( !count )
{
End = std::clock();
Elapsed = static_cast<double>( ( End - Start ) / CLOCKS_PER_SEC );

std::cout << " Elapsed - for loop " << Elapsed << std::endl;
}
delete [] ptr_mem; // try to clean up your mess

}

/*
Elapsed - std::search_n 30
Elapsed - for loop 38

Press any key to continue
*/

Mar 8 '06 #1
6 3218
ma740988 wrote:
Since I haven't quite gotten my head around most of the algorithms, is
there an alternative to std::search_n that I'm perhaps overlooking?


The first question is: what do you intend to do? 'search_n()' locates
the first consecutive sequence of the specified length matching the
specified object. There is pretty rare use for this and the example
you gave is not a situation where I would apply 'search_n()': I would
apply 'find()' with an inverted predicate to locate the first position
where the condition does not hold (this is probably what is essentially
happening inside 'search_n()' in you situation).
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
<http://www.eai-systems.com> - Efficient Artificial Intelligence
Mar 8 '06 #2

Dietmar Kuehl wrote:
ma740988 wrote:
Since I haven't quite gotten my head around most of the algorithms, is
there an alternative to std::search_n that I'm perhaps overlooking?
The first question is: what do you intend to do?

Dietmar, Lets assume I transmitted (we'll ignore the protocol used -
just assume you got data somehow ) 4 MiB of data to you. For
simplicity*, lets also assume that this data is all '0xA5'. Your job
then is to ensure that ( as part of what I call 'communication check' )
you received 4 MiB worth of 0xA5's.

That's the gist in a nutshell.

Worse case scenario: Within the 4 MiB of data, each 1 MiB boundary
will have a different value. For instance:

Value Range
0xA5 - 0x000000 - 0x100000
0xA6 - 0x100000 - 0x200000
0xA7 - 0x200000 - 0x300000
0xA8 - 0x300000 - 0x400000

You'll know in advance that the received data will change on 1 MiB
boundaries. That's because I'll tell you prior to sending the data -
what 'mode' I'm in.
The mode is irrelevant though. The important thing surrounds an
approach that'll walk through the memory finding the specific values.

The program shown highlights my current approach to the - simple case.
4 MiB worth of 0xA5's.
'search_n()' locates
the first consecutive sequence of the specified length matching the
specified object. There is pretty rare use for this and the example
you gave is not a situation where I would apply 'search_n()': I would
apply 'find()' with an inverted predicate to locate the first position
where the condition does not hold (this is probably what is essentially
happening inside 'search_n()' in you situation).


An inverted predicated. Sounds like a not1 .. I'll look into a find
approach. The trouble with alot of these algorithms is the boundary
for me - today appears muddled. search versus find. We'll they're
synonymous, but I wonder sometimes if it would have been better to just
'provide the better of the two' as part of the standard. In any event
....

Mar 8 '06 #3
ma740988 wrote:
Dietmar, Lets assume I transmitted (we'll ignore the protocol used -
just assume you got data somehow ) 4 MiB of data to you. For
simplicity*, lets also assume that this data is all '0xA5'. Your job
then is to ensure that ( as part of what I call 'communication check' )
you received 4 MiB worth of 0xA5's.
This would be a pretty stupid transmission and I would certainly
represent the received data differently, e.g. using a sequence of
(value, count) pairs. In any case, assuming I got a container with
the data in the form you described, I would use one of two approaches
depending on the characteristics of the container:

I would use 'std::find()' to locate the first object not matching the
condition and 'std::distance()' to determine the number of matching
elements:

if (n != std::distance(c.begin(),
std::find_if(c.begin(), c.end(),
std::bind2nd(std::not_equal_to<char>(), 0xA5))))
...

If the call to 'std::distance()' is not bundled with the call to
'std::find_if()', you would get the position of the first sequence,
too. Note, however, that the sequence is traversed twice in this
situation if the container does not provide random access.
The trouble with alot of these algorithms is the boundary
for me - today appears muddled. search versus find. We'll they're
synonymous,
In some sense 'std::search_n()' is a generalization of 'std::find()':
'std::find()' is identical to a 'std::search_n()' with a count of '1'.
but I wonder sometimes if it would have been better to just
'provide the better of the two' as part of the standard.


Which one is the better one? The simple one which does not work for
everybody? ...or the more general one which requires users in need
of a simple solution to pass extra arguments? Do you really want
to pass on all those arguments for the most general algorithms when
you only need a simple algorithm? Also, do you accept the possible
slowdown due to using a more general algorithm which cannot take
advantage of specific properties? ... or that the general algorithm
may be inapplicable because it requires more specific input e.g. a
forward iterator ('std::search_n()') instead of an input iterator
('std::find()')?
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
<http://www.eai-systems.com> - Efficient Artificial Intelligence
Mar 8 '06 #4
> This would be a pretty stupid transmission and I would certainly
represent the received data differently, e.g. using a sequence of
(value, count) pairs. You lost me there. You're told in advance about the sequence in a
header. You get header/data, header/data - at all times. The header
tells you what to do with the junk you receive/may have received
(assuming tranmittal errors).

In any case, assuming I got a container with the data in the form you described, I would use one of two approaches
depending on the characteristics of the container:

I would use 'std::find()' to locate the first object not matching the
condition and 'std::distance()' to determine the number of matching
elements:

if (n != std::distance(c.begin(),
std::find_if(c.begin(), c.end(),
std::bind2nd(std::not_equal_to<char>(), 0xA5))))
... Ahh!! That's right!! I was missing the bind2nd part. Actually, I was
trying out a combination of count_if(find_if ... )
If the call to 'std::distance()' is not bundled with the call to
'std::find_if()', you would get the position of the first sequence,
too. Note, however, that the sequence is traversed twice in this
situation if the container does not provide random access.


I'm a little ignorant in this area so bear with me. Just so I
understand this: Assume our container is std::list and given 4
elements. Pictorially:

[ begin ][ begin + 1 ][ begin + 2][ begin + 3]

So find_if will run from begin to 1 past begin + 3. Now why is there a
need to traverse twice? All the sequence needs to do is run from being
to 1 past begin + 3 to find a matching value.

The trouble with alot of these algorithms is the boundary
for me - today appears muddled. search versus find. We'll they're
synonymous,


In some sense 'std::search_n()' is a generalization of 'std::find()':
'std::find()' is identical to a 'std::search_n()' with a count of '1'.
but I wonder sometimes if it would have been better to just
'provide the better of the two' as part of the standard.


Which one is the better one? The simple one which does not work for
everybody? ...or the more general one which requires users in need
of a simple solution to pass extra arguments? Do you really want
to pass on all those arguments for the most general algorithms when
you only need a simple algorithm? Also, do you accept the possible
slowdown due to using a more general algorithm which cannot take
advantage of specific properties? ... or that the general algorithm
may be inapplicable because it requires more specific input e.g. a
forward iterator ('std::search_n()') instead of an input iterator
('std::find()')?

Ok. the devil is in the details, I suppose. :) I need to back up and
review once again distinction between the input versus forward
iterators

Thanks for the help

Mar 8 '06 #5
ma740988 wrote:
This would be a pretty stupid transmission and I would certainly
represent the received data differently, e.g. using a sequence of
(value, count) pairs. You lost me there. You're told in advance about the sequence in a
header. You get header/data, header/data - at all times. The header
tells you what to do with the junk you receive/may have received
(assuming tranmittal errors).


All I'm saying is that I would read the input and check immediately
the data I receive without ever putting it into any form of container
or sequence (other than the input sequence you might obtain using
'std::istreambuf_iterator<char>'). Especially if the sequences are
huge, this would be much more memory conservative.
If the call to 'std::distance()' is not bundled with the call to
'std::find_if()', you would get the position of the first sequence,
too. Note, however, that the sequence is traversed twice in this
situation if the container does not provide random access.


I'm a little ignorant in this area so bear with me. Just so I
understand this: Assume our container is std::list and given 4
elements. Pictorially:

[ begin ][ begin + 1 ][ begin + 2][ begin + 3]

So find_if will run from begin to 1 past begin + 3. Now why is there a
need to traverse twice? All the sequence needs to do is run from being
to 1 past begin + 3 to find a matching value.


The sequence is traversed once for 'std::find_if()' to determine the
end and then another time for 'std::distance()' to count the number
of elements. Duplicate traversal of 1M list nodes could be more
expensive than bundling the counting with the finding and thus
requiring only one traversal.
Ok. the devil is in the details, I suppose. :)


I wouldn't call the difference between a forward iterator and an
input iterator to be much more severe than just a "detail". But
then, I'm pretty focused on the concepts involved in generic
programming and thus consider things a big deal which most others
don't really care about...
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
<http://www.eai-systems.com> - Efficient Artificial Intelligence
Mar 8 '06 #6
All I'm saying is that I would read the input and check immediately
the data I receive without ever putting it into any form of container
or sequence (other than the input sequence you might obtain using
'std::istreambuf_iterator<char>'). Especially if the sequences are
huge, this would be much more memory conservative.
I see. That's exactly what I doing though. I know where start and end
of the data is ( based on what the header told me ). I konw in memory
where the data resides. From there I just check to ensure that what
the 'sender' claimed he sent matched what I received.
My initial post showed me allocating memory and filling it with A5's.
In my real system, obviously the receiver doesn't need to do all that.
That said, those steps are omitted. The receiver just _walks_ across
the memory - validating the data.

The sequence is traversed once for 'std::find_if()' to determine the
end and then another time for 'std::distance()' to count the number
of elements. Duplicate traversal of 1M list nodes could be more
expensive than bundling the counting with the finding and thus
requiring only one traversal.

Got it. Thanks alot.

Mar 8 '06 #7

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

4
by: Chris Jefferson | last post by:
Hello! For those who aren't familar BTW, the "search_n" STL function takes two iterators, a "count" value and and either a "value" or a "value" and a binary predicate and searches between the...
5
by: ma740988 | last post by:
There's a need for me to move around at specified offsets within memory. As as a result - long story short - unsigned char* is the type of choice. At issue: Consider the case ( test code ) where...
6
by: cesco | last post by:
Hi, say I have a vector like the following: vec = and I'd like to know via a function (for example, ConsecutiveIntegers(vec, N)) whether there are N consecutive integers. So, for example,...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome former...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.