473,779 Members | 2,035 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

question on std::search_n


I've been perusing for some time now - I suppose some of the algorithms
in Josuttis. 'Today', I do alot of funny math on data resident within
memory. As part of this. I find myself using std::search_n alot to
find a particular value in a range of memory. I'm fairly pleased with
std::search_n - in that my benchmark ( below ) shows on average an 8
second performance gain comparable to a for loop. The question.
Since I haven't quite gotten my head around most of the algorithms, is
there an alternative to std::search_n that I'm perhaps overlooking?
Ironically I didn't see a lot of use on the web with regards to
std::search_n, nonetheless, hopefully my question is not too 'open
ended'. Source code for bechmark - executed on a PC

# include <iostream>
# include <algorithm>
# include <ctime>

using namespace std;

int main()
{
int const buffer_len( 0x100000 );
int *ptr_mem = new int [ buffer_len ];
std::fill ( ptr_mem, ptr_mem + buffer_len, 0xA5 );

int *ptr_mem_end = ptr_mem + buffer_len;

std::clock_t Start, End;
double Elapsed;
int count(0);
int const num_test(1000);

Start = std::clock();
for ( int idx(0); idx < num_test; ++idx )
{
if ( std::search_n (
ptr_mem,
ptr_mem_end,
buffer_len,
0xA5 ) == ptr_mem_end )
{
++count;
break;
}
}
if ( !count )
{
End = std::clock();
Elapsed = static_cast<dou ble>( ( End - Start ) / CLOCKS_PER_SEC );

std::cout << " Elapsed - std::search_n " << Elapsed << std::endl;
}

count = 0;
Start = std::clock();
for ( int jdx(0); jdx < num_test; ++jdx )
{
for ( int idx(0); idx < buffer_len; ++idx )
{
if ( ptr_mem [idx ] != 0xA5 )
{
++count;
break;
}
}
}
if ( !count )
{
End = std::clock();
Elapsed = static_cast<dou ble>( ( End - Start ) / CLOCKS_PER_SEC );

std::cout << " Elapsed - for loop " << Elapsed << std::endl;
}
delete [] ptr_mem; // try to clean up your mess

}

/*
Elapsed - std::search_n 30
Elapsed - for loop 38

Press any key to continue
*/

Mar 8 '06 #1
6 3238
ma740988 wrote:
Since I haven't quite gotten my head around most of the algorithms, is
there an alternative to std::search_n that I'm perhaps overlooking?


The first question is: what do you intend to do? 'search_n()' locates
the first consecutive sequence of the specified length matching the
specified object. There is pretty rare use for this and the example
you gave is not a situation where I would apply 'search_n()': I would
apply 'find()' with an inverted predicate to locate the first position
where the condition does not hold (this is probably what is essentially
happening inside 'search_n()' in you situation).
--
<mailto:di***** ******@yahoo.co m> <http://www.dietmar-kuehl.de/>
<http://www.eai-systems.com> - Efficient Artificial Intelligence
Mar 8 '06 #2

Dietmar Kuehl wrote:
ma740988 wrote:
Since I haven't quite gotten my head around most of the algorithms, is
there an alternative to std::search_n that I'm perhaps overlooking?
The first question is: what do you intend to do?

Dietmar, Lets assume I transmitted (we'll ignore the protocol used -
just assume you got data somehow ) 4 MiB of data to you. For
simplicity*, lets also assume that this data is all '0xA5'. Your job
then is to ensure that ( as part of what I call 'communication check' )
you received 4 MiB worth of 0xA5's.

That's the gist in a nutshell.

Worse case scenario: Within the 4 MiB of data, each 1 MiB boundary
will have a different value. For instance:

Value Range
0xA5 - 0x000000 - 0x100000
0xA6 - 0x100000 - 0x200000
0xA7 - 0x200000 - 0x300000
0xA8 - 0x300000 - 0x400000

You'll know in advance that the received data will change on 1 MiB
boundaries. That's because I'll tell you prior to sending the data -
what 'mode' I'm in.
The mode is irrelevant though. The important thing surrounds an
approach that'll walk through the memory finding the specific values.

The program shown highlights my current approach to the - simple case.
4 MiB worth of 0xA5's.
'search_n()' locates
the first consecutive sequence of the specified length matching the
specified object. There is pretty rare use for this and the example
you gave is not a situation where I would apply 'search_n()': I would
apply 'find()' with an inverted predicate to locate the first position
where the condition does not hold (this is probably what is essentially
happening inside 'search_n()' in you situation).


An inverted predicated. Sounds like a not1 .. I'll look into a find
approach. The trouble with alot of these algorithms is the boundary
for me - today appears muddled. search versus find. We'll they're
synonymous, but I wonder sometimes if it would have been better to just
'provide the better of the two' as part of the standard. In any event
....

Mar 8 '06 #3
ma740988 wrote:
Dietmar, Lets assume I transmitted (we'll ignore the protocol used -
just assume you got data somehow ) 4 MiB of data to you. For
simplicity*, lets also assume that this data is all '0xA5'. Your job
then is to ensure that ( as part of what I call 'communication check' )
you received 4 MiB worth of 0xA5's.
This would be a pretty stupid transmission and I would certainly
represent the received data differently, e.g. using a sequence of
(value, count) pairs. In any case, assuming I got a container with
the data in the form you described, I would use one of two approaches
depending on the characteristics of the container:

I would use 'std::find()' to locate the first object not matching the
condition and 'std::distance( )' to determine the number of matching
elements:

if (n != std::distance(c .begin(),
std::find_if(c. begin(), c.end(),
std::bind2nd(st d::not_equal_to <char>(), 0xA5))))
...

If the call to 'std::distance( )' is not bundled with the call to
'std::find_if() ', you would get the position of the first sequence,
too. Note, however, that the sequence is traversed twice in this
situation if the container does not provide random access.
The trouble with alot of these algorithms is the boundary
for me - today appears muddled. search versus find. We'll they're
synonymous,
In some sense 'std::search_n( )' is a generalization of 'std::find()':
'std::find()' is identical to a 'std::search_n( )' with a count of '1'.
but I wonder sometimes if it would have been better to just
'provide the better of the two' as part of the standard.


Which one is the better one? The simple one which does not work for
everybody? ...or the more general one which requires users in need
of a simple solution to pass extra arguments? Do you really want
to pass on all those arguments for the most general algorithms when
you only need a simple algorithm? Also, do you accept the possible
slowdown due to using a more general algorithm which cannot take
advantage of specific properties? ... or that the general algorithm
may be inapplicable because it requires more specific input e.g. a
forward iterator ('std::search_n ()') instead of an input iterator
('std::find()') ?
--
<mailto:di***** ******@yahoo.co m> <http://www.dietmar-kuehl.de/>
<http://www.eai-systems.com> - Efficient Artificial Intelligence
Mar 8 '06 #4
> This would be a pretty stupid transmission and I would certainly
represent the received data differently, e.g. using a sequence of
(value, count) pairs. You lost me there. You're told in advance about the sequence in a
header. You get header/data, header/data - at all times. The header
tells you what to do with the junk you receive/may have received
(assuming tranmittal errors).

In any case, assuming I got a container with the data in the form you described, I would use one of two approaches
depending on the characteristics of the container:

I would use 'std::find()' to locate the first object not matching the
condition and 'std::distance( )' to determine the number of matching
elements:

if (n != std::distance(c .begin(),
std::find_if(c. begin(), c.end(),
std::bind2nd(st d::not_equal_to <char>(), 0xA5))))
... Ahh!! That's right!! I was missing the bind2nd part. Actually, I was
trying out a combination of count_if(find_i f ... )
If the call to 'std::distance( )' is not bundled with the call to
'std::find_if() ', you would get the position of the first sequence,
too. Note, however, that the sequence is traversed twice in this
situation if the container does not provide random access.


I'm a little ignorant in this area so bear with me. Just so I
understand this: Assume our container is std::list and given 4
elements. Pictorially:

[ begin ][ begin + 1 ][ begin + 2][ begin + 3]

So find_if will run from begin to 1 past begin + 3. Now why is there a
need to traverse twice? All the sequence needs to do is run from being
to 1 past begin + 3 to find a matching value.

The trouble with alot of these algorithms is the boundary
for me - today appears muddled. search versus find. We'll they're
synonymous,


In some sense 'std::search_n( )' is a generalization of 'std::find()':
'std::find()' is identical to a 'std::search_n( )' with a count of '1'.
but I wonder sometimes if it would have been better to just
'provide the better of the two' as part of the standard.


Which one is the better one? The simple one which does not work for
everybody? ...or the more general one which requires users in need
of a simple solution to pass extra arguments? Do you really want
to pass on all those arguments for the most general algorithms when
you only need a simple algorithm? Also, do you accept the possible
slowdown due to using a more general algorithm which cannot take
advantage of specific properties? ... or that the general algorithm
may be inapplicable because it requires more specific input e.g. a
forward iterator ('std::search_n ()') instead of an input iterator
('std::find()') ?

Ok. the devil is in the details, I suppose. :) I need to back up and
review once again distinction between the input versus forward
iterators

Thanks for the help

Mar 8 '06 #5
ma740988 wrote:
This would be a pretty stupid transmission and I would certainly
represent the received data differently, e.g. using a sequence of
(value, count) pairs. You lost me there. You're told in advance about the sequence in a
header. You get header/data, header/data - at all times. The header
tells you what to do with the junk you receive/may have received
(assuming tranmittal errors).


All I'm saying is that I would read the input and check immediately
the data I receive without ever putting it into any form of container
or sequence (other than the input sequence you might obtain using
'std::istreambu f_iterator<char >'). Especially if the sequences are
huge, this would be much more memory conservative.
If the call to 'std::distance( )' is not bundled with the call to
'std::find_if() ', you would get the position of the first sequence,
too. Note, however, that the sequence is traversed twice in this
situation if the container does not provide random access.


I'm a little ignorant in this area so bear with me. Just so I
understand this: Assume our container is std::list and given 4
elements. Pictorially:

[ begin ][ begin + 1 ][ begin + 2][ begin + 3]

So find_if will run from begin to 1 past begin + 3. Now why is there a
need to traverse twice? All the sequence needs to do is run from being
to 1 past begin + 3 to find a matching value.


The sequence is traversed once for 'std::find_if() ' to determine the
end and then another time for 'std::distance( )' to count the number
of elements. Duplicate traversal of 1M list nodes could be more
expensive than bundling the counting with the finding and thus
requiring only one traversal.
Ok. the devil is in the details, I suppose. :)


I wouldn't call the difference between a forward iterator and an
input iterator to be much more severe than just a "detail". But
then, I'm pretty focused on the concepts involved in generic
programming and thus consider things a big deal which most others
don't really care about...
--
<mailto:di***** ******@yahoo.co m> <http://www.dietmar-kuehl.de/>
<http://www.eai-systems.com> - Efficient Artificial Intelligence
Mar 8 '06 #6
All I'm saying is that I would read the input and check immediately
the data I receive without ever putting it into any form of container
or sequence (other than the input sequence you might obtain using
'std::istreambu f_iterator<char >'). Especially if the sequences are
huge, this would be much more memory conservative.
I see. That's exactly what I doing though. I know where start and end
of the data is ( based on what the header told me ). I konw in memory
where the data resides. From there I just check to ensure that what
the 'sender' claimed he sent matched what I received.
My initial post showed me allocating memory and filling it with A5's.
In my real system, obviously the receiver doesn't need to do all that.
That said, those steps are omitted. The receiver just _walks_ across
the memory - validating the data.

The sequence is traversed once for 'std::find_if() ' to determine the
end and then another time for 'std::distance( )' to count the number
of elements. Duplicate traversal of 1M list nodes could be more
expensive than bundling the counting with the finding and thus
requiring only one traversal.

Got it. Thanks alot.

Mar 8 '06 #7

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

4
1453
by: Chris Jefferson | last post by:
Hello! For those who aren't familar BTW, the "search_n" STL function takes two iterators, a "count" value and and either a "value" or a "value" and a binary predicate and searches between the iterators for the first occurance of "count" values which are all equal "value" (with respect to the binary predicate if given, == if not). In my STL manual, the complexity of search_n is given as O(dist.count)
5
2217
by: ma740988 | last post by:
There's a need for me to move around at specified offsets within memory. As as a result - long story short - unsigned char* is the type of choice. At issue: Consider the case ( test code ) where I'm comparing two structs. The struct test1 has information with regards to data_size and pointer to address. The struct test2 has information with regards to data_size and value. I will compare test1 and test2. For each matching data size,...
6
2932
by: cesco | last post by:
Hi, say I have a vector like the following: vec = and I'd like to know via a function (for example, ConsecutiveIntegers(vec, N)) whether there are N consecutive integers. So, for example, if N=3 then ConsecutiveIntegers(vec, N) should return true because 1, 2, 3 and 5, 6, 7 are consecutive. If N=4, then
0
9632
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
9471
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
10302
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
1
10071
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
8958
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
1
7478
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
5372
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
2
3631
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
3
2867
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.