473,508 Members | 2,074 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Test for inclusion

I have two sorted containers, and I wish to know if any items from one
container exist in the second container. Is there anything in the
standard library to do this?

Thanks

Sep 5 '06 #1
13 1950
How about set_intersection?

http://www.sgi.com/tech/stl/set_intersection.html

Moohoo wrote:
I have two sorted containers, and I wish to know if any items from one
container exist in the second container. Is there anything in the
standard library to do this?

Thanks
Sep 5 '06 #2
Moohoo wrote:
I have two sorted containers, and I wish to know if any items from one
container exist in the second container. Is there anything in the
standard library to do this?
You could use std::set_intersection(). If you are only interested in whether
the intersection is empty, you could use an output_iterator that throws as
soon as the std::set_intersection() writes the first element:

#include <set>
#include <algorithm>
#include <iterator>

template < typename T >
struct throwing_iterator
: public std::iterator< std::output_iterator_tag, void, void, void, void >
{

struct flag_type {};

throwing_iterator & operator= ( T const & value ) {
throw ( flag_type() );
}

throwing_iterator & operator* ( void ) {
return ( *this );
}

throwing_iterator & operator++ ( void ) {
return ( *this );
}

throwing_iterator & operator++ ( int ) {
return ( *this );
}

}; // throwing_iterator
template < typename Iterator >
bool has_intersection ( Iterator from1, Iterator to1,
Iterator from2, Iterator to2 ) {
typedef typename std::iterator_traits< Iterator >::value_type value_type;
throwing_iterator< value_type t_iter;
try {
std::set_intersection( from1, to1, from2, to2, t_iter );
}
catch ( typename throwing_iterator< value_type >::flag_type ) {
return ( true );
}
return ( false );
}

Note that this is a little unorthodox.
An more conventional method would be to iterate over one range and use one
of the various binary search methods (std::lower_bound(), std::upper_bound,
std::binary_search()) to find the element in the other range.
Best

Kai-Uwe Bux
Sep 5 '06 #3
I don't know if I would encourage this use of an exception - certainly
an empty intersection is not an exceptional state, unless you were
accessing it without checking to see if it contained anything.

You could just make a temporary container and see what gets put into
it. If you are holding very large objects, perhaps you could put them
in a container with boost::shared_ptr to prevent copying a large object
to a temporary set - you would only copy a shared_ptr.

The lower-bound thing would work, also.

-J

Kai-Uwe Bux wrote:
Moohoo wrote:
I have two sorted containers, and I wish to know if any items from one
container exist in the second container. Is there anything in the
standard library to do this?

You could use std::set_intersection(). If you are only interested in whether
the intersection is empty, you could use an output_iterator that throws as
soon as the std::set_intersection() writes the first element:

#include <set>
#include <algorithm>
#include <iterator>

template < typename T >
struct throwing_iterator
: public std::iterator< std::output_iterator_tag, void, void, void, void >
{

struct flag_type {};

throwing_iterator & operator= ( T const & value ) {
throw ( flag_type() );
}

throwing_iterator & operator* ( void ) {
return ( *this );
}

throwing_iterator & operator++ ( void ) {
return ( *this );
}

throwing_iterator & operator++ ( int ) {
return ( *this );
}

}; // throwing_iterator
template < typename Iterator >
bool has_intersection ( Iterator from1, Iterator to1,
Iterator from2, Iterator to2 ) {
typedef typename std::iterator_traits< Iterator >::value_type value_type;
throwing_iterator< value_type t_iter;
try {
std::set_intersection( from1, to1, from2, to2, t_iter );
}
catch ( typename throwing_iterator< value_type >::flag_type ) {
return ( true );
}
return ( false );
}

Note that this is a little unorthodox.
An more conventional method would be to iterate over one range and use one
of the various binary search methods (std::lower_bound(), std::upper_bound,
std::binary_search()) to find the element in the other range.
Best

Kai-Uwe Bux
Sep 5 '06 #4
tr*****@gmail.com wrote:
I don't know if I would encourage this use of an exception - certainly
an empty intersection is not an exceptional state, unless you were
accessing it without checking to see if it contained anything.
http://groups.google.com/group/alt.c...e7fef082b5472/

A question I asked several months ago. Herb Sutter and Bjarne
Stroustrup weigh in quite strongly on this issue.
Sep 6 '06 #5
Thanks for the responses.

I'm not overly keen on using exceptions as a manor of short circuiting,
but reading the thread the Mark P linked, there is obviously some
debate over the idea.

In the past I've used std::set_intersection, and although I've never
gone to the trouble of optimising my output iterator to set a boolean
rather than fill a container, it still suffers from the problem that it
continues to work once the solution has been found.

I guess I'll end up writing a templated function to do this, as there
are several places I need to perform the test with various containers.
I was hoping there would be somthing already in existence that might
handle both random access iterators as well as std::set and std::map,
but it appears not.

Again, thank you for your replies,

Moohoo

Sep 6 '06 #6
In article <ed**********@murdoch.acc.Virginia.EDU>, jk********@gmx.net
says...
Moohoo wrote:
I have two sorted containers, and I wish to know if any items from one
container exist in the second container. Is there anything in the
standard library to do this?

You could use std::set_intersection(). If you are only interested in whether
the intersection is empty, you could use an output_iterator that throws as
soon as the std::set_intersection() writes the first element:
[ ... ]
Note that this is a little unorthodox.
That's a rather neat idea -- one I'm pretty sure _I_ wouldn't have come
up with.

Just keep in mind that along with being unorthodox, throwing and
catching an exception can be quite slow -- and the speed seems to vary
pretty widely from one implementation to the next, so this might end up
saving a lot of time in some cases, but could lose a fair amouunt of
time in others.
An more conventional method would be to iterate over one range and use one
of the various binary search methods (std::lower_bound(), std::upper_bound,
std::binary_search()) to find the element in the other range.
That would certainly be an obvious method -- and quite efficient:
dependably O(N lg M), regardless.

You should be able to do a _little_ bit better on your own though. Since
both collections are sorted, an unscuccesfful binary search for an item
from collection A in collection B also establishes the lower bound in
collection B where the next item from collection A could possibly occur.

OTOH, this really is only a _little_ better. Assuming items in both
collections are about evenly distributed, it averages halving the amount
of the second collection that needs to be searched each time. That means
it only reduces the average number of comparisons for each binary search
by 1.

--
Later,
Jerry.

The universe is a figment of its own imagination.
Sep 6 '06 #7
Jerry Coffin wrote:
In article <ed**********@murdoch.acc.Virginia.EDU>, jk********@gmx.net
says...
>Moohoo wrote:
I have two sorted containers, and I wish to know if any items from one
container exist in the second container. Is there anything in the
standard library to do this?

You could use std::set_intersection(). If you are only interested in
whether the intersection is empty, you could use an output_iterator that
throws as soon as the std::set_intersection() writes the first element:

[ ... ]
>Note that this is a little unorthodox.

That's a rather neat idea -- one I'm pretty sure _I_ wouldn't have come
up with.

Just keep in mind that along with being unorthodox, throwing and
catching an exception can be quite slow -- and the speed seems to vary
pretty widely from one implementation to the next, so this might end up
saving a lot of time in some cases, but could lose a fair amouunt of
time in others.
>An more conventional method would be to iterate over one range and use
one of the various binary search methods (std::lower_bound(),
std::upper_bound, std::binary_search()) to find the element in the other
range.

That would certainly be an obvious method -- and quite efficient:
dependably O(N lg M), regardless.

You should be able to do a _little_ bit better on your own though. Since
both collections are sorted, an unscuccesfful binary search for an item
from collection A in collection B also establishes the lower bound in
collection B where the next item from collection A could possibly occur.

OTOH, this really is only a _little_ better. Assuming items in both
collections are about evenly distributed, it averages halving the amount
of the second collection that needs to be searched each time. That means
it only reduces the average number of comparisons for each binary search
by 1.
Good points. So I did some measurements. Below is the code (sorry for the
long post). I implemented three methods:

a) the try-throw-catch idea,
b) using set_intersection with an iterator that sets a boolean flag,
c) a version that uses lower_bound.

The last is by far the fastest (I tried hard to make it as efficient as
reasonable). However, it is also the least obvious.

I grew to like the try-throw-catch idiom when I played around with puzzle
solvers: usually it is easy to write a puzzle solver that generates all
solutions and if you are only interested in one solution you have to tweak
the code and insert some return path. I figured that there is a nice
interface that allows one to use the same code for generating all solutions
and finding just one solution:

template < typename OutIter >
OutIter write_all_solitions ( PuzzleData input_data, OutIter where );

Now, you can use the throwing iterator to retrieve just one solution:

template < typename T >
struct throwing_iterator
: public std::iterator< std::output_iterator_tag, T, void, void, void >
{

throwing_iterator & operator= ( T const & value ) {
throw ( value );
}

throwing_iterator & operator* ( void ) {
return ( *this );
}

throwing_iterator & operator++ ( void ) {
return ( *this );
}

throwing_iterator & operator++ ( int ) {
return ( *this );
}

}; // throwing_iterator

In these cases, efficiency is not that much of a concern since you throw
only once, namely when you are done anyway. I find that code like the
following conveys intent quite clearly:

void func ( PuzzleData const & initial_configuration ) {
try {
throwing_iterator< AdmissibleConfiguration hit_soltion;
write_all_solutions( initial_configuration, hit_solution );
}
catch ( AdmissibleConfiguration const & solution ) {
do_something( solution );
return;
}
no_solution_found();
}

Now for the performance:
#include <algorithm>
#include <iterator>
template < typename T >
struct throwing_iterator
: public std::iterator< std::output_iterator_tag, T, void, void, void >
{

throwing_iterator & operator= ( T const & value ) {
throw ( value );
}

throwing_iterator & operator* ( void ) {
return ( *this );
}

throwing_iterator & operator++ ( void ) {
return ( *this );
}

throwing_iterator & operator++ ( int ) {
return ( *this );
}

}; // throwing_iterator

template < typename Iterator >
bool has_intersection_a ( Iterator from1, Iterator to1,
Iterator from2, Iterator to2 ) {
typedef typename std::iterator_traits< Iterator >::value_type value_type;
throwing_iterator< value_type t_iter;
try {
std::set_intersection( from1, to1, from2, to2, t_iter );
}
catch ( typename throwing_iterator< value_type >::value_type ) {
return ( true );
}
return ( false );
}
template < typename T >
struct recording_iterator
: public std::iterator< std::output_iterator_tag, T, void, void, void >
{

bool result;

recording_iterator ( void )
: result ( false )
{}

recording_iterator & operator= ( T const & value ) {
result = true;
}

recording_iterator & operator* ( void ) {
return ( *this );
}

recording_iterator & operator++ ( void ) {
return ( *this );
}

recording_iterator & operator++ ( int ) {
return ( *this );
}

}; // recording_iterator

template < typename Iterator >
bool has_intersection_b ( Iterator from1, Iterator to1,
Iterator from2, Iterator to2 ) {
typedef typename std::iterator_traits< Iterator >::value_type value_type;
recording_iterator< value_type r_iter;
std::set_intersection( from1, to1, from2, to2, r_iter );
return ( r_iter.result );
}
template < typename Iterator >
bool has_intersection_c ( Iterator from1, Iterator to1,
Iterator from2, Iterator to2 ) {
while ( true ) {
if ( from1 == to1 ) {
return ( false );
}
from2 = std::lower_bound( from2, to2, *from1 );
if ( from2 == to2 ) {
return ( false );
}
if ( *from1 == *from2 ) {
return ( true );
}
from1 = std::lower_bound( from1, to1, *from2 );
}
}
#include <iostream>
#include <set>
#include <cstdlib>
#include <vector>
#include <algorithm>
#include <ctime>
#include <cassert>

int main ( void ) {
std::srand( 982348 );
unsigned int const inner = 1000;
unsigned int const num_sets = 10;
unsigned int const num_runs = 40;
unsigned int const set_size = 50000;
unsigned int const set_range = 1000000;
std::vector< std::set< unsigned int a;
for ( unsigned int i = 0; i < num_sets; ++i ) {
a.push_back( std::set<unsigned int>() );
for ( unsigned int k = 0; k < set_size; ++k ) {
a.back().insert( std::rand() % set_range );
}
}

for ( unsigned int i = 0; i < num_runs; ++i ) {
unsigned int i = std::rand() % num_sets;
unsigned int j = std::rand() % num_sets;
std::vector< unsigned int v_1 ( a[i].begin(), a[i].end() );
std::vector< unsigned int v_2 ( a[j].begin(), a[j].end() );
switch ( std::rand() % 3 ) {
case 0 : {
std::cout << "throwing_iterator: ";
{
std::clock_t ticks = std::clock();
for ( unsigned int l = 0; l < inner; ++l ) {
if ( has_intersection_a( v_1.begin(), v_1.end(),
v_2.begin(), v_2.end() ) ) {
std::swap( a[ std::rand() % num_sets ],
a[ std::rand() % num_sets ] );
}
}
std::cout << std::clock() - ticks << '\n';
}
break;
}
case 1 : {
std::cout << "recording_iterator: ";
{
std::clock_t ticks = std::clock();
for ( unsigned int l = 0; l < inner; ++l ) {
if ( has_intersection_b( v_1.begin(), v_1.end(),
v_2.begin(), v_2.end() ) ) {
std::swap( a[ std::rand() % num_sets ],
a[ std::rand() % num_sets ] );
}
}
std::cout << std::clock() - ticks << '\n';
}
break;
}
case 2 : {
std::cout << "lower_bound: ";
{
std::clock_t ticks = std::clock();
for ( unsigned int l = 0; l < inner; ++l ) {
if ( has_intersection_c( v_1.begin(), v_1.end(),
v_2.begin(), v_2.end() ) ) {
std::swap( a[ std::rand() % num_sets ],
a[ std::rand() % num_sets ] );
}
}
std::cout << std::clock() - ticks << '\n';
}
break;
}
}
}
}

Results:

recording_iterator: 2940000
recording_iterator: 2880000
throwing_iterator: 30000
lower_bound: 0
recording_iterator: 2890000
lower_bound: 0
throwing_iterator: 30000
recording_iterator: 2870000
throwing_iterator: 30000
recording_iterator: 2830000
throwing_iterator: 30000
recording_iterator: 2830000
recording_iterator: 2890000
lower_bound: 0
throwing_iterator: 30000
throwing_iterator: 30000
lower_bound: 10000
throwing_iterator: 30000
throwing_iterator: 30000
throwing_iterator: 30000
recording_iterator: 2870000
lower_bound: 0
throwing_iterator: 30000
throwing_iterator: 30000
throwing_iterator: 30000
lower_bound: 0
recording_iterator: 2850000
recording_iterator: 2860000
throwing_iterator: 30000
recording_iterator: 2890000
throwing_iterator: 30000
recording_iterator: 2870000
recording_iterator: 510000
throwing_iterator: 30000
throwing_iterator: 30000
lower_bound: 0
lower_bound: 0
throwing_iterator: 30000
recording_iterator: 2870000
throwing_iterator: 30000
Clearly, aborting set_intersection() pays off dramatically; but the winner
is, beyond any doubt:

template < typename Iterator >
bool has_intersection ( Iterator from1, Iterator to1,
Iterator from2, Iterator to2 ) {
while ( true ) {
if ( from1 == to1 ) {
return ( false );
}
from2 = std::lower_bound( from2, to2, *from1 );
if ( from2 == to2 ) {
return ( false );
}
if ( *from1 == *from2 ) {
return ( true );
}
from1 = std::lower_bound( from1, to1, *from2 );
}
}
Best

Kai-Uwe Bux
Sep 6 '06 #8
"Moohoo" <mo******@yahoo.comwrote in message
news:11**********************@i42g2000cwa.googlegr oups.com...
I have two sorted containers, and I wish to know if any items from one
container exist in the second container. Is there anything in the
standard library to do this?
Here's a linear time (O(N+M)) solution which doesn't use exceptions:

bool any_a_in_b (vector<MyTypea, vector<MyTypeb) {
vector<MyType>::iterator a_iter, b_iter;
a_iter = a.begin();
b_iter = b.begin();
while(a_iter != a.end() && b_iter != b.end()) {
if (*a_iter < *b_iter)
++a_iter;
else if (*b_iter < *a_iter)
++b_iter;
else return true;
}
return false;
}

Sep 6 '06 #9

Kai-Uwe Bux wrote:
<snip>
>
Clearly, aborting set_intersection() pays off dramatically; but the winner
is, beyond any doubt:

template < typename Iterator >
bool has_intersection ( Iterator from1, Iterator to1,
Iterator from2, Iterator to2 ) {
while ( true ) {
if ( from1 == to1 ) {
return ( false );
}
from2 = std::lower_bound( from2, to2, *from1 );
if ( from2 == to2 ) {
return ( false );
}
if ( *from1 == *from2 ) {
return ( true );
}
from1 = std::lower_bound( from1, to1, *from2 );
}
}
Best

Kai-Uwe Bux
You lower bound solution always has a time of 0. Is this not
suspicious? But regardless, thank you for the time you spent to put
this together. I was anticipating the recording iterator to be
somewhat slower than the other methods, which your numbers clearly
show.

Thanks again for everyones help and ideas,

Moohoo

Sep 6 '06 #10
Moohoo wrote:
>
Kai-Uwe Bux wrote:
<snip>
>>
Clearly, aborting set_intersection() pays off dramatically; but the
winner is, beyond any doubt:

template < typename Iterator >
bool has_intersection ( Iterator from1, Iterator to1,
Iterator from2, Iterator to2 ) {
while ( true ) {
if ( from1 == to1 ) {
return ( false );
}
from2 = std::lower_bound( from2, to2, *from1 );
if ( from2 == to2 ) {
return ( false );
}
if ( *from1 == *from2 ) {
return ( true );
}
from1 = std::lower_bound( from1, to1, *from2 );
}
}
Best

Kai-Uwe Bux
You lower bound solution always has a time of 0. Is this not
suspicious?
Yes, it sure is. Thus, I had some test code where I put in asserts checking
that this method produces the same answers as the other. So, I am actually
somewhat confident that it is correct.
But regardless, thank you for the time you spent to put
this together. I was anticipating the recording iterator to be
somewhat slower than the other methods, which your numbers clearly
show.

Best

Kai-Uwe Bux
Sep 6 '06 #11
Jerry Coffin wrote:
In article <ed**********@murdoch.acc.Virginia.EDU>, jk********@gmx.net
says...
>Moohoo wrote:

A more conventional method would be to iterate over one range and use one
of the various binary search methods (std::lower_bound(), std::upper_bound,
std::binary_search()) to find the element in the other range.

That would certainly be an obvious method -- and quite efficient:
dependably O(N lg M), regardless.

You should be able to do a _little_ bit better on your own though. Since
both collections are sorted, an unscuccesfful binary search for an item
from collection A in collection B also establishes the lower bound in
collection B where the next item from collection A could possibly occur.

OTOH, this really is only a _little_ better. Assuming items in both
collections are about evenly distributed, it averages halving the amount
of the second collection that needs to be searched each time. That means
it only reduces the average number of comparisons for each binary search
by 1.
There's no reason to resort to binary search. If both collections are
sorted this can be done in linear time, essentially along the same lines
as the merge step of a merge sort.

Mark
Sep 6 '06 #12
tr*****@gmail.com wrote:
How about set_intersection?

Please don't top-post. Your replies belong following or interspersed
with properly trimmed quotes. See the majority of other posts in the
newsgroup, or the group FAQ list:
<http://www.parashift.com/c++-faq-lite/how-to-post.html>


Brian
Sep 6 '06 #13
"Mark P" <us****@fall2005REMOVE.fastmailCAPS.fmwrote in message
news:Ph*******************@newssvr27.news.prodigy. net...
Jerry Coffin wrote:
That would certainly be an obvious method -- and quite efficient:
dependably O(N lg M), regardless.

You should be able to do a _little_ bit better on your own though. Since
both collections are sorted, an unscuccesfful binary search for an item
from collection A in collection B also establishes the lower bound in
collection B where the next item from collection A could possibly occur.

OTOH, this really is only a _little_ better. Assuming items in both
collections are about evenly distributed, it averages halving the amount
of the second collection that needs to be searched each time. That means
it only reduces the average number of comparisons for each binary search
by 1.

There's no reason to resort to binary search. If both collections are
sorted this can be done in linear time, essentially along the same lines
as the merge step of a merge sort.
Yes, as my reply to the original post has already shown.

Philip

Sep 7 '06 #14

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

0
2024
by: Remy Blank | last post by:
Ok, here we go. I added the possibility for tests using the unittest.py framework to be skipped. Basically, I added two methods to TestCase: TestCase.skip(msg): skips unconditionally...
16
1985
by: Nathan Funk | last post by:
I used to work as a web designer a couple of years ago, but I haven't been closely in touch in the past years. Has anything changed recently for managing content that is common among many pages...
5
3123
by: Jan Roland Eriksson | last post by:
Some more fine tuning and inclusion of last suggested text from regulars has been done to this test post #4 of the mFAQ. As usual, rip it up anywhere you feel that it's appropriate to do so. ...
5
3707
by: Dave | last post by:
Hello all, To protect against multiple inclusions, it is standard practice to enclose the contents of a header file in a construct like this: #ifndef FOO_INCLUDED #define FOO_INCLUDED .......
6
5177
by: techBoy | last post by:
I am looking for a tool that can scan my soyrce code and check if a header file gets included more then once in a sequece of compiled code. Can some one guide me to such a tool !!
4
2078
by: venkatbo | last post by:
Hi folks, I'd like to disable the inclusion of tk graphics lib in my py build. Looked around but couldn't find a clear answer. Which one of the following would I need to use in the configure...
6
2596
by: vsgdp | last post by:
I was looking at some library code today and noticed something like this: // sublibrary.h // define some constants, enums, symbols #include "componentA.h" #include "componentB.h" #include...
6
3893
by: Juha Nieminen | last post by:
Multiple inclusion of the same header file can cause the compilation to fail because of multiple definitions of the same type. That's why it's standard practice to write all headers like this: ...
1
1100
by: RajinCodingForum | last post by:
I have some idea but i am puzzled. As i understand, file inclusion problems like x includes y and y in turn includes x etc. can be avoided by #ifdef preprocessor checks. Can you please explain with...
0
534
by: Rob Wilkerson | last post by:
Hey guys - I have a test suite that's working when I include a test case via its absolute path. For portability, I'd rather include it just using its name, but it's failing with a "Failed to...
0
7225
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
7326
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
7385
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
1
7046
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
7498
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
1
5053
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...
0
4707
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and...
1
766
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
0
418
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.