Test for inclusion

Moohoo

I have two sorted containers, and I wish to know if any items from one
container exist in the second container. Is there anything in the
standard library to do this?

Thanks

Sep 5 '06 #1

Subscribe Reply

1950

tryptik

How about set_intersection?

http://www.sgi.com/tech/stl/set_intersection.html

Moohoo wrote:

I have two sorted containers, and I wish to know if any items from one
container exist in the second container. Is there anything in the
standard library to do this?

Thanks

Sep 5 '06 #2

Kai-Uwe Bux

Moohoo wrote:

I have two sorted containers, and I wish to know if any items from one
container exist in the second container. Is there anything in the
standard library to do this?

You could use std::set_intersection(). If you are only interested in whether
the intersection is empty, you could use an output_iterator that throws as
soon as the std::set_intersection() writes the first element:

#include <set>
#include <algorithm>
#include <iterator>

template < typename T >
struct throwing_iterator
: public std::iterator< std::output_iterator_tag, void, void, void, void >
{

struct flag_type {};

throwing_iterator & operator= ( T const & value ) {
throw ( flag_type() );
}

throwing_iterator & operator* ( void ) {
return ( *this );
}

throwing_iterator & operator++ ( void ) {
return ( *this );
}

throwing_iterator & operator++ ( int ) {
return ( *this );
}

}; // throwing_iterator
template < typename Iterator >
bool has_intersection ( Iterator from1, Iterator to1,
Iterator from2, Iterator to2 ) {
typedef typename std::iterator_traits< Iterator >::value_type value_type;
throwing_iterator< value_type t_iter;
try {
std::set_intersection( from1, to1, from2, to2, t_iter );
}
catch ( typename throwing_iterator< value_type >::flag_type ) {
return ( true );
}
return ( false );
}

Note that this is a little unorthodox.
An more conventional method would be to iterate over one range and use one
of the various binary search methods (std::lower_bound(), std::upper_bound,
std::binary_search()) to find the element in the other range.
Best

Kai-Uwe Bux

Sep 5 '06 #3

tryptik

I don't know if I would encourage this use of an exception - certainly
an empty intersection is not an exceptional state, unless you were
accessing it without checking to see if it contained anything.

You could just make a temporary container and see what gets put into
it. If you are holding very large objects, perhaps you could put them
in a container with boost::shared_ptr to prevent copying a large object
to a temporary set - you would only copy a shared_ptr.

The lower-bound thing would work, also.

-J

Kai-Uwe Bux wrote:

Moohoo wrote:

I have two sorted containers, and I wish to know if any items from one
container exist in the second container. Is there anything in the
standard library to do this?

You could use std::set_intersection(). If you are only interested in whether
the intersection is empty, you could use an output_iterator that throws as
soon as the std::set_intersection() writes the first element:

#include <set>
#include <algorithm>
#include <iterator>

template < typename T >
struct throwing_iterator
: public std::iterator< std::output_iterator_tag, void, void, void, void >
{

struct flag_type {};

throwing_iterator & operator= ( T const & value ) {
throw ( flag_type() );
}

throwing_iterator & operator* ( void ) {
return ( *this );
}

throwing_iterator & operator++ ( void ) {
return ( *this );
}

throwing_iterator & operator++ ( int ) {
return ( *this );
}

}; // throwing_iterator
template < typename Iterator >
bool has_intersection ( Iterator from1, Iterator to1,
Iterator from2, Iterator to2 ) {
typedef typename std::iterator_traits< Iterator >::value_type value_type;
throwing_iterator< value_type t_iter;
try {
std::set_intersection( from1, to1, from2, to2, t_iter );
}
catch ( typename throwing_iterator< value_type >::flag_type ) {
return ( true );
}
return ( false );
}

Note that this is a little unorthodox.
An more conventional method would be to iterate over one range and use one
of the various binary search methods (std::lower_bound(), std::upper_bound,
std::binary_search()) to find the element in the other range.
Best

Kai-Uwe Bux

Sep 5 '06 #4

Mark P

tr*****@gmail.com wrote:

I don't know if I would encourage this use of an exception - certainly
an empty intersection is not an exceptional state, unless you were
accessing it without checking to see if it contained anything.

http://groups.google.com/group/alt.c...e7fef082b5472/

A question I asked several months ago. Herb Sutter and Bjarne
Stroustrup weigh in quite strongly on this issue.

Sep 6 '06 #5

Moohoo

Thanks for the responses.

I'm not overly keen on using exceptions as a manor of short circuiting,
but reading the thread the Mark P linked, there is obviously some
debate over the idea.

In the past I've used std::set_intersection, and although I've never
gone to the trouble of optimising my output iterator to set a boolean
rather than fill a container, it still suffers from the problem that it
continues to work once the solution has been found.

I guess I'll end up writing a templated function to do this, as there
are several places I need to perform the test with various containers.
I was hoping there would be somthing already in existence that might
handle both random access iterators as well as std::set and std::map,
but it appears not.

Again, thank you for your replies,

Moohoo

Sep 6 '06 #6

Jerry Coffin

In article <ed**********@murdoch.acc.Virginia.EDU>, jk********@gmx.net
says...

Moohoo wrote:

I have two sorted containers, and I wish to know if any items from one
container exist in the second container. Is there anything in the
standard library to do this?

You could use std::set_intersection(). If you are only interested in whether
the intersection is empty, you could use an output_iterator that throws as
soon as the std::set_intersection() writes the first element:

[ ... ]

Note that this is a little unorthodox.

That's a rather neat idea -- one I'm pretty sure _I_ wouldn't have come
up with.

Just keep in mind that along with being unorthodox, throwing and
catching an exception can be quite slow -- and the speed seems to vary
pretty widely from one implementation to the next, so this might end up
saving a lot of time in some cases, but could lose a fair amouunt of
time in others.

An more conventional method would be to iterate over one range and use one
of the various binary search methods (std::lower_bound(), std::upper_bound,
std::binary_search()) to find the element in the other range.

That would certainly be an obvious method -- and quite efficient:
dependably O(N lg M), regardless.

You should be able to do a _little_ bit better on your own though. Since
both collections are sorted, an unscuccesfful binary search for an item
from collection A in collection B also establishes the lower bound in
collection B where the next item from collection A could possibly occur.

OTOH, this really is only a _little_ better. Assuming items in both
collections are about evenly distributed, it averages halving the amount
of the second collection that needs to be searched each time. That means
it only reduces the average number of comparisons for each binary search
by 1.

--
Later,
Jerry.

The universe is a figment of its own imagination.

Sep 6 '06 #7

Kai-Uwe Bux

Jerry Coffin wrote:

In article <ed**********@murdoch.acc.Virginia.EDU>, jk********@gmx.net
says...
>Moohoo wrote:

I have two sorted containers, and I wish to know if any items from one
container exist in the second container. Is there anything in the
standard library to do this?

You could use std::set_intersection(). If you are only interested in
whether the intersection is empty, you could use an output_iterator that
throws as soon as the std::set_intersection() writes the first element:

[ ... ]

>Note that this is a little unorthodox.

That's a rather neat idea -- one I'm pretty sure _I_ wouldn't have come
up with.

Just keep in mind that along with being unorthodox, throwing and
catching an exception can be quite slow -- and the speed seems to vary
pretty widely from one implementation to the next, so this might end up
saving a lot of time in some cases, but could lose a fair amouunt of
time in others.

>An more conventional method would be to iterate over one range and use
one of the various binary search methods (std::lower_bound(),
std::upper_bound, std::binary_search()) to find the element in the other
range.

That would certainly be an obvious method -- and quite efficient:
dependably O(N lg M), regardless.

You should be able to do a _little_ bit better on your own though. Since
both collections are sorted, an unscuccesfful binary search for an item
from collection A in collection B also establishes the lower bound in
collection B where the next item from collection A could possibly occur.

OTOH, this really is only a _little_ better. Assuming items in both
collections are about evenly distributed, it averages halving the amount
of the second collection that needs to be searched each time. That means
it only reduces the average number of comparisons for each binary search
by 1.

Good points. So I did some measurements. Below is the code (sorry for the
long post). I implemented three methods:

a) the try-throw-catch idea,
b) using set_intersection with an iterator that sets a boolean flag,
c) a version that uses lower_bound.

The last is by far the fastest (I tried hard to make it as efficient as
reasonable). However, it is also the least obvious.

I grew to like the try-throw-catch idiom when I played around with puzzle
solvers: usually it is easy to write a puzzle solver that generates all
solutions and if you are only interested in one solution you have to tweak
the code and insert some return path. I figured that there is a nice
interface that allows one to use the same code for generating all solutions
and finding just one solution:

template < typename OutIter >
OutIter write_all_solitions ( PuzzleData input_data, OutIter where );

Now, you can use the throwing iterator to retrieve just one solution:

template < typename T >
struct throwing_iterator
: public std::iterator< std::output_iterator_tag, T, void, void, void >
{

throwing_iterator & operator= ( T const & value ) {
throw ( value );
}

throwing_iterator & operator* ( void ) {
return ( *this );
}

throwing_iterator & operator++ ( void ) {
return ( *this );
}

throwing_iterator & operator++ ( int ) {
return ( *this );
}

}; // throwing_iterator

In these cases, efficiency is not that much of a concern since you throw
only once, namely when you are done anyway. I find that code like the
following conveys intent quite clearly:

void func ( PuzzleData const & initial_configuration ) {
try {
throwing_iterator< AdmissibleConfiguration hit_soltion;
write_all_solutions( initial_configuration, hit_solution );
}
catch ( AdmissibleConfiguration const & solution ) {
do_something( solution );
return;
}
no_solution_found();
}

Now for the performance:
#include <algorithm>
#include <iterator>
template < typename T >
struct throwing_iterator
: public std::iterator< std::output_iterator_tag, T, void, void, void >
{

throwing_iterator & operator= ( T const & value ) {
throw ( value );
}

throwing_iterator & operator* ( void ) {
return ( *this );
}

throwing_iterator & operator++ ( void ) {
return ( *this );
}

throwing_iterator & operator++ ( int ) {
return ( *this );
}

}; // throwing_iterator

template < typename Iterator >
bool has_intersection_a ( Iterator from1, Iterator to1,
Iterator from2, Iterator to2 ) {
typedef typename std::iterator_traits< Iterator >::value_type value_type;
throwing_iterator< value_type t_iter;
try {
std::set_intersection( from1, to1, from2, to2, t_iter );
}
catch ( typename throwing_iterator< value_type >::value_type ) {
return ( true );
}
return ( false );
}
template < typename T >
struct recording_iterator
: public std::iterator< std::output_iterator_tag, T, void, void, void >
{

bool result;

recording_iterator ( void )
: result ( false )
{}

recording_iterator & operator= ( T const & value ) {
result = true;
}

recording_iterator & operator* ( void ) {
return ( *this );
}

recording_iterator & operator++ ( void ) {
return ( *this );
}

recording_iterator & operator++ ( int ) {
return ( *this );
}

}; // recording_iterator

template < typename Iterator >
bool has_intersection_b ( Iterator from1, Iterator to1,
Iterator from2, Iterator to2 ) {
typedef typename std::iterator_traits< Iterator >::value_type value_type;
recording_iterator< value_type r_iter;
std::set_intersection( from1, to1, from2, to2, r_iter );
return ( r_iter.result );
}
template < typename Iterator >
bool has_intersection_c ( Iterator from1, Iterator to1,
Iterator from2, Iterator to2 ) {
while ( true ) {
if ( from1 == to1 ) {
return ( false );
}
from2 = std::lower_bound( from2, to2, *from1 );
if ( from2 == to2 ) {
return ( false );
}
if ( *from1 == *from2 ) {
return ( true );
}
from1 = std::lower_bound( from1, to1, *from2 );
}
}
#include <iostream>
#include <set>
#include <cstdlib>
#include <vector>
#include <algorithm>
#include <ctime>
#include <cassert>

int main ( void ) {
std::srand( 982348 );
unsigned int const inner = 1000;
unsigned int const num_sets = 10;
unsigned int const num_runs = 40;
unsigned int const set_size = 50000;
unsigned int const set_range = 1000000;
std::vector< std::set< unsigned int a;
for ( unsigned int i = 0; i < num_sets; ++i ) {
a.push_back( std::set<unsigned int>() );
for ( unsigned int k = 0; k < set_size; ++k ) {
a.back().insert( std::rand() % set_range );
}
}

for ( unsigned int i = 0; i < num_runs; ++i ) {
unsigned int i = std::rand() % num_sets;
unsigned int j = std::rand() % num_sets;
std::vector< unsigned int v_1 ( a[i].begin(), a[i].end() );
std::vector< unsigned int v_2 ( a[j].begin(), a[j].end() );
switch ( std::rand() % 3 ) {
case 0 : {
std::cout << "throwing_iterator: ";
{
std::clock_t ticks = std::clock();
for ( unsigned int l = 0; l < inner; ++l ) {
if ( has_intersection_a( v_1.begin(), v_1.end(),
v_2.begin(), v_2.end() ) ) {
std::swap( a[ std::rand() % num_sets ],
a[ std::rand() % num_sets ] );
}
}
std::cout << std::clock() - ticks << '\n';
}
break;
}
case 1 : {
std::cout << "recording_iterator: ";
{
std::clock_t ticks = std::clock();
for ( unsigned int l = 0; l < inner; ++l ) {
if ( has_intersection_b( v_1.begin(), v_1.end(),
v_2.begin(), v_2.end() ) ) {
std::swap( a[ std::rand() % num_sets ],
a[ std::rand() % num_sets ] );
}
}
std::cout << std::clock() - ticks << '\n';
}
break;
}
case 2 : {
std::cout << "lower_bound: ";
{
std::clock_t ticks = std::clock();
for ( unsigned int l = 0; l < inner; ++l ) {
if ( has_intersection_c( v_1.begin(), v_1.end(),
v_2.begin(), v_2.end() ) ) {
std::swap( a[ std::rand() % num_sets ],
a[ std::rand() % num_sets ] );
}
}
std::cout << std::clock() - ticks << '\n';
}
break;
}
}
}
}

Results:

recording_iterator: 2940000
recording_iterator: 2880000
throwing_iterator: 30000
lower_bound: 0
recording_iterator: 2890000
lower_bound: 0
throwing_iterator: 30000
recording_iterator: 2870000
throwing_iterator: 30000
recording_iterator: 2830000
throwing_iterator: 30000
recording_iterator: 2830000
recording_iterator: 2890000
lower_bound: 0
throwing_iterator: 30000
throwing_iterator: 30000
lower_bound: 10000
throwing_iterator: 30000
throwing_iterator: 30000
throwing_iterator: 30000
recording_iterator: 2870000
lower_bound: 0
throwing_iterator: 30000
throwing_iterator: 30000
throwing_iterator: 30000
lower_bound: 0
recording_iterator: 2850000
recording_iterator: 2860000
throwing_iterator: 30000
recording_iterator: 2890000
throwing_iterator: 30000
recording_iterator: 2870000
recording_iterator: 510000
throwing_iterator: 30000
throwing_iterator: 30000
lower_bound: 0
lower_bound: 0
throwing_iterator: 30000
recording_iterator: 2870000
throwing_iterator: 30000
Clearly, aborting set_intersection() pays off dramatically; but the winner
is, beyond any doubt:

template < typename Iterator >
bool has_intersection ( Iterator from1, Iterator to1,
Iterator from2, Iterator to2 ) {
while ( true ) {
if ( from1 == to1 ) {
return ( false );
}
from2 = std::lower_bound( from2, to2, *from1 );
if ( from2 == to2 ) {
return ( false );
}
if ( *from1 == *from2 ) {
return ( true );
}
from1 = std::lower_bound( from1, to1, *from2 );
}
}
Best

Kai-Uwe Bux

Sep 6 '06 #8

Philip Potter

"Moohoo" <mo******@yahoo.comwrote in message
news:11**********************@i42g2000cwa.googlegr oups.com...

I have two sorted containers, and I wish to know if any items from one
container exist in the second container. Is there anything in the
standard library to do this?

Here's a linear time (O(N+M)) solution which doesn't use exceptions:

bool any_a_in_b (vector<MyTypea, vector<MyTypeb) {
vector<MyType>::iterator a_iter, b_iter;
a_iter = a.begin();
b_iter = b.begin();
while(a_iter != a.end() && b_iter != b.end()) {
if (*a_iter < *b_iter)
++a_iter;
else if (*b_iter < *a_iter)
++b_iter;
else return true;
}
return false;
}

Sep 6 '06 #9

Moohoo

Kai-Uwe Bux wrote:
<snip>

>
Clearly, aborting set_intersection() pays off dramatically; but the winner
is, beyond any doubt:

template < typename Iterator >
bool has_intersection ( Iterator from1, Iterator to1,
Iterator from2, Iterator to2 ) {
while ( true ) {
if ( from1 == to1 ) {
return ( false );
}
from2 = std::lower_bound( from2, to2, *from1 );
if ( from2 == to2 ) {
return ( false );
}
if ( *from1 == *from2 ) {
return ( true );
}
from1 = std::lower_bound( from1, to1, *from2 );
}
}
Best

Kai-Uwe Bux

You lower bound solution always has a time of 0. Is this not
suspicious? But regardless, thank you for the time you spent to put
this together. I was anticipating the recording iterator to be
somewhat slower than the other methods, which your numbers clearly
show.

Thanks again for everyones help and ideas,

Moohoo

Sep 6 '06 #10

Kai-Uwe Bux

Moohoo wrote:

>
Kai-Uwe Bux wrote:
<snip>
>>
Clearly, aborting set_intersection() pays off dramatically; but the
winner is, beyond any doubt:

template < typename Iterator >
bool has_intersection ( Iterator from1, Iterator to1,
Iterator from2, Iterator to2 ) {
while ( true ) {
if ( from1 == to1 ) {
return ( false );
}
from2 = std::lower_bound( from2, to2, *from1 );
if ( from2 == to2 ) {
return ( false );
}
if ( *from1 == *from2 ) {
return ( true );
}
from1 = std::lower_bound( from1, to1, *from2 );
}
}
Best

Kai-Uwe Bux
You lower bound solution always has a time of 0. Is this not
suspicious?

Yes, it sure is. Thus, I had some test code where I put in asserts checking
that this method produces the same answers as the other. So, I am actually
somewhat confident that it is correct.

But regardless, thank you for the time you spent to put
this together. I was anticipating the recording iterator to be
somewhat slower than the other methods, which your numbers clearly
show.

Best

Kai-Uwe Bux

Sep 6 '06 #11

Mark P

Jerry Coffin wrote:

In article <ed**********@murdoch.acc.Virginia.EDU>, jk********@gmx.net
says...
>Moohoo wrote:

A more conventional method would be to iterate over one range and use one
of the various binary search methods (std::lower_bound(), std::upper_bound,
std::binary_search()) to find the element in the other range.

That would certainly be an obvious method -- and quite efficient:
dependably O(N lg M), regardless.

You should be able to do a _little_ bit better on your own though. Since
both collections are sorted, an unscuccesfful binary search for an item
from collection A in collection B also establishes the lower bound in
collection B where the next item from collection A could possibly occur.

OTOH, this really is only a _little_ better. Assuming items in both
collections are about evenly distributed, it averages halving the amount
of the second collection that needs to be searched each time. That means
it only reduces the average number of comparisons for each binary search
by 1.

There's no reason to resort to binary search. If both collections are
sorted this can be done in linear time, essentially along the same lines
as the merge step of a merge sort.

Mark

Sep 6 '06 #12

Default User

tr*****@gmail.com wrote:

How about set_intersection?

Please don't top-post. Your replies belong following or interspersed
with properly trimmed quotes. See the majority of other posts in the
newsgroup, or the group FAQ list:
<http://www.parashift.com/c++-faq-lite/how-to-post.html>

Brian

Sep 6 '06 #13

Philip Potter

"Mark P" <us****@fall2005REMOVE.fastmailCAPS.fmwrote in message
news:Ph*******************@newssvr27.news.prodigy. net...

Jerry Coffin wrote:
That would certainly be an obvious method -- and quite efficient:
dependably O(N lg M), regardless.

You should be able to do a _little_ bit better on your own though. Since
both collections are sorted, an unscuccesfful binary search for an item
from collection A in collection B also establishes the lower bound in
collection B where the next item from collection A could possibly occur.

OTOH, this really is only a _little_ better. Assuming items in both
collections are about evenly distributed, it averages halving the amount
of the second collection that needs to be searched each time. That means
it only reduces the average number of comparisons for each binary search
by 1.

There's no reason to resort to binary search. If both collections are
sorted this can be done in linear time, essentially along the same lines
as the merge step of a merge sort.

Yes, as my reply to the original post has already shown.

Philip

Sep 7 '06 #14

Test for inclusion

Similar topics