Connecting Tech Pros Worldwide Help | Site Map

Test for inclusion

 
LinkBack Thread Tools Search this Thread
  #1  
Old September 5th, 2006, 10:05 PM
Moohoo
Guest
 
Posts: n/a
Default Test for inclusion

I have two sorted containers, and I wish to know if any items from one
container exist in the second container. Is there anything in the
standard library to do this?

Thanks


  #2  
Old September 5th, 2006, 10:45 PM
tryptik@gmail.com
Guest
 
Posts: n/a
Default Re: Test for inclusion

How about set_intersection?

http://www.sgi.com/tech/stl/set_intersection.html

Moohoo wrote:
Quote:
I have two sorted containers, and I wish to know if any items from one
container exist in the second container. Is there anything in the
standard library to do this?
>
Thanks
  #3  
Old September 5th, 2006, 10:45 PM
Kai-Uwe Bux
Guest
 
Posts: n/a
Default Re: Test for inclusion

Moohoo wrote:
Quote:
I have two sorted containers, and I wish to know if any items from one
container exist in the second container. Is there anything in the
standard library to do this?
You could use std::set_intersection(). If you are only interested in whether
the intersection is empty, you could use an output_iterator that throws as
soon as the std::set_intersection() writes the first element:

#include <set>
#include <algorithm>
#include <iterator>

template < typename T >
struct throwing_iterator
: public std::iterator< std::output_iterator_tag, void, void, void, void >
{

struct flag_type {};

throwing_iterator & operator= ( T const & value ) {
throw ( flag_type() );
}

throwing_iterator & operator* ( void ) {
return ( *this );
}

throwing_iterator & operator++ ( void ) {
return ( *this );
}

throwing_iterator & operator++ ( int ) {
return ( *this );
}

}; // throwing_iterator


template < typename Iterator >
bool has_intersection ( Iterator from1, Iterator to1,
Iterator from2, Iterator to2 ) {
typedef typename std::iterator_traits< Iterator >::value_type value_type;
throwing_iterator< value_type t_iter;
try {
std::set_intersection( from1, to1, from2, to2, t_iter );
}
catch ( typename throwing_iterator< value_type >::flag_type ) {
return ( true );
}
return ( false );
}

Note that this is a little unorthodox.


An more conventional method would be to iterate over one range and use one
of the various binary search methods (std::lower_bound(), std::upper_bound,
std::binary_search()) to find the element in the other range.


Best

Kai-Uwe Bux
  #4  
Old September 5th, 2006, 11:15 PM
tryptik@gmail.com
Guest
 
Posts: n/a
Default Re: Test for inclusion

I don't know if I would encourage this use of an exception - certainly
an empty intersection is not an exceptional state, unless you were
accessing it without checking to see if it contained anything.

You could just make a temporary container and see what gets put into
it. If you are holding very large objects, perhaps you could put them
in a container with boost::shared_ptr to prevent copying a large object
to a temporary set - you would only copy a shared_ptr.

The lower-bound thing would work, also.

-J

Kai-Uwe Bux wrote:
Quote:
Moohoo wrote:
>
Quote:
I have two sorted containers, and I wish to know if any items from one
container exist in the second container. Is there anything in the
standard library to do this?
>
You could use std::set_intersection(). If you are only interested in whether
the intersection is empty, you could use an output_iterator that throws as
soon as the std::set_intersection() writes the first element:
>
#include <set>
#include <algorithm>
#include <iterator>
>
template < typename T >
struct throwing_iterator
: public std::iterator< std::output_iterator_tag, void, void, void, void >
{
>
struct flag_type {};
>
throwing_iterator & operator= ( T const & value ) {
throw ( flag_type() );
}
>
throwing_iterator & operator* ( void ) {
return ( *this );
}
>
throwing_iterator & operator++ ( void ) {
return ( *this );
}
>
throwing_iterator & operator++ ( int ) {
return ( *this );
}
>
}; // throwing_iterator
>
>
template < typename Iterator >
bool has_intersection ( Iterator from1, Iterator to1,
Iterator from2, Iterator to2 ) {
typedef typename std::iterator_traits< Iterator >::value_type value_type;
throwing_iterator< value_type t_iter;
try {
std::set_intersection( from1, to1, from2, to2, t_iter );
}
catch ( typename throwing_iterator< value_type >::flag_type ) {
return ( true );
}
return ( false );
}
>
Note that this is a little unorthodox.
>
>
An more conventional method would be to iterate over one range and use one
of the various binary search methods (std::lower_bound(), std::upper_bound,
std::binary_search()) to find the element in the other range.
>
>
Best
>
Kai-Uwe Bux
  #5  
Old September 6th, 2006, 01:15 AM
Mark P
Guest
 
Posts: n/a
Default Re: Test for inclusion

tryptik@gmail.com wrote:
Quote:
I don't know if I would encourage this use of an exception - certainly
an empty intersection is not an exceptional state, unless you were
accessing it without checking to see if it contained anything.
>
http://groups.google.com/group/alt.c...e7fef082b5472/

A question I asked several months ago. Herb Sutter and Bjarne
Stroustrup weigh in quite strongly on this issue.
  #6  
Old September 6th, 2006, 01:55 AM
Moohoo
Guest
 
Posts: n/a
Default Re: Test for inclusion

Thanks for the responses.

I'm not overly keen on using exceptions as a manor of short circuiting,
but reading the thread the Mark P linked, there is obviously some
debate over the idea.

In the past I've used std::set_intersection, and although I've never
gone to the trouble of optimising my output iterator to set a boolean
rather than fill a container, it still suffers from the problem that it
continues to work once the solution has been found.

I guess I'll end up writing a templated function to do this, as there
are several places I need to perform the test with various containers.
I was hoping there would be somthing already in existence that might
handle both random access iterators as well as std::set and std::map,
but it appears not.

Again, thank you for your replies,

Moohoo

  #7  
Old September 6th, 2006, 02:15 AM
Jerry Coffin
Guest
 
Posts: n/a
Default Re: Test for inclusion

In article <edkv3m$h50$1@murdoch.acc.Virginia.EDU>, jkherciueh@gmx.net
says...
Quote:
Moohoo wrote:
>
Quote:
I have two sorted containers, and I wish to know if any items from one
container exist in the second container. Is there anything in the
standard library to do this?
>
You could use std::set_intersection(). If you are only interested in whether
the intersection is empty, you could use an output_iterator that throws as
soon as the std::set_intersection() writes the first element:
[ ... ]
Quote:
Note that this is a little unorthodox.
That's a rather neat idea -- one I'm pretty sure _I_ wouldn't have come
up with.

Just keep in mind that along with being unorthodox, throwing and
catching an exception can be quite slow -- and the speed seems to vary
pretty widely from one implementation to the next, so this might end up
saving a lot of time in some cases, but could lose a fair amouunt of
time in others.
Quote:
An more conventional method would be to iterate over one range and use one
of the various binary search methods (std::lower_bound(), std::upper_bound,
std::binary_search()) to find the element in the other range.
That would certainly be an obvious method -- and quite efficient:
dependably O(N lg M), regardless.

You should be able to do a _little_ bit better on your own though. Since
both collections are sorted, an unscuccesfful binary search for an item
from collection A in collection B also establishes the lower bound in
collection B where the next item from collection A could possibly occur.

OTOH, this really is only a _little_ better. Assuming items in both
collections are about evenly distributed, it averages halving the amount
of the second collection that needs to be searched each time. That means
it only reduces the average number of comparisons for each binary search
by 1.

--
Later,
Jerry.

The universe is a figment of its own imagination.
  #8  
Old September 6th, 2006, 11:15 AM
Kai-Uwe Bux
Guest
 
Posts: n/a
Default Re: Test for inclusion

Jerry Coffin wrote:
Quote:
In article <edkv3m$h50$1@murdoch.acc.Virginia.EDU>, jkherciueh@gmx.net
says...
Quote:
>Moohoo wrote:
>>
Quote:
I have two sorted containers, and I wish to know if any items from one
container exist in the second container. Is there anything in the
standard library to do this?
>>
>You could use std::set_intersection(). If you are only interested in
>whether the intersection is empty, you could use an output_iterator that
>throws as soon as the std::set_intersection() writes the first element:
>
[ ... ]
>
Quote:
>Note that this is a little unorthodox.
>
That's a rather neat idea -- one I'm pretty sure _I_ wouldn't have come
up with.
>
Just keep in mind that along with being unorthodox, throwing and
catching an exception can be quite slow -- and the speed seems to vary
pretty widely from one implementation to the next, so this might end up
saving a lot of time in some cases, but could lose a fair amouunt of
time in others.
>
Quote:
>An more conventional method would be to iterate over one range and use
>one of the various binary search methods (std::lower_bound(),
>std::upper_bound, std::binary_search()) to find the element in the other
>range.
>
That would certainly be an obvious method -- and quite efficient:
dependably O(N lg M), regardless.
>
You should be able to do a _little_ bit better on your own though. Since
both collections are sorted, an unscuccesfful binary search for an item
from collection A in collection B also establishes the lower bound in
collection B where the next item from collection A could possibly occur.
>
OTOH, this really is only a _little_ better. Assuming items in both
collections are about evenly distributed, it averages halving the amount
of the second collection that needs to be searched each time. That means
it only reduces the average number of comparisons for each binary search
by 1.
Good points. So I did some measurements. Below is the code (sorry for the
long post). I implemented three methods:

a) the try-throw-catch idea,
b) using set_intersection with an iterator that sets a boolean flag,
c) a version that uses lower_bound.

The last is by far the fastest (I tried hard to make it as efficient as
reasonable). However, it is also the least obvious.

I grew to like the try-throw-catch idiom when I played around with puzzle
solvers: usually it is easy to write a puzzle solver that generates all
solutions and if you are only interested in one solution you have to tweak
the code and insert some return path. I figured that there is a nice
interface that allows one to use the same code for generating all solutions
and finding just one solution:

template < typename OutIter >
OutIter write_all_solitions ( PuzzleData input_data, OutIter where );

Now, you can use the throwing iterator to retrieve just one solution:

template < typename T >
struct throwing_iterator
: public std::iterator< std::output_iterator_tag, T, void, void, void >
{

throwing_iterator & operator= ( T const & value ) {
throw ( value );
}

throwing_iterator & operator* ( void ) {
return ( *this );
}

throwing_iterator & operator++ ( void ) {
return ( *this );
}

throwing_iterator & operator++ ( int ) {
return ( *this );
}

}; // throwing_iterator

In these cases, efficiency is not that much of a concern since you throw
only once, namely when you are done anyway. I find that code like the
following conveys intent quite clearly:

void func ( PuzzleData const & initial_configuration ) {
try {
throwing_iterator< AdmissibleConfiguration hit_soltion;
write_all_solutions( initial_configuration, hit_solution );
}
catch ( AdmissibleConfiguration const & solution ) {
do_something( solution );
return;
}
no_solution_found();
}



Now for the performance:


#include <algorithm>
#include <iterator>


template < typename T >
struct throwing_iterator
: public std::iterator< std::output_iterator_tag, T, void, void, void >
{

throwing_iterator & operator= ( T const & value ) {
throw ( value );
}

throwing_iterator & operator* ( void ) {
return ( *this );
}

throwing_iterator & operator++ ( void ) {
return ( *this );
}

throwing_iterator & operator++ ( int ) {
return ( *this );
}

}; // throwing_iterator

template < typename Iterator >
bool has_intersection_a ( Iterator from1, Iterator to1,
Iterator from2, Iterator to2 ) {
typedef typename std::iterator_traits< Iterator >::value_type value_type;
throwing_iterator< value_type t_iter;
try {
std::set_intersection( from1, to1, from2, to2, t_iter );
}
catch ( typename throwing_iterator< value_type >::value_type ) {
return ( true );
}
return ( false );
}


template < typename T >
struct recording_iterator
: public std::iterator< std::output_iterator_tag, T, void, void, void >
{

bool result;

recording_iterator ( void )
: result ( false )
{}

recording_iterator & operator= ( T const & value ) {
result = true;
}

recording_iterator & operator* ( void ) {
return ( *this );
}

recording_iterator & operator++ ( void ) {
return ( *this );
}

recording_iterator & operator++ ( int ) {
return ( *this );
}

}; // recording_iterator

template < typename Iterator >
bool has_intersection_b ( Iterator from1, Iterator to1,
Iterator from2, Iterator to2 ) {
typedef typename std::iterator_traits< Iterator >::value_type value_type;
recording_iterator< value_type r_iter;
std::set_intersection( from1, to1, from2, to2, r_iter );
return ( r_iter.result );
}


template < typename Iterator >
bool has_intersection_c ( Iterator from1, Iterator to1,
Iterator from2, Iterator to2 ) {
while ( true ) {
if ( from1 == to1 ) {
return ( false );
}
from2 = std::lower_bound( from2, to2, *from1 );
if ( from2 == to2 ) {
return ( false );
}
if ( *from1 == *from2 ) {
return ( true );
}
from1 = std::lower_bound( from1, to1, *from2 );
}
}


#include <iostream>
#include <set>
#include <cstdlib>
#include <vector>
#include <algorithm>
#include <ctime>
#include <cassert>

int main ( void ) {
std::srand( 982348 );
unsigned int const inner = 1000;
unsigned int const num_sets = 10;
unsigned int const num_runs = 40;
unsigned int const set_size = 50000;
unsigned int const set_range = 1000000;
std::vector< std::set< unsigned int a;
for ( unsigned int i = 0; i < num_sets; ++i ) {
a.push_back( std::set<unsigned int>() );
for ( unsigned int k = 0; k < set_size; ++k ) {
a.back().insert( std::rand() % set_range );
}
}

for ( unsigned int i = 0; i < num_runs; ++i ) {
unsigned int i = std::rand() % num_sets;
unsigned int j = std::rand() % num_sets;
std::vector< unsigned int v_1 ( a[i].begin(), a[i].end() );
std::vector< unsigned int v_2 ( a[j].begin(), a[j].end() );
switch ( std::rand() % 3 ) {
case 0 : {
std::cout << "throwing_iterator: ";
{
std::clock_t ticks = std::clock();
for ( unsigned int l = 0; l < inner; ++l ) {
if ( has_intersection_a( v_1.begin(), v_1.end(),
v_2.begin(), v_2.end() ) ) {
std::swap( a[ std::rand() % num_sets ],
a[ std::rand() % num_sets ] );
}
}
std::cout << std::clock() - ticks << '\n';
}
break;
}
case 1 : {
std::cout << "recording_iterator: ";
{
std::clock_t ticks = std::clock();
for ( unsigned int l = 0; l < inner; ++l ) {
if ( has_intersection_b( v_1.begin(), v_1.end(),
v_2.begin(), v_2.end() ) ) {
std::swap( a[ std::rand() % num_sets ],
a[ std::rand() % num_sets ] );
}
}
std::cout << std::clock() - ticks << '\n';
}
break;
}
case 2 : {
std::cout << "lower_bound: ";
{
std::clock_t ticks = std::clock();
for ( unsigned int l = 0; l < inner; ++l ) {
if ( has_intersection_c( v_1.begin(), v_1.end(),
v_2.begin(), v_2.end() ) ) {
std::swap( a[ std::rand() % num_sets ],
a[ std::rand() % num_sets ] );
}
}
std::cout << std::clock() - ticks << '\n';
}
break;
}
}
}
}

Results:

recording_iterator: 2940000
recording_iterator: 2880000
throwing_iterator: 30000
lower_bound: 0
recording_iterator: 2890000
lower_bound: 0
throwing_iterator: 30000
recording_iterator: 2870000
throwing_iterator: 30000
recording_iterator: 2830000
throwing_iterator: 30000
recording_iterator: 2830000
recording_iterator: 2890000
lower_bound: 0
throwing_iterator: 30000
throwing_iterator: 30000
lower_bound: 10000
throwing_iterator: 30000
throwing_iterator: 30000
throwing_iterator: 30000
recording_iterator: 2870000
lower_bound: 0
throwing_iterator: 30000
throwing_iterator: 30000
throwing_iterator: 30000
lower_bound: 0
recording_iterator: 2850000
recording_iterator: 2860000
throwing_iterator: 30000
recording_iterator: 2890000
throwing_iterator: 30000
recording_iterator: 2870000
recording_iterator: 510000
throwing_iterator: 30000
throwing_iterator: 30000
lower_bound: 0
lower_bound: 0
throwing_iterator: 30000
recording_iterator: 2870000
throwing_iterator: 30000


Clearly, aborting set_intersection() pays off dramatically; but the winner
is, beyond any doubt:

template < typename Iterator >
bool has_intersection ( Iterator from1, Iterator to1,
Iterator from2, Iterator to2 ) {
while ( true ) {
if ( from1 == to1 ) {
return ( false );
}
from2 = std::lower_bound( from2, to2, *from1 );
if ( from2 == to2 ) {
return ( false );
}
if ( *from1 == *from2 ) {
return ( true );
}
from1 = std::lower_bound( from1, to1, *from2 );
}
}


Best

Kai-Uwe Bux
  #9  
Old September 6th, 2006, 12:55 PM
Philip Potter
Guest
 
Posts: n/a
Default Re: Test for inclusion

"Moohoo" <moohooya@yahoo.comwrote in message
news:1157494671.994582.243750@i42g2000cwa.googlegr oups.com...
Quote:
I have two sorted containers, and I wish to know if any items from one
container exist in the second container. Is there anything in the
standard library to do this?
Here's a linear time (O(N+M)) solution which doesn't use exceptions:

bool any_a_in_b (vector<MyTypea, vector<MyTypeb) {
vector<MyType>::iterator a_iter, b_iter;
a_iter = a.begin();
b_iter = b.begin();
while(a_iter != a.end() && b_iter != b.end()) {
if (*a_iter < *b_iter)
++a_iter;
else if (*b_iter < *a_iter)
++b_iter;
else return true;
}
return false;
}

  #10  
Old September 6th, 2006, 01:35 PM
Moohoo
Guest
 
Posts: n/a
Default Re: Test for inclusion


Kai-Uwe Bux wrote:
<snip>
Quote:
>
Clearly, aborting set_intersection() pays off dramatically; but the winner
is, beyond any doubt:
>
template < typename Iterator >
bool has_intersection ( Iterator from1, Iterator to1,
Iterator from2, Iterator to2 ) {
while ( true ) {
if ( from1 == to1 ) {
return ( false );
}
from2 = std::lower_bound( from2, to2, *from1 );
if ( from2 == to2 ) {
return ( false );
}
if ( *from1 == *from2 ) {
return ( true );
}
from1 = std::lower_bound( from1, to1, *from2 );
}
}
>
>
Best
>
Kai-Uwe Bux
You lower bound solution always has a time of 0. Is this not
suspicious? But regardless, thank you for the time you spent to put
this together. I was anticipating the recording iterator to be
somewhat slower than the other methods, which your numbers clearly
show.

Thanks again for everyones help and ideas,

Moohoo

  #11  
Old September 6th, 2006, 02:15 PM
Kai-Uwe Bux
Guest
 
Posts: n/a
Default Re: Test for inclusion

Moohoo wrote:
Quote:
>
Kai-Uwe Bux wrote:
<snip>
Quote:
>>
>Clearly, aborting set_intersection() pays off dramatically; but the
>winner is, beyond any doubt:
>>
> template < typename Iterator >
> bool has_intersection ( Iterator from1, Iterator to1,
> Iterator from2, Iterator to2 ) {
> while ( true ) {
> if ( from1 == to1 ) {
> return ( false );
> }
> from2 = std::lower_bound( from2, to2, *from1 );
> if ( from2 == to2 ) {
> return ( false );
> }
> if ( *from1 == *from2 ) {
> return ( true );
> }
> from1 = std::lower_bound( from1, to1, *from2 );
> }
> }
>>
>>
>Best
>>
>Kai-Uwe Bux
You lower bound solution always has a time of 0. Is this not
suspicious?
Yes, it sure is. Thus, I had some test code where I put in asserts checking
that this method produces the same answers as the other. So, I am actually
somewhat confident that it is correct.
Quote:
But regardless, thank you for the time you spent to put
this together. I was anticipating the recording iterator to be
somewhat slower than the other methods, which your numbers clearly
show.

Best

Kai-Uwe Bux
  #12  
Old September 6th, 2006, 05:05 PM
Mark P
Guest
 
Posts: n/a
Default Re: Test for inclusion

Jerry Coffin wrote:
Quote:
In article <edkv3m$h50$1@murdoch.acc.Virginia.EDU>, jkherciueh@gmx.net
says...
Quote:
>Moohoo wrote:
>>
>A more conventional method would be to iterate over one range and use one
>of the various binary search methods (std::lower_bound(), std::upper_bound,
>std::binary_search()) to find the element in the other range.
>
That would certainly be an obvious method -- and quite efficient:
dependably O(N lg M), regardless.
>
You should be able to do a _little_ bit better on your own though. Since
both collections are sorted, an unscuccesfful binary search for an item
from collection A in collection B also establishes the lower bound in
collection B where the next item from collection A could possibly occur.
>
OTOH, this really is only a _little_ better. Assuming items in both
collections are about evenly distributed, it averages halving the amount
of the second collection that needs to be searched each time. That means
it only reduces the average number of comparisons for each binary search
by 1.
>
There's no reason to resort to binary search. If both collections are
sorted this can be done in linear time, essentially along the same lines
as the merge step of a merge sort.

Mark
  #13  
Old September 6th, 2006, 05:45 PM
Default User
Guest
 
Posts: n/a
Default Re: Test for inclusion

tryptik@gmail.com wrote:
Quote:
How about set_intersection?

Please don't top-post. Your replies belong following or interspersed
with properly trimmed quotes. See the majority of other posts in the
newsgroup, or the group FAQ list:
<http://www.parashift.com/c++-faq-lite/how-to-post.html>




Brian
  #14  
Old September 7th, 2006, 09:25 AM
Philip Potter
Guest
 
Posts: n/a
Default Re: Test for inclusion

"Mark P" <usenet@fall2005REMOVE.fastmailCAPS.fmwrote in message
news:PhDLg.16028$1f6.11648@newssvr27.news.prodigy. net...
Quote:
Jerry Coffin wrote:
Quote:
That would certainly be an obvious method -- and quite efficient:
dependably O(N lg M), regardless.

You should be able to do a _little_ bit better on your own though. Since
both collections are sorted, an unscuccesfful binary search for an item
from collection A in collection B also establishes the lower bound in
collection B where the next item from collection A could possibly occur.

OTOH, this really is only a _little_ better. Assuming items in both
collections are about evenly distributed, it averages halving the amount
of the second collection that needs to be searched each time. That means
it only reduces the average number of comparisons for each binary search
by 1.
>
There's no reason to resort to binary search. If both collections are
sorted this can be done in linear time, essentially along the same lines
as the merge step of a merge sort.
Yes, as my reply to the original post has already shown.

Philip

 

Bookmarks

Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On

Popular Articles

What is Bytes?

We are a network of experts and professionals in IT and software development that help one another with answers to tough questions and share insights. Get the best answers to your questions from over 220,840 network members.