Jerry Coffin wrote:
In article <ed**********@murdoch.acc.Virginia.EDU>, jk********@gmx.net
says...
>Moohoo wrote:
I have two sorted containers, and I wish to know if any items from one
container exist in the second container. Is there anything in the
standard library to do this?
You could use std::set_intersection(). If you are only interested in
whether the intersection is empty, you could use an output_iterator that
throws as soon as the std::set_intersection() writes the first element:
[ ... ]
>Note that this is a little unorthodox.
That's a rather neat idea -- one I'm pretty sure _I_ wouldn't have come
up with.
Just keep in mind that along with being unorthodox, throwing and
catching an exception can be quite slow -- and the speed seems to vary
pretty widely from one implementation to the next, so this might end up
saving a lot of time in some cases, but could lose a fair amouunt of
time in others.
>An more conventional method would be to iterate over one range and use
one of the various binary search methods (std::lower_bound(),
std::upper_bound, std::binary_search()) to find the element in the other
range.
That would certainly be an obvious method -- and quite efficient:
dependably O(N lg M), regardless.
You should be able to do a _little_ bit better on your own though. Since
both collections are sorted, an unscuccesfful binary search for an item
from collection A in collection B also establishes the lower bound in
collection B where the next item from collection A could possibly occur.
OTOH, this really is only a _little_ better. Assuming items in both
collections are about evenly distributed, it averages halving the amount
of the second collection that needs to be searched each time. That means
it only reduces the average number of comparisons for each binary search
by 1.
Good points. So I did some measurements. Below is the code (sorry for the
long post). I implemented three methods:
a) the try-throw-catch idea,
b) using set_intersection with an iterator that sets a boolean flag,
c) a version that uses lower_bound.
The last is by far the fastest (I tried hard to make it as efficient as
reasonable). However, it is also the least obvious.
I grew to like the try-throw-catch idiom when I played around with puzzle
solvers: usually it is easy to write a puzzle solver that generates all
solutions and if you are only interested in one solution you have to tweak
the code and insert some return path. I figured that there is a nice
interface that allows one to use the same code for generating all solutions
and finding just one solution:
template < typename OutIter >
OutIter write_all_solitions ( PuzzleData input_data, OutIter where );
Now, you can use the throwing iterator to retrieve just one solution:
template < typename T >
struct throwing_iterator
: public std::iterator< std::output_iterator_tag, T, void, void, void >
{
throwing_iterator & operator= ( T const & value ) {
throw ( value );
}
throwing_iterator & operator* ( void ) {
return ( *this );
}
throwing_iterator & operator++ ( void ) {
return ( *this );
}
throwing_iterator & operator++ ( int ) {
return ( *this );
}
}; // throwing_iterator
In these cases, efficiency is not that much of a concern since you throw
only once, namely when you are done anyway. I find that code like the
following conveys intent quite clearly:
void func ( PuzzleData const & initial_configuration ) {
try {
throwing_iterator< AdmissibleConfiguration hit_soltion;
write_all_solutions( initial_configuration, hit_solution );
}
catch ( AdmissibleConfiguration const & solution ) {
do_something( solution );
return;
}
no_solution_found();
}
Now for the performance:
#include <algorithm>
#include <iterator>
template < typename T >
struct throwing_iterator
: public std::iterator< std::output_iterator_tag, T, void, void, void >
{
throwing_iterator & operator= ( T const & value ) {
throw ( value );
}
throwing_iterator & operator* ( void ) {
return ( *this );
}
throwing_iterator & operator++ ( void ) {
return ( *this );
}
throwing_iterator & operator++ ( int ) {
return ( *this );
}
}; // throwing_iterator
template < typename Iterator >
bool has_intersection_a ( Iterator from1, Iterator to1,
Iterator from2, Iterator to2 ) {
typedef typename std::iterator_traits< Iterator >::value_type value_type;
throwing_iterator< value_type t_iter;
try {
std::set_intersection( from1, to1, from2, to2, t_iter );
}
catch ( typename throwing_iterator< value_type >::value_type ) {
return ( true );
}
return ( false );
}
template < typename T >
struct recording_iterator
: public std::iterator< std::output_iterator_tag, T, void, void, void >
{
bool result;
recording_iterator ( void )
: result ( false )
{}
recording_iterator & operator= ( T const & value ) {
result = true;
}
recording_iterator & operator* ( void ) {
return ( *this );
}
recording_iterator & operator++ ( void ) {
return ( *this );
}
recording_iterator & operator++ ( int ) {
return ( *this );
}
}; // recording_iterator
template < typename Iterator >
bool has_intersection_b ( Iterator from1, Iterator to1,
Iterator from2, Iterator to2 ) {
typedef typename std::iterator_traits< Iterator >::value_type value_type;
recording_iterator< value_type r_iter;
std::set_intersection( from1, to1, from2, to2, r_iter );
return ( r_iter.result );
}
template < typename Iterator >
bool has_intersection_c ( Iterator from1, Iterator to1,
Iterator from2, Iterator to2 ) {
while ( true ) {
if ( from1 == to1 ) {
return ( false );
}
from2 = std::lower_bound( from2, to2, *from1 );
if ( from2 == to2 ) {
return ( false );
}
if ( *from1 == *from2 ) {
return ( true );
}
from1 = std::lower_bound( from1, to1, *from2 );
}
}
#include <iostream>
#include <set>
#include <cstdlib>
#include <vector>
#include <algorithm>
#include <ctime>
#include <cassert>
int main ( void ) {
std::srand( 982348 );
unsigned int const inner = 1000;
unsigned int const num_sets = 10;
unsigned int const num_runs = 40;
unsigned int const set_size = 50000;
unsigned int const set_range = 1000000;
std::vector< std::set< unsigned int a;
for ( unsigned int i = 0; i < num_sets; ++i ) {
a.push_back( std::set<unsigned int>() );
for ( unsigned int k = 0; k < set_size; ++k ) {
a.back().insert( std::rand() % set_range );
}
}
for ( unsigned int i = 0; i < num_runs; ++i ) {
unsigned int i = std::rand() % num_sets;
unsigned int j = std::rand() % num_sets;
std::vector< unsigned int v_1 ( a[i].begin(), a[i].end() );
std::vector< unsigned int v_2 ( a[j].begin(), a[j].end() );
switch ( std::rand() % 3 ) {
case 0 : {
std::cout << "throwing_iterator: ";
{
std::clock_t ticks = std::clock();
for ( unsigned int l = 0; l < inner; ++l ) {
if ( has_intersection_a( v_1.begin(), v_1.end(),
v_2.begin(), v_2.end() ) ) {
std::swap( a[ std::rand() % num_sets ],
a[ std::rand() % num_sets ] );
}
}
std::cout << std::clock() - ticks << '\n';
}
break;
}
case 1 : {
std::cout << "recording_iterator: ";
{
std::clock_t ticks = std::clock();
for ( unsigned int l = 0; l < inner; ++l ) {
if ( has_intersection_b( v_1.begin(), v_1.end(),
v_2.begin(), v_2.end() ) ) {
std::swap( a[ std::rand() % num_sets ],
a[ std::rand() % num_sets ] );
}
}
std::cout << std::clock() - ticks << '\n';
}
break;
}
case 2 : {
std::cout << "lower_bound: ";
{
std::clock_t ticks = std::clock();
for ( unsigned int l = 0; l < inner; ++l ) {
if ( has_intersection_c( v_1.begin(), v_1.end(),
v_2.begin(), v_2.end() ) ) {
std::swap( a[ std::rand() % num_sets ],
a[ std::rand() % num_sets ] );
}
}
std::cout << std::clock() - ticks << '\n';
}
break;
}
}
}
}
Results:
recording_iterator: 2940000
recording_iterator: 2880000
throwing_iterator: 30000
lower_bound: 0
recording_iterator: 2890000
lower_bound: 0
throwing_iterator: 30000
recording_iterator: 2870000
throwing_iterator: 30000
recording_iterator: 2830000
throwing_iterator: 30000
recording_iterator: 2830000
recording_iterator: 2890000
lower_bound: 0
throwing_iterator: 30000
throwing_iterator: 30000
lower_bound: 10000
throwing_iterator: 30000
throwing_iterator: 30000
throwing_iterator: 30000
recording_iterator: 2870000
lower_bound: 0
throwing_iterator: 30000
throwing_iterator: 30000
throwing_iterator: 30000
lower_bound: 0
recording_iterator: 2850000
recording_iterator: 2860000
throwing_iterator: 30000
recording_iterator: 2890000
throwing_iterator: 30000
recording_iterator: 2870000
recording_iterator: 510000
throwing_iterator: 30000
throwing_iterator: 30000
lower_bound: 0
lower_bound: 0
throwing_iterator: 30000
recording_iterator: 2870000
throwing_iterator: 30000
Clearly, aborting set_intersection() pays off dramatically; but the winner
is, beyond any doubt:
template < typename Iterator >
bool has_intersection ( Iterator from1, Iterator to1,
Iterator from2, Iterator to2 ) {
while ( true ) {
if ( from1 == to1 ) {
return ( false );
}
from2 = std::lower_bound( from2, to2, *from1 );
if ( from2 == to2 ) {
return ( false );
}
if ( *from1 == *from2 ) {
return ( true );
}
from1 = std::lower_bound( from1, to1, *from2 );
}
}
Best
Kai-Uwe Bux