471,893 Members | 1,745 Online

# pairwise comparison and merging of vector elements

I am wondering if anyone has any better idea of how to approach
this problem than I do. . . .

I have a vector of items (data). I have to do a pairwise
comparison of each item to each other item and apply logic to see if
one should be deleted. In the past I have done this using for
statements, like:

for (int i = 0; i < input_vector.size(); i++)
for (int j = i; i < input_vector.size(); j++)
{
. . . process data . . .
}

The problem is that I have do delete items in the middle, which makes
index-watching tricky. So, in the past, I have copied the vector
first and manipulated the copy vice the original when I process the
data. However, then I have to keep track of which items in the
original vector have already been merged away. For example, I compare
items 1 and 2, and I decide to delete 2. So, then after I finish all
the comparisons with item 1, I need to do that with item 3 (not 2,
which I`ve deleted in the copied vector.

In addition to being complicated in the past, this has been slow.

Is there a better approach? I come across the same scenario
frequently.

Thanks, Alan

Jun 4 '07 #1
1 1942
Alan wrote:
I am wondering if anyone has any better idea of how to approach
this problem than I do. . . .

I have a vector of items (data). I have to do a pairwise
comparison of each item to each other item and apply logic to see if
one should be deleted. In the past I have done this using for
statements, like:

for (int i = 0; i < input_vector.size(); i++)
for (int j = i; i < input_vector.size(); j++)
{
. . . process data . . .
}

The problem is that I have do delete items in the middle, which makes
index-watching tricky.
Tricky? Really?... Most of vector cleanup code I've seen used to
decrement the index ('j', for example) right after deletion of the
element which it indexes (so the next ++ would essentially keep it
"correct").
So, in the past, I have copied the vector
first and manipulated the copy vice the original when I process the
data. However, then I have to keep track of which items in the
original vector have already been merged away. For example, I compare
items 1 and 2, and I decide to delete 2. So, then after I finish all
the comparisons with item 1, I need to do that with item 3 (not 2,
which I`ve deleted in the copied vector.

In addition to being complicated in the past, this has been slow.

Is there a better approach? I come across the same scenario
frequently.
Keep the second vector<charand "mark" the elements you've decided
to throw away. When going over your 'i' and 'j', first check with the
"marked" vector and if the element is set, skip to the next. Fast
and simple to understand. At the end walk from start and copy into
another vector only the elements that don't have corresponding "marked"
elements set.

Or write a proper functor and use 'remove_if'. You can rely on the
requirement that elements of a vector exist in an array...

And if you post a bit more of the code that illustrates your problem
we might point out inefficiencies further (if you want).

V
--