Efficiently Extracting Identical Values From A List/Array

Adam Hartshorne

As a result of a graphics-based algorithm, I have a list of indices into
a set of nodes.

I want to efficiently identify any node indices that are stored multiple
times in the array, together with their locations in the array/list. The
output would be a list of lists, each containing the indices of the
storage array that point to the same node index.

This is obviously a trivial problem, but if my storage list is large and
the set of nodes is large (and hence there are lots of repeated indices),
this could become a bottleneck.

Adam

Jul 23 '05 #1

Thomas Maier-Komor

What about the STL's unique_copy?

Tom

--
____________________________________________________________________________
Dipl.-Ing. Thomas Maier-Komor http://www.rcs.ei.tum.de
Institute for Real-Time Computer Systems (RCS) fon +49-89-289-23578
Technische Universitaet Muenchen, D-80290 Muenchen fax +49-89-289-23555

Jul 23 '05 #2

Ivan Vecerina

An "easy" way would be to use:
std::multimap< int/*nodeIndex*/, std::vector<int/*arrayIndex*/> > myList;
// for each index:
myList[aNodeIndex].push_back( anArrayIndex );

Likely to be more efficient:
std::vector< std::pair<int/*nodeIndex*/,int/*arrayIndex*/> > myList;
myList.reserve( theSizeOfTheArrayOfIndices );
// for each index:
myList.push_back( std::pair<int,int>( aNodeIndex, anArrayIndex ) );
std::sort( myList.begin(), myList.end() );
// --> scan for consecutive items with the same node index

A hash_map (or unordered_map) could be tested too, but I would expect
the vector version to be faster (just a guess...).
Ivan
--
http://ivan.vecerina.com/contact/?subject=NG_POST <- email contact form
Brainbench MVP for C++ <> http://www.brainbench.com

Jul 23 '05 #3

Adam Hartshorne

Maybe I'm missing something, but using this approach:

An "easy" way would be to use:
std::multimap< int/*nodeIndex*/, std::vector<int/*arrayIndex*/> > myList;
// for each index:
myList[aNodeIndex].push_back( anArrayIndex );

will give me the list of lists, but I only want to consider those nodes
that are mentioned multiple times in the storage array. The above builds,
for every node index, a list of array indices.

I would then have to search / sort the whole myList to isolate the
entries that have multiple values stored. Is that correct?

Adam

Jul 23 '05 #4
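
One way to address the question above is to group with a std::map and then make a single linear pass that drops every node referenced only once. This is a minimal sketch, not code from the thread: the function name findRepeatedNodes, the parameter name links, and the use of 0-based positions are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical helper (not from the thread): maps each node index that occurs
// more than once in `links` to the 0-based positions at which it occurs.
std::map<int, std::vector<int>> findRepeatedNodes(const std::vector<int>& links)
{
    // Pass 1: group array positions by node index.
    std::map<int, std::vector<int>> groups;
    for (std::size_t i = 0; i < links.size(); ++i)
        groups[links[i]].push_back(static_cast<int>(i));

    // Pass 2: erase every node that is referenced only once.
    for (auto it = groups.begin(); it != groups.end(); )
    {
        if (it->second.size() < 2)
            it = groups.erase(it);   // erase returns the next valid iterator
        else
            ++it;
    }
    return groups;
}
```

The filtering pass is linear in the number of distinct nodes, so no second sort or search is needed; the cost is dominated by the map insertions.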

Karl Heinz Buchegger

Yep. That definitely will become an issue for large point sets.
What you need to do:
sort the points(*) and keep track of where each point was in the
original data structure.
You might want to use a helper data structure for that.

After the sort has been done, all points with identical coordinates
are consecutive, and the additional information will tell you where
each one was in the original data set.

(*) sorting criterion:
  if x_coordinates are equal
    if y_coordinates are equal
      return z1 < z2
    else
      return y1 < y2
  else
    return x1 < x2

--
Karl Heinz Buchegger
kb******@gascad.at

Jul 23 '05 #5
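
For data that really does carry coordinates, the sorting criterion in the post above is a lexicographic comparison on (x, y, z). A minimal sketch follows, assuming a point type with three double members; Point3 and lessXYZ are hypothetical names, since the thread does not define a point type.

```cpp
#include <tuple>

// Hypothetical point type; the thread does not define one.
struct Point3
{
    double x, y, z;
};

// Lexicographic ordering: compare x first, then y, then z.
// std::tie builds tuples of references, and tuple operator< compares
// element by element, which matches the nested if/else criterion.
bool lessXYZ(const Point3& a, const Point3& b)
{
    return std::tie(a.x, a.y, a.z) < std::tie(b.x, b.y, b.z);
}
```

Such a function can be passed as the comparator to std::sort. Note that it treats two points as duplicates only when their coordinates compare exactly equal.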

Adam Hartshorne

I think you may have misunderstood; there are no actual point
coordinates. There is simply a list of points, a list of lines, and a list
that is used to link lines to the points.

What I am concerned with is the linking list. Say we have the following:

I = {10,10,4,6,5,5}

That says lines 1 and 2 are linked to node 10, line 3 to node 4, and so on.

What I want is a result of the search that gives me

O = {10{1,2}, 5{5,6}}

Jul 23 '05 #6

Karl Heinz Buchegger

Adam Hartshorne wrote:

I think you may have misunderstood,
Maybe

Same strategy.
Set up a helper data structure

struct SortHelper
{
  int NodeIndex;
  int OriginalPosition;
};

and create an array (or whatever) of that:

I = { 10, 4, 8, 10, 4, 5 }

becomes

{ 10, 1 }
{ 4, 2 }
{ 8, 3 }
{ 10, 4 }
{ 4, 5 }
{ 5, 6 }

Now sort that array according to NodeIndex:

{ 4, 2 }
{ 4, 5 }
{ 5, 6 }
{ 8, 3 }
{ 10, 1 }
{ 10, 4 }

and scan through it: there are 2 consecutive '4' nodes in the list, and they
appeared in the original I at positions 2 and 5. '5' is single and thus
of no interest to you (if I understand correctly); same for '8'. But then
there is 10, which occurs 2 times in I, at positions 1 and 4.

The strategy is always the same. If you need to compare each element with each
other element in a data structure, you have a potential O(n^2) algorithm. If
possible (and often it is), sort the data so that equal elements become
consecutive. Sorting is O(n*log(n)), plus an additional O(n) for
running through the data structure and picking out the groups. Much better
than O(n^2) for large values of n.

--
Karl Heinz Buchegger
kb******@gascad.at

Jul 23 '05 #7
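
The sort-then-scan strategy described in the post above could be sketched roughly as follows. This is an illustration under assumptions rather than the poster's code: the function name groupDuplicates, the parameter name links, the return type, and the choice of std::stable_sort are all hypothetical. Positions are 1-based to match the worked example.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

struct SortHelper
{
    int NodeIndex;
    int OriginalPosition;   // 1-based, as in the worked example
};

// Hypothetical helper: returns one (node, positions) entry for every node
// index that occurs at least twice in `links`, ordered by node index.
std::vector<std::pair<int, std::vector<int>>>
groupDuplicates(const std::vector<int>& links)
{
    std::vector<SortHelper> helpers;
    helpers.reserve(links.size());
    for (std::size_t i = 0; i < links.size(); ++i)
        helpers.push_back({links[i], static_cast<int>(i) + 1});

    // stable_sort keeps equal node indices in their original order,
    // so the positions inside each group come out ascending.
    std::stable_sort(helpers.begin(), helpers.end(),
                     [](const SortHelper& a, const SortHelper& b)
                     { return a.NodeIndex < b.NodeIndex; });

    std::vector<std::pair<int, std::vector<int>>> result;
    for (std::size_t i = 0; i < helpers.size(); )
    {
        // Find the end of the run of equal node indices starting at i.
        std::size_t j = i + 1;
        while (j < helpers.size() && helpers[j].NodeIndex == helpers[i].NodeIndex)
            ++j;

        if (j - i > 1)   // keep only nodes referenced more than once
        {
            std::vector<int> positions;
            for (std::size_t k = i; k < j; ++k)
                positions.push_back(helpers[k].OriginalPosition);
            result.push_back({helpers[i].NodeIndex, positions});
        }
        i = j;
    }
    return result;
}
```

For I = { 10, 4, 8, 10, 4, 5 } this yields node 4 at positions 2 and 5, and node 10 at positions 1 and 4, in agreement with the walk-through.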

Ivan Vecerina

"Adam Hartshorne" <or********@yah oo.com> wrote in message
news:cv**********@wisteria.csv.warwick.ac.uk...

Maybe I'm missing something, but using this way
An "easy" way would be to use:
std::multimap< int/*nodeIndex*/, std::vector<int/*arrayIndex*/> > myList;

NB: I actually meant to write std::map< .... > here; std::multimap has
no operator[], so the line below would not compile with it.

// for each index:
myList[aNodeIndex].push_back( anArrayIndex );

will give me the list of lists, but I only want to consider those nodes
that are mentioned multiple times in the storage array. The above builds,
for every node index, a list of array indices.

I would then have to search / sort the whole myList to isolate the
entries that have multiple values stored.
Is that correct?

Yes: sorting first tends to be the fastest way to find
identical values in a list.

This said, in your case, aNodeIndex values are in a known 0-based
interval. Because of that, you could probably use a faster approach:
// initial map filled with -1 to say no arrayIndex points to that node yet
// (sized maxNodeIndex + 1 so that maxNodeIndex itself is a valid index)
std::vector<int> nodeToFirstInd( maxNodeIndex + 1, -1 );

// this will store only nodes with multiple indices
std::map< int/*nodeIndex*/, std::vector<int/*arrayIndex*/ > > multiLinked;

for(....)// for each arrayIndex, nodeIndex pair:
{
  if( nodeToFirstInd[nodeIndex]==-1 )
    nodeToFirstInd[nodeIndex] = arrayIndex; // mark node as 'used'
  else {
    std::vector<int/*arrayIndex*/ >& list = multiLinked[nodeIndex];
    if( list.empty() ) // put the initial item in
      list.push_back( nodeToFirstInd[nodeIndex] );
    list.push_back( arrayIndex ); // add the new value index
  }
}

// now multiLinked contains what you want
Sorry the code samples are a mess - just written in a rush.
I hope it is understandable and helpful, though.

Ivan
--
http://ivan.vecerina.com/contact/?subject=NG_POST <- email contact form

Jul 23 '05 #8
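
The direct-lookup approach in the post above can be wrapped into a self-contained function along these lines. This is a sketch under stated assumptions: the name findMultiLinked and its parameters are hypothetical, every value in links is assumed to lie in [0, maxNodeIndex], and positions are reported 0-based.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical helper. Assumes every value in `links` lies in
// [0, maxNodeIndex]; positions in the result are 0-based.
std::map<int, std::vector<int>>
findMultiLinked(const std::vector<int>& links, int maxNodeIndex)
{
    // nodeToFirstInd[n] == -1 means node n has not been seen yet.
    std::vector<int> nodeToFirstInd(static_cast<std::size_t>(maxNodeIndex) + 1, -1);
    std::map<int, std::vector<int>> multiLinked;

    for (std::size_t i = 0; i < links.size(); ++i)
    {
        const int nodeIndex = links[i];
        const int arrayIndex = static_cast<int>(i);

        if (nodeToFirstInd[nodeIndex] == -1)
        {
            nodeToFirstInd[nodeIndex] = arrayIndex;   // first sighting
        }
        else
        {
            std::vector<int>& positions = multiLinked[nodeIndex];
            if (positions.empty())                    // second sighting
                positions.push_back(nodeToFirstInd[nodeIndex]);
            positions.push_back(arrayIndex);
        }
    }
    return multiLinked;
}
```

Only nodes seen at least twice ever touch the map, so when duplicates are rare most of the work is a single pass over a flat vector. The trade-off is the memory for one int per possible node index.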
