473,395 Members | 1,941 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,395 software developers and data experts.

Logic check please

I'm reading through records in a file creating bitset objects from them.
Each bitset object has a score and I'm trying to collect just the top 2000
scores from over a million records.

I tihnk the best way to go is to create a vector, add the objects to it in
ascending order while keeping track of the size and lowest score. If the
size reaches the 2000 mark then evaluate the score and if it's lower than
the lowest score, discard the object otherwise traverse the vector and
insert it at its appropriate space and finally chop off the beginning of the
vector so that we're still in the 2000 range.

When I say best way I mean the way I thought of given my somewhat limited
experience with stl containers. I was also thinking about determining if the
object belongs in the 2000 set based upon it's score and then tossing it in
the end of the vector and doing a vector sort but I figured I'd skip the
sort and insert it myself at the proper position.

I didn't see anything that sets a vectors size as fixed and only inserts if
above this range type of thing but I guess that's where you override the
allocater? I haven't been down that road but nows a good time to jump in I
guess.

Thanks

This convoluted method below if you care to read and process is what I came
up with.

iSize = m_BitSetVector.size();

// if the vector is empty then just add it and collect the score
if(iSize == 0)
{
iLowestTotal = currentSet.GetScore();
m_BitSetVector.push_back(currentSet);
continue;
}
bool bAtFullCapacity = false;
if(m_BitSetVector.size() > m_iTotalScoresToKeep)
{
bAtFullCapacity = true;

// if the score is less than the lowest in the vector and
// we're at full capacity then continue
if(currentSet.GetScore() < m_BitSetVector.begin()->GetScore())
continue;
}

BitSetVector::iterator p;
bool bInserted = false;

for(p = m_BitSetVector.begin(); p != m_BitSetVector.end(); ++p)
{
if(currentSet.GetScore() < p->GetScore())
{
// insert at this location
m_BitSetVector.insert(p, currentSet);
bInserted = true;

if(m_BitSetVector.size() > m_iTotalScoresToKeep)
bAtFullCapacity = true;

// now chop off the end
if(bAtFullCapacity)
{
m_BitSetVector.erase(m_BitSetVector.begin());
iLowestTotal = m_BitSetVector.begin()->GetScore();
}

break;
}
}

// if it's not inserted yet then
if(!bInserted)
{
m_BitSetVector.push_back(currentSet);

if(m_BitSetVector.size() > m_iTotalScoresToKeep)
bAtFullCapacity = true;

// now chop off the end
if(bAtFullCapacity)
{
m_BitSetVector.erase(m_BitSetVector.begin());
iLowestTotal = m_BitSetVector.begin()->GetScore();
}
}
Sep 21 '05 #1
5 1185
Rich S. wrote:
I'm reading through records in a file creating bitset objects from them.
Each bitset object has a score and I'm trying to collect just the top 2000
scores from over a million records.


A more efficient way than what you describe would be
to use priority_queue. Define the predicate in such
a way that the top() always returns the lowest scoring
object.

Let q be the priority_queue and suppose you want to
find the top N scores (where N is 2000 in your example),
the pseudo-code would be something like this:

q.reserve(N);

// add the first N elements to the priority queue
for (i = 0; i < N; ++i)
{
if (!ReadRecord(&obj))
break;

q.push(obj);
}

// invariant: q contains highest N elements
// q.top() is the lowest of the highest N elements
while (ReadRecord(&obj))
{
if (obj > q.top())
{
q.pop();
q.push(obj);
}
}

Sep 21 '05 #2
Rich S. wrote:
I'm reading through records in a file creating bitset objects from them.
Each bitset object has a score and I'm trying to collect just the top 2000
scores from over a million records.

I tihnk the best way to go is to create a vector, add the objects to it in
ascending order while keeping track of the size and lowest score. If the
size reaches the 2000 mark then evaluate the score and if it's lower than
the lowest score, discard the object otherwise traverse the vector and
insert it at its appropriate space and finally chop off the beginning of the
vector so that we're still in the 2000 range.

When I say best way I mean the way I thought of given my somewhat limited
experience with stl containers. I was also thinking about determining if the
object belongs in the 2000 set based upon it's score and then tossing it in
the end of the vector and doing a vector sort but I figured I'd skip the
sort and insert it myself at the proper position.

I didn't see anything that sets a vectors size as fixed and only inserts if
above this range type of thing but I guess that's where you override the
allocater? I haven't been down that road but nows a good time to jump in I
guess.

Thanks

This convoluted method below if you care to read and process is what I came
up with.

iSize = m_BitSetVector.size();

// if the vector is empty then just add it and collect the score
if(iSize == 0)
{
iLowestTotal = currentSet.GetScore();
m_BitSetVector.push_back(currentSet);
continue;
}
bool bAtFullCapacity = false;
if(m_BitSetVector.size() > m_iTotalScoresToKeep)
{
bAtFullCapacity = true;

// if the score is less than the lowest in the vector and
// we're at full capacity then continue
if(currentSet.GetScore() < m_BitSetVector.begin()->GetScore())
continue;
}

BitSetVector::iterator p;
bool bInserted = false;

for(p = m_BitSetVector.begin(); p != m_BitSetVector.end(); ++p)
{
if(currentSet.GetScore() < p->GetScore())
{
// insert at this location
m_BitSetVector.insert(p, currentSet);
bInserted = true;

if(m_BitSetVector.size() > m_iTotalScoresToKeep)
bAtFullCapacity = true;

// now chop off the end
if(bAtFullCapacity)
{
m_BitSetVector.erase(m_BitSetVector.begin());
iLowestTotal = m_BitSetVector.begin()->GetScore();
}

break;
}
}

// if it's not inserted yet then
if(!bInserted)
{
m_BitSetVector.push_back(currentSet);

if(m_BitSetVector.size() > m_iTotalScoresToKeep)
bAtFullCapacity = true;

// now chop off the end
if(bAtFullCapacity)
{
m_BitSetVector.erase(m_BitSetVector.begin());
iLowestTotal = m_BitSetVector.begin()->GetScore();
}
}


Just call std::nth_element() to group the 2,000 highest items together
at the end of their container.

Here's a program to illustrate:

#include <vector>
#include <algorithm>
#include <iostream>

int main()
{
const int kTop2K = 2000;

std::vector<int> numberV;

// add numbers 1 - 1,000,000
for (int i=0; i < 10000000; i++)
numberV.push_back( i);

// shuffle the vector's contents
std::random_shuffle( numberV.begin(), numberV.end());

// visually confirm items are in random order
std::cout << "Last 2,000 elements (shuffled) " << std::endl;
for ( int i= numberV.size() - kTop2K; i < numberV.size() ; i++)
std::cout << numberV[i] << " ";
std::cout << std::endl << std::endl;

assert( numberV.size() >= kTop2K);

// use nth_element with the nth element being the 2,000th
// position from the end of the container

std::nth_element(numberV.begin(),
numberV.end()-kTop2K,
numberV.end());

// the top 2,000 items have been moved after the nth element
// all lower ranked items are are now before nth element.
// Note that neither range has been completely sorted

std::cout << "Last 2,000 elements (final): " << std::endl;

// visually inspect results.
// all numbers listed should be > 998,000

for (int i=numberV.size() - kTop2K; i < numberV.size() ; i++)
std::cout << numberV[i] << " ";
}

Greg

Sep 22 '05 #3

<ni*****@microsoft.com> wrote in message
news:11**********************@g49g2000cwa.googlegr oups.com...
Rich S. wrote:
I'm reading through records in a file creating bitset objects from them.
Each bitset object has a score and I'm trying to collect just the top
2000
scores from over a million records.


A more efficient way than what you describe would be
to use priority_queue. Define the predicate in such
a way that the top() always returns the lowest scoring
object.

Let q be the priority_queue and suppose you want to
find the top N scores (where N is 2000 in your example),
the pseudo-code would be something like this:

q.reserve(N);

// add the first N elements to the priority queue
for (i = 0; i < N; ++i)
{
if (!ReadRecord(&obj))
break;

q.push(obj);
}

// invariant: q contains highest N elements
// q.top() is the lowest of the highest N elements
while (ReadRecord(&obj))
{
if (obj > q.top())
{
q.pop();
q.push(obj);
}
}

I tried it and it worked but the only down side is that when I go to dump
the 2000 they're are in ascending order and I wish them to be in descending
order. There must be a way to dump the p queue into a vector quickly and
sort that and then dump it.

Thanks, the logic is much cleaner now.
Sep 22 '05 #4

"Greg" <gr****@pacbell.net> wrote in message
news:11**********************@z14g2000cwz.googlegr oups.com...
Rich S. wrote:
I'm reading through records in a file creating bitset objects from them.
Each bitset object has a score and I'm trying to collect just the top
2000
scores from over a million records.

I tihnk the best way to go is to create a vector, add the objects to it
in
ascending order while keeping track of the size and lowest score. If the
size reaches the 2000 mark then evaluate the score and if it's lower than
the lowest score, discard the object otherwise traverse the vector and
insert it at its appropriate space and finally chop off the beginning of
the
vector so that we're still in the 2000 range.

When I say best way I mean the way I thought of given my somewhat limited
experience with stl containers. I was also thinking about determining if
the
object belongs in the 2000 set based upon it's score and then tossing it
in
the end of the vector and doing a vector sort but I figured I'd skip the
sort and insert it myself at the proper position.

I didn't see anything that sets a vectors size as fixed and only inserts
if
above this range type of thing but I guess that's where you override the
allocater? I haven't been down that road but nows a good time to jump in
I
guess.

Thanks

This convoluted method below if you care to read and process is what I
came
up with.

iSize = m_BitSetVector.size();

// if the vector is empty then just add it and collect the score
if(iSize == 0)
{
iLowestTotal = currentSet.GetScore();
m_BitSetVector.push_back(currentSet);
continue;
}
bool bAtFullCapacity = false;
if(m_BitSetVector.size() > m_iTotalScoresToKeep)
{
bAtFullCapacity = true;

// if the score is less than the lowest in the vector and
// we're at full capacity then continue
if(currentSet.GetScore() < m_BitSetVector.begin()->GetScore())
continue;
}

BitSetVector::iterator p;
bool bInserted = false;

for(p = m_BitSetVector.begin(); p != m_BitSetVector.end(); ++p)
{
if(currentSet.GetScore() < p->GetScore())
{
// insert at this location
m_BitSetVector.insert(p, currentSet);
bInserted = true;

if(m_BitSetVector.size() > m_iTotalScoresToKeep)
bAtFullCapacity = true;

// now chop off the end
if(bAtFullCapacity)
{
m_BitSetVector.erase(m_BitSetVector.begin());
iLowestTotal = m_BitSetVector.begin()->GetScore();
}

break;
}
}

// if it's not inserted yet then
if(!bInserted)
{
m_BitSetVector.push_back(currentSet);

if(m_BitSetVector.size() > m_iTotalScoresToKeep)
bAtFullCapacity = true;

// now chop off the end
if(bAtFullCapacity)
{
m_BitSetVector.erase(m_BitSetVector.begin());
iLowestTotal = m_BitSetVector.begin()->GetScore();
}
}


Just call std::nth_element() to group the 2,000 highest items together
at the end of their container.

Here's a program to illustrate:

#include <vector>
#include <algorithm>
#include <iostream>

int main()
{
const int kTop2K = 2000;

std::vector<int> numberV;

// add numbers 1 - 1,000,000
for (int i=0; i < 10000000; i++)
numberV.push_back( i);

// shuffle the vector's contents
std::random_shuffle( numberV.begin(), numberV.end());

// visually confirm items are in random order
std::cout << "Last 2,000 elements (shuffled) " << std::endl;
for ( int i= numberV.size() - kTop2K; i < numberV.size() ; i++)
std::cout << numberV[i] << " ";
std::cout << std::endl << std::endl;

assert( numberV.size() >= kTop2K);

// use nth_element with the nth element being the 2,000th
// position from the end of the container

std::nth_element(numberV.begin(),
numberV.end()-kTop2K,
numberV.end());

// the top 2,000 items have been moved after the nth element
// all lower ranked items are are now before nth element.
// Note that neither range has been completely sorted

std::cout << "Last 2,000 elements (final): " << std::endl;

// visually inspect results.
// all numbers listed should be > 998,000

for (int i=numberV.size() - kTop2K; i < numberV.size() ; i++)
std::cout << numberV[i] << " ";
}

Greg


That's good and I'll definately keep that snippet for later use but I don't
think I want to store all of the objects in memory at the same time because
I'll quickly run out of memory.

Sep 22 '05 #5
"Rich S." wrote:

I tried it and it worked but the only down side is that when I go to dump
the 2000 they're are in ascending order and I wish them to be in descending
order. There must be a way to dump the p queue into a vector quickly
A loop ?

std::vector<T> Result;
Result.reserve( N );

while( !q.empty() )
Result.push_back( q.top() );
q.pop();
}
and
sort that
No need to sort it.
The items poped from the queue already are sorted. Just iterate
through the vector in reverse order.
and then dump it.


If you don't want to create that vector, you could eg. use
a recursive function to flip the order:

void Dump( priority_queue< T, std::vector<int>, MyLess<int> > q )
{
if( q.empty() )
return;

T Value = q.top();
q.pop();

Dump( q );
cout << Value << endl;
}

but I doubt, that this is faster, clearer or uses less memory space then
the vector.

--
Karl Heinz Buchegger
kb******@gascad.at
Sep 22 '05 #6

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

9
by: Bartosz Wegrzyn | last post by:
I need help with sessions. I createt set of web site for nav with authorization. first I go into main.php which looks like this: <?php //common functions include_once '../login/common.php';...
6
by: Ted Singh | last post by:
(I apologize in advance if this is an obvious question, my experience is more with console or client-only apps) I am trying to build an web-based HTML UI, which will work with a web-server on...
7
by: David Shorthouse | last post by:
I am attempting to create a "new account creation" asp, but would ideally like the routine to check the Access db for an existing email address and username (called UID below). The select query...
2
by: Andreas Schmitt | last post by:
Hi, Sorry for posting in German before, totally forgot about that when I was pasting this in here from another German newsgroup I was writing to, trying to get help I am programming a simple...
2
by: Jake | last post by:
Hi, I originally posted in the db group but thought this might be more appropriate here. I have a scheduling application which allows the user to reserve timeslots for various things but I'm...
0
by: antonyliu2002 | last post by:
The full code is pasted below. It looks scarily long, but it's pretty simple. It is adapted from a sample code at MSDN. If you run it, and check it out from your browser, you'll see: ...
1
by: Kondapanaidu | last post by:
Hi, What is the logic to compare Values. First array={5,5,6,7}; Second array={5,3,3,2}; Explanation: 1.First array first element is greater than the second array first element, No need to...
15
by: Jay | last post by:
I have a multi threaded VB.NET application (4 threads) that I use to send text messages to many, many employees via system.timer at a 5 second interval. Basically, I look in a SQL table (queue) to...
2
by: nsharish20 | last post by:
Hi, I have a sensor in the shape of a small circle and a hole which is larger than the sensor circle. Basically i have a small circle and a large circle. Now i would like to check these...
1
by: jonathan184 | last post by:
how to monitor and find out if files test1_* and test2_* files were sent in an hour and if not send an email This is on a unix system basically I got a cronjob that runs every sec polling a ftp dir...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.