Bytes | Software Development & Data Engineering Community

Performance help

hi, i'm reading a binary data file, which is basically a list of 7296
offsets pointing to data blocks which represent scanlines containing
three numbers: leftx, id, rightx. i can read these fine and push them
into vectors using:

for (int a = 0; a < 7296; a++)
{
    for (int c = 1; ; c = c + 2)
    {
        province *temp_prov = new province;
        cords *temp_cords = new cords;

        file.seekg (4*(7296+(1*c)+offsets[a]));

        file.read ((char *)&temp_cords->left, sizeof(short));
        file.read ((char *)&temp_prov->id, sizeof(short));
        file.read ((char *)&temp_cords->right, sizeof(short));
        temp_cords->line = a;

        temp_prov->lines.push_back (*temp_cords);
        provinces.push_back (*temp_prov);

        if (temp_cords->right == 18944)
        {
            delete temp_cords;
            delete temp_prov;
            break;
        }
        else
        {
            delete temp_cords;
            delete temp_prov;
        }
    }
}

while ugly and most likely wrong, i /assure/ you it works. here is my
problem: i want to move the line data of all areas with a matching id
into one vector, and then remove them. i was thinking something like

for (int a = 0; a < provinces.size (); a++)
{
    for (int b = 0; b < provinces.size (); b++)
    {
        if (provinces[a].id == provinces[b].id && a != b)
        {
            provinces[a].lines.push_back (provinces[b].lines[0]);
            provinces.erase (provinces.begin()+b);
        }
    }
}

but this is SLOWER than slow and im sure its probably 100% wrong, but
i'll never know because it apparently will take hours to complete. i
should mention that the number of elements in provinces is 557,406.
any alternative approaches that'll let me do this quickly on a Pentium
3 800?

ps; if you want a better description of the file than i can provide
http://www.inferis.org/eu2/tbl/id.tbl.asp

Sep 10 '07 #1
Steve Chow wrote:
[..] here is my
problem; i want to move the line data of all areas with a matching id
into one vector, and then remove them. i was thinking something like

for (int a = 0; a < provinces.size (); a++)
{
for (int b = 0; b < provinces.size (); b++)
{
if (provinces[a].id == provinces[b].id && a != b)
{
provinces[a].lines.push_back(provinces[b].lines[0]);
provinces.erase (provinces.begin()+b);
}
}
}

but this is SLOWER than slow and im sure its probably 100% wrong but
i'll never know because it apparently will takes hours to complete. i
should mention that the amount of elements in provinces is; 557,406.
any alternative approaches that'll let me do this quickly on a Pentium
3 800? [..]
I don't know your particular compiler, but you could (a) instead of
erasing 'b' right away, mark it for erasure later, and then (b) do
remove_if() on your vector with a single erase:

provinces.erase(std::remove_if(provinces.begin(),
                               provinces.end(),
                               is_specially_marked()),
                provinces.end());

You should get the improvement you seek.
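A minimal sketch of that mark-then-erase idea, with a hypothetical `dead` flag playing the role of `is_specially_marked`, and a C++11 lambda for the predicate (the real `province` also carries its `lines` vector, elided here):

```cpp
#include <algorithm>
#include <vector>

// Hypothetical pared-down province: just an id and a "dead" flag.
struct province {
    int id;
    bool dead = false;
};

// Merge duplicates by marking them, then erase all marked elements
// with one remove_if/erase pass instead of one erase per duplicate.
void merge_duplicates(std::vector<province>& v) {
    for (std::size_t a = 0; a < v.size(); ++a) {
        if (v[a].dead) continue;
        for (std::size_t b = a + 1; b < v.size(); ++b) {
            if (!v[b].dead && v[a].id == v[b].id) {
                // ...move v[b]'s line data into v[a] here...
                v[b].dead = true;   // mark instead of erasing immediately
            }
        }
    }
    // One erase; remove_if compacts the survivors in a single O(N) sweep.
    v.erase(std::remove_if(v.begin(), v.end(),
                           [](const province& p) { return p.dead; }),
            v.end());
}
```

The matching loops are still quadratic, but the erase cost drops from O(N) per duplicate to O(N) total.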

Also, while it's not necessarily an improvement, try 'std::list' with
your current algorithm (you won't be able to use index 'a' or 'b', but
use the iterators instead, same difference).

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask
Sep 10 '07 #2
In article <11**********************@w3g2000hsg.googlegroups.com>,
ms*********@hotmail.com says...

[ ... ]
for (int a = 0; a < 7296; a++)
{
for (int c = 1; ; c=c+2)
{
province *temp_prov = new province;
cords *temp_cords = new cords;
There doesn't seem to be any reason to allocate these dynamically. I'd
just use:

province temp_prov;
cords temp_cords;
file.seekg (4*(7296+(1*c)+offsets[a]));

file.read ((char *)&temp_cords->left, sizeof(short));
file.read ((char *)&temp_prov->id, sizeof(short));
file.read ((char *)&temp_cords->right, sizeof(short));
temp_cords->line = a;

temp_prov->lines.push_back (*temp_cords);
provinces.push_back (*temp_prov);
Changing those to automatic variables means you can reduce this:
if (temp_cords->right == 18944)
{
delete temp_cords;
delete temp_prov;
break;
}
else
{
delete temp_cords;
delete temp_prov;
}
Down to simply:

if (temp_cords.right == 18944)
    break;

The next step would be to overload operator>> for struct cords to make
the mainline of the code a bit cleaner:

std::istream &operator>>(std::istream &is, cords &c) {
    is.read((char *)&c.left, sizeof(c.left));
    is.read((char *)&c.id, sizeof(c.id));
    is.read((char *)&c.right, sizeof(c.right));
    return is;
}

Then your loop looks something like this:

for (int a = 0; a < 7296; a++) {
    cords temp_cords = {};

    for (int c = 1; temp_cords.right != 18944; c += 2) {
        province temp_prov;

        file.seekg(4*(7296+c+offsets[a]));

        file >> temp_cords;
        temp_cords.line = a;

        temp_prov.lines.push_back(temp_cords);
        provinces.push_back(temp_prov);
    }
}
while ugly and most likely wrong, i /assure/ you it works. here is my
problem; i want to move the line data of all areas with a matching id
into one vector, and then remove them. i was thinking something like

for (int a = 0; a < provinces.size (); a++)
{
for (int b = 0; b < provinces.size (); b++)
{
if (provinces[a].id == provinces[b].id && a != b)
{
provinces[a].lines.push_back(provinces[b].lines[0]);
provinces.erase (provinces.begin()+b);
}
}
}
Okay, if I understand this, the idea is to merge all the lines from
provinces that have the same id into a single province. If that's the
case, I'd start by sorting based on id -- then all the lines with the
same id will be right next to each other. Being the way I am, I'd
probably also avoid doing the manipulation in-place. Instead, I'd modify
the previous loop to just create a vector of cords. Then, after I was
done reading those, I'd sort the cords by id, and then copy the data for
the cords into the vector of provinces.

Right now, you have an O(N*N) pair of loops, so it executes roughly N*N iterations.

A quick sort (for example) will do the sort in approximately N*lg2(N)
operations, and then the merge will take about N operations.

Since N is 557,406, your pair of loops executes 310,701,448,836
iterations. The quick sort will take approximately 10,639,972 operations
and the merge takes 557,406 operations.

If we figure all the operations involved are about the same speed, the
sort/merge should be about 29,000 _times_ as fast as what you're doing
now. Some of the operations in the sort are probably a little slower,
but I'd expect a _substantial_ improvement anyway.
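The sort-then-merge plan above might be sketched like this; only a sketch, with hypothetical `cords` and `province` structs standing in for the real ones, and a C++11 lambda as the sort comparison:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical scanline record, matching the fields in the original post.
struct cords { short left, id, right; int line; };

struct province {
    short id;
    std::vector<cords> lines;
};

// Sort all scanlines by id, then sweep once, starting a new province
// whenever the id changes: an O(N log N) sort plus an O(N) merge.
std::vector<province> merge_by_id(std::vector<cords> all) {
    std::sort(all.begin(), all.end(),
              [](const cords& x, const cords& y) { return x.id < y.id; });
    std::vector<province> out;
    for (const cords& c : all) {
        if (out.empty() || out.back().id != c.id)
            out.push_back(province{c.id, {}});
        out.back().lines.push_back(c);
    }
    return out;
}
```

After the sort, equal ids are adjacent, so the single forward pass replaces the quadratic matching scan entirely.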

--
Later,
Jerry.

The universe is a figment of its own imagination.
Sep 10 '07 #3
LR
Steve Chow wrote:
hi, i'm reading a binary data file, which is basically a list of 7296
offsets pointing to data blocks which represent scanlines containing
three numbers; leftx, id, rightx. i can read these fine and push them
into vectors using;
What kind of vector do you push these onto?

provinces? What type is that?

for (int a = 0; a < 7296; a++)
{
for (int c = 1; ; c=c+2)
{
It's not clear to me why you're new'ing these each time through
the loop. Why not just make temps like,
cords temp_cords;
//province *temp_prov = new province;
//cords *temp_cords = new cords;

file.seekg (4*(7296+(1*c)+offsets[a]));
have to rewrite temp_cords as needed
file.read ((char *)&temp_cords->left, sizeof(short));
file.read ((char *)&temp_prov->id, sizeof(short));
file.read ((char *)&temp_cords->right, sizeof(short));
temp_cords->line = a;
province temp_prov;
temp_prov.lines.push_back (temp_cords);
provinces.push_back (temp_prov);
ok, here if you're deleting in both cases,
if you use new, why not just delete
unconditionally?
delete temp_cords;
delete temp_prov;
if (temp_cords->right == 18944)
{
break;
}
and this whole else block has no real purpose
that I can see
else
{
delete temp_cords;
delete temp_prov;
}
}
}

while ugly and most likely wrong, i /assure/ you it works. here is my
problem; i want to move the line data of all areas with a matching id
into one vector, and then remove them. i was thinking something like

for (int a = 0; a < provinces.size (); a++)
{
for (int b = 0; b < provinces.size (); b++)
starting here from zero each time?
doesn't that mean that you'll count some
instances twice?

consider
.... b = a + 1; ....

BTW, if you have a vector,
std::vector<SomeType> vec;
then your for loop should probably look like:
for(std::vector<SomeType>::size_type i = 0; ...
some compilers will give an error if you use int.
{
if (provinces[a].id == provinces[b].id && a != b)
I don't know what provinces[a].id is, and I
don't know how it got initialized.
{
provinces[a].lines.push_back(provinces[b].lines[0]);
provinces.erase (provinces.begin()+b);
what should the next value of b be as
you go through the loop?
}
}
}

but this is SLOWER than slow and im sure its probably 100% wrong but
i'll never know because it apparently will takes hours to complete. i
should mention that the amount of elements in provinces is; 557,406.
Then perhaps you should construct a smaller data test set so you can
test your algorithm.
any alternative approaches that'll let me do this quickly on a Pentium
3 800?
And perhaps, more importantly correctly? I don't think that the
platform is relevant.

BTW, have you thought about std::map?
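A std::map keyed on the province id would do the grouping while reading, so the quadratic matching loop disappears. A minimal sketch, with a hypothetical pared-down `cords` struct and a hypothetical `add_scanline` helper:

```cpp
#include <map>
#include <vector>

// Hypothetical scanline record (id lives in the map key instead).
struct cords { short left, right; int line; };

// One map entry per distinct id; operator[] creates the vector on first use.
using province_map = std::map<short, std::vector<cords>>;

void add_scanline(province_map& m, short id, const cords& c) {
    m[id].push_back(c);  // O(log N) lookup replaces an O(N) linear scan
}
```

Calling add_scanline for every record read from the file leaves the lines already grouped by id, with no erase step at all.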

LR
Sep 10 '07 #4

Steve Chow wrote:
while ugly and most likely wrong, i /assure/ you it works. here is my
problem; i want to move the line data of all areas with a matching id
into one vector, and then remove them. i was thinking something like

for (int a = 0; a < provinces.size (); a++)
{
for (int b = 0; b < provinces.size (); b++)
{
if (provinces[a].id == provinces[b].id && a != b)
{
provinces[a].lines.push_back(provinces[b].lines[0]);
provinces.erase (provinces.begin()+b);
}
}
}

but this is SLOWER than slow and im sure its probably 100% wrong but
Bad code using vectors.
Because vectors keep an array layout, erasing at positions other than
the vector's end also moves all the elements after the erased segment to
their new positions, which is not as efficient as erasing in other
kinds of sequence containers (deque, list). It also invalidates all
iterators and references to elements after that position.

vectors are good only at removing elements from the end, not in between.
you are doing N*N erases on a vector of large size; it would be slow.

Sep 10 '07 #5
Steve Chow <ms*********@hotmail.com> writes:
[...]
i want to move the line data of all areas with a matching id
into one vector, and then remove them. i was thinking something like
As others have mentioned, a vector probably isn't the right data
structure for this. An STL multimap might do most of the work
for you, and the nonstandard but common hash_multimap would be
somewhat faster if you have one (there's probably one in Boost if
not). You would use the province ID as the key. See:

http://www.sgi.com/tech/stl/Multimap.html
http://www.sgi.com/tech/stl/hash_multimap.html
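A rough sketch of the multimap idea, using hypothetical `cords` and `lines_for` names (the real code would fill the multimap while reading the file):

```cpp
#include <map>
#include <vector>

// Hypothetical scanline record; the province id is the multimap key.
struct cords { short left, right; int line; };

// All scanlines with the same id sit in one contiguous key range, so
// equal_range() retrieves a province's lines without any search loop.
using line_mmap = std::multimap<short, cords>;

std::vector<cords> lines_for(const line_mmap& m, short id) {
    std::vector<cords> out;
    auto range = m.equal_range(id);
    for (auto it = range.first; it != range.second; ++it)
        out.push_back(it->second);
    return out;
}
```

Each insert costs O(log N), so loading all 557,406 records stays a few million operations rather than billions.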

-----Scott.
Sep 10 '07 #6
thanks for all your help! if you're interested i'll post the code i
come up with after taking all your advice.

Sep 10 '07 #7
In article <ly************@gfn.org>, sg******@suspectclass.com says...

[ ... ]
As others have mentioned, a vector probably isn't the right data
structure for this.
I think a vector can work just fine for this -- but it requires doing
the job a bit differently. If (as I advised) you start with separate
vectors, copy the data from one to the other, and then clear the
first one all at once, you encounter no problem.

If the extra memory usage from temporarily having two complete copies of
the data is a problem, you can avoid that as well: iterate through the
first vector in reverse, and erase each item after its data has been
copied to the second vector. Since this only ever erases items from the
end, the erases are quick and efficient.
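The erase-from-the-end idea can be sketched with plain ints standing in for the province records (the helper name `drain_backwards` is made up for illustration):

```cpp
#include <vector>

// Move everything from src into dst while only ever erasing src's last
// element, so each erase is O(1) and no elements are shifted.
void drain_backwards(std::vector<int>& src, std::vector<int>& dst) {
    while (!src.empty()) {
        dst.push_back(src.back());  // copy the tail element...
        src.pop_back();             // ...then a cheap erase-from-end
    }
}
```

The destination receives the elements in reverse order, which doesn't matter here since they get sorted or merged by id anyway.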

--
Later,
Jerry.

The universe is a figment of its own imagination.
Sep 11 '07 #8

This thread has been closed and replies have been disabled. Please start a new discussion.
