
Illogical std::vector size?

Hi,

First some background.

I have a structure,

struct sFileData
{
char* sSomeString1;
char* sSomeString2;
int iSomeNum1;
int iSomeNum2;
sFileData(){...};
~sFileData(){...};
sFileData(const sFileData&){...};
const sFileData& operator=( const sFileData &s ){...}
};

std::vector< sFileData, std::allocator<sFileData> > address_;

For the sake of simplicity I removed the bodies of the 'tors.
I have no memory leaks as far as I can tell.

Then I read a file, (each line is 190 chars mostly blank spaces).
In each line I 'read' info to fill in the structure.

Because there are so many blank spaces in each line, I make sure that my
data is 'trimmed'.

So in effect sSomeString1 and sSomeString2 are never more than 10 chars,
(although in the file they could be up to 40 chars).

I chose vectors because after reading the file I need to do searches of
sSomeString1 and sSomeString2, (no other reasons really).

But my problem is the size of address_ is not consistent with the size of
the file.

The file is around 13 MB, with around 100,000 'lines' of 190 chars each.
Because I remove the blank spaces and convert 2 numbers from char to int, I
guess I should not use more than half of that, about 5 MB.

But after loading I see that I use around 40 MB (3 times more than the
original size).

As far as I can tell you cannot really measure the memory used by a vector,
but I use Windows and the Task Manager and I can see the size of my app
before and after reading the file (I do nothing else).

So what could be the reason for this inconsistency?
How could I optimize my code to reduce those 40 MB?

Many thanks

Simon

Jul 23 '05 #1
simon wrote:
<snip original post>


1) Windows Task Manager is not suited for this
2) vector only stores sFileData objects, not the strings themselves
3) Even when the vector has excess capacity (which is common, you don't want to
reallocate after each push_back) it won't include the strings
4) Many implementations of new[] allocate at least 16 bytes, plus
the overhead needed for delete[]
5) So what? 40MB is not a lot. Worry when it exceeds 1.5Gb. Memory
is cheap. Writing a custom string class is not. BTDT.
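
To put rough numbers on points 2-4, here is a back-of-envelope sketch. The
16-byte minimum block size and the 8-byte per-block header below are
assumptions that vary by allocator; treat the output as an order-of-magnitude
estimate, not a measurement.

/////////////////////////////////

#include <cstddef>
#include <iostream>

struct sFileData { char* s1; char* s2; int n1; int n2; };

// Round a requested size up to an assumed minimum heap block and add an
// assumed per-block header. Both numbers are guesses, not facts about any
// particular runtime.
std::size_t heapBlock(std::size_t request)
{
    const std::size_t minBlock = 16;   // assumed allocation granularity
    const std::size_t header = 8;      // assumed allocator bookkeeping
    return (request < minBlock ? minBlock : request) + header;
}

int main()
{
    const std::size_t records = 100000;
    const std::size_t perRecord =
        sizeof(sFileData)   // stored inside the vector itself
        + heapBlock(11)     // sSomeString1: up to 10 chars + '\0'
        + heapBlock(11);    // sSomeString2
    std::cout << "estimated minimum footprint: "
              << records * perRecord / (1024.0 * 1024.0) << " MB\n";
    // Add the vector's capacity slack (up to ~2x of 16*records) and any
    // debug-heap padding, and the observed tens of MB become plausible.
    return 0;
}
/////////////////////////////////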

HTH,
Michiel Salters

Jul 23 '05 #2
>
1) Windows Task Manager is not suited for this
yea, but it was what raised suspicion in the first place.
What might be better?
2) vector only stores sFileData objects, not the strings themselves
3) Even when the vector has excess capacity (which is common, you don't want to
reallocate after each push_back) it won't include the strings
4) Many implementations of new[] allocate at least 16 bytes, plus
the overhead needed for delete[]
Are you saying that std::string might actually be better in that case?
What might be a better way?
5) So what? 40MB is not a lot. Worry when it exceeds 1.5Gb. Memory
is cheap. Writing a custom string class is not. BTDT.
40 MB or 4 GB, there is still something not quite right, and I would prefer
to know what it is rather than brushing it under the rug.

HTH,
Michiel Salters

Jul 23 '05 #3
> > 1) Windows Task Manager is not suited for this

yea, but it was what raised suspicion in the first place.
What might be better?


What suspicion? It only tells you that the total program is now using 40 MB,
not that this particular part of your program is using 40 MB.
And even if it is a change in the total amount of memory used, there could be
a near-infinite number of other reasons why the program is now using
40 MB instead of the expected 5 MB increase.
You'd need a profiler to check whether it is indeed the vector of structs
that is the problem.

Jul 23 '05 #4

> 1) Windows Task Manager is not suited for this
yea, but it was what raised suspicion in the first place.
What might be better?


What suspicion?


That I was doing something wrong or that I did not understand something
else.
It only tells you that the total program is now using 40 MB,
not that this particular part of your program is using 40 MB.
And even if it is a change in the total amount of memory used, there could be
a near-infinite number of other reasons why the program is now using
40 MB instead of the expected 5 MB increase.
You'd need a profiler to check whether it is indeed the vector of structs
that is the problem.


I placed a breakpoint before reading the file, checked the memory, and another
one after reading the file.
I then compared the before and after.
The odds of it being something else are fairly small, IMO.

Simon
Jul 23 '05 #5
20 MB of file in memory is a good start towards explaining away the
discrepancy.
You need something better than two breakpoints and the task manager to
make statements about what uses the memory.

Jul 23 '05 #6
"simon" <sp********@myoddweb.com> wrote in message
news:3h************@individual.net
<snip original post>


I think your problem has nothing to do with the vector. As has already been
pointed out, the vector doesn't store the characters, only the pointer. With
VC++, sizeof(sFileData) is 16. The memory used by the vector should be
16*address_.capacity() plus a small amount of overhead, which we can
approximate with sizeof(address_). Try this:

#include <vector>
#include <iostream>
using namespace std;

struct sFileData
{
char*sSomeString1;
char*sSomeString2;
int iSomeNum1;
int iSomeNum2;
sFileData(){}
~sFileData(){}
sFileData(const sFileData&){}
const sFileData operator=( const sFileData &s ){ return *this;}
};

std::vector< sFileData, std::allocator<sFileData> > address_;
int main()
{
sFileData sfd;
for(int i=0; i<100000; ++i)
address_.push_back(sfd);
cout << "storage size of vector is approx ";
cout << sizeof(address_)+sizeof(sFileData)*address_.capacity() << endl;
return 0;
}

When I run this, I get

storage size of vector is approx 2212100

and task manager similarly shows about a 2 MB increase in memory usage.
Accordingly, it seems that the other 38Mb is due to whatever else you are
doing to allocate memory for the characters --- unless, as someone else
suggested, you are reading the whole file into memory and not taking that
into account.
--
John Carson

Jul 23 '05 #7
simon wrote:
1) Windows Task Manager is not suited for this

yea, but it was what raised suspicion in the first place.
What might be better?

2) vector only stores sFileData objects, not the strings themselves
3) Even when the vector has excess capacity (which is common, you don't want to
reallocate after each push_back) it won't include the strings
4) Many implementations of new[] allocate at least 16 bytes, plus
the overhead needed for delete[]

Are you saying that std::string might actually be better in that case?
What might be a better way?


No, std::string has the same problem. That isn't the cause.
5) So what? 40MB is not a lot. Worry when it exceeds 1.5Gb. Memory
is cheap. Writing a custom string class is not. BTDT.

40 MB or 4 GB, there is still something not quite right, and I would prefer
to know what it is rather than brushing it under the rug.


40 MB is about 42,000,000 bytes.
42,000,000 bytes / 100,000 records = 420 bytes/record.

Even if each new char* uses at least 16 bytes, one record should use only about
40 bytes.
Definitely I think you have a problem elsewhere, probably in your loop.
Jul 23 '05 #8
> 20 MB of file in memory is a good starter to explain away the
discrepancy.
I don't read the whole file into memory.

I use fopen(...), read each chunk of data using fread(...), and then close the
file using fclose(...).
You need something better than two breakpoints and the task manager to
make statements about what uses the memory.


Well I am sorry, but I can see that before I read the file I use x amount of
memory and that just after I finish reading the file (after the fclose), I
use x+40 MB.

Simon

Jul 23 '05 #9
>> Hi,

First some background.

I have a structure,
I think your problem has nothing to do with the vector. As has already
been pointed out, the vector doesn't store the characters, only the
pointer. With VC++, sizeof(sFileData) is 16. The memory used by the vector
should be 16*address_.capacity() plus a small amount of overhead, which we
can approximate with sizeof(address_). Try this:
<snip code >

When I run this, I get

storage size of vector is approx 2212100
I will try that, but I should get the same thing myself.

and task manager similarly shows about a 2 MB increase in memory usage.
Accordingly, it seems that the other 38Mb is due to whatever else you are
doing to allocate memory for the characters --- unless, as someone else
suggested, you are reading the whole file into memory and not taking that
into account.

I don't read the whole file into memory.

I use fopen(...), read each chunk of data using fread(...), and then close the
file using fclose(...).

Simon
Jul 23 '05 #10
>>

Are you saying that std::string might actually be better in that case?
What might be a better way?


No, std::string has the same problem. That isn't the cause.


That's what I thought.

40 MB or 4 GB, there is still something not quite right, and I would prefer
to know what it is rather than brushing it under the rug.


40 MB is about 42,000,000 bytes.
42,000,000 bytes / 100,000 records = 420 bytes/record.

Even if each new char* uses at least 16 bytes, one record should use only about
40 bytes.
Definitely I think you have a problem elsewhere, probably in your loop.


I cannot see where the problem might be.
I read each line of data, create a structure with the data,
and then push_back(...) the data.

Simon
Jul 23 '05 #11
simon wrote:
<snip original post>


Please provide a complete (compilable) code example that
demonstrates the problem. Include the complete struct def
for sFileData (ctors, dtors, operator=, etc) and a simple
main() that uses the SAME fopen/fread/push_back/fclose code
block/loop used in your real program (your problem may be there).

Regards,
Larry
Jul 23 '05 #12

Please provide a complete (compilable) code example that
demonstrates the problem. Include the complete struct def
for sFileData (ctors, dtors, operator=, etc) and a simple
main() that uses the SAME fopen/fread/push_back/fclose code
block/loop used in your real program (your problem may be there).

Regards,
Larry


If I run the code below I use 700 KB just before the for(...) loop.
At the end of the loop I have 16,880 KB.

So I used around 16 MB to store what appears to be

(5+8)*100000 = 800005; or just under 1 MB
Plus the size of the struct, around 2 MB. I should not use more than 4 MB to
store the data.

Did I miss something here?

/////////////////////////////////

#include <vector>
#include <iostream>
using namespace std;

struct sFileData
{
char*sSomeString1;
char*sSomeString2;
int iSomeNum1;
int iSomeNum2;
sFileData(){
NullAll();
}
~sFileData(){
CleanAll();
}
sFileData(const sFileData&sfd)
{
NullAll();
*this = sfd;
}
const sFileData& operator=( const sFileData &sfd ){
if( this != &sfd)
{
CleanAll();
iSomeNum1 = sfd.iSomeNum1;
iSomeNum2 = sfd.iSomeNum2;

if( sfd.sSomeString1 ){
sSomeString1 = new char[strlen(sfd.sSomeString1)+1];
strcpy( sSomeString1, sfd.sSomeString1 );
}
if( sfd.sSomeString2 ){
sSomeString2 = new char[strlen(sfd.sSomeString2)+1];
strcpy( sSomeString2, sfd.sSomeString2 );
}
}
return *this;
}

void CleanAll(){
if(sSomeString1) delete [] sSomeString1;
if(sSomeString2) delete [] sSomeString2;
}
void NullAll(){
sSomeString1 = 0;
sSomeString2 = 0;
iSomeNum1 = 0;
iSomeNum2 = 0;
}

};

std::vector< sFileData, std::allocator<sFileData> > address_;
int main()
{
for(int i=0; i<100000; ++i)
{
sFileData sfd;

sfd.iSomeNum1 = 1;
sfd.iSomeNum2 = 2;
sfd.sSomeString1 = new char[5];
sfd.sSomeString2 = new char[8];
strcpy( sfd.sSomeString1, "Helo" );
strcpy( sfd.sSomeString2, "Goodbye" );

address_.push_back(sfd);
}
return 0;
}
/////////////////////////////////

Thanks

Simon
Jul 23 '05 #13
> If I run the code below I use 700 KB just before the for(...) loop.
At the end of the loop I have 16,880 KB.

So I used around 16 MB to store what appears to be

(5+8)*100000 = 800005; or just under 1 MB
Plus the size of the struct, around 2 MB. I should not use more than 4 MB to
store the data.

Did I miss something here?


And it is dog slow as well.
There must be a better way.

Simon
Jul 23 '05 #14
On 2005-06-23 13:31:32 -0400, "simon" <sp********@myoddweb.com> said:

Please provide a complete (compilable) code example that
demonstrates the problem. Include the complete struct def
for sFileData (ctors, dtors, operator=, etc) and a simple
main() that uses the SAME fopen/fread/push_back/fclose code
block/loop used in your real program (your problem may be there).

Regards,
Larry


If I run the code below I use 700 KB just before the for(...) loop.
At the end of the loop I have 16,880 KB.

So I used around 16 MB to store what appears to be

(5+8)*100000 = 800005; or just under 1 MB
Plus the size of the struct, around 2 MB. I should not use more than 4 MB to
store the data.

Did I miss something here?


I suspect that your problem is that you're using the Windows Task
Manager to determine memory usage. If you use real debugging and memory
tracking tools, you're more likely to get accurate results.

[OT]For example, when I run your code on my computer (a Mac), and use
the debugging tools provided with the OS (MallocDebug), I see that the
code has allocated approximately 1.5 MB of memory.

However, if I look at the memory usage with other tools such as top, I
see that my program is taking about 41.7M total of VM. I strongly
suspect that this is analogous to what you are seeing on your
machine.[/OT]

Another possibility is that the implementation of the STL on your
platform doesn't return memory to the OS directly, but keeps it around
in case it's needed later. If this is the case, then every time the
vector reallocates itself, the old buffer *might* look to the OS as if
it were a leak. If this is happening, it would account for as much as
(assuming I did my estimations correctly) (16 * 100000 / 2 *
sizeof(sFileData)), or roughly 12MB.

Try again with proper tools (ask on a newsgroup appropriate to
developing/debugging on your platform, and/or check that newsgroup's FAQ
for such tools). Such tools should show you exactly what's happening.
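
If no dedicated memory profiler is at hand, one crude but portable trick is to
count allocations yourself by replacing the global operator new/new[]. This is
only a sketch (it assumes a C++11-or-later compiler, is not thread-safe, and
counts bytes requested rather than bytes the heap actually reserves), but it
separates what the program asks for from what the OS reports:

/////////////////////////////////

#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <new>

static std::size_t g_allocs = 0;   // number of allocations requested
static std::size_t g_bytes = 0;    // total bytes requested

void* operator new(std::size_t n)
{
    ++g_allocs; g_bytes += n;
    if (void* p = std::malloc(n)) return p;
    throw std::bad_alloc();
}
void* operator new[](std::size_t n)
{
    ++g_allocs; g_bytes += n;
    if (void* p = std::malloc(n)) return p;
    throw std::bad_alloc();
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete[](void* p) noexcept { std::free(p); }

int main()
{
    // ... run the vector-filling loop from this thread here ...
    std::printf("allocations: %lu, bytes requested: %lu\n",
                (unsigned long)g_allocs, (unsigned long)g_bytes);
    return 0;
}
/////////////////////////////////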

--
Clark S. Cox, III
cl*******@gmail.com

Jul 23 '05 #15
>>
If I run the code below I use 700 KB just before the for(...) loop.
At the end of the loop I have 16,880 KB.

So I used around 16 MB to store what appears to be

(5+8)*100000 = 800005; or just under 1 MB
Plus the size of the struct, around 2 MB. I should not use more than 4 MB to
store the data.

Did I miss something here?
I suspect that your problem is that you're using the Windows Task Manager
to determine memory usage. If you use real debugging and memory tracking
tools, you're more likely to get accurate results.


Maybe, but if Windows thinks I am using +16 MB then I must try and fix the
problem.
But I realize that this is not the right group for it.

The speed is a bit of a problem as well. I cannot believe how long it takes
to fill 4 MB.

Try again with proper tools (ask on a news group appropriate to
developing/debugging on your platform, and/or check that newsgroups FAQ
for such tools). Such tools should show you exactly what's happening.


I might do that then.

Simon
Jul 23 '05 #16

"simon" <sp********@myoddweb.com> wrote in message
news:3i************@individual.net...

Please provide a complete (compilable) code example that
demonstrates the problem. Include the complete struct def
for sFileData (ctors, dtors, operator=, etc) and a simple
main() that uses the SAME fopen/fread/push_back/fclose code
block/loop used in your real program (your problem may be there).

Regards,
Larry
If I run the code below I use 700 KB just before the for(...) loop.
At the end of the loop I have 16,880 KB.

So I used around 16 MB to store what appears to be

(5+8)*100000 = 800005; or just under 1 MB


??? 5+8 = 13, 13*100000 = 1300000 -> 1.3MB
Plus the size of the struct, around 2 MB. I should not use more than 4 MB to
store the data.

Did I miss something here?
Try adding "address_.reserve(100000);" just before your for loop.

Although I can't imagine why you're not using std::strings here to obviate
the manual memory management.
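
For illustration, here is a sketch of what the record could look like with
std::string members; it is not the poster's actual code, and the reserve()
call assumes the record count is known or estimated up front:

/////////////////////////////////

#include <string>
#include <vector>

// With std::string members, the compiler-generated copy constructor,
// copy assignment, and destructor are all correct, so no manual
// new[]/delete[] bookkeeping is needed.
struct sFileData
{
    std::string sSomeString1;
    std::string sSomeString2;
    int iSomeNum1;
    int iSomeNum2;
    sFileData() : iSomeNum1(0), iSomeNum2(0) {}
};

std::vector<sFileData> address_;

int main()
{
    address_.reserve(100000);   // avoid repeated reallocation while filling
    for (int i = 0; i < 100000; ++i)
    {
        sFileData sfd;
        sfd.iSomeNum1 = 1;
        sfd.iSomeNum2 = 2;
        sfd.sSomeString1 = "Helo";
        sfd.sSomeString2 = "Goodbye";
        address_.push_back(sfd);
    }
    return 0;
}
/////////////////////////////////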
<snip code>

Jul 23 '05 #17
>>
So I used around 16Mb to store what appears to be

(5+8)*100000 = 800005; or just under a Mb
??? 5+8 = 13, 13*100000 = 1300000 -> 1.3MB


Oops, my bad...


Try adding "address_.reserve(100000);" just before your for loop.
Sorry, that doesn't help, but loading is faster...

Although I can't imagine why you're not using std::string's here to
obviate the manual memory management.


Maybe, but if I replace the char* with std::string all over, I still use the
same amount of memory (16 MB).

Simon
Jul 23 '05 #18
simon wrote:
Please provide a complete (compilable) code example that
demonstrates the problem. Include the complete struct def
for sFileData (ctors, dtors, operator=, etc) and a simple
main() that uses the SAME fopen/fread/push_back/fclose code
block/loop used in your real program (your problem may be there).

Regards,
Larry
If I run the code below I use 700 KB just before the for(...) loop.
At the end of the loop I have 16,880 KB.

So I used around 16 MB to store what appears to be

(5+8)*100000 = 800005; or just under 1 MB
Plus the size of the struct, around 2 MB. I should not use more than 4 MB to
store the data.

Did I miss something here?

/////////////////////////////////

#include <vector>
#include <iostream>

// for strlen, strcpy
#include <string.h>

using namespace std;

struct sFileData
{
char*sSomeString1;
char*sSomeString2;
int iSomeNum1;
int iSomeNum2;
sFileData(){
NullAll();
}
~sFileData(){
CleanAll();
}
sFileData(const sFileData&sfd)
{
NullAll();
*this = sfd;
}
const sFileData& operator=( const sFileData &sfd ){
if( this != &sfd)
{
CleanAll();
iSomeNum1 = sfd.iSomeNum1;
iSomeNum2 = sfd.iSomeNum2;

if( sfd.sSomeString1 ){
sSomeString1 = new char[strlen(sfd.sSomeString1)+1];
strcpy( sSomeString1, sfd.sSomeString1 );
}
if( sfd.sSomeString2 ){
sSomeString2 = new char[strlen(sfd.sSomeString2)+1];
strcpy( sSomeString2, sfd.sSomeString2 );
}
}
return *this;
}
void CleanAll(){
if(sSomeString1) { delete [] sSomeString1; sSomeString1 = 0; }
if(sSomeString2) { delete [] sSomeString2; sSomeString2 = 0; }
}

void NullAll(){
sSomeString1 = 0;
sSomeString2 = 0;
iSomeNum1 = 0;
iSomeNum2 = 0;
}

};

std::vector< sFileData, std::allocator<sFileData> > address_;
int main()
{
for(int i=0; i<100000; ++i)
{
sFileData sfd;

sfd.iSomeNum1 = 1;
sfd.iSomeNum2 = 2;
sfd.sSomeString1 = new char[5];
sfd.sSomeString2 = new char[8];
strcpy( sfd.sSomeString1, "Helo" );
strcpy( sfd.sSomeString2, "Goodbye" );

address_.push_back(sfd);
// the push_back() (above) makes a COPY of sfd,
// and puts that copy into the vector.
// the copy constructor for sFileData allocates
// space to hold copies of the strings from the
// sFileData being copied from. So we must free
// the strings in 'sfd' after the push_back() (new
// copies were allocated by the copy of 'sfd' that
// was added to the vector), otherwise they will
// not be freed and we will have a memory leak until
// the program ends (i.e. 100000 extra copies of
// sSomeString1 and sSomeString2 will exist).
sfd.CleanAll();
}
return 0;
}
/////////////////////////////////

Thanks

Simon


See the comments and changes embedded in the code above.

Test 1:
On my machine (Gateway PII 450MHZ with 384MB of RAM),
adding some 'cout' and 'clock()' statements to the above code
(which I named simon.cpp), and compiling with GCC g++ v3.3.5,
I got this:

larry@linux:~/x> ./simon

sizeof sFileData = 16
MINIMUM memory used per sFileData = 29 (16 + 5 + 8)
MINIMUM memory used for 100000 sFileData instances = 2,900,000
press any alpha key followed by Enter to start
v
execution time = 0.76 secs
press any alpha key followed by Enter to finish
v

The working-memory set for ./simon was 5,616KB

On most operating systems, memory allocated by malloc or new, then
freed with free or delete is not returned to the operating system until
the program terminates. This memory remains in the program's heap
where it MAY be reused by malloc and new to fulfill additional requests
for memory. Large numbers of small allocations/deallocations tend
to fragment the heap, causing inefficient memory usage.
This is not an STL issue, as you'll see in the paragraph labeled
"Test 3" below, it is a dynamic memory allocation issue.

Test 2:
I modified the program to use std::string instead of allocating
char[] with new. The execution time dropped from 0.76 secs to
0.68 secs, but the memory usage increased by approx 400KB. The
400KB is a relatively fixed memory overhead due to additional STL
stuff brought in by std::string.

Test 3:
Next I modified the program to use fixed length char[] arrays for
the 2 strings (no usage of malloc or new at all). The execution time
dropped to 0.30 seconds and the memory usage dropped to 2624KB
(sizeof(sFileData) is now 24 because of the 2 fixed length char[]
arrays).
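
For reference, a sketch along the lines of that Test 3 variant is below. The
array sizes are assumptions based on the thread (the trimmed strings never
exceed 10 characters), so the resulting sizeof need not match the 24 reported
above:

/////////////////////////////////

#include <cstring>
#include <vector>

// Fixed-size inline buffers: no heap allocation per record at all, so the
// only dynamic memory is the vector's own contiguous buffer.
struct sFileData
{
    char sSomeString1[11];   // up to 10 chars + '\0' (assumed limit)
    char sSomeString2[11];
    int iSomeNum1;
    int iSomeNum2;
};

std::vector<sFileData> address_;

int main()
{
    address_.reserve(100000);
    for (int i = 0; i < 100000; ++i)
    {
        sFileData sfd;
        std::strcpy(sfd.sSomeString1, "Helo");
        std::strcpy(sfd.sSomeString2, "Goodbye");
        sfd.iSomeNum1 = 1;
        sfd.iSomeNum2 = 2;
        address_.push_back(sfd);
    }
    return 0;
}
/////////////////////////////////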

The above experiments help demonstrate the overhead incurred
when large numbers of dynamic allocations and deallocations
are made using malloc, new, free, and delete.

Regards,
Larry
Jul 23 '05 #19
>>
/////////////////////////////////

#include <vector>
#include <iostream>

// for strlen, strcpy
#include <string.h>


I did not need that; maybe it's a Windows thing.
using namespace std;
void CleanAll(){
if(sSomeString1) { delete [] sSomeString1; sSomeString1 = 0; }
if(sSomeString2) { delete [] sSomeString2; sSomeString2 = 0; }
}
OK, but that's just 'good' practice, not really related to my problem.


// the push_back() (above) makes a COPY of sfd,
// and puts that copy into the vector.
// the copy constructor for sFileData allocates
// space to hold copies of the strings from the
// sFileData being copied from. So we must free
// the strings in 'sfd' after the push_back() (new
// copies were allocated by the copy of 'sfd' that
// was added to the vector), otherwise they will
// not be freed and we will have a memory leak until
// the program ends (i.e. 100000 extra copies of
// sSomeString1 and sSomeString2 will exist).
sfd.CleanAll();
Are you sure the destructor would not handle it?
See the comments and changes embedded in the code above.

Test 1:
On my machine (Gateway PII 450MHZ with 384MB of RAM),
adding some 'cout' and 'clock()' statements to the above code
(which I named simon.cpp), and compiling with GCC g++ v3.3.5,
I got this:

larry@linux:~/x> ./simon

sizeof sFileData = 16
MINIMUM memory used per sFileData = 29 (16 + 5 + 8)
MINIMUM memory used for 100000 sFileData instances = 2,900,000
press any alpha key followed by Enter to start
v
execution time = 0.76 secs
press any alpha key followed by Enter to finish
v

The working-memory set for ./simon was 5,616KB
I get 16Mb

On most operating systems, memory allocated by malloc or new, then
freed with free or delete is not returned to the operating system until
the program terminates. This memory remains in the program's heap
where it MAY be reused by malloc and new to fulfill additional requests
from memory. Large numbers of small allocations/deallocations tend
to fragment the heap, causing inefficient memory usage.
This is not an STL issue, as you'll see in the paragraph labeled
"Test 3" below, it is a dynamic memory allocation issue.
I see, I will try and ask the MFC group to see how I can release the memory
to the system.

Test 2:
I modified the program to use std::string instead of allocating
char[] with new. The execution time dropped from 0.76 secs to
0.68 secs, but the memory usage increased by approx 400KB. The
400KB is a relatively fixed memory overhead due to additional STL
stuff brought in by std::string.
Yes, I also noticed that.

Test 3:
Next I modified the program to use fixed length char[] arrays for
the 2 strings (no usage of malloc or new at all). The execution time
dropped to 0.30 seconds and the memory usage dropped to 2624KB
(sizeof(sFileData) is now 24 because of the 2 fixed length char[]
arrays).
Also noticed it. A bit of a shame really, the memory just does not seem to be
handled properly at all.
By doing test 3 it appears that less memory is used.
I understand that the memory is free, but it is not released to other
applications, and on a smaller system that can impact performance.

I wish I knew how to release the memory.

The above experiments help demonstrate the overhead incurred
when large numbers of dynamic allocations and deallocations
are made using malloc, new, free, and delete.


Thanks,

Simon.
Jul 23 '05 #20
simon wrote:
/////////////////////////////////

#include <vector>
#include <iostream>
// for strlen, strcpy
#include <string.h>


I did not need that; maybe it's a Windows thing.

Your compiler made assumptions about strlen and strcpy.
You should always include the appropriate header for
functions used; you'll prevent a lot of subtle errors
that way.

using namespace std;
void CleanAll(){
if(sSomeString1) { delete [] sSomeString1; sSomeString1 = 0; }
if(sSomeString2) { delete [] sSomeString2; sSomeString2 = 0; }
}


OK, but that's just 'good' practice, not really related to my problem.

Good practice is always good practice. Sometimes failure to
follow good practice can lead to hard to trace bugs.


// the push_back() (above) makes a COPY of sfd,
// and puts that copy into the vector.
// the copy constructor for sFileData allocates
// space to hold copies of the strings from the
// sFileData being copied from. So we must free
// the strings in 'sfd' after the push_back() (new
// copies were allocated by the copy of 'sfd' that
// was added to the vector), otherwise they will
// not be freed and we will have a memory leak until
// the program ends (i.e. 100000 extra copies of
// sSomeString1 and sSomeString2 will exist).
sfd.CleanAll();
Are you sure the destructor would not handle it?

You may be right. I tend to forget that objects within
a loop get constructed/destructed on each loop iteration.
It's a hold-over from 'the really olden days' of C++; so I
always (sigh - yes still) do my own cleanup.

See the comments and changes embedded in the code above.

Test 1:
On my machine (Gateway PII 450MHZ with 384MB of RAM),
adding some 'cout' and 'clock()' statements to the above code
(which I named simon.cpp), and compiling with GCC g++ v3.3.5,
I got this:

larry@linux:~/x> ./simon

sizeof sFileData = 16
MINIMUM memory used per sFileData = 29 (16 + 5 + 8)
MINIMUM memory used for 100000 sFileData instances = 2,900,000
press any alpha key followed by Enter to start
v
execution time = 0.76 secs
press any alpha key followed by Enter to finish
v

The working-memory set for ./simon was 5,616KB
I get 16Mb


16MB is a lot for such a simple program...

I'm running Linux and you're running MS Windows, and
we're using different compilers & libs; so we can't
really compare - it's not apples-to-apples.
[snip]


Thanks,

Simon.


Regards,
Larry
Jul 23 '05 #21
simon wrote:

struct sFileData
{
char*sSomeString1;
char*sSomeString2;
int iSomeNum1;
int iSomeNum2;
sFileData(){ NullAll(); }
~sFileData(){ CleanAll(); }
sFileData(const sFileData&sfd) { NullAll(); *this = sfd; }
const sFileData& operator=( const sFileData &sfd ) {
if( this != &sfd)
{
CleanAll();
iSomeNum1 = sfd.iSomeNum1;
iSomeNum2 = sfd.iSomeNum2;

if( sfd.sSomeString1 ){
sSomeString1 = new char[strlen(sfd.sSomeString1)+1];
strcpy( sSomeString1, sfd.sSomeString1 );
}
if( sfd.sSomeString2 ){
sSomeString2 = new char[strlen(sfd.sSomeString2)+1];
strcpy( sSomeString2, sfd.sSomeString2 );
}
}
return *this;
}

void CleanAll(){
if(sSomeString1) delete [] sSomeString1;
if(sSomeString2) delete [] sSomeString2;
}
BTW, these if() tests are not needed, as delete[] is defined
to have no effect if the pointer is null.
void NullAll(){
sSomeString1 = 0;
sSomeString2 = 0;
iSomeNum1 = 0;
iSomeNum2 = 0;
}
std::vector< sFileData, std::allocator<sFileData> > address_;
It appears (from discussion elsewhere in the thread) that your
problem is excessive memory chewing by copying strings around
all over the place.

In an ideal world, your compiler would optimise to avoid this,
but it looks like we are in the real world :) I would suggest
trying some hand-optimisations in this case.
(Before you do this, try compiling in release mode instead of
debug mode, that may help).
sFileData sfd;
sfd.iSomeNum1 = 1;
sfd.iSomeNum2 = 2;
sfd.sSomeString1 = new char[5];
sfd.sSomeString2 = new char[8];
strcpy( sfd.sSomeString1, "Helo" );
strcpy( sfd.sSomeString2, "Goodbye" );
address_.push_back(sfd);


Try creating the object directly in the vector; then the strings
only have to be allocated once:

address_.resize( address_.size() + 1 );
sFileData &sfd = address_.end()[-1];
sfd.iSomeNum1 = 1;
sfd.iSomeNum2 = 2;
sfd.sSomeString1 = new char[5];
sfd.sSomeString2 = new char[8];
strcpy( sfd.sSomeString1, "Helo" );
strcpy( sfd.sSomeString2, "Goodbye" );

You could avoid vector copies (using lots of time and memory)
by reserving all of the memory at the start:

address_.reserve(100000);

In your real program, where you don't know the exact size in
advance, you might like to estimate this number from the file size,
or something. Or, preferably, use a deque or a list, which don't
require huge reallocations when you add new members.
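
A minimal sketch of the deque alternative, reusing the std::string version of
the record discussed earlier in the thread (names and values are illustrative):

/////////////////////////////////

#include <deque>
#include <string>

struct sFileData
{
    std::string sSomeString1;
    std::string sSomeString2;
    int iSomeNum1;
    int iSomeNum2;
};

int main()
{
    // A deque grows in fixed-size chunks, so appending never copies all of
    // the existing elements the way a reallocating vector does, and no
    // up-front reserve() estimate is needed.
    std::deque<sFileData> address_;
    for (int i = 0; i < 100000; ++i)
    {
        sFileData sfd;
        sfd.sSomeString1 = "Helo";
        sfd.sSomeString2 = "Goodbye";
        sfd.iSomeNum1 = 1;
        sfd.iSomeNum2 = 2;
        address_.push_back(sfd);
    }
    return 0;
}
/////////////////////////////////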

Another thing you might do (if the strings aren't going to be
manipulated too much by the rest of your program) is to allocate
them both at once (since your OS probably has a minimum allocation
size anyway, we will be halving memory usage):

sfd.sSomeString1 = new char[13];
sfd.sSomeString2 = sfd.sSomeString1 + 5;

(and modify your destructors and operator= accordingly).
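
Spelling out the "modify accordingly" part, here is one hedged sketch of how
the special members could look with that single-block scheme (the set() helper
and the exact offsets are illustrative, not the poster's code):

/////////////////////////////////

#include <cstring>

// Both strings live in one new[] block; only sSomeString1 owns it, and
// sSomeString2 just points into the same block.
struct sFileData
{
    char* sSomeString1;
    char* sSomeString2;
    int iSomeNum1;
    int iSomeNum2;

    sFileData() : sSomeString1(0), sSomeString2(0), iSomeNum1(0), iSomeNum2(0) {}

    // Illustrative helper: (re)build the shared block from two C strings.
    void set(const char* a, const char* b)
    {
        const std::size_t la = std::strlen(a) + 1;
        const std::size_t lb = std::strlen(b) + 1;
        char* block = new char[la + lb];   // one allocation for both strings
        std::strcpy(block, a);
        std::strcpy(block + la, b);
        delete [] sSomeString1;            // free the old block, if any
        sSomeString1 = block;
        sSomeString2 = block + la;
    }

    sFileData(const sFileData& o)
        : sSomeString1(0), sSomeString2(0),
          iSomeNum1(o.iSomeNum1), iSomeNum2(o.iSomeNum2)
    {
        if (o.sSomeString1) set(o.sSomeString1, o.sSomeString2);
    }

    sFileData& operator=(const sFileData& o)
    {
        if (this != &o)
        {
            if (o.sSomeString1) set(o.sSomeString1, o.sSomeString2);
            else { delete [] sSomeString1; sSomeString1 = sSomeString2 = 0; }
            iSomeNum1 = o.iSomeNum1;
            iSomeNum2 = o.iSomeNum2;
        }
        return *this;
    }

    ~sFileData() { delete [] sSomeString1; }   // one delete frees both strings
};

int main()
{
    sFileData a;
    a.set("Helo", "Goodbye");
    sFileData b(a);    // copies both strings with a single allocation
    b = a;             // assignment frees the old block, no double delete
    return 0;
}
/////////////////////////////////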

Jul 23 '05 #22
simon wrote:

1) Windows Task Manager is not suited for this


yea, but it was what raised suspicion in the first place.
What might be better?
2) vector only stores sFileData objects, not the strings themselves
3) Even when the vector has excess capacity (which is common, you don't want to
reallocate after each push_back) it won't include the strings
4) Many implementations of new[] allocate at least 16 bytes, plus
the overhead needed for delete[]


Are you saying that std::string might actually be better in that case?
What might be a better way?


Some std::string implementations have a sizeof()==16 but need not use
the heap for short strings. That would definitely save memory, and it's also
faster. However, do check to see if replacing vector with deque helps.
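
A quick, hedged way to get a hint about whether a given library keeps short
strings like these out of the heap (the numbers printed vary by implementation
and prove nothing by themselves):

/////////////////////////////////

#include <iostream>
#include <string>

int main()
{
    // With a short-string optimization, a 10-character string (the longest
    // trimmed field in this thread) typically fits inside the object itself.
    std::string s("0123456789");
    std::cout << "sizeof(std::string) = " << sizeof(std::string) << "\n"
              << "capacity() of a 10-char string = " << s.capacity() << "\n";
    return 0;
}
/////////////////////////////////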
5) So what? 40MB is not a lot. Worry when it exceeds 1.5Gb. Memory
is cheap. Writing a custom string class is not. BTDT.


40 MB or 4 GB, there is still something not quite right, and I would prefer
to know what it is rather than brushing it under the rug.


Is it worth it? It's expensive to find out. You'd need a profiler, or
something like that. I once had to find out, because a colleague had
used char*, exceeded an implementation limit (2 GB) and therefore
had to break up his program into 6 stages, each taking 4 hours. Finding
out what was wrong, writing a custom replacement string and merging the
stages took me several weeks, but reduced runtime to 1 hour total.
Do you have the $$$ for that?

HTH,
Michiel Salters

Jul 23 '05 #23
> > 40 MB or 4 GB, there is still something not quite right, and I would prefer
to know what it is rather than brushing it under the rug.


(Disclaimer: As others already pointed out task manager is not the best
tool for memory profiling)

One thing you could do to see if your code is at fault is to read
the file, do everything you do in your real code, but don't actually
store the data in the vector. Then use the task manager to see how the
memory consumption behaves.

If you get the same (or almost the same) memory growth, then windows
uses up the memory for reading the file.

If you get significantly different results, then you can continue
hunting the cause in your code.

Just my 2 cents worth.

Csaba

Jul 23 '05 #24
>> > 40 MB or 4 GB, there is still something not quite right, and I would
> prefer
> to know what it is rather than brushing it under the rug.

(Disclaimer: As others already pointed out task manager is not the best
tool for memory profiling)


I only used it to notice there was a problem.
If Windows reports 16 MB being used then that must be close to the truth.

But I agree that it is not an exact science.

2. one thing you could do to see if your code is at fault is to read
the file, do everything you do in your real code, but don't actually
store the data in the vector. Then use the task manager to see how the
memory consumption behaves.
I tried that already; with no vector call the memory usage is negligible,
700 KB.
The speed is around 1 sec.

With the insert everything jumps to 16 MB and around 30 sec.

I posted a small piece of code in this thread that does not even use files.
It clearly shows that something that should not be more than 5 MB grows to
around 25 MB.

If you get significantly different results, then you can continue
hunting the cause in your code.
Just my 2 cents worth.


It looks like Windows does not see the need to free the memory.
Maybe if my machine were a bit closer to its memory limits it would not
happen.
I don't know how to release the memory, but it looks like an OS issue to me.

Thanks.

Simon
Jul 23 '05 #25
