
stl vector/memory question

Hi,

I'm working on a scheme to store several million records (Subdata), each of
which contains a variable number of sub-records (Celldata). My concern is that
my prototype uses a lot more memory than I'm expecting. To store 1.7 million
Subdatas that hold an average of 7 Celldatas each, I'm using 85 MB. That's
about 50 bytes/record, whereas the actual data is about 14 bytes/record (average).

Any insight is greatly appreciated!
Ken

The CliffsNotes version of the code is as follows:

#include <vector>

using std::vector;

class Celldata
{
public:
    short cellnum:10,
          time_stamp:6;

    Celldata(int c, int t) : cellnum(c), time_stamp(t) {}
};

class Subdata
{
public:
    vector<Celldata> cells;

    int add_cell(int cell, int time);
};

class Esn_global
{
public:
    vector<Subdata> sublists;
};

int
Subdata::add_cell(int cell, int time)
{
    Celldata newcell(cell, time);
    cells.push_back(newcell);
    return 1;
}

Esn_global g_esn;

int main()
{
    for (;;)
    {
        // - read a record containing a subscriber identifier and cell
        // - add the cell to the subscriber's list (if not already on the list)
    }
}
I've trimmed some of the code out; there is another structure that takes a subscriber's
identifier (phone number) and returns an index into g_esn.sublists that corresponds to that
user's data.
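For illustration only (this is not the actual structure, and it assumes the
identifier fits in a long), that lookup could be a std::map from phone number
to index:

#include <cstddef>
#include <map>

// Hypothetical sketch of the identifier-to-index lookup: maps a phone
// number to an index into g_esn.sublists, appending a new Subdata the
// first time a subscriber is seen.
std::map<long, std::size_t> sub_index;

std::size_t find_or_add(long phone)
{
    std::map<long, std::size_t>::iterator it = sub_index.find(phone);
    if (it != sub_index.end())
        return it->second;
    g_esn.sublists.push_back(Subdata());
    std::size_t idx = g_esn.sublists.size() - 1;
    sub_index[phone] = idx;
    return idx;
}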

I'm wondering if I should be creating new class instances with 'new' and then pushing
those onto my vectors?

I've also played around with growing the vectors by 2 entries when they are full, to prevent
them from doubling in size when they fill; this didn't seem to be a benefit....

I was pushing all the instances onto vectors to avoid any memory leaks, and I also thought it
would be better for daily audits of the data. Maybe this is my problem?

thanks

Jul 22 '05 #1
5 Replies


Kenneth W Del Signore <kw**@lucent.com> wrote in
news:41**************@lucent.com:
Hi,

I'm working on a scheme to store several millions of records
(Subdata),
each of which contains a variable number of sub records (Celldata).
My concern is that my prototype uses a lot more memory than I'm
expecting. To store 1.7 million Subdatas that have an average of 7
Celldatas, I'm using 85 MB. That's about 50 bytes/record, whereas
actual data is about 14 bytes/record (average).
[snip]
I've also played around with growing the vectors by 2 entries when
they are full, to prevent them from doubling in size when they fill;
this didn't seem to be a benefit....
Note that when you resize a vector, it is not required to have exactly the
capacity you specify; it may allocate more memory than you ask for. So after
a .resize(2), the vector may actually have allocated memory to hold 16.

Also remember that .reserve and .resize don't do the same thing: .reserve
only sets aside capacity, while .resize changes the number of elements....
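A quick illustration of the difference (the exact capacities are
implementation-dependent):

#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v;
    v.reserve(2);   // sets aside capacity; size is still 0
    std::cout << v.size() << ' ' << v.capacity() << '\n';   // e.g. "0 2"
    v.resize(2);    // size becomes 2 (elements are value-initialized)
    std::cout << v.size() << ' ' << v.capacity() << '\n';   // e.g. "2 2"
    v.push_back(0); // may reallocate; capacity typically grows geometrically
    std::cout << v.size() << ' ' << v.capacity() << '\n';   // e.g. "3 4"
    return 0;
}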
I was pushing all the instances onto vectors to avoid any memory
leaks, and I also thought it would be better for daily audits of the
data. Maybe this is my problem?


I suppose if you want stricter control, you might try std::list ?
Jul 22 '05 #2

P: n/a
Andre Kostur wrote:
I suppose if you want stricter control, you might try std::list ?


With 12 bytes per record, a list isn't about to solve the OP's memory
woes, except maybe with a little list unrolling. Dealing with all
operations on unrolled lists is complicated, but if there isn't an
unrolled list template out there, there should be.
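To make the idea concrete: an unrolled list is just a linked list of small
fixed-size arrays, so the per-node pointer overhead is amortized over N
elements instead of being paid per element. A bare-bones node sketch
(N is a tuning parameter, chosen purely for illustration):

// Node for an unrolled list: the next-pointer and count overhead is
// shared by up to N elements instead of being paid per element.
template <typename T, int N>
struct UnrolledNode
{
    T             elems[N];   // storage for up to N elements
    int           count;      // how many slots of elems are in use
    UnrolledNode* next;       // following node, or null at the tail

    UnrolledNode() : count(0), next(0) {}
};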
--
Derrick Coetzee
I grant this newsgroup posting into the public domain. I disclaim all
express or implied warranty and all liability. I am not a professional.
Jul 22 '05 #3

Kenneth W Del Signore wrote:

I'm working on a scheme to store several million records (Subdata), each of
which contains a variable number of sub-records (Celldata). My concern is that
my prototype uses a lot more memory than I'm expecting. To store 1.7 million
Subdatas that hold an average of 7 Celldatas each, I'm using 85 MB. That's
about 50 bytes/record, whereas the actual data is about 14 bytes/record (average).
[...]

class Celldata
{
public:
    short cellnum:10,
          time_stamp:6;

    Celldata(int, int);
};

class Subdata
{
public:
    vector<Celldata> cells;

    int add_cell(int cell, int time);
};

class Esn_global
{
public:
    vector<Subdata> sublists;
};

[...]

Let's assume pointer and integer sizes are 4 bytes for the sake of the calculations.

First, look at the dynamic part of
vector<Subdata> sublists;

You didn't account for
1.7 million * sizeof(Subdata)

where
sizeof(Subdata) is typically sizeof(vector<Celldata>), which
will possibly be 12 bytes (a vector needs to know where its reserved
storage begins and ends, as well as how many elements are actually in use).
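You can check that figure on your own implementation; the 12 bytes correspond
to a common three-pointer layout on a 32-bit platform:

#include <iostream>
#include <vector>

int main()
{
    // Often three pointers (begin, end, end-of-storage): 3 * 4 = 12
    // bytes on a 32-bit platform. The exact size is implementation-defined.
    std::cout << sizeof(std::vector<short>) << '\n';
    return 0;
}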

That's not the end of it yet. Unless you reserve the right number of
elements for sublists initially, the storage will need to be reallocated,
possibly multiple times. Reallocation of a large block may leave an
unused region not big enough for larger size allocation requests but
too big to be used up by all subsequently allocated small objects
combined. In addition, unless perhaps you know the element count
beforehand, the reserved storage of sublists will be larger than
the used part.
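If the final element count is known or can be estimated, a single up-front
reserve avoids that repeated reallocation (the figure below is just the
count from the original post):

// Assuming the Esn_global class from the original post:
Esn_global g_esn;
g_esn.sublists.reserve(1700000);   // one allocation instead of repeated regrowth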

Now, the dynamic part of vector<Celldata> cells:
Add the overhead of heap allocation performed by the vector allocator
to what Andre has mentioned.

All things combined, 50 bytes per Subdata looks reasonable to me.
All of the above, of course, are just guesses. I would first find out
which part of that is actually the most problematic, then optimize
accordingly.

For example, for vector<Celldata>, you /might/ want to use some sort
of a small object allocator (instead of the default one).

For vector<Subdata> sublists, you /might/ benefit from an allocator
using its own dedicated heap. Alternatively, you could implement and
use a different kind of container (which, e.g., would allocate its
elements in pages). It may also prove totally unnecessary depending
on the results of the profiling.
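One cheap per-vector measure worth profiling alongside those options: the
classic C++03 swap trick, which copies a vector into a right-sized temporary
and swaps the two, shedding excess capacity once a list has stopped growing
(sub here is a hypothetical Subdata& picked for illustration):

// Trim unused capacity from a cells vector that has finished growing.
std::vector<Celldata>(sub.cells).swap(sub.cells);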

Denis
Jul 22 '05 #4

"Denis Remezov" <fi***************@yahoo.removethis.ca> wrote in message
news:41***************@yahoo.removethis.ca...
For vector<Subdata> sublists, you /might/ benefit from an allocator
using its own dedicated heap. Alternatively, you could implement and
use a different kind of container (which, e.g., would allocate its
elements in pages).

Such a container exists in the standard library: it's called std::deque,
and offers pretty much the same interface as std::vector.
std::deque<Subdata> is likely to be a good alternative here.
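A drop-in sketch of that change (only the container type differs from the
original code):

#include <deque>

class Esn_global
{
public:
    // deque allocates elements in fixed-size pages rather than one
    // contiguous block, so it never copies everything when it grows.
    std::deque<Subdata> sublists;
};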

hth -Ivan
--
http://ivan.vecerina.com/contact/?subject=NG_POST <- e-mail contact form
Brainbench MVP for C++ <> http://www.brainbench.com

Jul 22 '05 #5

Ivan Vecerina wrote:

"Denis Remezov" <fi***************@yahoo.removethis.ca> wrote in message
news:41***************@yahoo.removethis.ca...
For vector<Subdata> sublists, you /might/ benefit from an allocator
using its own dedicated heap. Alternatively, you could implement and
use a different kind of container (which, e.g., would allocate its
elements in pages).

Such a container exists in the standard library: it's called std::deque,
and offers pretty much the same interface as std::vector.
std::deque<Subdata> is likely to be a good alternative here.


Thanks for noting that. My only gripe is that the standard deque doesn't
provide any way of controlling the "page" (or "node") size, but that gripe
may well be moot: for many cases, including this one, the page size is
probably not critical at all, and the default may be good enough.

Denis
Jul 22 '05 #6
