473,383 Members | 1,846 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,383 software developers and data experts.

management of many small objects

Dear List,

It is not clear what the title means. :-) Here is the details:

I need to manage a big bunch of small objects, like

struct{
int a;
int b[size_undetermined];
}obj;

I do not want to use the usual int*, new, delete method since
1. a million new and delete would be slow
2. I need to access elements of b cross all objects. There might be
better ways to loop through obj[i].b[2].
3. the size of b is usually small so the overhead of an additional int *
is too big.

I am currently using vector<int>( num_objects*(1+size_of_b) ) to store
the objects and vector<int>::iterator to access elements. But the
problems are:

1. I would like to pass reference of obj to other functions but I only
have vector<int>::iterator, not vector<obj>::iter.
2. If I would like to expand obj (e.g. adding another variable), my code
has to be rewritten because of index changes.

Is there a better way to do it? Can I somehow create a dynamic type
struct{ int a, int b[size]; } and apply to the existing vector?
boost::memorypool seems to be a good idea but there is still a waste of
memory for using pointers.

Many thanks in advance.
Bo
Jul 22 '05 #1
11 1689

"Bo Peng" <bp***@rice.edu> wrote in message
news:cc**********@joe.rice.edu...
Dear List,

It is not clear what the title means. :-) Here is the details:

I need to manage a big bunch of small objects, like

struct{
I'll give your type a name for use below

struct T {
int a;
int b[size_undetermined];
If the size can not be determined at compile time,
then you can't use this syntax. You'll have to
use a pointer and dynamic allocation. However,
I suggest using a container here.
}obj;

I do not want to use the usual int*, new, delete method since
1. a million new and delete would be slow
2. I need to access elements of b cross all objects. There might be
better ways to loop through obj[i].b[2].
3. the size of b is usually small so the overhead of an additional int *
is too big.
Well, an int is often the same size as a pointer, so there's not
that much overhead unless your array usually has only zero to three
elements or so.

I am currently using vector<int>( num_objects*(1+size_of_b) )

Huh? Why? Why not simply:

vector<struct T>(num_objects);

to store
the objects and vector<int>::iterator
You're not storing any object of your struct type, just ints.

Don't store an iterator in a container. There's no need.
to access elements. But the
problems are:

1. I would like to pass reference of obj to other functions but I only
have vector<int>::iterator, not vector<obj>::iter.
'obj' is not a type name, so you can't do that. But again,
why not:

vector<struct T>

2. If I would like to expand obj (e.g. adding another variable), my code
has to be rewritten because of index changes.
Adding another member to your struct won't change any indexes,
it will only increase the size of your struct.

Is there a better way to do it? Can I somehow create a dynamic type
struct{ int a, int b[size]; }
struct T
{
int a;
std::vector<int> b;
T(std::vector<int>::size_type size) : b(size) {}
};

int main()
{
T cont(how_many);
return 0;
}
and apply to the existing vector?
boost::memorypool seems to be a good idea but there is still a waste of
memory for using pointers.

Many thanks in advance.
Bo


-Mike
Jul 22 '05 #2
Hi, Mike,

Thanks for your reply. I certainly know all the grammar details and what
I showed was just some kind of psudo-code.

The core of the problem is that I will have a million or so small
objects with some fixed elements and an array. Most of the data is
actually unsigned char (not int) so a pointer is 4 times bigger than it.
Suppose the length of b is 8 and there will be 1000 objects. The actual
datasize is (4+8)*1000=12000. Compare:

solution 1: Most convenient.

The implmentation of vector in gcc uses three pointers.

struct T{
int a; // 4 bytes
vector<unsigned char> b; // 3 pointers + 8 real data = 20 bytes
};

vector<T> myarray(size) will use 1000 new and delete (unless STL uses
something else in their allocator) + 24*1000 memory usage (50% waste).

Solution 2: Medium difficulty.

struct T{
int a; // 4 bytes
unsigned char * b; // b = new unsigned char[8] , 12 bytes
}

vector<T> myarray(size) will use 1000 new and delete + 16*1000 memory
usage ( 25% waste )

Solution 3: Most difficult to control. Putting all data in a big block.

const int blocksize = sizeof(int)/sizeof(unsigned char) + 8;
vector<unsigned char> ab( blocksize*size)

2 new and delete + 12*1000 memory usage (0% waste)

Because of the nature of the program (memory hungry), I can not tolerate
50% memory waste. I dislike solution 2 since there are too many new and
deletes and I guess

for (int i=0; i< 1000; i++)
myarray[i].b[2] = 2 // de-reference two pointers each time.

is slower than that in solution 3.

for (int i=0; i<1000*blocksize; i+=blocksize)
ab[i] = 2;
However, managing a block of memory without data structure is really
painful so what I was asking is (almost without hope) a way to apply
data structure to a block of data. For example

T* myobj = reinterpret_cast<T*> (&ab[i]) ;
myobj->a = 5;
myobj->b[2] = 2; // or someway else

or even better
vector<T>::iterator it = (some kind of cast of) ab.begin();

Of course type T here can not be defined since the size of T can not be
pre-determined......

There are other solutions hanging around (user-defined fixed-size
vector, memory pool) but I have not find one with acceptable trade-off
between memory-usage, performance (less new/delete, quick data access)
and easy-to-use ...

Thanks.

Bo
Jul 22 '05 #3
"Bo Peng" <bp***@rice.edu> wrote:
struct{
int a;
int b[size_undetermined];
}obj;


Yuck. That's not a very good construct. I think the
following kind of thing will work much better for you:

// MyFile.cpp:

#include <iostream>
#include <vector>

using std::vector;

struct MyStruct
{
int a;
vector<int> b;
};

int main(void)
{
using std::cout;
using std::endl;

// InitSize is initial size of container of objects:
const int InitSize = 10;

vector<MyStruct> Objs (InitSize);

{ // Limit scope
MyStruct Splat; // Temporary object
Splat.a = 13;
Splat.b.push_back(22);
Splat.b.push_back(10);
Objs[0] = Splat;
} // Splat is uncreated here

{ // Limit scope
MyStruct Splat; // Temporary object
Splat.a = 97;
Splat.b.push_back(33);
Splat.b.push_back(84);
Splat.b.push_back(27);
Objs[1] = Splat;
} // Splat is uncreated here

vector<MyStruct>::size_type i; // Index to Objs
vector<int>::iterator j; // Iterator to elements of b

for (i = 0; i <2; ++i)
{
cout << "Object Number: " << i << endl;
cout << "Value of a: " << Objs[i].a << endl;
cout << "Elements of b:" << endl;
for (j = Objs[i].b.begin(); j != Objs[i].b.end(); ++j)
{
cout << (*j) << endl;
}
cout << endl << endl;
}
return 0;
}

Of course, you'll need to handle issues such as whether to
start with empty or pre-sized containers, and whether to use
vectors or some other containers. If you change sizes
or insert/delete elements often, then a list or deque may work
better. Depends on the details of your application.

Hope the above gives you some useful ideas.

--
Cheers,
Robbie Hatley
Tustin, CA, USA
email: lonewolfintj at pacbell dot net
web: home dot pacbell dot net slant earnur slant


----== Posted via Newsfeed.Com - Unlimited-Uncensored-Secure Usenet News==----
http://www.newsfeed.com The #1 Newsgroup Service in the World! >100,000 Newsgroups
---= 19 East/West-Coast Specialized Servers - Total Privacy via Encryption =---
Jul 22 '05 #4
* Bo Peng:
Dear List,

It is not clear what the title means. :-) Here is the details:

I need to manage a big bunch of small objects, like

struct{
int a;
int b[size_undetermined];
}obj;

I do not want to use the usual int*, new, delete method since
1. a million new and delete would be slow
2. I need to access elements of b cross all objects. There might be
better ways to loop through obj[i].b[2].
3. the size of b is usually small so the overhead of an additional int *
is too big.
I think you should regard the above struct as an _external
representation_ (serialization) of an object, not as the object itself.

That means essentially using the FlyWeight pattern (Google).
I am currently using vector<int>( num_objects*(1+size_of_b) ) to store
the objects and vector<int>::iterator to access elements.
Goodie.

But the problems are:

1. I would like to pass reference of obj to other functions but I only
have vector<int>::iterator, not vector<obj>::iter.
Construct FlyWeight data from reference to data, pass that object
around.
2. If I would like to expand obj (e.g. adding another variable), my code
has to be rewritten because of index changes.


Yes, that is a problem. Proper solution depends on access patterns.
For example, if you only use sequential access (forward and backward)
then the cursor gap technique (Google, if no ref ask again here) would
be ideal. Cursor gap also allows random read/write access but not for
insertion, which is what expanding data entails. If you need random
access with insertion then a B-tree (Google) seems to be indicated. But
this is not a C++ problem, so please do ask further questions in a more
appropriate groups such as e.g. [comp.programming].

By the way, nowadays "a million" is not very large. Even a "Hello,
world!" program of 512 byte (byte, not KiB) executable might use a few
MiB's of memory when loaded. Have you had a look at the small-object
allocator in Andrei's "Modern C++ Design"?

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
Jul 22 '05 #5
On Fri, 02 Jul 2004 22:18:09 -0500, Bo Peng <bp***@rice.edu> wrote:
Hi, Mike,

Thanks for your reply. I certainly know all the grammar details and what
I showed was just some kind of psudo-code.

The core of the problem is that I will have a million or so small
objects with some fixed elements and an array. Most of the data is
actually unsigned char (not int) so a pointer is 4 times bigger than it.
Suppose the length of b is 8 and there will be 1000 objects. The actual
datasize is (4+8)*1000=12000. Compare:

solution 1: Most convenient.

The implmentation of vector in gcc uses three pointers.

struct T{
int a; // 4 bytes
vector<unsigned char> b; // 3 pointers + 8 real data = 20 bytes
};

vector<T> myarray(size) will use 1000 new and delete (unless STL uses
something else in their allocator) + 24*1000 memory usage (50% waste).

Solution 2: Medium difficulty.

struct T{
int a; // 4 bytes
unsigned char * b; // b = new unsigned char[8] , 12 bytes
}

vector<T> myarray(size) will use 1000 new and delete + 16*1000 memory
usage ( 25% waste )

Solution 3: Most difficult to control. Putting all data in a big block.

const int blocksize = sizeof(int)/sizeof(unsigned char) + 8;
vector<unsigned char> ab( blocksize*size)

2 new and delete + 12*1000 memory usage (0% waste)

Because of the nature of the program (memory hungry), I can not tolerate
50% memory waste. I dislike solution 2 since there are too many new and
deletes and I guess

for (int i=0; i< 1000; i++)
myarray[i].b[2] = 2 // de-reference two pointers each time.

is slower than that in solution 3.

for (int i=0; i<1000*blocksize; i+=blocksize)
ab[i] = 2;
However, managing a block of memory without data structure is really
painful so what I was asking is (almost without hope) a way to apply
data structure to a block of data. For example

T* myobj = reinterpret_cast<T*> (&ab[i]) ;
myobj->a = 5;
myobj->b[2] = 2; // or someway else

or even better
vector<T>::iterator it = (some kind of cast of) ab.begin();

Of course type T here can not be defined since the size of T can not be
pre-determined......


When I first read your post I assumed that the size of the block varied
between objects, but since you seem to be putting all these objects in a
vector that doesn't seem to be the case. Niether does it seem that
blocksize only varies each time you compile the code, because if it did
you would have a simple template based solution.

So I'm assuming that blocksize is a run time variable, but is fixed for
all the objects within a particular run. If so then the following
non-standard but highly portable code would seem to do what you want.

struct FakeObject
{
int a;
unsigned char b[1];
};

// allocate 1000 objects
size_t blocksize = 8;
size_t objsize = sizeof(int) + blocksize;
vector<unsigned char> ab(1000*objsize);

// get object 100 (say)
FakeObject* myobj = reinterpret_cast<FakeObject*>(&ab[100*objsize]);
myobj->b[2] = 1;

No doubt others can tell you the many ways in which this breaks the
C++ standard but hopefully you are more interested in the fact that it
works.

john
Jul 22 '05 #6
So I'm assuming that blocksize is a run time variable, but is fixed for
all the objects within a particular run.
Exactly. I should have been clearer about this.
if so then the following non-standard but highly portable code
would seem to do what you want.

struct FakeObject
{
int a;
unsigned char b[1];
};

// allocate 1000 objects
size_t blocksize = 8;
size_t objsize = sizeof(int) + blocksize;
vector<unsigned char> ab(1000*objsize);

// get object 100 (say)
FakeObject* myobj = reinterpret_cast<FakeObject*>(&ab[100*objsize]);
myobj->b[2] = 1;


This indeed partially works. It would be better if myobj has the correct
size. I.e. myobj++ can work as expected.

So, how about the following code?

class ObjIterator{
unsigned char * ptr;
size_t objsize;
public:
// return FakeObject * for *ObjIterator
FakeObject* operator * (){
return reinterpret_cast<FakeObject*>(ptr);
}

// advance
ObjIterator& operator++(int){ ptr+= objsize; return *this; }

// and other ++, --(int), --, +=, -=, ==, <=, >=... chore.
}

for(ObjIterator it= ( a begin() function) ; it != ( a end() function); it++)
it->b[2] = 2;

I would however test

1. if this is much slower than

for(i=5; i<... ; i+=objsize)
ab[i] = 2;

With everything inline, they hopefully have similar performance.

2. if STL algorithms work as expected. I mean, if

ObjIterator it = ab.begin();
copy( it, it+3, destination)

will copy 3 blocks of data correctly.
About other replies:

I read something about flyweight pattern. It is a wonderful idea but
does not fit in my case since my objects have identities even orders.
Also, accessing an object through a key and a map container might be
very slow.

It seems that I am approaching a good solution. Thanks!
Bo

Jul 22 '05 #7
>
This indeed partially works. It would be better if myobj has the correct
size. I.e. myobj++ can work as expected.

So, how about the following code?

class ObjIterator{
unsigned char * ptr;
size_t objsize;
public:
// return FakeObject * for *ObjIterator
FakeObject* operator * (){
return reinterpret_cast<FakeObject*>(ptr);
}
I think it should be this

FakeObject* operator->(){
return reinterpret_cast<FakeObject*>(ptr);
}
FakeObject& operator*(){
return *reinterpret_cast<FakeObject*>(ptr);
}

otherwise you get some ugly syntax when you use your iterator

(*i)->b[2] = 2;

instead of

i->b[2] = 2;
(*i).b[2] = 2;


// advance
ObjIterator& operator++(int){ ptr+= objsize; return *this; }

// and other ++, --(int), --, +=, -=, ==, <=, >=... chore.
}

for(ObjIterator it= ( a begin() function) ; it != ( a end() function);
it++)
it->b[2] = 2;

I would however test

1. if this is much slower than

for(i=5; i<... ; i+=objsize)
ab[i] = 2;

With everything inline, they hopefully have similar performance.

2. if STL algorithms work as expected. I mean, if

ObjIterator it = ab.begin();
copy( it, it+3, destination)

will copy 3 blocks of data correctly.


No, I don't think that will work. You have to make some compromises
somewhere.

john
Jul 22 '05 #8
John Harrison wrote:
2. if STL algorithms work as expected. I mean, if

ObjIterator it = ab.begin();
copy( it, it+3, destination)

will copy 3 blocks of data correctly.


No, I don't think that will work. You have to make some compromises
somewhere.


I checked the implementation of copy, it uses something like
*it=*destination,
maybe I can create a correct copy constructor and/or operator= for
FakeObject.

// global, size of object
int objsize;

class FakeObj{
public:
int a;
int b[1];

FakeObj& operator=(FakeObj& rhs){
a = rhs.a;
memcpy(b, rhs.b, objsize);
return *this;
}

FakeObj(FakeObj& rhs){
a = rhs.a;
memcpy(b, rhs.b, objsize);
}
}

Again, no check for grammar.

Things are getting more and more complicated but this is no more work
than the copy constructor of Obj if we use unsigned char * and
new/delete. Anyway, fake is fake. FakeObject might still fail in some
way. :-(

Bo
Jul 22 '05 #9
On Sat, 03 Jul 2004 11:04:51 -0500, Bo Peng <bp***@rice.edu> wrote:
John Harrison wrote:
2. if STL algorithms work as expected. I mean, if

ObjIterator it = ab.begin();
copy( it, it+3, destination)

will copy 3 blocks of data correctly.

No, I don't think that will work. You have to make some compromises
somewhere.


I checked the implementation of copy, it uses something like
*it=*destination,
maybe I can create a correct copy constructor and/or operator= for
FakeObject.

// global, size of object
int objsize;

class FakeObj{
public:
int a;
int b[1];

FakeObj& operator=(FakeObj& rhs){
a = rhs.a;
memcpy(b, rhs.b, objsize);
return *this;
}

FakeObj(FakeObj& rhs){
a = rhs.a;
memcpy(b, rhs.b, objsize);
}
}

Again, no check for grammar.

Things are getting more and more complicated but this is no more work
than the copy constructor of Obj if we use unsigned char * and
new/delete. Anyway, fake is fake. FakeObject might still fail in some
way. :-(

Bo


That looks OK, but now the compromise is the global variable (and the fact
that we've left standard C++ far behind).

john
Jul 22 '05 #10
Bo Peng wrote:

Hi, Mike,

Thanks for your reply. I certainly know all the grammar details and what
I showed was just some kind of psudo-code.

The core of the problem is that I will have a million or so small
objects with some fixed elements and an array.


Get yourself a copy of
Scott Meyers 'Effective C++'

In Item 10 he describes exactly what you seem to be after:
custom versions of operator new and delete.
In short: When new is called, it allocates not a single object but
a whole array of such objects and hands out pointers into the array,
when needed. Only when this pool of objects is used up, the customized
version of new walks through the hassle of system memory management
again by allocating another pool of objects.

--
Karl Heinz Buchegger
kb******@gascad.at
Jul 22 '05 #11
Karl Heinz Buchegger wrote:

Bo Peng wrote:

Hi, Mike,

Thanks for your reply. I certainly know all the grammar details and what
I showed was just some kind of psudo-code.

The core of the problem is that I will have a million or so small
objects with some fixed elements and an array.


Get yourself a copy of
Scott Meyers 'Effective C++'

In Item 10 he describes exactly what you seem to be after:
custom versions of operator new and delete.
In short: When new is called, it allocates not a single object but
a whole array of such objects and hands out pointers into the array,
when needed. Only when this pool of objects is used up, the customized
version of new walks through the hassle of system memory management
again by allocating another pool of objects.


Sorry. After reading the follo ups, it seems you are
heading for a different thing.

--
Karl Heinz Buchegger
kb******@gascad.at
Jul 22 '05 #12

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

7
by: MRe | last post by:
Hi, I'm new to PHP programming (coming from C++); and all is well, except when it comes to memory management. Particularly, with passing objects/arrays in and out of functions. I've had so many...
6
by: Victor Nazarov | last post by:
So seems to be slightly wrong group, but... The question is: Is there any extentions for C or well known libraries to implement heap managment, for example if I want to allocate a randome data...
7
by: Dan Nilsen | last post by:
Hi! I'm writing a small piece of software that basically runs on an embedded system with a Power-PC cpu. This runs on a stripped down version of Linux - Busybox. As I'm writing a piece of...
12
by: ira2402 | last post by:
Hi All, We are developing sw for a small embedded OS and we have limited memory. We are looking for algorithms, links, and articles about this. The goal is efficient utilization of small amount...
44
by: craig | last post by:
I am wondering if there are some best practices for determining a strategy for using try/catch blocks within an application. My current thoughts are: 1. The code the initiates any high-level...
1
by: Rob Nicholson | last post by:
We're developing our first large scale ASP.NET web application and I'm a little concerned over memory usage of aspnet_wp.exe on the development server during testing. The application appears to use...
2
by: Rob Panosh | last post by:
Hello, I have a small VB.Net application that instantiates many objects (about 200), when I look at task manager the application is using about 50 meg. Now if I minimize the window then restore...
1
by: trialproduct2004 | last post by:
Hi all, I am having slight confusion regarding memory management in .net. Say suppose i have two application one is in C# and other is in MFC(VC++). Both of this application are using lots...
4
by: =?Utf-8?B?VzFsZDBuZTc0?= | last post by:
When one architects a new project one of the first steps in the decision is to decide on the layers. (In my opinion anyway) One architecture that I have used before is to go solid OO and create...
0
by: Faith0G | last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...
0
by: ryjfgjl | last post by:
In our work, we often need to import Excel data into databases (such as MySQL, SQL Server, Oracle) for data analysis and processing. Usually, we use database tools like Navicat or the Excel import...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: aa123db | last post by:
Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.