Dear List,
It is not clear what the title means. :-) Here is the details:
I need to manage a big bunch of small objects, like
struct{
int a;
int b[size_undetermined];
}obj;
I do not want to use the usual int*, new, delete method since
1. a million new and delete would be slow
2. I need to access elements of b cross all objects. There might be
better ways to loop through obj[i].b[2].
3. the size of b is usually small so the overhead of an additional int *
is too big.
I am currently using vector<int>( num_objects*(1+size_of_b) ) to store
the objects and vector<int>::iterator to access elements. But the
problems are:
1. I would like to pass reference of obj to other functions but I only
have vector<int>::iterator, not vector<obj>::iter.
2. If I would like to expand obj (e.g. adding another variable), my code
has to be rewritten because of index changes.
Is there a better way to do it? Can I somehow create a dynamic type
struct{ int a, int b[size]; } and apply to the existing vector?
boost::memorypool seems to be a good idea but there is still a waste of
memory for using pointers.
Many thanks in advance.
Bo 11 1689
"Bo Peng" <bp***@rice.edu> wrote in message
news:cc**********@joe.rice.edu... Dear List,
It is not clear what the title means. :-) Here is the details:
I need to manage a big bunch of small objects, like
struct{
I'll give your type a name for use below
struct T {
int a; int b[size_undetermined];
If the size can not be determined at compile time,
then you can't use this syntax. You'll have to
use a pointer and dynamic allocation. However,
I suggest using a container here.
}obj;
I do not want to use the usual int*, new, delete method since 1. a million new and delete would be slow 2. I need to access elements of b cross all objects. There might be better ways to loop through obj[i].b[2]. 3. the size of b is usually small so the overhead of an additional int * is too big.
Well, an int is often the same size as a pointer, so there's not
that much overhead unless your array usually has only zero to three
elements or so. I am currently using vector<int>( num_objects*(1+size_of_b) )
Huh? Why? Why not simply:
vector<struct T>(num_objects);
to store the objects and vector<int>::iterator
You're not storing any object of your struct type, just ints.
Don't store an iterator in a container. There's no need.
to access elements. But the problems are:
1. I would like to pass reference of obj to other functions but I only have vector<int>::iterator, not vector<obj>::iter.
'obj' is not a type name, so you can't do that. But again,
why not:
vector<struct T>
2. If I would like to expand obj (e.g. adding another variable), my code has to be rewritten because of index changes.
Adding another member to your struct won't change any indexes,
it will only increase the size of your struct. Is there a better way to do it? Can I somehow create a dynamic type struct{ int a, int b[size]; }
struct T
{
int a;
std::vector<int> b;
T(std::vector<int>::size_type size) : b(size) {}
};
int main()
{
T cont(how_many);
return 0;
}
and apply to the existing vector? boost::memorypool seems to be a good idea but there is still a waste of memory for using pointers.
Many thanks in advance. Bo
-Mike
Hi, Mike,
Thanks for your reply. I certainly know all the grammar details and what
I showed was just some kind of psudo-code.
The core of the problem is that I will have a million or so small
objects with some fixed elements and an array. Most of the data is
actually unsigned char (not int) so a pointer is 4 times bigger than it.
Suppose the length of b is 8 and there will be 1000 objects. The actual
datasize is (4+8)*1000=12000. Compare:
solution 1: Most convenient.
The implmentation of vector in gcc uses three pointers.
struct T{
int a; // 4 bytes
vector<unsigned char> b; // 3 pointers + 8 real data = 20 bytes
};
vector<T> myarray(size) will use 1000 new and delete (unless STL uses
something else in their allocator) + 24*1000 memory usage (50% waste).
Solution 2: Medium difficulty.
struct T{
int a; // 4 bytes
unsigned char * b; // b = new unsigned char[8] , 12 bytes
}
vector<T> myarray(size) will use 1000 new and delete + 16*1000 memory
usage ( 25% waste )
Solution 3: Most difficult to control. Putting all data in a big block.
const int blocksize = sizeof(int)/sizeof(unsigned char) + 8;
vector<unsigned char> ab( blocksize*size)
2 new and delete + 12*1000 memory usage (0% waste)
Because of the nature of the program (memory hungry), I can not tolerate
50% memory waste. I dislike solution 2 since there are too many new and
deletes and I guess
for (int i=0; i< 1000; i++)
myarray[i].b[2] = 2 // de-reference two pointers each time.
is slower than that in solution 3.
for (int i=0; i<1000*blocksize; i+=blocksize)
ab[i] = 2;
However, managing a block of memory without data structure is really
painful so what I was asking is (almost without hope) a way to apply
data structure to a block of data. For example
T* myobj = reinterpret_cast<T*> (&ab[i]) ;
myobj->a = 5;
myobj->b[2] = 2; // or someway else
or even better
vector<T>::iterator it = (some kind of cast of) ab.begin();
Of course type T here can not be defined since the size of T can not be
pre-determined......
There are other solutions hanging around (user-defined fixed-size
vector, memory pool) but I have not find one with acceptable trade-off
between memory-usage, performance (less new/delete, quick data access)
and easy-to-use ...
Thanks.
Bo
"Bo Peng" <bp***@rice.edu> wrote: struct{ int a; int b[size_undetermined]; }obj;
Yuck. That's not a very good construct. I think the
following kind of thing will work much better for you:
// MyFile.cpp:
#include <iostream>
#include <vector>
using std::vector;
struct MyStruct
{
int a;
vector<int> b;
};
int main(void)
{
using std::cout;
using std::endl;
// InitSize is initial size of container of objects:
const int InitSize = 10;
vector<MyStruct> Objs (InitSize);
{ // Limit scope
MyStruct Splat; // Temporary object
Splat.a = 13;
Splat.b.push_back(22);
Splat.b.push_back(10);
Objs[0] = Splat;
} // Splat is uncreated here
{ // Limit scope
MyStruct Splat; // Temporary object
Splat.a = 97;
Splat.b.push_back(33);
Splat.b.push_back(84);
Splat.b.push_back(27);
Objs[1] = Splat;
} // Splat is uncreated here
vector<MyStruct>::size_type i; // Index to Objs
vector<int>::iterator j; // Iterator to elements of b
for (i = 0; i <2; ++i)
{
cout << "Object Number: " << i << endl;
cout << "Value of a: " << Objs[i].a << endl;
cout << "Elements of b:" << endl;
for (j = Objs[i].b.begin(); j != Objs[i].b.end(); ++j)
{
cout << (*j) << endl;
}
cout << endl << endl;
}
return 0;
}
Of course, you'll need to handle issues such as whether to
start with empty or pre-sized containers, and whether to use
vectors or some other containers. If you change sizes
or insert/delete elements often, then a list or deque may work
better. Depends on the details of your application.
Hope the above gives you some useful ideas.
--
Cheers,
Robbie Hatley
Tustin, CA, USA
email: lonewolfintj at pacbell dot net
web: home dot pacbell dot net slant earnur slant
----== Posted via Newsfeed.Com - Unlimited-Uncensored-Secure Usenet News==---- http://www.newsfeed.com The #1 Newsgroup Service in the World! >100,000 Newsgroups
---= 19 East/West-Coast Specialized Servers - Total Privacy via Encryption =---
* Bo Peng: Dear List,
It is not clear what the title means. :-) Here is the details:
I need to manage a big bunch of small objects, like
struct{ int a; int b[size_undetermined]; }obj;
I do not want to use the usual int*, new, delete method since 1. a million new and delete would be slow 2. I need to access elements of b cross all objects. There might be better ways to loop through obj[i].b[2]. 3. the size of b is usually small so the overhead of an additional int * is too big.
I think you should regard the above struct as an _external
representation_ (serialization) of an object, not as the object itself.
That means essentially using the FlyWeight pattern (Google).
I am currently using vector<int>( num_objects*(1+size_of_b) ) to store the objects and vector<int>::iterator to access elements.
Goodie.
But the problems are:
1. I would like to pass reference of obj to other functions but I only have vector<int>::iterator, not vector<obj>::iter.
Construct FlyWeight data from reference to data, pass that object
around.
2. If I would like to expand obj (e.g. adding another variable), my code has to be rewritten because of index changes.
Yes, that is a problem. Proper solution depends on access patterns.
For example, if you only use sequential access (forward and backward)
then the cursor gap technique (Google, if no ref ask again here) would
be ideal. Cursor gap also allows random read/write access but not for
insertion, which is what expanding data entails. If you need random
access with insertion then a B-tree (Google) seems to be indicated. But
this is not a C++ problem, so please do ask further questions in a more
appropriate groups such as e.g. [comp.programming].
By the way, nowadays "a million" is not very large. Even a "Hello,
world!" program of 512 byte (byte, not KiB) executable might use a few
MiB's of memory when loaded. Have you had a look at the small-object
allocator in Andrei's "Modern C++ Design"?
--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
On Fri, 02 Jul 2004 22:18:09 -0500, Bo Peng <bp***@rice.edu> wrote: Hi, Mike,
Thanks for your reply. I certainly know all the grammar details and what I showed was just some kind of psudo-code.
The core of the problem is that I will have a million or so small objects with some fixed elements and an array. Most of the data is actually unsigned char (not int) so a pointer is 4 times bigger than it. Suppose the length of b is 8 and there will be 1000 objects. The actual datasize is (4+8)*1000=12000. Compare:
solution 1: Most convenient.
The implmentation of vector in gcc uses three pointers.
struct T{ int a; // 4 bytes vector<unsigned char> b; // 3 pointers + 8 real data = 20 bytes };
vector<T> myarray(size) will use 1000 new and delete (unless STL uses something else in their allocator) + 24*1000 memory usage (50% waste).
Solution 2: Medium difficulty.
struct T{ int a; // 4 bytes unsigned char * b; // b = new unsigned char[8] , 12 bytes }
vector<T> myarray(size) will use 1000 new and delete + 16*1000 memory usage ( 25% waste )
Solution 3: Most difficult to control. Putting all data in a big block.
const int blocksize = sizeof(int)/sizeof(unsigned char) + 8; vector<unsigned char> ab( blocksize*size)
2 new and delete + 12*1000 memory usage (0% waste)
Because of the nature of the program (memory hungry), I can not tolerate 50% memory waste. I dislike solution 2 since there are too many new and deletes and I guess
for (int i=0; i< 1000; i++) myarray[i].b[2] = 2 // de-reference two pointers each time.
is slower than that in solution 3.
for (int i=0; i<1000*blocksize; i+=blocksize) ab[i] = 2;
However, managing a block of memory without data structure is really painful so what I was asking is (almost without hope) a way to apply data structure to a block of data. For example
T* myobj = reinterpret_cast<T*> (&ab[i]) ; myobj->a = 5; myobj->b[2] = 2; // or someway else
or even better vector<T>::iterator it = (some kind of cast of) ab.begin();
Of course type T here can not be defined since the size of T can not be pre-determined......
When I first read your post I assumed that the size of the block varied
between objects, but since you seem to be putting all these objects in a
vector that doesn't seem to be the case. Niether does it seem that
blocksize only varies each time you compile the code, because if it did
you would have a simple template based solution.
So I'm assuming that blocksize is a run time variable, but is fixed for
all the objects within a particular run. If so then the following
non-standard but highly portable code would seem to do what you want.
struct FakeObject
{
int a;
unsigned char b[1];
};
// allocate 1000 objects
size_t blocksize = 8;
size_t objsize = sizeof(int) + blocksize;
vector<unsigned char> ab(1000*objsize);
// get object 100 (say)
FakeObject* myobj = reinterpret_cast<FakeObject*>(&ab[100*objsize]);
myobj->b[2] = 1;
No doubt others can tell you the many ways in which this breaks the
C++ standard but hopefully you are more interested in the fact that it
works.
john So I'm assuming that blocksize is a run time variable, but is fixed for all the objects within a particular run.
Exactly. I should have been clearer about this.
if so then the following non-standard but highly portable code would seem to do what you want.
struct FakeObject { int a; unsigned char b[1]; };
// allocate 1000 objects size_t blocksize = 8; size_t objsize = sizeof(int) + blocksize; vector<unsigned char> ab(1000*objsize);
// get object 100 (say) FakeObject* myobj = reinterpret_cast<FakeObject*>(&ab[100*objsize]); myobj->b[2] = 1;
This indeed partially works. It would be better if myobj has the correct
size. I.e. myobj++ can work as expected.
So, how about the following code?
class ObjIterator{
unsigned char * ptr;
size_t objsize;
public:
// return FakeObject * for *ObjIterator
FakeObject* operator * (){
return reinterpret_cast<FakeObject*>(ptr);
}
// advance
ObjIterator& operator++(int){ ptr+= objsize; return *this; }
// and other ++, --(int), --, +=, -=, ==, <=, >=... chore.
}
for(ObjIterator it= ( a begin() function) ; it != ( a end() function); it++)
it->b[2] = 2;
I would however test
1. if this is much slower than
for(i=5; i<... ; i+=objsize)
ab[i] = 2;
With everything inline, they hopefully have similar performance.
2. if STL algorithms work as expected. I mean, if
ObjIterator it = ab.begin();
copy( it, it+3, destination)
will copy 3 blocks of data correctly.
About other replies:
I read something about flyweight pattern. It is a wonderful idea but
does not fit in my case since my objects have identities even orders.
Also, accessing an object through a key and a map container might be
very slow.
It seems that I am approaching a good solution. Thanks!
Bo
> This indeed partially works. It would be better if myobj has the correct size. I.e. myobj++ can work as expected.
So, how about the following code?
class ObjIterator{ unsigned char * ptr; size_t objsize; public: // return FakeObject * for *ObjIterator FakeObject* operator * (){ return reinterpret_cast<FakeObject*>(ptr); }
I think it should be this
FakeObject* operator->(){
return reinterpret_cast<FakeObject*>(ptr);
}
FakeObject& operator*(){
return *reinterpret_cast<FakeObject*>(ptr);
}
otherwise you get some ugly syntax when you use your iterator
(*i)->b[2] = 2;
instead of
i->b[2] = 2;
(*i).b[2] = 2; // advance ObjIterator& operator++(int){ ptr+= objsize; return *this; }
// and other ++, --(int), --, +=, -=, ==, <=, >=... chore. }
for(ObjIterator it= ( a begin() function) ; it != ( a end() function); it++) it->b[2] = 2;
I would however test
1. if this is much slower than
for(i=5; i<... ; i+=objsize) ab[i] = 2;
With everything inline, they hopefully have similar performance.
2. if STL algorithms work as expected. I mean, if
ObjIterator it = ab.begin(); copy( it, it+3, destination)
will copy 3 blocks of data correctly.
No, I don't think that will work. You have to make some compromises
somewhere.
john
John Harrison wrote: 2. if STL algorithms work as expected. I mean, if
ObjIterator it = ab.begin(); copy( it, it+3, destination)
will copy 3 blocks of data correctly.
No, I don't think that will work. You have to make some compromises somewhere.
I checked the implementation of copy, it uses something like
*it=*destination,
maybe I can create a correct copy constructor and/or operator= for
FakeObject.
// global, size of object
int objsize;
class FakeObj{
public:
int a;
int b[1];
FakeObj& operator=(FakeObj& rhs){
a = rhs.a;
memcpy(b, rhs.b, objsize);
return *this;
}
FakeObj(FakeObj& rhs){
a = rhs.a;
memcpy(b, rhs.b, objsize);
}
}
Again, no check for grammar.
Things are getting more and more complicated but this is no more work
than the copy constructor of Obj if we use unsigned char * and
new/delete. Anyway, fake is fake. FakeObject might still fail in some
way. :-(
Bo
On Sat, 03 Jul 2004 11:04:51 -0500, Bo Peng <bp***@rice.edu> wrote: John Harrison wrote:
2. if STL algorithms work as expected. I mean, if
ObjIterator it = ab.begin(); copy( it, it+3, destination)
will copy 3 blocks of data correctly. No, I don't think that will work. You have to make some compromises somewhere.
I checked the implementation of copy, it uses something like *it=*destination, maybe I can create a correct copy constructor and/or operator= for FakeObject.
// global, size of object int objsize;
class FakeObj{ public: int a; int b[1];
FakeObj& operator=(FakeObj& rhs){ a = rhs.a; memcpy(b, rhs.b, objsize); return *this; }
FakeObj(FakeObj& rhs){ a = rhs.a; memcpy(b, rhs.b, objsize); } }
Again, no check for grammar.
Things are getting more and more complicated but this is no more work than the copy constructor of Obj if we use unsigned char * and new/delete. Anyway, fake is fake. FakeObject might still fail in some way. :-(
Bo
That looks OK, but now the compromise is the global variable (and the fact
that we've left standard C++ far behind).
john
Bo Peng wrote: Hi, Mike,
Thanks for your reply. I certainly know all the grammar details and what I showed was just some kind of psudo-code.
The core of the problem is that I will have a million or so small objects with some fixed elements and an array.
Get yourself a copy of
Scott Meyers 'Effective C++'
In Item 10 he describes exactly what you seem to be after:
custom versions of operator new and delete.
In short: When new is called, it allocates not a single object but
a whole array of such objects and hands out pointers into the array,
when needed. Only when this pool of objects is used up, the customized
version of new walks through the hassle of system memory management
again by allocating another pool of objects.
--
Karl Heinz Buchegger kb******@gascad.at
Karl Heinz Buchegger wrote: Bo Peng wrote: Hi, Mike,
Thanks for your reply. I certainly know all the grammar details and what I showed was just some kind of psudo-code.
The core of the problem is that I will have a million or so small objects with some fixed elements and an array.
Get yourself a copy of Scott Meyers 'Effective C++'
In Item 10 he describes exactly what you seem to be after: custom versions of operator new and delete. In short: When new is called, it allocates not a single object but a whole array of such objects and hands out pointers into the array, when needed. Only when this pool of objects is used up, the customized version of new walks through the hassle of system memory management again by allocating another pool of objects.
Sorry. After reading the follo ups, it seems you are
heading for a different thing.
--
Karl Heinz Buchegger kb******@gascad.at This thread has been closed and replies have been disabled. Please start a new discussion. Similar topics
by: MRe |
last post by:
Hi,
I'm new to PHP programming (coming from C++); and all is well, except when
it comes to memory management. Particularly, with passing objects/arrays in
and out of functions. I've had so many...
|
by: Victor Nazarov |
last post by:
So seems to be slightly wrong group, but...
The question is: Is there any extentions for C or well known libraries
to implement heap managment, for example if I want to allocate a randome
data...
|
by: Dan Nilsen |
last post by:
Hi!
I'm writing a small piece of software that basically runs on an
embedded system with a Power-PC cpu. This runs on a stripped down
version of Linux - Busybox.
As I'm writing a piece of...
|
by: ira2402 |
last post by:
Hi All,
We are developing sw for a small embedded OS and we have limited
memory.
We are looking for algorithms, links, and articles about this.
The goal is efficient utilization of small amount...
|
by: craig |
last post by:
I am wondering if there are some best practices for determining a strategy
for using try/catch blocks within an application.
My current thoughts are:
1. The code the initiates any high-level...
|
by: Rob Nicholson |
last post by:
We're developing our first large scale ASP.NET web application and I'm a
little concerned over memory usage of aspnet_wp.exe on the development
server during testing. The application appears to use...
|
by: Rob Panosh |
last post by:
Hello,
I have a small VB.Net application that instantiates many objects (about
200), when I look at task manager the application is using about 50 meg.
Now if I minimize the window then restore...
|
by: trialproduct2004 |
last post by:
Hi all,
I am having slight confusion regarding memory management in .net.
Say suppose i have two application one is in C# and other is in
MFC(VC++).
Both of this application are using lots...
|
by: =?Utf-8?B?VzFsZDBuZTc0?= |
last post by:
When one architects a new project one of the first steps in the decision is
to decide on the layers. (In my opinion anyway)
One architecture that I have used before is to go solid OO and create...
|
by: Faith0G |
last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...
|
by: ryjfgjl |
last post by:
In our work, we often need to import Excel data into databases (such as MySQL, SQL Server, Oracle) for data analysis and processing. Usually, we use database tools like Navicat or the Excel import...
|
by: Charles Arthur |
last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
|
by: aa123db |
last post by:
Variable and constants
Use var or let for variables and const fror constants.
Var foo ='bar';
Let foo ='bar';const baz ='bar';
Functions
function $name$ ($parameters$) {
}
...
|
by: ryjfgjl |
last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
|
by: emmanuelkatto |
last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud.
Please let me know.
Thanks!
Emmanuel
|
by: BarryA |
last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
|
by: nemocccc |
last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
|
by: Sonnysonu |
last post by:
This is the data of csv file
1 2 3
1 2 3
1 2 3
1 2 3
2 3
2 3
3
the lengths should be different i have to store the data by column-wise with in the specific length.
suppose the i have to...
| |