Data Structure Issue

Alexander Adam

Hello folks,

I got a few question on some basic data structure stuff. Sorry if
those questions might sound too easy or such but after googling a lot
I didn't find a real answer to all those questions so appreciate any
help.

I need to keep a very efficient data-tree structure (parent, previous,
next, first child, last child) within the memory which can be filled
up with billions of records. A struct place into the tree (I am using
the tree implementation of kp) is looking like this:

struct Node {
unsigned char type;
unsigned char code;
??? data;
??? unsigned char* content;
}

Now the first question is this -- the *data* property can be of a
single byte property or taking a full unsigned int or even more..
what'd be the best way to always keep the smallest size in memory?
(E.g. when having only a byte value then allocate only one byte in
memory for this property).

The next question is if it makes sense to allocate and deallocate the
content property within the constructor/destructor of the struct but I
fear that a constructor and destructor will add another couple of
bytes to each record in the memory.

The next issue I have is this -- the implementation I am using is
returning an iterator object, a function looks like this:

iterator parent(...) {}

Now what is this return value all about? I mean, its not a pointer so
it should be only valid until it goes out of scope within the parent
caller, is that assumptation correct? What I need is to keep the
iterator during lifetime and assign it to my objects so what's the
best way to keep it valid all time, assuming the std::list or tree
(whatever, they're compatible) gets inserts and delete statements a
lot?

Also there's a point I don't really understand. Say if I have
something like

std::list<NodemyList;

then what's the real difference in accessing an item either by

Node& node = *myList.begin()

or by

Node node = *myList.begin() ?

I've always thought that the first one would just get the reference to
the original struct the iterator is pointing to and the second one
creates a copy of it in the heap? If so, the first method should be
more efficient is that correct, too?

Basically, I need to obtain a most perfomant tree-like data structure
keeping a struct like the one mentioned before. What'd be the best way
to go, assuming that I have a lot of inserts, deletes and traversions
and that I need the parent, prev sibling, next sibling and first
child / last child access by hand without any time loss anytime?

Hopefully someone would be so kind to answer those questions in
detail, I'd appreciate very much to gather as much experience as
possible in good C++ Development.

Thanks & warm regards
Alexander

Jul 4 '07 #1

Subscribe Post Reply

1829

Alexander Adam

Hi!

Sorry, there's one thing I've forgotten to ask -- how can it be
avoided to have an additional allocation/deallocation when adding a
new record to my std::list or whatever container I am using? I mean,
doing something like

Node myNode;
....
myList.push_back(myNode);

will lead to create a myNode var on the heap, then making a copy of it
for the list and finally let it go out of scope. This seems quite too
inefficient to me though so is there a better way around except
creating your own pointer to Node?

Thanks!
Alexander

Jul 4 '07 #2

=?ISO-8859-1?Q?Erik_Wikstr=F6m?=

On 2007-07-04 06:45, Alexander Adam wrote:

Hello folks,

I got a few question on some basic data structure stuff. Sorry if
those questions might sound too easy or such but after googling a lot
I didn't find a real answer to all those questions so appreciate any
help.

I need to keep a very efficient data-tree structure (parent, previous,
next, first child, last child) within the memory which can be filled
up with billions of records. A struct place into the tree (I am using
the tree implementation of kp) is looking like this:

struct Node {
unsigned char type;
unsigned char code;
??? data;
??? unsigned char* content;
}

Type should probably be an enum, perhaps code also.

Now the first question is this -- the *data* property can be of a
single byte property or taking a full unsigned int or even more..
what'd be the best way to always keep the smallest size in memory?
(E.g. when having only a byte value then allocate only one byte in
memory for this property).

If all the nodes in the tree will have the same size of data then you
can parameterise Node to that type:

template<typename T>
struct Node {
// ...
T data;
// ...
};

By the way, what's the difference between data and content?

The next question is if it makes sense to allocate and deallocate the
content property within the constructor/destructor of the struct but I
fear that a constructor and destructor will add another couple of
bytes to each record in the memory.

Adding a constructor/destructor might add a few extra bytes to the code,
but should not affect the runtime size of a Node-object.

The next issue I have is this -- the implementation I am using is
returning an iterator object, a function looks like this:

iterator parent(...) {}

Now what is this return value all about? I mean, its not a pointer so
it should be only valid until it goes out of scope within the parent
caller, is that assumptation correct? What I need is to keep the
iterator during lifetime and assign it to my objects so what's the
best way to keep it valid all time, assuming the std::list or tree
(whatever, they're compatible) gets inserts and delete statements a
lot?

I guess that the iterator is a standard iterator to either an object in
a list or a map (or such) in which case the iterator will be valid so
long as the Node is still in the list. For standard containers which are
node-based the validity of iterators is not affected by any operations
on the container unless the operation affects the node the iterator
refers to.

Also there's a point I don't really understand. Say if I have
something like

std::list<NodemyList;

then what's the real difference in accessing an item either by

Node& node = *myList.begin()

or by

Node node = *myList.begin() ?

I've always thought that the first one would just get the reference to
the original struct the iterator is pointing to and the second one
creates a copy of it in the heap? If so, the first method should be
more efficient is that correct, too?

Yes.

Basically, I need to obtain a most perfomant tree-like data structure
keeping a struct like the one mentioned before. What'd be the best way
to go, assuming that I have a lot of inserts, deletes and traversions
and that I need the parent, prev sibling, next sibling and first
child / last child access by hand without any time loss anytime?

I'd probably do something like this:

template<typename T>
struct Node {
NodeType type; // enum
NodeCode code; // enum
Node* parent;
Node* previous;
Node* next;
Node* left; // left child
Node* right; // right child
T data;
unsigned char* content; // ???
};

and store them in a std::list<Node*>, this way inserts and deletes are
very cheap. However you still have to create the objects using new and
delete them when you are done. Also don't forget to fix any pointers in
other nodes when inserting and deleting. You might also want to use a
specialised allocator for the nodes.

For some usages this scheme might not be possible to use, but I don't
know how you are going to use the tree so I can't give anything but
general advice. It might also be possible to get rid of the std::list,
and just have the Nodes.

--
Erik Wikström

Jul 4 '07 #3

=?ISO-8859-1?Q?Erik_Wikstr=F6m?=

On 2007-07-04 06:47, Alexander Adam wrote:

Hi!

Sorry, there's one thing I've forgotten to ask -- how can it be
avoided to have an additional allocation/deallocation when adding a
new record to my std::list or whatever container I am using? I mean,
doing something like

Node myNode;
...
myList.push_back(myNode);

will lead to create a myNode var on the heap, then making a copy of it
for the list and finally let it go out of scope. This seems quite too
inefficient to me though so is there a better way around except
creating your own pointer to Node?

Actually this first creates a Node-object on the stack, then a copy will
be made (probably on the heap) by the list. And then the object on the
stack will go out of scope and destroyed. To avoid this insert just
pointers:

Node* myNode = new Node();

myList.push_back(myNode);

This way only a pointer will be copied. There are some drawbacks with
this scheme though, and you might want to use some kind of smart pointer.

--
Erik Wikström

Jul 4 '07 #4

Jon Harrop

Alexander Adam wrote:

I need to keep a very efficient data-tree structure (parent, previous,
next, first child, last child) within the memory which can be filled
up with billions of records.

That's a graph. If you want memory efficiency, remove all of the non-tree
pointers.

--
Dr Jon D Harrop, Flying Frog Consultancy
The OCaml Journal
http://www.ffconsultancy.com/product...ournal/?usenet

Jul 4 '07 #5

by: Steve Jorgensen | last post by:

Hi folks, I'm posting this message because it's an issue I come up against relatively often, but I can't find any writings on the subject, and I haven't been able to figure out even what key...

Microsoft Access / VBA

Data Correlation

by: Les | last post by:

Hi can anyone help me, I have just graduated and have started working for a building developer which deals with a large quantity of drawings. As you can imagine these all need documenting...

Microsoft Access / VBA

initializing data at compile time

by: Bart Goeman | last post by:

Hi, I have a question about how to put redundant information in data structures, initialized at compile time. This is often necessary for performance reasons and can't be done at run time (data...

C / C++

fastest way to parse a file; Most efficient way to store the data?

by: hoopsho | last post by:

Hi Everyone, I am trying to write a program that does a few things very fast and with efficient use of memory... a) I need to parse a space-delimited file that is really large, upwards fo a...

C# / C Sharp

char data[0]

by: aarklon | last post by:

Hi all, this is a question which i saw in a book typedef struct mall_li_header_ { int refcnt; uchar pool; uchar flag; ushort magic_no; char data;

C / C++

Want form to show changing data. But it could be closed, or closing, during update.

by: Zytan | last post by:

I have public methods in a form. The main form calls them, to update that form's display. This form is like a real-time view of data that is changing. But, the form may not exist (it is...

C# / C Sharp

Global Data declaration : "error: initializer element is not constant"

by: hankypan1 | last post by:

Hi All, I need a tree data structure for my application. It is the non - cyclic simple tree where i can have any number of children node and each child can recursively become a sub tree like a...

C / C++

data persistence - best practice advice

by: mirandacascade | last post by:

Assume the following: 1) multi-user environment 2) when user opens app, want to run some code that retrieves some information specific to the user...retrieving this information is somewhat i/o...

Microsoft Access / VBA

char[4] data type in c for storing IPs - VB.NET equivalence

by: =?Utf-8?B?ZW1pdG9qbGV5ZXM=?= | last post by:

Hi everyone: i read from the documentation of a dll that there is a struct that uses char for storing an IP address. How can it be? and how come i can get to representate the same value in a...

Visual Basic .NET

Navigating the Data Structures and Algorithms (DSA)

by: BarryA | last post by:

What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

Algorithms / Advanced Math

Looking to do Android software development, any suggestions? Is flutter better?

by: nemocccc | last post by:

hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

General

Is that possible of reading the .csv file in column wise and the column have different lengths ?

by: Sonnysonu | last post by:

This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

C / C++

How to build RAID in BIOS?

by: Hystou | last post by:

There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

Computer Hardware

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...

General

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

C / C++

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

Online Marketing

Discussion: How does Zigbee compare with other wireless protocols in smart home applications?

by: tracyyun | last post by:

Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

General

AI Job Threat for Devs

by: agi2029 | last post by:

Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...

Career Advice

Data Structure Issue

Similar topics