By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
446,305 Members | 1,704 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 446,305 IT Pros & Developers. It's quick & easy.

Tree in C++

P: n/a
In scanning a drive and comparing file content, I need to remember
filepaths to every single file I encounter so I can take actions on
certain files later on.

Rather than having a huge list enumerating the complete filepath to
every file it seems the smarter way (faster, more memmory efficient) is
to model the filesystem treestructure in a abstract tree - and having
only the filenames & node pointer in an Array.

struct tree {
struct tree *parent;
struct tree *child;
CString filename;
};

struct file {
CString name;
struct tree *path;
};

Does this sound like a good idea? Specifically, can I utilize an
existing data structure? What about stack/heap issues and tradeoffs?

Not being awfully experienced with trees and pointers, why won't this run:

struct tree *root;
root = ( struct tree * ) malloc ( sizeof( struct tree ) );
root->filename = "Test";

---
Unhandled exception at 0x0041b07b in DubWiz.exe: 0xC0000005: Access
violation reading location 0xcdcdcdc1.
In "atlsimpstr.h", GetLength() method (VC++ 7.0 compiler using MFC/ATL)
---

Thanks in advance,
Casper
Jul 22 '05 #1
Share this Question
Share on Google+
7 Replies


P: n/a
Ian
Casper wrote:
In scanning a drive and comparing file content, I need to remember
filepaths to every single file I encounter so I can take actions on
certain files later on.

Rather than having a huge list enumerating the complete filepath to
every file it seems the smarter way (faster, more memmory efficient) is
to model the filesystem treestructure in a abstract tree - and having
only the filenames & node pointer in an Array.

struct tree {
struct tree *parent;
struct tree *child;
CString filename;
};
What's a CString?
struct file {
CString name;
struct tree *path;
};

Does this sound like a good idea? Specifically, can I utilize an
existing data structure? What about stack/heap issues and tradeoffs?

Not being awfully experienced with trees and pointers, why won't this run:

struct tree *root;
root = ( struct tree * ) malloc ( sizeof( struct tree ) );
root->filename = "Test";

Why use malloc?

tree* root = new tree;

would be better.

Ian
Jul 22 '05 #2

P: n/a
"Casper" <ca****@jbr.dk> wrote in message
news:eA********************@wagner.videotron.net.. .
In scanning a drive and comparing file content, I need to remember
filepaths to every single file I encounter so I can take actions on
certain files later on.

Rather than having a huge list enumerating the complete filepath to
every file it seems the smarter way (faster, more memmory efficient) is
to model the filesystem treestructure in a abstract tree - and having
only the filenames & node pointer in an Array.

struct tree {
struct tree *parent;
struct tree *child;
CString filename;
};

struct file {
CString name;
struct tree *path;
};
This doesn't make much sense to me. Perhaps you meant something like this:

struct dir_or_file {
// Note: "struct" not necessary for C++
dir_or_file *parent;
int numChildren;
// pointer to first element of array of size numChildren
dir_or_file *child;
CString name;

// maybe some flag here indicating whether or not it is a file or a
directory
};
Does this sound like a good idea? Specifically, can I utilize an
existing data structure? What about stack/heap issues and tradeoffs?

Not being awfully experienced with trees and pointers, why won't this run:

struct tree *root;
root = ( struct tree * ) malloc ( sizeof( struct tree ) );
root->filename = "Test";


Your mistake is that you used malloc. The object to which "root" points did
not have its constructor executed before you attempted to assign the value
"Test" to one of its members. It it important to execute constructors
before attempting to use the object, so you should have instead used "new":

root = new tree();

This will cause the (compiler-generated) constructor for the object to be
executed. Now, the object will be properly initialized and the assignment
should not cause any problems.

--
David Hilsee
Jul 22 '05 #3

P: n/a
David Hilsee wrote:
This doesn't make much sense to me. Perhaps you meant something like this:

struct dir_or_file {
// Note: "struct" not necessary for C++
dir_or_file *parent;
int numChildren;
// pointer to first element of array of size numChildren
dir_or_file *child;
CString name;

// maybe some flag here indicating whether or not it is a file or a
directory
};
Actually that would mix path and filename which is not optimum for my
problem - but I can understand why my proposed structures seem weird to
the uninvited.
Basically what I do is to collect ALL filenames and its size in a list.
The list uses insertion sort so that after scanning, I will be able to
determine which files are *exactly* the same size. When I have similar
sized files I do a binary compare (or CRC not sure yet) to verify they
are identical. When dublets are found I will delete all but one copy and
create hard links for the remaining copies. But to be able to delete
(and know where to point the hard link at) I need the path part of this
list item - and that's where the tree comes in - the list item holds a
pointer to a node in my directory tree.

Its all part pf a space saver experiment of mine.
Your mistake is that you used malloc. The object to which "root" points did
not have its constructor executed before you attempted to assign the value
"Test" to one of its members. It it important to execute constructors
before attempting to use the object, so you should have instead used "new":

root = new tree();

This will cause the (compiler-generated) constructor for the object to be
executed. Now, the object will be properly initialized and the assignment
should not cause any problems.


Indeed that did the trick. I will continue my experiment then, thanks
for the feedback! :)

Casper

Jul 22 '05 #4

P: n/a
"Casper" <ca****@jbr.dk> wrote in message
news:o0********************@wagner.videotron.net.. .
David Hilsee wrote:
This doesn't make much sense to me. Perhaps you meant something like this:
struct dir_or_file {
// Note: "struct" not necessary for C++
dir_or_file *parent;
int numChildren;
// pointer to first element of array of size numChildren
dir_or_file *child;
CString name;

// maybe some flag here indicating whether or not it is a file or a
directory
};


Actually that would mix path and filename which is not optimum for my
problem - but I can understand why my proposed structures seem weird to
the uninvited.
Basically what I do is to collect ALL filenames and its size in a list.
The list uses insertion sort so that after scanning, I will be able to
determine which files are *exactly* the same size. When I have similar
sized files I do a binary compare (or CRC not sure yet) to verify they
are identical. When dublets are found I will delete all but one copy and
create hard links for the remaining copies. But to be able to delete
(and know where to point the hard link at) I need the path part of this
list item - and that's where the tree comes in - the list item holds a
pointer to a node in my directory tree.

Its all part pf a space saver experiment of mine.
Your mistake is that you used malloc. The object to which "root" points did not have its constructor executed before you attempted to assign the value "Test" to one of its members. It it important to execute constructors
before attempting to use the object, so you should have instead used "new":
root = new tree();

This will cause the (compiler-generated) constructor for the object to be executed. Now, the object will be properly initialized and the assignment should not cause any problems.


Indeed that did the trick. I will continue my experiment then, thanks
for the feedback! :)

Casper


Why not store the file size information in the tree as well, and sort by
that? The performance should be much better than either a linked list or an
array (insertion sort); the tree should be reasonably well balanced.
Jul 22 '05 #5

P: n/a
Why not store the file size information in the tree as well, and sort by
that? The performance should be much better than either a linked list or an
array (insertion sort); the tree should be reasonably well balanced.

Sounds very very interesting! I am not 100% sure I understand how a
fixed/immutable nested tree structure can be sorted by a node data
member?! Would it work as a secondard access path to the structure? I
would love to see a prototype example of the struct you have in mind!

Sincerely,
Casper

Jul 22 '05 #6

P: n/a
"Bill Thompson" wrote:
Basically what I do is to collect ALL filenames and its size in a list.
The list uses insertion sort so that after scanning, I will be able to
determine which files are *exactly* the same size. When I have similar
sized files I do a binary compare (or CRC not sure yet) to verify they
are identical. When dublets are found I will delete all but one copy and
create hard links for the remaining copies. But to be able to delete
(and know where to point the hard link at) I need the path part of this
list item - and that's where the tree comes in - the list item holds a
pointer to a node in my directory tree.
Why not store the file size information in the tree as well, and sort by
that? The performance should be much better than either a linked list or

an array (insertion sort); the tree should be reasonably well balanced.


Sounds like a job for a priority queue keyed on the file size ...

No need for post-sorting afterwards (and the hard link could even be created
during the creation of the priority queue, whenever a duplicate key is
found). Needs to handle multiple entries with the same file size, though -
maybe a priority queue of linked lists ? Pointing into another (tree)
structure which stores the filenames in a memory effecient way ?

http://www.dinkumware.com/manuals/re...priority_queue

David Fisher
HSA Systems
Jul 22 '05 #7

P: n/a
"Casper" <ca****@jbr.dk> wrote in message
news:9a********************@wagner.videotron.net.. .
Why not store the file size information in the tree as well, and sort by
that? The performance should be much better than either a linked list or an array (insertion sort); the tree should be reasonably well balanced.

Sounds very very interesting! I am not 100% sure I understand how a
fixed/immutable nested tree structure can be sorted by a node data
member?! Would it work as a secondard access path to the structure? I
would love to see a prototype example of the struct you have in mind!

Sincerely,
Casper


My apologies for the poorly worded statement: when building the tree, use
the filesize as the sort field.

You can find C++ code for simple binary trees very easily on the internet.
Without looking too hard I found this link:
http://www.cs.bu.edu/teaching/cs112/...0/binary-tree/

The code would have to be modified to deal with duplicate file sizes.

It appears that you aren't particularly interested in sorting by the
filename. After building the tree, you can traverse it in order by file
size; as you traverse you can pull out sets of files with the same size and
deal with them as you see fit.

Another possibility is finding and using a database api. Or, easier still,
writing a program to translate the file system into a file that can be
imported into a database, processing the data, export a file back out, and
write another program to deal with the exported data.

As you've indicated, you could also build two trees, one sorted by name, the
other by size; all refering to the same bank of file data.

Jul 22 '05 #8

This discussion thread is closed

Replies have been disabled for this discussion.