
canonical way for handling raw data

P: n/a
Hi!

What's the canonical way of handling raw data? I want to read a file without
making any assumptions about its structure, store portions of it in memory,
and compare ranges with constant byte sequences. _I_ would read it
into arrays of unsigned char and use C's memcmp(), but as you can see I'm a
novice C++ programmer, and I suspect there's some better, commonly used
way.

Regards
lal
Jul 19 '05 #1
7 Replies


P: n/a
Matthias Czapla wrote:
Hi!

What's the canonical way of handling raw data? I want to read a file without
making any assumptions about its structure, store portions of it in memory,
and compare ranges with constant byte sequences. _I_ would read it
into arrays of unsigned char and use C's memcmp(), but as you can see I'm a
novice C++ programmer, and I suspect there's some better, commonly used
way.


I've seen all kinds of messes when handling raw data!

Before you start writing memcmp everywhere, ask yourself: what do
these "chunks of raw data" do?

Do you:
- concatenate them
- write to them
- convert them
- break them up into smaller chunks

... write a list of the operations you perform on them.

Sometimes a regular vector<char> is all you need, and sometimes
you need something a little fancier.

I tend to write code that avoids copying data, so I usually have a
"Buffer" class with which I can create chunks of raw data and
reference chunks within those chunks, and so on. The idea is that data is
not copied.
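
The "reference chunks within chunks" idea could be sketched roughly as follows. This is not the poster's actual Buffer class, which is not shown in the thread; the name ByteView and its members are hypothetical, and the view is assumed to be used only while the underlying storage is alive.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical non-owning view of a byte range. It never copies the data;
// it is only valid while the storage it points into still exists.
class ByteView
{
public:
    ByteView(const unsigned char* data, std::size_t size)
        : data_(data), size_(size) {}

    const unsigned char* data() const { return data_; }
    std::size_t size() const { return size_; }

    // Returns a view of at most `count` bytes starting at `offset`.
    // Out-of-range requests are clamped to the end of this view.
    ByteView sub(std::size_t offset, std::size_t count) const
    {
        if (offset > size_)
            offset = size_;
        if (count > size_ - offset)
            count = size_ - offset;
        return ByteView(data_ + offset, count);
    }

    // True if this view holds exactly the `n` bytes at `bytes`.
    bool equals(const unsigned char* bytes, std::size_t n) const
    {
        return n == size_ && (n == 0 || std::memcmp(data_, bytes, n) == 0);
    }

private:
    const unsigned char* data_;
    std::size_t size_;
};
```

Taking a sub-view costs two assignments regardless of the chunk size, which is the point of the design; the price is that the caller has to manage the lifetime of the underlying storage.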


Jul 19 '05 #2

P: n/a
Gianni Mariani wrote:
Matthias Czapla wrote:
Hi!

What's the canonical way of handling raw data? I want to read a file without
making any assumptions about its structure, store portions of it in memory,
and compare ranges with constant byte sequences. _I_ would read it
into arrays of unsigned char and use C's memcmp(), but as you can see I'm a
novice C++ programmer, and I suspect there's some better, commonly used
way.


I've seen all kinds of messes when handling raw data!

Before you start writing memcmp everywhere, ask yourself: what do
these "chunks of raw data" do?

Do you:
- concatenate them
- write to them
- convert them
- break them up into smaller chunks

... write a list of the operations you perform on them.


OK, I have an image file of a smartcard used in a digital camera, which was
accidentally deleted/formatted. I want to search this file for occurrences
of any of several byte sequences that indicate the start of a JPEG picture,
so I'm interested in the positions of these sequences in the file.

I already wrote a pure C program that seems to work well, but I'm currently
in the process of grokking C++ and want to reimplement the program the C++ way.
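
One way to express this search with the standard library is std::search over a vector of unsigned char. The sketch below is only an illustration: find_all and the Bytes typedef are hypothetical names, and it assumes the data has already been read into memory. A JPEG stream begins with the bytes FF D8 followed by another FF marker byte, which is the kind of pattern one would pass in.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

typedef std::vector<unsigned char> Bytes;

// Hypothetical helper: returns the offset of every occurrence of
// `pattern` in `data`, including overlapping occurrences.
std::vector<std::size_t> find_all(const Bytes& data, const Bytes& pattern)
{
    std::vector<std::size_t> offsets;
    if (pattern.empty())
        return offsets;  // an empty pattern would match everywhere

    Bytes::const_iterator it = data.begin();
    for (;;) {
        it = std::search(it, data.end(), pattern.begin(), pattern.end());
        if (it == data.end())
            break;
        offsets.push_back(static_cast<std::size_t>(it - data.begin()));
        ++it;  // resume one byte later so overlapping matches are found
    }
    return offsets;
}
```

To look for several signatures, the function can simply be called once per pattern; for very large images it would have to be applied chunk by chunk, taking care of matches that straddle a chunk boundary.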

Regards
lal
Jul 19 '05 #3

P: n/a
Matthias Czapla wrote:
Hi!

What's the canonical way of handling raw data? I want to read a file without
making any assumptions about its structure, store portions of it in memory,
and compare ranges with constant byte sequences. _I_ would read it
into arrays of unsigned char and use C's memcmp(), but as you can see I'm a
novice C++ programmer, and I suspect there's some better, commonly used
way.

Regards
lal


The method for handling raw unstructured data is to read it into a
buffer, then parse the buffer.

One process that I use is to have a class for each datum type and have
the classes provide "load from buffer" and "store to buffer"
methods. I then pass a pointer to the buffer and call the load
method of the class. The load method advances the buffer
pointer:
    class MyClass
    {
    public:
        void load_from_buffer(unsigned char * & buffer_pointer);
    private:
        unsigned short my_item;  // example type only; use the real type of my_item
    };

    void
    MyClass ::
    load_from_buffer(unsigned char * & buffer_pointer)
    {
        // The cast assumes the buffer is suitably aligned for my_item
        // and stores it in the host's byte order.
        my_item = *((unsigned short *) buffer_pointer);
        buffer_pointer += sizeof(my_item);
        // ...
    }

also:
    template <class AnyType>
    AnyType load_from_buffer(unsigned char * & buffer_pointer)
    {
        AnyType value = *((AnyType *) buffer_pointer);
        buffer_pointer += sizeof(AnyType);  // advance past the item just read
        return value;
    }
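
Dereferencing a cast pointer as in the code above only works when the buffer position happens to be suitably aligned for the target type. A common alternative, sketched here as a variation and not as the poster's method, is to copy the bytes with std::memcpy. The name load_via_memcpy is hypothetical; the sketch assumes the type is trivially copyable, that the bytes are in the host's byte order, and that the caller has checked that enough bytes remain.

```cpp
#include <cstring>

// Hypothetical variant of the load function: copies sizeof(T) bytes out of
// the buffer into a T and advances the pointer past them. Works at any
// alignment. Assumes T is trivially copyable and host byte order.
template <class T>
T load_via_memcpy(const unsigned char*& buffer_pointer)
{
    T value;
    std::memcpy(&value, buffer_pointer, sizeof(T));
    buffer_pointer += sizeof(T);  // advance past the item just read
    return value;
}
```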

--
Thomas Matthews

C++ newsgroup welcome message:
http://www.slack.net/~shiva/welcome.txt
C++ Faq: http://www.parashift.com/c++-faq-lite
C Faq: http://www.eskimo.com/~scs/c-faq/top.html
alt.comp.lang.learn.c-c++ faq:
http://www.raos.demon.uk/acllc-c++/faq.html
Other sites:
http://www.josuttis.com -- C++ STL Library book

Jul 19 '05 #4

P: n/a
Thomas Matthews wrote:
The method for handling raw unstructured data is to read it into a
buffer, then parse the buffer.

One process that I use is to have a class for each datum type and have
the classes provide "load from buffer" and "store to buffer"
methods. I then pass a pointer to the buffer and call the load
method of the class. The load method advances the buffer
pointer:
    class MyClass
    {
    public:
        void load_from_buffer(unsigned char * & buffer_pointer);
    private:
        unsigned short my_item;  // example type only; use the real type of my_item
    };

    void
    MyClass ::
    load_from_buffer(unsigned char * & buffer_pointer)
    {
        // The cast assumes the buffer is suitably aligned for my_item
        // and stores it in the host's byte order.
        my_item = *((unsigned short *) buffer_pointer);
        buffer_pointer += sizeof(my_item);
        // ...
    }

also:
    template <class AnyType>
    AnyType load_from_buffer(unsigned char * & buffer_pointer)
    {
        AnyType value = *((AnyType *) buffer_pointer);
        buffer_pointer += sizeof(AnyType);  // advance past the item just read
        return value;
    }


Thanks for your reply. I thought about using a separate class for I/O too.
The most important point for me in your explanation is the use of unsigned
char to hold the data. May I ask what the advantage of using
unsigned over signed char is? Do you agree with using std::ifstream::read() for
reading the data?
Jul 19 '05 #5

P: n/a
Matthias Czapla wrote:
Thomas Matthews wrote:
Thanks for your reply. I thought about using a separate class for I/O too.
The most important point for me in your explanation is the use of unsigned
char to hold the data. May I ask what the advantage of using
unsigned over signed char is? Do you agree with using std::ifstream::read() for
reading the data?


Unsigned char allows the use of all the bits, without any worries about
overflow or sign. I just want a simple 'byte', the smallest
addressable unit. Signed quantities have issues when it comes
to bit manipulation (such as shifting).
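
A small illustration of the sign issue, with hypothetical function names: when a byte is widened to int, an unsigned char keeps its value in the range 0..255, while a negative signed char stays negative. That matters when comparing bytes against constants such as 0xFF or when shifting bytes together.

```cpp
// Widening an unsigned char to int keeps the byte value in 0..255.
int widen_unsigned(unsigned char byte) { return byte; }

// Widening a signed char preserves its sign: the value -1 stays -1,
// so it would never compare equal to the constant 0xFF (255).
int widen_signed(signed char byte) { return byte; }

// Assembles a 16-bit value from two bytes, high byte first.
// With unsigned operands the shift and the OR cannot pick up stray sign bits.
unsigned int combine_bytes(unsigned char high, unsigned char low)
{
    return (static_cast<unsigned int>(high) << 8) | low;
}
```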

I guess it's just my style. You can find good discussions about
signed and unsigned integral types in this newsgroup and
our neighbor news:comp.lang.c++.

You can use ifstream::read() as long as the file is opened in
binary mode. Binary mode tells the stream library/platform
_NOT_ to perform any translations (such as newline conversion) on the data.

There are also claims that fread() is simpler and faster.
However, since developer time and quality are more important
than speed, go with ifstream::read().
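
A minimal sketch of reading raw bytes this way, assuming a hypothetical helper named read_chunk. It opens the file with std::ios::binary, reads into a vector of unsigned char, and uses gcount() to find out how many bytes actually arrived.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads up to `count` bytes starting at byte `offset`.
// The result is shorter than `count` if the file ends early, and empty if
// the file cannot be opened.
std::vector<unsigned char> read_chunk(const std::string& path,
                                      std::streamoff offset,
                                      std::size_t count)
{
    std::vector<unsigned char> buffer;
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    if (!in || count == 0)
        return buffer;

    buffer.resize(count);
    in.seekg(offset, std::ios::beg);
    // read() takes a char*, so the unsigned char storage needs a cast.
    in.read(reinterpret_cast<char*>(buffer.data()),
            static_cast<std::streamsize>(count));
    buffer.resize(static_cast<std::size_t>(in.gcount()));
    return buffer;
}
```

Using a vector here means there is no delete [] to forget if something goes wrong between the allocation and the end of the function.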

In my Binary_Stream class, I have a pure virtual function:
    virtual unsigned long size_on_stream() const = 0;
All classes that use the Binary_Stream interface must provide
the size that they occupy on the stream. This allows one to
query an object about the size of the data it requires in order
to allocate a buffer for reading:
    unsigned long buffer_size = my_msg.size_on_stream();
    unsigned char * buffer = new unsigned char[buffer_size];
    // read() takes a char *, so the unsigned char buffer needs a cast
    my_data_file.read(reinterpret_cast<char *>(buffer), buffer_size);
    unsigned char * buf_ptr(buffer);
    my_msg.load_from_buffer(buf_ptr);
    delete [] buffer;
One nice benefit is that objects can be written to and read
from a stream without the caller knowing any details about the object!

--
Thomas Matthews


Jul 19 '05 #6

P: n/a
Thomas Matthews wrote:
Matthias Czapla wrote:
Thomas Matthews wrote:

I guess it's just my style. You can find good discussions about
signed and unsigned integral types in this newsgroup and
our neighbor news:comp.lang.c++.


That should be news:comp.lang.c.

--
Thomas Matthews


Jul 19 '05 #7

P: n/a
Thomas Matthews wrote:
char to hold the data. May I ask what the advantage of using
unsigned over signed char is? Do you agree with using std::ifstream::read() for
reading the data?
Unsigned char allows the use of all the bits, without any worries about
overflow or sign. I just want a simple 'byte', the smallest
addressable unit. Signed quantities have issues when it comes
to bit manipulation (such as shifting).


I see.
I guess it's just my style. You can find good discussions about
signed and unsigned integral types in this newsgroup and
our neighbor news:comp.lang.c++.

You can use ifstream::read() as long as the file is opened in
binary mode. Binary mode tells the stream library/platform
_NOT_ to perform any translations (such as newline conversion) on the data.

I'll remember that.

There are also claims that fread() is simpler and faster.
However, since developer time and quality are more important
than speed, go with ifstream::read().

And as I stated elsewhere, I want to do it the "C++ way".

In my Binary_Stream class, I have a pure virtual function:
    virtual unsigned long size_on_stream() const = 0;
All classes that use the Binary_Stream interface must provide
the size that they occupy on the stream. This allows one to
query an object about the size of the data it requires in order
to allocate a buffer for reading:
    unsigned long buffer_size = my_msg.size_on_stream();
    unsigned char * buffer = new unsigned char[buffer_size];
    // read() takes a char *, so the unsigned char buffer needs a cast
    my_data_file.read(reinterpret_cast<char *>(buffer), buffer_size);
    unsigned char * buf_ptr(buffer);
    my_msg.load_from_buffer(buf_ptr);
    delete [] buffer;
One nice benefit is that objects can be written to and read
from a stream without the caller knowing any details about the object!


Very nice. That has given me an idea of how to approach this. It seems raw data
handling in C++ isn't too different from C's, and when I think about it that is
logical, since this is very low level. Thank you for your help.

Regards
lal
Jul 19 '05 #8
