printing binary data?

Steven T. Hatton

I'm trying to write a program like hexel. I guess I could fish out the
source for hexel and look at that, but for now I'm trying to figure out how
I can do it with std::stringstream and std::string. I had something
working with std::string. I simply treated it as an STL container, and
iterated over its elements. The results were a bit confusing to me. Some
of the stuff was printing out as 1- or 2-character hex numbers, as I
expected. Other characters were printing out in what looks to me to be
representative of a larger data size than a byte. For example:

00 04 00 fffffff1 ffffffff

I decided to try fetching the std::string::data() representation, and then
to use regular char pointers, but that didn't work as I naively expected.
For example, I was trying to add the size of the string to the pointer
returned from std::string::data(). The result was 0.

Is there a better approach to working with bytes of raw data than using
strings? I mean using tools from the Standard Library?

I'm thinking my problem is coming from the fact that the locale is set to
en_US.UTF-8, but I really don't know how that might impact the behavior of
std::string.

--
If our hypothesis is about anything and not about some one or more
particular things, then our deductions constitute mathematics. Thus
mathematics may be defined as the subject in which we never know what we
are talking about, nor whether what we are saying is true.-Bertrand Russell

Jul 26 '05 #1


red floyd


Make sure your data is unsigned (the leading 'f's are sign extension).
Also, if you need to, mask it with 0xff just in your output.

i.e.: instead of

os << *p;

use:

os << (*p & 0xff); // worst case scenario
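A minimal sketch of the effect described above, assuming plain char is signed and 8 bits wide (as it appears to be on the original poster's system). The helper name hex_of is made up for illustration and is not from the thread.

```cpp
#include <sstream>
#include <string>

// Format one char as hex, with or without the 0xff mask.
// Hypothetical helper, for illustration only.
std::string hex_of(char c, bool mask) {
    std::ostringstream os;
    os << std::hex;
    if (mask)
        os << (c & 0xff);           // promoted to int, then upper bits cleared
    else
        os << static_cast<int>(c);  // sign-extended when plain char is signed
    return os.str();
}
```

With a signed 8-bit char and a 32-bit int, hex_of('\xf1', false) gives "fffffff1", while hex_of('\xf1', true) gives "f1" regardless of the signedness of char.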

Jul 26 '05 #2

Stephen Howe

>> Other characters were printing out in what looks to me to be

>> representative of a larger data size than a byte. For example:
>>
>> 00 04 00 fffffff1 ffffffff

A pure guess: characters are being converted to signed ints and that is the
source of your 8-digit hex values.

>> Is there a better approach to working with bytes of raw data than using
>> strings? I mean using tools from the Standard Library?

We have no idea what you did as there is _NO_ example code
or an example what data output you wanted.

Stephen Howe

Jul 26 '05 #3

Steven T. Hatton

"Stephen Howe" <sjhoweATdialDOTpipexDOTcom> wrote:

>> Other characters were printing out in what looks to me to be
>> representative of a larger data size than a byte. For example:
>>
>> 00 04 00 fffffff1 ffffffff
>
> A pure guess: characters are being converted to signed ints and that is the
> source of your 8-digit hex values.

That was my supposition. The problem seems to be that std::string and
std::ifstream, etc., are using signed char; which is one of the more
annoying aspects of the C++ Standard.

I read the data using std::ifstream, then I used a std::ostringstream to
convert it to std::string.

>> Is there a better approach to working with bytes of raw data than using
>> strings? I mean using tools from the Standard Library?
>
> We have no idea what you did as there is _NO_ example code
> or an example what data output you wanted.

I thought it was fairly clear that I wanted two character hex
representations of each unit of data. I was asking if there were
components of the Standard Library better suited to working with data in
binary form. Perhaps something similar to Java's
java.io.ByteArrayInputStream:

http://java.sun.com/j2se/1.5.0/docs/...putStream.html

This stuff's pretty nice to work with:
http://java.sun.com/j2se/1.5.0/docs/...ocketImpl.html
http://java.sun.com/j2se/1.5.0/docs/...verSocket.html

Ironically, one of the primary design features which makes it so viable is
taken directly from TC++PL. Even the naming convention is the one
Stroustrup introduced. With C++ products, I often find myself spending
more time trying to second guess macros and understand the idiosyncrasies
of the particular implementation. It's a shame so few C++ programmers
really understand what I'm talking about. There's really not that much
wrong with C++, per se. The problems result from people failing to
understand how little things add up to big problems.

I started the code listed below based on a 280 line program that used all
kinds of typical C-style convolutions. There is a small bit of the
original functionality missing, but I can restore that with about five lines
of code. The solution I came up with for the negative char values is quite
obvious. I now understand that the 8-place hex values are the result of
converting a negative char to an unsigned int. Casting to int is necessary
because the implementation tries to print char data as characters, whereas
it prints ints as numbers.

#include <fstream>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

namespace hexlite {
  using namespace std;
  typedef string::const_iterator c_itr;

  // Print each char in [start, stop) as a number followed by a space.
  ostream& printline(c_itr start, c_itr stop, ostream& out) {
    while(start<stop) out<<setw(2)<<(128 + static_cast<int>(*start++))<<" ";
    return out;
  }

  ostream& dump(const string& dataString, ostream& out) {
    // Separate stream on the same buffer, so out's own format flags stay untouched.
    ostream hexout(out.rdbuf());
    hexout.setf(ios::hex, ios::basefield);
    hexout.fill('0');

    c_itr from (dataString.begin());
    c_itr dataEnd (from + dataString.size());
    c_itr end (dataEnd - (dataString.size()%16));

    // Full rows of 16 first, then whatever is left over.
    for(c_itr start = from; start < end; start += 16)
      printline(start, start + 16, hexout)<<endl;

    printline(end, dataEnd, hexout)<<endl;
    return out;
  }
}

int main(int argc, char* argv[]) {
  // argv[0] is the program name, so a file name requires argc >= 2.
  if (argc < 2) { std::cerr<<"enter a file name"<<std::endl; return -1; }

  // Binary mode avoids newline translation on platforms that perform it.
  std::ifstream inf(argv[1], std::ios::binary);
  if(inf) {
    std::ostringstream oss;
    oss << inf.rdbuf();
    hexlite::dump(oss.str(), std::cout);
    return 0;
  }
  std::cerr <<"\nCan't open file:"<<argv[1]<<std::endl;
  return -1;
}

Jul 27 '05 #4

Karl Heinz Buchegger

"Steven T. Hatton" wrote:

> Ironically, one of the primary design features which makes it so viable is
> taken directly from TC++PL. Even the naming convention is the one
> Stroustrup introduced. With C++ products, I often find myself spending
> more time trying to second guess macros and understand the idiosyncrasies
> of the particular implementation. It's a shame so few C++ programmers
> really understand what I'm talking about. There's really not that much
> wrong with C++, per se. The problems result from people failing to
> understand how little things add up to big problems.

The problem with people like you is that you continue to think that
your view of the world is the only correct one. If you would stop
doing that and instead start playing the game the C++ way, you would
have fewer problems.

You are simply using the wrong tools for your attempt. std::string
and stringstreams are not meant to be used for manipulating binary
data. If you want to do that, then e.g. std::vector< unsigned char >
is your tool.
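A rough sketch of what the std::vector< unsigned char > route could look like, using only the stream buffer for input so that no formatted extraction is involved. The names byte_vector, read_bytes and dump_hex are illustrative assumptions, not something from the thread or the Standard Library.

```cpp
#include <cstddef>
#include <iomanip>
#include <istream>
#include <iterator>
#include <ostream>
#include <vector>

typedef std::vector<unsigned char> byte_vector;  // illustrative name

// Slurp every remaining byte from the stream's buffer, bypassing operator>>.
byte_vector read_bytes(std::istream& in) {
    std::istreambuf_iterator<char> first(in), last;
    return byte_vector(first, last);  // each char converts to unsigned char
}

// Write the bytes as two-digit hex values, 16 per line.
void dump_hex(const byte_vector& data, std::ostream& out) {
    const std::ios_base::fmtflags saved_flags = out.flags();
    const char saved_fill = out.fill('0');
    out << std::hex;
    for (std::size_t i = 0; i < data.size(); ++i) {
        // The cast makes the stream print a number rather than a character.
        out << std::setw(2) << static_cast<unsigned int>(data[i]);
        out << ((i % 16 == 15 || i + 1 == data.size()) ? '\n' : ' ');
    }
    out.flags(saved_flags);  // leave the caller's formatting as it was
    out.fill(saved_fill);
}
```

Typical use would be to open a std::ifstream with std::ios::binary, pass it to read_bytes, and hand the result to dump_hex together with std::cout.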
--
Karl Heinz Buchegger
kb******@gascad.at

Jul 27 '05 #5

Tobias Blomkvist

Steven T. Hatton wrote:

> That was my supposition. The problem seems to be that std::string and
> std::ifstream, etc., are using signed char; which is one of the more
> annoying aspects of the C++ Standard.

Try

typedef std::basic_ifstream<unsigned char> uifstream;

Tobias
--
IMPORTANT: The contents of this email and attachments are confidential
and may be subject to legal privilege and/or protected by copyright.
Copying or communicating any part of it to others is prohibited and may
be unlawful.

Jul 27 '05 #6

Dietmar Kuehl

Tobias Blomkvist wrote:

> Try
>
> typedef std::basic_ifstream<unsigned char> uifstream;

It is definitely not that easy. To create stream objects operating
on a different character type than 'char' and 'wchar_t' you have
to do quite a lot of work although it is mostly relatively trivial.
However, I don't think you really need to do this at all because
you don't want all those formatting functions for binary data
anyway. The easiest approach for binary data is, IMO, to create a
"formatting" layer similar to the text formatting layer which
uses stream buffers underneath. In this context it is acceptable
that the stream buffer actually uses 'char' objects and to cast
them to 'unsigned char' where necessary.
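One possible reading of that suggestion, as a sketch only: a small class that sits on top of an ordinary std::streambuf and does its own "binary formatting". The class name hex_dumper and its interface are assumptions made for this example. It relies on sbumpc() returning the result of traits::to_int_type, which for char maps every byte to a non-negative value.

```cpp
#include <iomanip>
#include <ostream>
#include <streambuf>
#include <string>

// Hypothetical "binary formatting" layer over a plain char stream buffer.
class hex_dumper {
public:
    explicit hex_dumper(std::streambuf* source) : source_(source) {}

    // Dump the rest of the source buffer to 'out', 'per_line' bytes per row.
    void dump(std::ostream& out, unsigned per_line = 16) {
        typedef std::char_traits<char> traits;
        std::ostream fmt(out.rdbuf());  // private formatting state
        fmt << std::hex << std::setfill('0');
        unsigned column = 0;
        for (traits::int_type c = source_->sbumpc();
             !traits::eq_int_type(c, traits::eof());
             c = source_->sbumpc()) {
            // c is already in the range of unsigned char here.
            if (column != 0) fmt << ' ';
            fmt << std::setw(2) << static_cast<unsigned int>(c);
            if (++column == per_line) { fmt << '\n'; column = 0; }
        }
        if (column != 0) fmt << '\n';
    }

private:
    std::streambuf* source_;
};
```

The stream buffer keeps working in terms of char, and the conversion to an unsigned value happens in exactly one place, inside the layer.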
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
<http://www.eai-systems.com> - Efficient Artificial Intelligence

Jul 27 '05 #7

Steven T. Hatton

Dietmar Kuehl wrote:

> Tobias Blomkvist wrote:
>> Try
>>
>> typedef std::basic_ifstream<unsigned char> uifstream;
>
> It is definitely not that easy.

I had to think for a moment to determine if that had slipped past my edits.
It is exactly what appeared in my code at one point.

> To create stream objects operating
> on a different character type than 'char' and 'wchar_t' you have
> to do quite a lot of work although it is mostly relatively trivial.
> However, I don't think you really need to do this at all because
> you don't want all those formatting functions for binary data
> anyway. The easiest approach for binary data is, [see below] In this context it is acceptable
> that the stream buffer actually uses 'char' objects and to cast
> them to 'unsigned char' where necessary.

This is something that has me a bit confused. If I read in data using a
std::ifstream that has signed char as its character set, then cast it to
unsigned char, will that guarantee me that the content of the
representative storage locations faithfully represents the file? Is that
something I can rely on being portable?

Take the example of converting unsigned char to int. When the char is
negative, the int has, on my system, (if I understand correctly) the 128th
bit of the integer representation set. Therefore -127 would look like this:
1000000...00001111111. Now, if that were cast to unsigned char, we might
expect it to be truncated, rather than having the sign bit preserved.

One question becomes; where to cast? IOW, should I cast signed char to
unsigned char one byte at a time as I pull them out of the input buffer? I
might accomplish that by using a back inserter and copy from the istream
into std::vector<unsigned char>.

I'm currently working on creating a numeric type descriptor template that
will print a description of the numeric_limits class associated with a
numeric type. It looks like this in my edit buffer:

template<typename T, const char[] TypeName>
struct numeric_descriptor: public std::numeric_limits<T>{
static const std::string sc_typeName;
numeric_descriptor{

}
virtual std::ostream& print(std::ostream& out) const {

}
};
// Be aware that the above is purely scratch code, and not expected to be
// useful or even to compile.

> IMO, to create a
> "formatting" layer similar to the text formatting layer which
> uses stream buffers underneath. In this context it is acceptable
> that the stream buffer actually uses 'char' objects and to cast
> them to 'unsigned char' where necessary.

I'm rather surprised there isn't a byte (or 'octet') input stream in the
Standard Library. I mean to say a stream of unsigned integral type with a
guaranteed number of bits per unit of data. Perhaps that seemed too
trivial for the designers to consider.

I like the "formatting layer" suggestion. That could come in handy for lots
of representations. After thinking about this a bit more, I believe what I
should be doing is adding 256 only to the negative valued signed char
instances. The way I did things last night, 0 is represented as 0x80,
which is pretty silly.
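A small sketch comparing the "add 256 to the negative values" idea with a plain conversion to unsigned char, assuming 8-bit chars. Conversion to an unsigned type is defined by the language as reduction modulo 2^N, so the cast gives the same result without a branch. The function names are invented for this example.

```cpp
#include <climits>

// The "add 256 to negative values" approach (UCHAR_MAX + 1 is 256 for 8-bit chars).
inline unsigned int byte_by_adding(char c) {
    const int v = c;
    return static_cast<unsigned int>(v < 0 ? v + (UCHAR_MAX + 1) : v);
}

// The conversion the language already defines: reduce modulo 2^CHAR_BIT.
inline unsigned int byte_by_cast(char c) {
    return static_cast<unsigned char>(c);
}
```

Both functions map a char holding -15 to 0xf1 and leave 0 as 0, so 0 is no longer printed as 0x80.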

Jul 27 '05 #8

Steven T. Hatton

Karl Heinz Buchegger wrote:

"Steven T. Hatton" wrote:

>> Ironically, one of the primary design features which makes it so viable
>> is taken directly from TC++PL. Even the naming convention is the one
>> Stroustrup introduced. With C++ products, I often find myself spending
>> more time trying to second guess macros and understand the idiosyncrasies
>> of the particular implementation. It's a shame so few C++ programmers
>> really understand what I'm talking about. There's really not that much
>> wrong with C++, per se. The problems result from people failing to
>> understand how little things add up to big problems.
>
> The problem with people like you is that you continue to think that
> your view of the world is the only correct one.

I am not the subject of this newsgroup.

> If you would stop
> doing that and instead start playing the game the C++ way, you would
> have fewer problems.

The One True C++ Way[TM]? What I did was based on suggestions from
authoritative C++ experts - or, at least my understanding of such. That
is, using string to hold non-text data. I honestly wish more C++
programmers would do things the C++ way, not the "C with BCPL comment
syntax" way. Ironically, my original post in this thread specifically
asked if there was a better way to accomplish what I am attempting.

> You are simply using the wrong tools for your attempt. std::string
> and stringstreams are not meant to be used for manipulating binary
> data. If you want to do that, then e.g. std::vector< unsigned char >
> is your tool.

It's not quite that simple. Using std::vector<unsigned char> was one of the
options which crossed my mind, as was using the std::stringbuf inside of
std::ostringstream, rather than spitting it out as std::string.
std::ostringstream seemed to be the easiest way to get the contents of the
file into an in-memory object I could work on. I didn't have to mess with
allocators, or extractors. I'm still not convinced it's a bad idea to use
std::ostringstream to allocate the storage. I might be able to cast its
stringbuf to std::vector<unsigned char> in one step. It may also be
perfectly usable as-is. It does provide many ways of accessing the data.

Fortunately, the code I wrote is fairly generic, and follows STL
conventions, for the most part, so changing the data container should be
relatively easy. The biggest problem I was having is not due to the
underlying data type being signed char. It is due to the fact that
std::ostream derivatives try to print char data as characters rather than
Hindu-Arabic numeric characters. Having the data in a std::vector<unsigned
char> doesn't solve that problem.

Jul 27 '05 #9

Steven T. Hatton

Steven T. Hatton wrote:

> After thinking about this a bit more, I believe
> what I should be doing is adding 256 only to the negative valued signed char
> instances. The way I did things last night, 0 is represented as 0x80,
> which is pretty silly.

ostream& printline(c_itr start, c_itr stop, ostream& out) {
  while(start<stop) out
    <<setw(2)
    <<(static_cast<unsigned int>(static_cast<unsigned char>(*start++)))
    <<" ";
  return out;
}

Duh!

Jul 27 '05 #10

Old Wolf

Steven T. Hatton wrote:

"Stephen Howe" <sjhoweATdialDOTpipexDOTcom> wrote:
>>> Other characters were printing out in what looks to me to be
>>> representative of a larger data size than a byte. For example:
>>>
>>> 00 04 00 fffffff1 ffffffff
>>
>> A pure guess: characters are being converted to signed ints and that is the
>> source of your 8-digit hex values.
>
> That was my supposition. The problem seems to be that std::string and
> std::ifstream, etc., are using signed char; which is one of the more
> annoying aspects of the C++ Standard.

They use plain char. Most compilers have a switch that decides
whether plain char is signed or not. The standard allows plain
char to be unsigned.

Unfortunately there is so much existing code that would break
if plain char were unsigned, that it would be suicidal for a
compiler vendor to make that the default for IA32. We're
stuck with signed char for the foreseeable future.

> I read the data using std::ifstream, then I used a std::ostringstream to
> convert it to std::string.

Recall that streams are FORMATTERS. If you don't want to reformat
any data, do not use '>>' or '<<'.

>>> Is there a better approach to working with bytes of raw data than using
>>> strings? I mean using tools from the Standard Library?

std::vector<unsigned char> is well suited.
I write a helper function for appending one such buffer to another,
and then they are convenient to use as well.

> I thought it was fairly clear that I wanted two character hex
> representations of each unit of data. I was asking if there were
> components of the Standard Library better suited to working with data in
> binary form. Perhaps something similar to Java's
> java.io.ByteArrayInputStream:

To read raw data, use istream::get() and put it in a byte vector.

> Ironically, one of the primary design features which makes it so viable is
> taken directly from TC++PL. Even the naming convention is the one
> Stroustrup introduced. With C++ products, I often find myself spending
> more time trying to second guess macros and understand the idiosyncrasies
> of the particular implementation. It's a shame so few C++ programmers
> really understand what I'm talking about. There's really not that much
> wrong with C++, per se. The problems result from people failing to
> understand how little things add up to big problems.

A poor workman blames his tools.

> The solution I came up with for the negative char values is quite
> obvious. I now understand that the 8-place hex values are the result of
> converting a negative char to an unsigned int. Casting to int is necessary
> because the implementation tries to print char data as characters, whereas
> it prints ints as numbers.
>
> ostream& printline(c_itr start, c_itr stop, ostream& out) {
> while(start<stop) out<<setw(2)<<(128 + static_cast<int>(*start++))<<" ";
> return out;
> }

Firstly, the static_cast<int> is superfluous, because when you
add a char to an int (128 in this case), the char is converted
to int implicitly.

This seems a slightly bizarre solution, as you will print ' '
as 0xA0 instead of 0x20 etc., unless I'm missing something.
My preferred way would be:

out << int((unsigned char)*start++)

unless you have a wide screen and want to write out two
static_casts :)

Another way is:

out << (0xFFU & *start++)

which works in 2's complement (which is all known C++ systems).

Jul 27 '05 #11

Tobias Blomkvist

Steven T. Hatton wrote:

> Take the example of converting unsigned char to int. When the char is
> negative, the int has, on my system, (if I understand correctly) the 128th
> bit of the integer representation set. Therefore -127 would look like this:
> 1000000...00001111111. Now, if that were cast to unsigned char, we might
> expect it to be truncated, rather than having the sign bit preserved.

-127 = 0x81 = 10000001

sign extended to 4 byte int

11111111 11111111 11111111 10000001
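
The bit patterns above can be reproduced with std::bitset. This is a sketch assuming 8-bit chars, a 32-bit int and a C++11 compiler (for the argument-free to_string()); the function names are made up for the example.

```cpp
#include <bitset>
#include <climits>
#include <string>

// Bit pattern of a signed char value as stored in a single byte.
std::string bits_of_byte(signed char c) {
    return std::bitset<CHAR_BIT>(static_cast<unsigned char>(c)).to_string();
}

// Bit pattern of the same value after conversion to int (sign extension).
std::string bits_of_int(signed char c) {
    const int widened = c;
    return std::bitset<sizeof(int) * CHAR_BIT>(
               static_cast<unsigned int>(widened)).to_string();
}
```

Under those assumptions, bits_of_byte(-127) is "10000001" and bits_of_int(-127) is twenty-four 1 bits followed by "10000001".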

Tobias

Jul 27 '05 #12

Steven T. Hatton

Old Wolf wrote:

> Steven T. Hatton wrote:
>> That was my supposition. The problem seems to be that std::string and
>> std::ifstream, etc., are using signed char; which is one of the more
>> annoying aspects of the C++ Standard.
>
> They use plain char. Most compilers have a switch that decides
> whether plain char is signed or not. The standard allows plain
> char to be unsigned.

Exactly my point. I can, to some extent, appreciate why things are as they
are, but I have to wonder if people have not taken things to extremes.

One thing I have running around in the back of my mind is the idea of
formalizing the idea of an abstract execution host environment. But there
may still be issues of whose machine is closest to the abstraction, and
therefore, unfairly favored, etc. I found this interesting bit of Usenet
traffic in my SuSE 9.3 distro.

http://gcc.gnu.org/onlinedocs/libstd...eams_kuehl.txt
> Unfortunately there is so much existing code that would break
> if plain char were unsigned, that it would be suicidal for a
> compiler vendor to make that the default for IA32. We're
> stuck with signed char for the foreseeable future.

A byte-oriented, or, perhaps even larger, unsigned "raw data" stream "out of
the box" would be nice to have.

>> I read the data using std::ifstream, then I used a std::ostringstream to
>> convert it to std::string.
>
> Recall that streams are FORMATTERS.

But stream buffers aren't.

> If you don't want to reformat
> any data, do not use '>>' or '<<'.

This is one way to get the data without messing with the format:

std::ostringstream oss;
oss << somestream.rdbuf();
std::string somestring(oss.str());

I haven't tried to create a std::vector<unsigned char> directly from the
std::ostringstream::string_buf. It seems doable, but there may be a few
tricks involved.
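
One way this might be done without touching the stream buffer's internals is to go through the string that str() returns. This is only a sketch under that assumption, and bytes_via_string is a made-up name; it costs one extra copy of the data.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Copy a stream's contents into a byte vector by way of a std::string.
std::vector<unsigned char> bytes_via_string(std::istream& in) {
    std::ostringstream oss;
    oss << in.rdbuf();                // unformatted bulk copy of the buffer
    const std::string s = oss.str();  // snapshot of the collected chars
    // The iterator-range constructor converts each char to unsigned char.
    return std::vector<unsigned char>(s.begin(), s.end());
}
```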

> To read raw data, use istream::get() and put it in a byte vector.

What is not clear to me is whether there is a reliable (or, perhaps I should
say 'standardized') way to get the file size. Ideally, I want a way to
read a file regardless of its location, e.g., local hard drive, network,
etc. One advantage to the approach I've taken is that it works for the
current situation. I also have the ability to use both

>> Ironically, one of the primary design features which makes it so viable
>> is taken directly from TC++PL. Even the naming convention is the one
>> Stroustrup introduced. With C++ products, I often find myself spending
>> more time trying to second guess macros and understand the idiosyncrasies
>> of the particular implementation. It's a shame so few C++ programmers
>> really understand what I'm talking about. There's really not that much
>> wrong with C++, per se. The problems result from people failing to
>> understand how little things add up to big problems.
>
> A poor workman blames his tools.

Not completely sure what you mean here. There are a lot of people ready to
jump over to C++/CLI without a lot of hesitation.

"Stan Lippman's BLog C++/CLI"

http://blogs.msdn.com/slippman/

There _are_ problems with C++ code bases. There _are_ currently some
significant limitations to what can be done easily with C++. There are also
many examples of things which have evolved over the years to become
horrific big balls of mud

http://www.laputan.org/mud/mud.html#BigBallOfMud

> out << int((unsigned char)*start++)

Ha! That's basically what I ended up doing.

> unless you have a wide screen and want to write out two
> static_casts :)

I do have a wide screen, and I use it. However, I'm not sure if using the
static cast is of any real value. I suppose it's a way to document intent.

> Another way is:
>
> out << (0xFFU & *start++)
>
> which works in 2's complement (which is all known C++ systems).

OK. If I get that, you are basically converting to unsigned int by masking
the whole char. 0000...00011111111 & 10101010 == 000...000010101010. Unless
there's a performance gain to be had, I find that a bit too esoteric.

Jul 28 '05 #13

Steven T. Hatton

Tobias Blomkvist wrote:

> Steven T. Hatton wrote:
>> Take the example of converting unsigned char to int. When the char is
>> negative, the int has, on my system, (if I understand correctly) the
>> 128th bit of the integer representation set. Therefore -127 would look
>> like this:
>> 1000000...00001111111. Now, if that were cast to unsigned char, we might
>> expect it to be truncated, rather than having the sign bit preserved.
>
> -127 = 0x81 = 10000001
>
> sign extended to 4 byte int
>
> 11111111 11111111 11111111 10000001
>
> Tobias

Whoops!

Jul 28 '05 #14

pillbug

not to detract from the utility and type-safety of <string> and
<sstream>, but sometimes the old ways can present a clarity of
intention unrivaled by modern constructs:

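#include <fcntl.h>   // open, O_RDONLY
#include <stdio.h>   // printf
#include <unistd.h>  // read, close (POSIX; Windows uses <io.h>)

#ifndef O_BINARY
#define O_BINARY 0   // not defined on POSIX, where there is no text mode
#endif

int main ()
{
    int bytes;
    unsigned char data [16];
    int fd = open ("data.bin", O_RDONLY | O_BINARY);
    if (fd < 0) return 1;

    while ((bytes = read (fd, data, 16)) == 16)
    {
        // adjacent literals are concatenated, so each group ends in a space
        printf ("%02X %02X %02X %02X "
                "%02X %02X %02X %02X "
                "%02X %02X %02X %02X "
                "%02X %02X %02X %02X\n",
                data [0],data [1],data [2],data [3],
                data [4],data [5],data [6],data [7],
                data [8],data [9],data [10],data [11],
                data [12],data [13],data [14],data [15]);
    }

    if (bytes > 0 && bytes != 16)
    {
        // handle partial line
        for (int i = 0; i < bytes; ++i) printf ("%02X ", data [i]);
        printf ("\n");
    }

    close (fd);
    return 0;
}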

alternatively, if you are enamored of iostreams, you could try this:

typedef std::basic_string<unsigned char> unsigned_string;
typedef std::basic_stringstream<unsigned char> unsigned_stringstream;

sorry if i'm way off here, back to lurking :)

Jul 28 '05 #15
