What does "formatted" I/O really mean?

I'm still not completely sure what's going on with C++ I/O regarding the
extractors and inserters. The following document seems a bit inconsistent:
http://gcc.gnu.org/onlinedocs/libstd...o/howto.html#1

Copying a file:

WRONG WAY:
#include <fstream>
std::ifstream IN ("input_file");
std::ofstream OUT ("output_file");
OUT << IN; // undefined behavior

RIGHT WAY:
//[T]he easiest way to copy the file is:
OUT << IN.rdbuf();

HOWEVER:
"First, ios::binary has exactly one defined effect, no more and no less.
Normal text mode has to be concerned with the newline characters, and the
runtime system will translate between (for example) '\n' and the
appropriate end-of-line sequence (LF on Unix, CRLF on DOS, CR on Macintosh,
etc)....

Second, using << to write and >> to read isn't going to work with the
standard file stream classes, even if you use skipws during reading. Why
not? Because ifstream and ofstream exist for the purpose of formatting, not
reading and writing. Their job is to interpret the data into text
characters, and that's exactly what you don't want to happen during binary
I/O.

BUT IT SAID:
[T]he easiest way to copy the file is:
OUT << IN.rdbuf();

Does that only apply to "text" files?

"Third, using the get() and put()/write() member functions still aren't
guaranteed to help you. These are "unformatted" I/O functions, but still
character-based. (This may or may not be what you want, see below.)"

I saw below, but don't know what I was supposed to see. Is it the endian
stuff?

If I open a file in binary mode, then f.rdbuf() >> stringstrm, is the entire
file going to be faithfully represented bit-for-bit in the
std::stringstream? If not, how will it have been changed? Note that I
made no mention of unsetting skipws here.

You may think I'm just hard-headed, and can't understand that I shouldn't use
the overloaded shift operators for unformatted data. Well, suppose someone
else were to do that, and it worked for them. Is there a potential that it
could cause problems for me? I certainly have used the above method to
read "raw" data in the past.

Also, I _do_ want to "format" the data. I want to parse an ELF file into
its elementary components.
--
If our hypothesis is about anything and not about some one or more
particular things, then our deductions constitute mathematics. Thus
mathematics may be defined as the subject in which we never know what we
are talking about, nor whether what we are saying is true.-Bertrand Russell
Aug 1 '05 #1

Steven T. Hatton wrote:
HOWEVER:
"First, ios::binary has exactly one defined effect, no more and no less.
Normal text mode has to be concerned with the newline characters, and the
runtime system will translate between (for example) '\n' and the
appropriate end-of-line sequence (LF on Unix, CRLF on DOS, CR on Macintosh,
etc)....
Yes, that is indeed the case. It does not mean that if you stream with
operator<< or operator>> it will write in "binary" format.
Second, using << to write and >> to read isn't going to work with the
standard file stream classes, even if you use skipws during reading. Why
not? Because ifstream and ofstream exist for the purpose of formatting, not
reading and writing. Their job is to interpret the data into text
characters, and that's exactly what you don't want to happen during binary
I/O.
Correct, but streambuf is there underneath as no more than an array of
characters.
BUT IT SAID:
[T]he easiest way to copy the file is:
OUT << IN.rdbuf();
If you can get the length of the buffer you can also use write() which
is used for binary I/O. You must beware of one thing though - those
nasty char_traits. I was using a basic_streambuf< unsigned char > for
binary I/O and found some characters missing. It turned out it was
randomly removing 0xff characters after interpreting them as "EOF". So
I had to write my own char_traits for unsigned char and attach that to
my stream as my second template parameter (thus basic_iostream<
unsigned char, uchtraits > where uchtraits is my own "traits" class).
Then it worked.
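
The 0xff problem comes down to whether a byte, once widened to the traits' int_type, can still be told apart from the end-of-file marker. A minimal sketch of that check, assuming the standard std::char_traits<char>; the helper names collides_with_eof and ff_is_safe_for_char are made up for illustration and are not part of any library:

```cpp
#include <string>   // std::char_traits

// Hypothetical helper: does a character, once widened to the traits'
// int_type, compare equal to the end-of-file marker?
template <class Traits>
bool collides_with_eof(typename Traits::char_type c) {
    return Traits::eq_int_type(Traits::to_int_type(c), Traits::eof());
}

// For the standard char traits a 0xff byte must stay distinct from eof(),
// because to_int_type() widens through unsigned char.
inline bool ff_is_safe_for_char() {
    return !collides_with_eof<std::char_traits<char>>(static_cast<char>(0xff));
}
```

A traits class written for unsigned char would need to preserve the same property, for example by using a wider int_type than the character type.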
"Third, using the get() and put()/write() member functions still aren't
guaranteed to help you. These are "unformatted" I/O functions, but still
character-based. (This may or may not be what you want, see below.)"
They are based on characters that have traits. You are not forced to
use char_traits<char>.

I saw below, but don't know what I was supposed to see. Is it the endian
stuff?
Nothing to do with endian stuff, except that if you used basic_fstream
(basic_iostream) on a character type of 2 bytes or more to write
integers then endian stuff might come into play. (One reason why
wchar_t is generally not used as a character. Instead one-byte
characters and codepages are used).
If I open a file in binary mode, then f.rdbuf() >> stringstrm, is the entire
file going to be faithfully represented bit-for-bit in the
std::stringstream? If not, how will it have been changed? Note that I
made no mention of unsetting skipws here.
There is no operator>> overload for basic_streambuf/filebuf.
You may think I'm just hard-headed, and can't understand that I shouldn't use
the overloaded shift operators for unformatted data. Well, suppose someone
else were to do that, and it worked for them. Is there a potential that it
could cause problems for me? I certainly have used the above method to
read "raw" data in the past.
Your own objects can use operator>> and operator<< in whatever way they
want, writing in binary format if they choose. They do not need to be
humanly readable.
Also, I _do_ want to "format" the data. I want to parse an ELF file into
its elementary components.


Then get your objects to format binary data. How is STL supposed to
know your format? If you format your data to be a fixed size then use
read() and write(). If a variable size then put a "header" section
inside and resolve any endian issues by enforcing one particular endian
notation. (Normally I would choose big-endian unless you are going to
primarily be working on a little-endian system and can optimise for
that system).
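
As a rough illustration of "use read() and enforce one byte order", here is a sketch that reads a fixed-size 4-byte field and assembles it as big-endian regardless of the host. The names read_u32_be and parse_u32_be are hypothetical helpers, not anything from the standard library:

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: read a 4-byte big-endian unsigned integer with the
// unformatted read() function. Returns false if fewer than 4 bytes remain.
inline bool read_u32_be(std::istream& in, std::uint32_t& value) {
    unsigned char bytes[4];
    if (!in.read(reinterpret_cast<char*>(bytes), sizeof bytes)) {
        return false;
    }
    // Assemble the value byte by byte so the result does not depend on
    // the byte order of the machine running the code.
    value = (std::uint32_t(bytes[0]) << 24) | (std::uint32_t(bytes[1]) << 16) |
            (std::uint32_t(bytes[2]) << 8)  |  std::uint32_t(bytes[3]);
    return true;
}

// Convenience wrapper for bytes that are already in memory.
inline bool parse_u32_be(const std::string& raw, std::uint32_t& value) {
    std::istringstream in(raw);
    return read_u32_be(in, value);
}
```

For a file, the same function would be called on a std::ifstream opened with std::ios::binary.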

Aug 1 '05 #2
Steven T. Hatton wrote:
Second, using << to write and >> to read isn't going to work with the
standard file stream classes, even if you use skipws during reading. Why
not? Because ifstream and ofstream exist for the purpose of formatting,
not reading and writing. Their job is to interpret the data into text
characters, and that's exactly what you don't want to happen during binary
I/O.

BUT IT SAID:
[T]he easiest way to copy the file is:
OUT << IN.rdbuf();

Does that only apply to "text" files?
No, it applies to all files. The thing here is that this output operator
treats the whole sequence of characters produced by 'IN.rdbuf()' as
one (unaltered) sequence of text characters.
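
A minimal sketch of that copy, with both files opened in binary mode so that no newline translation takes place. The function name copy_file_raw is an assumption for illustration only:

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: copy a file through the stream buffer inserter.
// Returns true if both files opened and the insertion succeeded.
inline bool copy_file_raw(const std::string& from, const std::string& to) {
    std::ifstream in(from, std::ios::binary);
    std::ofstream out(to, std::ios::binary | std::ios::trunc);
    if (!in || !out) {
        return false;
    }
    out << in.rdbuf();               // hands the characters over unaltered
    return static_cast<bool>(out);   // failbit is set if nothing was inserted
}
```

One caveat of this form: the inserter sets failbit when no characters were inserted, so an empty source file is reported as a failure here.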
"Third, using the get() and put()/write() member functions still aren't
guaranteed to help you. These are "unformatted" I/O functions, but still
character-based. (This may or may not be what you want, see below.)"

I saw below, but don't know what I was supposed to see. Is it the endian
stuff?
I don't know for sure what they wanted to refer to, except maybe the
discussion about binary formatted I/O. The actual issue which I haven't
seen addressed on this page is that the bytes in the file are converted
into characters by processing them in a locale specific way. If you want
to do binary I/O you need this conversion to have no effect. This is
done by selecting the "C" locale.
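
A sketch of what selecting the "C" locale could look like for a file stream, assuming the locale is imbued before the file is opened. The helper names open_raw and classic_locale_is_noconv are hypothetical; std::locale::classic() is the standard way to obtain the "C" locale:

```cpp
#include <cwchar>    // std::mbstate_t
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: open a file for binary reading with the "C" locale
// imbued, so the file buffer's code conversion facet does nothing.
inline bool open_raw(std::ifstream& in, const std::string& path) {
    in.imbue(std::locale::classic());
    in.open(path, std::ios::in | std::ios::binary);
    return in.is_open();
}

// Ask the "C" locale's char-to-char conversion facet whether it is a no-op.
inline bool classic_locale_is_noconv() {
    typedef std::codecvt<char, char, std::mbstate_t> Converter;
    return std::use_facet<Converter>(std::locale::classic()).always_noconv();
}
```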
If I open a file in binary mode, then f.rdbuf() >> stringstrm, is the
entire file going to be faithfully represented bit-for-bit in the
std::stringstream?
No, it is not: First of all, there is no overload for 'operator>>()'
taking a stream buffer as first argument and a stream as second
argument. Assuming you wanted to write 'f >> stringstream.rdbuf()'
this still does not work because formatted input operators start by
skipping white space unless 'skipws' is turned off. It works the other
way around, though: 'stringstream << f.rdbuf()' (assuming 'stringstream'
is a variable of an appropriate type, e.g. 'std::ostringstream').
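
The direction that does work can be sketched as follows; slurp is a made-up name, and the source can be any input stream, including an ifstream opened in binary mode:

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: collect everything left in a stream into a string.
inline std::string slurp(std::istream& in) {
    std::ostringstream collected;
    // Inserter taking a stream buffer pointer: leading whitespace and
    // embedded null characters are passed through, not skipped.
    collected << in.rdbuf();
    return collected.str();
}
```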
You may think I'm just hard-headed, and can't understand that I shouldn't
use the overloaded shift operators for unformatted data. Well, suppose
someone else were to do that, and it worked for them.
You can use essentially just one of the predefined shift operators
reasonably for binary I/O and this is the output operator taking a
stream buffer pointer. Everything else is too error prone, IMO, to
be used reasonably although you can go a long way to come close to a
working implementation. For example, you could even make the inserters
and extractors for numeric types work on a binary format by creating
appropriate 'num_put' and 'num_get' facets. This is, however, not their
intended purpose and there are still sufficient problems left. It is
easier to create a new stream hierarchy for binary I/O and it avoids
a bunch of pitfalls (e.g. to forget to unset 'skipws' or accidental
use of operators for text formatted I/O).
I certainly have used the above method to read "raw" data in the past.
If I had to read the whole file into a container, I would read "raw"
data like this:

std::vector<char> data((std::istreambuf_iterator<char>(in)),
std::istreambuf_iterator<char>());

Of course, this tends to be pretty slow because it requires a certain
optimization to be in place which is typically not implemented. Thus,
I'm using streams for a binary format.
Also, I _do_ want to "format" the data. I want to parse an ELF file
into its elementary components.


Yes, the distinction between formatted and unformatted does not really
fit well for this. The difference is between text formatted and binary
formatted. You want a binary format. This is not directly supported by
the IOStreams hierarchy although you can still use the stream buffer
hierarchy for the actual reading/writing from a file. You just have to
take care of opening the files in binary mode and suppressing any
character conversions.
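
As a small illustration of using the stream buffer layer directly for a binary format, the sketch below checks the four identification bytes that begin an ELF file (0x7f followed by 'E', 'L', 'F'). The function names are hypothetical, and a real parser would go on to decode the rest of the header:

```cpp
#include <cstring>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical helper: consume 4 bytes from a stream buffer and compare
// them with the ELF magic number. No formatting layer is involved.
inline bool buffer_has_elf_magic(std::streambuf& sb) {
    static const char magic[4] = {0x7f, 'E', 'L', 'F'};
    char head[4];
    if (sb.sgetn(head, 4) != 4) {
        return false;   // fewer than 4 bytes available
    }
    return std::memcmp(head, magic, 4) == 0;
}

// Convenience wrapper for bytes that are already in memory.
inline bool bytes_have_elf_magic(const std::string& raw) {
    std::stringbuf sb(raw, std::ios_base::in);
    return buffer_has_elf_magic(sb);
}
```

With a file, the stream buffer would be a std::filebuf opened with std::ios::in | std::ios::binary.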
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
<http://www.eai-systems.com> - Efficient Artificial Intelligence
Aug 1 '05 #3
