
Streaming file IO and binary files

Hi,

Kindly excuse my novice question. In all the literature on ifstream
that I have seen, nowhere have I read what happens if you try to read
a binary file using the ">>" operator. I ran into two problems
while trying to read a binary file.

1). All whitespace characters were skipped
2). Certain binary files gave a core dump

The problems went away when I used the read() member function on the
input file stream instead. Is this the right way to go about it?

I was able to recreate my problem using simple sample source as below:

Thanks,
Saleem


#include <iostream>
#include <fstream>

using namespace std;
main(int argc, char* argv[])
{
    if(argc < 2)
    {
        cerr << "usage: " << argv[0] << " <input-file>\n";
        return 1;
    }

    ifstream ifs(argv[1], ios::in|ios::binary);

    char ch;
    size_t bytesRead = 0;
    while(ifs)
    {
        ifs >> ch;
        //ifs.read(&ch, 1);
        bytesRead ++;
    }

    cout << "Successfully read " << bytesRead << " bytes\n";
    return 0;
}

Jul 25 '07 #1
3 Replies


Hi.

The read and write methods are what you need, so you have answered
your own question.

Data are read from and written to binary files as they are, with no
changes. With text files, some conversions may be done, depending on
the platform. For example, on DOS/Windows a line ending is translated
to the CRLF character pair on output (and back on input). As far as I
know, other platforms may do further translations, although I cannot
give an example.

operator >> is for reading a value in text form, whereas the read
method is for reading a value in binary form (the same applies to
operator << and the write method). It is usually better to use the
text form, because it is more portable across platforms.
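
A minimal sketch of the difference between the two forms, assuming an
in-memory stream instead of a file. The function names text_form and
binary_form are made up for illustration:

```cpp
#include <sstream>
#include <string>

// Text form: operator<< formats the value as decimal digits.
std::string text_form(int value)
{
    std::ostringstream out;
    out << value;
    return out.str();
}

// Binary form: write() copies the object's bytes unchanged, so the result
// depends on the platform's int size and byte order.
std::string binary_form(int value)
{
    std::ostringstream out;
    out.write(reinterpret_cast<const char*>(&value), sizeof value);
    return out.str();
}
```

text_form(12345) gives the five characters "12345" everywhere, while
binary_form(12345) gives sizeof(int) raw bytes whose order differs
between machines.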

Jul 25 '07 #2

On Jul 25, 8:51 am, masood.iq...@lycos.com wrote:
> Kindly excuse my novice question. In all the literature on ifstream
> that I have seen, nowhere have I read what happens if you try to read
> a binary file using the ">>" operator. I ran into two problems
> while trying to read a binary file.

Attention. The ">>" operator means "parse the next characters
in the file into the target type, interpreting them as text".
More generally, the abstraction of istream is that of a
transparent stream of characters (not raw bytes). All binary
does is control the interface with the OS.

> 1). All whitespace characters were skipped

Did you reset the skipws flag? If not, that's what you asked it
to do.
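
To make the skipws point concrete, here is a small sketch, assuming an
in-memory std::istringstream rather than a file so that it does not
depend on any file mode. The helper name count_chars is hypothetical:

```cpp
#include <cstddef>
#include <ios>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: counts the characters that operator>> extracts from
// `data`. With skip_whitespace == false the skipws flag is cleared first,
// so spaces, tabs and newlines are extracted like any other character.
std::size_t count_chars(const std::string& data, bool skip_whitespace)
{
    std::istringstream in(data);
    if (!skip_whitespace) {
        in >> std::noskipws;   // same effect as in.unsetf(std::ios::skipws)
    }
    std::size_t count = 0;
    char ch;
    while (in >> ch) {         // the stream is tested after each extraction
        ++count;
    }
    return count;
}
```

For the six characters "a b\n\tc", the default setting extracts only
the three letters; with the flag cleared, all six are extracted.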

> 2). Certain binary files gave a core dump

Then there's a bug in your library. Good code never core dumps,
regardless of input. (Of course, the bug may simply be that you
forgot to replace the new handler, and aren't catching
bad_alloc. If you're reading into a string, for example, and
the input data contains a couple of GB without any white space,
something is going to give.)

> The problems went away when I used the read() member function on the
> input file stream instead. Is this the right way to go about it?

It depends what you want to do. Read is good when you know you
have a block of bytes of fixed size, with some special, possibly
non-text, format.
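
As a rough sketch of that use of read (the helper name read_block and
its interface are assumptions for illustration, not anything from the
standard library), one way to fetch a fixed-size block and detect a
short read is:

```cpp
#include <cstddef>
#include <ios>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: tries to read exactly `size` bytes from `in`.
// Returns true and fills `block` on success; returns false if the stream
// ended (or failed) before `size` bytes were available. On failure `block`
// holds only the bytes that were actually read.
bool read_block(std::istream& in, std::size_t size, std::vector<char>& block)
{
    block.resize(size);
    in.read(block.data(), static_cast<std::streamsize>(size));
    // gcount() reports how many bytes the last unformatted read extracted.
    const std::size_t got = static_cast<std::size_t>(in.gcount());
    block.resize(got);
    return got == size;
}
```

Because read does no formatting, bytes such as '\0', '\n' and ' ' come
back unchanged; interpreting the block is then up to the caller.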

> I was able to recreate my problem using simple sample source as below:
> #include <iostream>
> #include <fstream>
> using namespace std;
> main(int argc, char* argv[])

Just a nit, but "implicit int" was removed from C++ a long, long
time ago. This shouldn't compile without a return type for
main.

> {
>     if(argc < 2)
>     {
>         cerr << "usage: " << argv[0] << " <input-file>\n";
>         return 1;
>     }
>     ifstream ifs(argv[1], ios::in|ios::binary);
>     char ch;
>     size_t bytesRead = 0;
>     while(ifs)
>     {
>         ifs >> ch;
>         //ifs.read(&ch, 1);
>         bytesRead ++;
>     }

If the above loop ever core dumps, you should file a bug report.

If you want to just read characters, I'd use get:

    while ( ifs.get( ch ) ) {
        ++ bytesRead ;
    }

(Note too that your loop also counts one too many. For an empty
file, for example, it will count 1.)
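
The off-by-one can be seen by running both loop shapes over the same
data. This is only a sketch: the function names are invented, the
first loop uses read() (the commented-out variant in the original
post) so that white space is not skipped, and the input is assumed to
be an in-memory stream:

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// The loop shape from the original post: the stream is tested *before*
// the read, so the final, failed read is still counted.
std::size_t count_test_before_read(std::istream& in)
{
    std::size_t n = 0;
    char ch;
    while (in) {
        in.read(&ch, 1);
        ++n;
    }
    return n;
}

// The corrected shape: get() returns the stream, which is tested *after*
// the read, so only successful reads are counted.
std::size_t count_test_after_read(std::istream& in)
{
    std::size_t n = 0;
    char ch;
    while (in.get(ch)) {
        ++n;
    }
    return n;
}
```

On an empty stream the first function returns 1 and the second 0; on
"abc" they return 4 and 3.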

Read is really for buffers which you will later unformat
yourself.

>     cout << "Successfully read " << bytesRead << " bytes\n";
>     return 0;
> }

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Jul 25 '07 #3

On Jul 25, 9:06 am, Ondra Holub <ondra.ho...@post.cz> wrote:
> On 25 Jul, 08:51, masood.iq...@lycos.com wrote:

> Data are read from/written to binary files as they are. There
> is no change. In text files, some conversions may be done.

It's a bit more subtle than that. Especially with the standard
streams, which do code translation using the codecvt facet of
the imbued locale regardless of whether the file is binary or
text.

Text mode only guarantees textual integrity: you're only
guaranteed to read what you've written if what you've written
consisted only of printable characters, and even then, there are
exceptions. (You're not guaranteed to be able to read trailing
white space, for example. And it's not specified what happens
if the last character written wasn't a '\n'.) On the other
hand, you're guaranteed that a '\n' will result in whatever the
system normally uses as a line separator (e.g. the two character
sequence 0x0D, 0x0A under Windows, or a new record on systems
with record oriented files). And that there are no extra
characters at the end. Also, you can only seek in a limited
number of cases.

In binary mode, you always get back the bytes you wrote. All of
them, not just printable characters. You can legally write
anything, and will reread exactly what you have written; '\n'
will result in one byte being written, with whatever the
encoding of '\n' is on your system. And you can seek anywhere.
But you might read extra 0's that you didn't write at the end of
the file.
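
A quick way to check the binary-mode guarantee on a given system is a
round trip like the sketch below. The helper name round_trip_binary is
hypothetical, and the caller supplies the file path:

```cpp
#include <fstream>
#include <ios>
#include <iterator>
#include <string>

// Hypothetical helper: writes `data` to `path` in binary mode, then reads
// the whole file back in binary mode and returns what was read.
std::string round_trip_binary(const std::string& path, const std::string& data)
{
    {
        std::ofstream out(path.c_str(),
                          std::ios::out | std::ios::binary | std::ios::trunc);
        out.write(data.data(), static_cast<std::streamsize>(data.size()));
    }   // the file is closed here, before it is reopened for reading

    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    // istreambuf_iterator reads raw characters; it never skips white space.
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Given the caveat about extra trailing 0's, a careful caller compares
only the first data.size() characters of the result with what was
written, rather than requiring the lengths to match.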

Also, on some systems, files written in text mode cannot be read
in binary, and vice versa.

> It depends on the platform. For example, on DOS/Windows a line
> ending is translated to the CRLF character pair (and vice versa).
> As far as I know, other platforms may do further translations,
> although I cannot give an example.

Even on DOS/Windows, a 0x1A in a text input stream is treated as
EOF, and you won't see anything else after it.

> operator >> is for reading a value in text form, whereas the read
> method is for reading a value in binary form (the same applies to
> operator << and the write method).

Operator >> formats, as text. Regardless of file mode. Read
extracts char's from the stream, regardless of file mode. I
regularly use >> on files opened in binary mode, and there are
cases where it is reasonable to use read on files opened in text
mode.
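
For instance, a sketch along these lines (the helper name
parse_ints_binary_mode is made up, and the file is assumed to hold
decimal integers separated by white space) parses text from a stream
opened with ios::binary:

```cpp
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Hypothetical helper: opens `path` in binary mode and parses
// white-space-separated decimal integers from it with operator>>.
std::vector<int> parse_ints_binary_mode(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    std::vector<int> values;
    int value;
    while (in >> value) {   // formatted extraction works in either file mode
        values.push_back(value);
    }
    return values;
}
```

The binary flag only changes how the bytes reach the stream; the
parsing done by >> is the same as for a text-mode file.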

> It is usually better to use the text form,
> because it is more portable across platforms.

It depends what you mean by "portable". If you're writing files
to be read on the same system, or reading files that were
written as text on the same system, text mode gives you a larger
degree of source code portability; a new line will always be the
single character '\n', regardless of how it is represented on
the system. If you're writing files that will be read by many
different systems, you should define a "portable" format for
them, which most likely will require that they be accessed in
binary mode.
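
As an illustration of what defining such a format can look like, here
is a sketch for a single field. The format itself (an unsigned 32-bit
value stored as four bytes, most significant first) and the names
put_u32_be and get_u32_be are assumptions chosen for the example:

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>

// Assumed format (not from the thread): an unsigned 32-bit value stored as
// four bytes, most significant byte first.
void put_u32_be(std::ostream& out, std::uint32_t value)
{
    for (int shift = 24; shift >= 0; shift -= 8) {
        out.put(static_cast<char>((value >> shift) & 0xFFu));
    }
}

// Reads four bytes and reassembles them. Returns false, leaving `value`
// untouched, if fewer than four bytes could be read.
bool get_u32_be(std::istream& in, std::uint32_t& value)
{
    char bytes[4];
    if (!in.read(bytes, 4)) {
        return false;
    }
    std::uint32_t result = 0;
    for (int i = 0; i < 4; ++i) {
        result = (result << 8) | static_cast<unsigned char>(bytes[i]);
    }
    value = result;
    return true;
}
```

Because the byte order is fixed by the code rather than by the
machine, a file written this way reads back the same on any system,
provided the stream is opened in binary mode.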

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Jul 25 '07 #4
