peek() and tellg()

Is there any reason, according to the standard, that calling tellg() on an
std::ifstream after a call to peek() could place the filebuf in an
inconsistent state?
I think it's a bug in the VC7 Dinkumware implementation (and I've
reported it to them as such), but the following code

#include <fstream>
#include <iostream>

int main()
{
    std::ofstream ofs("test.txt");
    ofs << "0123456789";
    ofs.close();
    std::wifstream ifs("test.txt");
    std::wcout << wchar_t(ifs.peek());
    ifs.tellg();
    std::wcout << wchar_t(ifs.peek()); ifs.get();
    std::wcout << wchar_t(ifs.peek()); ifs.get();
    std::wcout << wchar_t(ifs.peek()); ifs.get();
    std::wcout << wchar_t(ifs.peek()); ifs.get();
    std::wcout << std::endl;
}

Prints out 00246, when I would expect 00123. Remove the tellg() (or
move it to after a get) and it prints exactly that.
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Sep 28 '05 #1
<wi******@hotmail.com> wrote in message
news:11**********************@o13g2000cwo.googlegroups.com...
<snip>

Prints out 00246, when I would expect 00123. Remove the tellg() (or
move it to after a get) and it prints exactly that.


Actually, I would expect 01234, and that's what our latest
library gives, both in our shipped product and VC++ V8
(Whidbey) which is soon to be formally released. V7.0
and earlier "fail" in a different way than V7.0.

I put "fail" in quotes because the above code is asking
for trouble. First, it writes a text line with no
terminating newline. That's not a problem here, but it
generally causes trouble. More important, it mixes two
different ways of accessing a stream:

-- as a one-pass input stream with limited pushback

-- as a random-access sequence with bookmarks

It has been known for decades that trying to access
the same stream both ways is fraught with peril.
Whether you call the resulting surprising behavior
buggy or regrettable is a matter of taste.

The biggest stress point in the code above is the
initial peek followed by a tell. It's hard enough
pushing back a character and still generating a
proper seek offset; if you push back a character
at the beginning of a file it's way harder to get
"right". The C I/O model, which underlies C++,
permits the implementation to discard any pushed
back characters when determining a seek offset.
That's why we read the "0" only once. It may still
not be what you want, but I believe that it's
defensible.

FWIW, you'll find this code terribly nonportable.
Other Standard C++ library implementations go off
in all sorts of interesting directions in this
area. If you want robust code, don't mix peek
and seek/tell.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com


Sep 28 '05 #2
In article <11**********************@o13g2000cwo.googlegroups.com>,
wi******@hotmail.com wrote:
<snip>

Prints out 00246, when I would expect 00123. Remove the tellg() (or
move it to after a get) and it prints exactly that.


Fwiw, the CodeWarrior/Freescale implementation outputs:

0101234567

The reason it looks so strange is because you create the file with
narrow char, and then read it back with wide wchar_t's. The default
wide character "encoding" for this product is to I/O all bytes of the
wchar_t in native byte order. If I change your example to read the file
back in as narrow characters:

#include <fstream>
#include <iostream>

int main()
{
    std::ofstream ofs("test.txt");
    ofs << "0123456789";
    ofs.close();
    std::ifstream ifs("test.txt");
    std::cout << char(ifs.peek());
    ifs.tellg();
    std::cout << char(ifs.peek()); ifs.get();
    std::cout << char(ifs.peek()); ifs.get();
    std::cout << char(ifs.peek()); ifs.get();
    std::cout << char(ifs.peek()); ifs.get();
    std::cout << std::endl;
}

the output is then:

00123

-Howard
Sep 28 '05 #3
Ok, but what about if you create the file using wofstream? I know the
problem doesn't happen with narrow streams, and I'm not terribly
concerned about narrow streams. In actual fact I'm using my own UTF8
codecvt, but that seems not to be the problem.

Sep 28 '05 #4
In article <11**********************@g49g2000cwa.googlegroups.com>,
wi******@hotmail.com wrote:
Ok, but what about if you create the file using wofstream? I know the
problem doesn't happen with narrow streams, and I'm not terribly
concerned about narrow streams. In actual fact I'm using my own UTF8
codecvt, but that seems not to be the problem.


This:

#include <iostream>
#include <fstream>

int main()
{
    std::wofstream ofs("test.txt");
    ofs << L"0123456789";
    ofs.close();
    std::wifstream ifs("test.txt");
    std::wcout << wchar_t(ifs.peek());
    ifs.tellg();
    std::wcout << wchar_t(ifs.peek()); ifs.get();
    std::wcout << wchar_t(ifs.peek()); ifs.get();
    std::wcout << wchar_t(ifs.peek()); ifs.get();
    std::wcout << wchar_t(ifs.peek()); ifs.get();
    std::wcout << std::endl;
}

Outputs:

00123

for me.

-Howard
Sep 29 '05 #5
Howard Hinnant wrote:
In article <11**********************@g49g2000cwa.googlegroups.com>,
wi******@hotmail.com wrote:
Ok, but what about if you create the file using wofstream? I know the
problem doesn't happen with narrow streams, and I'm not terribly
concerned about narrow streams. In actual fact I'm using my own UTF8
codecvt, but that seems not to be the problem.
This:

std::wofstream ofs("test.txt");
ofs << L"0123456789";
ofs.close();

<snip> std::wcout << std::endl;

Outputs:

00123

for me.

Ok, well that's exactly what I would expect.
Quite surprised that CodeWarrior defaults to storing the file as wide
characters though - is it big or little endian, 16-bit or 32-bit
depending on the platform? Does it have any simple ways of controlling
this?
I assume, then, that there are no requirements in the standard regarding the
default behaviour when converting wchar_t's to char's for file output.
I pretty much always use UTF8 these days: at least until Chinese
becomes the lingua franca of cyberspace it seems about the best choice!

Sep 29 '05 #6
In article <11********************@g14g2000cwa.googlegroups.com>,
wi******@hotmail.com wrote:
Quite surprised that CodeWarrior defaults to storing the file as wide
characters though - is it big or little endian, 16-bit or 32-bit
depending on the platform?
Yes. That is, it is whatever wchar_t/platform is (an image on disk of
what was in memory).
Does it have any simple ways of controlling
this?
I assume, then, that there are no requirements in the standard regarding the
default behaviour when converting wchar_t's to char's for file output.
I pretty much always use UTF8 these days: at least until Chinese
becomes the lingua franca of cyberspace it seems about the best choice!


You can of course write your own codecvt and install/imbue it into your
streams. It also comes with several prewritten codecvt's, including one
for UTF8.

std::__ucs_2
std::__jis
std::__shift_jis
std::__euc
std::__utf_8

All of these prewritten codecvt's are templated on the internal
character type and will work with either 16 or 32 bit internal character
types - which need not be a wchar_t.

These can simply be picked up and installed the same way you would
install your own codecvt. There is also a locale data file format that
can be used to control which prewritten codecvt gets installed into a
locale.

You are correct in your assumption about the requirements on the default
encoding scheme. It is supposed to be whatever is appropriate for the
vendor's customers on a given platform.

For better or worse I decided years ago that storing the whole wchar_t
on disk, unencoded, was the most obvious thing to do, and thus a good
default. At the time I was aware of the popular "drop the high byte(s)"
encoding but the resultant loss of information made me nervous. I also
worked with the standards committee to try to ensure that a "don't
encode" default behavior was conforming.

http://www.open-std.org/jtc1/sc22/wg...fects.html#305

Fyi, CodeWarrior for Windows is no longer for sale, at least not from
Metrowerks/Freescale.

-Howard
Sep 29 '05 #7
wi******@hotmail.com wrote:
Is there any reason according to the standard that calling
tellg() on an std::ifstream after a call to peek() could place
the filebuf in an inconsistent state?
I think it's a bug in the VC7 Dinkumware implementation (and
I've reported it to them as such), but the following code

std::ofstream ofs("test.txt");
ofs << "0123456789";
ofs.close();
std::wifstream ifs("test.txt");

Careful. You're reading a file written with narrow characters
as if it contained wide characters. Any results will depend on
the locale; whether they're useful or sensible is almost pure
luck.

On most modern machines, narrow characters use an encoding in
which all of the characters in the basic character set are
ASCII. If this is the case, imbuing a UTF-8 locale should allow
reading them correctly. Still, IMHO, if you want the file to
contain UTF-8, you should write it with a wofstream imbued with
a UTF-8 locale.

The "C" locale depends on a lot of things; I don't think the
standard actually says what it should be in this case. And of
course, most programs will have done a std::locale::global(
std::locale( "" ) ) as the first thing in main; under Unix, at
least, this sets the locale to a value determined by environment
variables.

Off hand, from what little I know of Windows, I would expect the
default to use UTF-16LE, not UTF-8. In which case, you're likely
to get some very strange results: letters from strange
alphabets, or illegal characters.

std::wcout << wchar_t(ifs.peek());
ifs.tellg();
std::wcout << wchar_t(ifs.peek()); ifs.get();
std::wcout << wchar_t(ifs.peek()); ifs.get();
std::wcout << wchar_t(ifs.peek()); ifs.get();
std::wcout << wchar_t(ifs.peek()); ifs.get();
std::wcout << std::endl;

Prints out 00246, when I would expect 00123. Remove the
tellg() (or move it to after a get) and it prints exactly
that.


I'm not sure what effect the tellg() has -- that part seems
strange. But for the rest, I'd say that you're playing with
undefined, or poorly defined behavior. (Not necessarily
undefined behavior in the sense of the standard, but in the
sense that there really aren't any requirements as to what the
results should be.)

--
James Kanze GABI Software
Consulting in object-oriented computing
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Sep 29 '05 #8

kanze wrote:
wi******@hotmail.com wrote:
<snip>


Careful. You're reading a file written with narrow characters
as if it contained wide characters. Any results will depend on
the locale; whether they're useful or sensible is almost pure
luck.


Yes, I should have provided the example using std::wofstream (and
L"0123456789"). Exactly the same problem occurs. (But not when using
narrow streams for both input and output.)

Off hand, from what little I know of Windows, I would expect the
default to use UTF-16LE, not UTF-8. In which case, you're likely
to get some very strange results: letters from strange
alphabets, or illegal characters.


The Dinkumware implementation, and indeed, I think, all the others I've
used, default to simply converting wchar_t's to char's (so any values
over 255 cannot be stored).

Oct 1 '05 #9
<wi******@hotmail.com> wrote in message
news:11*********************@g49g2000cwa.googlegroups.com...
Off hand, from what little I know of Windows, I would expect the
default to use UTF-16LE, not UTF-8. In which case, you're likely
to get some very strange results: letters from strange
alphabets, or illegal characters.


The Dinkumware implementation, and indeed, I think, all the others I've
used, default to simply converting wchar_t's to char's (so any values
over 255 cannot be stored).


Yes, we use the same conversions as the Standard C library by default.
But we also supply just about every conversion you can imagine for
C++ with our codecvt library in our CoreX product.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com


Oct 2 '05 #10
