peek() and tellg()

wizofaus

Is there any reason, according to the standard, that calling tellg() on a
std::ifstream after a call to peek() could place the filebuf in an
inconsistent state?
I think it's a bug in the VC7 Dinkumware implementation (and I've
reported it to them as such), but the following code

#include <fstream>
#include <iostream>

int main()
{
    // Write ten narrow digits (no trailing newline).
    std::ofstream ofs("test.txt");
    ofs << "0123456789 ";
    ofs.close();
    // Read them back as wide characters: peek, then ask for the position.
    std::wifstream ifs("test.txt");
    std::wcout << wchar_t(ifs.peek());
    ifs.tellg();
    std::wcout << wchar_t(ifs.peek()); ifs.get();
    std::wcout << wchar_t(ifs.peek()); ifs.get();
    std::wcout << wchar_t(ifs.peek()); ifs.get();
    std::wcout << wchar_t(ifs.peek()); ifs.get();
    std::wcout << std::endl;
}

Prints out 00246, when I would expect 00123. Remove the tellg() (or
move it to after a get()) and it prints exactly that.
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Sep 28 '05 #1


P.J. Plauger

<wi******@hotmail.com> wrote in message
news:11**********************@o13g2000cwo.googlegroups.com...

Is there any reason, according to the standard, that calling tellg() on a
std::ifstream after a call to peek() could place the filebuf in an
inconsistent state?
I think it's a bug in the VC7 Dinkumware implementation (and I've
reported it to them as such), but the following code

std::ofstream ofs("test.txt");
ofs << "0123456789 ";
ofs.close();
std::wifstream ifs("test.txt");
std::wcout << wchar_t(ifs.peek());
ifs.tellg();
std::wcout << wchar_t(ifs.peek()); ifs.get();
std::wcout << wchar_t(ifs.peek()); ifs.get();
std::wcout << wchar_t(ifs.peek()); ifs.get();
std::wcout << wchar_t(ifs.peek()); ifs.get();
std::wcout << std::endl;

Prints out 00246, when I would expect 00123. Remove the tellg() (or
move it to after a get()) and it prints exactly that.

Actually, I would expect 01234, and that's what our latest
library gives, both in our shipped product and VC++ V8
(Whidbey) which is soon to be formally released. V7.0
and earlier "fail" in a different way than V7.0.

I put "fail" in quotes because the above code is asking
for trouble. First, it writes a text line with no
terminating newline. That's not a problem here, but it
generally causes trouble. More important, it mixes two
different ways of accessing a stream:

-- as a one-pass input stream with limited pushback

-- as a random-access sequence with bookmarks

It has been known for decades that trying to access
the same stream both ways is fraught with peril.
Whether you call the resulting surprising behavior
buggy or regrettable is a matter of taste.

The biggest stress point in the code above is the
initial peek followed by a tell. It's hard enough
pushing back a character and still generating a
proper seek offset; if you push back a character
at the beginning of a file it's way harder to get
"right". The C I/O model, which underlies C++,
permits the implementation to discard any pushed
back characters when determining a seek offset.
That's why we read the "0" only once. It may still
not be what you want, but I believe that it's
defensible.

FWIW, you'll find this code terribly nonportable.
Other Standard C++ library implementations go off
in all sorts of interesting directions in this
area. If you want robust code, don't mix peek
and seek/tell.
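Following that advice, one way to keep bookmarks without calling tellg() after peek() is to count the characters you consume yourself. What follows is a minimal sketch that assumes character counts, not byte offsets, are good enough for your bookmarks. The class `CountingReader` is hypothetical and is not part of any library:

```cpp
#include <cassert>   // for checking the usage example
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: wraps a wide input stream and counts the characters
// consumed by get(), so bookmarks never require tellg() after peek().
class CountingReader {
public:
    using int_type = std::wistream::int_type;
    using traits = std::wistream::traits_type;

    explicit CountingReader(std::wistream& in) : in_(in) {}

    // Look at the next character without consuming it; the count is unchanged.
    int_type peek() { return in_.peek(); }

    // Consume one character; advance the count only if a character was read.
    int_type get() {
        int_type c = in_.get();
        if (!traits::eq_int_type(c, traits::eof()))
            ++count_;
        return c;
    }

    // Characters consumed so far. This is a logical bookmark, not a value to
    // pass to seekg(): an encoded file may use several bytes per character.
    std::size_t position() const { return count_; }

private:
    std::wistream& in_;
    std::size_t count_ = 0;
};
```

Suppose a std::wistringstream holds L"0123456789". Peek once, record position(), then do four peek()/get() pairs. The output should be "00123", the bookmark should be 0, and position() should end at 4. The cost of avoiding seek/tell is that returning to a bookmark means re-reading from the start or keeping the text in memory.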

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com


Sep 28 '05 #2

Howard Hinnant

In article <11**********************@o13g2000cwo.googlegroups.com>,
wi******@hotmail.com wrote:

Is there any reason, according to the standard, that calling tellg() on a
std::ifstream after a call to peek() could place the filebuf in an
inconsistent state?
I think it's a bug in the VC7 Dinkumware implementation (and I've
reported it to them as such), but the following code

std::ofstream ofs("test.txt");
ofs << "0123456789 ";
ofs.close();
std::wifstream ifs("test.txt");
std::wcout << wchar_t(ifs.peek());
ifs.tellg();
std::wcout << wchar_t(ifs.peek()); ifs.get();
std::wcout << wchar_t(ifs.peek()); ifs.get();
std::wcout << wchar_t(ifs.peek()); ifs.get();
std::wcout << wchar_t(ifs.peek()); ifs.get();
std::wcout << std::endl;

Prints out 00246, when I would expect 00123. Remove the tellg() (or
move it to after a get()) and it prints exactly that.

Fwiw, the CodeWarrior/Freescale implementation outputs:

0101234567

The reason it looks so strange is that you create the file with
narrow chars and then read it back as wide wchar_t's. The default
wide-character "encoding" for this product is to do I/O on all bytes of the
wchar_t in native byte order. If I change your example to read the file
back in as narrow characters:

std::ofstream ofs("test.txt");
ofs << "0123456789 ";
ofs.close();
std::ifstream ifs("test.txt");
std::cout << char(ifs.peek());
ifs.tellg();
std::cout << char(ifs.peek()); ifs.get();
std::cout << char(ifs.peek()); ifs.get();
std::cout << char(ifs.peek()); ifs.get();
std::cout << char(ifs.peek()); ifs.get();
std::cout << std::endl;

the output is then:

00123
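Here is a concrete version of Howard's explanation. If a wide stream simply copies wchar_t bytes to and from disk, a file written with narrow chars gets reinterpreted two (or four) bytes at a time. The sketch below performs that reinterpretation explicitly. It is hypothetical and assumes 16-bit units in little-endian order. The real layout depends on the platform's wchar_t.

```cpp
#include <cassert>   // for checking the usage example
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical illustration of an "unencoded" wide read: treat each pair of
// bytes in the file as one 16-bit little-endian character unit.
// A trailing odd byte cannot form a whole unit and is ignored here.
std::vector<std::uint16_t> as_raw_le16(const std::string& bytes) {
    std::vector<std::uint16_t> units;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        unsigned lo = static_cast<unsigned char>(bytes[i]);
        unsigned hi = static_cast<unsigned char>(bytes[i + 1]);
        units.push_back(static_cast<std::uint16_t>(lo | (hi << 8)));
    }
    return units;
}
```

For example, as_raw_le16("0123") gives {0x3130, 0x3332}, so the digits '0' and '1' fuse into a single character unit. When such a unit is written back out unencoded, both bytes reappear. That fits the digits showing up in pairs in the 0101234567 output above.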

-Howard

Sep 28 '05 #3

wizofaus

OK, but what if you create the file using wofstream? I know the
problem doesn't happen with narrow streams, and I'm not terribly
concerned about narrow streams. In actual fact I'm using my own UTF-8
codecvt, but that doesn't seem to be the problem.

Sep 28 '05 #4

Howard Hinnant

In article <11**********************@g49g2000cwa.googlegroups.com>,
wi******@hotmail.com wrote:

Ok, but what about if you create the file using wofstream? I know the
problem doesn't happen with narrow streams, and I'm not terribly
concerned about narrow streams. In actual fact I'm using my own UTF8
codecvt, but that seems not to be the problem.

This:

#include <iostream>
#include <fstream>

int main()
{
    std::wofstream ofs("test.txt");
    ofs << L"0123456789 ";
    ofs.close();
    std::wifstream ifs("test.txt");
    std::wcout << char(ifs.peek());
    ifs.tellg();
    std::wcout << wchar_t(ifs.peek()); ifs.get();
    std::wcout << wchar_t(ifs.peek()); ifs.get();
    std::wcout << wchar_t(ifs.peek()); ifs.get();
    std::wcout << wchar_t(ifs.peek()); ifs.get();
    std::wcout << std::endl;
}

Outputs:

00123

for me.

-Howard

Sep 29 '05 #5

wizofaus

Howard Hinnant wrote:

In article <11**********************@g49g2000cwa.googlegroups.com>,
wi******@hotmail.com wrote:
Ok, but what about if you create the file using wofstream? I know the
problem doesn't happen with narrow streams, and I'm not terribly
concerned about narrow streams. In actual fact I'm using my own UTF8
codecvt, but that seems not to be the problem.
This:

std::wofstream ofs("test.txt");
ofs << L"0123456789 ";
ofs.close();

<snip> std::wcout << std::endl;

Outputs:

00123

for me.

OK, well, that's exactly what I would expect.
I'm quite surprised that CodeWarrior defaults to storing the file as wide
characters, though. Is it big- or little-endian, 16-bit or 32-bit,
depending on the platform? Is there any simple way of controlling
this?
I assume, then, that there are no requirements in the standard regarding the
default behaviour when converting wchar_t's to char's for file output.
I pretty much always use UTF-8 these days: at least until Chinese
becomes the lingua franca of cyberspace, it seems about the best choice!

Sep 29 '05 #6

Howard Hinnant

In article <11********************@g14g2000cwa.googlegroups.com>,
wi******@hotmail.com wrote:

Quite surprised that CodeWarrior defaults to storing the file as wide
characters though - is it big or little endian, 16-bit or 32-bit
depending on the platform?

Yes. That is, it is whatever wchar_t is on that platform (an on-disk image
of what was in memory).

Does it have any simple ways of controlling
this?
I assume then there's no requirements in the standard regarding the
default behaviour when converting wchar_t's to char's for file output.
I pretty much always use UTF8 these days: at least until Chinese
becomes the lingua franca of cyberspace it seems about the best choice!

You can, of course, write your own codecvt and install (imbue) it into your
streams. The library also comes with several prewritten codecvts, including
one for UTF-8:

std::__ucs_2
std::__jis
std::__shift_jis
std::__euc
std::__utf_8

All of these prewritten codecvts are templated on the internal
character type and will work with either 16- or 32-bit internal character
types, which need not be wchar_t.

These can simply be picked up and installed the same way you would
install your own codecvt. There is also a locale data file format that
can be used to control which prewritten codecvt gets installed into a
locale.
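The prewritten facets listed above are specific to that library. As a portable illustration of the same install/imbue step, here is a sketch that uses the standard `std::codecvt_utf8<wchar_t>` facet from `<codecvt>`. That facet was added in C++11 and deprecated in C++17, so treat it as a stand-in rather than a recommendation. The helper names and the file name are hypothetical.

```cpp
#include <cassert>   // for checking the usage example
#include <codecvt>   // std::codecvt_utf8 (C++11; deprecated since C++17)
#include <cstddef>
#include <fstream>
#include <locale>
#include <string>

// Copy of `base` whose wchar_t <-> char conversion is UTF-8.
// The locale takes ownership of the facet and deletes it when done.
std::locale with_utf8(const std::locale& base = std::locale()) {
    return std::locale(base, new std::codecvt_utf8<wchar_t>);
}

// Write `text` to `path` as UTF-8, then read it back with the same facet.
// The facet is imbued before open(), so no conversion has started yet.
std::wstring utf8_round_trip(const char* path, const std::wstring& text) {
    {
        std::wofstream out;
        out.imbue(with_utf8());
        out.open(path);
        out << text;
    }  // destructor flushes and closes the file
    std::wifstream in;
    in.imbue(with_utf8());
    in.open(path);
    std::wstring back;
    std::getline(in, back, L'\0');  // reads everything; the text has no L'\0'
    return back;
}

// Size of the file in bytes, measured in binary mode.
std::size_t file_size_bytes(const char* path) {
    std::ifstream f(path, std::ios::binary | std::ios::ate);
    return static_cast<std::size_t>(f.tellg());
}
```

With L"0123456789\u00e9", the round trip should return the same text and leave a 12-byte file: one byte per digit plus two bytes for U+00E9. The safest approach is to imbue before opening the file. Changing a file stream's conversion after it has started converting is not reliably supported for every encoding.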

You are correct in your assumption about the requirements on the default
encoding scheme. It is supposed to be whatever is appropriate for the
vendor's customers on a given platform.

For better or worse I decided years ago that storing the whole wchar_t
on disk, unencoded, was the most obvious thing to do, and thus a good
default. At the time I was aware of the popular "drop the high byte(s)"
encoding but the resultant loss of information made me nervous. I also
worked with the standards committee to try to ensure that a "don't
encode" default behavior was conforming.

http://www.open-std.org/jtc1/sc22/wg...fects.html#305

Fyi, CodeWarrior for Windows is no longer for sale, at least not from
Metrowerks/Freescale.

-Howard

Sep 29 '05 #7

kanze

wi******@hotmail.com wrote:

Is there any reason according to the standard that calling
tellg() on an std::ifstream after a call to peek() could place
the filebuf in an inconsistent state?
I think it's a bug in the VC7 dinkumware implementation (and
I've reported to them as such), but the following code

std::ofstream ofs("test.txt");
ofs << "0123456789 ";
ofs.close();
std::wifstream ifs("test.txt");

Careful. You're reading a file written with narrow characters
as if it contained wide characters. Any results will depend on
the locale; whether they're useful or sensible is almost pure
luck.

On most modern machines, narrow characters use an encoding in
which all of the characters in the basic character set are
ASCII. If this is the case, imbuing a UTF-8 locale should allow
reading them correctly. Still, IMHO, if you want the file to
contain UTF-8, you should write it with a wofstream imbued with
a UTF-8 locale.

The "C" locale depends on a lot of things; I don't think the
standard actually says what it should be in this case. And of
course, most programs will have done a std::locale::global(
std::locale( "" ) ) as the first thing in main; under Unix, at
least, this sets the locale to a value determined by environment
variables.
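The idiom James mentions looks roughly like the sketch below. The helper name is hypothetical. It falls back to the classic "C" locale because std::locale("") throws std::runtime_error when the environment names a locale the implementation does not support. Streams created before the call keep their old locale, which is why the sketch re-imbues std::wcout explicitly.

```cpp
#include <cassert>    // for checking the usage example
#include <iostream>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: make the environment's locale the global C++ locale.
// Because the resulting locale has a name, std::locale::global also sets the
// C library's locale to match.
std::string install_environment_locale() {
    try {
        std::locale::global(std::locale(""));
    } catch (const std::runtime_error&) {
        std::locale::global(std::locale::classic());
    }
    // Existing streams are not updated automatically.
    std::wcout.imbue(std::locale());
    return std::locale().name();
}
```

On Unix, the returned name reflects the LANG/LC_* environment variables. It might be a single name such as "C" or "en_US.UTF-8", or a composite name if the categories differ.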

Off hand, from what little I know of Windows, I would expect the
default to use UTF-16LE, not UTF-8. In which case, you're likely
to get some very strange results: letters from strange
alphabets, or illegal characters.

std::wcout << wchar_t(ifs.peek());
ifs.tellg();
std::wcout << wchar_t(ifs.peek()); ifs.get();
std::wcout << wchar_t(ifs.peek()); ifs.get();
std::wcout << wchar_t(ifs.peek()); ifs.get();
std::wcout << wchar_t(ifs.peek()); ifs.get();
std::wcout << std::endl;

Prints out 00246, when I would expect 00123. Remove the
tellg() (or move it to after a get) and it prints exactly
that.

I'm not sure what effect the tellg() has -- that part seems
strange. But for the rest, I'd say that you're playing with
undefined, or poorly defined behavior. (Not necessarily
undefined behavior in the sense of the standard, but in the
sense that there really aren't any requirements as to what the
results should be.)

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Sep 29 '05 #8

wizofaus

kanze wrote:

wi******@hotmail.com wrote:
Is there any reason according to the standard that calling
tellg() on an std::ifstream after a call to peek() could place
the filebuf in an inconsistent state?
I think it's a bug in the VC7 dinkumware implementation (and
I've reported to them as such), but the following code

std::ofstream ofs("test.txt");
ofs << "0123456789 ";
ofs.close();
std::wifstream ifs("test.txt");

Careful. You're reading a file written with narrow characters
as if it contained wide characters. Any results will depend on
the locale; whether they're useful or sensible is almost pure
luck.

Yes, I should have provided the example using std::wofstream (and
L"0123456789 "). Exactly the same problem occurs (but not when using
narrow streams for both input and output).

Off hand, from what little I know of Windows, I would expect the
default to use UTF-16LE, not UTF-8. In which case, you're likely
to get some very strange results: letters from strange
alphabets, or illegal characters.

The Dinkumware implementation, and indeed, I think, all the others I've
used, default to simply converting wchar_t's to char's (so any values
over 255 cannot be stored).
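Below is a rough model of the default described here, where each wchar_t is narrowed to one char. It is a hypothetical sketch, not any vendor's code. As P.J. Plauger notes in his reply, a real library may instead use the Standard C library's locale-dependent conversions. Rather than silently dropping the high byte(s), this sketch reports failure for values that do not fit.

```cpp
#include <cassert>   // for checking the usage example
#include <optional>
#include <string>

// Hypothetical model of a default that narrows each wchar_t to one char.
// Values above 255 have no single-byte representation, so this sketch
// returns std::nullopt instead of silently losing information.
std::optional<std::string> narrow_bytewise(const std::wstring& wide) {
    std::string out;
    out.reserve(wide.size());
    for (wchar_t wc : wide) {
        // wchar_t may be signed, so negative values are rejected too.
        if (wc < 0 || static_cast<unsigned long>(wc) > 255UL)
            return std::nullopt;
        out.push_back(static_cast<char>(static_cast<unsigned char>(wc)));
    }
    return out;
}
```

L"0123456789" narrows cleanly, and L"\u00e9" fits in one byte. L"\u0100" already has no single-byte form, which is the information loss that made Howard Hinnant wary of this default.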

Oct 1 '05 #9

P.J. Plauger

<wi******@hotmail.com> wrote in message
news:11*********************@g49g2000cwa.googlegroups.com...

Off hand, from what little I know of Windows, I would expect the
default to use UTF-16LE, not UTF-8. In which case, you're likely
to get some very strange results: letters from strange
alphabets, or illegal characters.

The Dinkumware implementation, and indeed I think all the others I've
used default to simply converting wchar_t's to char's (thus any values
over 255 cannot be stored).

Yes, we use the same conversions as the Standard C library by default.
But we also supply just about every conversion you can imagine for
C++ with our codecvt library in our CoreX product.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com


Oct 2 '05 #10
