The tellg bug


I know that this has been posted before on several other newsgroups, but I
need to make sure I got this right, so I hope you can forgive me for
posting this.

In MSVC 6.0, and also in several Borland C++ compilers from what I can see
from newsgroup postings, ifstream::tellg() alters the position of the file
reading pointer when reading UNIX files (only LF character, not CRLF) in
text mode. I can see why it does this, keeping consistency while treating
CRLF as a single character.

Using subsequent getline(...)-calls, no problems arise, but once I need
to save a position with tellg, to be able to seek back to this position
with seekg later, problems arise if the file has accidentally been
converted to UNIX LF-format. I know I can solve this by opening the file
in binary mode, but then I have to write my own code handling the
reading of lines and different newline characters.

My questions are:
* Is this compiler-dependent, or a general problem with text-mode file
reading? Does the standard specify anything about this?
* Is it impossible to write a program using only standard library
functions, that handles tellg/seekg positioning with both UNIX/DOS files
in text mode? (Not to mention Mac-files...)

I know I'm not the first one that has encountered this problem, so I would
expect that somewhere someone has solved this before...

Finally, another question: Does anyone know about a good online
tutorial/reference for Windows programming with C++? Or can
someone alternatively tell me which newsgroup I should rather have posted
that question to...
- Eivind Grimsby Haarr

"Trying is the first step towards failure."
- Homer Simpson
Jul 22 '05 #1

"Eivind Grimsby Haarr" <ha***@stud.ntnu.no> wrote in message
news:Pi*******************************@leopard.stud.ntnu.no...

Since I have little experience with 'tellg()', I'll let
someone else address that issue.

> Finally, another question: Does anyone know about a good online
> tutorial/reference for Windows programming with C++?

I like the tutorials at www.relisoft.com
YMMV. In any case, I'd recommend going through the Petzold book
(5th edition) first (which uses C) for learning the fundamentals.

> Or can
> someone alternatively tell me which newsgroup I should rather have posted
> that question to...

Good advice re. Windows programming is available at newsgroup
comp.os.ms-windows.programmer.win32

-Mike
Jul 22 '05 #2

"Eivind Grimsby Haarr" <ha***@stud.ntnu.no> wrote in message
news:Pi*******************************@leopard.stud.ntnu.no...

> My questions are:
> * Is this compiler-dependent, or a general problem with text-mode file
> reading? Does the standard specify anything about this?

The standard specifies that if you open a file in text mode then only four
versions of seekg are going to work.

1) Seek to the start of a file
2) Seek to the end of a file
3) Seek to the current position
4) Seek to a position previously saved with tellg.

This last one seems to be the one you are interested in. Although I don't
get the bit about 'accidentally converted to UNIX LF-format'. If you're
writing the program you should be able to stop anything being accidentally
converted.

On some systems with some compilers you may get other possibilities to work,
but these are the only ones guaranteed by the standard.

> * Is it impossible to write a program using only standard library
> functions, that handles tellg/seekg positioning with both UNIX/DOS files
> in text mode? (Not to mention Mac-files...)

It's perfectly possible provided you stick to the four possibilities above.
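
To illustrate the fourth case, here is a minimal sketch that is not from the original posts. The helper names line_at and second_line and the file path passed in are hypothetical. It only ever seeks to a value that tellg() returned earlier on the same stream, which is the pattern described above as guaranteed. Whether a particular library copes with LF-only files in text mode is a separate matter.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: reads the line starting at a position that was
// previously returned by tellg() on this same stream.
std::string line_at(std::ifstream& in, std::ifstream::pos_type saved)
{
    in.clear();        // drop eofbit/failbit left over from earlier reads
    in.seekg(saved);   // case 4: seek to a position saved with tellg()
    std::string line;
    std::getline(in, line);
    return line;
}

// Hypothetical helper: returns the second line of a text file by
// remembering where it starts and seeking back to it later.
std::string second_line(const char* path)
{
    std::ifstream in(path);                           // text mode
    std::string line;
    std::getline(in, line);                           // consume line 1
    const std::ifstream::pos_type mark = in.tellg();  // start of line 2
    while (std::getline(in, line)) {}                 // read on to the end
    return line_at(in, mark);
}
```

The other three cases would be in.seekg(0, std::ios::beg), in.seekg(0, std::ios::end) and in.seekg(0, std::ios::cur). Doing arithmetic on a saved position is what falls outside the list.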

john
Jul 22 '05 #3

I can see I did not explain the problem thoroughly enough in the previous
posting.

The problem arises when reading a UNIX text file, where line feeds are
represented by the line feed character (one byte, '\n' or LF) only. In
DOS text files, the line feeds are represented by two characters ("\r\n",
carriage return and line feed).

An example:

If I have a file in UNIX text format, with line feed represented by a
single character, e.g.:

Line 1 in file\n
Line 2 in file\n
Line 3 in file

Using this code:

--------------

#include <fstream>
#include <iostream>
#include <string>

int main()
{
    std::ifstream fstrm("filename.txt");   // opened in text mode
    std::ios::pos_type tellg_result(0);
    std::string str;

    // Save position in file before reading the line
    tellg_result = fstrm.tellg();
    std::getline(fstrm, str);
    std::cout << str << std::endl;
    // Save position again
    tellg_result = fstrm.tellg();
    std::getline(fstrm, str);
    std::cout << str << std::endl;
}

--------------

This code would output:
Line 1 in file
ine 2 in file

Without the calls to tellg(), the output would be correct, matching
the file. Since the stream expects a line feed to consist of two characters,
tellg() actually moves the internal file pointer one byte when
encountering the UNIX type single line feed character.

Usually, somewhere internally in the stream classes, the two-character
line-feed in DOS files is converted to the single line feed character '\n'
when writing and reading. I guess this is done for portability, and it
also suggests that it should be possible to enable/disable this feature.

I'm reading a big set of text files that is shared on the net among many
users, and it often happens that the files are converted to and from UNIX
and DOS formats, some files ending up in UNIX format on my Windows system.
It seems very bothersome to have to write my own binary mode
read-functions, especially since I want my classes to be general-purpose,
accepting only an istream-reference, leaving it to the client to open the
file. Without knowing if the istream is an ifstream or something else, it
is impossible to test whether it is opened in binary mode or text mode.
(Or is it?)

I hope this made more sense, and I appreciate feedback of any type.
-eivind

Jul 22 '05 #4

"Eivind Grimsby Haarr" <ha***@stud.ntnu.no> wrote in message
news:Pi*******************************@leopard.stud.ntnu.no...

> Without the calls to tellg(), the output would be correct, similar to
> the file. Since the stream expects line feed to consist of two characters,
> tellg() actually moves the internal file pointer one byte when
> encountering the UNIX type single line feed character.

My compiler does not do that. It's smart enough to treat this case correctly.
However you have a file without correct line endings, which you are trying
to read as if it did have correct line endings, so I think all bets are off
and you shouldn't be too surprised that things don't work. So I'm not sure
I'd call this a bug but I'd certainly call it a deficiency in your library.

> Usually, somewhere internally in the stream classes, the two-character
> line-feed in DOS files is converted to the single line feed character '\n'
> when writing and reading. I guess this is done for portability, and it
> also suggests that it should be possible to enable/disable this feature.

That's correct (assuming that you are working on a DOS system of course).
And of course you disable it by opening the file in binary mode.

> I'm reading a big set of text files that is shared on the net among many
> users, and it often occurs that the files are converted to and from UNIX
> and DOS formats, some files ending up in UNIX format on my Windows system.
> It seems very bothersome to have to write my own binary mode
> read-functions, especially since I want my classes to be general-purpose,
> accepting only an istream-reference, leaving to the client to open the
> file. Without knowing if the istream is an ifstream or something else, it
> is impossible to test whether it is opened in binary mode or text mode.
> (Or is it?)

It is impossible in standard C++.

I think you are going to have to write your own version of a getline routine.
One that can cope with different line ending styles and/or files open in
binary or text mode. It also wouldn't hurt to document to your clients that
they should open files in binary mode. You might also need to use a
different compiler and/or C++ library, I don't like the way yours is
behaving.

john
Jul 22 '05 #5
On Fri, 3 Sep 2004 06:47:13 -0400, "P.J. Plauger" <pj*@dinkumware.com>
wrote in comp.lang.c++:
"John Harrison" <jo*************@hotmail.com> wrote in message
news:2p************@uni-berlin.de...
>>> I'm reading a big set of text files that is shared on the net among many
>>> users, and it often occurs that the files are converted to and from UNIX
>>> and DOS formats, some files ending up in UNIX format on my Windows system.
>>> It seems very bothersome to have to write my own binary mode
>>> read-functions, especially since I want my classes to be general-purpose,
>>> accepting only an istream-reference, leaving to the client to open the
>>> file. Without knowing if the istream is an ifstream or something else, it
>>> is impossible to test whether it is opened in binary mode or text mode.
>>> (Or is it?)
>>
>> It is impossible in standard C++.
>
> Nonsense.
>
>> I think you are going to have to write your own version of a getline
>> routine. One that can cope with different line ending styles and/or files
>> open in binary or text mode. It also wouldn't hurt to document to your
>> clients that they should open files in binary mode. You might also need
>> to use a different compiler and/or C++ library, I don't like the way
>> yours is behaving.
>
> Whoa, there. He's trying to deal with two kinds of "text" files:
>
> 1) those that end each line with CR/LF (standard DOS format)
>
> 2) those that end each line with LF (standard Unix format)
>
> If he reads all files in binary mode, each will have an LF at the
> end, which is the standard internal line terminator in C/C++
> ('\n'). Existing getline, etc. will work fine. The only issues I
> see are:
>
> 1) Do any CRs at the end of lines matter, or can they just be carried
> along? Worst case is you delete all CRs and hope that no text plays
> overstrike games with embedded CRs.
>
> 2) Do you want to produce canonical (CR/LF terminated) output from
> such arbitrary input? In that case CRs *do* matter and you have to
> be sure to write new files in text mode.
>
> No big deal.


I've had to deal with this quite a bit in communications routines in
the old days.

The simplest solution I found was to consider every '\r' as a newline.
Any '\n' immediately preceded by a '\r' is ignored, any '\n'
preceded by any other character is considered a newline.

Works quite well for '\r\n' (was CP/M in those days, MS-DOS wasn't
around yet), '\r' only (Apple and some others, the others mostly
defunct now), and Unix '\n' only. Even handled files produced by a
few perverse utilities on '\r\n' that would skip the '\r' on repeated
blank lines. That is:

line1
line2

line3

...would appear as:

"line1\r\nline2\r\n\nline3\n"

This would not correctly handle something that used '\n\r' to end
lines, but I knew of no such systems and never heard from any users
that ran into one.

In any case, this logic is quite simple to perform on files opened in
binary mode.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~a...FAQ-acllc.html
Jul 22 '05 #6
On Fri, 03 Sep 2004 23:50:45 -0500, Jack Klein wrote:
> The simplest solution I found was to consider every '\r' as a newline. Any
> '\n' immediately preceded by a '\r' is ignored, any '\n' preceded by any
> other character is considered a newline.
>
> Works quite well for '\r\n' (was CP/M in those days, MS-DOS wasn't around
> yet), '\r' only (Apple and some others, the others mostly defunct now),
> and Unix '\n' only. Even handled files produced by a few perverse
> utilities on '\r\n' that would skip the '\r' on repeated blank lines.
> That is:
>
> line1
> line2
>
> line3
>
> ...would appear as:
>
> "line1\r\nline2\r\n\nline3\n"


That's only perverse if you're not familiar with the origins of "carriage
return" versus "line feed". (It is perverse in the modern sense of "line
break" as a separator between lines, but that's newer than ASCII.)

--
Some say the Wired doesn't have political borders like the real world,
but there are far too many nonsense-spouting anarchists or idiots who
think that pranks are a revolution.

Jul 22 '05 #7
