help needed using ifstream::seekg with windows text file

Hello.
I've searched all over and haven't seen another thread with this
problem. Please bear with me as I try to explain. Thanks. :)

I have some programs that need to be cross-platform compatible (Unix
and Windows XP). The first program parses a text file and records where
each snippet is: where it begins (char offset from the beginning of
the file) and its length (number of chars).

One can almost use "byte" and "char" interchangeably here, given that
sizeof(char) is 1; however, it doesn't quite work that way.

The second program tries to retrieve snippets from this text file using
the information collected by program1. program2 uses seekg and read,
something like:

char* snippet = new char[length + 1];
ifstream read(file); // OR: ifstream read(file, ios::binary);
read.seekg(offset, ios::beg);
read.read(snippet, length);
read.close();
snippet[length] = '\0';

The problem is that with the code as above in text mode, read reads in
the requested number of characters, but seekg seeks by number of bytes.
So, if I seek to 0, I read in exactly what I need. If I seek to any
offset > 0, it doesn't land on the right place: it falls short, and the
number of characters it falls short by appears to be correlated with
the number of newlines before that point in the file. (read still reads
in the correct number of characters from that point.)

I know that Windows and Unix treat newlines differently, but I can't
quite understand the behavior on Windows well enough to get it to do
what I want. It behaves as if a newline is 1 char with a size of 2
bytes (although if I go through the file character by character and
print out sizeof(*c), I never get 2; it's always 1).
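
For reference, a Windows-style line ending is two separate chars, '\r'
followed by '\n', each of size 1; a text-mode stream on Windows
typically presents the pair as a single '\n', which would account for
the mismatch described above. The following is only a sketch of how a
character-counted offset could be mapped to a byte offset, assuming the
raw bytes of the file are already in a string.
byte_offset_for_char_offset is a hypothetical helper, not part of
either program.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: convert an offset counted in text-mode characters
// (where each "\r\n" pair was seen as one '\n') into an offset in raw bytes.
std::size_t byte_offset_for_char_offset(const std::string& raw,
                                        std::size_t char_offset) {
    std::size_t bytes = 0;  // position in the raw byte sequence
    std::size_t chars = 0;  // characters counted so far, text-mode style
    while (chars < char_offset && bytes < raw.size()) {
        const bool crlf = raw[bytes] == '\r' &&
                          bytes + 1 < raw.size() &&
                          raw[bytes + 1] == '\n';
        bytes += crlf ? 2 : 1;  // a CRLF pair is two bytes but one character
        ++chars;
    }
    return bytes;
}
```

With this counting, each newline before the target adds one extra byte,
which matches the "short by the number of newlines" symptom.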

If I open the file in binary mode, both seekg and read work in terms of
bytes. Successive seeks read in snippets one right after the other as
they appear in the text file, with no overlap (actually, it drops 1
character in between, for which I have no explanation). And each
snippet has fewer visible characters than its length.
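
A minimal sketch of the binary-mode read, assuming the stored offset and
length are byte counts rather than character counts. read_snippet is a
hypothetical name, and returning a std::string instead of a new[]
buffer is a substitution made here for brevity.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: read `length` bytes starting at byte `offset`.
// The file is opened in binary mode so no newline translation happens
// and seekg/read agree on what a position means.
std::string read_snippet(const std::string& path,
                         std::streamoff offset,
                         std::size_t length) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return std::string();  // could not open the file
    }
    in.seekg(offset, std::ios::beg);
    std::string snippet(length, '\0');
    in.read(&snippet[0], static_cast<std::streamsize>(length));
    // Keep only what was actually read (may be short near end of file).
    snippet.resize(static_cast<std::size_t>(in.gcount()));
    return snippet;
}
```

Any '\r' bytes inside the requested range come back as part of the
snippet, so the caller decides whether to strip them.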

This problem does not occur on Unix, and it does not occur on Windows
if I transfer a text file from Unix in "binary" mode so that the \n's
don't get replaced with Windows line endings. In these cases, program2
behaves as expected regardless of whether it's in binary or text mode.

Is there any way to get a file pointer to seek to a place in a text
file according to the number of characters, like the way read behaves
in text mode? This would be the simplest solution.

I guess one could say: get program1 to store the snippet information in
terms of bytes and not number of characters. I'm not sure how to do
that. What it does is count characters, and on Windows the newline is
still counted as 1 character. I guess I could add an extra byte every
time there is a Windows-style newline, but I'm not even sure my
assessment that a Windows newline = 1 char of 2 bytes is actually
true.
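
One possible way to have program1 record byte positions is to scan the
file in binary mode, so every byte (including '\r') is counted. This is
a sketch only: it treats each line as a "snippet" purely for
illustration, and Span and index_lines are hypothetical names.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical record: where a snippet starts and how long it is, in bytes.
struct Span {
    std::size_t offset;
    std::size_t length;
};

// Hypothetical indexer: one Span per line, measured in raw bytes.
// The length excludes the line terminator ("\n" or "\r\n").
std::vector<Span> index_lines(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    const std::string raw((std::istreambuf_iterator<char>(in)),
                          std::istreambuf_iterator<char>());
    std::vector<Span> spans;
    std::size_t start = 0;
    while (start < raw.size()) {
        std::size_t end = raw.find('\n', start);
        if (end == std::string::npos) {
            end = raw.size();  // last line has no terminator
        }
        std::size_t length = end - start;
        if (length > 0 && raw[end - 1] == '\r') {
            --length;  // drop the '\r' of a Windows-style ending
        }
        spans.push_back(Span{start, length});
        start = end + 1;  // skip past the '\n'
    }
    return spans;
}
```

Offsets recorded this way are byte positions, so they only line up with
a reader that also opens the file in binary mode.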

What would be a more elegant solution that would work on both
platforms?

Thank you for your help.
Jul 22 '05 #1

"wtnt" <wt**@konzoo.com> wrote in message
news:4f**************************@posting.google.com...
> Hello.
> I've searched all over and haven't seen another thread with this
> problem. Please bear with me as I try to explain. thanks. :)
>
> I have some programs that need to be cross-platform compatible (unix
> and windowsXP). The first program parses a text file and records where
> snippets are in terms of where it begins (char offset from begin of
> the file) and length (number of chars).
>
> One can almost use "byte" and "char" interchangeably here, given that
> sizeof(char) is 1, however it doesn't quite work that way.


You simply can't store offsets into files portably in text mode. In
binary mode you can, if you remember that there may be an arbitrary
number of null characters added at the end of the file. See the
Dinkumware online documentation for an explanation.
(http://www.dinkumware.com/refxcpp.html. Go to the C++ table of
contents, and look under "Files and Streams"). Alternatively, see
P.J. Plauger's book on the C standard library.

Jonathan
Jul 22 '05 #2
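
One way to read the reply: in text mode, the positions that can be
relied on for seeking are the start of the file and values previously
returned by tellg on the same file. A minimal sketch under that
assumption follows; record_line_starts and line_at are hypothetical
helpers, and the recorded positions are meant to be used within the
same run on the same file, not stored for use on another platform.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: remember the stream position at the start of
// each line, as reported by tellg, instead of counting characters.
std::vector<std::streampos> record_line_starts(std::ifstream& in) {
    std::vector<std::streampos> starts;
    std::string line;
    for (;;) {
        const std::streampos pos = in.tellg();
        if (!std::getline(in, line)) {
            break;  // nothing more to read
        }
        starts.push_back(pos);
    }
    in.clear();  // reset eof/fail so the stream can be reused
    return starts;
}

// Hypothetical helper: seek back to a recorded position and read one line.
std::string line_at(std::ifstream& in, std::streampos pos) {
    in.clear();
    in.seekg(pos);
    std::string line;
    std::getline(in, line);
    return line;
}
```

The design choice here is to let the stream itself report positions, so
the code never has to guess how newlines are stored on disk.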
