
Dos vs Unix style text files

I realize this is a somewhat platform specific question, but I think it is
still of general enough interest to ask it here ... if I am wrong I guess I
will find out 8*).

As we all know, DOS uses two characters (carriage-return and line-feed), to
signal the end of a line, while UNIX uses only one (line-feed). When using
getline in C++, one can only specify a single character as the terminator
(default is '\n'), so if you read a line of text from a DOS-style text file
into a string, there is still a carriage return on the end of it. This then
causes problems, particularly if I want to later concatenate two strings
read in this way.

Perhaps Windoze-based compilers automatically set things up so that both of
the terminator characters are removed and added as needed, but I am using
g++ on cygwin, and I have to deal with this myself. So, is there a general
technique for dealing with this? I don't really want to have to check the
last character each time I read in a string with getline, and remove it if
it is a carriage-return. Actually, I don't even know how I would do that
offhand .. I guess look up ^CR in an ASCII table and check it using the
octal value? Any help would be appreciated.

TIA,

Dave Moore
Jul 23 '05 #1
8 Replies


If you open a file that you know _may_ contain \r, just discard them
from the lines before you process your lines further.

V
Jul 23 '05 #2
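
One way to read the advice in reply #2, as a minimal sketch: strip every '\r' from a line after reading it, using the standard erase/remove idiom. The helper name `discard_cr` is hypothetical and does not come from the thread; note that it removes carriage returns anywhere in the line, not only at the end.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper (not from the thread): returns a copy of 'line'
// with every carriage return removed, wherever it appears.
std::string discard_cr(std::string line)
{
    // std::remove shifts the kept characters forward and returns the new
    // logical end; erase then trims the leftover tail.
    line.erase(std::remove(line.begin(), line.end(), '\r'), line.end());
    return line;
}
```

Calling `discard_cr(line)` right after `std::getline(in, line)` leaves Unix-style lines untouched, so the same code can handle both file formats.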

Dave Moore wrote:
> Perhaps Windoze-based compilers automatically set things up so that both of
> the terminator characters are removed and added as needed, but I am using
> g++ on cygwin, and I have to deal with this myself.

The OS should be doing it. I believe there is some hackery with the
mounting mode in cygwin.

> So, is there a general technique for dealing with this?

Usually you open your file in text mode. With cygwin, I believe the
folder (or whatever it is) has to be 'mounted' in text mode as well, or
something of that order. Read the cygwin docs about mounting.

Jul 23 '05 #3


Standard C++ defines a single (abstract) type 'char' value which
denotes 'newline' ('\n'). It does not specify its numeric value
or a mapping to a particular character set. The implementation is
responsible for translating between an external 'end-of-line'
indicator and '\n'. (This happens for streams opened in 'text mode'
(the default).)

If a stream is opened in 'binary mode', no such translation occurs
(however, there may still be a conversion from the 'external' to
'internal' [i.e. in-memory] encoding). IOW in 'binary mode',
'newline' has no meaning.

If you're opening your streams in text mode, and your compiler
is failing to do the proper translations to/from '\n', then
it's non-compliant, broken, or not configured correctly.

Everything You Ever Wanted To Know About C++ Streams:
http://www.langer.camelot.de/iostreams.html
-Mike

Jul 23 '05 #4
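
To see what the text-mode translation described in reply #4 does on a given implementation, a small probe can help. This is only a sketch: the function name `bytes_written_for_newline` and the scratch-file approach are assumptions, not anything from the thread. It writes a single '\n' through a text-mode stream and then counts the bytes in the file by reading it back in binary mode.

```cpp
#include <cstddef>
#include <cstdio>
#include <fstream>

// Hypothetical probe: writes one '\n' in text mode, then counts the bytes
// that actually reached the file by reading it back in binary mode.
// 'path' is a scratch file name chosen by the caller; it is deleted afterwards.
std::size_t bytes_written_for_newline(const char* path)
{
    {
        std::ofstream out(path);   // text mode is the default
        out << '\n';
    }                              // leaving the scope closes and flushes

    std::ifstream in(path, std::ios::binary);
    std::size_t count = 0;
    char c;
    while (in.get(c))
        ++count;
    in.close();

    std::remove(path);             // the one-argument std::remove from <cstdio>
    return count;
}
```

A result of 1 means the runtime writes '\n' as a single byte (Unix-style); a result of 2 means it expands '\n' to a carriage-return/line-feed pair (DOS-style).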

"Mike Wahler" <mk******@mkwahler.net> wrote in message
news:PM*****************@newsread1.news.pas.earthlink.net...
> If you're opening your streams in text mode, and your compiler
> is failing to do the proper translations to/from '\n', then
> it's non-compliant, broken, or not configured correctly.


I think I spoke too soon. Rereading your message, I see
you're trying to read a 'foreign' file format. This means
you'll have to manage the translations yourself. Or alternatively
there exist utilities which can convert files between "DOS text"
and "UNIX text" formats. That might make things easier for you.
Check google.

-Mike
Jul 23 '05 #5
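
The kind of conversion utility mentioned in reply #5 can also be approximated in a few lines of C++. The following is a minimal sketch, not a replacement for an existing tool; the name `crlf_to_lf` is hypothetical. It assumes that any file streams passed in were opened in binary mode, so that the runtime does no translation of its own.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical converter: copies 'in' to 'out', turning each "\r\n" pair
// into a single '\n'. A '\r' that is not followed by '\n' is copied as-is.
void crlf_to_lf(std::istream& in, std::ostream& out)
{
    char c;
    while (in.get(c))
    {
        if (c == '\r' && in.peek() == '\n')
            continue;              // drop the '\r'; the '\n' is copied next
        out.put(c);
    }
}

// Convenience wrapper for in-memory text.
std::string crlf_to_lf(const std::string& text)
{
    std::istringstream in(text);
    std::ostringstream out;
    crlf_to_lf(in, out);
    return out.str();
}
```

Text that already uses Unix-style line endings passes through unchanged, so running the conversion twice is harmless.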

Dave Moore wrote:
> I realize this is a somewhat platform specific question,
> but I think it is still of general enough interest to ask it here.

This is a perfectly valid C++ question.

> If I am wrong, I guess I will find out 8*).
> As we all know, DOS uses two characters (carriage-return and line-feed),
> to signal the end of a line, while UNIX uses only one (line-feed).
> When using getline in C++,
> one can only specify a single character as the terminator
> (default is '\n'),
> so, if you read a line of text from a DOS-style text file into a string,
> there is still a carriage return on the end of it.
> This, then, causes problems,
> particularly if I want to later concatenate two strings read in this way.
>
> Perhaps Windoze-based compilers automatically set things up
> so that both of the terminator characters are removed and added as needed,
> but I am using g++ on cygwin, and I have to deal with this myself.

No! The GNU C++ compiler on cygwin will do this for you too.

> So, is there a general technique for dealing with this?

Open the file in text mode. This converts
the carriage-return/line-feed sequence to a line-feed on input and
a line-feed to a carriage-return/line-feed sequence on output.

> I don't really want to have to check the last character
> each time I read in a string with getline
> and remove it if it is a carriage-return.
> Actually, I don't even know how I would do that offhand.
> I guess look up ^CR in an ASCII table and check it using the octal value?
> Any help would be appreciated.


If you need to see the carriage-return/linefeed sequence
in your program, open the file in binary mode:

std::ifstream input("input_file_name", std::ios::binary);
Jul 23 '05 #6
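
To illustrate the binary-mode suggestion in reply #6, here is a small sketch showing that the carriage return stays visible when nothing translates it. The function name `first_line_in_binary_mode` and the scratch file are assumptions made for the example. The file is both written and read in binary mode so that the bytes are the same on every platform.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical demo: writes DOS-style text to a scratch file in binary mode
// (so the bytes land exactly as given), reads the first line back in binary
// mode, and returns it. 'path' is deleted afterwards.
std::string first_line_in_binary_mode(const char* path)
{
    {
        std::ofstream out(path, std::ios::binary);
        out << "first\r\nsecond\r\n";
    }                              // leaving the scope closes and flushes

    std::ifstream in(path, std::ios::binary);
    std::string line;
    std::getline(in, line);        // stops at '\n'; the '\r' stays in 'line'
    in.close();

    std::remove(path);
    return line;
}
```

The returned string is "first\r", six characters long, which is exactly the trailing carriage return the original question describes.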

"Dave Moore" <dt*****@email.unc.edu> wrote in message
news:37*************@individual.net...
> I realize this is a somewhat platform specific question, but I think it is
> still of general enough interest to ask it here ... if I am wrong I guess I
> will find out 8*).
>
> As we all know, DOS uses two characters (carriage-return and line-feed), to
> signal the end of a line, while UNIX uses only one (line-feed). When using
> getline in C++, one can only specify a single character as the terminator
> (default is '\n'), so if you read a line of text from a DOS-style text file
> into a string,

using a UNIX implementation

> there is still a carriage return on the end of it. This then
> causes problems, particularly if I want to later concatenate two strings
> read in this way.
>
> Perhaps Windoze-based compilers automatically set things up so that both of
> the terminator characters are removed and added as needed,

Using a Windows implementation, 'end of line' indicators
in the file are automatically translated to '\n' (which
C++ does not assign a specific value).

> but I am using
> g++ on cygwin, and I have to deal with this myself. So, is there a general
> technique for dealing with this? I don't really want to have to check the
> last character each time I read in a string with getline, and remove it if
> it is a carriage-return. Actually, I don't even know how I would do that
> offhand .. I guess look up ^CR in an ASCII table and check it using the
> octal value? Any help would be appreciated.


#include <fstream>
#include <iostream>
#include <istream>
#include <string>

/*
    Extracts a string from the stream 'is', using
    default terminator '\n', and stores the string
    in 'line'. If the last character of the extracted
    string is equal to 'rem', removes it. Returns a
    reference to 'is'.
*/
std::istream& get_xlate_line(std::istream& is,
                             std::string& line,
                             char rem = '\r')
{
    std::getline(is, line);

    if(!line.empty())
    {
        std::string::iterator e(line.end() - 1);
        if(*e == rem)
            line.erase(e);
    }

    return is;
}

/* extract and output strings from a file */
int main()
{
    std::ifstream ifs("filename");
    std::string line;

    while(get_xlate_line(ifs, line))
        std::cout << line << '\n';

    return 0;
}

-Mike
Jul 23 '05 #7

"Noah Roberts" <nr******@stmartin.edu> wrote in message
news:11*********************@c13g2000cwb.googlegroups.com...
> Dave Moore wrote:
>> Perhaps Windoze-based compilers automatically set things up so that both of
>> the terminator characters are removed and added as needed, but I am using
>> g++ on cygwin, and I have to deal with this myself.
>
> The OS should be doing it. I believe there is hackery with mounting
> mode in cygwin.
>
>> So, is there a general technique for dealing with this?
>
> Usually you open your file in text mode. With cygwin I believe that
> folder or whatever has to be 'mounted' in text mode as well...or
> something of that order. Read docs in cygwin about mounting.

It seems that something a bit different is going on, but your reply led me
in the right direction. I was compiling my executable to use the cygwin
run-time environment (cygwin1.dll), rather than the windows environment. I
am pretty sure I set up my cygwin installation to use unix-style text files,
so that might well explain the confusion.

Once I compiled my program to use the windows environment
(using -mno-cygwin, as specified in the cygwin FAQ), everything was groovy.
Thanks for the suggestion!

Dave Moore
Jul 23 '05 #8

Noah Roberts wrote:
> Dave Moore wrote:
>> Perhaps Windoze-based compilers automatically set things up so that both of
>> the terminator characters are removed and added as needed, but I am using
>> g++ on cygwin, and I have to deal with this myself.
>
> The OS should be doing it. I believe there is hackery with mounting
> mode in cygwin.

The OS might do it, but I rarely see that. The translation is done in the
language runtime library. Most likely the confusion here is that the
CYGWIN environment has the compiler thinking that no conversion is
needed, but he's giving it files from the DOS world.
Jul 23 '05 #9
