DOS vs UNIX style text files

I realize this is a somewhat platform specific question, but I think it is
still of general enough interest to ask it here ... if I am wrong I guess I
will find out 8*).

As we all know, DOS uses two characters (carriage-return and line-feed), to
signal the end of a line, while UNIX uses only one (line-feed). When using
getline in C++, one can only specify a single character as the terminator
(default is '\n'), so if you read a line of text from a DOS-style text file
into a string, there is still a carriage return on the end of it. This then
causes problems, particularly if I want to later concatenate two strings
read in this way.

Perhaps Windoze-based compilers automatically set things up so that both of
the terminator characters are removed and added as needed, but I am using
g++ on cygwin, and I have to deal with this myself. So, is there a general
technique for dealing with this? I don't really want to have to check the
last character each time I read in a string with getline, and remove it if
it is a carriage-return. Actually, I don't even know how I would do that
offhand .. I guess look up ^CR in an ASCII table and check it using the
octal value? Any help would be appreciated.

TIA,

Dave Moore
Jul 23 '05 #1
If you open a file that you know _may_ contain \r, just discard them
from the lines before you process your lines further.

V
Jul 23 '05 #2

Dave Moore wrote:
Perhaps Windoze-based compilers automatically set things up so that both of the terminator characters are removed and added as needed, but I am using g++ on cygwin, and I have to deal with this myself.

The OS should be doing it. I believe there is hackery with mounting
mode in cygwin.

So, is there a general technique for dealing with this?

Usually you open your file in text mode. With cygwin I believe that
folder or whatever has to be 'mounted' in text mode as well...or
something of that order. Read docs in cygwin about mounting.

Jul 23 '05 #3

Standard C++ defines a single (abstract) type 'char' value which
denotes 'newline' ('\n'). It does not specify its numeric value
or a mapping to a particular character set. The implementation is
responsible for translating between an external 'end-of-line'
indicator and '\n'. (This happens for streams opened in 'text mode'
(the default).)

If a stream is opened in 'binary mode', no such translation occurs
(however, there may still be a conversion from the 'external' to
'internal' [i.e. in-memory] encoding). IOW in 'binary mode',
'newline' has no meaning.

If you're opening your streams in text mode, and your compiler
is failing to do the proper translations to/from '\n', then
it's non-compliant, broken, or not configured correctly.
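
One way to see which translation an implementation actually applies is a small probe like the one below. This is only a sketch: the function name native_eol_bytes and the scratch file path are assumptions of mine. It writes a single '\n' through a text-mode stream and reads the raw bytes back in binary mode. On a runtime that follows the DOS convention I would expect the two bytes CR LF; on a UNIX-style runtime a single LF.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical probe: writes one '\n' in text mode to 'path', then reads
// the file back in binary mode and returns the bytes that were stored.
// The scratch file is deleted before returning.
std::string native_eol_bytes(const char* path)
{
    {
        std::ofstream out(path);                    // text mode (the default)
        out << '\n';
    }                                               // stream closed here

    std::string bytes;
    {
        std::ifstream in(path, std::ios::binary);   // no translation
        char c;
        while (in.get(c))
            bytes += c;
    }

    std::remove(path);                              // clean up the scratch file
    return bytes;
}
```

If the result is a lone LF, then that runtime treats text files as UNIX-style, and a CR in a DOS-style input file will come through to your program untouched.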

Everything You Ever Wanted To Know About C++ Streams:
http://www.langer.camelot.de/iostreams.html
-Mike

Jul 23 '05 #4
"Mike Wahler" <mk******@mkwahler.net> wrote in message
news:PM*****************@newsread1.news.pas.earthlink.net...
If you're opening your streams in text mode, and your compiler
is failing to do the proper translations to/from '\n', then
it's non-compliant, broken, or not configured correctly.


I think I spoke too soon. Rereading your message, I see
you're trying to read a 'foreign' file format. This means
you'll have to manage the translations yourself. Or alternatively
there exist utilities which can convert files between "DOS text"
and "UNIX text" formats. That might make things easier for you.
Check google.

-Mike
Jul 23 '05 #5
Dave Moore wrote:
I realize this is a somewhat platform specific question,
but I think it is still of general enough interest to ask it here.

This is a perfectly valid C++ question.

If I am wrong, I guess I will find out 8*). As we all know, DOS uses two characters (carriage-return and line-feed)
to signal the end of a line, while UNIX uses only one (line-feed).
When using getline in C++, one can only specify a single character
as the terminator (default is '\n'),
so, if you read a line of text from a DOS-style text file into a string,
there is still a carriage return on the end of it.
This, then, causes problems,
particularly if I want to later concatenate two strings read in this way.

Perhaps Windoze-based compilers automatically set things up
so that both of the terminator characters are removed and added as needed,
but I am using g++ on cygwin, and I have to deal with this myself.

No! The GNU C++ compiler on cygwin will do this for you too.

So, is there a general technique for dealing with this?

Open the file in text mode. This converts
the carriage-return/line-feed sequence to a line-feed on input and
the line-feed to a carriage-return/line-feed sequence on output.

I don't really want to have to check the last character
each time I read in a string with getline
and remove it if it is a carriage-return.
Actually, I don't even know how I would do that offhand.
I guess look up ^CR in an ASCII table and check it using the octal value?
Any help would be appreciated.


If you need to see the carriage-return/linefeed sequence
in your program, open the file in binary mode:

std::ifstream input("input_file_name", std::ios::binary);
Jul 23 '05 #6

"Dave Moore" <dt*****@email.unc.edu> wrote in message
news:37*************@individual.net...
I realize this is a somewhat platform specific question, but I think it is
still of general enough interest to ask it here ... if I am wrong I guess I will find out 8*).

As we all know, DOS uses two characters (carriage-return and line-feed), to signal the end of a line, while UNIX uses only one (line-feed). When using getline in C++, one can only specify a single character as the terminator
(default is '\n'), so if you read a line of text from a DOS-style text file into a string,

using a UNIX implementation

there is still a carriage return on the end of it. This then
causes problems, particularly if I want to later concatenate two strings
read in this way.

Perhaps Windoze-based compilers automatically set things up so that both of the terminator characters are removed and added as needed,

Using a Windows implementation, 'end of line' indicators
in the file are automatically translated to '\n' (which
C++ does not assign a specific value).

but I am using g++ on cygwin, and I have to deal with this myself. So, is there a general technique for dealing with this? I don't really want to have to check the
last character each time I read in a string with getline, and remove it if
it is a carriage-return. Actually, I don't even know how I would do that
offhand .. I guess look up ^CR in an ASCII table and check it using the
octal value? Any help would be appreciated.


#include <fstream>
#include <iostream>
#include <istream>
#include <string>

/*
   Extracts a string from the stream 'is', using
   default terminator '\n', and stores the string
   in 'line'. If the last character of the extracted
   string is equal to 'rem', removes it. Returns a
   reference to 'is'.
*/
std::istream& get_xlate_line(std::istream& is,
                             std::string& line,
                             char rem = '\r')
{
    std::getline(is, line);

    if(!line.empty())
    {
        std::string::iterator e(line.end() - 1);
        if(*e == rem)
            line.erase(e);
    }

    return is;
}

/* extract and output strings from a file */
int main()
{
    std::ifstream ifs("filename");
    std::string line;

    while(get_xlate_line(ifs, line))
        std::cout << line << '\n';

    return 0;
}

-Mike
Jul 23 '05 #7
"Noah Roberts" <nr******@stmartin.edu> wrote in message
news:11*********************@c13g2000cwb.googlegroups.com...

Dave Moore wrote:
Perhaps Windoze-based compilers automatically set things up so that both of
the terminator characters are removed and added as needed, but I am using
g++ on cygwin, and I have to deal with this myself.

The OS should be doing it. I believe there is hackery with mounting
mode in cygwin.

So, is there a general technique for dealing with this?

Usually you open your file in text mode. With cygwin I believe that
folder or whatever has to be 'mounted' in text mode as well...or
something of that order. Read docs in cygwin about mounting.

It seems that something a bit different is going on, but your reply led me
in the right direction. I was compiling my executable to use the cygwin
run-time environment (cygwin.dll), rather than the windows environment. I
am pretty sure I set up my cygwin installation to use unix-style text files,
so that might well explain the confusion.

Once I compiled my program to use the windows environment
(using -mno-cygwin, as specified in the cygwin FAQ), everything was groovy.
Thanks for the suggestion!

Dave Moore
Jul 23 '05 #8
Noah Roberts wrote:
Dave Moore wrote:
Perhaps Windoze-based compilers automatically set things up so that both of
the terminator characters are removed and added as needed, but I am using
g++ on cygwin, and I have to deal with this myself.

The OS should be doing it. I believe there is hackery with mounting
mode in cygwin.

The OS might do it, but I rarely see that. The translation is done in the
language runtime library. What is most likely confusing things here is that
the Cygwin environment has the compiler thinking that no conversion is
needed, but he's giving it files from the DOS world.
Jul 23 '05 #9
