Bytes | Software Development & Data Engineering Community

DOS vs. Unix style text files

I realize this is a somewhat platform specific question, but I think it is
still of general enough interest to ask it here ... if I am wrong I guess I
will find out 8*).

As we all know, DOS uses two characters (carriage-return and line-feed) to
signal the end of a line, while UNIX uses only one (line-feed). When using
getline in C++, one can only specify a single character as the terminator
(default is '\n'), so if you read a line of text from a DOS-style text file
into a string, there is still a carriage return on the end of it. This then
causes problems, particularly if I want to later concatenate two strings
read in this way.

Perhaps Windoze-based compilers automatically set things up so that both of
the terminator characters are removed and added as needed, but I am using
g++ on cygwin, and I have to deal with this myself. So, is there a general
technique for dealing with this? I don't really want to have to check the
last character each time I read in a string with getline, and remove it if
it is a carriage-return. Actually, I don't even know how I would do that
offhand... I guess I'd look up ^CR in an ASCII table and check it using the
octal value? Any help would be appreciated.

TIA,

Dave Moore
Jul 23 '05 #1
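To make the symptom concrete, here is a minimal sketch, assuming C++11 and an ASCII-compatible character set. It also answers the octal question: the escape '\r' already denotes the carriage return (octal '\015', hex '\x0D'), so no table lookup is needed. The helper names first_line and ends_with_cr are hypothetical, chosen only for illustration.

```cpp
#include <sstream>
#include <string>

// Assumption: an ASCII-compatible execution character set, where the
// carriage return is decimal 13, octal 015, hex 0x0D. The escape '\r'
// names it directly.
static_assert('\r' == '\015' && '\r' == '\x0D',
              "expected an ASCII-compatible character set");

// Hypothetical helper: returns the first line of 'text' as read by
// std::getline. getline stops only at '\n', so for DOS-style input
// ("\r\n" endings) the '\r' stays in the returned string.
std::string first_line(const std::string& text)
{
    std::istringstream in(text);
    std::string line;
    std::getline(in, line);
    return line;
}

// Hypothetical helper: true if 'line' still ends in a carriage return.
bool ends_with_cr(const std::string& line)
{
    return !line.empty() && line[line.size() - 1] == '\r';
}
```

Concatenating two lines read this way, e.g. first_line("hello\r\n") + first_line("world\r\n"), yields "hello\rworld\r", with a stray carriage return in the middle.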
Dave Moore wrote:
When using getline in C++, one can only specify a single character as the
terminator (default is '\n'), so if you read a line of text from a DOS-style
text file into a string, there is still a carriage return on the end of it.


If you open a file that you know _may_ contain \r characters, just
discard them from each line before you process it further.

V
Jul 23 '05 #2
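One way to follow this advice is the erase-remove idiom. This is a minimal sketch (discard_cr is a hypothetical name) that removes every '\r' in a line, not only a trailing one:

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: removes every carriage return from 'line' in place.
// std::remove shifts the kept characters forward and returns the new
// logical end; erase then trims the leftover tail.
void discard_cr(std::string& line)
{
    line.erase(std::remove(line.begin(), line.end(), '\r'), line.end());
}
```

Call it right after each std::getline. If your data may legitimately contain embedded carriage returns, strip only the last character instead, as the get_xlate_line function later in this thread does.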

Dave Moore wrote:
Perhaps Windoze-based compilers automatically set things up so that both of the terminator characters are removed and added as needed, but I am using g++ on cygwin, and I have to deal with this myself.
The OS should be doing it. I believe there is some hackery involving the
mount mode in cygwin.

So, is there a general technique for dealing with this?


Usually you open your file in text mode. With cygwin, I believe the
folder (or whatever) has to be 'mounted' in text mode as well, or
something of that order. Read the cygwin docs about mounting.

Jul 23 '05 #3

"Dave Moore" <dt*****@email.unc.edu> wrote in message
news:37*************@individual.net...
When using getline in C++, one can only specify a single character as the
terminator (default is '\n'), so if you read a line of text from a DOS-style
text file into a string, there is still a carriage return on the end of it.

Perhaps Windoze-based compilers automatically set things up so that both of
the terminator characters are removed and added as needed, but I am using
g++ on cygwin, and I have to deal with this myself.


Standard C++ defines a single (abstract) value of type 'char', '\n',
which denotes 'newline'. It does not specify its numeric value
or a mapping to a particular character set. The implementation is
responsible for translating between an external 'end-of-line'
indicator and '\n'. This happens for streams opened in 'text mode',
which is the default.

If a stream is opened in 'binary mode', no such translation occurs
(however, there may still be a conversion from the 'external' to
'internal' [i.e. in-memory] encoding). In other words, in 'binary mode',
'newline' has no special meaning.

If you're opening your streams in text mode, and your compiler
is failing to do the proper translations to/from '\n', then
it's non-compliant, broken, or not configured correctly.

Everything You Ever Wanted To Know About C++ Streams:
http://www.langer.camelot.de/iostreams.html
-Mike

Jul 23 '05 #4
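Since the only difference between the two modes is the openmode passed when the stream is opened, a quick experiment makes it visible. The sketch below uses a hypothetical helper, read_all, that loads a whole file under a caller-chosen mode; in binary mode the bytes come back exactly as stored, so a DOS-style file keeps its "\r\n" pairs.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: reads the whole file at 'path' into a string.
// 'mode' selects text mode (std::ios::in, the default) or binary mode
// (std::ios::in | std::ios::binary). Binary mode performs no end-of-line
// translation; text mode maps the platform's native end-of-line
// sequence to '\n'.
std::string read_all(const std::string& path,
                     std::ios::openmode mode = std::ios::in)
{
    std::ifstream in(path.c_str(), mode);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

What text mode returns depends on the platform's native convention: a Windows runtime typically maps "\r\n" to '\n', whereas a runtime whose native terminator is already '\n' (a POSIX system, or Cygwin on a binary mount) hands the '\r' characters back untouched. That second case is the situation in the original question.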
"Mike Wahler" <mk******@mkwahler.net> wrote in message
news:PM*****************@newsread1.news.pas.earthlink.net...
If you're opening your streams in text mode, and your compiler
is failing to do the proper translations to/from '\n', then
it's non-compliant, broken, or not configured correctly.


I think I spoke too soon. Rereading your message, I see
you're trying to read a 'foreign' file format. This means
you'll have to manage the translations yourself. Alternatively,
there are utilities that can convert files between "DOS text"
and "UNIX text" formats, which might make things easier for you.
Check Google.

-Mike
Jul 23 '05 #5
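If you'd rather do the conversion yourself than rely on an external utility, the translation is a simple filter. This is a sketch with hypothetical names: it copies the input to the output, collapsing each "\r\n" pair to '\n' and leaving a lone '\r' alone. Both streams are assumed to be opened in binary mode so the runtime adds no translation of its own.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical filter: copies 'in' to 'out', replacing each "\r\n" pair
// with a single '\n'. A '\r' not followed by '\n' is copied unchanged.
void dos_to_unix(std::istream& in, std::ostream& out)
{
    char c;
    while (in.get(c)) {
        if (c == '\r' && in.peek() == '\n')
            continue;  // drop the CR; the LF is copied on the next pass
        out.put(c);
    }
}

// Convenience overload for text already held in memory.
std::string dos_to_unix(const std::string& text)
{
    std::istringstream in(text);
    std::ostringstream out;
    dos_to_unix(in, out);
    return out.str();
}
```

For files, open an std::ifstream and an std::ofstream with std::ios::binary and pass them to the stream version; the file names are up to you.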
Dave Moore wrote:
I realize this is a somewhat platform specific question,
but I think it is still of general enough interest to ask it here.
This is a perfectly valid C++ question.
If I am wrong, I guess I will find out 8*).
As we all know, DOS uses two characters (carriage-return and line-feed)
to signal the end of a line, while UNIX uses only one (line-feed).
When using getline in C++,
one can only specify a single character as the terminator (default is '\n'),
so, if you read a line of text from a DOS-style text file into a string,
there is still a carriage return on the end of it.
This, then, causes problems,
particularly if I want to later concatenate two strings read in this way.

Perhaps Windoze-based compilers automatically set things up
so that both of the terminator characters are removed and added as needed,
but I am using g++ on cygwin, and I have to deal with this myself.
No! The GNU C++ compiler on cygwin will do this for you too.
So, is there a general technique for dealing with this?
Open the file in text mode. This converts
the carriage-return/line-feed sequence to a line-feed on input and
the line-feed to a carriage-return/line-feed sequence on output.
I don't really want to have to check the last character
each time I read in a string with getline
and remove it if it is a carriage-return.
Actually, I don't even know how I would do that offhand.
I guess look up ^CR in an ASCII table and check it using the octal value?
Any help would be appreciated.


If you need to see the carriage-return/linefeed sequence
in your program, open the file in binary mode:

std::ifstream input("input_file_name", std::ios::binary);
Jul 23 '05 #6
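Binary mode also makes it possible to check which convention a file uses before deciding how to read it. A minimal sketch, assuming it is enough to inspect the first line (the function name file_uses_crlf is hypothetical):

```cpp
#include <fstream>
#include <string>

// Hypothetical probe: opens 'path' in binary mode (no end-of-line
// translation) and reports whether its first line ends in "\r\n".
// Returns false if the file cannot be read, is empty, or its first
// line has no terminating '\n'.
bool file_uses_crlf(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::binary);
    std::string line;
    // getline sets eofbit only if it ran out of input before finding '\n'.
    if (!std::getline(in, line) || in.eof())
        return false;
    return !line.empty() && line[line.size() - 1] == '\r';
}
```

A file that mixes conventions is classified by its first line only.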

"Dave Moore" <dt*****@email.unc.edu> wrote in message
news:37*************@individual.net...
I realize this is a somewhat platform specific question, but I think it is
still of general enough interest to ask it here ... if I am wrong I guess I will find out 8*).

As we all know, DOS uses two characters (carriage-return and line-feed), to signal the end of a line, while UNIX uses only one (line-feed). When using getline in C++, one can only specify a single character as the terminator
(default is '\n'), so if you read a line of text from a DOS-style text file into a string,
using a UNIX implementation
there is still a carriage return on the end of it. This then
causes problems, particularly if I want to later concatenate two strings
read in this way.

Perhaps Windoze-based compilers automatically set things up so that both of the terminator characters are removed and added as needed,
Using a Windows implementation, 'end of line' indicators
in the file are automatically translated to '\n' (to which
C++ does not assign a specific value).
but I am using
g++ on cygwin, and I have to deal with this myself. So, is there a general technique for dealing with this? I don't really want to have to check the
last character each time I read in a string with getline, and remove it if
it is a carriage-return. Actually, I don't even know how I would do that
offhand .. I guess look up ^CR in an ASCII table and check it using the
octal value? Any help would be appreciated.


#include <fstream>
#include <iostream>
#include <istream>
#include <string>

/*
Extracts a string from the stream 'is', using
default terminator '\n', and stores the string
in 'line'. If the last character of the extracted
string is equal to 'rem', removes it. Returns a
reference to 'is'.
*/
std::istream& get_xlate_line(std::istream& is,
                             std::string& line,
                             char rem = '\r')
{
    std::getline(is, line);

    if(!line.empty())
    {
        std::string::iterator e(line.end() - 1);
        if(*e == rem)
            line.erase(e);
    }

    return is;
}

/* extract and output strings from a file */
int main()
{
    std::ifstream ifs("filename");
    std::string line;

    while(get_xlate_line(ifs, line))
        std::cout << line << '\n';

    return 0;
}

-Mike
Jul 23 '05 #7
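The get_xlate_line function above strips a trailing '\r'. If a file might use UNIX ("\n"), DOS ("\r\n") or old Mac ("\r") line endings, a slightly more general replacement for std::getline can treat all three as terminators. This is a sketch, not a library facility: getline_any_eol and split_lines_any_eol are hypothetical names, and the stream is assumed to be opened in binary mode.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical getline variant: reads one line from 'is', accepting "\n",
// "\r\n", or a lone "\r" as the terminator. The terminator is consumed
// but not stored in 'line'.
std::istream& getline_any_eol(std::istream& is, std::string& line)
{
    line.clear();
    char c;
    while (is.get(c)) {
        if (c == '\n')
            return is;                  // UNIX-style ending
        if (c == '\r') {
            if (is.peek() == '\n')
                is.get(c);              // DOS-style ending: consume the LF too
            return is;                  // DOS- or old Mac-style ending
        }
        line += c;
    }
    // End of input. As with std::getline, a final line without a
    // terminator still counts as a successful read.
    if (!line.empty())
        is.clear(std::ios::eofbit);
    return is;
}

// Hypothetical helper: splits in-memory text into lines.
std::vector<std::string> split_lines_any_eol(const std::string& text)
{
    std::istringstream in(text);
    std::vector<std::string> lines;
    std::string line;
    while (getline_any_eol(in, line))
        lines.push_back(line);
    return lines;
}
```

Because it reports success for a final unterminated line and failure only when nothing is left, the usual `while (getline_any_eol(ifs, line))` loop works unchanged.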
"Noah Roberts" <nr******@stmartin.edu> wrote in message
news:11*********************@c13g2000cwb.googlegroups.com...

Dave Moore wrote:
Perhaps Windoze-based compilers automatically set things up so that both of
the terminator characters are removed and added as needed, but I am using
g++ on cygwin, and I have to deal with this myself.


The OS should be doing it. I believe there is some hackery involving the
mount mode in cygwin.

So, is there a general
technique for dealing with this?


Usually you open your file in text mode. With cygwin, I believe the
folder (or whatever) has to be 'mounted' in text mode as well, or
something of that order. Read the cygwin docs about mounting.


It seems that something a bit different is going on, but your reply led me
in the right direction. I was compiling my executable to use the cygwin
run-time environment (cygwin.dll), rather than the windows environment. I
am pretty sure I set up my cygwin installation to use unix-style text files,
so that might well explain the confusion.

Once I compiled my program to use the windows environment
(using -mno-cygwin, as specified in the cygwin FAQ), everything was groovy.
Thanks for the suggestion!

Dave Moore
Jul 23 '05 #8
Noah Roberts wrote:
Dave Moore wrote:
Perhaps Windoze-based compilers automatically set things up so that both of
the terminator characters are removed and added as needed, but I am using
g++ on cygwin, and I have to deal with this myself.

The OS should be doing it. I believe there is some hackery involving the
mount mode in cygwin.

The OS might do it, but I have rarely seen that. The expansion is done in the
language runtime library. What is most likely confused here is that
the Cygwin environment has the compiler thinking that no conversion is
needed, while he's giving it files from the DOS world.
Jul 23 '05 #9


