
C Text/Binary Files

P: n/a
The stdin/stdout files of C seem to be always in Text mode.

Is there any way of running a C program so that these (but especially
stdout) are in Binary mode instead?

(I'm in the process of wrapping a different language around C which doesn't
want the concept of text and binary files. But if I output a string such as
"ONE\nTWO\n", this will behave differently between stdout and a regular
(binary) file. Examples on my OS:

"\n" Output 13,10 in text mode; 10 in binary mode
"\w" Output 13,13,10 in text mode; 13,10 in binary mode

(\w is a new escape code equivalent to \r\n.) Workarounds will be awkward
(and I could never stop \n expanding to 13,10 for stdout), so it would be
nice to avoid them.)

--
Thanks,

Bartc
Jun 27 '08 #1
24 Replies


P: n/a
Bartc wrote:
> The stdin/stdout files of C seem to be always in Text mode.

That's what the standards say.

--
pete
Jun 27 '08 #2

P: n/a
Bartc wrote:
> The stdin/stdout files of C seem to be always in Text mode.
>
> Is there any way of running a C program so that these (but especially
> stdout) are in Binary mode instead?

Yes, use freopen like this:

FILE *fin, *fout, *ferr;

fin = freopen(NULL, "rb", stdin);
fout = freopen(NULL, "ab", stdout);
ferr = freopen(NULL, "ab", stderr);

You could assign the return values to stdin, stdout and stderr themselves,
but the standard says that they are not necessarily modifiable
lvalues. However, it will probably work on most systems you would care
about.

See section 7.19.5.4 of the standard for details.
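
A minimal sketch of the same call with its return value checked, since an implementation is allowed to refuse the mode change. The helper name reopen_binary is made up for illustration; it only wraps the standard freopen call in its C99 form (null filename).

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of the standard library): try to switch an
 * already-open stream to the given binary mode, e.g. "rb", "wb" or "ab".
 * Returns 0 on success and -1 if the implementation refuses the change.
 * After a failure the stream may already have been closed, so do not
 * keep using it.
 */
int reopen_binary(FILE *stream, const char *mode)
{
    /* C99 7.19.5.4: a null filename asks for a change of mode only. */
    return freopen(NULL, mode, stream) != NULL ? 0 : -1;
}
```

A caller would typically do this once at the top of main, for example `if (reopen_binary(stdout, "wb") != 0)`, and then fall back to something platform-specific or give up.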

<snip>

Jun 27 '08 #3

P: n/a
santosh <sa*********@gmail.com> writes:
> Bartc wrote:
>> The stdin/stdout files of C seem to be always in Text mode.
>>
>> Is there any way of running a C program so that these (but especially
>> stdout) are in Binary mode instead?
>
> Yes, use freopen like this:
>
> FILE *fin, *fout, *ferr;
>
> fin = freopen(NULL, "rb", stdin);
> fout = freopen(NULL, "ab", stdout);
> ferr = freopen(NULL, "ab", stderr);
>
> You could assign the return value to stdin, stdout and stderr itself,
> but the standards says that they are not necessarily modifiable
> lvalues. However it will probably work on most systems you would care
> about.

More importantly, freopen is not guaranteed to do what Bartc wants.
Thus the key information is not what the standard says but what
typical implementations do on systems where there is a difference
between text and binary mode. I can give only one data point:
lcc-win32 returns NULL from the freopen call (for stdout).

--
Ben.
Jun 27 '08 #4

P: n/a
Ben Bacarisse wrote:
> santosh <sa*********@gmail.com> writes:
>> Bartc wrote:
>>> The stdin/stdout files of C seem to be always in Text mode.
>>>
>>> Is there any way of running a C program so that these (but
>>> especially stdout) are in Binary mode instead?
>>
>> Yes, use freopen like this:
>>
>> FILE *fin, *fout, *ferr;
>>
>> fin = freopen(NULL, "rb", stdin);
>> fout = freopen(NULL, "ab", stdout);
>> ferr = freopen(NULL, "ab", stderr);
>>
>> You could assign the return value to stdin, stdout and stderr itself,
>> but the standards says that they are not necessarily modifiable
>> lvalues. However it will probably work on most systems you would care
>> about.
>
> More importantly, freopen is not guaranteed to do what Bartc wants.
> Thus the key information is not what the standard says but what
> typical implementations do on systems where there is difference
> between text and binary mode. I can give only one data point:
> lcc-win32 returns NULL from the freopen call (for stdout).

And it similarly fails for stdin too. It's perhaps surprising that it
should fail. What difficulty would an implementation like win-lcc have
with this?
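
Where freopen with a null filename fails, a non-standard route may still exist. The sketch below uses _setmode, _fileno and _O_BINARY from Microsoft's C runtime; whether lcc-win32 supplies them is an assumption that has not been checked here. The function name stdout_to_binary is made up for illustration.

```c
#include <stdio.h>

#ifdef _WIN32
#include <fcntl.h>  /* _O_BINARY */
#include <io.h>     /* _setmode */
#endif

/*
 * Hypothetical helper: put stdout into binary mode without freopen.
 * Returns 0 on success and -1 on failure.
 */
int stdout_to_binary(void)
{
#ifdef _WIN32
    /* Microsoft CRT extension, not standard C. Flush pending text first. */
    fflush(stdout);
    return _setmode(_fileno(stdout), _O_BINARY) == -1 ? -1 : 0;
#else
    /* Assumption: other targets are Unix-like, with no text/binary split. */
    return 0;
#endif
}
```

This trades portability for predictability: the code is no longer strictly conforming C, but it does not depend on how the library treats a null filename.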

Jun 27 '08 #5

P: n/a
> See section 7.19.5.4 of the standard for details.
>
> <snip>

Anyway, how can I find the standard's documents?
Jun 27 '08 #6

P: n/a
"Bartc" <bc@freeuk.com> wrote in message
news:LC*******************@text.news.virginmedia.com...
> The stdin/stdout files of C seem to be always in Text mode.

Thanks for the replies.

I think if I use exclusively "\w" for newlines (i.e. "\r\n") in strings and
internal functions that generate newlines, then this will work for binary
files.

For stdout, this will generate (on my OS) 13,13,10, but for console output
that is not critical. The only problem arises when stdout is piped or
redirected to a file at the OS command line; then I will need to process the
output to take out the extra 13.

I can live with that.

I have tried freopen() as suggested, and that sort of works, but output is
then sent to a file. So this is an alternative perhaps to redirection by the
OS and the mode /will/ be binary.
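
A minimal sketch of the "take out the extra 13" step, assuming the redirected output has already been read back into memory from a file opened in binary mode. The function name collapse_crcrlf is made up for illustration.

```c
#include <stddef.h>

/*
 * Hypothetical post-processing step: copy n bytes from in to out, dropping
 * one CR from every CR CR LF sequence. out must have room for n bytes.
 * Returns the number of bytes written to out.
 */
size_t collapse_crcrlf(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t i, j = 0;

    for (i = 0; i < n; i++) {
        /* Skip this CR when it is immediately followed by CR LF. */
        if (in[i] == '\r' && i + 2 < n &&
            in[i + 1] == '\r' && in[i + 2] == '\n')
            continue;
        out[j++] = in[i];
    }
    return j;
}
```

Note that any genuine CR CR LF sequence in the data would be shortened as well, so this only suits output known to be plain text.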

--
Bartc
Jun 27 '08 #7

P: n/a
Ali Karaali <al****@gmail.com> writes:
>> See section 7.19.5.4 of the standard for details.
>>
>> <snip>
>
> Anyway, How can I find out standard's documents?

http://www.open-std.org/jtc1/sc22/wg...docs/n1256.pdf is a recent
draft of C99. The same site has lots of other useful documents.

--
Ben.
Jun 27 '08 #8

P: n/a
On Jun 23, 8:21 am, santosh <santosh....@gmail.com> wrote:
> And it similarly fails for stdin too. It's perhaps surprising that it
> should fail. What difficulty would an implementation like win-lcc have
> with this?

The following works for me:

#include <stdio.h>
#include <stdlib.h>

int
main(void) {
    /* Assigning to stdout compiles here, but the standard does not
       guarantee that stdout is a modifiable lvalue. */
    stdout = freopen(NULL, "ab", stdout);
    return 0;
}

I compiled that with gcc on Linux. It probably works because Linux/Unix
does not distinguish between text and binary mode.
Jun 27 '08 #9

P: n/a
santosh <sa*********@gmail.com> wrote:
> Bartc wrote:
>> The stdin/stdout files of C seem to be always in Text mode.
>>
>> Is there any way of running a C program so that these (but especially
>> stdout) are in Binary mode instead?
>
> Yes, use freopen like this:
>
> FILE *fin, *fout, *ferr;
>
> fin = freopen(NULL, "rb", stdin);
> fout = freopen(NULL, "ab", stdout);
> ferr = freopen(NULL, "ab", stderr);

Note that freopen() with a null first argument is new in C99. In C89,
you had to give a new file name.

Richard
Jun 27 '08 #10

P: n/a
"Bartc" <bc@freeuk.com> writes:
> "Bartc" <bc@freeuk.com> wrote in message
> news:LC*******************@text.news.virginmedia.com...
>> The stdin/stdout files of C seem to be always in Text mode.
>
> Thanks for the replies.
>
> I think if I use exclusively "\w" for newlines (ie. "\r\n") in strings and
> internal functions that generate newlines, then this will work for binary
> files.
> [...]

What is "\w"? It's not a standard escape sequence; its value is
implementation-defined.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Jun 27 '08 #11

P: n/a
On Mon, 23 Jun 2008 12:59:01 -0700, Keith Thompson wrote:
> What is "\w"? It's not a standard escape sequence; its value is
> implementation-defined.

"\w" does not match the syntax of a string literal, so by the rule of the
longest match this is tokenised as {"}{\}{w}{"}. The behaviour is
undefined if a double quote character occurs as a single token. There need
not be any value given to "\w", and if there is, it need not be documented.
Jun 27 '08 #12

P: n/a

"Keith Thompson" <ks*@cts.com> wrote in message
news:lz************@stalkings.ghoti.net...
> "Bartc" <bc@freeuk.com> writes:
>> "Bartc" <bc@freeuk.com> wrote in message
>> news:LC*******************@text.news.virginmedia.com...
>>> The stdin/stdout files of C seem to be always in Text mode.
>>
>> Thanks for the replies.
>>
>> I think if I use exclusively "\w" for newlines (ie. "\r\n") in strings
>> and
>> internal functions that generate newlines, then this will work for binary
>> files.
>> [...]
>
> What is "\w"? It's not a standard escape sequence; its value is
> implementation-defined.

Sorry. In my original post I'd indicated (not very clearly) that \w was a
new escape in a language I was creating to wrap around C.

So it's not a C escape but is translated to "\r\n". It represents 'windows
newline'; (or more generally, the full newline sequence used in the target
OS).

--
Bartc

Jun 27 '08 #13

P: n/a
Harald van Dijk <tr*****@gmail.com> writes:
> On Mon, 23 Jun 2008 12:59:01 -0700, Keith Thompson wrote:
>> What is "\w"? It's not a standard escape sequence; its value is
>> implementation-defined.
>
> "\w" does not match the syntax of a string literal, so by the rule
> of the longest match this is tokenised as {"}{\}{w}{"}. The
> behaviour is undefined if a double quote character occurs as a
> single token. There need not be any value given to "\w", and if
> there is, it need not be documented.

I believe you're mostly or entirely right, and I was wrong.

I misinterpreted the second clause of C99 6.4.4.4p10:

The value of an integer character constant containing more than
one character (e.g., 'ab'), or containing a character or escape
sequence that does not map to a single-byte execution character,
is implementation-defined.

as applying to things like '\w'; instead, it applies to things like
'\xffffffff'.

"\w" is split into 4 preprocessor tokens:
" \ w "
The " is not a punctuator; it's in the category "each non-white-space
character that cannot be one of the above" (C99 6.4), which means the
behavior is undefined.

In addition, though, this preprocessor token cannot be converted to a
token. The constraint in 6.4p2 is:

Each preprocessing token that is converted to a token shall have
the lexical form of a keyword, an identifier, a constant, a string
literal, or a punctuator.

So, assuming that "\w" isn't surrounded by something like "#if 0"
... "#endif", it would seem to be a constraint violation. By C99
5.1.1.3, this requires a diagnostic even if the behavior is also
undefined.

Note that, by the same reasoning, "abcd\w" should be split into 5
preprocessing tokens:

" abcd \ w "

which just seems confusing. But since such cases require a diagnostic
anyway, a compiler doesn't actually have to pp-tokenize it that way;
as long as it prints a warning or error message, its job is done.

Still, I think the description would have been simpler if a \ followed
by any character in a character or string literal were allowed
syntactically, with a constraint limiting the following character to
the ones that are specified. Then "\w" would be a single pp-token and
a single token (a string literal), with a diagnostic required because
of the constraint violation.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Jun 27 '08 #14

P: n/a
On 23 Jun 2008 at 21:43, Keith Thompson wrote:
䡡牡汤⁶慮⁄ij*㱴牵敤晸䁧浡楬⹣ **睲楴敳?
*佮⁍潮*㈳⁊畮′〰**ㄲ㨵㤺〱* 〰**楴**周*灳潮⁷牯瑥?
You may want to check whether you really mean to include this header:
Content-Type: text/plain; charset=utf-16be

Jun 27 '08 #15

P: n/a
On Mon, 23 Jun 2008 14:43:50 -0700, Keith Thompson wrote:
> Harald van Dijk <tr*****@gmail.com> writes:
>> On Mon, 23 Jun 2008 12:59:01 -0700, Keith Thompson wrote:
>>> What is "\w"? It's not a standard escape sequence; its value is
>>> implementation-defined.
>>
>> "\w" does not match the syntax of a string literal, so by the rule of
>> the longest match this is tokenised as {"}{\}{w}{"}. The behaviour is
>> undefined if a double quote character occurs as a single token. There
>> need not be any value given to "\w", and if there is, it need not be
>> documented.
> [...]
> "\w" is split into 4 preprocessor tokens:
> " \ w "
> The " is not a punctuator; it's in the category "each non-white-space
> character that cannot be one of the above" (C99 6.4), which means the
> behavior is undefined.

Yes. This would normally cause nothing more than a constraint violation
(as you pointed out below) or syntax error, but in the special case of '
or ", the behaviour is explicitly undefined.

> In addition, though, this preprocessor token cannot be converted to a
> token. The constraint in 6.4p2 is:
>
> Each preprocessing token that is converted to a token shall have the
> lexical form of a keyword, an identifier, a constant, a string
> literal, or a punctuator.
>
> So, assuming that "\w" isn't surrounded by something like "#if 0" ...
> "#endif", it would seem to be a constraint violation. By C99 5.1.1.3,
> this requires a diagnostic even if the behavior is also undefined.

That's a fair point, though I'm not sure this is intended. As I understand
it, the point of making a stray " undefined was (in part) to allow for
implementations to support multi-line string literals as an extension. An
example similar to what I've posted on c.l.c before:

#define IGNORE(arg) /* nothing */
int main(void) {
  IGNORE(")
  void *p = 1;
  IGNORE(")
}

Strictly by the standard, the two identical lines are tokenised as
{IGNORE}{(}{"}{)}, which expands to nothing. So after preprocessing, a
non-zero integer constant is used to initialise a pointer, which violates
a constraint. Some implementations, however, are unable to diagnose this,
because they take the undefined behaviour of a stray " as permission to
tokenise the body of main as

{IGNORE}
{(}
{")\n void *p = 1;\n IGNORE("}
{)}

I believe that since the behaviour is undefined in translation phase 3,
any constraint violations in later phases should not require a diagnostic.
I cannot back this up with wording from the standard, only explain with
examples.

> Note that, by the same reasoning, "abcd\w" should be split into 5
> preprocessing tokens:
>
> " abcd \ w "

Yes, and then by my interpretation, the behaviour is undefined, so an
implementation may choose to make this a single string literal, with or
without a diagnostic, without any requirement on generated code (if any).

> which just seems confusing. But since such cases require a diagnostic
> anyway, a compiler doesn't actually have to pp-tokenize it that way; as
> long as it prints a warning or error message, its job is done.
>
> Still, I think the description would have been simpler if a \ followed
> by any character in a character or string literal were allowed
> syntactically, with a constraint limiting the following character to the
> ones that are specified. Then "\w" would be a single pp-token and a
> single token (a string literal), with a diagnostic required because of
> the constraint violation.

Agreed.
Jun 27 '08 #16

P: n/a
Ali Karaali wrote:
>> See section 7.19.5.4 of the standard for details.
>>
>> <snip>
>
> Anyway, How can I find out standard's documents?

Some useful references about C:
<http://www.ungerhu.com/jxh/clc.welcome.txt>
<http://c-faq.com/> (C-faq)
<http://benpfaff.org/writings/clc/off-topic.html>
<http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf> (C99)
<http://cbfalconer.home.att.net/download/n869_txt.bz2> (C99, txt)
<http://www.dinkumware.com/c99.aspx> (C-library)
<http://gcc.gnu.org/onlinedocs/> (GNU docs)
<http://clc-wiki.net/wiki/C_community:comp.lang.c:Introduction>

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.
** Posted from http://www.teranews.com **
Jun 27 '08 #17

P: n/a
[Apologies for the binary garbage I posted earlier. I'm having
multiple system problems, and the system I'm now using apparently
didn't like the non-ASCII character in Harald's last name. My "From:"
address has also been incorrect in most of today's postings; the
"ks*@cts.com" address hasn't existed for several years.]

Jun 27 '08 #18

P: n/a
Bartc wrote:
> "Keith Thompson" <ks*@cts.com> wrote in message
> news:lz************@stalkings.ghoti.net...
>> "Bartc" <bc@freeuk.com> writes:
>>> "Bartc" <bc@freeuk.com> wrote in message
>>> news:LC*******************@text.news.virginmedia.com...
>>>> The stdin/stdout files of C seem to be always in Text mode.
>>>
>>> Thanks for the replies.
>>>
>>> I think if I use exclusively "\w" for newlines (ie. "\r\n") in
>>> strings and
>>> internal functions that generate newlines, then this will work for
>>> binary files.
>>> [...]
>>
>> What is "\w"? It's not a standard escape sequence; its value is
>> implementation-defined.
>
> Sorry. In my original post I'd indicated (not very clearly) that \w
> was a new escape in a language I was creating to wrap around C.
>
> So it's not a C escape but is translated to "\r\n". It represents
> 'windows newline'; (or more generally, the full newline sequence used
> in the target OS).

So where then does your '\w' differ from C's '\n'? In Windows '\n' results
in CR LF, in UNIX in LF, in MacOS in CR (or the other way round?), on other
platforms in whatever that platform uses to separate lines.

Bye, Jojo
Jun 27 '08 #19

P: n/a

"Joachim Schmitz" <no*********@schmitz-digital.de> wrote in message
news:g3**********@online.de...
> Bartc wrote:
>> "Keith Thompson" <ks*@cts.com> wrote in message
>> news:lz************@stalkings.ghoti.net...
>>> What is "\w"? It's not a standard escape sequence; its value is
>>> implementation-defined.
>>
>> Sorry. In my original post I'd indicated (not very clearly) that \w
>> was a new escape in a language I was creating to wrap around C.
>>
>> So it's not a C escape but is translated to "\r\n". It represents
>> 'windows newline'; (or more generally, the full newline sequence used
>> in the target OS).
>
> So where then does your '\w' differ from C's '\n'? In Windows '\n' results
> in CR LF, in UNIX in LF, in MacOS in CR (or the other way round?), on
> other platforms in whatever that platform uses to separate lines.

\w expands to \r\n (e.g. CR,LF) at compile-time (in the other language).
\n stays as \n (typically LF) at compile-time.

\n only expands to all those other combinations at runtime, and only for
text modes.
At runtime, \w would result in \r followed by the expansion of \n, for text
modes.

Actual code:
printf("Hello World\w")

After translating to C:
printf("Hello World\r\n");

At runtime (using printf, stdout directed to a file):
150C:0100 48 65 6C 6C 6F 20 57 6F-72 6C 64 0D 0D 0A 30 3A Hello World...0:
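
A small sketch for reproducing the dump above without a hex viewer: it writes a string using a chosen fopen mode, then reads the file back in binary mode and counts the bytes. The helper name bytes_on_disk and the file name passed to it are made up for illustration.

```c
#include <stdio.h>

/*
 * Hypothetical demo helper: write text to path using the given fopen mode
 * ("w" for text, "wb" for binary), then reopen the file in binary mode and
 * count the bytes that actually reached the disk. Returns -1 on error.
 */
long bytes_on_disk(const char *path, const char *mode, const char *text)
{
    FILE *f = fopen(path, mode);
    long count = 0;

    if (f == NULL)
        return -1;
    if (fputs(text, f) == EOF) {
        fclose(f);
        return -1;
    }
    if (fclose(f) != 0)
        return -1;

    /* Binary mode shows the bytes as stored, with no translation. */
    f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    while (getc(f) != EOF)
        count++;
    fclose(f);
    return count;
}
```

Calling it with "wb" and "Hello World\r\n" should give 13 on any system. With "w" the result depends on the implementation: the dump above shows one extra 0D on this Windows setup, while a Unix-like system would store the bytes unchanged.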

--
Bartc
Jun 27 '08 #20

P: n/a
Joachim Schmitz wrote:
> Bartc wrote:
>> "Keith Thompson" <ks*@cts.com> wrote in message
>> news:lz************@stalkings.ghoti.net...
>>> "Bartc" <bc@freeuk.com> writes:
>>>> "Bartc" <bc@freeuk.com> wrote in message
>>>> news:LC*******************@text.news.virginmedia.com...
>>>>> The stdin/stdout files of C seem to be always in Text mode.
>>>>
>>>> Thanks for the replies.
>>>>
>>>> I think if I use exclusively "\w" for newlines (ie. "\r\n") in
>>>> strings and
>>>> internal functions that generate newlines, then this will work for
>>>> binary files.
>>>> [...]
>>>
>>> What is "\w"? It's not a standard escape sequence; its value is
>>> implementation-defined.
>>
>> Sorry. In my original post I'd indicated (not very clearly) that \w
>> was a new escape in a language I was creating to wrap around C.
>>
>> So it's not a C escape but is translated to "\r\n". It represents
>> 'windows newline'; (or more generally, the full newline sequence used
>> in the target OS).
>
> So where then does your '\w' differ from C's '\n'? In Windows '\n' results
> in CR LF, in UNIX in LF, in MacOS in CR (or the other way round?), on other
> platforms in whatever that platform uses to separate lines.

I think Bartc just doesn't grok text mode.

--
pete
Jun 27 '08 #21

P: n/a
In article <tD*******************@text.news.virginmedia.com>,
Bartc <bc@freeuk.com> wrote:
> \w expands to \r\n (eg. CR,LF) at compile-time (in the other language).
> \n stays as \n (typically LF) at compile-time.

I can see this might be useful for writing to binary files in the
system's native text format.

It's limited to systems where the line break is represented by a
sequence of characters: it doesn't make sense on systems with lines
implemented in some other way (e.g. with a count). Of course, you may
not consider that important nowadays.

For a purely C solution you could just define a macro; e.g. for
Windows

#define LINEEND "\015\012"

and you can use it easily in constant strings

"hello" LINEEND "world" LINEEND

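A minimal sketch of how the macro might be selected per platform. The assumption that only Windows needs CR LF and every other target uses a bare LF is mine, and the names two_lines and two_lines_length are made up for illustration.

```c
#include <string.h>

/* Assumption: only Windows needs CR LF here; every other target gets LF. */
#ifdef _WIN32
#define LINEEND "\015\012"
#else
#define LINEEND "\012"
#endif

/* Adjacent string literals are concatenated at compile time. */
static const char two_lines[] = "hello" LINEEND "world" LINEEND;

/* Length in bytes of the text as it would appear in a binary file. */
size_t two_lines_length(void)
{
    return strlen(two_lines);
}
```

Written with fwrite to a stream opened with "wb", the string reaches the file byte for byte, so the line ends stay exactly as the macro spells them.
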
-- Richard
--
In the selection of the two characters immediately succeeding the numeral 9,
consideration shall be given to their replacement by the graphics 10 and 11 to
facilitate the adoption of the code in the sterling monetary area. (X3.4-1963)
Jun 27 '08 #22

P: n/a
Hello ``
I am a student from china.
I like c.

If you make a friend with me, I am very happy.
My MSN ID is bi******@live.cn

--
Message posted using http://www.talkaboutprogramming.com/group/comp.lang.c/
More information at http://www.talkaboutprogramming.com/faq.html

Jun 27 '08 #23

P: n/a
BigRelax wrote:
> Hello ``
> I am a student from china.
> I like c.
>
> If you make a friend with me, I am very happy.
> My MSN ID is bi******@live.cn

This is not a group for "making friends" or idle chit-chat. If you have
questions or problems with standard C, post them here.

> --
> Message posted using
> http://www.talkaboutprogramming.com/group/comp.lang.c/ More
> information at http://www.talkaboutprogramming.com/faq.html

Complain to the maintainer of the above forum that the signature
separator that they add is broken.

Jun 27 '08 #24

P: n/a
santosh wrote:
> BigRelax wrote:
... snip ...
>>
>> --
>> Message posted using
>> http://www.talkaboutprogramming.com/group/comp.lang.c/ More
>> information at http://www.talkaboutprogramming.com/faq.html
>
> Complain to the maintainer of the above forum that the signature
> separator that they add is broken.

Your comment would be much more useful if you pointed out how it
was broken. It requires a line containing exactly "-- ". Note the
terminal space.

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.
** Posted from http://www.teranews.com **
Jun 27 '08 #25
