
Interesting string.resize behavior

P: n/a
#include <string>
#include <iostream>
using namespace std;

int main()
{
string str;
str.resize(5);

str[0] = 't';
str[1] = 'e';
str[2] = 's';
str[3] = 't';
str[4] = '\0';

str += "-test2";
str += "-test3";

cout << str << endl;

return 0;
}

Jun 21 '07 #1
28 Replies


P: n/a
v4vijayakumar wrote:
#include <string>
#include <iostream>
using namespace std;

int main()
{
string str;
str.resize(5);

str[0] = 't';
str[1] = 'e';
str[2] = 's';
str[3] = 't';
str[4] = '\0';

str += "-test2";
str += "-test3";

cout << str << endl;

return 0;
}
What do you find "interesting" about it? The string is appended
to its *end* keeping all characters that it already has, not to the
"last character before trailing null characters".

I am guessing that you find this behaviour different from that of
a C string. Yes, it's different. Since a C string cannot have any
other way of knowing its size, it has to keep track of the null chars
(since they are considered *terminating*). The C++ 'std::string' has
no such special meaning for the null character. It keeps track of
its size differently.
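
A minimal sketch of that difference, using the code from the original post. The helper names (build_example, cpp_length, c_length) are made up for illustration; only std::string::size, c_str and std::strlen come from the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Builds the string from the original post: "test", an embedded '\0',
// then the two appended suffixes.
std::string build_example()
{
    std::string str;
    str.resize(5);      // five characters, all '\0' at this point
    str[0] = 't';
    str[1] = 'e';
    str[2] = 's';
    str[3] = 't';
    str[4] = '\0';      // an ordinary element, not a terminator
    str += "-test2";    // appended after index 4, not over the '\0'
    str += "-test3";
    return str;
}

// Length as std::string sees it: the stored size.
std::size_t cpp_length(const std::string& s) { return s.size(); }

// Length as a C API sees it: everything before the first '\0'.
std::size_t c_length(const std::string& s) { return std::strlen(s.c_str()); }
```

With a conforming library, cpp_length(build_example()) is 17 (5 + 6 + 6), while c_length(build_example()) is 4, which is why anything that treats the buffer as a C string shows only "test".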

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask
Jun 21 '07 #2

P: n/a
You're going to have to state exactly what it is that you find
interesting about this.

On 2007-06-21 09:31:50 -0700, v4vijayakumar
<vi******************@gmail.com> said:
#include <string>
#include <iostream>
using namespace std;

int main()
{
string str;
str.resize(5);

str[0] = 't';
str[1] = 'e';
str[2] = 's';
str[3] = 't';
str[4] = '\0';
OK, str now contains the string "test\0".
str += "-test2";
str now contains the string "test\0-test2".
str += "-test3";
str now contains the string "test\0-test2-test3".
cout << str << endl;

return 0;
}

--
Clark S. Cox III
cl*******@gmail.com

Jun 21 '07 #3

P: n/a
On Jun 21, 9:49 pm, Clark Cox <clarkc...@gmail.com> wrote:
You're going to have to state exactly what it is that you find
interesting about this.
...
string str;
str.resize(5);

str[0] = 't';
str[1] = 'e';
str[2] = 's';
str[3] = 't';
str[4] = '\0';

str += "-test2";
str += "-test3";
cout << str << endl;
...
Well, surprise!

Output is not "test-test2-test3", but just "test". :)

[ Tried in MS VS 6.0. ]

Jun 21 '07 #4

P: n/a
v4vijayakumar wrote:
On Jun 21, 9:49 pm, Clark Cox <clarkc...@gmail.com> wrote:
>You're going to have to state exactly what it is that you find
interesting about this.
>> ...
string str;
str.resize(5);

str[0] = 't';
str[1] = 'e';
str[2] = 's';
str[3] = 't';
str[4] = '\0';

str += "-test2";
str += "-test3";
cout << str << endl;
...

Well, surprise!

Output is not "test-test2-test3", but just "test". :)

[ Tried in MS VS 6.0. ]
That looks and sounds like a buggy compiler or library. Have
you tried it on a more recent (or just a different) one?

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask
Jun 21 '07 #5

P: n/a
v4vijayakumar wrote:
On Jun 21, 9:49 pm, Clark Cox <clarkc...@gmail.com> wrote:
>You're going to have to state exactly what it is that you find
interesting about this.
>> ...
string str;
str.resize(5);

str[0] = 't';
str[1] = 'e';
str[2] = 's';
str[3] = 't';
str[4] = '\0';

str += "-test2";
str += "-test3";
cout << str << endl;
...

Well, surprise!

Output is not "test-test2-test3", but just "test". :)

[ Tried in MS VS 6.0. ]
I'm sorry to tell you that this is not an interesting string::resize
behaviour. It's a Visual Studio 6 behaviour, and I would call it *buggy*
behaviour, not interesting ;)

Regards,

Zeppe
Jun 21 '07 #6

P: n/a
On 2007-06-21 18:57, v4vijayakumar wrote:
On Jun 21, 9:49 pm, Clark Cox <clarkc...@gmail.com> wrote:
>You're going to have to state exactly what it is that you find
interesting about this.
...
string str;
str.resize(5);

str[0] = 't';
str[1] = 'e';
str[2] = 's';
str[3] = 't';
str[4] = '\0';

str += "-test2";
str += "-test3";
cout << str << endl;
...

Well, surprise!

Output is not "test-test2-test3", but just "test". :)

[ Tried in MS VS 6.0. ]
Well, surprise

Output is test -test2-test3

[ Tried in MS VS2005 ]

:-)

As a rule, don't trust anything VS6 does.

--
Erik Wikström
Jun 21 '07 #7

P: n/a
v4vijayakumar wrote:
On Jun 21, 9:49 pm, Clark Cox <clarkc...@gmail.com> wrote:
>You're going to have to state exactly what it is that you find
interesting about this.
>> ...
string str;
str.resize(5);

str[0] = 't';
str[1] = 'e';
str[2] = 's';
str[3] = 't';
str[4] = '\0';

str += "-test2";
str += "-test3";
cout << str << endl;
...

Well, surprise!

Output is not "test-test2-test3", but just "test". :)

[ Tried in MS VS 6.0. ]
With gcc version 3.3.3 (a few years old itself):
g++ -o resize resize.cpp
./resize
test-test2-test3

Update your compiler.
Jun 21 '07 #8

P: n/a
Erik Wikström wrote:
Well, surprise

Output is test -test2-test3

[ Tried in MS VS2005 ]

:-)
With a space... interesting...
--
\|||/ Gennaro Prota - For hire
(o o) https://sourceforge.net/projects/breeze/
--ooO-(_)-Ooo----- (to mail: name . surname / yaho ! com)
Jun 21 '07 #9

P: n/a
Gennaro Prota wrote:
Erik Wikström wrote:
>Well, surprise

Output is test -test2-test3

[ Tried in MS VS2005 ]

:-)

With a space... interesting...
That's how that particular cout handles outputting null character.
Nothing to do with the language, I suppose. Implementation- and
platform-specific behaviour.
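
A small sketch separating the two layers: what operator<< hands to the stream, and what a terminal later chooses to display. It uses a std::ostringstream instead of cout so the result can be inspected; chars_inserted and with_embedded_nul are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Returns how many characters operator<< actually passes to the stream.
std::size_t chars_inserted(const std::string& s)
{
    std::ostringstream out;
    out << s;                  // inserts s.size() characters, '\0' included
    return out.str().size();
}

// The (pointer, count) constructor keeps the embedded '\0'; the plain
// const char* constructor would stop at it.
std::string with_embedded_nul()
{
    return std::string("test\0-test2-test3", 17);
}
```

Here chars_inserted(with_embedded_nul()) is 17, so a conforming library forwards the null character like any other. Whether it then shows up as a space, as nothing, or as something else depends on the output device.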

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask
Jun 22 '07 #10

P: n/a
On Jun 21, 7:02 pm, "Victor Bazarov" <v.Abaza...@comAcast.net> wrote:
v4vijayakumar wrote:
On Jun 21, 9:49 pm, Clark Cox <clarkc...@gmail.com> wrote:
You're going to have to state exactly what it is that you find
interesting about this.
...
string str;
str.resize(5);
str[0] = 't';
str[1] = 'e';
str[2] = 's';
str[3] = 't';
str[4] = '\0';
str += "-test2";
str += "-test3";
cout << str << endl;
...
Well, surprise!
Output is not "test-test2-test3", but just "test". :)
[ Tried in MS VS 6.0. ]
That looks and sounds like a buggy compiler or library.
The implementation of std::string in VC++ 6.0 didn't handle '\0'
in std::string correctly. In many cases, in fact, a string with
a '\0' would crash the program.
Have you tried it on any more recent (or just different) one?
This problem isn't present in the current version of the
compiler (at least in the cases where I'd seen it---the code
which didn't work then works now).

Note that a '\0' character in a string can have curious effects
on an output device. You're not allowed to output it to a
stream opened in text mode (like cout), so his results don't
actually prove a bug anywhere. To be sure, he should open a
file in binary mode, output to it, and then verify the contents.
It's quite possible that the phenomenon that he is observing has
nothing to do with the bug I mention.
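
The binary-mode check suggested above might look roughly like this. It is only a sketch: the function names are invented, and the file name passed in is up to the caller.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>

// Writes the raw bytes of s to path in binary mode.
// Returns false if the stream reports an error.
bool write_binary(const std::string& path, const std::string& s)
{
    std::ofstream out(path.c_str(), std::ios::binary | std::ios::trunc);
    out.write(s.data(), static_cast<std::streamsize>(s.size()));
    out.flush();               // push buffered data out before checking state
    return out.good();
}

// Reads the whole file back in binary mode, embedded '\0' included.
std::string read_binary(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Writing the 17-character string and comparing it with what read_binary returns tells you whether std::string itself kept the '\0', independently of how a console renders it.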

--
James Kanze (GABI Software, from CAI) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Jun 22 '07 #11

P: n/a
On Jun 22, 2:02 am, "Victor Bazarov" <v.Abaza...@comAcast.net> wrote:
Gennaro Prota wrote:
Erik Wikström wrote:
Well, surprise
Output is test -test2-test3
[ Tried in MS VS2005 ]
:-)
With a space... interesting...
That's how that particular cout handles outputting null character.
Nothing to do with the language, I suppose. Implementation- and
platform-specific behaviour.
Writing a '\0' character to a text stream is implementation
defined, yes. In practice, I suspect that the system is just
copying the bytes directly to the output device, and that it is
the tty device which determines what you see. But an
implementation can map the '\0' character to something else in a
text file, or even use some system API which treats the buffer
as a null terminated string. Text streams are for text, and
'\0' is not text.

--
James Kanze (GABI Software, from CAI) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Jun 22 '07 #12

P: n/a
James Kanze wrote:
[..writing '\0' to a text stream is implementation-defined..]
an
implementation can map the '\0' character to something else in a
text file, or even use some system API which treats the buffer
as a null terminated string. Text streams are for text, and
'\0' is not text.
Can you back this up with anything? Why is '\0' not text? I
cannot find any explicit definition of "text" in the Standard
that would say that '\0' is not text. It's not part of the
basic character set, but that has nothing to do with "not text",
or does it?

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask
Jun 22 '07 #13

P: n/a
James Kanze <ja*********@gmail.com> wrote:
Note that a '\0' character in a string can have curious effects
on an output device. You're not allowed to output it to a
stream opened in text mode (like cout), so his results don't
actually prove a bug anywhere. To be sure, he should open a
file in binary mode, output to it, and then verify the contents.
I am working on a project where we have to deal with unprintable
characters. The incoming strings are in 7-bit ASCII, so it was easy to
write a function that replaced '\0' with "<NUL>", '\x0D' with "<CR>",
etc. so that we could easily see what characters were being received,
without having to look at hex output all the time.

For example, the original string "test\0-test2-test3" would then be
output as "test<NUL>-test2-test3". Obviously, there is the ambiguity of
whether the original string actually had the sequence '<', 'N', 'U',
'L', '>' or whether it was '\0', but since this was just for our
internal display purposes it wasn't an issue.
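
A rough sketch of such a function, not the actual project code. Only the two replacements named above are handled; the name make_visible is made up, and further control characters would be added as extra cases.

```cpp
#include <cassert>
#include <string>

// Returns a copy of in with selected unprintable characters replaced by
// readable tags, for diagnostic display only (the mapping is not reversible).
std::string make_visible(const std::string& in)
{
    std::string out;
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        switch (in[i]) {
        case '\0':   out += "<NUL>"; break;
        case '\x0D': out += "<CR>";  break;
        default:     out += in[i];   break;
        }
    }
    return out;
}
```

Note that the input has to be built with its length, for example std::string("test\0-test2-test3", 17), or the '\0' never reaches the function.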

--
Marcus Kwok
Replace 'invalid' with 'net' to reply
Jun 22 '07 #14

P: n/a
Victor Bazarov wrote:
James Kanze wrote:
[..writing '\0' to a text stream is implementation-defined..]
an
implementation can map the '\0' character to something else in a
text file, or even use some system API which treats the buffer
as a null terminated string. Text streams are for text, and
'\0' is not text.

Can you back this up with anything? Why is '\0' not text? I
cannot find any explicit definition of "text" in the Standard
that would say that '\0' is not text. It's not part of the
basic character set, but that has nothing to do with "not text",
or does it?
Fairer to say that it's not a printable character.


Brian
Jun 22 '07 #15

P: n/a
Default User wrote:
Victor Bazarov wrote:
>James Kanze wrote:
>>[..writing '\0' to a text stream is implementation-defined..]
an
implementation can map the '\0' character to something else in a
text file, or even use some system API which treats the buffer
as a null terminated string. Text streams are for text, and
'\0' is not text.

Can you back this up with anything? Why is '\0' not text? I
cannot find any explicit definition of "text" in the Standard
that would say that '\0' is not text. It's not part of the
basic character set, but that has nothing to do with "not text",
or does it?

Fairer to say that it's not a printable character.
Yes, but does it mean that it cannot be output to a stream opened
in "text" mode?

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask
Jun 22 '07 #16

P: n/a
Victor Bazarov wrote:
Default User wrote:
Victor Bazarov wrote:
James Kanze wrote:
[..writing '\0' to a text stream is implementation-defined..]
an
implementation can map the '\0' character to something else in a
text file, or even use some system API which treats the buffer
as a null terminated string. Text streams are for text, and
'\0' is not text.
>
Can you back this up with anything? Why is '\0' not text? I
cannot find any explicit definition of "text" in the Standard
that would say that '\0' is not text. It's not part of the
basic character set, but that has nothing to do with "not text",
or does it?
Fairer to say that it's not a printable character.

Yes, but does it mean that it cannot be output to a stream opened
in "text" mode?
I think it almost certainly can. I didn't really find much in the C++
standard on the topic, so I fell back to the C standard (C99 draft).

[#2] A text stream is an ordered sequence of characters
composed into lines, each line consisting of zero or more
characters plus a terminating new-line character. Whether
the last line requires a terminating new-line character is
implementation-defined. Characters may have to be added,
altered, or deleted on input and output to conform to
differing conventions for representing text in the host
environment. Thus, there need not be a one-to-one
correspondence between the characters in a stream and those
in the external representation. Data read in from a text
stream will necessarily compare equal to the data that were
earlier written out to that stream only if: the data consist
only of printing characters and the control characters
horizontal tab and new-line; no new-line character is
immediately preceded by space characters; and the last
character is a new-line character. Whether space characters
that are written out immediately before a new-line character
appear when read in is implementation-defined.
So there seems to be no problem writing a character of any type to a
text stream, although conversion of some characters may take place
(CRLF of course). I think the contention of \0 perhaps being altered is
correct, but I don't think it's correct to call it "not text". Control
characters, of which \0 is one, can be written to text streams.

If we're talking about displays:

5.2.2 Character display semantics

[#1] The active position is that location on a display
device where the next character output by the fputc or
fputwc function would appear. The intent of writing a
printing character (as defined by the isprint or iswprint
function) to a display device is to display a graphic
representation of that character at the active position and
then advance the active position to the next position on the
current line. The direction of writing is locale-specific.
If the active position is at the final position of a line
(if there is one), the behavior is unspecified.

This is followed by a discussion of several control characters and
their defined behavior, but \0 is not one of them. I'm hesitant to
declare that writing \0 to a display device is undefined behavior. I'd
have to take it to those more expert in reading the standard than I am.


Brian (standards diving on a Friday afternoon)
Jun 22 '07 #17

P: n/a
On Jun 22, 2:52 pm, "Victor Bazarov" <v.Abaza...@comAcast.net> wrote:
James Kanze wrote:
[..writing '\0' to a text stream is implementation-defined..]
an
implementation can map the '\0' character to something else in a
text file, or even use some system API which treats the buffer
as a null terminated string. Text streams are for text, and
'\0' is not text.
Can you back this up with anything?
The standard. The only things which have defined behavior when
output to a file opened in text mode are printable characters,
horizontal tab and new line. In addition, what happens with
trailing space in a line is not specified, and it's
implementation defined whether you're allowed to close a
non-empty file if the last character written was not a '\n'.

(The C++ standard defines file semantics by reference to the C
standard; this is in §7.9.2/2 of C90.)
Why is '\0' not text?
Because the standard says so.
I
cannot find any explicit definition of "text" in the Standard
that would say that '\0' is not text.
The standard doesn't define text. (At least I don't think it
does.) It defines the required semantics of a file opened in
text mode.
It's not part of the
basic character set, but that has nothing to do with "not text",
or does it?
I don't think so. The only possibly vague point is "printable
character"; I would interpret that to mean something for which
isprint() returns true, at least in some locale.
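
Under that reading, a check for "characters the standard covers in text mode" could be sketched as below. The function names are invented, the current C locale is whatever the program has set (the "C" locale by default), and the other conditions (trailing spaces, final new-line) are deliberately not checked.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// True for printing characters, horizontal tab and new-line: the set the
// text-stream wording mentions. Everything else is outside it.
bool covered_in_text_mode(char c)
{
    // isprint requires a value representable as unsigned char (or EOF).
    const unsigned char uc = static_cast<unsigned char>(c);
    return std::isprint(uc) != 0 || c == '\t' || c == '\n';
}

// True if every character of s is in that set.
bool all_covered_in_text_mode(const std::string& s)
{
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        if (!covered_in_text_mode(s[i])) {
            return false;
        }
    }
    return true;
}
```

In the default "C" locale, '\0' and 0x1A both fail this check, while "test\n" passes.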

Note that most implementations actually do define a little bit
more:-). In both Unix and Windows, most characters will
probably pass transparently, even in text mode (assuming the "C"
locale, anyway). But there can be surprises: try writing a file
with 0x1A somewhere in the middle, then rereading it.

--
James Kanze (Gabi Software) email: ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Jun 22 '07 #18

P: n/a
On Jun 22, 11:15 pm, "Default User" <defaultuse...@yahoo.com> wrote:
Yes, but does it mean that it cannot be output to a stream opened
in "text" mode?
I think it almost certainly can. I didn't really find much in the C++
standard on the topic, so I fell back to the C standard (C99 draft).
[#2] A text stream is an ordered sequence of characters
composed into lines, each line consisting of zero or more
characters plus a terminating new-line character. Whether
the last line requires a terminating new-line character is
implementation-defined. Characters may have to be added,
altered, or deleted on input and output to conform to
differing conventions for representing text in the host
environment. Thus, there need not be a one-to-one
correspondence between the characters in a stream and those
in the external representation. Data read in from a text
stream will necessarily compare equal to the data that were
earlier written out to that stream only if: the data consist
only of printing characters and the control characters
horizontal tab and new-line; no new-line character is
immediately preceded by space characters; and the last
character is a new-line character. Whether space characters
that are written out immediately before a new-line character
appear when read in is implementation-defined.
So there seems to be no problem writing a character of any type to a
text stream, although conversion of some characters may take place
(CRLF of course).
That's not what's written above. The standard explicitly
doesn't define any semantics for writing non-printable
characters other than new line and horizontal tab. And when the
standard doesn't define the semantics of something, or
specifically say that it is unspecified or implementation
defined, the behavior is undefined.
I think the contention of \0 perhaps being altered is
correct, but I don't think it's correct to call it "not text". Control
characters, of which \0 is one, can be written to text streams.
But the behavior becomes undefined when you do so.

In practice, I've never had problems with '\0'. But under
Windows, '\032' traditionally did some funny things. The
wording above was introduced specifically to allow such funny
things.

--
James Kanze (Gabi Software) email: ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Jun 22 '07 #19

P: n/a
James Kanze wrote:
On Jun 22, 2:52 pm, "Victor Bazarov" <v.Abaza...@comAcast.net> wrote:
>James Kanze wrote:
>>[..writing '\0' to a text stream is implementation-defined..]
an
implementation can map the '\0' character to something else in a
text file, or even use some system API which treats the buffer
as a null terminated string. Text streams are for text, and
'\0' is not text.
>Can you back this up with anything?

The standard. The only things which have defined behavior when
output to a file opened in text mode are printable characters,
horizontal tab and new line. In addition, what happens with
trailing space in a line is not specified, and it's
implementation defined whether you're allowed to close a
non-empty file if the last character written was not a '\n'.

(The C++ standard defines file semantics by reference to the C
standard; this is in §7.9.2/2 of C90.)
I have to admit that I don't have C90 handy, could you *please*
quote it? Thanks! Also, *please* quote the part of the C++
Standard that says that 'ostream' for text output is governed by
the same rules as C streams. Thanks a bunch!
>Why is '\0' not text?

Because the standard says so.
>I
cannot find any explicit definition of "text" in the Standard
that would say that '\0' is not text.

The standard doesn't define text. (At least I don't think it
does.) It defines the required semantics of a file opened in
text mode.
Where?

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask
Jun 22 '07 #20

P: n/a
James Kanze wrote:
On Jun 22, 11:15 pm, "Default User" <defaultuse...@yahoo.com> wrote:
Yes, but does it mean that it cannot be output to a stream opened
in "text" mode?
I think it almost certainly can. I didn't really find much in the
C++ standard on the topic, so I fell back to the C standard (C99
draft).
[#2] A text stream is an ordered sequence of characters
composed into lines, each line consisting of zero or more
characters plus a terminating new-line character. Whether
the last line requires a terminating new-line character is
implementation-defined. Characters may have to be added,
altered, or deleted on input and output to conform to
differing conventions for representing text in the host
environment. Thus, there need not be a one-to-one
correspondence between the characters in a stream and those
in the external representation. Data read in from a text
stream will necessarily compare equal to the data that were
earlier written out to that stream only if: the data consist
only of printing characters and the control characters
horizontal tab and new-line; no new-line character is
immediately preceded by space characters; and the last
character is a new-line character. Whether space characters
that are written out immediately before a new-line character
appear when read in is implementation-defined.
So there seems to be no problem writing a character of any type to a
text stream, although conversion of some characters may take place
(CRLF of course).

That's not what's written above. The standard explicitly
doesn't define any semantics for writing non-printable
characters other than new line and horizontal tab. And when the
standard doesn't define the semantics of something, or
specifically say that it is unspecified or implementation
defined, the behavior is undefined.
I think the contention of \0 perhaps being altered is
correct, but I don't think it's correct to call it "not text".
Control characters, of which \0 is one, can be written to text
streams.

But the behavior becomes undefined when you do so.

In practice, I've never had problems with '\0'. But under
Windows, '\032' traditionally did some funny things. The
wording above was introduced specifically to allow such funny
things.

I'm not convinced that you're right, but I'm not convinced that I am
either. I'm going to cross-post this over to comp.lang.c (hence no
snippage, sorry).

We'll see what the gang says.


Brian
Jun 22 '07 #21

P: n/a

James Kanze <ja*********@gmail.com> wrote in message...
On Jun 22, 11:15 pm, "Default User" <defaultuse...@yahoo.com> wrote:
>I think it almost certainly can. I didn't really find much in the C++
standard on the topic, so I fell back to the C standard (C99 draft).
> [#2] A text stream is an ordered sequence of characters
[snip]
> ....; ** no new-line character is
immediately preceded by space characters; **
[ Sorry to side-step this thread. Just for a minute.]

That line caught my eye.
I wonder if it's 'C99' mangling the sigs?
( as in our discussion in the other thread.)

--
Bob R
POVrookie
Jun 23 '07 #22

P: n/a
BobR wrote:
I wonder if it's 'C99' mangling the sigs?
( as in our discussion in the other thread.)
Speaking of which, your post earlier in the day (21:22:54 GMT) had the
busted .sig, but later ones like this one were correct.

Brian
Jun 23 '07 #23

P: n/a
Default User wrote:
James Kanze wrote:
>On Jun 22, 11:15 pm, "Default User" <defaultuse...@yahoo.com> wrote:
>>>Yes, but does it mean that it cannot be output to a stream opened
in "text" mode?
I think it almost certainly can. I didn't really find much in the
C++ standard on the topic, so I fell back to the C standard (C99
draft).
[#2] A text stream is an ordered sequence of characters
composed into lines, each line consisting of zero or more
characters plus a terminating new-line character. Whether
the last line requires a terminating new-line character is
implementation-defined. Characters may have to be added,
altered, or deleted on input and output to conform to
differing conventions for representing text in the host
environment. Thus, there need not be a one-to-one
correspondence between the characters in a stream and those
in the external representation. Data read in from a text
stream will necessarily compare equal to the data that were
earlier written out to that stream only if: the data consist
only of printing characters and the control characters
horizontal tab and new-line; no new-line character is
immediately preceded by space characters; and the last
character is a new-line character. Whether space characters
that are written out immediately before a new-line character
appear when read in is implementation-defined.
So there seems to be no problem writing a character of any type to a
text stream, although conversion of some characters may take place
(CRLF of course).
That's not what's written above. The standard explicitly
doesn't define any semantics for writing non-printable
characters other than new line and horizontal tab. And when the
standard doesn't define the semantics of something, or
specifically say that it is unspecified or implementation
defined, the behavior is undefined.
>>I think the contention of \0 perhaps being altered is
correct, but I don't think it's correct to call it "not text".
Control characters, of which \0 is one, can be written to text
streams.
But the behavior becomes undefined when you do so.

In practice, I've never had problems with '\0'. But under
Windows, '\032' traditionally did some funny things. The
wording above was introduced specifically to allow such funny
things.


I'm not convinced that you're right, but I'm not convinced that I am
either. I'm going to cross-post this over to comp.lang.c (hence no
snippage, sorry).

We'll see what the gang says.


Brian
[ again, no snippage ]

The '\032' is 0x1a, ^Z or 26, the infamous EOF character from CP/M and
PC DOS. Many M$ apps still append this character to text files to be
compatible with CPM file systems. In "r" mode you'll never see this
character.

Whether '\0' should be legal in text files is contentious and I say no.
In my view the '\0' has no place in a text file which by definition
consists of lines, not strings.

A line is an input 'stream' of char terminated with '\n'.
A string is an array of char in memory and terminated with '\0'.

The two should not be confused.
--
Joe Wright
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---
Jun 23 '07 #24

P: n/a
Joe Wright wrote:
Default User wrote:
>James Kanze wrote:
>>On Jun 22, 11:15 pm, "Default User" <defaultuse...@yahoo.com> wrote:

Yes, but does it mean that it cannot be output to a stream opened
in "text" mode?
I think it almost certainly can. I didn't really find much in the
C++ standard on the topic, so I fell back to the C standard (C99
draft).
[#2] A text stream is an ordered sequence of characters
composed into lines, each line consisting of zero or more
characters plus a terminating new-line character. Whether
the last line requires a terminating new-line character is
implementation-defined. Characters may have to be added,
altered, or deleted on input and output to conform to
differing conventions for representing text in the host
environment. Thus, there need not be a one-to-one
correspondence between the characters in a stream and those
in the external representation. Data read in from a text
stream will necessarily compare equal to the data that were
earlier written out to that stream only if: the data consist
only of printing characters and the control characters
horizontal tab and new-line; no new-line character is
immediately preceded by space characters; and the last
character is a new-line character. Whether space characters
that are written out immediately before a new-line character
appear when read in is implementation-defined.
So there seems to be no problem writing a character of any type to a
text stream, although conversion of some characters may take place
(CRLF of course).
That's not what's written above. The standard explicitly
doesn't define any semantics for writing non-printable
characters other than new line and horizontal tab. And when the
standard doesn't define the semantics of something, or
specifically say that it is unspecified or implementation
defined, the behavior is undefined.

I think the contention of \0 perhaps being altered is
correct, but I don't think it's correct to call it "not text".
Control characters, of which \0 is one, can be written to text
streams.
But the behavior becomes undefined when you do so.
[...]
Whether '\0' should be legal in text files is contentious and I say no.
The question was not whether it should be allowed, but whether
outputting it yields undefined behavior. I see only two
interpretations of the above:

a) yes, it is undefined behavior
b) it is "ok"; but you don't have any guarantee on
reading back the file

Perhaps both interpretations need some more backup from the standard.

--
\|||/ Gennaro Prota - For hire
(o o) https://sourceforge.net/projects/breeze/
--ooO-(_)-Ooo----- (to mail: name . surname / yaho ! com)
Jun 23 '07 #25

P: n/a
Gennaro Prota wrote:
Joe Wright wrote:
>Whether '\0' should be legal in text files is contentious and I say no.

The question was not whether it should be allowed, but whether
outputting it yields undefined behavior.
Oops, sorry; I meant "not whether it *should* be, but whether it *is*
(legal)".
--
\|||/ Gennaro Prota - For hire
(o o) https://sourceforge.net/projects/breeze/
--ooO-(_)-Ooo----- (to mail: name . surname / yaho ! com)
Jun 23 '07 #26

P: n/a
On Jun 23, 5:53 pm, Gennaro Prota <inva...@yahoo.com> wrote:
Joe Wright wrote:
[...]
Whether '\0' should be legal in text files is contentious and I say no.
The question was not whether it should be allowed, but whether
outputting it yields undefined behavior. I see only two
interpretations of the above:
a) yes, it is undefined behavior
b) it is "ok"; but you don't have any guarantee on
reading back the file
Perhaps both interpretations need some more backup from the standard.
I'm not sure what the second even means with regards to the
standard. If you write it, and it is not undefined behavior,
what is it: unspecified? implementation defined? The standard
doesn't have all that many categories for such things.

In the definition of "undefined behavior" in the C standard,
there is a phrase "Undefined behavior is otherwise indicated in
this International Standard [...] or by the omission of any
explicit definition of the behavior." I think that rather
covers this case, unless someone can show some explicit
definition. (Saying that something is "unspecified" or
"implementation defined" is an explicit definition of the
behavior.)

--
James Kanze (Gabi Software) email: ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Jun 23 '07 #27

P: n/a
On Jun 23, 2:40 am, "BobR" <removeBadB...@worldnet.att.net> wrote:
James Kanze <james.ka...@gmail.com> wrote in message...
On Jun 22, 11:15 pm, "Default User" <defaultuse...@yahoo.com> wrote:
I think it almost certainly can. I didn't really find much in the C++
standard on the topic, so I fell back to the C standard (C99 draft).
[#2] A text stream is an ordered sequence of characters
[snip]
....; ** no new-line character is
immediately preceded by space characters; **
[ Sorry to side-step this thread. Just for a minute.]
That line caught my eye.
I wonder if it's 'C99' mangling the sigs?
( as in our discussion in the other thread.)
The same thought occurred to me:-). Whatever else might be the
case, requiring a trailing space is really bad, since it means
that you cannot use standard C or C++ in any of your
implementation. (Except in binary mode. But in fact, I'd say
that anything that leaves your local disk has to be in binary
mode anyway, since you need to control exactly what it looks
like; NNTP requires CRLF as a line end, for example, even if
that's not what you get with a text file under Unix.)
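
A minimal sketch of the point about controlling line ends yourself. Nothing here comes from the thread: `to_wire_format` is a hypothetical helper, and it assumes the input uses a bare '\n' as its line end. The result would then go out through a stream opened in binary mode, so that the library does no translation of its own.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: put a '\r' in front of every '\n'.
// Assumes the input does not already contain "\r\n" pairs.
std::string to_wire_format(const std::string& text)
{
    std::string out;
    out.reserve(text.size());
    for (std::string::size_type i = 0; i != text.size(); ++i) {
        if (text[i] == '\n')
            out += '\r';
        out += text[i];
    }
    return out;
}

// Usage sketch: each line end becomes a CRLF pair.
void to_wire_format_example()
{
    assert(to_wire_format("a\nb\n") == "a\r\nb\r\n");
}
```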

--
James Kanze (Gabi Software) email: ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Jun 23 '07 #28

P: n/a
On Jun 23, 12:09 am, "Victor Bazarov" <v.Abaza...@comAcast.net> wrote:
James Kanze wrote:
On Jun 22, 2:52 pm, "Victor Bazarov" <v.Abaza...@comAcast.net> wrote:
James Kanze wrote:
[..writing '\0' to a text stream is implementation-defined..]
an
implementation can map the '\0' character to something else in a
text file, or even use some system API which treats the buffer
as a null terminated string. Text streams are for text, and
'\0' is not text.
Can you back this up with anything?
The standard. The only thing which has defined behavior when
output to a file opened in text mode are printable characters,
horizontal tab and new line. In addition, what happens with
trailing space in a line is not specified, and it's
implementation defined whether you're allowed to close a
non-empty file if the last character written was not a '\n'.
(The C++ standard defines file semantics by reference to the C
standard; this is in §7.9.2/2 of C90.)
I have to admit that I don't have C90 handy, could you *please*
quote it?
You are difficult:-). I don't have it on line either---only in
paper format---, so I can't copy paste it. FWIW (and modulo any
typos):

§7.9.2 Streams (paragraph 2)

A text stream is an ordered sequence of characters
composed into lines, each line consisting of zero or
more characters plus a terminating new-line character.
Whether the last line requires a terminating new-line
character is implementation-defined. Characters may
have to be added, altered, or deleted on input and
output to conform to differing conventions for
representing text in the host environment. Thus, there
need not be a one-to-one correspondence between
characters in the stream and those in the external
representation. Data read in from a text stream will
necessarily compare equal to data that were earlier
written out to that stream only if: the data consist
only of printable characters and the control characters
horizontal tab and new-line; no new-line character is
immediately preceded by space characters; and the last
character is a new-line character. Whether space
characters that are written out immediately before a
new-line character appear when read in is
implementation-defined.

Notice particularly the sentence which starts "Data read in
from a text stream [...]". That is the only sentence I know
of in the C standard where the semantics of writing to a
text file are defined. And in §3.16 (the definition of
"undefined behavior") we have "Undefined behavior is
otherwise indicated in this International Standard [...], or
by the omission of any explicit definition of behavior."
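
The three conditions in that sentence can be checked mechanically. The following is only a sketch: `is_round_trip_safe` is a hypothetical name, "printable" is taken to mean whatever `std::isprint` reports in the current C locale, and "space characters" is taken to mean ' ' only. Empty data is reported as not safe, on the assumption that it has no last character that could be a new-line.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: true if `data` meets the three conditions
// quoted from the C standard for a guaranteed text-mode round trip.
bool is_round_trip_safe(const std::string& data)
{
    // The last character must be a new-line (assumption: empty fails).
    if (data.empty() || data[data.size() - 1] != '\n')
        return false;
    for (std::string::size_type i = 0; i != data.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(data[i]);
        if (c == '\n') {
            if (i > 0 && data[i - 1] == ' ')
                return false;   // new-line immediately preceded by a space
        } else if (c != '\t' && !std::isprint(c)) {
            return false;       // e.g. '\0' or another control character
        }
    }
    return true;
}

// Usage sketch: the string from the original post, with its embedded
// '\0', falls outside the guarantee even with a final new-line added.
void is_round_trip_safe_example()
{
    assert(is_round_trip_safe("test\n"));
    assert(!is_round_trip_safe(std::string("test\0-test2\n", 12)));
}
```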

In other words, the burden of proof concerning undefined
behavior is on those claiming defined behavior. In this
case, it's possible that there is text somewhere else, that
I've missed, that defines the behavior. But the behavior
must be considered undefined until someone shows that text.
(The way the requirements in §7.9.2 are worded certainly
suggests that the authors don't expect writing a '\0' to
e.g. reformat your hard disk. But they only have a limited
number of categories to play with: defined,
implementation-defined, unspecified and undefined behavior.
And since the first three don't really fit with what they
intend to allow, they are stuck with the last.)
Thanks! Also, *please* quote the part of the C++
Standard that says that 'ostream' for text output is governed by
the same rules as C streams.
That's all over the place. Anywhere you look for the
defined behavior or the semantics of C++ IO, you end up
(usually after having chased references through numerous
sections) with a reference to the C standard. In this case,
§27.8.1.1/2 "The restrictions on reading and writing a
sequence controlled by an object of class
basic_filebuf<charT,traits> are the same as for reading and
writing with the Standard C library FILEs", for starters;
the definition of the modes for basic_filebuf::open in
§27.8.1.3 also just map them to fopen in C.
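
For data that may contain '\0', the usual way to stay clear of the text-mode rules is to open the file in binary mode and use unformatted output. A sketch under that assumption: `write_raw`, `read_all` and `save_binary` are hypothetical names, and the usage function goes through a `std::stringstream` rather than a real file so that it does not depend on any path.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>

// Write every byte of `data`, embedded '\0' included, unformatted.
void write_raw(std::ostream& os, const std::string& data)
{
    os.write(data.data(), static_cast<std::streamsize>(data.size()));
}

// Read everything that is left in the stream into a string.
std::string read_all(std::istream& is)
{
    return std::string(std::istreambuf_iterator<char>(is),
                       std::istreambuf_iterator<char>());
}

// File variant: std::ios::binary asks for a binary stream,
// so no text-mode translation is applied.
bool save_binary(const char* path, const std::string& data)
{
    std::ofstream file(path, std::ios::out | std::ios::binary);
    if (!file)
        return false;
    write_raw(file, data);
    return file.good();
}

// Usage sketch: the embedded '\0' survives the round trip.
void round_trip_example()
{
    const std::string original("test\0-test2", 11);
    std::stringstream buffer;
    write_raw(buffer, original);
    assert(read_all(buffer) == original);
}
```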

[...]
The standard doesn't define text. (At least I don't think it
does.) It defines the required semantics of a file opened in
text mode.
Where?
By reference to the C standard. And in the C standard,
section §7.9.2.

--
James Kanze (Gabi Software) email: ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Jun 23 '07 #29
