
substring

Hey,
I would like to know 2 things.
1)Is there any function (in C standard library) that extracts a
substring from a string?
2)Is there any function (in C standard library) that returns the
position of a substring in a string?

Thx a lot...
Nov 13 '05 #1

"Jupiter" <ze******@yahoo.com> wrote in message
news:27**************************@posting.google.c om...
Hey,
I would like to know 2 things.
1)Is there any function (in C standard library) that extracts a
substring from a string?

Not directly -- this has to be emulated with a more general
function such as memcpy (how exactly depends on the
destination storage for the substring).

2)Is there any function (in C standard library) that returns the
position of a substring in a string?

Yes:
char* pos = strstr(str_to_search_through,str_to_find);
(null if not found).

hth, Ivan
--
http://ivan.vecerina.com
Nov 13 '05 #2
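
For illustration, here is a minimal sketch of the memcpy approach mentioned above, for the case where the caller supplies the destination buffer. The helper name substr_copy and its parameters are invented for this example; nothing like it exists in the standard library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy up to len characters of src, starting at
   offset start, into dst. dst is always null-terminated if dstsize > 0.
   Returns the number of characters copied (excluding the terminator). */
size_t substr_copy(char *dst, size_t dstsize,
                   const char *src, size_t start, size_t len)
{
    size_t srclen = strlen(src);

    if (dstsize == 0)
        return 0;
    if (start > srclen)
        start = srclen;               /* nothing to copy past the end */
    if (len > srclen - start)
        len = srclen - start;         /* clamp to what src really holds */
    if (len > dstsize - 1)
        len = dstsize - 1;            /* leave room for the terminator */

    memcpy(dst, src + start, len);
    dst[len] = '\0';
    return len;
}
```

With an 8-char buffer out, substr_copy(out, sizeof out, "Hello World!", 4, 3) leaves "o W" in out. Clamping instead of failing is just one possible design; returning an error for an out-of-range request would be equally reasonable.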
"Jupiter" <ze******@yahoo.com> wrote in message
news:27**************************@posting.google.c om...
Hey,
I would like to know 2 things.
1)Is there any function (in C standard library) that extracts a
substring from a string?
Yes.
2)Is there any function (in C standard library) that returns the
position of a substring in a string?
Yes.
Thx a lot...


The questions you presumably *wanted* to ask were "what is the function
to..." :-)

To extract a substring at positions n to m (inclusive) from a string s
you can use

strncpy(dest, s + n, m - n + 1);
dest[m - n + 1] = 0;

(or in short strncpy(dest, s + n, m - n + 1)[m - n + 1] = 0;)

Look up strncpy in your manual to see how it works and why adding the
zero is necessary. You must also make sure beforehand that:
a) s is at least n + 1 characters long and
b) you have enough space in dest for m - n + 2 characters
(the substring plus the terminating zero).

To find a substring in a string you can use

char * position = strstr(substring, string);

If you want an offset rather than a pointer to the substring, use

size_t offset = position - string;

Again, look up strstr in your manual for reference. Please note that
strstr returns NULL if the substring is not found in string; you must
check for that.
Nov 13 '05 #3
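
As a sketch of the strncpy approach with the preconditions checked in code rather than left to the caller: the function name extract_range and the destsize parameter are assumptions made for this example. It copies the m - n + 1 characters at positions n through m and puts the terminator right after them.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy characters n..m (inclusive, zero-based)
   of s into dest. Returns 0 on success, -1 if the range is invalid
   or dest is too small. */
int extract_range(char *dest, size_t destsize,
                  const char *s, size_t n, size_t m)
{
    size_t count;

    if (m < n || m >= strlen(s))
        return -1;                    /* range must lie inside s */
    count = m - n + 1;
    if (destsize < count + 1)
        return -1;                    /* need room for the terminator */

    strncpy(dest, s + n, count);
    dest[count] = '\0';               /* strncpy did not add one here */
    return 0;
}
```

For example, extract_range(d, sizeof d, "Hello World!", 4, 6) with a 4-char array d stores "o W" and returns 0.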

"Peter Pichler" <pi*****@pobox.sk> wrote in message
news:UO***************@newsfep1-gui.server.ntli.net...
To find a substring in a string you can use

char * position = strstr(substring, string);
....
Again, look up strstr in your manual for reference.


Err, *I* should have looked up strstr in the manual. Swap the two
parameters around. Doh!
Nov 13 '05 #4
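
With the arguments in the corrected order (string to search first, substring second), the lookup and the offset calculation might be wrapped up as follows. The name find_offset and the choice of (size_t)-1 as the "not found" value are assumptions for this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the zero-based offset of needle within
   haystack, or (size_t)-1 if needle does not occur. */
size_t find_offset(const char *haystack, const char *needle)
{
    const char *pos = strstr(haystack, needle);   /* haystack comes first */

    if (pos == NULL)
        return (size_t)-1;
    return (size_t)(pos - haystack);
}
```

Here find_offset("Hello World!", "o W") gives 4, and a search for something that is not there gives (size_t)-1, which the caller has to test for before using the result as an index.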
In article <27**************************@posting.google.com >, Jupiter wrote:
Hey,
I would like to know 2 things.
1)Is there any function (in C standard library) that extracts a
substring from a string?
Yes, strncpy() can be made to copy a substring of one string
into another:

#include <string.h> /* for strncpy() */
#include <stdio.h> /* for printf() */

int
main(void)
{
    char msg[] = "Hello World!";
    char submsg[10]; /* Must be long enough */

    /* Copy the substring "o W" from msg to submsg */
    strncpy(submsg, &msg[4], 3);

    /* Terminate the resulting string since strncpy() doesn't */
    submsg[3] = '\0';

    printf("msg[] = '%s'\nsubmsg[] = '%s'\n", msg, submsg);

    return 0;
}

2)Is there any function (in C standard library) that returns the
position of a substring in a string?
No, but you may use strstr() like this:

#include <string.h> /* for strstr() */
#include <stdio.h> /* for printf(), fprintf() */
#include <stddef.h> /* for ptrdiff_t, NULL */

int
main(void)
{
    char msg[] = "Hello World!";
    char *ptr;

    ptrdiff_t ptrpos;

    /* Locate the substring "o W" in msg */
    ptr = strstr(msg, "o W");

    if (ptr == NULL) {
        fprintf(stderr, "Substring not found\n");
    } else {
        /* Calculate the position of ptr in msg */
        ptrpos = ptr - &msg[0];

        /* ptrdiff_t need not be int, so convert before printing */
        printf("Position of 'o W' in '%s' is %ld\n", msg, (long)ptrpos);
    }

    return 0;
}

Thx a lot...


Wlcm a lot...
--
Andreas Kähäri
Nov 13 '05 #5


Jupiter wrote:
Hey,
I would like to know 2 things.
1)Is there any function (in C standard library) that extracts a
substring from a string?
2)Is there any function (in C standard library) that returns the
position of a substring in a string?


Yes. Take a look at http://www-ccs.ucsd.edu/c/string.html for a list of
the string functions in string.h. Buying a copy of K&R 2 wouldn't be a
bad idea either...

Ed.

Nov 13 '05 #6
Greetings.

In article <3f********@news.swissonline.ch>, Ivan Vecerina wrote:
1)Is there any function (in C standard library) that extracts a
substring from a string?


Not directly -- this has to be emulated with a more general
function such as memcpy (how exactly depends on the
destination storage for the substring).


Eh? You got something against strncpy() or something?

--
_
_V.-o Tristan Miller [en,(fr,de,ia)] >< Space is limited
/ |`-' -=-=-=-=-=-=-=-=-=-=-=-=-=-=-= <> In a haiku, so it's hard
(7_\\ http://www.nothingisreal.com/ >< To finish what you
Nov 13 '05 #7


Tristan Miller wrote:
Greetings.

In article <3f********@news.swissonline.ch>, Ivan Vecerina wrote:
1)Is there any function (in C standard library) that extracts a
substring from a string?


Not directly -- this has to be emulated with a more general
function such as memcpy (how exactly depends on the
destination storage for the substring).

Eh? You got something against strncpy() or something?


'Something' might be more appropriate than strncpy. It depends on
the definition of 'extract'. I interpret this to mean to remove a
substring from a string.

Take string:
char s[] = "Have a very good day";
and extract the substring "very " to make s the
string "Have a good day".

If this is the intent of the OP, then a solution would not involve
strncpy or memcpy. A function duo of strstr and memmove would
do the job.

--
Al Bowers
Tampa, Fl USA
mailto: xa******@myrapidsys.com (remove the x to send email)
http://www.geocities.com/abowers822/

Nov 13 '05 #8
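
If "extract" does mean "remove", the strstr and memmove duo could be sketched like this. The function name remove_first is made up for the example, and it only handles the first occurrence.

```c
#include <string.h>

/* Hypothetical helper: remove the first occurrence of sub from s,
   in place. Returns 1 if something was removed, 0 otherwise. */
int remove_first(char *s, const char *sub)
{
    size_t sublen = strlen(sub);
    char *pos;

    if (sublen == 0)
        return 0;
    pos = strstr(s, sub);
    if (pos == NULL)
        return 0;

    /* Shift the tail (including its terminator) down over the match.
       The regions overlap, so memmove is required rather than memcpy. */
    memmove(pos, pos + sublen, strlen(pos + sublen) + 1);
    return 1;
}
```

Applied to a modifiable array holding "Have a very good day" with sub "very ", this leaves "Have a good day". It must not be called on a string literal, since the string is modified in place.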
Tristan Miller wrote:
Greetings.

In article <3f********@news.swissonline.ch>, Ivan Vecerina wrote:
1)Is there any function (in C standard library) that extracts a
substring from a string?


Not directly -- this has to be emulated with a more general
function such as memcpy (how exactly depends on the
destination storage for the substring).


Eh? You got something against strncpy() or something?


Well, as a substring extractor, it's suboptimal.

Consider, for example:

char foo[16];
char bar[16] = "abcdefghijklmno";

strncpy(foo, bar, 3);

At the end of this operation, foo does not contain a string. Oops.

--
Richard Heathfield : bi****@eton.powernet.co.uk
"Usenet is a strange place." - Dennis M Ritchie, 29 July 1999.
C FAQ: http://www.eskimo.com/~scs/C-faq/top.html
K&R answers, C books, etc: http://users.powernet.co.uk/eton
Nov 13 '05 #9
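
The point can be checked in code. The two helpers below are a sketch with invented names: has_terminator reports whether a null character appears in the first n bytes of a buffer, and copy_prefix is the usual workaround of terminating by hand after strncpy.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if the first n bytes of buf contain
   a null character, 0 otherwise. */
int has_terminator(const char *buf, size_t n)
{
    return memchr(buf, '\0', n) != NULL;
}

/* Hypothetical helper: copy at most n characters of src and always
   terminate. dst must have room for n + 1 characters. */
void copy_prefix(char *dst, const char *src, size_t n)
{
    strncpy(dst, src, n);
    dst[n] = '\0';
}
```

If foo is first filled with some non-zero character, then after strncpy(foo, bar, 3) has_terminator(foo, sizeof foo) returns 0, whereas copy_prefix(foo, bar, 3) leaves the string "abc" in foo.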
Greetings.

In article <bo**********@hercules.btinternet.com>, Richard Heathfield wrote:
Consider, for example:

char foo[16];
char bar[16] = "abcdefghijklmno";

strncpy(foo, bar, 3);

At the end of this operation, foo does not contain a string. Oops.


Well, it doesn't contain a nul-terminated string, but it does contain the
first three characters of bar, which may be all that is needed in some
cases. Does the C standard use the term "string" to refer to both nul- and
non-nul-terminated strings, or does it make a nomenclatural distinction
between "string" (which always includes the sentinel) and the more general
"array-of-char"?

--
_
_V.-o Tristan Miller [en,(fr,de,ia)] >< Space is limited
/ |`-' -=-=-=-=-=-=-=-=-=-=-=-=-=-=-= <> In a haiku, so it's hard
(7_\\ http://www.nothingisreal.com/ >< To finish what you
Nov 13 '05 #10
Tristan Miller wrote:
Does the C standard use the term "string" to refer to both nul-
and non-nul-terminated strings, or does it make a nomenclatural
distinction between "string" (which always includes the sentinel) and the
more general "array-of-char"?


The C Standard defines a string to be an array of characters terminated by
the first null character. If you want the exact wording, I'm sure someone
can oblige.

--
Richard Heathfield : bi****@eton.powernet.co.uk
"Usenet is a strange place." - Dennis M Ritchie, 29 July 1999.
C FAQ: http://www.eskimo.com/~scs/C-faq/top.html
K&R answers, C books, etc: http://users.powernet.co.uk/eton
Nov 13 '05 #11
On Sun, 02 Nov 2003 21:51:51 +0100, in comp.lang.c , Tristan Miller
<ps********@nothingisreal.com> wrote:
cases. Does the C standard use the term "string" to refer to both nul- and
non-nul-terminated strings


C defines a string as
7.1.1 (1) A string is a contiguous sequence of characters terminated
by and including the first null character.

--
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>
CLC readme: <http://www.angelfire.com/ms3/bchambless0/welcome_to_clc.html>
----== Posted via Newsfeed.Com - Unlimited-Uncensored-Secure Usenet News==----
http://www.newsfeed.com The #1 Newsgroup Service in the World! >100,000 Newsgroups
---= 19 East/West-Coast Specialized Servers - Total Privacy via Encryption =---
Nov 13 '05 #12
Richard Heathfield wrote:

Tristan Miller wrote:
Does the C standard use the term "string" to refer to both nul-
and non-nul-terminated strings, or does it make a nomenclatural
distinction between "string"
(which always includes the sentinel) and the
more general "array-of-char"?
The C Standard defines a string to be an array
of characters terminated by the first null character.


The word "array" is conspicuously absent from the
standard definition of string.
If a program can determine whether or not two separately
allocated objects are contiguous,
then a string may span them if they are contiguous.
Also, an array may contain several strings.
If you want the exact wording, I'm sure someone
can oblige.


Here's the 411 from the C89 last public draft:
4. LIBRARY
4.1 INTRODUCTION
4.1.1 Definitions of terms

A string is a contiguous sequence of characters terminated by and
including the first null character. It is represented by a pointer to
its initial (lowest addressed) character and its length is the number
of characters preceding the null character.

--
pete
Nov 13 '05 #13
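
The remark that one array may contain several strings is easy to illustrate. In this sketch the names two_strings and second_string are invented; the array holds "abc" and "def" back to back, each ending at its own null character.

```c
#include <string.h>

/* One array holding two strings back to back: "abc" and "def".
   The literal's implicit terminator makes the array 8 chars long. */
static const char two_strings[] = "abc\0def";

/* Hypothetical helper: pointer to the string that starts right after
   the terminator of the string at first. Only meaningful when the
   caller knows another string follows within the same array. */
const char *second_string(const char *first)
{
    return first + strlen(first) + 1;
}
```

strlen(two_strings) is 3, because the string ends at the first null character, while sizeof two_strings is 8; second_string(two_strings) points at "def".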
pete wrote:
Richard Heathfield wrote:

The C Standard defines a string to be an array
of characters terminated by the first null character.

The word "array" is conspicuously absent from the
standard definition of string.
If a program can determine whether or not two separately
allocated objects are contiguous,
then a string may span them if they are contiguous.


It would be a useless string though, since neither pointer arithmetic
nor the array index operator is defined for accessing these things.

Let's assume that the following two arrays are contiguous in memory
(s2 follows s1 directly).

char s1[3] = "123";
char s2[4] = "456";

You cannot do this:

int i = 0;
while(s1[i] != '\0')
putchar(s1[i++]);

Since it will cause UB. The same argument applies to pointers (since the
above really operates on pointers anyway).

I am not so sure if the same goes for arguments to library functions.
But the whole string will be accessed by the same pointer so I would
say you cannot do the above. The avoidance of the word array in the
standard might be to allow string literals to be strings proper.

So s1 and s2 constitute a single string, but you will have to treat the
arrays separately and cannot draw any benefit from their being contiguous.

--
Thomas.

Nov 13 '05 #14
Thomas Stegen wrote:

pete wrote:
Richard Heathfield wrote:
The C Standard defines a string to be an array
of characters terminated by the first null character.

The word "array" is conspicuously absent from the
standard definition of string.
If a program can determine whether or not two separately
allocated objects are contiguous,
then a string may span them if they are contiguous.


It would be a useless string


It's just some C trivia.
though since neither pointer arithmetic
nor the array index operator is defined for accessing these things.

Let's assume that the following two arrays are contiguous in memory.
(s2 follows s1 directly).

char s1[3] = "123";
char s2[4] = "456";

You cannot do this:

int i = 0;
while(s1[i] != '\0')
putchar(s1[i++]);

Since it will cause UB. The same argument applies to pointers (since the
above really operates on pointers anyway).
I am not so sure if the same goes for arguments to library functions.


If the arrays are shown to be contiguous,
then you can have
puts(s1);

--
pete
Nov 13 '05 #15
pete <pf*****@mindspring.com> wrote:
Thomas Stegen wrote:
char s1[3] = "123";
char s2[4] = "456";

You cannot do this:

int i = 0;
while(s1[i] != '\0')
putchar(s1[i++]);

Since it will cause UB.


If the arrays are shown to be contiguous,
then you can have
puts(s1);


No, you can't. Any reference to s1[3] and above, including those
implicit in puts(s1), invokes undefined behaviour. Although it is true
that on many architectures this instance of UB will behave as if it is
defined, you cannot rely on that.

Richard
Nov 13 '05 #16
Richard Bos wrote:

pete <pf*****@mindspring.com> wrote:
Thomas Stegen wrote:
char s1[3] = "123";
char s2[4] = "456";

You cannot do this:

int i = 0;
while(s1[i] != '\0')
putchar(s1[i++]);

Since it will cause UB.
If the arrays are shown to be contiguous,
then you can have
puts(s1);


No, you can't.

Any reference to s1[3] and above, including those
implicit in puts(s1), invoke undefined behaviour.


That doesn't matter.
If you give puts a pointer to a string,
then the behavior is defined.
How puts accomplishes the behavior is up to the implementors.
If two objects are being spanned by a string,
and if puts doesn't want to index across them,
then puts may deal with the objects separately,
or do it some other way.

--
pete
Nov 13 '05 #17
pete <pf*****@mindspring.com> wrote:
Richard Bos wrote:

pete <pf*****@mindspring.com> wrote:
Thomas Stegen wrote:

> char s1[3] = "123";
> char s2[4] = "456";

If the arrays are shown to be contiguous,
then you can have
puts(s1);

Any reference to s1[3] and above, including those
implicit in puts(s1), invoke undefined behaviour.

That doesn't matter.


Yes, it does.
If you give puts a pointer to a string,
then the behavior is defined.


s1 is not a string. It may happen to look like one on your favourite
architecture, but that doesn't make it one.

Richard
Nov 13 '05 #18
Richard Bos wrote:

pete <pf*****@mindspring.com> wrote:
Richard Bos wrote:

pete <pf*****@mindspring.com> wrote:

> Thomas Stegen wrote:
>
> > char s1[3] = "123";
> > char s2[4] = "456";
>
> If the arrays are shown to be contiguous,
> then you can have
> puts(s1);

Any reference to s1[3] and above, including those
implicit in puts(s1), invoke undefined behaviour.


That doesn't matter.


Yes, it does.
If you give puts a pointer to a string,
then the behavior is defined.


s1 is not a string. It may happen to look like one on your favourite
architecture, but that doesn't make it one.


I believe we are only talking about cases where s1 and s3
are shown to be contiguous.
In that case, s1 satisfies the definition of "pointer to a string"

N869

7.1.1 Definitions of terms

[#1] A string is a contiguous sequence of characters
terminated by and including the first null character. The
term multibyte string is sometimes used instead to emphasize
special processing given to multibyte characters contained
in the string or to avoid confusion with a wide string. A
pointer to a string is a pointer to its initial (lowest
addressed) character.

--
pete
Nov 13 '05 #19
pete wrote:

Richard Bos wrote:

pete <pf*****@mindspring.com> wrote:
Richard Bos wrote:
>
> pete <pf*****@mindspring.com> wrote:
>
> > Thomas Stegen wrote:
> >
> > > char s1[3] = "123";
> > > char s2[4] = "456";
> > If the arrays are shown to be contiguous,
> > then you can have
> > puts(s1);

> Any reference to s1[3] and above, including those
> implicit in puts(s1), invoke undefined behaviour.

That doesn't matter.


Yes, it does.
If you give puts a pointer to a string,
then the behavior is defined.


s1 is not a string. It may happen to look like one on your favourite
architecture, but that doesn't make it one.


I believe we are only talking about cases where

s1 and s3
That should be "s1 and s2".
are shown to be contiguous.
In that case, s1 satisfies the definition of "pointer to a string"

N869

7.1.1 Definitions of terms

[#1] A string is a contiguous sequence of characters
terminated by and including the first null character. The
term multibyte string is sometimes used instead to emphasize
special processing given to multibyte characters contained
in the string or to avoid confusion with a wide string. A
pointer to a string is a pointer to its initial (lowest
addressed) character.


--
pete
Nov 13 '05 #20


pete wrote:
>>char s1[3] = "123";

In that case, s1 satisfies the definition of "pointer to a string"

N869

7.1.1 Definitions of terms

[#1] A string is a contiguous sequence of characters
terminated by and including the first null character. The
term multibyte string is sometimes used instead to emphasize
special processing given to multibyte characters contained
in the string or to avoid confusion with a wide string. A
pointer to a string is a pointer to its initial (lowest
addressed) character.


No.
Consider the declaration and initialization.
char s1[3] = "123";

You have declared an array of 3 characters and initialized it with the
characters '1','2','3'. s1[0] has the value '1'. s1[1] has the
value '2'. s1[2] has the value '3'.

Where in this character array is there a contiguous sequence of
characters terminated by and including the first null character?

Answer: There is no null character in the array, thus the array
does not represent a string.

--
Al Bowers
Tampa, Fl USA
mailto: xa******@myrapidsys.com (remove the x to send email)
http://www.geocities.com/abowers822/

Nov 13 '05 #21
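
Whether a given array holds a string can be tested without ever reading past its end. The sketch below uses an invented name, bounded_len; it is similar in spirit to POSIX strnlen, which is not part of standard C.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the string in the n-byte array at p,
   or n if the array contains no null character at all. */
size_t bounded_len(const char *p, size_t n)
{
    const char *end = memchr(p, '\0', n);   /* never looks beyond p[n-1] */

    return end != NULL ? (size_t)(end - p) : n;
}
```

For char s1[3] = "123" the result equals sizeof s1, meaning no terminator was found inside the array; for char s2[4] = "456" the result is 3, which is less than sizeof s2, so s2 does hold a string.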
Al Bowers wrote:

pete wrote:
>>>char s1[3] = "123";
In that case, s1 satisfies the definition of "pointer to a string"

N869

7.1.1 Definitions of terms

[#1] A string is a contiguous sequence of characters
terminated by and including the first null character. The
term multibyte string is sometimes used instead to emphasize
special processing given to multibyte characters contained
in the string or to avoid confusion with a wide string. A
pointer to a string is a pointer to its initial (lowest
addressed) character.


No.
Consider the declaration and initialization.
char s1[3] = "123";

You have declared an array of 3 characters and initialized it with the
characters '1','2','3'. s1[0] has the value '1'. s1[1] has the
value '2'. s1[2] has the value '3'.

Where in this character array is there a contiguous sequence of
characters terminated by and including the first null character?


Nowhere in the array.
However, since strings are not confined to arrays,
what difference does it make ?
Answer: There is no null character in the array, thus the array
does not represent a string.


That's been my point all along.
But you snipped the relevant part of the post,
which describes the specific case under discussion.
Specifically, we're discussing the case where the program
has determined whether or not s1 and s2 are contiguous,
and only the case where they are contiguous.

char s1[3] = "123";
char s2[4] = "456";

if (s2 == s1 + sizeof s1) {
puts(s1);
}

--
pete
Nov 13 '05 #22
In <3F***********@mindspring.com> pete <pf*****@mindspring.com> writes:
Specifically, we're discussing the case where the program
has determined whether or not s1 and s2 are contiguous,
and only the case where they are contiguous.

char s1[3] = "123";
char s2[4] = "456";

if (s2 == s1 + sizeof s1) {
puts(s1);
}


I entirely agree that, according to the wording of the standard, this
code snippet is correct. There is no consensus in comp.std.c on whether
this is the intent of the standard or not, but the actual wording is
unambiguous. However, if you replace the puts call by a printf call:

printf("%s\n", s1);

you're right into undefined behaviour, because:

s The argument shall be a pointer to an array of character type.
Characters from the array are written up to (but not including) a
terminating null character; if the precision is specified, no more
than that many characters are written. If the precision is not
specified or is greater than the size of the array, the array shall
contain a null character.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 13 '05 #23
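
The quoted description of %s also shows a way to print an array that has no terminator without relying on anything adjacent to it: give a precision no greater than the array size. A sketch, with an invented function name and sprintf used so the result can be inspected:

```c
#include <stdio.h>

/* Hypothetical helper: format the first n chars of a (possibly
   unterminated) array into out. out must have room for n + 1
   characters. Returns whatever sprintf returns. */
int format_chars(char *out, const char *arr, int n)
{
    /* With a precision of n, %s writes at most n characters, so arr
       need not contain a null character as long as it has n elements. */
    return sprintf(out, "%.*s", n, arr);
}
```

For char s1[3] = "123", format_chars(out, s1, (int)sizeof s1) writes "123" to out and returns 3. The same "%.*s" form works directly with printf.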
On 4 Nov 2003 17:59:19 GMT, Da*****@cern.ch (Dan Pop) wrote:
In <3F***********@mindspring.com> pete <pf*****@mindspring.com> writes:
Specifically, we're discussing the case where the program
has determined whether or not s1 and s2 are contiguous,
and only the case where they are contiguous.

char s1[3] = "123";
char s2[4] = "456";

if (s2 == s1 + sizeof s1) {
puts(s1);
}


I entirely agree that, according to the wording of the standard, this
code snippet is correct. There is no consensus in comp.std.c on whether
this is the intent of the standard or not, but the actual wording is
unambiguous. However, if you replace the puts call by a printf call:

printf("%s\n", s1);

you're right into undefined behaviour, because:

s The argument shall be a pointer to an array of character type.
Characters from the array are written up to (but not including) a
terminating null character; if the precision is specified, no more
than that many characters are written. If the precision is not
specified or is greater than the size of the array, the array shall
contain a null character.


How is it different from the puts() call above? Surely, "array" means a
consecutive "string" of characters, not a C array, otherwise, char *s1 coming
from a successful calloc() would cause undefined behaviour right away when fed
to printf().

Nov 13 '05 #24
On Tue, 04 Nov 2003 15:50:57 GMT, in comp.lang.c , pete
<pf*****@mindspring.com> wrote:
Al Bowers wrote:

Where in this character array is there a contigous sequence of
characters terminated by and including the first null character?


Nowhere in the array.
However, since strings are not confined to arrays,
what difference does it make ?


It makes the difference that puts requires a string, and s1 is not a
string since it has no null terminator. So when puts is putting chars
to stdout, it will read from memory beyond the region allocated to s1,
and this is disallowed.

The fact that somewhere in memory nearby there is a null doesn't mean
that magically s1 becomes a string. It merely means that puts will by
good luck stop sending data to stdout.
Answer: There is no null character in the array, thus the array
does not represent a string.


That's been my point all along.


I'm confused. Are you agreeing that this is UB or not?

--
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>
CLC readme: <http://www.angelfire.com/ms3/bchambless0/welcome_to_clc.html>
Nov 13 '05 #25
Mark McIntyre wrote:
Answer: There is no null character in the array, thus the array
does not represent a string.


That's been my point all along.

I'm confused. Are you agreeing that this is UB or not?


I think it is quite clear that this is not as clear as one might
wish.

Have a closer look at the examples given and note that s1 and
s2 do indeed constitute a string if they happen to be
contiguous in memory. In particular, note that the standard
does not mention the word "array" when it defines
the term "string".

--
Thomas.

Nov 13 '05 #26
rihad wrote:
Da*****@cern.ch (Dan Pop) wrote:
pete <pf*****@mindspring.com> writes:
Specifically, we're discussing the case where the program
has determined whether or not s1 and s2 are contiguous,
and only the case where they are contiguous.

char s1[3] = "123";
char s2[4] = "456";

if (s2 == s1 + sizeof s1) {
puts(s1);
}


I entirely agree that, according to the wording of the standard,
this code snippet is correct. There is no consensus in
comp.std.c on whether this is the intent of the standard or not,
but the actual wording is unambiguous. However, if you replace
the puts call by a printf call:

printf("%s\n", s1);

you're right into undefined behaviour, because:

s The argument shall be a pointer to an array of character type.
Characters from the array are written up to (but not including)
a terminating null character; if the precision is specified, no
more than that many characters are written. If the precision
is not specified or is greater than the size of the array, the
array shall contain a null character.


How is it different from the puts() call above? Surely, "array" means
a consecutive "string" of characters, not a C array, otherwise, char
*s1 coming from a successful calloc() would cause undefined behaviour
right away when fed to printf().


Very simply. The first checks the contiguity of the two arrays
(not guaranteed) before using the first as a string. The second
simply uses it as a string. The expression "s1 + sizeof s1" is
specifically valid because it points one beyond the actual array.
The expression s2 is valid by definition. You can replace the
puts() call with the printf() call (and vice-versa) without
altering the validity/invalidity of the two fragments.

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!
Nov 13 '05 #27

"Dan Pop" <Da*****@cern.ch> schrieb im Newsbeitrag
news:bo**********@sunnews.cern.ch...
In <3F***********@mindspring.com> pete <pf*****@mindspring.com> writes:
Specifically, we're discussing the case where the program
has determined whether or not s1 and s2 are contiguous,
and only the case where they are contiguous.

char s1[3] = "123";
char s2[4] = "456";

if (s2 == s1 + sizeof s1) {

^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Just curious - if I don't misunderstand 6.5.9 paragraph 6 (in N869) the above
statement is correct, and if the expression "s2 == s1 + sizeof s1" yields
true, s1 and s2 form a contiguous sequence of 7 char objects which contains
a '\0' in the last position.
Isn't in this case the resulting object the same (except for scope) as if we
wrote
char *s1 = malloc(7);
if(s1)
{
strcpy(s1, "123456");
}
Here we also did not explicitly define an array, but we definitely created
a string.
Do you know any possibility for an implementation to produce different
results (provided the expression "s2 == s1 + sizeof s1" yields true)?

Robert
Nov 13 '05 #28


Dan Pop wrote:
In <3F***********@mindspring.com> pete <pf*****@mindspring.com> writes:

Specifically, we're discussing the case where the program
has determined whether or not s1 and s2 are contiguous,
and only the case where they are contiguous.

char s1[3] = "123";
char s2[4] = "456";

if (s2 == s1 + sizeof s1) {
puts(s1);
}

I entirely agree that, according to the wording of the standard, this
code snippet is correct. There is no consensus in comp.std.c on whether
this is the intent of the standard or not, but the actual wording is
unambiguous. However, if you replace the puts call by a printf call:

printf("%s\n", s1);

you're right into undefined behaviour, because:

s The argument shall be a pointer to an array of character type.
Characters from the array are written up to (but not including) a
terminating null character; if the precision is specified, no more
than that many characters are written. If the precision is not
specified or is greater than the size of the array, the array shall
contain a null character.


What about some of the string handling functions?

char buf[32];
char s1[3] = "123";
char s2[4] = "456";

if (s2 == s1 + sizeof s1) {
strcpy(buf,s1);
}

In section:
7.21.1 String function conventions
You have:
If an array is accessed beyond the end of an object, the behavior is
undefined.

--
Al Bowers
Tampa, Fl USA
mailto: xa******@myrapidsys.com (remove the x to send email)
http://www.geocities.com/abowers822/

Nov 13 '05 #29
In <c1********************************@4ax.com> rihad <ri***@mail.ru> writes:
On 4 Nov 2003 17:59:19 GMT, Da*****@cern.ch (Dan Pop) wrote:
In <3F***********@mindspring.com> pete <pf*****@mindspring.com> writes:
Specifically, we're discussing the case where the program
has determined whether or not s1 and s2 are contiguous,
and only the case where they are contiguous.

char s1[3] = "123";
char s2[4] = "456";

if (s2 == s1 + sizeof s1) {
puts(s1);
}
I entirely agree that, according to the wording of the standard, this
code snippet is correct. There is no consensus in comp.std.c on whether
this is the intent of the standard or not, but the actual wording is
unambiguous. However, if you replace the puts call by a printf call:

printf("%s\n", s1);

you're right into undefined behaviour, because:

s The argument shall be a pointer to an array of character type.
Characters from the array are written up to (but not including) a
terminating null character; if the precision is specified, no more
than that many characters are written. If the precision is not
specified or is greater than the size of the array, the array shall
contain a null character.


> How is it different from the puts() call above?

puts doesn't expect an array, it merely expects a string.

The puts function writes the string pointed to by s to the stream
pointed to by stdout, and appends a new-line character to the output.
The terminating null character is not written.

> Surely, "array" means a
> consecutive "string" of characters, not a C array, otherwise, char *s1 coming

Array means whatever the standard defines as an array:

* An array type describes a contiguously allocated set of objects
with a particular member object type, called the element type. Array
types are characterized by their element type and by the number of
members of the array.

> from a successful calloc() would cause undefined behaviour right away when fed
> to printf().

Wrong:

The calloc function allocates space for an array of nmemb objects,
                                           ^^^^^
each of whose size is size.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 13 '05 #30
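
Since calloc hands back a single zero-filled array, a buffer obtained from it already holds an empty string and can be filled without any doubt about termination. A small sketch, with the name dup_bounded invented for the example:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate a zero-filled buffer of size chars and
   copy at most size - 1 characters of src into it. Because calloc
   zero-fills, the result is always terminated. Returns NULL on
   failure; the caller frees the result. */
char *dup_bounded(const char *src, size_t size)
{
    char *buf;

    if (size == 0)
        return NULL;
    buf = calloc(size, sizeof *buf);
    if (buf == NULL)
        return NULL;
    strncpy(buf, src, size - 1);      /* last char stays '\0' */
    return buf;
}
```

The pointer returned by dup_bounded("123456789", 4) refers to the string "123" and can be passed to printf with %s, because it points into one array that contains a null character.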
In <3f***********************@newsreader02.highway.te lekom.at> "Robert Stankowic" <pc******@netway.at> writes:

"Dan Pop" <Da*****@cern.ch> schrieb im Newsbeitrag
news:bo**********@sunnews.cern.ch...
In <3F***********@mindspring.com> pete <pf*****@mindspring.com> writes:
>Specifically, we're discussing the case where the program
>has determined whether or not s1 and s2 are contiguous,
>and only the case where they are contiguous.
>
>char s1[3] = "123";
>char s2[4] = "456";
>
>if (s2 == s1 + sizeof s1) {
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Just curious - if I don't misunderstand 6.5.9 paragraph 6 (in N869) the above
statement is correct, and if the expression "s2 == s1 + sizeof s1" yields
true, s1 and s2 form a contiguous sequence of 7 char objects which contains
a '\0' in the last position.
Isn't in this case the resulting object the same (except for scope) as if we
wrote
char *s1 = malloc(7);
if(s1)
{
strcpy(s1, "123456");
}
Here we also did not explicitly define an array, but we definitely created
a string.


The malloc call has created a *single* object that can be treated as an
array of 7 char, while the definitions of s1 and s2 create two different
objects that cannot be treated as a single object, even if adjacent.
Do you know any possibility for an implementation to produce different
results (provided the expression "s2 == s1 + sizeof s1" yields true)?


Of course. Any implementation doing array bound checking *properly*
should object in the s1/s2 case. The big challenge of such an
implementation is NOT to object to the puts call.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 13 '05 #31
In <bo*************@ID-169908.news.uni-berlin.de> Al Bowers <xa******@rapidsys.com> writes:


Dan Pop wrote:
In <3F***********@mindspring.com> pete <pf*****@mindspring.com> writes:

Specifically, we're discussing the case where the program
has determined whether or not s1 and s2 are contiguous,
and only the case where they are contiguous.

char s1[3] = "123";
char s2[4] = "456";

if (s2 == s1 + sizeof s1) {
puts(s1);
}

I entirely agree that, according to the wording of the standard, this
code snippet is correct. There is no consensus in comp.std.c on whether
this is the intent of the standard or not, but the actual wording is
unambiguous. However, if you replace the puts call by a printf call:

printf("%s\n", s1);

you're right into undefined behaviour, because:

s The argument shall be a pointer to an array of character type.
Characters from the array are written up to (but not including) a
terminating null character; if the precision is specified, no more
than that many characters are written. If the precision is not
specified or is greater than the size of the array, the array shall
contain a null character.


What about some of the string handling functions?

char buf[32];
char s1[3] = "123";
char s2[4] = "456";

if (s2 == s1 + sizeof s1) {
strcpy(buf,s1);
}

In section:
7.21.1 String function conventions
You have:
If an array is accessed beyond the end of an object, the behavior is
undefined.


There is an unfortunate conflict between your identifiers and the
parameter names used by the standard in the description of strcpy.
I'm using s1 and s2 with their meanings in the C standard, below.

The strcpy function copies the string pointed to by s2 (including
^^^^^^^^^^
the terminating null character) into the array pointed to by s1.
^^^^^^^^^

Still no problem, since only the s1 argument is supposed to point to an
array. And buf is large enough to hold a 6 character string.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 13 '05 #32
On 5 Nov 2003 11:52:35 GMT, Da*****@cern.ch (Dan Pop) wrote:
In <c1********************************@4ax.com> rihad <ri***@mail.ru> writes:
Surely, "array" means a
consecutive "string" of characters, not a C array, otherwise, char *s1 coming


(By a "C array" I meant char a[] = "hello"; printf("%s\n", a);)
Array means whatever the standard defines as an array:

* An array type describes a contiguously allocated set of objects
with a particular member object type, called the element type. Array
types are characterized by their element type and by the number of
members of the array.


Then given char *p = calloc(1, 1); p points to an array of one char (barring
nomem)? And given char c = 0; &c points to an array of one char? Sorry, but I
read the above as if int i; meant array of 1 int.
from a successful calloc() would cause undefined behaviour right away when fed
to printf().


Wrong:

The calloc function allocates space for an array of nmemb objects,
^^^^^
each of whose size is size.


Sorry, but I *really* fail to understand why substituting the puts(s1); call
below with printf("%s\n", s1); suddenly invokes undefined behaviour, as you have
pointed out. &s1[0] points to an array of objects. The array is ended by a
((char) 0). Nowhere in the range of [ (s1 + 0) .. (s1 + sizeof s1 + sizeof s2) )
is an uninitialized object being accessed for reading (assuming the if holds
true, which is just a compile time constant IIRC).

char s1[3] = "123";
char s2[4] = "456";

if (s2 == s1 + sizeof s1) {
puts(s1);
}

Nov 13 '05 #33


Dan Pop wrote:
In <bo*************@ID-169908.news.uni-berlin.de> Al Bowers <xa******@rapidsys.com> writes:
Dan Pop wrote:
In <3F***********@mindspring.com> pete <pf*****@mindspring.com> writes:

Specifically, we're discussing the case where the program
has determined whether or not s1 and s2 are contiguous,
and only the case where they are contiguous.

char s1[3] = "123";
char s2[4] = "456";

if (s2 == s1 + sizeof s1) {
puts(s1);
}
I entirely agree that, according to the wording of the standard, this
code snippet is correct. There is no consensus in comp.std.c on whether
this is the intent of the standard or not, but the actual wording is
unambiguous. However, if you replace the puts call by a printf call:

printf("%s\n", s1);

you're right into undefined behaviour, because:

s The argument shall be a pointer to an array of character type.
Characters from the array are written up to (but not including) a
terminating null character; if the precision is specified, no more
than that many characters are written. If the precision is not
specified or is greater than the size of the array, the array shall
contain a null character.


What about some of the string handling functions?

char buf[32];
char s1[3] = "123";
char s2[4] = "456";

if (s2 == s1 + sizeof s1) {
strcpy(buf,s1);
}

In section:
7.21.1 String function conventions
You have:
If an array is accessed beyond the end of an object, the behavior is
undefined.

There is an unfortunate conflict between your identifiers and the
parameter names used by the standard in the description of strcpy.
I'm using s1 and s2 with their meanings in the C standard, below.

The strcpy function copies the string pointed to by s2 (including
^^^^^^^^^^
the terminating null character) into the array pointed to by s1.
^^^^^^^^^

Still no problem, since only the s1 argument is supposed to point to an
array. And buf is large enough to hold a 6 character string.

Dan


I agree.
What about?

char buf[32];
char s1[3] = "123";
char s2[4] = "456";

if (s2 == s1 + sizeof s1) {
strcpy(s1,"Hello");
}
Assuming equality (the equality expression yields 1).

--
Al Bowers
Tampa, Fl USA
mailto: xa******@myrapidsys.com (remove the x to send email)
http://www.geocities.com/abowers822/

Nov 13 '05 #34
rihad wrote:
&s1[0] points to an array of objects.
The array is ended by a ((char) 0).
The array is terminated by a ((char)'3')
char s1[3] = "123";


--
pete
Nov 13 '05 #35
On Wed, 05 Nov 2003 14:47:52 GMT, pete <pf*****@mindspring.com> wrote:
rihad wrote:
&s1[0] points to an array of objects.
The array is ended by a ((char) 0).


The array is terminated by a ((char)'3')
char s1[3] = "123";


The array of objects terminated by a ((char) 0), not s1.
char s1[3] = "123";
char s2[4] = "456";
Given this:

char s[] = "123456", (*p3)[3] = &s;

is calling

printf("%s\n", p3[0]);

illegal, but

printf("%s\n", p3[1]);

is legal?

I'm pretty sure they are both legal, because nowhere is unowned/uninitialized
memory being accessed. Then why can't we assume that in the case of

char s1[3] = "123";
char s2[4] = "456";

and assert(s2 == s1 + sizeof s1);

there's some virtual object s that spans the two objects s1 and s2 and that
object s constitutes a valid C string?

Nov 13 '05 #36
In <da********************************@4ax.com> rihad <ri***@mail.ru> writes:
On 5 Nov 2003 11:52:35 GMT, Da*****@cern.ch (Dan Pop) wrote:
In <c1********************************@4ax.com> rihad <ri***@mail.ru> writes:
Surely, "array" means a
consecutive "string" of characters, not a C array, otherwise, char *s1 coming
(By a "C array" I meant char a[] = "hello"; printf("%s\n", a);)
C array is *everything* the standard defines as such.
Array means whatever the standard defines as an array:

* An array type describes a contiguously allocated set of objects
with a particular member object type, called the element type. Array
types are characterized by their element type and by the number of
members of the array.


Then given char *p = calloc(1, 1); p points to an array of one char (barring
nomem)?


Yes, in common parlance. A pedant would say that p points to the first
character of an array of one char. However, given the type of p, there
is no place for confusion if one simply uses your wording.
And given char c = 0; &c points to an array of one char?
Yes.
Sorry, but I read the above as if int i; meant array of 1 int.
This is correct, too.

7 For the purposes of these operators, a pointer to an object that
is not an element of an array behaves the same as a pointer to
the first element of an array of length one with the type of
the object as its element type.
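As a small illustration of the paragraph just quoted (a sketch only; the function name one_past_distance is hypothetical), pointer arithmetic on the address of a lone int works as if the int were the sole element of an array of length one:

```c
#include <stddef.h>

/* Returns the distance between a pointer to a lone int and the pointer
   one past it. For the additive operators the int is treated like the
   first element of an array of length one. */
ptrdiff_t one_past_distance(void)
{
    int i = 0;
    int *first = &i;
    int *past = first + 1;   /* valid to form, must not be dereferenced */
    return past - first;     /* 1 */
}
```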
from a successful calloc() would cause undefined behaviour right away when fed
to printf().


Wrong:

The calloc function allocates space for an array of nmemb objects,
^^^^^
each of whose size is size.


Sorry, but I *really* fail to understand why substituting the puts(s1); call
below with printf("%s\n", s1); suddenly invokes undefined behaviour, as you have
pointed out. &s1[0] points to an array of objects. The array is ended by a
((char) 0).


Nope. The array is ended by a character that is NOT a null character.
It is *only* the s2 array that ends with a null character.
Nowhere in the range of [ (s1 + 0) .. (s1 + sizeof s1 + sizeof s2) )
is an uninitialized object being accessed for reading (assuming the if holds
true, which is just a compile time constant IIRC).
It doesn't matter. The standard clearly states that %s expects an array
containing a null character. There is no such character in the s1 array,
therefore the printf call invokes undefined behaviour. It's as simple as
that, whether you get it or not.

An implementation doing array bounds checking *can* detect that the end
of the array has been reached without encountering any null character.
At this point, the implementation is free to do anything it wants,
including making demons fly out of your nose.
char s1[3] = "123";
char s2[4] = "456";

if (s2 == s1 + sizeof s1) {
puts(s1);
}


OTOH, this is fine because puts() does NOT expect an array. It expects
a sequence of characters terminated by a null character and it does not
care about how this sequence of characters is allocated. Reread the
definition of "string".
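A side note on the %s wording quoted earlier, offered as a hedged sketch rather than as part of the argument: that text only requires a null character when the precision is missing or greater than the size of the array. With a precision equal to the array size, s1 can be formatted on its own. The function name format_s1 is hypothetical.

```c
#include <stdio.h>

/* Formats the three characters of s1 into out without reading past s1.
   The precision (3) does not exceed the array size, so the %s wording
   does not require s1 to contain a null character. */
int format_s1(char *out, size_t outsize)
{
    /* Same contents as char s1[3] = "123"; there is no terminating null. */
    char s1[3] = { '1', '2', '3' };
    return snprintf(out, outsize, "%.*s", (int)sizeof s1, s1);
}
```

The same precision can be given directly in a printf call, as in printf("%.3s\n", s1).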

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 13 '05 #37
In <bo*************@ID-169908.news.uni-berlin.de> Al Bowers <xa******@rapidsys.com> writes:
What about?

char buf[32];
char s1[3] = "123";
char s2[4] = "456";

if (s2 == s1 + sizeof s1) {
strcpy(s1,"Hello");
}
Assuming equality (the equality expression yields 1).


An obvious case of undefined behaviour: you're writing beyond the end of
the s1 array. A bounds checking implementation is not supposed to be
impressed by the fact that s2 == s1 + sizeof s1.
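One way to avoid this kind of overflow is to check the destination size before copying. The following is only a sketch under that assumption; copy_if_fits is a hypothetical helper, not a standard library function.

```c
#include <string.h>

/* Hypothetical helper: copies src into dst only if it fits, including the
   terminating null. Returns 1 on success, 0 if src is too long. */
int copy_if_fits(char *dst, size_t dstsize, const char *src)
{
    size_t need = strlen(src) + 1;   /* characters plus terminating null */
    if (need > dstsize) {
        return 0;                    /* would write beyond the end of dst */
    }
    memcpy(dst, src, need);
    return 1;
}
```

With the arrays from the example, copy_if_fits(s1, sizeof s1, "Hello") refuses the copy, because "Hello" needs 6 chars and s1 has only 3, regardless of what happens to follow s1 in memory.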

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 13 '05 #38
rihad wrote:
On Wed, 05 Nov 2003 14:47:52 GMT, pete <pf*****@mindspring.com> wrote:

rihad wrote:

&s1[0] points to an array of objects.
The array is ended by a ((char) 0).
The array is terminated by a ((char)'3')

char s1[3] = "123";

The array of objects terminated by a ((char) 0), not s1.


This makes no sense...
char s1[3] = "123";
char s2[4] = "456";
Given this:

char s[] = "123456", (*p3)[3] = &s;
Incompatible pointer types. &s is of type (*)[6].

is calling

printf("%s\n", p3[0]);
Illegal, %s expects a pointer to char not a pointer to
char[3]

illegal, but

printf("%s\n", p3[1]);
Same here.

is legal?

I'm pretty sure they are both legal, because nowhere is unowned/uninitialized
memory being accessed. Then why can't we assume that in the case of
Neither is legal.

char s1[3] = "123";
char s2[4] = "456";

and assert(s2 == s1 + sizeof s1);

In this case s1 and s2 together form a valid string. printf expects a
null-terminated array. A null-terminated array is always a string, but
a string is not necessarily a null-terminated array.
there's some virtual object s that spans the two objects s1 and s2 and that
object s constitutes a valid C string?


The definition of a string in C mentions neither object nor array. Just
a contiguous sequence of chars of which the last one is 0.

puts and printf are different because puts prints a string, while
printf explicitly takes a null terminated array.

--
Thomas.

Nov 13 '05 #39
Thomas Stegen wrote:
puts and printf are different because puts prints a string, while
printf explicitly takes a null terminated array.


I don't believe this is true. Consider the following text from C99
7.1.4 ("Use of library functions"):

If a function argument is described as being an array, the pointer
actually passed to the function shall have a value such that all
address computations and accesses to objects (that would be valid if
the pointer did point to the first element of such an array) are in
fact valid.

In the library section of the standard the word "array" is just a
convenient shorthand to denote array-like objects (including the
object returned from malloc(), for example). You can't draw any
conclusions from the fact that the description of fprintf() uses the
word "array" to describe the pointer-to-string passed as argument and
the description of puts() doesn't.

Jeremy.
Nov 13 '05 #40
Dan Pop wrote:
An implementation doing array bounds checking *can* detect that the end
of the array has been reached without encountering any null character.
At this point, the implementation is free to do anything it wants,
including making demons fly out of your nose.


I find this a bit upsetting, if true. This means that we can have two
pointers that compare equal, one of which is known to point to a valid
object, and yet dereferencing the other has undefined behaviour. For
example, in the following, loop 2 has (according to the above)
undefined behaviour, while loop 3 does not.

char s1[3] = "123";
char s2[4] = "456";

if (s2 == s1 + sizeof s1) {
char *p = s1, *q = s2;

/* loop 1 */
for (; p != q; p++) {
putchar(*p);
}

assert (p == q);

/* loop 2 */
for (; p != s2 + sizeof s2; p++) {
putchar(*p);
}

/* loop 3 */
for (; q != s2 + sizeof s2; q++) {
putchar(*q);
}
}

Jeremy.
Nov 13 '05 #41
On Wed, 05 Nov 2003 18:58:44 +0000, Thomas Stegen <ts*****@cis.strath.ac.uk>
wrote:
rihad wrote:
On Wed, 05 Nov 2003 14:47:52 GMT, pete <pf*****@mindspring.com> wrote:
Given this:

char s[] = "123456", (*p3)[3] = &s;
Incompatible pointer types. &s is of type (*)[6].


It's actually of type (char (*)[7]), but nonetheless I hoped it would be a valid
assignment. Alas... Would be neat though if it were :)
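For comparison, a sketch of the declaration with matching types; the names p7 and pointee_size are hypothetical. Since s has 7 elements (six digits plus the terminating null), &s can initialise a pointer to char[7] without any conversion:

```c
#include <stddef.h>

/* Returns the size of the array that p7 points to. */
size_t pointee_size(void)
{
    char s[] = "123456";     /* 6 characters plus the terminating null */
    char (*p7)[7] = &s;      /* types match: &s has type char (*)[7] */
    return sizeof *p7;       /* 7 */
}
```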

is calling

printf("%s\n", p3[0]);
Illegal, %s expects a pointer to char not a pointer to
char[3]


p3[0] is an expression of type (char [3]) which decays into (char *).

illegal, but

printf("%s\n", p3[1]);
Same here.


Same here.
puts and printf are different because puts prints a string, while
printf explicitly takes a null terminated array.


Gosh! What is the difference between a string and a zero-terminated array (of
chars)?! Please help the desperate!

Nov 13 '05 #42
rihad wrote:

On Wed, 05 Nov 2003 14:47:52 GMT, pete <pf*****@mindspring.com> wrote:
rihad wrote:
&s1[0] points to an array of objects.
The array is ended by a ((char) 0).


The array is terminated by a ((char)'3')
char s1[3] = "123";


The array of objects terminated by a ((char) 0), not s1.
char s1[3] = "123";
char s2[4] = "456";


s1 and s2 are two distinct arrays.
s2, is not part of s1.
s2 ends in a null character.
s1 ends in '3'.

--
pete
Nov 13 '05 #43
On 5 Nov 2003 19:39:58 GMT, Jeremy Yallop <je****@jdyallop.freeserve.co.uk>
wrote:
Dan Pop wrote:
An implementation doing array bounds checking *can* detect that the end
of the array has been reached without encountering any null character.
At this point, the implementation is free to do anything it wants,
including making demons fly out of your nose.


I find this a bit upsetting, if true. This means that we can have two
pointers that compare equal, one of which is known to point to a valid
object, and yet dereferencing the other has undefined behaviour. For
example, in the following, loop 2 has (according to the above)
undefined behaviour, while loop 3 does not.

char s1[3] = "123";
char s2[4] = "456";

if (s2 == s1 + sizeof s1) {
char *p = s1, *q = s2;

/* loop 1 */
for (; p != q; p++) {
putchar(*p);
}

assert (p == q);

/* loop 2 */
for (; p != s2 + sizeof s2; p++) {
putchar(*p);
}

/* loop 3 */
for (; q != s2 + sizeof s2; q++) {
putchar(*q);
}
}


If loop 2 is undefined, there's no point in living. Thanks for the eye-opening
example, Jeremy.

Nov 13 '05 #44
On 5 Nov 2003 19:28:45 GMT, Jeremy Yallop <je****@jdyallop.freeserve.co.uk>
wrote:
Thomas Stegen wrote:
puts and printf are different because puts prints a string, while
printf explicitly takes a null terminated array.


I don't believe this is true. Consider the following text from C99
7.1.4 ("Use of library functions"):

If a function argument is described as being an array, the pointer
actually passed to the function shall have a value such that all
address computations and accesses to objects (that would be valid if
the pointer did point to the first element of such an array) are in
fact valid.

In the library section of the standard the word "array" is just a
convenient shorthand to denote array-like objects (including the
object returned from malloc(), for example). You can't draw any
conclusions from the fact that the description of fprintf() uses the
word "array" to describe the pointer-to-string passed as argument and
the description of puts() doesn't.


That's what I've felt since my first followup to Dan Pop! Maybe I haven't been
thinking in terms of the standard's wording but nonetheless I'm glad to see that
you happen to share my opinion, even though yours is far more educated, while
mine is based on what "makes sense to me" :)
Nov 13 '05 #45

"Dan Pop" <Da*****@cern.ch> schrieb im Newsbeitrag
news:bo**********@sunnews.cern.ch...
In <3f***********************@newsreader02.highway.telekom.at> "Robert Stankowic" <pc******@netway.at> writes:
Do you know any possibility for an implementation to produce different
results (provided the expression "s2 == s1 + sizeof s1" yields true)?


Of course. Any implementation doing array bound checking *properly*
should object in the s1/s2 case. The big challenge of such an
implementation is NOT to object to the puts call.


Thank you for the clarification
regards
Robert
Nov 13 '05 #46
In <sl*******************@ekoi.cl.cam.ac.uk> Jeremy Yallop <je****@jdyallop.freeserve.co.uk> writes:
Thomas Stegen wrote:
puts and printf are different because puts prints a string, while
printf explicitly takes a null terminated array.
I don't believe this is true. Consider the following text from C99
7.1.4 ("Use of library functions"):

If a function argument is described as being an array, the pointer
actually passed to the function shall have a value such that all
address computations and accesses to objects (that would be valid if
the pointer did point to the first element of such an array) are in
fact valid.

In the library section of the standard the word "array" is just a
convenient shorthand to denote array-like objects (including the
object returned from malloc(), for example).


The word "array" being defined by the standard, cannot be interpreted in
any other way when used in the standard. The object returned by malloc
satisfies the standard's definition of array.
You can't draw any
conclusions from the fact that the description of fprintf() uses the
word "array" to describe the pointer-to-string passed as argument and
the description of puts() doesn't.


Of course you can. If you ignore the definitions of the terms used by
the standard in a purely arbitrary way (i.e. according to your own
preconceptions about the language), the standard becomes a useless
document.

The *real* issue is whether the current wording of the standard accurately
reflects the intent of those who wrote it. If it doesn't, the wording
needs to be fixed, but until then, one cannot take arbitrary liberties
in interpreting the text of the standard.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 13 '05 #47
In <sl*******************@ekoi.cl.cam.ac.uk> Jeremy Yallop <je****@jdyallop.freeserve.co.uk> writes:
Dan Pop wrote:
An implementation doing array bounds checking *can* detect that the end
of the array has been reached without encountering any null character.
At this point, the implementation is free to do anything it wants,
including making demons fly out of your nose.
I find this a bit upsetting, if true. This means that we can have two
pointers that compare equal, one of which is known to point to a valid
object, and yet dereferencing the other has undefined behaviour.


Yup, C99 *explicitly* mentions this possibility:

6 Two pointers compare equal if and only if both are null pointers,
both are pointers to the same object (including a pointer to an
object and a subobject at its beginning) or function, both are
pointers to one past the last element of the same array object,
or one is a pointer to one past the end of one array object and
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
the other is a pointer to the start of a different array object
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
that happens to immediately follow the first array object in
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
the address space.91)
^^^^^^^^^^^^^^^^^
____________________

91) Two objects may be adjacent in memory because they are
adjacent elements of a larger array or adjacent members of
a structure with no padding between them, or because the
implementation chose to place them so, even though they
are unrelated. If prior invalid pointer operations (such as
accesses outside array bounds) produced undefined behavior,
subsequent comparisons also produce undefined behavior.
For
example, in the following, loop 2 has (according to the above)
undefined behaviour, while loop 3 does not.

char s1[3] = "123";
char s2[4] = "456";

if (s2 == s1 + sizeof s1) {
char *p = s1, *q = s2;
Stylistic issue: the code is much more readable if you name the pointers
p1 and p2, to be consistent with the way they are initialised.
/* loop 1 */
for (; p != q; p++) {
putchar(*p);
}

assert (p == q);
What for?!? Don't you trust the compiler to get the exit condition from
loop1 right or do you suspect that both != and == can evaluate to false
on the same pointer operands?
/* loop 2 */
for (; p != s2 + sizeof s2; p++) {
putchar(*p);
}
You can increment p one past the end of its object, but the
result cannot be either dereferenced or further incremented.

8 When an expression that has integer type is added to or subtracted
from a pointer, the result has the type of the pointer operand. If
the pointer operand points to an element of an array object,
and the array is large enough, the result points to an element
offset from the original element such that the difference of the
subscripts of the resulting and original array elements equals
the integer expression. In other words, if the expression P
points to the i-th element of an array object, the expressions
(P)+N (equivalently, N+(P)) and (P)-N (where N has the value n)
point to, respectively, the i+n-th and i-n-th elements of the
array object, provided they exist. Moreover, if the expression
P points to the last element of an array object, the expression
(P)+1 points one past the last element of the array object, and
if the expression Q points one past the last element of an array
object, the expression (Q)-1 points to the last element of the
array object. If both the pointer operand and the result point to
elements of the same array object, or one past the last element of
the array object, the evaluation shall not produce an overflow;
otherwise, the behavior is undefined. If the result points one
^^^^^^^^^^^^^^^^^^^^^^^^
past the last element of the array object, it shall not be used
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
as the operand of a unary * operator that is evaluated.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

There is no possible doubt that, according to the standard, your code
invokes undefined behaviour.
/* loop 3 */
for (; q != s2 + sizeof s2; q++) {
putchar(*q);
}
}


No problems here.

Imagine that you were writing a bounds checking implementation. It is
obvious, from these quotes, that pointer equality checking would have to
ignore the bounds information, but the indirection operator would have to
take it into account, as well as the addition and subtraction operators.

If your implementation silently executed loop 2, it would fail to
report a bounds-violation-related invocation of undefined behaviour.

This is a typical example of how a very common mental image of the C
language is at odds with the C standard. Most people would expect loop2
to work and it will work on most (if not all) implementations without
bounds checking, but it will work by accident, not by design.
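If the characters are needed as one sequence, declaring them as one array sidesteps the issue. The following is a sketch under that assumption (the function name copy_across and the variable mid are hypothetical): both loops stay inside a single 7-char object, so crossing the midpoint is well defined.

```c
#include <stddef.h>

/* Walks one 7-char array from start to end, crossing the point where the
   thread's s1 would end and s2 would begin. All accesses stay inside s.
   out must have room for at least 7 chars. */
size_t copy_across(char *out)
{
    char s[7] = "123456";
    const char *mid = s + 3;          /* plays the role of s2 */
    const char *end = s + sizeof s;   /* one past the last element of s */
    const char *p;
    size_t n = 0;

    for (p = s; p != mid; p++) {      /* like loop 1 */
        out[n++] = *p;
    }
    for (; p != end; p++) {           /* like loop 2, but within one object */
        out[n++] = *p;
    }
    return n;                         /* 7, including the terminating null */
}
```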

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 13 '05 #48
Dan Pop wrote:
In <sl*******************@ekoi.cl.cam.ac.uk> Jeremy Yallop <je****@jdyallop.freeserve.co.uk> writes:
Thomas Stegen wrote:
puts and printf are different because puts prints a string, while
printf explicitly takes a null terminated array.


I don't believe this is true. Consider the following text from C99
7.1.4 ("Use of library functions"):

If a function argument is described as being an array, the pointer
actually passed to the function shall have a value such that all
address computations and accesses to objects (that would be valid if
the pointer did point to the first element of such an array) are in
fact valid.

In the library section of the standard the word "array" is just a
convenient shorthand to denote array-like objects (including the
object returned from malloc(), for example).


The word "array" being defined by the standard, cannot be interpreted in
any other way when used in the standard.


I'm not sure why you say that. The section that I quoted clearly
states that "array" has a broader sense when used in the library
section.
You can't draw any
conclusions from the fact that the description of fprintf() uses the
word "array" to describe the pointer-to-string passed as argument and
the description of puts() doesn't.


Of course you can. If you ignore the definitions of the terms used by
the standard in a purely arbitrary way (i.e. according to your own
preconceptions about the language), the standard becomes a useless
document.


There's nothing arbitrary about making use of the explicit exception
given in the introduction to the library section of the standard
(quoted above). The word "array" is used in the library section for a
data pointer on which certain operations are valid. Here's another
example:

size_t fread(void * restrict ptr,
size_t size, size_t nmemb,
FILE * restrict stream);

The fread function reads, into the array pointed to by ptr [...]

Now, the following is perfectly valid, although `a' is not an array.

int a;
fread(&a, sizeof a, 1, fp);

Were it not for the exception quoted above such an interpretation
might be questionable. As it is, it's the only reasonable way to
interpret this aspect of the standard.
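A slightly fuller sketch of the fread case, for illustration; the function name roundtrip_int is hypothetical, and tmpfile() is used only to have something to read from. It writes one int and reads it back into a plain int object rather than a declared array.

```c
#include <stdio.h>

/* Writes one int to a temporary file and reads it back into a plain int
   object. Returns 1 on success and stores the value in *result;
   returns 0 if the temporary file could not be created or the I/O failed. */
int roundtrip_int(int value, int *result)
{
    FILE *fp = tmpfile();
    int a = 0;
    int ok;

    if (fp == NULL) {
        return 0;
    }
    ok = fwrite(&value, sizeof value, 1, fp) == 1;
    rewind(fp);
    ok = ok && fread(&a, sizeof a, 1, fp) == 1;   /* &a is not an array */
    fclose(fp);
    if (ok) {
        *result = a;
    }
    return ok;
}
```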

Jeremy.
Nov 13 '05 #49
Dan Pop wrote:
In <sl*******************@ekoi.cl.cam.ac.uk> Jeremy Yallop <je****@jdyallop.freeserve.co.uk> writes:
Dan Pop wrote:
An implementation doing array bounds checking *can* detect that the end
of the array has been reached without encountering any null character.
At this point, the implementation is free to do anything it wants,
including making demons fly out of your nose.


I find this a bit upsetting, if true. This means that we can have two
pointers that compare equal, one of which is known to point to a valid
object, and yet dereferencing the other has undefined behaviour.


Yup, C99 *explicitly* mentions this possibility:


It seems that you're right. It is pretty counterintuitive (if you
have the wrong intuitions, I suppose).
assert (p == q);


What for?!? Don't you trust the compiler to get the exit condition from
loop1 right or do you suspect that both != and == can evaluate to false
on the same pointer operands?


It was just for documentation, really. Perhaps a comment would have
been clearer. I didn't expect the assertion to fail (but then I don't
write assertions that I expect to fail).

Jeremy.
Nov 13 '05 #50
