substring

Hey,
I would like to know 2 things.
1)Is there any function (in C standard library) that extracts a
substring from a string?
2)Is there any function (in C standard library) that returns the
position of a substring in a string?

Thx a lot...
Nov 13 '05
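For the two questions as asked, a minimal sketch follows. The standard library has strstr() for locating a substring, and the position is the difference between the returned pointer and the start of the string. There is no dedicated substring-extraction function, so copying with memcpy() is one common approach. The helper names find_substring and copy_substring are hypothetical, invented here for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the zero-based offset of the first
   occurrence of needle in haystack, or -1 if it does not occur.
   strstr() is standard; this wrapper is only for illustration. */
long find_substring(const char *haystack, const char *needle)
{
    assert(haystack != NULL && needle != NULL);
    const char *hit = strstr(haystack, needle);
    return hit ? (long)(hit - haystack) : -1L;
}

/* Hypothetical helper: returns a freshly malloc'd copy of up to len
   characters of s starting at offset start, or NULL if start is past
   the end of s or allocation fails. The caller frees the result. */
char *copy_substring(const char *s, size_t start, size_t len)
{
    assert(s != NULL);
    size_t total = strlen(s);
    if (start > total)
        return NULL;
    if (len > total - start)
        len = total - start;      /* clamp to the end of the string */
    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, s + start, len);
    out[len] = '\0';              /* always terminate the copy */
    return out;
}
```

With these definitions, find_substring("hello world", "world") gives the offset of "world", and copy_substring("hello world", 6, 3) gives a new three-character string that must later be passed to free().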
In <sl*******************@hehe.cl.cam.ac.uk> Jeremy Yallop <je****@jdyallop.freeserve.co.uk> writes:
Dan Pop wrote:
In <sl*******************@ekoi.cl.cam.ac.uk> Jeremy Yallop <je****@jdyallop.freeserve.co.uk> writes:
Thomas Stegen wrote:
puts and printf are different because puts prints a string, while
printf explicitly takes a null terminated array.

I don't believe this is true. Consider the following text from C99
7.1.4 ("Use of library functions"):

If a function argument is described as being an array, the pointer
actually passed to the function shall have a value such that all
address computations and accesses to objects (that would be valid if
the pointer did point to the first element of such an array) are in
fact valid.

In the library section of the standard the word "array" is just a
convenient shorthand to denote array-like objects (including the
object returned from malloc(), for example).
The word "array" being defined by the standard, cannot be interpreted in
any other way when used in the standard.


I'm not sure why you say that. The section that I quoted clearly
states that "array" has a broader sense when used in the library
section.


I don't see any broader sense in your quote. The restrictions about
address computations are exactly the same as those defined in the
paragraph dealing with pointer arithmetic.
You can't draw any
conclusions from the fact that the description of fprintf() uses the
word "array" to describe the pointer-to-string passed as argument and
the description of puts() doesn't.


Of course you can. If you ignore the definitions of the terms used by
the standard in a purely arbitrary way (i.e. according to your own
preconceptions about the language), the standard becomes a useless
document.


There's nothing arbitrary about making use of the explicit exception
given in the introduction to the library section of the standard
(quoted above). The word "array" is used in the library section for a
data pointer on which certain operations are valid.


Precisely my point!
Here's another example:

size_t fread(void * restrict ptr,
size_t size, size_t nmemb,
FILE * restrict stream);

The fread function reads, into the array pointed to by ptr [...]

Now, the following is perfectly valid, although `a' is not an array.

int a;
fread(&a, sizeof a, 1, fp);
Every scalar can be considered as either an array of 1 of its type, or as
an array of sizeof(scalar) unsigned characters. This is explained in
other parts of the standard.
Were it not for the exception quoted above such an interpretation
might be questionable. As it is, it's the only reasonable way to
interpret this aspect of the standard.


No exception is needed. The standard explains how any object can be
accessed on a byte by byte basis by treating it as an array of characters.
This is enough for your example.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 13 '05 #51
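The point that any object can be accessed byte by byte as an array of character type can be sketched as below. This is only an illustration under the assumption that the object is an int; the helper names int_to_bytes and bytes_to_int are hypothetical and do not come from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the object representation of an int
   into buf, one byte at a time, through an unsigned char pointer.
   buf must have room for sizeof(int) bytes. */
void int_to_bytes(const int *src, unsigned char *buf)
{
    assert(src != NULL && buf != NULL);
    const unsigned char *p = (const unsigned char *)src;
    for (size_t i = 0; i < sizeof *src; i++)
        buf[i] = p[i];   /* reading an object's bytes as unsigned char */
}

/* Hypothetical helper: rebuilds an int from bytes saved by int_to_bytes. */
int bytes_to_int(const unsigned char *buf)
{
    assert(buf != NULL);
    int value;
    memcpy(&value, buf, sizeof value);
    return value;
}
```

A round trip through int_to_bytes and bytes_to_int should give back the original value, which is the same property that lets fread() fill a scalar when passed its address and size.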
Dan Pop wrote:
In <sl*******************@hehe.cl.cam.ac.uk> Jeremy Yallop <je****@jdyallop.freeserve.co.uk> writes:
Dan Pop wrote:
In <sl*******************@ekoi.cl.cam.ac.uk> Jeremy Yallop <je****@jdyallop.freeserve.co.uk> writes:

Thomas Stegen wrote:
> puts and printf are different because puts prints a string, while
> printf explicitly takes a null terminated array.

I don't believe this is true. Consider the following text from C99
7.1.4 ("Use of library functions"):

If a function argument is described as being an array, the pointer
actually passed to the function shall have a value such that all
address computations and accesses to objects (that would be valid if
the pointer did point to the first element of such an array) are in
fact valid.

In the library section of the standard the word "array" is just a
convenient shorthand to denote array-like objects (including the
object returned from malloc(), for example).

The word "array" being defined by the standard, cannot be interpreted in
any other way when used in the standard.
I'm not sure why you say that. The section that I quoted clearly
states that "array" has a broader sense when used in the library
section.


I don't see any broader sense in your quote. The restrictions about
address computations are exactly the same as those defined in the
paragraph dealing with pointer arithmetic.


Perhaps, but I was responding to the claim that:

puts and printf are different because puts prints a string, while
printf explicitly takes a null terminated array.

A null terminated "array" in the library section is no different from
a string. In particular, the address computations that can be
performed on each are precisely the same.
Every scalar can be considered as either an array of 1 of its type, or as
an array of sizeof(scalar) unsigned characters. This is explained in
other parts of the standard.


True. The guarantee is slightly stronger, I think: all the character
types can be used to access any object.
Were it not for the exception quoted above such an interpretation
might be questionable. As it is, it's the only reasonable way to
interpret this aspect of the standard.


No exception is needed.


Well, why is it there, then? I agree that the guarantees elsewhere in
the standard can be taken as sufficient to allow the current meaning,
but I don't think that they're unambiguous enough.

Jeremy.
Nov 13 '05 #52
Jeremy Yallop wrote:
puts and printf are different because puts prints a string, while
printf explicitly takes a null terminated array.

A null terminated "array" in the library section is no different from
a string. In particular, the address computations that can be
performed on each are precisely the same.


The library section defines a string as:

"7.1.1 Definitions of terms
1 A string is a contiguous sequence of characters terminated by and
including the first null character. [...] The length of a string is
the number of bytes preceding the null character and the value of a
string is the sequence of the values of the contained characters, in
order."

Seems to go to great lengths to avoid the term array, I think.
Furthermore the descriptions of functions are very careful
where the term string is used and where the term array is used.

Not that the outcome of this discussion will have much effect
on my coding style ;)

--
Thomas.

Nov 13 '05 #53
Thomas Stegen wrote:
Jeremy Yallop wrote:
puts and printf are different because puts prints a string, while
printf explicitly takes a null terminated array.

A null terminated "array" in the library section is no different from
a string. In particular, the address computations that can be
performed on each are precisely the same.


The library section defines a string as:

"7.1.1 Definitions of terms
1 A string is a contiguous sequence of characters terminated by and
including the first null character. [...] The length of a string is
the number of bytes preceding the null character and the value of a
string is the sequence of the values of the contained characters, in
order."

Seems to go to great lengths to avoid the term array i think.
Furthermore the descriptions of functions are very careful
where the term string is used and where the term array is used.


It seems to me that "string" is used wherever the "array" argument is
null-terminated. This is entirely in keeping with the way these terms
are used elsewhere in the standard: "array" denotes the properties of
the object; "string" describes the value that the object has when
accessed as a sequence of char.

Consequently, the description for [f]printf() uses "array" rather than
"string" (in the 's' specifier section) because the argument is not
necessarily null-terminated. I don't think there's any other
significant difference between "string" and "array" in the library
section. "Array" tends to be used for output parameters for obvious
reasons.

For example:

size_t strxfrm(char * restrict s1,
const char * restrict s2,
size_t n);

The strxfrm function transforms the string pointed to by s2 and
places the resulting string into the array pointed to by s1.

Nobody can seriously claim that this description means that `s1' must
be an actual array whereas `s2' may be split across two or more
objects.

Jeremy.
Nov 13 '05 #54
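The remark that the 's' conversion takes an array that is not necessarily null-terminated can be illustrated with a precision, which caps how many bytes are read. The sketch below uses snprintf() rather than printf() so the result can be inspected; both share the same conversion rules. The helper name format_prefix is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: formats the first n characters of a character
   array that need not be null-terminated. Returns snprintf's result,
   i.e. the number of characters that the full output requires. */
int format_prefix(char *out, size_t outsize, const char *arr, int n)
{
    assert(out != NULL && arr != NULL && n >= 0);
    /* The precision given through '*' limits how many bytes %s reads,
       so arr only needs n accessible characters. */
    return snprintf(out, outsize, "%.*s", n, arr);
}
```

Called with a three-element array holding 'a', 'b', 'c' and n equal to 3, this writes "abc" without ever looking for a terminator in the source array. Without the precision, the array would have to contain a null character.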
In <sl*******************@hehe.cl.cam.ac.uk> Jeremy Yallop <je****@jdyallop.freeserve.co.uk> writes:
Perhaps, but I was responding to the claim that:

puts and printf are different because puts prints a string, while
printf explicitly takes a null terminated array.

A null terminated "array" in the library section is no different from
a string. In particular, the address computations that can be
performed on each are precisely the same.


This is the root of your misunderstanding. The definition of array you
have quoted yourself *explicitly* requires pointer arithmetic to work
inside the array.

The definition of string contains NO such requirement:

1 A string is a contiguous sequence of characters terminated by
and including the first null character. The term multibyte
string is sometimes used instead to emphasize special processing
given to multibyte characters contained in the string or to
avoid confusion with a wide string. A pointer to a string is a
pointer to its initial (lowest addressed) character. The length
of a string is the number of bytes preceding the null character
and the value of a string is the sequence of the values of the
contained characters, in order.

A direct consequence of this anomaly is that NO function expecting a
string parameter that is not explicitly required to be contained in an
array can be *portably* implemented in C, because a C implementation
would necessarily rely on pointer arithmetic working inside the string.
But the definition of string quoted above provides no such guarantee.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 13 '05 #55
In <sl*******************@hehe.cl.cam.ac.uk> Jeremy Yallop <je****@jdyallop.freeserve.co.uk> writes:
Dan Pop wrote:
In <sl*******************@ekoi.cl.cam.ac.uk> Jeremy Yallop <je****@jdyallop.freeserve.co.uk> writes:
Dan Pop wrote:
An implementation doing array bounds checking *can* detect that the end
of the array has been reached without encountering any null character.
At this point, the implementation is free to do anything it wants,
including making demons fly out of your nose.

I find this a bit upsetting, if true. This means that we can have two
pointers that compare equal, one of which is known to point to a valid
object, and yet dereferencing the other has undefined behaviour.


Yup, C99 *explicitly* mentions this possibility:


It seems that you're right. It is pretty counterintuitive (if you
have the wrong intuitions, I suppose).


It's not that counterintuitive to people familiar with segmented memory
systems. Imagine what happens if s1 is allocated at the end of a segment
and s2 at the beginning of another segment and there is one byte of
overlap (the first byte of s2) between the two segments...

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 13 '05 #56
Da*****@cern.ch (Dan Pop) writes:
In <sl*******************@hehe.cl.cam.ac.uk> Jeremy Yallop <je****@jdyallop.freeserve.co.uk> writes:
Dan Pop wrote:
In <sl*******************@ekoi.cl.cam.ac.uk> Jeremy Yallop <je****@jdyallop.freeserve.co.uk> writes:

Dan Pop wrote:
> An implementation doing array bounds checking *can* detect that the end
> of the array has been reached without encountering any null character.
> At this point, the implementation is free to do anything it wants,
> including making demons fly out of your nose.

I find this a bit upsetting, if true. This means that we can have two
pointers that compare equal, one of which is known to point to a valid
object, and yet dereferencing the other has undefined behaviour.

Yup, C99 *explicitly* mentions this possibility:


It seems that you're right. It is pretty counterintuitive (if you
have the wrong intuitions, I suppose).


It's not that counterintuitive to people familiar with segmented memory
systems. Imagine what happens if s1 is allocated at the end of a segment
and s2 at the beginning of another segment and there is one byte of
overlap (the first byte of s2) between the two segments...


It's unclear to me why, if it can't make inter-segment pointer
arithmetic work properly, a compiler would go to the trouble of
ensuring that inter-segment pointer comparisons work properly.
--
int main(void){char p[]="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.\
\n",*q="kl BIcNBFr.NKEzjwCIxNJC";int i=sizeof p/2;char *strchr();int putchar(\
);while(*q){i+=strchr(p,*q++)-p;if(i>=(int)sizeof p)i-=sizeof p-1;putchar(p[i]\
);}return 0;}
Nov 13 '05 #57
In <87************@pfaff.stanford.edu> Ben Pfaff <bl*@cs.stanford.edu> writes:
Da*****@cern.ch (Dan Pop) writes:
In <sl*******************@hehe.cl.cam.ac.uk> Jeremy Yallop <je****@jdyallop.freeserve.co.uk> writes:
>Dan Pop wrote:
>> In <sl*******************@ekoi.cl.cam.ac.uk> Jeremy Yallop <je****@jdyallop.freeserve.co.uk> writes:
>>
>>>Dan Pop wrote:
>>>> An implementation doing array bounds checking *can* detect that the end
>>>> of the array has been reached without encountering any null character.
>>>> At this point, the implementation is free to do anything it wants,
>>>> including making demons fly out of your nose.
>>>
>>>I find this a bit upsetting, if true. This means that we can have two
>>>pointers that compare equal, one of which is known to point to a valid
>>>object, and yet dereferencing the other has undefined behaviour.
>>
>> Yup, C99 *explicitly* mentions this possibility:
>
>It seems that you're right. It is pretty counterintuitive (if you
>have the wrong intuitions, I suppose).


It's not that counterintuitive to people familiar with segmented memory
systems. Imagine what happens if s1 is allocated at the end of a segment
and s2 at the beginning of another segment and there is one byte of
overlap (the first byte of s2) between the two segments...


It's unclear to me why, if it can't make inter-segment pointer
arithmetic work properly, a compiler would go to the trouble of
ensuring that inter-segment pointer comparisons work properly.


Maybe because the compiler has nothing to do for that: the underlying
hardware may implement address comparisons this way.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 13 '05 #58
Dan Pop wrote:
In <sl*******************@hehe.cl.cam.ac.uk> Jeremy Yallop <je****@jdyallop.freeserve.co.uk> writes:
Perhaps, but I was responding to the claim that:

puts and printf are different because puts prints a string, while
printf explicitly takes a null terminated array.

A null terminated "array" in the library section is no different from
a string. In particular, the address computations that can be
performed on each are precisely the same.
This is the root of your misunderstanding.


I prefer "This is the point under discussion".
The definition of array you have quoted yourself *explicitly*
requires pointer arithmetic to work inside the array.

The definition of string contains NO such requirement:


You may be right, although that would make "contiguous" a rather
unhelpful word to describe the bytes that contain a string. If two
pointers into a string cannot be compared for equality and if no valid
pointer arithmetic on one will yield a pointer equivalent to the other
then the bytes aren't contiguous in any useful sense. Just to be
clear, though, are you claiming that in the following:

#include <string.h>
char *strcpy(char * restrict s1, const char * restrict s2);

The strcpy function copies the string pointed to by s2 (including
the terminating null character) into the array pointed to by s1.

`s1' *must* point to a single object, whereas `s2' may point to two
adjacent objects spanned by a single string?

A direct consequence of this anomaly is that NO function expecting a
string parameter that is not explicitly required to be contained in an
array, cannot be *portably* implemented in C, because a C implementation
would necessarily rely on pointer arithmetic working inside the string.
But the definition of string quoted above provide no such guarantee.


Again, you may well be right according to the letter of the standard
but that this sort of absurdity is a consequence shows (to me) that
this is not its intent.

Jeremy.
Nov 13 '05 #59
In <sl*******************@embo.cl.cam.ac.uk> Jeremy Yallop <je****@jdyallop.freeserve.co.uk> writes:
Dan Pop wrote:
In <sl*******************@hehe.cl.cam.ac.uk> Jeremy Yallop <je****@jdyallop.freeserve.co.uk> writes:
Perhaps, but I was responding to the claim that:

puts and printf are different because puts prints a string, while
printf explicitly takes a null terminated array.

A null terminated "array" in the library section is no different from
a string. In particular, the address computations that can be
performed on each are precisely the same.
This is the root of your misunderstanding.


I prefer "This is the point under discussion".


I've made that point far too often in this discussion and no one could
provide a quote from the standard proving me wrong. By that time, you
should have gotten it, hence my actual wording.
The definition of array you have quoted yourself *explicitly*
requires pointer arithmetic to work inside the array.

The definition of string contains NO such requirement:


You may be right, although that would make "contiguous" a rather
unhelpful word to describe the bytes that contain a string.


It's helpful enough in deciding whether s1 (in your previous example)
can be used where a string is expected or not.
If two
pointers into a string cannot be compared for equality and if no valid
pointer arithmetic on one will yield a pointer equivalent to the other
then the bytes aren't contiguous in any useful sense. Just to be
clear, though, are you claiming that in the following:

#include <string.h>
char *strcpy(char * restrict s1, const char * restrict s2);

The strcpy function copies the string pointed to by s2 (including
the terminating null character) into the array pointed to by s1.

`s1' *must* point to a single object, whereas `s2' may point to two
adjacent objects spanned by a single string?


Precisely. This is what the standard says, with no room for an alternate
interpretation. All the relevant quotes have already been provided in this thread.

A direct consequence of this anomaly is that NO function expecting a
string parameter that is not explicitly required to be contained in an
array, cannot be *portably* implemented in C, because a C implementation
would necessarily rely on pointer arithmetic working inside the string.
But the definition of string quoted above provide no such guarantee.


Again, you may well be right according to the letter of the standard
but that this sort of absurdity is a consequence shows (to me) that
this is not its intent.


I agree that the actual standard wording is sloppy and that it *probably*
does not reflect the intent. But this doesn't make any shred of a
difference: until fixed, it is the current wording that rules what is
allowed and what is not allowed.

OTOH, there might be a non-obvious reason the standard is worded this way.
When I posted this question to comp.std.c (about half a year ago), nobody
provided any useful insights.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 13 '05 #61
Dan Pop wrote:
I agree that the actual standard wording is sloppy and that it *probably*
does not reflect the intent. But this doesn't make any shred of a
difference: until fixed, it is the current wording that rules what is
allowed and what is not allowed.
Perhaps a defect report is called for. How would you suggest
resolving the inconsistency? Fixing the definition of string to
preclude spanning non-overlapping objects seems like the obvious
solution.
OTOH, there might be a non-obvious reason the standard is worded this way.
When I posted this question to comp.std.c (about half a year ago), nobody
provided any useful insights.


In <news:ag**********@sunnews.cern.ch>? Yes, I see what you mean.

Jeremy.
Nov 13 '05 #62
In <sl*******************@hehe.cl.cam.ac.uk> Jeremy Yallop <je****@jdyallop.freeserve.co.uk> writes:
Dan Pop wrote:
I agree that the actual standard wording is sloppy and that it *probably*
does not reflect the intent. But this doesn't make any shred of a
difference: until fixed, it is the current wording that rules what is
allowed and what is not allowed.


Perhaps a defect report is called for. How would you suggest
resolving the inconsistency? Fixing the definition of string to
preclude spanning non-overlapping objects seems like the obvious
solution.


I entirely agree. I can't see any problems created by explicitly
requiring that each C string is stored inside a C object. But even then
we could discuss the validity of:

struct { char s1[3], s2[4]; } foo = {"abc", "def"};

if (foo.s1 + sizeof foo.s1 == foo.s2) puts(foo.s1);

Is the fact that the string is entirely contained in the foo object
enough? In my opinion it should be, but people like JW might disagree...

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 13 '05 #63
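Whether puts(foo.s1) is valid in the struct example above is exactly the open question, so the sketch below avoids relying on it. Assuming a named struct type with the same members (called pair here, a hypothetical name), it checks whether s2 begins where s1 ends and then builds a properly terminated string by copying each member separately, which stays inside each array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same layout as the example in the thread, given a tag for reuse. */
struct pair { char s1[3], s2[4]; };

/* Returns 1 if s2 starts exactly where s1 ends (no padding between
   the members), else 0. */
int members_adjacent(const struct pair *p)
{
    assert(p != NULL);
    return offsetof(struct pair, s2)
           == offsetof(struct pair, s1) + sizeof p->s1;
}

/* Hypothetical helper: joins both members into out without reading
   past the end of either array. out must have room for
   sizeof s1 + sizeof s2 bytes; s2 is assumed to be null-terminated. */
void join_members(const struct pair *p, char *out)
{
    assert(p != NULL && out != NULL);
    memcpy(out, p->s1, sizeof p->s1);                 /* 3 chars, no '\0' */
    memcpy(out + sizeof p->s1, p->s2, sizeof p->s2);  /* includes '\0' */
}
```

With s1 holding 'a', 'b', 'c' and s2 holding "def", join_members produces "abcdef" in the caller's buffer regardless of how the adjacency question is resolved.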
