How is string::c_str() usually implemented?

Derek

I'm curious about the performance of string::c_str,
so I'm wondering how it's commonly implemented. Do
most std::string implementations just keep an extra
char allocated for the NULL termination so they can
return a pointer to their internal buffer, or are
they equally likely to create a new buffer on demand?
I know the standard doesn't require any particular
implementation, which is why I'm curious if there is
a consensus among implementations.

Jul 22 '05 #1

Leor Zolman

On Tue, 25 May 2004 10:18:43 -0400, Derek <us**@nospam.org> wrote:

I'm curious about the performance of string::c_str,
so I'm wondering how it's commonly implemented. Do
most std::string implementations just keep an extra
char allocated for the NULL termination so they can
return a pointer to their internal buffer, or are
they equally likely to create a new buffer on demand?
I know the standard doesn't require any particular
implementation, which is why I'm curious if there is
a consensus among implementations.

I looked at about 5, and they all either return the pointer directly, or
first do a conditional test and return one of several pointers (I suspect
this has to do with the small string optimization). But none of the ones
I've looked at go creating a new buffer. I don't see how they /could/,
because it would change the dynamics of the pointer returned. The caller
would become responsible for deleting the memory, rendering code using that
library implementation incompatible with code from "normal"
implementations.
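
The "conditional test" variant can be sketched roughly as follows. This is only an illustration under stated assumptions: TinyString, its 16-byte inline buffer, and is_small() are made-up names, not code from any of the libraries mentioned, and real small-string layouts differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical toy class, for illustration only; real libraries differ.
class TinyString {
public:
    explicit TinyString(const char* s) : size_(0) {
        assert(s != nullptr);
        size_ = std::strlen(s);
        char* dst = small_;              // short text lives inside the object
        if (!is_small()) {
            heap_ = new char[size_ + 1]; // longer text goes on the heap
            dst = heap_;
        }
        std::memcpy(dst, s, size_ + 1);  // the copy includes the terminator
    }
    ~TinyString() { if (!is_small()) delete[] heap_; }
    TinyString(const TinyString&) = delete;
    TinyString& operator=(const TinyString&) = delete;

    // One branch, no allocation, no copying.
    const char* c_str() const { return is_small() ? small_ : heap_; }

    std::size_t size() const { return size_; }
    bool is_small() const { return size_ < kSmallCap; }

private:
    static constexpr std::size_t kSmallCap = 16;
    std::size_t size_;
    union {
        char  small_[kSmallCap];
        char* heap_;
    };
};
```

Either way the caller gets a pointer into storage the string already owns, which is why the call costs next to nothing.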

Looking up the Standard's spec for c_str (21.3.6), I didn't see anything
that explicitly disallows allocating a copy (I perhaps just don't know
where to look), but the way it says the pointer will become invalidated
after any call to a non-const string member function (on the associated
string) pretty much tells the tale.
-leor
--
Leor Zolman --- BD Software --- www.bdsoft.com
On-Site Training in C/C++, Java, Perl and Unix
C++ users: download BD Software's free STL Error Message Decryptor at:
www.bdsoft.com/tools/stlfilt.html

Jul 22 '05 #2

Ryan Mack

On Tue, 25 May 2004 10:18:43 -0400, Derek wrote:

I'm curious about the performance of string::c_str, so I'm wondering how
it's commonly implemented. Do most std::string implementations just
keep an extra char allocated for the NULL termination so they can return
a pointer to their internal buffer, or are they equally likely to create
a new buffer on demand? I know the standard doesn't require any
particular implementation, which is why I'm curious if there is a
consensus among implementations.

Here's from the GCC-3.4 C++ standard library basic_string.h:
// _Rep: string representation
// Invariants:
// 1. String really contains _M_length + 1 characters; last is set
// to 0 only on call to c_str(). We avoid instantiating
// _CharT() where the interface does not require it.

I believe other implementations also keep an extra byte around but I'm not
sure if they keep it set to 0 or not.
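
The invariant in that comment, a spare slot that is written only when c_str() is called, might look something like this in miniature. LazyTermString is a hypothetical name and this is not the libstdc++ code; it only assumes a buffer that always has at least one element beyond the text.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy class: keeps one spare slot and writes the
// terminator lazily, only when c_str() is called.
class LazyTermString {
public:
    LazyTermString() : buf_(1), len_(0) {}  // one slot reserved for '\0'

    // Appending writes only the new character.
    void push_back(char c) {
        if (len_ + 1 == buf_.size())
            buf_.resize(buf_.size() * 2);   // keep at least one spare slot
        buf_[len_++] = c;
    }

    // Shrinking leaves a stale character in the spare slot.
    void pop_back() {
        assert(len_ > 0);
        --len_;
    }

    std::size_t size() const { return len_; }

    // The terminator is stored only here. The buffer is declared mutable
    // so that this const member is allowed to write it.
    const char* c_str() const {
        buf_[len_] = '\0';
        return buf_.data();
    }

private:
    mutable std::vector<char> buf_;
    std::size_t len_;
};
```

The cost of c_str() here is a single store, while modifying operations never have to maintain the terminator.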

-Ryan Mack
email: [first letter of first name][last name]@[last name]man.net

Jul 22 '05 #3

Pete Becker

Leor Zolman wrote:

I looked at about 5, and they all either return the pointer directly, or
first do a conditional test and return one of several pointers (I suspect
this has to do with the small string optimization). But none of the ones
I've looked at go creating a new buffer. I don't see how they /could/,
because it would change the dynamics of the pointer returned. The caller
would become responsible for deleting the memory, rendering code using that
library implementation incompatible with code from "normal"
implementations.

Looking up the Standard's spec for c_str (21.3.6), I didn't see anything
that explicitly disallows allocating a copy (I perhaps just don't know
where to look), but the way it says the pointer will become invalidated
after any call to a non-const string member function (on the associated
string) pretty much tells the tale.

The basic_string object can delete the buffer at those points if it
allocated one, or it can wait until the next call to c_str or until
destruction. Not that anybody does it that way, of course. But the idea
was that basic_string could be implemented to hold its text in
non-contiguous chunks, and only be required to gather their contents
together on a call to c_str.
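
A rough sketch of that permitted (if unused) design, assuming a made-up ChunkedString class that stores its pieces separately and gathers them on demand. It leans on std::string for the storage purely to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy class: text is stored as separate chunks and gathered
// into one contiguous buffer only when c_str() is called.
class ChunkedString {
public:
    // Non-const member: marks the gathered copy stale, which is what
    // invalidates any pointer previously returned by c_str().
    void append(const std::string& piece) {
        chunks_.push_back(piece);
        total_ += piece.size();
        flat_valid_ = false;
    }

    std::size_t size() const { return total_; }

    // Const member: gathers on demand. The string owns the gathered
    // buffer, so the caller never has to delete anything.
    const char* c_str() const {
        if (!flat_valid_) {
            flat_.clear();
            flat_.reserve(total_);
            for (const std::string& chunk : chunks_) flat_ += chunk;
            assert(flat_.size() == total_);  // sanity check on the gather
            flat_valid_ = true;
        }
        return flat_.c_str();
    }

private:
    std::vector<std::string> chunks_;
    std::size_t total_ = 0;
    mutable std::string flat_;
    mutable bool flat_valid_ = false;
};
```

In this sketch the first c_str() after a modification costs a full copy, and repeated calls in between are free.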

--

Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)

Jul 22 '05 #4

Derek

Leor Zolman wrote:

I'm curious about the performance of string::c_str, so
I'm wondering how it's commonly implemented. Do most
std::string implementations just keep an extra char
allocated for the NULL termination so they can return a
pointer to their internal buffer, or are they equally
likely to create a new buffer on demand? I know the
standard doesn't require any particular implementation,
which is why I'm curious if there is a consensus among
implementations.
I looked at about 5, and they all either return the
pointer directly, or first do a conditional test and
return one of several pointers (I suspect this has to do
with the small string optimization).

Thanks for taking the time. I appreciate it.
But none of the ones I've looked at go creating a
new buffer. I don't see how they /could/, because it
would change the dynamics of the pointer returned.
The caller would become responsible for deleting the
memory, rendering code using that library implementation
incompatible with code from "normal" implementations.
Looking up the Standard's spec for c_str (21.3.6), I
didn't see anything that explicitly disallows allocating
a copy (I perhaps just don't know where to look), but the
way it says the pointer will become invalidated after
any call to a non-const string member function (on the
associated string) pretty much tells the tale.

I think a new buffer can be returned without transferring
ownership to the caller. (Though I can't really imagine
why an implementation would want to do this.) As long as
the string keeps an internal pointer to the buffer, the
string can be responsible for ownership and the caller
doesn't have to worry about deleting it. The string
could delete the c_str() buffer according to some policy
of its choosing, like when a non-const member is called.

Of course this scheme would require an extra pointer in the
string, memory allocation on c_str(), copying, and other
inefficiencies, but I think it's allowed. (Though I'm glad
most implementations seem to take a more efficient approach.)

Jul 22 '05 #5

Siemel Naran

"Leor Zolman" <le**@bdsoft.com> wrote in message

I looked at about 5, and they all either return the pointer directly, or

My version returns the pointer directly. But I think the implementation
could set the null char and then return the pointer directly. This is so
that calls to string+=c just append 1 char, like vector.push_back(c), and
don't set the null char over and over again.

Jul 22 '05 #6

Leor Zolman

On Tue, 25 May 2004 11:13:12 -0400, Pete Becker <pe********@acm.org> wrote:

Leor Zolman wrote:

I looked at about 5, and they all either return the pointer directly, or
first do a conditional test and return one of several pointers (I suspect
this has to do with the small string optimization). But none of the ones
I've looked at go creating a new buffer. I don't see how they /could/,
because it would change the dynamics of the pointer returned. The caller
would become responsible for deleting the memory, rendering code using that
library implementation incompatible with code from "normal"
implementations.

Looking up the Standard's spec for c_str (21.3.6), I didn't see anything
that explicitly disallows allocating a copy (I perhaps just don't know
where to look), but the way it says the pointer will become invalidated
after any call to a non-const string member function (on the associated
string) pretty much tells the tale.

The basic_string object can delete the buffer at those points if it
allocated one, or it can wait until the next call to c_str or until
destruction. Not that anybody does it that way, of course. But the idea
was that basic_string could be implemented to hold its text in
non-contiguous chunks, and only be required to gather their contents
together on a call to c_str.

Ah yes, of course it /could/ be set up that way, and it does help to
understand that the Standard allows for this. But it also does seem that
the practical concern most folks run up against when first exposed to
c_str() runs along the lines of wanting to know its potential cost under
the real platform in use; empirically, the call is an effective freebie.
-leor

--
Leor Zolman --- BD Software --- www.bdsoft.com
On-Site Training in C/C++, Java, Perl and Unix
C++ users: download BD Software's free STL Error Message Decryptor at:
www.bdsoft.com/tools/stlfilt.html

Jul 22 '05 #7

Leor Zolman

On Tue, 25 May 2004 11:27:45 -0400, Derek <us**@nospam.org> wrote:

I think a new buffer can be returned without transferring
ownership to the caller. (Though I can't really imagine
why an implementation would want to do this.) As long as
the string keeps an internal pointer to the buffer, the
string can be responsible for ownership and the caller
doesn't have to worry about deleting it. The string
could delete the c_str() buffer according to some policy
of its choosing, like when a non-const member is called.

Of course this scheme would require an extra pointer in the
string, memory allocation on c_str(), copying, and other
inefficiencies, but I think it's allowed. (Though I'm glad
most implementations seem to take a more efficient approach.)

It indeed does not sound useful, or at all in the spirit of C/C++ to be
doing all that work when it usually would not be necessary. That's probably
why doing it that way didn't even occur to me. If the caller wants a
mutable version of the text out of a c_str() call, they just make the copy
themselves. No muss, no fuss, no unsolicited overhead.
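
For instance, a caller who needs writable text might do something along these lines. mutable_copy and replace_char are hypothetical helpers written for this illustration; replace_char merely stands in for any C-style routine that edits its argument in place.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Copies the text, terminator included, into storage the caller owns.
std::vector<char> mutable_copy(const std::string& s) {
    const char* p = s.c_str();
    return std::vector<char>(p, p + s.size() + 1);
}

// Stand-in for a C-style routine that modifies its argument in place.
void replace_char(char* text, char from, char to) {
    assert(text != nullptr);
    for (; *text != '\0'; ++text)
        if (*text == from) *text = to;
}
```

Calling replace_char(buf.data(), '/', '.') on the vector returned by mutable_copy edits only the copy; the original std::string is untouched.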

It seems to me that none of the approaches that make an "unnecessary" copy
of the buffer (the only "necessary" case being the one Pete outlined) would
really be consistent with the "const char *" return type from c_str()
anyway. I mean, if you wanted to advertise that you were providing a copy
that the caller could futz with, you wouldn't have that "const" there...and
what practical reason would there be for making a copy, other than to
provide a futz-able (tm) copy?
-leor
--
Leor Zolman --- BD Software --- www.bdsoft.com
On-Site Training in C/C++, Java, Perl and Unix
C++ users: download BD Software's free STL Error Message Decryptor at:
www.bdsoft.com/tools/stlfilt.html

Jul 22 '05 #8

Christopher Benson-Manica

Leor Zolman <le**@bdsoft.com> spoke thus:

Ah yes, of course it /could/ be set up that way, and it does help to
understand that the Standard allows for this. But it also does seem that
the practical concern most folks run up against when first exposed to
c_str() runs along the lines of wanting to know its potential cost under
the real platform in use; empirically, the call is an effective freebie.

Does "empirically" include implementations as old as 1999? And don't
think I'm talking about my implementation again, I never talk about my
implementation ;)

--
Christopher Benson-Manica | I *should* know what I'm talking about - if I
ataru(at)cyberspace.org | don't, I need to know. Flames welcome.

Jul 22 '05 #9

Andrey Tarasevich

Leor Zolman wrote:

...
The basic_string object can delete the buffer at those points if it
allocated one, or it can wait until the next call to c_str or until
destruction. Not that anybody does it that way, of course. But the idea
was that basic_string could be implemented to hold its text in
non-contiguous chunks, and only be required to gather their contents
together on a call to c_str.

Ah yes, of course it /could/ be set up that way, and it does help to
understand that the Standard allows for this. But it also does seem that
the practical concern most folks run up against when first exposed to
c_str() runs along the lines of wanting to know its potential cost under
the real platform in use; empirically, the call is an effective freebie.

Unfortunately, in many cases (and I've seen it more than once) once some
folks find out that in practice the pointer returned by 'c_str()' really
points to the beginning of the actual controlled sequence, they proceed
right on to casting away the constness and using the resultant pointer
to modify the stored string. In my opinion, in order to keep beginner
C++ programmers from getting this nasty habit it might be quite useful
to continue to perpetuate the idea that 'c_str()' might return a pointer
to an independently allocated buffer (which is, BTW, at least partially
true in implementations that return a pointer to a static empty-string
literal "" for 'std::string's of zero length).
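
For contrast with the const_cast habit described above, here is a sketch of editing a string through its own non-const interface instead. The helper names to_upper_in_place and set_char are invented for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Upper-cases a string in place through its iterators, so the string
// itself takes part in the modification.
void to_upper_in_place(std::string& s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char ch) {
        return static_cast<char>(std::toupper(ch));
    });
}

// Overwrites one character via operator[] rather than writing through a
// const_cast of the pointer returned by c_str().
void set_char(std::string& s, std::string::size_type i, char c) {
    assert(i < s.size());
    s[i] = c;
}
```

Both routes stay within what the interface promises, whatever c_str() happens to point at on a given implementation.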

--
Best regards,
Andrey Tarasevich

Jul 22 '05 #10

Leor Zolman

On Tue, 25 May 2004 17:13:53 +0000 (UTC), Christopher Benson-Manica
<at***@nospam.cyberspace.org> wrote:

Leor Zolman <le**@bdsoft.com> spoke thus:
Ah yes, of course it /could/ be set up that way, and it does help to
understand that the Standard allows for this. But it also does seem that
the practical concern most folks run up against when first exposed to
c_str() runs along the lines of wanting to know its potential cost under
the real platform in use; empirically, the call is an effective freebie.

Does "empirically" include implementations as old as 1999? And don't
think I'm talking about my implementation again, I never talk about my
implementation ;)

VC6 was one of the ones I checked, and pjp's copyright notice says 1994.
Back before Dinkumware adopted the small-string optimization, c_str() was
even simpler than it is now (but not by much; they're both "simple").
-leor

--
Leor Zolman --- BD Software --- www.bdsoft.com
On-Site Training in C/C++, Java, Perl and Unix
C++ users: download BD Software's free STL Error Message Decryptor at:
www.bdsoft.com/tools/stlfilt.html

Jul 22 '05 #11

Leor Zolman

On Tue, 25 May 2004 10:17:18 -0700, Andrey Tarasevich
<an**************@hotmail.com> wrote:

Leor Zolman wrote:
...
The basic_string object can delete the buffer at those points if it
allocated one, or it can wait until the next call to c_str or until
destruction. Not that anybody does it that way, of course. But the idea
was that basic_string could be implemented to hold its text in
non-contiguous chunks, and only be required to gather their contents
together on a call to c_str.

Ah yes, of course it /could/ be set up that way, and it does help to
understand that the Standard allows for this. But it also does seem that
the practical concern most folks run up against when first exposed to
c_str() runs along the lines of wanting to know its potential cost under
the real platform in use; empirically, the call is an effective freebie.

Unfortunately, in many cases (and I've seen it more than once) once some
folks find out that in practice the pointer returned by 'c_str()' really
points to the beginning of the actual controlled sequence, they proceed
right on to casting away the constness and using the resultant pointer
to modify the stored string. In my opinion, in order to keep beginner
C++ programmers from getting this nasty habit it might be quite useful
to continue to perpetuate the idea that 'c_str()' might return a pointer
to an independently allocated buffer (which is, BTW, at least partially
true in implementations that return a pointer to a static empty-string
literal "" for 'std::string's of zero length).

How does perpetuating the idea that the buffer is separate serve to
discourage anyone from casting away the constness of the pointer? If
anything, it would /encourage/ the practice, since it implies that there's
now a copy that can be "freely" modified without affecting the original
string. If they're going to go circumvent the safeguards that are in place,
let's at least make sure they're cognizant of their actual
transgressions...
-leor
--
Leor Zolman --- BD Software --- www.bdsoft.com
On-Site Training in C/C++, Java, Perl and Unix
C++ users: download BD Software's free STL Error Message Decryptor at:
www.bdsoft.com/tools/stlfilt.html

Jul 22 '05 #12

Andrey Tarasevich

Leor Zolman wrote:

...
Unfortunately, in many cases (and I've seen it more than once) once some
folks find out that in practice the pointer returned by 'c_str()' really
points to the beginning of the actual controlled sequence, they proceed
right on to casting away the constness and using the resultant pointer
to modify the stored string. In my opinion, in order to keep beginner
C++ programmers from getting this nasty habit it might be quite useful
to continue to perpetuate the idea that 'c_str()' might return a pointer
to an independently allocated buffer (which is, BTW, at least partially
true in implementations that return a pointer to a static empty-string
literal "" for 'std::string's of zero length).

How does perpetuating the idea that the buffer is separate serve to
discourage anyone from casting away the constness of the pointer? If
anything, it would /encourage/ the practice, since it implies that there's
now a copy that can be "freely" modified without affecting the original
string. If they're going to go circumvent the safeguards that are in place,
let's at least make sure they're cognizant of their actual
transgressions...

Well, that only convinces me that both ways to perceive 'c_str()'s
functionality allow for certain misuses. Which one is more "encouraging"
is a subjective issue.

--
Best regards,
Andrey Tarasevich

Jul 22 '05 #13

Leor Zolman

On Tue, 25 May 2004 11:08:51 -0700, Andrey Tarasevich
<an**************@hotmail.com> wrote:

Leor Zolman wrote:
...
Unfortunately, in many cases (and I've seen it more than once) once some
folks find out that in practice the pointer returned by 'c_str()' really
points to the beginning of the actual controlled sequence, they proceed
right on to casting away the constness and using the resultant pointer
to modify the stored string. In my opinion, in order to keep beginner
C++ programmers from getting this nasty habit it might be quite useful
to continue to perpetuate the idea that 'c_str()' might return a pointer
to an independently allocated buffer (which is, BTW, at least partially
true in implementations that return a pointer to a static empty-string
literal "" for 'std::string's of zero length).

How does perpetuating the idea that the buffer is separate serve to
discourage anyone from casting away the constness of the pointer? If
anything, it would /encourage/ the practice, since it implies that there's
now a copy that can be "freely" modified without affecting the original
string. If they're going to go circumvent the safeguards that are in place,
let's at least make sure they're cognizant of their actual
transgressions...

Well, that only convinces me that both ways to perceive 'c_str()'s
functionality allow for certain misuses. Which one is more "encouraging"
is a subjective issue.

I guess. My reasoning went like this: if you think you're getting a
separate buffer, and you cast away the constness of a pointer to it, you'll
have a false sense of security coming from a position of, "Well, I'm just
using a resource that was unreasonably withheld from me for some unknown
archaic Standardese rationale, and I can't really get into any trouble for
doing it". On the other hand, if you realize you're writing into the
internal state of your std::string object, and you do it anyway, at least
you'd know you're doing something seriously unkosher, equivalent to editing
a header file by changing "private:" to "public:".
-leor
--
Leor Zolman --- BD Software --- www.bdsoft.com
On-Site Training in C/C++, Java, Perl and Unix
C++ users: download BD Software's free STL Error Message Decryptor at:
www.bdsoft.com/tools/stlfilt.html

Jul 22 '05 #14

Uwe Schnitker

Pete Becker <pe********@acm.org> wrote in message news:<40***************@acm.org>...

Leor Zolman wrote:

I looked at about 5, and they all either return the pointer directly, or
first do a conditional test and return one of several pointers (I suspect
this has to do with the small string optimization). But none of the ones
I've looked at go creating a new buffer. I don't see how they /could/,
because it would change the dynamics of the pointer returned. The caller
would become responsible for deleting the memory, rendering code using that
library implementation incompatible with code from "normal"
implementations.

Looking up the Standard's spec for c_str (21.3.6), I didn't see anything
that explicitly disallows allocating a copy (I perhaps just don't know
where to look), but the way it says the pointer will become invalidated
after any call to a non-const string member function (on the associated
string) pretty much tells the tale.
The basic_string object can delete the buffer at those points if it
allocated one, or it can wait until the next call to c_str or until
destruction. Not that anybody does it that way, of course.

Not with basic_string.

But just for the record: The rope implementation that SGI bundled with
their STL - and I believe it is also included as an extension in GNU
libstdc++, but no longer maintained - uses such a technique.

Rope was intended to be a heavy-duty string, working mostly just
like a std::string, but optimized for repeated manipulation - copying,
pasting, replacing of substrings - of very long strings. A rope would
start off as a normal string with a contiguous buffer, but after such
manipulations it would consist of a tree - nitpickers would call it a
DAG - of text buffer nodes. The c_str function would have to copy - in
fact, (re)create - the string buffer, with associated performance cost.
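
The shape of that trade-off can be sketched with a toy tree. This is not SGI's rope; RopeNode, make_leaf, concat, and flatten are hypothetical names, and the sketch omits balancing, substring nodes, and everything else a real rope needs.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical toy rope node: either a leaf holding text or a
// concatenation of two subtrees.
struct RopeNode {
    std::string leaf;  // used only when both children are null
    std::shared_ptr<const RopeNode> left, right;
};
using Rope = std::shared_ptr<const RopeNode>;

Rope make_leaf(std::string text) {
    auto n = std::make_shared<RopeNode>();
    n->leaf = std::move(text);
    return n;
}

// Concatenation shares both operands and copies no characters.
Rope concat(Rope a, Rope b) {
    assert(a && b);
    auto n = std::make_shared<RopeNode>();
    n->left = std::move(a);
    n->right = std::move(b);
    return n;
}

// Walks the tree and copies every leaf into one contiguous string.
// This is the kind of work a rope's c_str() has to do.
void flatten_into(const Rope& r, std::string& out) {
    if (!r->left) { out += r->leaf; return; }
    flatten_into(r->left, out);
    flatten_into(r->right, out);
}

std::string flatten(const Rope& r) {
    std::string out;
    flatten_into(r, out);
    return out;
}
```

Because subtrees are shared rather than copied, concat(ab, ab) yields a DAG, and only flatten pays a cost proportional to the total length.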

Of course, rope is neither standard, nor general-purpose, nor, for
that matter, does it appear to be used anywhere, so discussing its
performance characteristics is not that important - but fun anyway.
But the idea
was that basic_string could be implemented to hold its text in
non-contiguous chunks, and only be required to gather their contents
together on a call to c_str.

Uwe

Jul 22 '05 #15

Leor Zolman

On Sat, 29 May 2004 22:18:58 GMT, "Daniel T." <po********@eathlink.net>
wrote:

In article <8q********************************@4ax.com>,
Leor Zolman <le**@bdsoft.com> wrote:
On Tue, 25 May 2004 10:17:18 -0700, Andrey Tarasevich
<an**************@hotmail.com> wrote:
Leor Zolman wrote:
> ...
>The basic_string object can delete the buffer at those points if it
>allocated one, or it can wait until the next call to c_str or until
>destruction. Not that anybody does it that way, of course. But the idea
>was that basic_string could be implemented to hold its text in
>non-contiguous chunks, and only be required to gather their contents
>together on a call to c_str.

Ah yes, of course it /could/ be set up that way, and it does help to
understand that the Standard allows for this. But it also does seem that
the practical concern most folks run up against when first exposed to
c_str() runs along the lines of wanting to know its potential cost under
the real platform in use; empirically, the call is an effective freebie.

Unfortunately, in many cases (and I've seen it more than once) once some
folks find out that in practice the pointer returned by 'c_str()' really
points to the beginning of the actual controlled sequence, they proceed
right on to casting away the constness and using the resultant pointer
to modify the stored string. In my opinion, in order to keep beginner
C++ programmers from getting this nasty habit it might be quite useful
to continue to perpetuate the idea that 'c_str()' might return a pointer
to an independently allocated buffer (which is, BTW, at least partially
true in implementations that return a pointer to a static empty-string
literal "" for 'std::string's of zero length).

How does perpetuating the idea that the buffer is separate serve to
discourage anyone from casting away the constness of the pointer?

Because the user of string doesn't know what the particular
implementation he is using will do. If he wants to keep his code
portable, he must assume that some strings do, in fact, keep a separate
buffer, while others don't, and act accordingly.

Okay, I'll buy that. In my initial reading of the first of Andrey's posts I
replied to (quoted above), I probably misread "c_str() might return a
pointer..." as "c_str() does return a pointer...". Indeed, as long as you
stress that /you just don't know/, which is essentially true (are there any
reference-counted std::string implementations in use, where assigning
through the return value of c_str() could conceivably modify /several/
string objects in one fell poke?), it follows that there's just no valid
reason to cast away the constness of that pointer.
-leor
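
The reference-counting scenario raised in the parenthetical above can be made concrete with a toy copy-on-write class. CowString and its set() member are hypothetical; the sketch assumes nothing about how any shipping library does its sharing.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical toy copy-on-write string: copies share one buffer until
// one of them is modified through a non-const member.
class CowString {
public:
    explicit CowString(const std::string& text)
        : rep_(std::make_shared<std::string>(text)) {}

    // The compiler-generated copy operations share rep_, so copying
    // duplicates no characters.

    const char* c_str() const { return rep_->c_str(); }

    // Non-const access: take a private copy first if the buffer is shared.
    void set(std::size_t i, char c) {
        assert(i < rep_->size());
        if (rep_.use_count() > 1)
            rep_ = std::make_shared<std::string>(*rep_);
        (*rep_)[i] = c;
    }

private:
    std::shared_ptr<std::string> rep_;
};
```

After CowString b = a, both c_str() calls return the same address. A write through a const_cast of that pointer would skip the check in set() and alter both objects, whereas b.set(0, 'j') detaches b first and leaves a alone.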

--
Leor Zolman --- BD Software --- www.bdsoft.com
On-Site Training in C/C++, Java, Perl and Unix
C++ users: download BD Software's free STL Error Message Decryptor at:
www.bdsoft.com/tools/stlfilt.html

Jul 22 '05 #16
