std::string and refcounting

In recent discussions relating to what to use for a new project which
integrated the work of two previously separate teams, we got to the
subject of our respective string implementations. One team rolled
their own strings while the other used std::string. Reasons for
using the home-grown strings (and vectors) were mainly refcounting and
portability, but I thought that these days almost all STL
implementations used refcounted strings and that the STL was available
for most platforms.

When we got back to test things out with my compiler (MSVC++ 6 with
the latest patch-level) strings were refcounted but on the other team
lead's computer (.net) strings were not refcounted. Do any of you know
a webpage or site that consolidates information about the STL
implementations on various platforms or does anyone have specific
information about the state of the STL on Windows, WinCE, Symbian, Mac
or Linux?

Thanks for any help,

-joe
Jul 22 '05 #1
joe martin wrote:
> In recent discussions relating to what to use for a new project which
> integrated the work of two previously separate teams, we got to the
> subject of our respective string implementations. One team rolled
> their own strings while the other used std::string. Reasons for
> using the home-grown strings (and vectors) were mainly refcounting and
> portability, but I thought that these days almost all STL
> implementations used refcounted strings and that the STL was available
> for most platforms.
>
> When we got back to test things out with my compiler (MSVC++ 6 with
> the latest patch-level) strings were refcounted but on the other team
> lead's computer (.net) strings were not refcounted. Do any of you know
> a webpage or site that consolidates information about the STL
> implementations on various platforms or does anyone have specific
> information about the state of the STL on Windows, WinCE, Symbian, Mac
> or Linux?


std::string was changed in VC.NET because of threading issues. However,
that's kind of OT.
Jul 22 '05 #2
joe martin wrote:
> When we got back to test things out with my compiler (MSVC++ 6 with
> the latest patch-level) strings were refcounted but on the other team
> lead's computer (.net) strings were not refcounted. Do any of you know
> a webpage or site that consolidates information about the STL
> implementations on various platforms or does anyone have specific
> information about the state of the STL on Windows, WinCE, Symbian, Mac
> or Linux?


I don't have specific information but my understanding from talking
to the other C++ library implementers is that everybody is moving
away from reference counted implementations of 'std::string'.
Essentially, the reason is that the interface is not really suitable
for this kind of implementation despite the fact that the specification
in the standard actually even mentions reference counting in a note
(if I remember correctly). This is somewhat related to the history of
the string class which was vamped up when everything became a template.
Things are further complicated in [potentially] multi-threaded
environments where the reference counting approach effectively requires
mutex locks in various places which significantly increases the costs.

I haven't verified the results but apparently the conclusion is that
copying strings is acceptable and the costs can be further reduced
by the "small string"-optimization (which simply embeds the string in
the string object directly if it is smaller than e.g. 32 chars). For
really large strings you probably want to pass them around by reference
or through a shared pointer - at least when they are immutable.
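
As a rough illustration of the last point, here is a minimal sketch of passing a large immutable string around through a shared pointer. It assumes C++11 or later (`std::shared_ptr`, which postdates this thread; `boost::shared_ptr` played that role at the time). The names `SharedText`, `make_shared_text` and `count_char` are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical alias: an immutable string shared through a pointer.
using SharedText = std::shared_ptr<const std::string>;

// Builds the large string once; later copies of the handle share it.
SharedText make_shared_text(std::size_t length, char fill) {
    return std::make_shared<const std::string>(length, fill);
}

// Taking the handle by value copies a pointer and bumps a reference
// count; the character data itself is not copied.
std::size_t count_char(SharedText text, char wanted) {
    std::size_t hits = 0;
    for (char c : *text) {
        if (c == wanted) ++hits;
    }
    return hits;
}
```

Because the pointee is `const`, no holder can modify the shared data, so the unsharing problem discussed in this thread does not arise for this particular use.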
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
<http://www.contendix.com> - Software Development & Consulting
Jul 22 '05 #3
On Wed, 21 Apr 2004 05:38:10 +0200, Dietmar Kuehl
<di***********@yahoo.com> wrote:
> joe martin wrote:
> > When we got back to test things out with my compiler (MSVC++ 6 with
> > the latest patch-level) strings were refcounted but on the other team
> > lead's computer (.net) strings were not refcounted. Do any of you know
> > a webpage or site that consolidates information about the STL
> > implementations on various platforms or does anyone have specific
> > information about the state of the STL on Windows, WinCE, Symbian, Mac
> > or Linux?
>
> I don't have specific information but my understanding from talking
> to the other C++ library implementers is that everybody is moving
> away from reference counted implementations of 'std::string'.
> Essentially, the reason is that the interface is not really suitable
> for this kind of implementation despite the fact that the specification
> in the standard actually even mentions reference counting in a note
> (if I remember correctly). This is somewhat related to the history of
> the string class which was vamped up when everything became a template.
> Things are further complicated in [potentially] multi-threaded
> environments where the reference counting approach effectively requires
> mutex locks in various places which significantly increases the costs.


I thought though that atomic increments and decrements
could be used in place of a mutex and that they were available on most
processors. I asked on the Windows newsgroup about what their reasons
might have been to not just use the provided Interlocked functions but
haven't really gotten a response. Maybe I am wrong that atomic functions are
all that are really needed for refcounted objects? That would be
unfortunate as I think this is our internal solution.
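
For readers who want to see what "atomic increments and decrements in place of a mutex" might look like, here is a minimal sketch of the count handling only. It assumes C++11 `std::atomic` (which did not exist when this thread was written; the Interlocked functions mentioned above were the Windows equivalent). `SharedBuffer`, `acquire` and `release` are hypothetical names. It does not settle whether atomics alone are enough for a complete copy-on-write string.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical shared buffer: character data plus an atomic owner count.
struct SharedBuffer {
    std::string data;
    std::atomic<std::size_t> owners;
    explicit SharedBuffer(const std::string& text) : data(text), owners(1) {}
};

// Register one more owner without taking a mutex. The caller must
// already hold a reference, so relaxed ordering is sufficient here.
inline void acquire(SharedBuffer* buffer) {
    buffer->owners.fetch_add(1, std::memory_order_relaxed);
}

// Drop one owner. The decrement is combined with a test: whoever sees
// the previous value 1 was the last owner and destroys the buffer.
// Returns true if the buffer was destroyed.
inline bool release(SharedBuffer* buffer) {
    if (buffer->owners.fetch_sub(1, std::memory_order_acq_rel) == 1) {
        delete buffer;
        return true;
    }
    return false;
}
```

The sketch covers only the lifetime of the buffer. Deciding when a writer must copy the buffer before modifying it is a separate step that this code does not address.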

> I haven't verified the results but apparently the conclusion is that
> copying strings is acceptable and the costs can be further reduced
> by the "small string"-optimization (which simply embeds the string in
> the string object directly if it is smaller than e.g. 32 chars). For
> really large strings you probably want to pass them around by reference
> or through a shared pointer - at least when they are immutable.


This makes sense I guess although refcounting seems so much more
efficient. My hope is that Smarter Brains Than Mine have considered
the necessary issues in most STL implementations and acted
accordingly. Anyway, thanks for your response.

-joe

Jul 22 '05 #4
"joe martin" <jo****@hormel.product.iwishiwasdead.org> wrote in message
<di***********@yahoo.com> wrote:
> > I don't have specific information but my understanding from talking
> > to the other C++ library implementers is that everybody is moving
> > away from reference counted implementations of 'std::string'.
> > Essentially, the reason is that the interface is not really suitable
> > for this kind of implementation despite the fact that the specification
> > in the standard actually even mentions reference counting in a note
> > (if I remember correctly). This is somewhat related to the history of
> > the string class which was vamped up when everything became a template.

What is it about the interface that turns implementors away from reference
counting?

> > Things are further complicated in [potentially] multi-threaded
> > environments where the reference counting approach effectively requires
> > mutex locks in various places which significantly increases the costs.


This is only a problem when we share writable strings between threads. How
often does this happen anyway? For that matter, isn't boost::shared_ptr a
problem?

> This makes sense I guess although refcounting seems so much more
> efficient. My hope is that Smarter Brains Than Mine have considered
> the necessary issues in most STL implementations and acted
> accordingly. Anyway, thanks for your response.


Why would refcounted strings be faster? Sure, it's fast when you pass and
return strings by value. But then when you change the reference copied
string you have to make a deep copy anyway. Also, there is the return value
optimization, but I don't know how many compilers implement this.
Jul 22 '05 #5
joe martin <jo****@hormel.product.iwishiwasdead.org> wrote:
> I thought though that atomic increments and decrements
> could be used in place of a mutex and that they were available on most
> processors. I asked on the Windows newsgroup about what their reasons
> might have been to not just use the provided Interlocked functions but
> haven't really gotten a response. Maybe I am wrong that atomic functions are
> all that are really needed for refcounted objects? That would be
> unfortunate as I think this is our internal solution.

My understanding is that atomic increments and decrements (combined at
least in one direction with a test) are sufficient for single processor
machines but not for multi processor machines. However, I'm not really
sure about this.

> This makes sense I guess although refcounting seems so much more
> efficient.

Is this a conclusion from measuring or from deduction? I can see that
reference counting huge strings will probably be more efficient but
for the typical small strings I'm using in my programs I doubt that a
reference counted approach is really faster. On the other hand, I
haven't measured it either.

> My hope is that Smarter Brains Than Mine have considered
> the necessary issues in most STL implementations and acted
> accordingly. Anyway, thanks for your response.


The problem with the standard string class is that the specification
effectively requires copying the string in many, often unexpected,
places. For example, when obtaining and dereferencing an iterator for
a non-const object, a copy becomes necessary. The necessary additional
logic will almost certainly dwarf any gains obtained from omitting
copies except when handling mostly fairly large strings. A string
class avoiding problems like this could use reference counting more
effectively but I would still expect it to pay off only for bigger
strings. A good analysis of this would probably be quite interesting.
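
The unexpected copy can be illustrated with a toy copy-on-write class. This is a minimal single-threaded sketch, not how any particular library implements `std::string`. `CowString` and its members are hypothetical, and it borrows C++11 `std::shared_ptr` for the reference count.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical single-threaded copy-on-write string, for illustration only.
class CowString {
public:
    explicit CowString(const std::string& text)
        : rep_(std::make_shared<std::string>(text)) {}

    // Read access through a const object never copies.
    const char& operator[](std::size_t i) const { return (*rep_)[i]; }

    // Non-const access hands out a writable reference, so the buffer
    // must be unshared first even if the caller only reads through it.
    char& operator[](std::size_t i) {
        detach();
        return (*rep_)[i];
    }

    // True if both objects currently point at the same buffer.
    bool shares_with(const CowString& other) const {
        return rep_ == other.rep_;
    }

private:
    // This branch runs on every non-const element access.
    void detach() {
        if (rep_.use_count() > 1) {
            rep_ = std::make_shared<std::string>(*rep_);
        }
    }

    std::shared_ptr<std::string> rep_;
};
```

Given `CowString a("hello"); CowString b = a;`, the expression `b[0]` on the non-const `b` triggers a full copy even when the result is only read. The sketch also ignores a further complication: once a writable reference has escaped, a real implementation has to stop sharing that buffer with later copies.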
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
<http://www.contendix.com> - Software Development & Consulting
Jul 22 '05 #6
> What is it about the interface that turns implementors away from reference
> counting?


This (old, possibly outdated) article sheds some light on the problem with
reference counted string implementations:
http://www.sgi.com/tech/stl/string_discussion.html

--
Peter van Merkerk
peter.van.merkerk(at)dse.nl
Jul 22 '05 #7
joe martin <jo****@hormel.product.iwishiwasdead.org> wrote in message news:<2n********************************@4ax.com>. ..
> In recent discussions relating to what to use for a new project which
> integrated the work of two previously separate teams, we got to the
> subject of our respective string implementations. One team rolled
> their own strings while the other used std::string. Reasons for
> using the home-grown strings (and vectors) were mainly refcounting and
> portability, but I thought that these days almost all STL
> implementations used refcounted strings and that the STL was available
> for most platforms.
>
> When we got back to test things out with my compiler (MSVC++ 6 with
> the latest patch-level) strings were refcounted but on the other team
> lead's computer (.net) strings were not refcounted. Do any of you know
> a webpage or site that consolidates information about the STL
> implementations on various platforms or does anyone have specific
> information about the state of the STL on Windows, WinCE, Symbian, Mac
> or Linux?


A site that addresses the basics is www.gotw.ca, especially GOTW
articles #43-#45. The executive summary: refcounting is too hard in
threaded environments, and even in single-threaded environments
typically provides little if any advantage.

Regards,
Michiel Salters
Jul 22 '05 #8

Michiel Salters wrote:
> [...]
> A site that addresses the basics is www.gotw.ca, especially GOTW
> articles #43-#45. The executive summary: refcounting is too hard in
> threaded environments, and even in single-threaded environments
> typically provides little if any advantage.


First off, it isn't really too hard. As for advantage... if deep
copying needs to allocate memory (small string optimisations
aside for a moment), it simply means that you'll incur "some"
synchronisation overhead in the allocator instead of one single
"naked" atomic increment without any membars on refcount. #43-#45
is rather interesting reading but don't believe everything
(especially conclusions) that it says.

http://groups.google.com/groups?thre...CBD4B%40web.de

regards,
alexander.
Jul 22 '05 #9
"Siemel Naran" <Si*********@REMOVE.att.net> wrote in message news:<k5********************@bgtnsc04-news.ops.worldnet.att.net>...
"joe martin" <jo****@hormel.product.iwishiwasdead.org> wrote in message
<di***********@yahoo.com> wrote:
> > > I don't have specific information but my understanding from talking
> > > to the other C++ library implementers is that everybody is moving
> > > away from reference counted implementations of 'std::string'.
> > > Essentially, the reason is that the interface is not really suitable
> > > for this kind of implementation despite the fact that the specification
> > > in the standard actually even mentions reference counting in a note
> > > (if I remember correctly). This is somewhat related to the history of
> > > the string class which was vamped up when everything became a template.
>
> What is it about the interface that turns implementors away from reference
> counting?


Effectively, the string has to be unshared in many, often unexpected,
situations. In particular, the string has to be [potentially]
unshared for each character access. This means that you get a conditional
dealing with the reference count in each iterator dereference and each array
access operation (on non-const strings, of course). This costs cycles
even for the considerate people who normally pass strings by reference.
Also, implementers use small string optimizations which don't need an
allocation for strings up to a certain size, e.g. 32 chars: this is big
enough to contain many strings (IDs, typical database values, etc.) and
only incurs a memory allocation for really big strings. With all this it
turns out that reference counting is actually more expensive than copying
strings in some cases.
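
Whether a given library applies the small string optimization can be probed, roughly, by checking whether the character data sits inside the string object itself. The helper below is a sketch with a made-up name (`stored_inline`). The size threshold differs between implementations, and the "32 chars" above is only an example figure.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Returns true if the character data of `s` lives inside the string
// object itself rather than in a separate heap allocation.
inline bool stored_inline(const std::string& s) {
    const char* object_begin = reinterpret_cast<const char*>(&s);
    const char* object_end = object_begin + sizeof(std::string);
    const char* data = s.data();
    // std::less gives a well-defined total order over unrelated pointers.
    std::less<const char*> before;
    return !before(data, object_begin) && before(data, object_end);
}
```

For a string much longer than `sizeof(std::string)` the result is necessarily false. For a short string such as an ID it is true only on implementations that use the optimization, so the result for short strings should be treated as an observation about one library, not a guarantee.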
> > > Things are further complicated in [potentially] multi-threaded
> > > environments where the reference counting approach effectively requires
> > > mutex locks in various places which significantly increases the costs.
>
> This is only a problem when we share writable strings between threads.


Yes and no: the problem with reference counting inside a string is that
it is an implementation detail. As such, the implementer of a string
class for a multi-threaded environment has to make sure that it works
correctly if the string representation is really shared between threads
(well, strictly speaking the standard makes no such requirement but the
users will do anyway): after all, to the user the strings are separate
things and there is no need to protect them in any form from concurrent
accesses. As a consequence, the string has to do protections internally.
That is, it is a problem even if the strings are read-only and not even
shared at all...
> How
> often does this happen anyway? For that matter, isn't boost::shared_ptr a
> problem?


'shared_ptr' does not have this problem because it does no internal
sharing magic: if the 'shared_ptr' is used from different threads, the
user is responsible for the protection against concurrent accesses.
--
<mailto:di***********@yahoo.com> <http://www.dietmar-kuehl.de/>
<http://www.contendix.com> - Software Development & Consulting
Jul 22 '05 #10
Alexander Terekhov <te******@web.de> wrote in message news:<40***************@web.de>...
> Michiel Salters wrote:
> > [...]
> > A site that addresses the basics is www.gotw.ca, especially GOTW
> > articles #43-#45. The executive summary: refcounting is too hard in
> > threaded environments, and even in single-threaded environments
> > typically provides little if any advantage.
>
> First off, it isn't really too hard.


I think we're talking about two things. You probably interpreted it
as "too hard to implement correctly" while I meant "too hard to
implement correctly and still faster than comparable non-refcounted".
> As for advantage... if deep copying needs to allocate memory
> (small string optimisations aside for a moment), it simply means
> that you'll incur "some" synchronisation overhead in the
> allocator instead of one single "naked" atomic increment without
> any membars on refcount.

True. COW obviously shines in the absence of W. Of course, the common
CHAR_T& STRING::operator[](pos_type) might very well be a write, which
causes branches and possibly copies in COW-types.

> #43-#45 is rather interesting reading but don't believe
> everything (especially conclusions) that it says.


Indeed. The best string class can only be found by profiling.
Until that time, stick with std::string. It is universally available,
and in general recent versions are pretty good for common cases.
It also has the added advantage of being able to use
platform-specific tricks in the implementation without sacrificing
portability, something your code can never achieve ;)

Regards,
Michiel Salters
Jul 22 '05 #11

Michiel Salters wrote:
> [...]
> True. COW obviously shines in the absence of W. Of course, the common
> CHAR_T& STRING::operator[](pos_type) might very well be a write, which
> causes branches and possibly copies in COW-types.


Use "const CHAR_T& STRING::operator[](pos_type) const" for reads.
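
A small sketch of how the const overload gets selected may help here. `TracedString` and `read_at` are hypothetical names. The class only counts which `operator[]` ran and does no reference counting of its own. The in-class member initializers assume C++11.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical wrapper that records which operator[] overload ran.
class TracedString {
public:
    explicit TracedString(const std::string& text) : text_(text) {}

    // Chosen for non-const objects, even when the caller only reads.
    char& operator[](std::size_t i) {
        ++mutable_calls;
        return text_[i];
    }

    // Chosen when the object is accessed through a const path.
    const char& operator[](std::size_t i) const {
        ++const_calls;
        return text_[i];
    }

    std::size_t mutable_calls = 0;
    mutable std::size_t const_calls = 0;

private:
    std::string text_;
};

// Reading through a const reference selects the const overload.
inline char read_at(const TracedString& s, std::size_t i) { return s[i]; }
```

Overload resolution depends on the constness of the object expression, not on what the caller does with the result. A plain `s[0]` on a non-const `s` picks the mutable overload, while `read_at(s, 0)` or an explicit `const TracedString&` picks the const one. In a copy-on-write string only the mutable overload would need to unshare.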

regards,
alexander.
Jul 22 '05 #12
> > #43-#45 is rather interesting reading but don't believe
> > everything (especially conclusions) that it says.
>
> Indeed. The best string class can only be found by profiling.


For a given case and a given platform. It is not realistic to expect a
string implementation to produce optimal results in every case; a trade-off
has to be made somewhere.

--
Peter van Merkerk
peter.van.merkerk(at)dse.nl
Jul 22 '05 #13
