473,378 Members | 1,099 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,378 software developers and data experts.

std::string and std::ostringstream performances

Hi,

I have a C++ application that extensively uses std::string and
std::ostringstream in somewhat similar manner as below

std::string msgHeader;

msgHeader = "<";
msgHeader += a;
msgHeader += "><";

msgHeader += b;
msgHeader += "><";

msgHeader += c;
msgHeader += ">";

Similarly it uses ostringstream as well and the function that uses
this gets called almost on every message that my application gets on
the socket. I am using this to precisely construct a XML Message to
be sent to another application.

What we observed when we ran a collect/analyzer on the application is
that it shows majority of the CPU spent in trying to deal with these 2
datatypes, their memory allocation using std::allocator and other
stuff. The CPU goes as high as 100% sometimes.

I would like to get an advice/suggestion on the following points
1. Is there a better way to use std::string / std::ostringstream than
the way I have been using it?
2. AM I using the wrong datatype for such kind of operations and
should move on to use something else? Any suggestions what the
datatype should be?

I eventually need these datatypes because the external library that I
am using to send this data out needs it in std::string /
std::ostringstream formats.

Would like to have some suggestions to bring down the CPU utilization.

Thanks,
Bala

Oct 31 '07 #1
25 8271
"Bala2508" <R.***********@gmail.comwrote in message
news:11**********************@v3g2000hsg.googlegro ups.com...
Hi,

I have a C++ application that extensively uses std::string and
std::ostringstream in somewhat similar manner as below

std::string msgHeader;

msgHeader = "<";
msgHeader += a;
msgHeader += "><";

msgHeader += b;
msgHeader += "><";

msgHeader += c;
msgHeader += ">";

Similarly it uses ostringstream as well and the function that uses
this gets called almost on every message that my application gets on
the socket. I am using this to precisely construct a XML Message to
be sent to another application.

What we observed when we ran a collect/analyzer on the application is
that it shows majority of the CPU spent in trying to deal with these 2
datatypes, their memory allocation using std::allocator and other
stuff. The CPU goes as high as 100% sometimes.

I would like to get an advice/suggestion on the following points
1. Is there a better way to use std::string / std::ostringstream than
the way I have been using it?
2. AM I using the wrong datatype for such kind of operations and
should move on to use something else? Any suggestions what the
datatype should be?

I eventually need these datatypes because the external library that I
am using to send this data out needs it in std::string /
std::ostringstream formats.

Would like to have some suggestions to bring down the CPU utilization.
One suggestion would be .reserve(). I E.
std::string msgHeader;
msgHeader.reserve( 100 );

That way the string msgHeader wouldn't need to try to allocate more memory
until it has used the initial 100 characters allocated. Some compilers are
better at preallocating a default number of bytes than others. Sometimes
they have to be given a hint. Figure out a good size to reserve (one big
enough where you won't need to be doing reallocatings, one small enough that
you're not running out of memory) and then try profiling it again and see if
it helps.
Oct 31 '07 #2
On Oct 31, 12:09 pm, "Jim Langston" <tazmas...@rocketmail.comwrote:
"Bala2508" <R.Balaji.I...@gmail.comwrote in message

news:11**********************@v3g2000hsg.googlegro ups.com...


Hi,
I have a C++ application that extensively uses std::string and
std::ostringstream in somewhat similar manner as below
std::string msgHeader;
msgHeader = "<";
msgHeader += a;
msgHeader += "><";
msgHeader += b;
msgHeader += "><";
msgHeader += c;
msgHeader += ">";
Similarly it uses ostringstream as well and the function that uses
this gets called almost on every message that my application gets on
the socket. I am using this to precisely construct a XML Message to
be sent to another application.
What we observed when we ran a collect/analyzer on the application is
that it shows majority of the CPU spent in trying to deal with these 2
datatypes, their memory allocation using std::allocator and other
stuff. The CPU goes as high as 100% sometimes.
I would like to get an advice/suggestion on the following points
1. Is there a better way to use std::string / std::ostringstream than
the way I have been using it?
2. AM I using the wrong datatype for such kind of operations and
should move on to use something else? Any suggestions what the
datatype should be?
I eventually need these datatypes because the external library that I
am using to send this data out needs it in std::string /
std::ostringstream formats.
Would like to have some suggestions to bring down the CPU utilization.

One suggestion would be .reserve(). I E.
std::string msgHeader;
msgHeader.reserve( 100 );

That way the string msgHeader wouldn't need to try to allocate more memory
until it has used the initial 100 characters allocated. Some compilers are
better at preallocating a default number of bytes than others. Sometimes
they have to be given a hint. Figure out a good size to reserve (one big
enough where you won't need to be doing reallocatings, one small enough that
you're not running out of memory) and then try profiling it again and see if
it helps.- Hide quoted text -

- Show quoted text -
I also clear the string using msgHeader.str("") method once i am done
with the sending of the message. Then again when this method gets
called, the same sequence of events happen. Wouldnt it clear the
allocated memory once i do a msgHeader.str("")? How do reserving
essentially help in this scenario?

Oct 31 '07 #3
On 2007-10-31 19:48, Bala wrote:
On Oct 31, 12:09 pm, "Jim Langston" <tazmas...@rocketmail.comwrote:
>"Bala2508" <R.Balaji.I...@gmail.comwrote in message

news:11**********************@v3g2000hsg.googlegr oups.com...


Hi,
I have a C++ application that extensively uses std::string and
std::ostringstream in somewhat similar manner as below
std::string msgHeader;
msgHeader = "<";
msgHeader += a;
msgHeader += "><";
msgHeader += b;
msgHeader += "><";
msgHeader += c;
msgHeader += ">";
Similarly it uses ostringstream as well and the function that uses
this gets called almost on every message that my application gets on
the socket. I am using this to precisely construct a XML Message to
be sent to another application.
What we observed when we ran a collect/analyzer on the application is
that it shows majority of the CPU spent in trying to deal with these 2
datatypes, their memory allocation using std::allocator and other
stuff. The CPU goes as high as 100% sometimes.
I would like to get an advice/suggestion on the following points
1. Is there a better way to use std::string / std::ostringstream than
the way I have been using it?
2. AM I using the wrong datatype for such kind of operations and
should move on to use something else? Any suggestions what the
datatype should be?
I eventually need these datatypes because the external library that I
am using to send this data out needs it in std::string /
std::ostringstream formats.
Would like to have some suggestions to bring down the CPU utilization.

One suggestion would be .reserve(). I E.
std::string msgHeader;
msgHeader.reserve( 100 );

That way the string msgHeader wouldn't need to try to allocate more memory
until it has used the initial 100 characters allocated. Some compilers are
better at preallocating a default number of bytes than others. Sometimes
they have to be given a hint. Figure out a good size to reserve (one big
enough where you won't need to be doing reallocatings, one small enough that
you're not running out of memory) and then try profiling it again and see if
it helps.- Hide quoted text -
I also clear the string using msgHeader.str("") method once i am done
with the sending of the message. Then again when this method gets
called, the same sequence of events happen. Wouldnt it clear the
allocated memory once i do a msgHeader.str("")? How do reserving
essentially help in this scenario?
To clear the string use clear() instead, that is what it is meant for.
clear() will not affect the capacity of the string so if you do
something like

std::string str;
str.reserve(100);
str.clear();

you will still be able to put 100 characters into the string before it
needs to reallocate.

Of course, if msgHeader is declared in the function that gets called it
will go out of scope when the function returns and will be reallocated
when it is called again, in which case a new string will be constructed
in which case the operations on the string will have not effect over two
different calls. If msgHeader on the other hand is external to the
function then you will probably benefit from using reserve. BTW, when
calling reserve() with an argument that is smaller than or equal to the
current capacity no action is taken.

--
Erik Wikström
Oct 31 '07 #4
On Oct 31, 3:21 pm, Erik Wikström <Erik-wikst...@telia.comwrote:
On 2007-10-31 19:48, Bala wrote:


On Oct 31, 12:09 pm, "Jim Langston" <tazmas...@rocketmail.comwrote:
"Bala2508" <R.Balaji.I...@gmail.comwrote in message
>news:11**********************@v3g2000hsg.googlegr oups.com...
Hi,
I have a C++ application that extensively uses std::string and
std::ostringstream in somewhat similar manner as below
std::string msgHeader;
msgHeader = "<";
msgHeader += a;
msgHeader += "><";
msgHeader += b;
msgHeader += "><";
msgHeader += c;
msgHeader += ">";
Similarly it uses ostringstream as well and the function that uses
this gets called almost on every message that my application gets on
the socket. I am using this to precisely construct a XML Message to
be sent to another application.
What we observed when we ran a collect/analyzer on the application is
that it shows majority of the CPU spent in trying to deal with these2
datatypes, their memory allocation using std::allocator and other
stuff. The CPU goes as high as 100% sometimes.
I would like to get an advice/suggestion on the following points
1. Is there a better way to use std::string / std::ostringstream than
the way I have been using it?
2. AM I using the wrong datatype for such kind of operations and
should move on to use something else? Any suggestions what the
datatype should be?
I eventually need these datatypes because the external library that I
am using to send this data out needs it in std::string /
std::ostringstream formats.
Would like to have some suggestions to bring down the CPU utilization.
One suggestion would be .reserve(). I E.
std::string msgHeader;
msgHeader.reserve( 100 );
That way the string msgHeader wouldn't need to try to allocate more memory
until it has used the initial 100 characters allocated. Some compilers are
better at preallocating a default number of bytes than others. Sometimes
they have to be given a hint. Figure out a good size to reserve (one big
enough where you won't need to be doing reallocatings, one small enough that
you're not running out of memory) and then try profiling it again and see if
it helps.- Hide quoted text -
I also clear the string using msgHeader.str("") method once i am done
with the sending of the message. Then again when this method gets
called, the same sequence of events happen. Wouldnt it clear the
allocated memory once i do a msgHeader.str("")? How do reserving
essentially help in this scenario?

To clear the string use clear() instead, that is what it is meant for.
clear() will not affect the capacity of the string so if you do
something like

std::string str;
str.reserve(100);
str.clear();

you will still be able to put 100 characters into the string before it
needs to reallocate.

Of course, if msgHeader is declared in the function that gets called it
will go out of scope when the function returns and will be reallocated
when it is called again, in which case a new string will be constructed
in which case the operations on the string will have not effect over two
different calls. If msgHeader on the other hand is external to the
function then you will probably benefit from using reserve. BTW, when
calling reserve() with an argument that is smaller than or equal to the
current capacity no action is taken.

--
Erik Wikström- Hide quoted text -

- Show quoted text -
msgHeader is local to the function. And the maximum size would not be
more than a 100 bytes. So I plan to modify my code to use reserve and
clear as you suggested and will try a hand on the performance. I hope
it helps.

BTW, a general question.
If i dont use reserve and my function looks somewhat like this below

void somefunction ()
{
std::string msgHeader
msgHeader = "<";
msgHeader += a;
msgHeader += "><";

msgHeader += b;
msgHeader += "><";

msgHeader += c;
msgHeader += ">";
}

How is the actual memory allocation done? My understanding is that
the string library tries to reallocate memory on every statement.
That is, initially when it finds the statement "msgHeader = "<";", it
allocates say 1 byte to the msgHeader.
Then at the next statement it reallocates msgHeader as sizeof (a) +
current memory of msgHeader and so on.
Is this correct? If yes, then I am sure using reserve would improve
the performance dramatically.

Thanks,
Bala

Oct 31 '07 #5
On 2007-10-31 21:58, Bala wrote:
On Oct 31, 3:21 pm, Erik Wikström <Erik-wikst...@telia.comwrote:
>On 2007-10-31 19:48, Bala wrote:


On Oct 31, 12:09 pm, "Jim Langston" <tazmas...@rocketmail.comwrote:
"Bala2508" <R.Balaji.I...@gmail.comwrote in message
>>news:11**********************@v3g2000hsg.googleg roups.com...
Hi,
I have a C++ application that extensively uses std::string and
std::ostringstream in somewhat similar manner as below
std::string msgHeader;
msgHeader = "<";
msgHeader += a;
msgHeader += "><";
msgHeader += b;
msgHeader += "><";
msgHeader += c;
msgHeader += ">";
Similarly it uses ostringstream as well and the function that uses
this gets called almost on every message that my application gets on
the socket. I am using this to precisely construct a XML Message to
be sent to another application.
What we observed when we ran a collect/analyzer on the application is
that it shows majority of the CPU spent in trying to deal with these 2
datatypes, their memory allocation using std::allocator and other
stuff. The CPU goes as high as 100% sometimes.
I would like to get an advice/suggestion on the following points
1. Is there a better way to use std::string / std::ostringstream than
the way I have been using it?
2. AM I using the wrong datatype for such kind of operations and
should move on to use something else? Any suggestions what the
datatype should be?
I eventually need these datatypes because the external library that I
am using to send this data out needs it in std::string /
std::ostringstream formats.
Would like to have some suggestions to bring down the CPU utilization.
>One suggestion would be .reserve(). I E.
std::string msgHeader;
msgHeader.reserve( 100 );
>That way the string msgHeader wouldn't need to try to allocate more memory
until it has used the initial 100 characters allocated. Some compilers are
better at preallocating a default number of bytes than others. Sometimes
they have to be given a hint. Figure out a good size to reserve (one big
enough where you won't need to be doing reallocatings, one small enough that
you're not running out of memory) and then try profiling it again and see if
it helps.- Hide quoted text -
I also clear the string using msgHeader.str("") method once i am done
with the sending of the message. Then again when this method gets
called, the same sequence of events happen. Wouldnt it clear the
allocated memory once i do a msgHeader.str("")? How do reserving
essentially help in this scenario?

To clear the string use clear() instead, that is what it is meant for.
clear() will not affect the capacity of the string so if you do
something like

std::string str;
str.reserve(100);
str.clear();

you will still be able to put 100 characters into the string before it
needs to reallocate.

Of course, if msgHeader is declared in the function that gets called it
will go out of scope when the function returns and will be reallocated
when it is called again, in which case a new string will be constructed
in which case the operations on the string will have not effect over two
different calls. If msgHeader on the other hand is external to the
function then you will probably benefit from using reserve. BTW, when
calling reserve() with an argument that is smaller than or equal to the
current capacity no action is taken.

--
Erik Wikström- Hide quoted text -

- Show quoted text -

msgHeader is local to the function. And the maximum size would not be
more than a 100 bytes. So I plan to modify my code to use reserve and
clear as you suggested and will try a hand on the performance. I hope
it helps.

BTW, a general question.
If i dont use reserve and my function looks somewhat like this below

void somefunction ()
{
std::string msgHeader
msgHeader = "<";
msgHeader += a;
msgHeader += "><";

msgHeader += b;
msgHeader += "><";

msgHeader += c;
msgHeader += ">";
}

How is the actual memory allocation done? My understanding is that
the string library tries to reallocate memory on every statement.
That is, initially when it finds the statement "msgHeader = "<";", it
allocates say 1 byte to the msgHeader.
Then at the next statement it reallocates msgHeader as sizeof (a) +
current memory of msgHeader and so on.
Is this correct? If yes, then I am sure using reserve would improve
the performance dramatically.
I do not know, and I do not think the standard says anything about it.
But a good implementation will probably use a resizing scheme similar to
the one used for vectors, such as (at least) doubling the capacity every
time it resizes.

--
Erik Wikström
Oct 31 '07 #6
On Oct 31, 8:21 pm, Erik Wikström <Erik-wikst...@telia.comwrote:
[...]
Of course, if msgHeader is declared in the function that gets called it
will go out of scope when the function returns and will be reallocated
when it is called again, in which case a new string will be constructed
in which case the operations on the string will have not effect over two
different calls. If msgHeader on the other hand is external to the
function then you will probably benefit from using reserve. BTW, when
calling reserve() with an argument that is smaller than or equal to the
current capacity no action is taken.
If you (re-)use a string with static lifetime, you probably
don't need reserve. It will very quickly reach the capacity of
the largest header, and never shrink.

In my own work, I tend to use std::vector<chara lot for this
sort of thing, using ostrstream (initialized to use the space in
the vector) for formatting. My own experience is that the
implementations of std::vector tend to be better optimized that
those of std::string, and you have a lot more guarantees
concerning the allocation strategy. In my case, this is rather
simple, since I am dealing with fixed length records and fields
(so something like:

size_t pos = v.size() ;
v.resize( v.size() + fieldSize ) ;
ostrstream formatter( &v[0] + pos, fieldSize ) ;
formatter << ... ;

works perfectly). But it's something that may be worth
considering. (If called with arguments, ostrstream will format
directly in place, with no dynamic allocation.)

Although not currently guaranteed, something similar using
std::string will actually work with all current implementations,
and will be guaranteed in the next version of the standard.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Nov 1 '07 #7
On Oct 31, 9:58 pm, Bala <R.Balaji.I...@gmail.comwrote:
On Oct 31, 3:21 pm, Erik Wikström <Erik-wikst...@telia.comwrote:
On 2007-10-31 19:48, Bala wrote:
[...]
If i dont use reserve and my function looks somewhat like this below
void somefunction ()
{
std::string msgHeader
msgHeader = "<";
msgHeader += a;
msgHeader += "><";
msgHeader += b;
msgHeader += "><";
msgHeader += c;
msgHeader += ">";
}
How is the actual memory allocation done?
However the implementation wants. There are no real
requirements.

In practice, I think most implementations today do something
similar to what they do in std::vector (which requires some sort
of exponential growth strategy). Many implementations also use
the small string optimization---there is no dynamic allocation
whatsoever if the string is small enough (typically something
between 8 and 32 bytes).
My understanding is that the string library tries to
reallocate memory on every statement.
It might, but it probably doesn't.

You can find out by tracing the capacity of the string after
each +=. (If the capacity of the empty string immediatly after
construction is greater than 0, then the implementation probably
uses the small string optimization.)
That is, initially when it finds the statement "msgHeader = "<";", it
allocates say 1 byte to the msgHeader.
Then at the next statement it reallocates msgHeader as sizeof (a) +
current memory of msgHeader and so on.
Is this correct? If yes, then I am sure using reserve would improve
the performance dramatically.
Anytime you can set a reasonable maximum for the length, reserve
is likely to help. How much depends largely on the
implementation, however.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Nov 1 '07 #8
On Oct 31, 10:55 pm, Erik Wikström <Erik-wikst...@telia.comwrote:

[...]
I do not know, and I do not think the standard says anything
about it. But a good implementation will probably use a
resizing scheme similar to the one used for vectors, such as
(at least) doubling the capacity every time it resizes.
Doubling is actually not a very good strategy; multiplying by
say 1.5 is considerably better. (As a general rule, the
multiplier should be less that (1+sqrt(5))/2---about 1.6. 1.5
is close enough, and easy to calculate.) In memory tight
situations, of course, the multiplier should be even smaller.

The original STL implementation did use 2, and I suspect that
many implementations still do, even though we now know that it
isn't such a good idea.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Nov 1 '07 #9
On Wed, 31 Oct 2007 20:58:33 -0000, Bala wrote:
>How is the actual memory allocation done? My understanding is that
the string library tries to reallocate memory on every statement.
That is, initially when it finds the statement "msgHeader =3D "<";", it
allocates say 1 byte to the msgHeader.
Probably yes.
>Then at the next statement it reallocates msgHeader as sizeof (a) +
current memory of msgHeader and so on.
Is this correct? If yes, then I am sure using reserve would improve
the performance dramatically.
Probably for string but you cannot call reserve() for ostringstream.
Both std::string and std::ostringstream are not meant to be utilized
"extensively". Consider to use a library for writing XML, e.g.
http://www.tbray.org/ongoing/When/20.../20/GenxStatus
--
Roland Pibinger
"The best software is simple, elegant, and full of drama" - Grady Booch
Nov 1 '07 #10
James Kanze wrote:
On Oct 31, 8:21 pm, Erik Wikström <Erik-wikst...@telia.comwrote:
[...]
>Of course, if msgHeader is declared in the function that gets called it
will go out of scope when the function returns and will be reallocated
when it is called again, in which case a new string will be constructed
in which case the operations on the string will have not effect over two
different calls. If msgHeader on the other hand is external to the
function then you will probably benefit from using reserve. BTW, when
calling reserve() with an argument that is smaller than or equal to the
current capacity no action is taken.

If you (re-)use a string with static lifetime, you probably
don't need reserve. It will very quickly reach the capacity of
the largest header, and never shrink.
On the other hand the function won't be thread safe.

Nov 1 '07 #11
On Nov 1, 11:19 am, Tadeusz Kopec <tko...@NOSPAMPLEASElife.plwrote:
James Kanze wrote:
On Oct 31, 8:21 pm, Erik Wikström <Erik-wikst...@telia.comwrote:
[...]
Of course, if msgHeader is declared in the function that gets called it
will go out of scope when the function returns and will be reallocated
when it is called again, in which case a new string will be constructed
in which case the operations on the string will have not effect over two
different calls. If msgHeader on the other hand is external to the
function then you will probably benefit from using reserve. BTW, when
calling reserve() with an argument that is smaller than or equal to the
current capacity no action is taken.
If you (re-)use a string with static lifetime, you probably
don't need reserve. It will very quickly reach the capacity of
the largest header, and never shrink.

On the other hand the function won't be thread safe.
Yes you are right, it wont be thread safe. In this current scenario, i
am not using it with a static lifetime just to avoid locking and
unlocking while creating the headers, hence its an automatic
variable. In the meanwhile, im gettin myself tuned to run the test
with the suggested changes for reserve. Hoping that it helps, because
cant think of anything more as a solution at this moment.

Nov 1 '07 #12
"James Kanze" <ja*********@gmail.comwrote in message
news:11**********************@50g2000hsm.googlegro ups.com...
On Oct 31, 10:55 pm, Erik Wikström <Erik-wikst...@telia.comwrote:

[...]
I do not know, and I do not think the standard says anything
about it. But a good implementation will probably use a
resizing scheme similar to the one used for vectors, such as
(at least) doubling the capacity every time it resizes.
Doubling is actually not a very good strategy; multiplying by
say 1.5 is considerably better. (As a general rule, the
multiplier should be less that (1+sqrt(5))/2---about 1.6. 1.5
is close enough, and easy to calculate.) In memory tight
situations, of course, the multiplier should be even smaller.

The original STL implementation did use 2, and I suspect that
many implementations still do, even though we now know that it
isn't such a good idea.

=====

I deciced to test my implementation so I wrote this:

#include <iostream>
#include <string>

int main()
{
std::string Foo;

std::string::size_type LastCapacity = Foo.capacity();
std::cout << "Initial Capacity:" << LastCapacity << "\n";
for ( int i = 0; i < 100; ++i )
{
Foo += "x";
if ( Foo.capacity() != LastCapacity )
{
std::cout << "Size:" << Foo.size() << " " << "Capacity:" <<
Foo.capacity() << "\n";
LastCapacity = Foo.capacity();
}
}
}

The output for my system is:

Initial Capacity:15
Size:16 Capacity:31
Size:32 Capacity:47
Size:48 Capacity:70
Size:71 Capacity:105

So as we can see, on my system if I did not initially .reserve() and added 1
character at a time then I would wind up with 4 extra memory reallocations.

I'm on Windows XP with Microsoft Visual C++ .net 2005. Not sure that the OS
matters, just the compiler.
Nov 1 '07 #13
On 2007-11-01 21:53, Jim Langston wrote:
"James Kanze" <ja*********@gmail.comwrote in message
news:11**********************@50g2000hsm.googlegro ups.com...
On Oct 31, 10:55 pm, Erik Wikstré—£ <Erik-wikst...@telia.comwrote:

[...]
>I do not know, and I do not think the standard says anything
about it. But a good implementation will probably use a
resizing scheme similar to the one used for vectors, such as
(at least) doubling the capacity every time it resizes.

Doubling is actually not a very good strategy; multiplying by
say 1.5 is considerably better. (As a general rule, the
multiplier should be less that (1+sqrt(5))/2---about 1.6. 1.5
is close enough, and easy to calculate.) In memory tight
situations, of course, the multiplier should be even smaller.

The original STL implementation did use 2, and I suspect that
many implementations still do, even though we now know that it
isn't such a good idea.

=====

I deciced to test my implementation so I wrote this:

#include <iostream>
#include <string>

int main()
{
std::string Foo;

std::string::size_type LastCapacity = Foo.capacity();
std::cout << "Initial Capacity:" << LastCapacity << "\n";
for ( int i = 0; i < 100; ++i )
{
Foo += "x";
if ( Foo.capacity() != LastCapacity )
{
std::cout << "Size:" << Foo.size() << " " << "Capacity:" <<
Foo.capacity() << "\n";
LastCapacity = Foo.capacity();
}
}
}

The output for my system is:

Initial Capacity:15
Size:16 Capacity:31
Size:32 Capacity:47
Size:48 Capacity:70
Size:71 Capacity:105

So as we can see, on my system if I did not initially .reserve() and added 1
character at a time then I would wind up with 4 extra memory reallocations.
And for those too lazy to do the math themselves the numbers means that
the capacity is multiplied by 1.5 on each resize.

To James Kanze: I too would be interested in hearing about the source of
the "less than 1.6 multiplication" rule. I have tried to google for it
but I do not even know about where to start.

--
Erik Wikström
Nov 1 '07 #14
"Erik Wikström" <Er***********@telia.comwrote in message
news:du*****************@newsb.telia.net...
On 2007-11-01 21:53, Jim Langston wrote:
>"James Kanze" <ja*********@gmail.comwrote in message
news:11**********************@50g2000hsm.googlegr oups.com...
On Oct 31, 10:55 pm, Erik Wikstr? <Erik-wikst...@telia.comwrote:

[...]
>>I do not know, and I do not think the standard says anything
about it. But a good implementation will probably use a
resizing scheme similar to the one used for vectors, such as
(at least) doubling the capacity every time it resizes.

Doubling is actually not a very good strategy; multiplying by
say 1.5 is considerably better. (As a general rule, the
multiplier should be less that (1+sqrt(5))/2---about 1.6. 1.5
is close enough, and easy to calculate.) In memory tight
situations, of course, the multiplier should be even smaller.

The original STL implementation did use 2, and I suspect that
many implementations still do, even though we now know that it
isn't such a good idea.

=====

I deciced to test my implementation so I wrote this:

#include <iostream>
#include <string>

int main()
{
std::string Foo;

std::string::size_type LastCapacity = Foo.capacity();
std::cout << "Initial Capacity:" << LastCapacity << "\n";
for ( int i = 0; i < 100; ++i )
{
Foo += "x";
if ( Foo.capacity() != LastCapacity )
{
std::cout << "Size:" << Foo.size() << " " << "Capacity:" <<
Foo.capacity() << "\n";
LastCapacity = Foo.capacity();
}
}
}

The output for my system is:

Initial Capacity:15
Size:16 Capacity:31
Size:32 Capacity:47
Size:48 Capacity:70
Size:71 Capacity:105

So as we can see, on my system if I did not initially .reserve() and
added 1
character at a time then I would wind up with 4 extra memory
reallocations.

And for those too lazy to do the math themselves the numbers means that
the capacity is multiplied by 1.5 on each resize.

To James Kanze: I too would be interested in hearing about the source of
the "less than 1.6 multiplication" rule. I have tried to google for it
but I do not even know about where to start.
It was explained to me once. Consider a string with an initial capacity of
100. There are 100 bytes of memory allocated. Then if you double it, there
is a hole in the first 100 bytes, and the next 200 bytes are allocated. Now
you double it again to 400. 100 is not enough so it can't use the hole, so
it allocates 400 bytes,leaving a 300 byte hole. Double again, 800, 300 is
not enough, it allcoates 800, leaving a 700 byte hole. As we can see, it
can never reuse the memory from the previous allocations since they have to
be continguous.

So that's where the 1.6 comes in. That with future allocations eventually
the system will be able to reuse previous memory allocated for the string.
Nov 1 '07 #15
On Nov 1, 5:37 pm, "Jim Langston" <tazmas...@rocketmail.comwrote:
"Erik Wikström" <Erik-wikst...@telia.comwrote in message

news:du*****************@newsb.telia.net...


On 2007-11-01 21:53, Jim Langston wrote:
"James Kanze" <james.ka...@gmail.comwrote in message
news:11**********************@50g2000hsm.googlegr oups.com...
On Oct 31, 10:55 pm, Erik Wikstr? <Erik-wikst...@telia.comwrote:
[...]
I do not know, and I do not think the standard says anything
about it. But a good implementation will probably use a
resizing scheme similar to the one used for vectors, such as
(at least) doubling the capacity every time it resizes.
Doubling is actually not a very good strategy; multiplying by
say 1.5 is considerably better. (As a general rule, the
multiplier should be less that (1+sqrt(5))/2---about 1.6. 1.5
is close enough, and easy to calculate.) In memory tight
situations, of course, the multiplier should be even smaller.
The original STL implementation did use 2, and I suspect that
many implementations still do, even though we now know that it
isn't such a good idea.
=====
I deciced to test my implementation so I wrote this:
#include <iostream>
#include <string>
int main()
{
std::string Foo;
std::string::size_type LastCapacity = Foo.capacity();
std::cout << "Initial Capacity:" << LastCapacity << "\n";
for ( int i = 0; i < 100; ++i )
{
Foo += "x";
if ( Foo.capacity() != LastCapacity )
{
std::cout << "Size:" << Foo.size() << " " << "Capacity:" <<
Foo.capacity() << "\n";
LastCapacity = Foo.capacity();
}
}
}
The output for my system is:
Initial Capacity:15
Size:16 Capacity:31
Size:32 Capacity:47
Size:48 Capacity:70
Size:71 Capacity:105
So as we can see, on my system if I did not initially .reserve() and
added 1
character at a time then I would wind up with 4 extra memory
reallocations.
And for those too lazy to do the math themselves the numbers means that
the capacity is multiplied by 1.5 on each resize.
To James Kanze: I too would be interested in hearing about the source of
the "less than 1.6 multiplication" rule. I have tried to google for it
but I do not even know about where to start.

It was explained to me once. Consider a string with an initial capacity of
100. There are 100 bytes of memory allocated. Then if you double it, there
is a hole in the first 100 bytes, and the next 200 bytes are allocated. Now
you double it again to 400. 100 is not enough so it can't use the hole, so
it allocates 400 bytes,leaving a 300 byte hole. Double again, 800, 300 is
not enough, it allcoates 800, leaving a 700 byte hole. As we can see, it
can never reuse the memory from the previous allocations since they have to
be continguous.

So that's where the 1.6 comes in. That with future allocations eventually
the system will be able to reuse previous memory allocated for the string..- Hide quoted text -

- Show quoted text -
I wrote a similar test application on Solaris 10 Sparc with g++3.4.3.
It clearly shows, that the reserve thing is surely going to help my
performance.

Thanks a ton for your inputs. Will let you know once the test is
successful.

I started appending 2 characters a time in a loop of 10.

This is the output without reserve

Start With :0
Current Size:2 Current Capacity:2
Current Size:4 Current Capacity:4
Current Size:6 Current Capacity:6
Current Size:8 Current Capacity:8
Current Size:10 Current Capacity:10
Current Size:12 Current Capacity:12
Current Size:14 Current Capacity:14
Current Size:16 Current Capacity:16
Current Size:18 Current Capacity:18
Current Size:20 Current Capacity:20

This is the output with reserve of 20

Start With :20

Thanks,
Bala

Nov 1 '07 #16

On Thu, 2007-11-01 at 14:37 -0700, Jim Langston wrote:
It was explained to me once. Consider a string with an initial capacity of
100. There are 100 bytes of memory allocated. Then if you double it, there
is a hole in the first 100 bytes, and the next 200 bytes are allocated. Now
you double it again to 400. 100 is not enough so it can't use the hole, so
it allocates 400 bytes,leaving a 300 byte hole. Double again, 800, 300 is
not enough, it allcoates 800, leaving a 700 byte hole. As we can see, it
can never reuse the memory from the previous allocations since they have to
be continguous.

So that's where the 1.6 comes in. That with future allocations eventually
the system will be able to reuse previous memory allocated for the string.
Sounds like something Knuth would have noticed, but I can't find it in
TAoCP. Recognising its Knuthiness and the use of the golden ratio, it
looks like a Fibonacci thing:

+-+
|X|
+-+

+---+-+
|X X| |
+---+-+

+-----+
|X X X|
+-----+

+---------+-----+
|X X X X X| |
+---------+-----+

+---------------+
|X X X X X X X X|
+---------------+

+-------------------------+---------------+
|X X X X X X X X X X X X X |
+-------------------------+---------------+

+-----------------------------------------+
|X X X X X X X X X X X X X X X X X X X X X|
+-----------------------------------------+

But I wonder how useful this is in practise. It seems to be a real
best-case thing - if you can't merely extend, then get the next
Fibonacci number size *before* the previous allocation and copy there.
If you don't happen to get the space before the current storage, then
you still get fragmentation - although I suppose being *strictly*
Fibonacci probably makes that average out to be the lowest of a general
purpose system as adjacent free blocks will tend to support Fibonacci
sized allocations more often.

The ratio of adjacent pairs in the Fibonacci sequence of course
approaches the golden ratio as the noise due to being rational
approaches zero. Being that memory allocations must have integer size -
would you be better storing the previous capacity and adding the current
capacity to determine the next one - and *really* go Fibonacci?

--
Tristan Wibberley

Any opinion expressed is mine (or else I'm playing devils advocate for
the sake of a good argument). My employer had nothing to do with this
communication.

Nov 2 '07 #17

"Bala" <R.***********@gmail.comwrote in message
news:11**********************@19g2000hsx.googlegro ups.com...
On Nov 1, 5:37 pm, "Jim Langston" <tazmas...@rocketmail.comwrote:
"Erik Wikström" <Erik-wikst...@telia.comwrote in message

news:du*****************@newsb.telia.net...
On 2007-11-01 21:53, Jim Langston wrote:
"James Kanze" <james.ka...@gmail.comwrote in message
news:11**********************@50g2000hsm.googlegr oups.com...
On Oct 31, 10:55 pm, Erik Wikstr? <Erik-wikst...@telia.comwrote:
[...]
I do not know, and I do not think the standard says anything
about it. But a good implementation will probably use a
resizing scheme similar to the one used for vectors, such as
(at least) doubling the capacity every time it resizes.
Doubling is actually not a very good strategy; multiplying by
say 1.5 is considerably better. (As a general rule, the
multiplier should be less that (1+sqrt(5))/2---about 1.6. 1.5
is close enough, and easy to calculate.) In memory tight
situations, of course, the multiplier should be even smaller.
The original STL implementation did use 2, and I suspect that
many implementations still do, even though we now know that it
isn't such a good idea.
=====
I deciced to test my implementation so I wrote this:
#include <iostream>
#include <string>
int main()
{
std::string Foo;
std::string::size_type LastCapacity = Foo.capacity();
std::cout << "Initial Capacity:" << LastCapacity << "\n";
for ( int i = 0; i < 100; ++i )
{
Foo += "x";
if ( Foo.capacity() != LastCapacity )
{
std::cout << "Size:" << Foo.size() << " " << "Capacity:" <<
Foo.capacity() << "\n";
LastCapacity = Foo.capacity();
}
}
}
The output for my system is:
Initial Capacity:15
Size:16 Capacity:31
Size:32 Capacity:47
Size:48 Capacity:70
Size:71 Capacity:105
So as we can see, on my system if I did not initially .reserve() and
added 1
character at a time then I would wind up with 4 extra memory
reallocations.
And for those too lazy to do the math themselves the numbers means that
the capacity is multiplied by 1.5 on each resize.
To James Kanze: I too would be interested in hearing about the source of
the "less than 1.6 multiplication" rule. I have tried to google for it
but I do not even know about where to start.

It was explained to me once. Consider a string with an initial capacity
of
100. There are 100 bytes of memory allocated. Then if you double it,
there
is a hole in the first 100 bytes, and the next 200 bytes are allocated.
Now
you double it again to 400. 100 is not enough so it can't use the hole,
so
it allocates 400 bytes,leaving a 300 byte hole. Double again, 800, 300 is
not enough, it allcoates 800, leaving a 700 byte hole. As we can see, it
can never reuse the memory from the previous allocations since they have
to
be continguous.

So that's where the 1.6 comes in. That with future allocations eventually
the system will be able to reuse previous memory allocated for the
string.- Hide quoted text -

- Show quoted text -
I wrote a similar test application on Solaris 10 Sparc with g++3.4.3.
It clearly shows, that the reserve thing is surely going to help my
performance.

Thanks a ton for your inputs. Will let you know once the test is
successful.

I started appending 2 characters a time in a loop of 10.

This is the output without reserve

Start With :0
Current Size:2 Current Capacity:2
Current Size:4 Current Capacity:4
Current Size:6 Current Capacity:6
Current Size:8 Current Capacity:8
Current Size:10 Current Capacity:10
Current Size:12 Current Capacity:12
Current Size:14 Current Capacity:14
Current Size:16 Current Capacity:16
Current Size:18 Current Capacity:18
Current Size:20 Current Capacity:20

This is the output with reserve of 20

Start With :20

Thanks,
Bala

======================

OUCH! No wonder you are having performance issues! Your platform isn't
preallocating ANY extra space! This means every time you add a character it
has to reallocate memory. Very bad. Yes, in your case .reserve() should
help TREMENDOUSLY.

That, in my opionion, is a very bad implementation of std::string.
Nov 2 '07 #18
On 2007-11-02 04:21, Tristan Wibberley wrote:
On Thu, 2007-11-01 at 14:37 -0700, Jim Langston wrote:
>It was explained to me once. Consider a string with an initial capacity of
100. There are 100 bytes of memory allocated. Then if you double it, there
is a hole in the first 100 bytes, and the next 200 bytes are allocated. Now
you double it again to 400. 100 is not enough so it can't use the hole, so
it allocates 400 bytes,leaving a 300 byte hole. Double again, 800, 300 is
not enough, it allcoates 800, leaving a 700 byte hole. As we can see, it
can never reuse the memory from the previous allocations since they have to
be continguous.

So that's where the 1.6 comes in. That with future allocations eventually
the system will be able to reuse previous memory allocated for the string.

Sounds like something Knuth would have noticed, but I can't find it in
TAoCP. Recognising its Knuthiness and the use of the golden ratio, it
looks like a Fibonacci thing:

+-+
|X|
+-+

+---+-+
|X X| |
+---+-+

+-----+
|X X X|
+-----+

+---------+-----+
|X X X X X| |
+---------+-----+

+---------------+
|X X X X X X X X|
+---------------+

+-------------------------+---------------+
|X X X X X X X X X X X X X |
+-------------------------+---------------+

+-----------------------------------------+
|X X X X X X X X X X X X X X X X X X X X X|
+-----------------------------------------+
This seems like a good idea if you can use a smart implementation of
realloc. I have never written any allocators so I do not know much about
it but it was my belief that most implementations of realloc simply
looked to see if there was enough extra space "in front" to satisfy the
request (so that no data have to be copied), but I suppose there might
be some implementations that looks "behind" also.

However would that work in a C++ collection where the elements have to
be copy-constructed and simple copying wont work. It was my belief that
the container must first allocate the new memory, copy-construct the old
elements, add the new, and first then can it get rid of the old
allocation. To support a scheme like above the container must assume
that the two allocations overlap but to my knowledge the C++ allocators
can not allocate overlapping memory areas.

Perhaps another use for the ~ 1.6 ratio is to avoid unnecessary wasted
space in the form of over allocation?

--
Erik Wikström
Nov 2 '07 #19
On Nov 2, 6:51 am, Erik Wikström <Erik-wikst...@telia.comwrote:
On 2007-11-02 04:21, Tristan Wibberley wrote:


On Thu, 2007-11-01 at 14:37 -0700, Jim Langston wrote:
It was explained to me once. Consider a string with an initial capacity of
100. There are 100 bytes of memory allocated. Then if you double it,there
is a hole in the first 100 bytes, and the next 200 bytes are allocated.. Now
you double it again to 400. 100 is not enough so it can't use the hole, so
it allocates 400 bytes,leaving a 300 byte hole. Double again, 800, 300 is
not enough, it allcoates 800, leaving a 700 byte hole. As we can see,it
can never reuse the memory from the previous allocations since they have to
be continguous.
So that's where the 1.6 comes in. That with future allocations eventually
the system will be able to reuse previous memory allocated for the string.
Sounds like something Knuth would have noticed, but I can't find it in
TAoCP. Recognising its Knuthiness and the use of the golden ratio, it
looks like a Fibonacci thing:
+-+
|X|
+-+
+---+-+
|X X| |
+---+-+
+-----+
|X X X|
+-----+
+---------+-----+
|X X X X X| |
+---------+-----+
+---------------+
|X X X X X X X X|
+---------------+
+-------------------------+---------------+
|X X X X X X X X X X X X X |
+-------------------------+---------------+
+-----------------------------------------+
|X X X X X X X X X X X X X X X X X X X X X|
+-----------------------------------------+

This seems like a good idea if you can use a smart implementation of
realloc. I have never written any allocators so I do not know much about
it but it was my belief that most implementations of realloc simply
looked to see if there was enough extra space "in front" to satisfy the
request (so that no data have to be copied), but I suppose there might
be some implementations that looks "behind" also.

However would that work in a C++ collection where the elements have to
be copy-constructed and simple copying wont work. It was my belief that
the container must first allocate the new memory, copy-construct the old
elements, add the new, and first then can it get rid of the old
allocation. To support a scheme like above the container must assume
that the two allocations overlap but to my knowledge the C++ allocators
can not allocate overlapping memory areas.

Perhaps another use for the ~ 1.6 ratio is to avoid unnecessary wasted
space in the form of over allocation?

--
Erik Wikström- Hide quoted text -

- Show quoted text -
Another observation i found when i was browsing through the
std::string class is that probably i should be using append instead of
a +=. I have a strong feeling there could be benefit of maybe append
using a reference whereas += creating a local variable and then using
it.
Any comments on this front?

Nov 2 '07 #20
On 2007-11-02 14:09, Bala wrote:
On Nov 2, 6:51 am, Erik Wikström <Erik-wikst...@telia.comwrote:
>On 2007-11-02 04:21, Tristan Wibberley wrote:


On Thu, 2007-11-01 at 14:37 -0700, Jim Langston wrote:
>It was explained to me once. Consider a string with an initial capacity of
100. There are 100 bytes of memory allocated. Then if you double it, there
is a hole in the first 100 bytes, and the next 200 bytes are allocated. Now
you double it again to 400. 100 is not enough so it can't use the hole, so
it allocates 400 bytes,leaving a 300 byte hole. Double again, 800, 300 is
not enough, it allcoates 800, leaving a 700 byte hole. As we can see, it
can never reuse the memory from the previous allocations since they have to
be continguous.
>So that's where the 1.6 comes in. That with future allocations eventually
the system will be able to reuse previous memory allocated for the string.
Sounds like something Knuth would have noticed, but I can't find it in
TAoCP. Recognising its Knuthiness and the use of the golden ratio, it
looks like a Fibonacci thing:
+-+
|X|
+-+
+---+-+
|X X| |
+---+-+
+-----+
|X X X|
+-----+
+---------+-----+
|X X X X X| |
+---------+-----+
+---------------+
|X X X X X X X X|
+---------------+
+-------------------------+---------------+
|X X X X X X X X X X X X X |
+-------------------------+---------------+
+-----------------------------------------+
|X X X X X X X X X X X X X X X X X X X X X|
+-----------------------------------------+

This seems like a good idea if you can use a smart implementation of
realloc. I have never written any allocators so I do not know much about
it but it was my belief that most implementations of realloc simply
looked to see if there was enough extra space "in front" to satisfy the
request (so that no data have to be copied), but I suppose there might
be some implementations that looks "behind" also.

However would that work in a C++ collection where the elements have to
be copy-constructed and simple copying wont work. It was my belief that
the container must first allocate the new memory, copy-construct the old
elements, add the new, and first then can it get rid of the old
allocation. To support a scheme like above the container must assume
that the two allocations overlap but to my knowledge the C++ allocators
can not allocate overlapping memory areas.

Perhaps another use for the ~ 1.6 ratio is to avoid unnecessary wasted
space in the form of over allocation?
Please do not quote signature.
Another observation i found when i was browsing through the
std::string class is that probably i should be using append instead of
a +=. I have a strong feeling there could be benefit of maybe append
using a reference whereas += creating a local variable and then using
it.
Any comments on this front?
It depends, for concatenating two strings they should be equal in
performance, but if you add string literals, C strings, or just a char
then append is probably faster.

--
Erik Wikström
Nov 2 '07 #21
On Nov 2, 9:44 am, Erik Wikström <Erik-wikst...@telia.comwrote:
On 2007-11-02 14:09, Bala wrote:


On Nov 2, 6:51 am, Erik Wikström <Erik-wikst...@telia.comwrote:
On 2007-11-02 04:21, Tristan Wibberley wrote:
On Thu, 2007-11-01 at 14:37 -0700, Jim Langston wrote:
It was explained to me once. Consider a string with an initial capacity of
100. There are 100 bytes of memory allocated. Then if you double it, there
is a hole in the first 100 bytes, and the next 200 bytes are allocated. Now
you double it again to 400. 100 is not enough so it can't use the hole, so
it allocates 400 bytes,leaving a 300 byte hole. Double again, 800,300 is
not enough, it allcoates 800, leaving a 700 byte hole. As we can see, it
can never reuse the memory from the previous allocations since theyhave to
be continguous.
So that's where the 1.6 comes in. That with future allocations eventually
the system will be able to reuse previous memory allocated for the string.
Sounds like something Knuth would have noticed, but I can't find it in
TAoCP. Recognising its Knuthiness and the use of the golden ratio, it
looks like a Fibonacci thing:
+-+
|X|
+-+
+---+-+
|X X| |
+---+-+
+-----+
|X X X|
+-----+
+---------+-----+
|X X X X X| |
+---------+-----+
+---------------+
|X X X X X X X X|
+---------------+
+-------------------------+---------------+
|X X X X X X X X X X X X X |
+-------------------------+---------------+
+-----------------------------------------+
|X X X X X X X X X X X X X X X X X X X X X|
+-----------------------------------------+
This seems like a good idea if you can use a smart implementation of
realloc. I have never written any allocators so I do not know much about
it but it was my belief that most implementations of realloc simply
looked to see if there was enough extra space "in front" to satisfy the
request (so that no data have to be copied), but I suppose there might
be some implementations that looks "behind" also.
However would that work in a C++ collection where the elements have to
be copy-constructed and simple copying wont work. It was my belief that
the container must first allocate the new memory, copy-construct the old
elements, add the new, and first then can it get rid of the old
allocation. To support a scheme like above the container must assume
that the two allocations overlap but to my knowledge the C++ allocators
can not allocate overlapping memory areas.
Perhaps another use for the ~ 1.6 ratio is to avoid unnecessary wasted
space in the form of over allocation?

Please do not quote signature.
Another observation i found when i was browsing through the
std::string class is that probably i should be using append instead of
a +=. I have a strong feeling there could be benefit of maybe append
using a reference whereas += creating a local variable and then using
it.
Any comments on this front?

It depends, for concatenating two strings they should be equal in
performance, but if you add string literals, C strings, or just a char
then append is probably faster.

--
Erik Wikström- Hide quoted text -

- Show quoted text -
Yeah, most of the data that i append are one of the following.
1. String literals
2. C style arrays.
3. Chars.

So i think, i should change the code to use append instead and then
again check the performance.
Nov 2 '07 #22
On 2007-11-02 14:44, Erik Wikström wrote:
On 2007-11-02 14:09, Bala wrote:
>On Nov 2, 6:51 am, Erik Wikström <Erik-wikst...@telia.comwrote:
>>On 2007-11-02 04:21, Tristan Wibberley wrote:

On Thu, 2007-11-01 at 14:37 -0700, Jim Langston wrote:

It was explained to me once. Consider a string with an initial capacity of
100. There are 100 bytes of memory allocated. Then if you double it, there
is a hole in the first 100 bytes, and the next 200 bytes are allocated. Now
you double it again to 400. 100 is not enough so it can't use the hole, so
it allocates 400 bytes,leaving a 300 byte hole. Double again, 800, 300 is
not enough, it allcoates 800, leaving a 700 byte hole. As we can see, it
can never reuse the memory from the previous allocations since they have to
be continguous.

So that's where the 1.6 comes in. That with future allocations eventually
the system will be able to reuse previous memory allocated for the string.

Sounds like something Knuth would have noticed, but I can't find it in
TAoCP. Recognising its Knuthiness and the use of the golden ratio, it
looks like a Fibonacci thing:

+-+
|X|
+-+

+---+-+
|X X| |
+---+-+

+-----+
|X X X|
+-----+

+---------+-----+
|X X X X X| |
+---------+-----+

+---------------+
|X X X X X X X X|
+---------------+

+-------------------------+---------------+
|X X X X X X X X X X X X X |
+-------------------------+---------------+

+-----------------------------------------+
|X X X X X X X X X X X X X X X X X X X X X|
+-----------------------------------------+

This seems like a good idea if you can use a smart implementation of
realloc. I have never written any allocators so I do not know much about
it but it was my belief that most implementations of realloc simply
looked to see if there was enough extra space "in front" to satisfy the
request (so that no data have to be copied), but I suppose there might
be some implementations that looks "behind" also.

However would that work in a C++ collection where the elements have to
be copy-constructed and simple copying wont work. It was my belief that
the container must first allocate the new memory, copy-construct the old
elements, add the new, and first then can it get rid of the old
allocation. To support a scheme like above the container must assume
that the two allocations overlap but to my knowledge the C++ allocators
can not allocate overlapping memory areas.

Perhaps another use for the ~ 1.6 ratio is to avoid unnecessary wasted
space in the form of over allocation?

Please do not quote signature.
>Another observation i found when i was browsing through the
std::string class is that probably i should be using append instead of
a +=. I have a strong feeling there could be benefit of maybe append
using a reference whereas += creating a local variable and then using
it.
Any comments on this front?

It depends, for concatenating two strings they should be equal in
performance, but if you add string literals, C strings, or just a char
then append is probably faster.
Actually, I think that on a good implementation there should be no
difference in performance between += and append(). To see whether you
can gain anything by using append() instead of += you have to measure.

--
Erik Wikström
Nov 2 '07 #23
Erik Wikström wrote:
On 2007-11-02 04:21, Tristan Wibberley wrote:
>Recognising its Knuthiness and the use of the golden ratio, it
looks like a Fibonacci thing:
[redacted]

This seems like a good idea if you can use a smart implementation of
realloc. I have never written any allocators so I do not know much about
it but it was my belief that most implementations of realloc simply
looked to see if there was enough extra space "in front" to satisfy the
request (so that no data have to be copied), but I suppose there might
be some implementations that looks "behind" also.
Many years ago, I implemented a Fibonacci buddy system allocator, based
on an algorithm published in JACM in June 1975: "A simplified
recombination scheme for the Fibonacci buddy system".
Overhead was 2 bits + block size indicator per allocated block (wound up
being 1 byte total).
Nov 2 '07 #24
On Fri, 2 Nov 2007 01:54:46 -0700, "Jim Langston" wrote:
>
OUCH! No wonder you are having performance issues! Your platform isn't
preallocating ANY extra space! This means every time you add a character it
has to reallocate memory. Very bad. Yes, in your case .reserve() should
help TREMENDOUSLY.
When you extensively use std::string you start programming against an
implementation, not an iterface.
>That, in my opionion, is a very bad implementation of std::string.
Have you looked at the Microsoft/Dinkumware implementation?
--
Roland Pibinger
"The best software is simple, elegant, and full of drama" - Grady Booch
Nov 2 '07 #25
"Roland Pibinger" <rp*****@yahoo.comwrote in message
news:47***************@news.utanet.at...
On Fri, 2 Nov 2007 01:54:46 -0700, "Jim Langston" wrote:
>>
OUCH! No wonder you are having performance issues! Your platform isn't
preallocating ANY extra space! This means every time you add a character
it
has to reallocate memory. Very bad. Yes, in your case .reserve() should
help TREMENDOUSLY.

When you extensively use std::string you start programming against an
implementation, not an iterface.
>>That, in my opionion, is a very bad implementation of std::string.

Have you looked at the Microsoft/Dinkumware implementation?
I am using Microsoft Visual C++ .net 2005.

Mine is doing the 1.6 allocation. This is better than this one by far.
Nov 3 '07 #26

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

6
by: Eric Boutin | last post by:
Hi ! I have a strange problem with a std::ostringstream.. code : #include <sstream> /*...*/ std::ostringstream ss(); ss << "\"\"" << libpath << "\"\" \"\"" << argfilename << "\"\"...
3
by: Alexandros Frantzis | last post by:
Hello, I am trying to read a big file into a string. AFAIK std::string can read in words or "lines" from a stream. So one option is to continuously append all the "lines" from the file to the...
3
by: Chris | last post by:
Hi, I'm playing/developing a simple p2p file sharing client for an existing protocol. A 2 second description of the protocol is Message Length = int, 4 bytes Message Code = int, 4 bytes...
6
by: Christopher Benson-Manica | last post by:
Is there anything that one can do to a std::ostringstream that would make its destructor explode? I'm basically doing { std::ostringstream ss, msg; // do BUNCHES of stuff with ss and msg ...
4
by: Dave | last post by:
Hello all, The scheme shown below to move a text file's contents into a std::string works with one exception: it drops the carriage return and line feed characters. How may I, in a...
13
by: Glen Able | last post by:
Obviously the following doesn't work: int i = 5; std::string myString = "Number is " + i + " thankyou please"; So can anyone give me some idea what's the nicest way to this sort...
4
by: JKop | last post by:
The following isn't doing what I thought it would do: #include <iostream> int main() { int k = 56; std::string blah(" ");
11
by: Frank Neuhaus | last post by:
Hey Iam occaisonally using something like char buffer; sprinf(buffer,"Hello test %d %s",someint,somestring); Is there any _convenient_ and buffer overflow secure equivalent for that using...
5
by: probstm | last post by:
I am using a message handler class that implements amongst others: static void message ( std::string aHeader, std::string aMessage, int aTimeout = 0, MessageType aFlag = MSG_Information ); ...
1
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
0
by: Faith0G | last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...
0
by: ryjfgjl | last post by:
In our work, we often need to import Excel data into databases (such as MySQL, SQL Server, Oracle) for data analysis and processing. Usually, we use database tools like Navicat or the Excel import...
0
by: aa123db | last post by:
Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.