
Efficient string concatenation methods

As a relative Python newb, but an old hack in general, I've been
interested in the impact of having string objects (and other
primitives) be immutable. It seems to me that string concatenation is
a rather common operation, and in Python having immutable strings
results in a performance gotcha for anyone not aware of the impact of
doing lots of concatenation in the obvious way.
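To make the gotcha concrete, here is a minimal sketch contrasting the "obvious" `+=` loop with collecting the pieces and joining once. It is written for present-day Python 3 rather than the Python 2 of this thread, and the names `concat_naive` and `concat_join` are illustrative, not taken from the linked page. Because strings are immutable, each `+=` may have to copy everything built so far; some CPython versions can optimize this case in place, so it is worth measuring rather than assuming.

```python
def concat_naive(loop_count):
    """Build the result with repeated += (the 'obvious' way)."""
    out = ""
    for num in range(loop_count):
        out += str(num)  # may copy the whole string built so far
    return out


def concat_join(loop_count):
    """Collect the pieces in a list and join them once at the end."""
    pieces = []
    for num in range(loop_count):
        pieces.append(str(num))
    return "".join(pieces)
```

Both functions return the same string; only the amount of intermediate copying differs.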

I found several sources with advice for how to do concatenation in a
pythonic way (e.g. ref#1), but I hadn't seen any measurements or
comparisons. So, I put together a little test case and ran it through
for six different methods. Here are the results, and my conclusions:

http://www.skymind.com/~ocrow/python_string/

I'd be happy to hear if anyone else has done similar tests and if
there are any other good candidate methods that I missed.

ref #1: http://manatee.mojam.com/~skip/pytho...html#stringcat

Oliver
Jul 18 '05 #1


Oliver Crow wrote:
> http://www.skymind.com/~ocrow/python_string/
>
> I'd be happy to hear if anyone else has done similar tests and if
> there are any other good candidate methods that I missed.


You left out the StringIO module (having done only the cStringIO
version of that).
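For readers on Python 3, where the separate StringIO and cStringIO modules no longer exist, a rough equivalent of the file-like approach uses `io.StringIO`. This is only a sketch; the function name `concat_stringio` is made up for illustration and does not come from the benchmark page.

```python
import io


def concat_stringio(loop_count):
    """Write each piece into an in-memory text buffer, then read it out."""
    buf = io.StringIO()
    for num in range(loop_count):
        buf.write(str(num))
    return buf.getvalue()
```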

Note also that, for any of the ones which do method calls,
you can speed up the call by saving a reference to the
bound method in a local variable. For example, in method 4
you can do "app_list = str_list.append" and then use
"app_list(`num`)" instead of str_list.append(`num`). This
saves an attribute lookup on each loop iteration. It's
not "idiomatic" to do so except in (a) cases of optimization
obsession, or (b) benchmarks. ;-)

Interesting and useful results. Thanks! :-)

-Peter
Jul 18 '05 #2

I was curious about this like an hour ago and googled for it, and hit
your page. Thanks! It was quite helpful.
Jul 18 '05 #3

Oliver Crow <oc***@skymind.com> wrote:
> As a relative Python newb, but an old hack in general, I've been
> interested in the impact of having string objects (and other
> primitives) be immutable. It seems to me that string concatenation is
> a rather common operation, and in Python having immutable strings
> results in a performance gotcha for anyone not aware of the impact of
> doing lots of concatenation in the obvious way.
>
> I found several sources with advice for how to do concatenation in a
> pythonic way (e.g. ref#1), but I hadn't seen any measurements or
> comparisons. So, I put together a little test case and ran it through
> for six different methods. Here are the results, and my conclusions:
>
> http://www.skymind.com/~ocrow/python_string/
>
> I'd be happy to hear if anyone else has done similar tests and if
> there are any other good candidate methods that I missed.
>
> ref #1: http://manatee.mojam.com/~skip/pytho...html#stringcat
>
> Oliver


Try printing the integers to a file, then read it back. Should be
similar to Method 5.
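One way to read this suggestion, sketched in Python 3: write each integer to a temporary file, rewind, and read the whole thing back as a single string. The use of `tempfile.TemporaryFile` and the name `concat_via_file` are assumptions made for the example, not something specified in the post.

```python
import tempfile


def concat_via_file(loop_count):
    """Write each piece to a temporary file, then read it all back."""
    with tempfile.TemporaryFile(mode="w+") as f:
        for num in range(loop_count):
            f.write(str(num))
        f.seek(0)  # rewind before reading what was written
        return f.read()
```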

--
William Park, Open Geometry Consulting, <op**********@yahoo.ca>
Linux solution/training/migration, Thin-client
Jul 18 '05 #4

Oliver Crow wrote:
> I'd be happy to hear if anyone else has done similar tests and if
> there are any other good candidate methods that I missed.


I'd like to try out another variant, but I'm unable to run your
script... Where does that timing module come from? I can't find it anywhere.

The method I propose is simply

def method7():
    return ''.join(map(str, xrange(loop_count)))

I ran my own little test and it seems to be faster than method6, but I'd
like to run it in your script for more reliable results. Also, I haven't
done any memory measurement, only a timing.
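A Python 3 adaptation of this variant, with a rough way to look at memory as well as time, might look like the sketch below. `range` replaces `xrange`, `loop_count` is passed as a parameter instead of being a global, and the helper `peak_memory` is hypothetical; it relies on the standard `tracemalloc` module, which only counts allocations made through Python's allocator.

```python
import tracemalloc


def method7(loop_count):
    """Join the string form of each integer in a single expression."""
    return "".join(map(str, range(loop_count)))


def peak_memory(func, *args):
    """Return (result, peak bytes traced while func ran)."""
    tracemalloc.start()
    try:
        result = func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


if __name__ == "__main__":
    text, peak = peak_memory(method7, 100_000)
    print(len(text), "characters, peak", peak, "bytes")
```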

--
"Codito ergo sum"
Roel Schroeven
Jul 18 '05 #5

Oliver Crow <oc***@skymind.com> wrote:
> I found several sources with advice for how to do concatenation in a
> pythonic way (e.g. ref#1), but I hadn't seen any measurements or
> comparisons. So, I put together a little test case and ran it through
> for six different methods. Here are the results, and my conclusions:
>
> http://www.skymind.com/~ocrow/python_string/
>
> I'd be happy to hear if anyone else has done similar tests and if
> there are any other good candidate methods that I missed.


This

def method4():
    str_list = []
    for num in xrange(loop_count):
        str_list.append(`num`)
    return ''.join(str_list)

will run slightly faster modified to this:

def method4():
    str_list = []
    append = str_list.append
    for num in xrange(loop_count):
        append(`num`)
    return ''.join(str_list)

by factoring the method lookup out of the loop. Ditto for 3 and 5.
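To check the claim on a given interpreter, the two variants can be timed with the standard `timeit` module. The sketch below is a Python 3 port (`range` for `xrange`, `repr(num)` for the backtick syntax); the name `method4_cached` and the helper `best_time` are invented for the example. The gap may be small on recent CPython versions, so treat any numbers as specific to your own setup.

```python
import timeit


def method4(loop_count):
    str_list = []
    for num in range(loop_count):
        str_list.append(repr(num))  # attribute lookup on every iteration
    return "".join(str_list)


def method4_cached(loop_count):
    str_list = []
    append = str_list.append  # look the bound method up once
    for num in range(loop_count):
        append(repr(num))
    return "".join(str_list)


def best_time(func, loop_count, repeat=5, number=10):
    """Best (minimum) seconds per call over several timeit runs."""
    runs = timeit.repeat(lambda: func(loop_count), repeat=repeat, number=number)
    return min(runs) / number


if __name__ == "__main__":
    n = 100_000
    for func in (method4, method4_cached):
        print(func.__name__, best_time(func, n))
```

Taking the minimum of several runs reduces noise from other processes on the machine.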

Terry J. Reedy

PS. Changing IE's View/TextSize changes size of header fonts but not body
text, which your CSS apparently fixes at a size a bit small for me on
current system.


Jul 18 '05 #6

Roel Schroeven <rs****************@fastmail.fm> wrote in message news:<Mc*********************@phobos.telenet-ops.be>...

> I'd like to try out another variant, but I'm unable to run your
> script... Where does that timing module come from? I can't find it anywhere.

It's George Neville-Neil's timing module. It looks like it used to be
part of the python library, but has been removed in recent versions.
I'm not sure why. Perhaps timeit is the new preferred module.

> The method I propose is simply
>
> def method7():
>     return ''.join(map(str, xrange(loop_count)))
>
> I ran my own little test and it seems to be faster than method6, but I'd
> like to run it in your script for more reliable results. Also, I haven't
> done any memory measurement, only a timing.


I'll add that test and rerun the results.

Thanks for the suggestion!

Oliver
Jul 18 '05 #7

Peter Hansen <pe***@engcorp.com> wrote in message news:<s8********************@powergate.ca>...
> You left out the StringIO module (having done only the cStringIO
> version of that).

I should probably add that one just for reference. I left it out
originally because my instinct was that it would perform less well
than the string += operator. I think it uses ordinary immutable
python strings for internal storage.

> Note also that, for any of the ones which do method calls,
> you can speed up the call by saving a reference to the
> bound method in a local variable. For example, in method 4
> you can do "app_list = str_list.append" and then use
> "app_list(`num`)" instead of str_list.append(`num`). This
> saves an attribute lookup on each loop iteration. It's
> not "idiomatic" to do so except in (a) cases of optimization
> obsession, or (b) benchmarks. ;-)

I hadn't thought of this, although it makes sense. It looks like I
could do this in methods 3, 4 and 5. But I also feel that it makes
the code a little less readable.

I think the unstated goal I had was to find a method that could be
learned by python programmers and used in real programs without having
to think *too* hard about the various performance trade-offs. So, in
that spirit I should definitely measure the difference, but perhaps
not go so far as to recommend it as part of the best overall approach.

> Interesting and useful results. Thanks! :-)


Thanks!
Oliver
Jul 18 '05 #8
