String concatenation

Is it true that joining the string elements of a list is faster than
concatenating them via the '+' operator?

"".join(['a', 'b', 'c'])

vs

'a'+'b'+'c'

If so, can anyone explain why?

\\ jonas galvez
// jonasgalvez.com

Jul 18 '05 #1
Jonas Galvez wrote:
Is it true that joining the string elements of a list is faster than
concatenating them via the '+' operator?

"".join(['a', 'b', 'c'])

vs

'a'+'b'+'c'

If so, can anyone explain why?


It's because the latter one has to build a temporary
string consisting of 'ab' first, then the final string
with 'c' added, while the join can (and probably does) add up
all the lengths of the strings to be joined and build the final
string all in one go.

Note that there's also '%s%s%s' % ('a', 'b', 'c'), which is
probably on par with the join technique for both performance
and lack of readability.

More importantly, though, you should probably not pick the join
approach over the concatenation approach on performance grounds
alone. Concatenation is more readable in the above case (ignoring
the fact that it's a contrived example), as you're being more
explicit about your intentions.

Joining lists is popular because of the terribly bad performance
of += when one gradually builds up a string in pieces, compared
with appending to a list and then doing a join at the end.

So

l = []
l.append('a')
l.append('b')
l.append('c')
s = ''.join(l)

is _much_ faster (therefore better) in real-world cases than

s = ''
s += 'a'
s += 'b'
s += 'c'

With the latter, if you picture longer and many more strings,
and realize that each += causes a new string to be created
consisting of the contents of the two old strings joined together,
steadily growing longer and requiring lots of wasted copying,
you can see why it's very bad on memory and performance.
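
As a rough illustration of the copying described above, here is a small cost model. It assumes the worst case, in which every += builds a brand-new string; the function names are made up for this sketch, and real interpreters (CPython included) can sometimes extend the string in place, so actual timings may look better than the model suggests.

```python
def chars_copied_by_repeated_concat(pieces):
    """Characters copied if every += allocates a completely new string."""
    copied = 0
    length = 0
    for piece in pieces:
        length += len(piece)  # size of the string created by this +=
        copied += length      # old contents and the new piece are both copied
    return copied


def chars_copied_by_join(pieces):
    """Characters copied if each piece is copied exactly once into the result."""
    return sum(len(piece) for piece in pieces)


pieces = ["x"] * 1000
print(chars_copied_by_repeated_concat(pieces))  # 500500
print(chars_copied_by_join(pieces))             # 1000
```

Under this model the += loop grows quadratically with the number of pieces while the join grows linearly, which is why the gap only becomes noticeable with many strings.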

The list approach doesn't copy the strings at all, but just
holds references to them in a list (which does grow in a
similar but much more efficient manner). The join figures
out the sizes of all of the strings and allocates enough
space to do only a single copy from each.
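
The two-pass idea (measure first, then copy each piece once) can be sketched in pure Python. This is only an illustration of the technique, not how the interpreter actually implements join; `join_sketch` is a hypothetical name, and the sketch is restricted to ASCII text so that one character equals one byte.

```python
def join_sketch(pieces):
    """Illustrative only: concatenate ASCII strings with a single allocation."""
    total = sum(len(p) for p in pieces)  # pass 1: add up all the lengths
    buf = bytearray(total)               # one buffer of the final size
    pos = 0
    for p in pieces:                     # pass 2: copy each piece into place once
        data = p.encode("ascii")
        buf[pos:pos + len(data)] = data
        pos += len(data)
    return buf.decode("ascii")


print(join_sketch(["a", "b", "c"]))  # abc
```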

Again though, other than the += versus .append() case, you should
probably not pick ''.join() over + since readability will
suffer more than your performance will improve.

-Peter
Jul 18 '05 #2
Peter Hansen <pe***@engcorp.com> wrote in news:xvydnWNN7t2X50vdRVn-gw@powergate.ca:
Jonas Galvez wrote:
Is it true that joining the string elements of a list is faster than
concatenating them via the '+' operator?

Note that there's also '%s%s%s' % ('a', 'b', 'c'), which is
probably on par with the join technique for both performance
and lack of readability.


A few more points.

Yes, the format string in this example isn't the clearest, but if you have
a case where some of the strings are fixed and others vary, then the format
string can be the clearest.

e.g.

'<a href="%s" alt="%s">%s</a>' % (uri, alt, text)

rather than:

'<a href="'+uri+'" alt="'+alt+'">'+text+'</a>'

In many situations I find I use a combination of all three techniques.
Build a list of strings to be concatenated to produce the final output, but
each of these strings might be built from a format string or simple
addition as above.
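
A minimal sketch of mixing the three techniques might look like the following. The function name `render_links` and the sample data are invented for illustration, and the sketch assumes the inputs are already safe to embed in HTML (no escaping is done).

```python
def render_links(links):
    """Build an HTML list from (uri, alt, text) tuples."""
    parts = ['<ul>']
    for uri, alt, text in links:
        # Format string for the piece that mixes fixed and varying text.
        anchor = '<a href="%s" alt="%s">%s</a>' % (uri, alt, text)
        # Simple addition for a short wrapper around it.
        parts.append('<li>' + anchor + '</li>')
    parts.append('</ul>')
    # A single join at the end produces the final output.
    return '\n'.join(parts)


print(render_links([('/a', 'A', 'first'), ('/b', 'B', 'second')]))
```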

On the readability of ''.join(), I would suggest never writing it more than
once. That means I tend to do something like:

concatenate = ''.join
...
concatenate(myList)

Or

def concatenate(*args):
    return ''.join(args)
...
concatenate('a', 'b', 'c')

depending on how it is to be used.

It's also worth saying that a lot of the time you don't want the
empty separator at all (e.g. maybe newline is more appropriate), and in
this case the join really does become easier than simple addition, but
again it is worth wrapping it so that your intention at the point of call
is clear.
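
For instance, with a newline separator a named wrapper could look like this (`join_lines` is just an illustrative name, not something from the standard library):

```python
def join_lines(lines):
    """Join the given strings with newlines between them."""
    return '\n'.join(lines)


fields = ['name: spam', 'count: 3', 'price: 4.50']

# With addition the separator has to be repeated by hand...
by_addition = fields[0] + '\n' + fields[1] + '\n' + fields[2]

# ...whereas the wrapper states the intent once and works for any length.
by_join = join_lines(fields)

print(by_addition == by_join)  # True
```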

Finally, a method call on a bare string (''.join, or '\n'.join) looks
sufficiently bad that if, for some reason, you don't want to give it a name
as above, I would suggest using the alternative form for calling it:

str.join('\n', aList)

rather than:

'\n'.join(aList)
Jul 18 '05 #3
Peter Hansen wrote:
Jonas Galvez wrote:
Is it true that joining the string elements of a list is faster than
concatenating them via the '+' operator?

"".join(['a', 'b', 'c'])

vs

'a'+'b'+'c'

If so, can anyone explain why?

It's because the latter one has to build a temporary
string consisting of 'ab' first, then the final string
with 'c' added, while the join can (and probably does) add up
all the lengths of the strings to be joined and build the final
string all in one go.


An idea sprang to mind: often (particularly when generating web pages) one
wants to do lots of += without thinking about "".join.
So what about creating a class that will do this quickly?
The following class does this and is much faster when adding together
lots of strings. I only seem to see performance gains above about 6000
strings...

David

class faststr(str):
    def __init__(self, *args, **kwargs):
        self.appended = []
        str.__init__(self, *args, **kwargs)
    def __add__(self, otherstr):
        self.appended.append(otherstr)
        return self
    def getstr(self):
        return str(self) + "".join(self.appended)

def testadd(start, n):
    for i in range(n):
        start += str(i)
    if hasattr(start, "getstr"):
        return start.getstr()
    else:
        return start

if __name__ == "__main__":
    import sys
    if len(sys.argv) >= 3 and sys.argv[2] == "fast":
        start = faststr("test")
    else:
        start = "test"
    s = testadd(start, int(sys.argv[1]))
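
The same idea can be sketched for current Python without subclassing str, so that + never mutates something that looks like an immutable string. The class name `StringBuilder` and the helper `build` are assumptions made for this sketch, not part of the post above.

```python
class StringBuilder:
    """Collect pieces in a list and join them only when the text is needed."""

    def __init__(self, initial=""):
        self._parts = [initial] if initial else []

    def __iadd__(self, piece):
        self._parts.append(piece)  # cheap list append, no string copying yet
        return self

    def __str__(self):
        return "".join(self._parts)  # single join when the result is requested


def build(n):
    """Append the decimal strings 0..n-1 to 'test' using the builder."""
    out = StringBuilder("test")
    for i in range(n):
        out += str(i)
    return str(out)


print(build(5))  # test01234
```

The standard library's io.StringIO (with write() and getvalue()) covers much the same need if a file-like interface is acceptable.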
Jul 18 '05 #4

Let's try this:

def test_concat():
    s = ''
    for i in xrange( test_len ):
        s += str( i )
    return s

def test_join():
    s = []
    for i in xrange( test_len ):
        s.append( str( i ))
    return ''.join(s)

def test_join2():
    return ''.join( map( str, range( test_len ) ))

Results, with and without psyco :
test_len = 1000
String concatenation (normal) 4.85290050507 ms.
[] append + join (normal) 4.27646517754 ms.
map + join (normal) 2.37970948219 ms.

String concatenation (psyco) 2.0838675499 ms.
[] append + join (psyco) 2.29129695892 ms.
map + join (psyco) 2.21130692959 ms.

test_len = 5000
String concatenation (normal) 40.3251230717 ms.
[] append + join (normal) 23.3911275864 ms.
map + join (normal) 13.844203949 ms.

String concatenation (psyco) 9.65108215809 ms.
[] append + join (psyco) 13.0564379692 ms.
map + join (psyco) 13.342962265 ms.

test_len = 10000
String concatenation (normal) 163.02690506 ms.
[] append + join (normal) 47.6168513298 ms.
map + join (normal) 28.5276055336 ms.

String concatenation (psyco) 19.6494650841 ms.
[] append + join (psyco) 26.637775898 ms.
map + join (psyco) 26.7823898792 ms.

test_len = 20000
String concatenation (normal) 4556.57429695 ms.
[] append + join (normal) 92.0199871063 ms.
map + join (normal) 56.7145824432 ms.

String concatenation (psyco) 42.247030735 ms.
[] append + join (psyco) 58.3201909065 ms.
map + join (psyco) 53.8239884377 ms.

Conclusion:

- join is faster but worth the annoyance only if you join 1000s of strings
- map is useful
- psyco makes join useless if you can use it (depends on which web framework you use)
- Python is really pretty fast even without psyco (it runs at about one MIPS!)

Note :

Did I mention psyco has a special optimization for string concatenation ?
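
For anyone wanting to repeat the measurements on a current interpreter, a rough harness along these lines should do. It is a sketch under a few assumptions: the three test functions are rewritten for Python 3 (range instead of xrange), the size is passed as a parameter rather than read from a global, psyco is left out, and the default size and repeat counts are arbitrary. Absolute numbers will differ from the figures above.

```python
import timeit

TEST_LEN = 1000  # arbitrary size chosen for this sketch


def test_concat(n=TEST_LEN):
    s = ''
    for i in range(n):
        s += str(i)
    return s


def test_join(n=TEST_LEN):
    parts = []
    for i in range(n):
        parts.append(str(i))
    return ''.join(parts)


def test_join2(n=TEST_LEN):
    return ''.join(map(str, range(n)))


if __name__ == "__main__":
    for func in (test_concat, test_join, test_join2):
        # Take the best of 5 runs of 100 calls each, reported per call.
        seconds = min(timeit.repeat(func, number=100, repeat=5)) / 100
        print("%-12s %.3f ms" % (func.__name__, seconds * 1000))
```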


Jul 18 '05 #5
Duncan Booth wrote:

[...]
Finally, a method call on a bare string (''.join, or '\n'.join) looks
sufficiently bad that if, for some reason, you don't want to give it a name
as above, I would suggest using the alternative form for calling it:

str.join('\n', aList)

rather than:

'\n'.join(aList)


This is, of course, pure prejudice. Not that there's anything wrong with
that ...

regards
Steve
Jul 18 '05 #6
