Bytes | Developer Community

which is better, string concatenation or substitution?

My initial feeling is that concatenation might take longer than
substitution, but that it is also easier to read:
def p(self, paragraph):
    self.source += '<p>' + paragraph + '</p>\n\n'

vs.

def p(self, paragraph):
    self.source += '<p>%s</p>\n\n' % paragraph

Is there a preference between these two ways?
May 8 '06 #1
16 replies, 1650 views
John Salerno wrote:
My initial feeling is that concatenation might take longer than
substitution
Doesn't look that way:

leif@ubuntu:~$ python -m timeit "'<p>%s</p>\n\n' % 'foobar'"
1000000 loops, best of 3: 0.6 usec per loop
leif@ubuntu:~$ python -m timeit "'<p>' + 'foobar' + '</p>\n\n'"
1000000 loops, best of 3: 0.358 usec per loop

but that it is also easier to read:


I prefer string formatting for readability, but it's a matter of
personal preference.
May 8 '06 #2
Neither, .join() is the fastest

May 8 '06 #3
fuzzylollipop wrote:
Neither, .join() is the fastest


Please quote what you're replying to.

No, it's the slowest:

leif@ubuntu:~$ python -m timeit "'<p>%s</p>\n\n' % 'foobar'"
1000000 loops, best of 3: 0.607 usec per loop
leif@ubuntu:~$ python -m timeit "'<p>' + 'foobar' + '</p>\n\n'"
1000000 loops, best of 3: 0.38 usec per loop
leif@ubuntu:~$ python -m timeit "''.join(['<p>', 'foobar', '</p>\n\n'])"
1000000 loops, best of 3: 0.817 usec per loop
May 8 '06 #4
Leif K-Brooks wrote:
fuzzylollipop wrote:
Neither, .join() is the fastest


Please quote what you're replying to.

No, it's the slowest:

leif@ubuntu:~$ python -m timeit "'<p>%s</p>\n\n' % 'foobar'"
1000000 loops, best of 3: 0.607 usec per loop
leif@ubuntu:~$ python -m timeit "'<p>' + 'foobar' + '</p>\n\n'"
1000000 loops, best of 3: 0.38 usec per loop
leif@ubuntu:~$ python -m timeit "''.join(['<p>', 'foobar', '</p>\n\n'])"
1000000 loops, best of 3: 0.817 usec per loop


If you are only concatenating a few strings, straight concatenation will be
faster, but when combining many strings, repeated concatenation can be much
slower than join. In the OP's original example:

def p(self, paragraph):
    self.source += '<p>' + paragraph + '</p>\n\n'

it is the concatenation to self.source which could become the bottleneck;
it doesn't really matter how the text of the paragraph is assembled.

For most purposes use what looks clearest at the time: it isn't worth the
hassle of obfuscating your code until you've identified a real CPU hog. On
the other hand, the str.join idiom is sufficiently common in Python that
sometimes it wins on clarity and simplicity as well; e.g. if you build a
list of lines to join, you don't have to repeat '\n' at the end of each
component line.

BTW, be careful using timeit. I nearly got caught out running your tests:

C:\Python25>python -m timeit "''.join(['<p>', 'foobar', '</p>\n\n'])"
1000000 loops, best of 3: 0.872 usec per loop

C:\Python25>python -m timeit "'<p>' + 'foobar' + '</p>\n\n'"
10000000 loops, best of 3: 0.049 usec per loop

C:\Python25>python -m timeit "'<p>%s</p>\n\n' % 'foobar'"
10000000 loops, best of 3: 0.0495 usec per loop

C:\Python25>cd \python24

C:\Python24>python -m timeit "''.join(['<p>', 'foobar', '</p>\n\n'])"
1000000 loops, best of 3: 1.05 usec per loop

C:\Python24>python -m timeit "'<p>' + 'foobar' + '</p>\n\n'"
1000000 loops, best of 3: 0.359 usec per loop

Spot the really fast concatenations in Python 2.5, which now detects the
constant strings and concatenates them once only, at compile time. It does
the same for the string formatting, which leaves poor old join as the only
test actually doing any work.
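One way to see this folding (a sketch for modern CPython; bytecode details vary by version) is to compile the timed expression and inspect its constants with the dis module:

```python
import dis

# Compile the same expression timeit runs. The peephole/AST optimizer
# folds '<p>' + 'foobar' + '</p>\n\n' into a single constant at compile
# time, so the "concatenation" being timed is really just one LOAD_CONST.
code = compile(r"'<p>' + 'foobar' + '</p>\n\n'", '<timeit>', 'eval')

print(code.co_consts)  # the pre-folded string shows up as one constant
dis.dis(code)          # bytecode just loads that constant and returns it
```

Because nothing happens at runtime, such a benchmark measures the optimizer, not concatenation.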
May 8 '06 #5
In article <44*********************@news.astraweb.com>,
John Salerno <jo******@NOSPAMgmail.com> wrote:
My initial feeling is that concatenation might take longer than
substitution, but that it is also easier to read:
def p(self, paragraph):
    self.source += '<p>' + paragraph + '</p>\n\n'

vs.

def p(self, paragraph):
    self.source += '<p>%s</p>\n\n' % paragraph

Is there a preference between these two ways?


One may be marginally faster, but they both require copying the source
string, and are thus both O(n). If you're just doing one or a small fixed
number of these, it really doesn't matter. Pick whichever one you think is
easier to read.

On the other hand, if you're doing a lot of them (i.e. in a loop), the
entire loop will now be O(n^2), which is a killer. If that's the case,
what you want to do is accumulate the individual substrings in a list, then
join the list elements all at once:

parts = []
for paragraph in foo:
    parts.append('<p>')
    parts.append(paragraph)
    parts.append('</p>\n\n')
    # or maybe instead of that ...
    # parts += ['<p>', paragraph, '</p>\n\n']

self.source = "".join(parts)

This only requires a single copy, and thus you're back to being O(n), which
beats the heck out of O(n^2).
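A rough way to see the difference is to time both strategies at increasing sizes (a sketch in modern Python; note that recent CPython versions have an in-place resize optimization for `str +=` that can hide the quadratic cost in simple benchmarks like this):

```python
import timeit

def build_concat(n):
    # repeated += may copy the accumulated string each time: O(n^2) overall
    s = ''
    for i in range(n):
        s += '<p>paragraph</p>\n\n'
    return s

def build_join(n):
    # accumulate the pieces, then copy once at the end: O(n) overall
    parts = []
    for i in range(n):
        parts.append('<p>paragraph</p>\n\n')
    return ''.join(parts)

# both strategies produce the same string
assert build_concat(100) == build_join(100)

for n in (1000, 10000):
    t_concat = timeit.timeit(lambda: build_concat(n), number=10)
    t_join = timeit.timeit(lambda: build_join(n), number=10)
    print(n, t_concat, t_join)
```

If the += optimization doesn't apply (e.g. when other references to the string exist), the concat timings grow much faster than linearly while the join timings stay roughly proportional to n.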
May 8 '06 #6
Roy Smith wrote:
One may be marginally faster, but they both require copying the source
string, and are thus both O(n).
Sorry, I'm not familiar with the O(n) notation.
If you're just doing one or a small fixed
number of these, it really doesn't matter. Pick whichever one you think is
easier to read.


Thanks guys. I have a handful of methods that each do this task once per
call, so I suppose this counts as not a lot, at least not at one time.
And it seems like a good point that the real problem could be constantly
concatenating with self.source, rather than the smaller pieces being put
together.
May 8 '06 #7
Duncan Booth wrote:
If you build a
list of lines to join then you don't have to repeat '\n' on the end of each
component line.


How would that work? Wouldn't the last line in the list still need the
newlines?
May 8 '06 #8
John Salerno wrote:
Duncan Booth wrote:
If you build a
list of lines to join then you don't have to repeat '\n' on the end of
each component line.


How would that work? Wouldn't the last line in the list still need the
newlines?

>>> chunks = ["alpha", "beta", "gamma"]
>>> "\n".join(chunks)
'alpha\nbeta\ngamma'

You mean a '\n' after 'gamma'?

>>> chunks.append("")
>>> "\n".join(chunks)
'alpha\nbeta\ngamma\n'

Peter
May 8 '06 #9
John Salerno <jo******@NOSPAMgmail.com> wrote:
Roy Smith wrote:
One may be marginally faster, but they both require copying the source
string, and are thus both O(n).


Sorry, I'm not familiar with the O(n) notation.


OK, here's a quick tutorial to "big-oh" notation. It's a way of
measuring algorithmic complexity. The O stands for "on the Order
of".

For any algorithm, if you process n things (in the case of the strings
we're talking about, n would be the total number of characters in all
the strings), you can compute the number of steps it takes to complete
all the processing as some function of n.

For example, let's say a given algorithm took 4n^2 + 5n + 17 steps to
complete. It doesn't take much experimentation to prove to yourself
that for all but the very smallest values of n, the constant 17 is
completely insignificant. In fact, n doesn't have to get very big
before the 5n term becomes insignificant too. To a reasonable
approximation, you could say that the algorithm takes 4n^2 steps and
be close enough. For small values of n, this will be a bad
approximation, but it doesn't really matter for small values of n.
For large values of n (think hundreds, thousands or even millions),
it's just fine.

So, the first rule of O() notation is that when looking at how fast an
algorithm runs, you need only consider the highest order term
(i.e. the one with the biggest exponent).

But, it gets even better. Let's compare two algorithms: the one
above, which takes (approximately) 4n^2 steps, and another one which
takes 10n steps, for various values of n:

n 10n 4n^2
--- ------ ------
1 10 4
2 20 16
3 30 36
4 40 64
5 50 100
6 60 144
7 70 196
8 80 256
9 90 324
10 100 400
11 110 484
12 120 576
13 130 676
14 140 784
15 150 900
16 160 1024
17 170 1156
18 180 1296
19 190 1444
20 200 1600

Notice that it doesn't take long for the fact that n^2 grows a lot
faster than n to swamp the fact that 10 is bigger than 4. So the next
step in simplification is to say that not only don't the lower-order
terms matter, but the coefficient on the highest order term doesn't
even matter. For a large enough n, all that matters is the exponent
(for the moment, I'm making a further simplification that all
algorithms can be described by polynomials with integer exponents).

So, we get things like:

O(n^0), which is almost always written as O(1). This is a "constant
time" algorithm, one which takes the same number of steps to execute
no matter how big the input is. For example, in Python, you can
write "x = 'foo'". That assignment statement takes the same amount
of time no matter how long the string is. All of these execute in the
same number of steps:

x = ''
x = 'foo'
x = 'a very long string with lots and lots of characters'

We can say that "assignment is constant time", or "assignment is
O(1)".

The next step up is O(n). This is a "linear time" algorithm. It takes
a number of steps directly proportional to the size of the
input. A good example might be "if 'x' in 'bar'". The obvious way to
implement this (and, I assume, the way it is implemented in Python)
is to loop over the string, comparing 'x' to each character in 'bar',
one by one. This takes as many steps as there are characters in the
string.
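Conceptually, that scan looks something like this (a sketch of the idea, not CPython's actual C implementation):

```python
def contains_char(haystack, needle):
    # one comparison per character, so the work grows linearly: O(n)
    for ch in haystack:
        if ch == needle:
            return True
    return False

print(contains_char('bar', 'x'))  # False, after checking all 3 characters
print('x' in 'bar')               # False, same answer via the builtin
```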

Next comes O(n^2), also known as "quadratic time". This means if your
input is of size n, the algorithm will take n^2 steps to run.
Quadratic algorithms are generally considered very slow, and to be
avoided if at all possible.

Once you get used to thinking like this, it's easy to look at a piece
of code and say, "oh, that's quadratic", or "that's linear", and
instantly know which is faster for some large input. And once
you've started thinking like that, you've made one of the big jumps
from thinking like a coder to thinking like a computer scientist (or,
if you prefer, like a software engineer).

Google "algorithmic complexity" or "big o notation" (perhaps spelled
"big oh") for more info. I would normally recommend Wikipedia, but I
just took a quick look at their "Big O notation" article, and noticed
it's full of formal mathematical gobbledygook which just serves to
obscure what is really a fairly easy concept.
May 8 '06 #10
Ted
Thank you Roy.

It seems if you lurk here long enough you eventually get all your
questions answered without even asking!
;-)

Roy Smith wrote:
OK, here's a quick tutorial to "big-oh" notation.
[rest of the big-O tutorial quoted back in full; snipped]

May 8 '06 #11
Roy Smith wrote:
OK, here's a quick tutorial to "big-oh" notation.


Wow, that was fantastic (and interesting)! Did you just write that? That
should be saved somewhere! Mind if I post it on my website? (don't
worry, no one goes there anyway) :)
May 8 '06 #12
In article <aM******************@news.tufts.edu>,
John Salerno <jo******@NOSPAMgmail.com> wrote:
Roy Smith wrote:
OK, here's a quick tutorial to "big-oh" notation.


Wow, that was fantastic (and interesting)! Did you just write that? That
should be saved somewhere! Mind if I post it on my website? (don't
worry, no one goes there anyway) :)


Sure, feel free. Once you post something on usenet, anybody can
pretty much do anything they want with it. I do prefer if people
don't use what I write to line birdcages with, but have no control
over that either :-)
May 8 '06 #13
Roy Smith wrote:
In article <aM******************@news.tufts.edu>,
John Salerno <jo******@NOSPAMgmail.com> wrote:
Roy Smith wrote:
OK, here's a quick tutorial to "big-oh" notation.

Wow, that was fantastic (and interesting)! Did you just write that? That
should be saved somewhere! Mind if I post it on my website? (don't
worry, no one goes there anyway) :)


Sure, feel free. Once you post something on usenet, anybody can
pretty much do anything they want with it. I do prefer if people
don't use what I write to line birdcages with, but have no control
over that either :-)


Thanks! And don't worry about that last part, no birds here. :)
May 8 '06 #14
ro*@panix.com (Roy Smith) wrote:
O(n^0), which is almost always written as O(1). This is a "constant
time" algorithm, one which takes the same amount of steps to execute
no matter how big the input is. For example, in python, you can
write, "x = 'foo'". That assignment statement takes the same amount
of time no matter how long the string is. All of these execute in the
same number of steps:

x = ''
x = 'foo'
x = 'a very long string with lots and lots of characters'

We can say that "assignment is constant time", or "assignment is
O(1)".


Constant time if converted to byte code or compiled?

O(n) in string length if being interpreted?
--
Regards,
Casey
May 10 '06 #15
In article <en********************************@4ax.com>,
Casey Hawthorne <ca***************@istar.ca> wrote:
ro*@panix.com (Roy Smith) wrote:
O(n^0), which is almost always written as O(1). This is a "constant
time" algorithm, one which takes the same amount of steps to execute
no matter how big the input is. For example, in python, you can
write, "x = 'foo'". That assignment statement takes the same amount
of time no matter how long the string is. All of these execute in the
same number of steps:

x = ''
x = 'foo'
x = 'a very long string with lots and lots of characters'

We can say that "assignment is constant time", or "assignment is
O(1)".


Constant time if converted to byte code or compiled?

O(n) in string length if being interpreted?


Compiled, byte-code, or interpreted has nothing to do with it.

Assignment would be O(n) if it involved copying the string (as it might in
C++ using std::string, for example), but in Python, assignment simply
involves binding a new name to an existing object.
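A quick way to convince yourself (a sketch; `is` compares object identity, not contents):

```python
# Binding never copies: both names end up pointing at the one object,
# so the assignment costs the same no matter how long the string is.
a = 'x' * 1000000   # a million-character string
b = a               # binds the name b to the existing object
print(b is a)       # True: same object, zero characters copied
```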
May 10 '06 #16
Ted wrote:
Thank you Roy.

It seems if you lurk here long enough you eventually get all your
questions answered without even asking!
;-)

+1 QOTW

<OT>
please avoid top-posting, and please avoid posting back a long message
just to add three lines.
</OT>

--
bruno desthuilliers
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
p in 'o****@xiludom.gro'.split('@')])"
May 12 '06 #17

This discussion thread is closed

Replies have been disabled for this discussion.
