473,856 Members | 1,690 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

How naive is Python?

How naive (in the sense that compiler people use the term)
is the current Python system? For example:

def foo() :
s = "This is a test"
return(s)

s2 = foo()

How many times does the string get copied?

Or, for example:

s1 = "Test1"
s2 = "Test2"
s3 = "Test3"
s = s1 + s2 + s3

Any redundant copies performed, or is that case optimized?

How about this?

kcount = 1000
s = ''
for i in range(kcount) :
s += str(i) + ' '

Is this O(N) or O(N^2) because of recopying of "s"?

I just want a sense of what's unusually inefficient in the
current implementation. Thanks.

John Nagle
Jan 15 '07 #1
11 1264

JohnHow naive (in the sense that compiler people use the term) is the
Johncurrent Python system? For example:

John def foo() :
John s = "This is a test"
John return(s)

John s2 = foo()

JohnHow many times does the string get copied?

Never. s and s2 just refer to the same object (strings are immutable).
Executing this:

def foo() :
print id("This is a test")
s = "This is a test"
print id(s)
return(s)

s2 = foo()
print id(s2)

prints the same value three times.

JohnOr, for example:

John s1 = "Test1"
John s2 = "Test2"
John s3 = "Test3"
John s = s1 + s2 + s3

JohnAny redundant copies performed, or is that case optimized?

Not optimized. You can see that using the dis module:

4 0 LOAD_CONST 1 ('Test1')
3 STORE_FAST 0 (s1)

5 6 LOAD_CONST 2 ('Test2')
9 STORE_FAST 1 (s2)

6 12 LOAD_CONST 3 ('Test3')
15 STORE_FAST 2 (s3)

7 18 LOAD_FAST 0 (s1)
21 LOAD_FAST 1 (s2)
24 BINARY_ADD
25 LOAD_FAST 2 (s3)
28 BINARY_ADD
29 STORE_FAST 3 (s)
32 LOAD_CONST 0 (None)
35 RETURN_VALUE

The BINARY_ADD opcode creates a new string.

JohnHow about this?

John kcount = 1000
John s = ''
John for i in range(kcount) :
John s += str(i) + ' '

JohnIs this O(N) or O(N^2) because of recopying of "s"?

O(N). Here's a demonstration of that:

#!/usr/bin/env python

from __future__ import division

def foo(kcount):
s = ''
for i in xrange(kcount) :
s += str(i) + ' '

import time

for i in xrange(5,25):
t = time.time()
foo(2**i)
t = time.time() - t
print 2**i, t, t/2**i

On my laptop t roughly doubles for each iteration and prints around 5e-06
for t/2**i in all cases.

Skip

Jan 15 '07 #2
In article <hu************ ****@newssvr11. news.prodigy.ne t>,
John Nagle <na***@animats. comwrote:
How naive (in the sense that compiler people use the term)
is the current Python system? For example:

def foo() :
s = "This is a test"
return(s)

s2 = foo()

How many times does the string get copied?
All of those just move around pointers to the same (interned) string.
How about this?

kcount = 1000
s = ''
for i in range(kcount) :
s += str(i) + ' '

Is this O(N) or O(N^2) because of recopying of "s"?
This is a well-known (indeed, the canonical) example of quadratic behavior
in Python. The standard solution is to store all the strings (again,
really just pointers to the strings) in a list, then join all the elements:

temp = []
for i in range (1000):
temp.append (str(i))
s = "".join (temp)

That ends up copying each string once (during the join operation), and is
O(N) overall.
Jan 15 '07 #3
On Mon, 15 Jan 2007 03:25:01 +0000, John Nagle wrote:
How naive (in the sense that compiler people use the term)
is the current Python system? For example:

def foo() :
s = "This is a test"
return(s)

s2 = foo()

How many times does the string get copied?

Let's find out. Results below for Python 2.3 -- other versions may vary.
>>def foo():
.... s = "This is a test"
.... return s
....
>>def foo2():
.... return "This is a test"
....
>>import dis
dis.dis(foo )
2 0 LOAD_CONST 1 ('This is a test')
3 STORE_FAST 0 (s)

3 6 LOAD_FAST 0 (s)
9 RETURN_VALUE
10 LOAD_CONST 0 (None)
13 RETURN_VALUE
>>dis.dis(foo 2)
2 0 LOAD_CONST 1 ('This is a test')
3 RETURN_VALUE
4 LOAD_CONST 0 (None)
7 RETURN_VALUE

foo and foo2 functions compile to different byte-code. foo does a little
more work than foo2, but it is likely to be a trivial difference.
>>s1 = foo()
s2 = foo()
s1 == s2, s1 is s2
(True, True)

So the string "This is a test" within foo is not copied each time the
function is called. However, the string "This is a test" is duplicated
between foo and foo2 (the two functions don't share the same string
instance):
>>s3 = foo2()
s3 == s1, s3 is s1
(True, False)
Or, for example:

s1 = "Test1"
s2 = "Test2"
s3 = "Test3"
s = s1 + s2 + s3

Any redundant copies performed, or is that case optimized?
I don't believe it is optimized. I believe that in Python 2.5 simple
numeric optimizations are performed, so that "x = 1 + 3" would compile to
"x = 4", but I don't think that holds for strings. If you are running 2.5,
you could find out with dis.dis.
How about this?

kcount = 1000
s = ''
for i in range(kcount) :
s += str(i) + ' '

Is this O(N) or O(N^2) because of recopying of "s"?
That will be O(N**2), except that CPython version 2.4 (or is it 2.5?) can,
sometimes, optimize it. Note that this is an implementation detail, and
doesn't hold for other Pythons like Jython, IronPython or PyPy.

--
Steven D'Aprano

Jan 15 '07 #4

sk**@pobox.com wrote:
JohnHow naive (in the sense that compiler people use the term) is the
Johncurrent Python system? For example:

John def foo() :
John s = "This is a test"
John return(s)

John s2 = foo()

JohnHow many times does the string get copied?

Never. s and s2 just refer to the same object (strings are immutable).
Executing this:

def foo() :
print id("This is a test")
s = "This is a test"
print id(s)
return(s)

s2 = foo()
print id(s2)

prints the same value three times.

JohnOr, for example:

John s1 = "Test1"
John s2 = "Test2"
John s3 = "Test3"
John s = s1 + s2 + s3

JohnAny redundant copies performed, or is that case optimized?

Not optimized. You can see that using the dis module:

4 0 LOAD_CONST 1 ('Test1')
3 STORE_FAST 0 (s1)

5 6 LOAD_CONST 2 ('Test2')
9 STORE_FAST 1 (s2)

6 12 LOAD_CONST 3 ('Test3')
15 STORE_FAST 2 (s3)

7 18 LOAD_FAST 0 (s1)
21 LOAD_FAST 1 (s2)
24 BINARY_ADD
25 LOAD_FAST 2 (s3)
28 BINARY_ADD
29 STORE_FAST 3 (s)
32 LOAD_CONST 0 (None)
35 RETURN_VALUE

The BINARY_ADD opcode creates a new string.

JohnHow about this?

John kcount = 1000
John s = ''
John for i in range(kcount) :
John s += str(i) + ' '

JohnIs this O(N) or O(N^2) because of recopying of "s"?

O(N). Here's a demonstration of that:

#!/usr/bin/env python

from __future__ import division

def foo(kcount):
s = ''
for i in xrange(kcount) :
s += str(i) + ' '

import time

for i in xrange(5,25):
t = time.time()
foo(2**i)
t = time.time() - t
print 2**i, t, t/2**i

On my laptop t roughly doubles for each iteration and prints around 5e-06
for t/2**i in all cases.
Sorry, Skip, but I find that very hard to believe. The foo() function
would take quadratic time if it were merely adding on pieces of
constant size -- however len(str(i)) is not a constant, it is
O(log10(i)), so the time should be super-quadratic. Following are the
results I got with Python 2.5, Windows XP Pro, 3.2Ghz Intel dual-core
processor with the last line changed to print i, 2**i, t, t/2**i coz I
can't do log2() in my head :-)

[snip]
18 262144 0.889999866486 3.39508005709e-006
19 524288 3.21900010109 6.13975544184e-006
20 1048576 12.2969999313 1.17273330034e-005
21 2097152 54.25 2.58684158325e-005
22 4194304 229.0 5.45978546143e-005
[k/b interrupt]

Cheers,
John

Jan 15 '07 #5
Roy Smith <ro*@panix.comw rote:
All of those just move around pointers to the same (interned) string.
Correct about the pointers, but the string is not interned:

Steven D'Aprano <st***@REMOVEME .cybersource.co m.auwrote:
>>>s1 = foo()
s2 = foo()
s1 == s2, s1 is s2
(True, True)

So the string "This is a test" within foo is not copied each time the
function is called. However, the string "This is a test" is duplicated
between foo and foo2 (the two functions don't share the same string
instance):
>>>s3 = foo2()
s3 == s1, s3 is s1
(True, False)
In this specific example the two functions don't share the same string, but
that won't always be the case: if the string had been interned this would
have printed (True, True).

e.g. Removing all the spaces from the string produces a string which is
interned.
>>def foo():
s = "Thisisates t"
return s
>>def foo2():
return "Thisisates t"
>>s1 = foo()
s2 = foo2()
s1 is s2
True

Bottom line, never make assumptions about when two literal strings will
share the same object and when they won't.
Jan 15 '07 #6

JohnSorry, Skip, but I find that very hard to believe. The foo()
Johnfunction would take quadratic time if it were merely adding on
Johnpieces of constant size -- however len(str(i)) is not a constant,
Johnit is O(log10(i)), so the time should be
Johnsuper-quadratic.

Sorry, I should have pointed out that I'm using the latest version of
Python. I believe += for strings has been optimized to simply extend s when
there is only a single reference to it.

Skip
Jan 15 '07 #7

JohnSorry, Skip, but I find that very hard to believe. The foo()
Johnfunction would take quadratic time if it were merely adding on
Johnpieces of constant size -- however len(str(i)) is not a constant,
Johnit is O(log10(i)), so the time should be super-quadratic.

meSorry, I should have pointed out that I'm using the latest version
meof Python. I believe += for strings has been optimized to simply
meextend s when there is only a single reference to it.

Actually, it isn't until I work my way back to 2.3 that I start to see
quadratic behavior:

% python2.6 strcopy.py
32 0.0002229213714 6 6.96629285812e-06
64 0.0009078979492 19 1.41859054565e-05
128 0.0006499290466 31 5.0775706768e-06
256 0.0011179447174 1 4.36697155237e-06
512 0.0026080608367 9 5.09386882186e-06
1024 0.0043799877166 7 4.27733175457e-06
2048 0.0092160701751 7 4.50003426522e-06
4096 0.0191979408264 4.68699727207e-06
8192 0.0694131851196 8.47328919917e-06
16384 0.0976829528809 5.96209429204e-06
32768 0.194766998291 5.94381708652e-06
% python2.5 strcopy.py
32 0.0004391670227 05 1.37239694595e-05
64 0.0003030300140 38 4.73484396935e-06
128 0.0006318092346 19 4.93600964546e-06
256 0.0011231899261 5 4.38746064901e-06
512 0.0027930736541 7 5.45522198081e-06
1024 0.0044639110565 2 4.35928814113e-06
2048 0.0095310211181 6 4.65381890535e-06
4096 0.0198018550873 4.83443727717e-06
8192 0.175454854965 2.14178289752e-05
16384 0.103327989578 6.30663998891e-06
32768 0.191411972046 5.84142981097e-06
% python2.4 strcopy.py
32 0.0002300739288 33 7.18981027603e-06
64 0.0003070831298 83 4.79817390442e-06
128 0.0011410713195 8 8.91461968422e-06
256 0.0011601448059 1 4.53181564808e-06
512 0.0023131370544 4 4.51784580946e-06
1024 0.0045900344848 6 4.48245555162e-06
2048 0.0097417831420 9 4.75673004985e-06
4096 0.019122838974 4.66866185889e-06
8192 0.0717267990112 8.75571276993e-06
16384 0.112125873566 6.84362021275e-06
32768 0.188065052032 5.73928991798e-06
% python2.3 strcopy.py
32 0.0002238750457 76 6.99609518051e-06
64 0.0003859996795 65 6.03124499321e-06
128 0.0007660388946 53 5.98467886448e-06
256 0.0015430450439 5 6.02751970291e-06
512 0.0030918121337 9 6.03869557381e-06
1024 0.0065929889679 6.43846578896e-06
2048 0.0147500038147 7.20215030015e-06
4096 0.14589214325 3.56181990355e-05
8192 0.778324127197 9.50102694333e-05
16384 3.20213103294 0.0001954425679 29
32768 14.7389230728 0.0004497962363 53
% python2.2 strcopy.py
32 0.0003159046173 1 9.87201929092e-06
64 0.0004940032958 98 7.71880149841e-06
128 0.0009021759033 2 7.04824924469e-06
256 0.0017321109771 7 6.76605850458e-06
512 0.0036261081695 6 7.08224251866e-06
1024 0.0071160793304 4 6.94929622114e-06
2048 0.0158200263977 7.7246222645e-06
4096 0.152237892151 3.71674541384e-05
8192 0.89648604393 0.0001094343315 34
16384 3.14483094215 0.0001919452479 34
32768 13.3367011547 0.0004070038194 19

Skip
Jan 15 '07 #8
sk**@pobox.com wrote:
JohnSorry, Skip, but I find that very hard to believe. The foo()
Johnfunction would take quadratic time if it were merely adding on
Johnpieces of constant size -- however len(str(i)) is not a constant,
Johnit is O(log10(i)), so the time should be
Johnsuper-quadratic.

Sorry, I should have pointed out that I'm using the latest version of
Python. I believe += for strings has been optimized to simply extend s when
there is only a single reference to it.
I'd heard that, but wasn't sure. That's the kind of answer I was
looking for.

That kind of optimization is a huge win.

John Nagle
Jan 15 '07 #9

sk**@pobox.com wrote:
JohnSorry, Skip, but I find that very hard to believe. The foo()
Johnfunction would take quadratic time if it were merely adding on
Johnpieces of constant size -- however len(str(i)) is not a constant,
Johnit is O(log10(i)), so the time should be super-quadratic.

meSorry, I should have pointed out that I'm using the latest version
meof Python. I believe += for strings has been optimized to simply
meextend s when there is only a single reference to it.
Sorry, I should have pointed out that you need to read up about
time.time() -- what is its resolution on your box? -- and time.clock()
and the timeit module.
>
Actually, it isn't until I work my way back to 2.3 that I start to see
quadratic behavior:

% python2.6 strcopy.py
32 0.0002229213714 6 6.96629285812e-06
64 0.0009078979492 19 1.41859054565e-05
128 0.0006499290466 31 5.0775706768e-06
256 0.0011179447174 1 4.36697155237e-06
512 0.0026080608367 9 5.09386882186e-06
1024 0.0043799877166 7 4.27733175457e-06
2048 0.0092160701751 7 4.50003426522e-06
4096 0.0191979408264 4.68699727207e-06
8192 0.0694131851196 8.47328919917e-06
16384 0.0976829528809 5.96209429204e-06
32768 0.194766998291 5.94381708652e-06
% python2.5 strcopy.py
32 0.0004391670227 05 1.37239694595e-05
64 0.0003030300140 38 4.73484396935e-06
128 0.0006318092346 19 4.93600964546e-06
256 0.0011231899261 5 4.38746064901e-06
512 0.0027930736541 7 5.45522198081e-06
1024 0.0044639110565 2 4.35928814113e-06
2048 0.0095310211181 6 4.65381890535e-06
4096 0.0198018550873 4.83443727717e-06
8192 0.175454854965 2.14178289752e-05
16384 0.103327989578 6.30663998891e-06
32768 0.191411972046 5.84142981097e-06
[snip]

Your ability to "see" quadratic behavoiur appears to be impaired by (1)
the low resolution and/or erratic nature of time.time() on your system
(2) stopping the experiment just before the results become interesting.

Try this with the latest *production* release (2.5):

C:\junk>cat fookount.py
def foo(kcount):
s = ''
for i in xrange(kcount) :
s += str(i) + ' '

C:\junk>for /L %n in (10,1,19) do \python25\pytho n -mtimeit -s"from
fookount import foo" "foo(2**%n) "

C:\junk>\python 25\python -mtimeit -s"from fookount import foo"
"foo(2**10) "
100 loops, best of 3: 1.1 msec per loop

C:\junk>\python 25\python -mtimeit -s"from fookount import foo"
"foo(2**11) "
100 loops, best of 3: 2.34 msec per loop

C:\junk>\python 25\python -mtimeit -s"from fookount import foo"
"foo(2**12) "
100 loops, best of 3: 4.79 msec per loop

C:\junk>\python 25\python -mtimeit -s"from fookount import foo"
"foo(2**13) "
10 loops, best of 3: 9.57 msec per loop

C:\junk>\python 25\python -mtimeit -s"from fookount import foo"
"foo(2**14) "
10 loops, best of 3: 19.8 msec per loop

C:\junk>\python 25\python -mtimeit -s"from fookount import foo"
"foo(2**15) "
10 loops, best of 3: 40.7 msec per loop

C:\junk>\python 25\python -mtimeit -s"from fookount import foo"
"foo(2**16) "
10 loops, best of 3: 82.1 msec per loop

C:\junk>\python 25\python -mtimeit -s"from fookount import foo"
"foo(2**17) "
10 loops, best of 3: 242 msec per loop

C:\junk>\python 25\python -mtimeit -s"from fookount import foo"
"foo(2**18) "
10 loops, best of 3: 886 msec per loop

C:\junk>\python 25\python -mtimeit -s"from fookount import foo"
"foo(2**19) "
10 loops, best of 3: 3.21 sec per loop

Cheers,
John

Jan 15 '07 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

2
1158
by: Gabriel B. | last post by:
Is it just me that can't find a full reference in the docs? I wanted a list of all the methods of dict for example... where can i find it? Thanks, and sorry if this question is just dumb, i really can't find it
5
1465
by: Waqas | last post by:
HI guys, sorry for being naive, but i have wasted alot of time searching for a undergrad. project but cannot come up with anything. can someone generously suggect me something so that i can move forward. I am, I think, in a lot of mess. :P I will be waiting for a response. :) takecare.
1
2787
by: Adam Monsen | last post by:
Say I have the following datetime object: >>> from datetime import datetime >>> d = datetime(2005, 8, 10, 15, 43) I happen to know this is a local time from the Netherlands, so I assume the tzinfo (if it were present) should indicate Central European Summer Time ("Summer" indicates daylight savings). How do I convert this date/time information to UTC?
9
1573
by: JohnSouth104 | last post by:
Hi I've an asp.net, c#, sql server website and the volumes are starting to rise quickly. I've been asked to look at improving the memory usage of the application. I've not looked at this before so these are probably naive questions: 1. If I close the sql connection every time do I need to close or dispose of the dataset, command or any other objects? 2. Should I use data adaptor instaed of dataset?
7
1521
by: sillyhat | last post by:
Hello, Can someone please help. I have come across code similar to this which I prepared as an example:- /*-------------8<------------------*/ #define ASIZE 10 int main() {
7
1426
by: Rory Campbell-Lange | last post by:
I have a number of web applications which have a large amount of their logic written in plpgsql. For each particular application I have a separate instance of a database. If I need to update functions I have to connect to each database and then run \i fn_function_update.sql. I imagined schemas might allow me to globally update functions across a database hosting many schemas with the same structure. In this scenario my webapp would...
2
1461
by: Rob Clark | last post by:
Hello all, I have a design problem for which my current C++ solution feels more like a hack. :-( I'd like to change that :-) The real problem can be destilled to something like this: One client wants to send data using one specific top layer protocol. At this top layer, there are a number of different protocols, where each one inherits the main Protocol class. And every protocol subclass in turn contains several different
3
1672
by: johan2sson | last post by:
The documentation for PyThreadState_SetAsyncExc says "To prevent naive misuse, you must write your own C extension to call this". Anyone care to list a few examples of such naive misuse? Johan
3
4895
by: pragy | last post by:
Hey, can any one help me for writing a program of naive gauss elimintaion technique? It's a technique to solve system of simultaneous linear equations using matrix. thanks
0
9916
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
9762
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
10696
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
10782
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
10384
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
9531
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
0
7093
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
5958
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
2
4174
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.