Performance on local constants?

Hi all,

I'm pretty new to Python (a little over a month). I was wondering -- is
something like this:

s = re.compile('whatever')

def t(whatnot):
    return s.search(whatnot)

for i in xrange(1000):
    print t(something[i])

significantly faster than something like this:

def t(whatnot):
    s = re.compile('whatever')
    return s.search(whatnot)

for i in xrange(1000):
    result = t(something[i])

? Or is Python clever enough to see that the value of s will be the same
on every call, and thus only compile it once?

--
09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0 -- pass it on
Dec 22 '07 #1
On Dec 22, 10:53 am, William McBrine <wmcbr...@users.sf.net> wrote:
> [...]
> Or is Python clever enough to see that the value of s will be the same
> on every call, and thus only compile it once?

Python RE's do have a cache but telling it to compile multiple times
is going to take time.

Best to do as the docs say and compile your RE's once before use if
you can.

The timeit module: http://www.diveintopython.org/perfor...ng/timeit.html
will allow you to do your own timings.
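
A minimal sketch of such a timing, using the timeit module's Python API rather than its command line. It assumes Python 3 (the snippets in this thread are Python 2); SAMPLE, t_module_level, t_local and time_it are illustrative names, not from the original post, and the absolute numbers will vary by machine and Python version.

```python
import re
import timeit

SAMPLE = 'some long string containing whatever'  # hypothetical test input

s = re.compile('whatever')  # compiled once, at module level

def t_module_level(whatnot):
    return s.search(whatnot)

def t_local(whatnot):
    # re.compile() is called on every invocation; the re module's
    # internal cache usually turns this into a lookup, not a recompile.
    s_local = re.compile('whatever')
    return s_local.search(whatnot)

def time_it(func, number=100000):
    # Best of three runs, in seconds per `number` calls.
    return min(timeit.repeat(lambda: func(SAMPLE), number=number, repeat=3))

if __name__ == '__main__':
    print('module-level compile:', time_it(t_module_level))
    print('compile inside t():  ', time_it(t_local))
```

The lambda adds the same small overhead to both measurements, so the difference between the two figures is what matters.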

- Paddy.
Dec 22 '07 #2
On Dec 22, 9:53 pm, William McBrine <wmcbr...@users.sf.net> wrote:
> Hi all,
>
> I'm pretty new to Python (a little over a month). I was wondering -- is
> something like this:
>
> s = re.compile('whatever')
>
> def t(whatnot):
>     return s.search(whatnot)
>
> for i in xrange(1000):
>     print t(something[i])
>
> significantly faster than something like this:
>
> def t(whatnot):
>     s = re.compile('whatever')
>     return s.search(whatnot)
>
> for i in xrange(1000):
>     result = t(something[i])
>
> ?

No.

> Or is Python clever enough to see that the value of s will be the same
> on every call,

No. It doesn't have a crystal ball.

> and thus only compile it once?

But it is smart enough to maintain a cache, which achieves the desired
result.

Why don't you do some timings?

While you're at it, try this:

def t2(whatnot):
    return re.search('whatever', whatnot)

and this:

t3 = re.compile('whatever').search
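
For reference, both variants return the same match as the original t; they differ only in where the pattern lookup happens. A small check, assuming Python 3 and a made-up input string:

```python
import re

def t2(whatnot):
    # No explicit compile: re.search() looks the pattern up in the
    # re module's cache on each call.
    return re.search('whatever', whatnot)

# Bound method of a pattern compiled once; calling t3(text) involves
# no extra Python-level function call.
t3 = re.compile('whatever').search

text = 'say whatever you like'  # hypothetical input
print(t2(text).span() == t3(text).span())
```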

HTH,
John
Dec 22 '07 #3
William McBrine <wm******@users.sf.net> wrote:
> [...]
> Or is Python clever enough to see that the value of s will be the same
> on every call, and thus only compile it once?

The best way to answer these questions is always to try it out for
yourself. Have a look at 'timeit.py' in the library: you can run
it as a script to time simple things or import it from longer scripts.

C:\Python25>python lib/timeit.py -s "import re;s=re.compile('whatnot')" "s.search('some long string containing a whatnot')"
1000000 loops, best of 3: 1.05 usec per loop

C:\Python25>python lib/timeit.py -s "import re" "re.compile('whatnot').search('some long string containing a whatnot')"
100000 loops, best of 3: 3.76 usec per loop

C:\Python25>python lib/timeit.py -s "import re" "re.search('whatnot', 'some long string containing a whatnot')"
100000 loops, best of 3: 3.98 usec per loop

So it looks like it takes a couple of microseconds overhead if you
don't pre-compile the regular expression. That could be significant
if you have simple matches as above, or irrelevant if the match is
complex and slow.

You can also try measuring the compile time separately:

C:\Python25>python lib/timeit.py -s "import re" "re.compile('whatnot')"
100000 loops, best of 3: 2.36 usec per loop

C:\Python25>python lib/timeit.py -s "import re" "re.compile('<(?:p|div)[^>]*>(?P<pat0>(?:(?P<atag0>\\<a[^>]*\\>)\\<img[^>]+class\\s*=[^=>]*captioned[^>]+\\>\\</a\\>)|\\<img[^>]+class\\s*=[^=>]*captioned[^>]+\\>)</(?:p|div)>|(?P<pat1>(?:(?P<atag1>\\<a[^>]*\\>)\\<img[^>]+class\\s*=[^=>]*captioned[^>]+\\>\\</a\\>)|\\<img[^>]+class\\s*=[^=>]*captioned[^>]+\\>)')"
100000 loops, best of 3: 2.34 usec per loop

It makes no difference whether you use a trivial regular expression
or a complex one: Python remembers (if I remember correctly) the last
100 expressions it compiled, so the compilation overhead will be pretty
constant.
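
The cache can be observed directly. The sketch below assumes Python 3 and relies on a CPython implementation detail: compiling an identical pattern string with the same flags returns the same cached object. re.purge() clears the cache. The cache size is also an implementation detail and has changed between Python versions, so the figure of 100 should not be relied on.

```python
import re

PATTERN = 'whatnot'

first = re.compile(PATTERN)
second = re.compile(PATTERN)   # served from the re module's cache
print(first is second)         # same object on CPython

re.purge()                     # empty the regular expression cache
third = re.compile(PATTERN)    # compiled afresh after the purge
print(third is first)          # a different pattern object
```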
Dec 22 '07 #4
On Sat, 22 Dec 2007 10:53:39 +0000, William McBrine wrote:
> [...]
> Or is Python clever enough to see that the value of s will be the same
> on every call, and thus only compile it once?

Let's find out:

>>> import re
>>> import dis
>>>
>>> def spam(x):
...     s = re.compile('nobody expects the Spanish Inquisition!')
...     return s.search(x)
...
>>> dis.dis(spam)
  2           0 LOAD_GLOBAL              0 (re)
              3 LOAD_ATTR                1 (compile)
              6 LOAD_CONST               1 ('nobody expects the Spanish Inquisition!')
              9 CALL_FUNCTION            1
             12 STORE_FAST               1 (s)

  3          15 LOAD_FAST                1 (s)
             18 LOAD_ATTR                2 (search)
             21 LOAD_FAST                0 (x)
             24 CALL_FUNCTION            1
             27 RETURN_VALUE

No, the Python compiler doesn't know anything about regular expression
objects, so it compiles a call to the RE engine which is executed every
time the function is called.

However, the re module keeps its own cache, so in fact the regular
expression itself may only get compiled once regardless.

Here's another approach that avoids the use of a global variable for the
regular expression:

>>> def spam2(x, s=re.compile('nobody expects the Spanish Inquisition!')):
...     return s.search(x)
...
>>> dis.dis(spam2)
  2           0 LOAD_FAST                1 (s)
              3 LOAD_ATTR                0 (search)
              6 LOAD_FAST                0 (x)
              9 CALL_FUNCTION            1
             12 RETURN_VALUE

What happens now is that the regex is compiled by the RE engine once, when
the def statement is executed, then stored as the default value for the
argument s. If you don't supply another value for s when you call the
function, the default regex is used. If you do, the overriding value is
used instead:

>>> spam2("nothing")
>>> spam2("nothing", re.compile('thing'))
<_sre.SRE_Match object at 0xb7c29c28>

I suspect that this will be not only the fastest solution, but also the
most flexible.
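
A small sketch of the difference, assuming Python 3. counting_compile is a hypothetical wrapper, introduced here only to count how often the compile call is executed; it is not part of the re module.

```python
import re

compile_calls = 0

def counting_compile(pattern):
    # Hypothetical helper: counts calls, then defers to re.compile().
    global compile_calls
    compile_calls += 1
    return re.compile(pattern)

def spam(x):
    # The compile call is executed on every call of spam().
    s = counting_compile('Spanish')
    return s.search(x)

def spam2(x, s=counting_compile('Spanish')):
    # The default value was evaluated once, when this def statement ran.
    return s.search(x)

text = 'nobody expects the Spanish Inquisition!'

for _ in range(3):
    spam2(text)
calls_after_spam2 = compile_calls   # only the call made at definition time

for _ in range(3):
    spam(text)
calls_after_spam = compile_calls    # three more, one per call of spam()

print(calls_after_spam2, calls_after_spam)
```

Note that the counter measures calls to the compile function, not actual recompilations; the re cache still short-circuits most of the work inside spam().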

--
Steven
Dec 22 '07 #5

"Steven D'Aprano" <st***@REMOVE-THIS-cybersource.com.au> wrote in message
news:13*************@corp.supernews.com...
| >>> def spam2(x, s=re.compile('nobody expects the Spanish Inquisition!')):
| ...     return s.search(x)
|
| I suspect that this will be not only the fastest solution, but also the
| most flexible.

'Most flexible' in a different way is

def searcher(rex):
    crex = re.compile(rex)
    def _(txt):
        return crex.search(txt)
    return _

One can then create and keep around multiple searchers based on different
patterns, to be used as needed.
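
For illustration, a sketch of keeping several searchers around, assuming Python 3. The two patterns and the sample lines are made up for the example.

```python
import re

def searcher(rex):
    crex = re.compile(rex)      # compiled once per searcher
    def _(txt):
        return crex.search(txt)
    return _

# Hypothetical patterns, each compiled a single time.
find_number = searcher(r'\d+')
find_word = searcher(r'[A-Za-z]+')

for line in ['order 66', 'route 101 north']:
    print(find_number(line).group(), find_word(line).group())
```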

tjr

Dec 22 '07 #6
On Dec 23, 5:38 am, "Terry Reedy" <tjre...@udel.edu> wrote:
> 'Most flexible' in a different way is
>
> def searcher(rex):
>     crex = re.compile(rex)
>     def _(txt):
>         return crex.search(txt)
>     return _

I see your obfuscatory ante and raise you several dots and
underscores:

class Searcher(object):
    def __init__(self, rex):
        self.crex = re.compile(rex)
    def __call__(self, txt):
        return self.crex.search(txt)

Cheers,
John

Dec 22 '07 #7

"John Machin" <sj******@lexicon.net> wrote in message
news:ab**********************************@e25g2000prg.googlegroups.com...
| On Dec 23, 5:38 am, "Terry Reedy" <tjre...@udel.edu> wrote:
| > 'Most flexible' in a different way is
| >
| > def searcher(rex):
| >     crex = re.compile(rex)
| >     def _(txt):
| >         return crex.search(txt)
| >     return _
| >
|
| I see your obfuscatory ante and raise you several dots and
| underscores:

I will presume you are merely joking, but for the benefit of any beginning
programmers reading this, the closure above is a standard functional idiom
for partial evaluation of a function (in this case, re.search(crex, txt)),

| class Searcher(object):
|     def __init__(self, rex):
|         self.crex = re.compile(rex)
|     def __call__(self, txt):
|         return self.crex.search(txt)

while this is the equivalent OO version. Intermediate Python programmers
should know both.
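
The same partial evaluation can also be spelled with functools.partial from the standard library. This is an alternative spelling, not something proposed in the thread; it relies on re.search accepting an already compiled pattern as its first argument, and assumes Python 3.

```python
import functools
import re

crex = re.compile('foo')

# Fix the first argument of re.search(pattern, string), leaving a
# one-argument callable, like the closure and the class above.
foo_searcher = functools.partial(re.search, crex)

print(foo_searcher('some food').group())
```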

tjr

Dec 23 '07 #8
On Dec 23, 2:39 pm, "Terry Reedy" <tjre...@udel.edu> wrote:
> I will presume you are merely joking, but for the benefit of any beginning
> programmers reading this, the closure above is a standard functional idiom
> for partial evaluation of a function (in this case, re.search(crex, txt))
> [...]

Semi-joking; I thought that your offering of this:

def searcher(rex):
    crex = re.compile(rex)
    def _(txt):
        return crex.search(txt)
    return _
foo_searcher = searcher('foo')

was somewhat over-complicated, and possibly slower than already-
mentioned alternatives. The standard idiom etc etc it may be, but the
OP was interested in getting overhead out of his re searching loop.
Let's trim it a bit.

step 1:
def searcher(rex):
    crexs = re.compile(rex).search
    def _(txt):
        return crexs(txt)
    return _
foo_searcher = searcher('foo')

step 2:
def searcher(rex):
    return re.compile(rex).search
foo_searcher = searcher('foo')

step 3:
foo_searcher = re.compile('foo').search
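
To check the "possibly slower" suspicion, the closure and the bound method can be timed side by side. A sketch assuming Python 3; closure_searcher, best_time and the sample text are illustrative names, and the figures depend on machine and interpreter.

```python
import re
import timeit

def closure_searcher(rex):
    crex = re.compile(rex)
    def _(txt):
        return crex.search(txt)   # extra Python-level call per search
    return _

via_closure = closure_searcher('foo')
via_bound_method = re.compile('foo').search   # step 3

TEXT = 'a string with foo in it'  # hypothetical input

def best_time(func, number=100000):
    # Best of three runs; the lambda costs the same for both candidates.
    return min(timeit.repeat(lambda: func(TEXT), number=number, repeat=3))

if __name__ == '__main__':
    print('closure:     ', best_time(via_closure))
    print('bound method:', best_time(via_bound_method))
```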

HTH,
John
Dec 23 '07 #9
Thanks for all the answers on this. (And sorry for the lousy Subject line; I
couldn't think of a better one.)

--
09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0 -- pass it on
Dec 26 '07 #10
On Sun, 23 Dec 2007 03:55:07 -0300, John Machin <sj******@lexicon.net> wrote:
> Let's trim it a bit.
> [...]
> step 3:
> foo_searcher = re.compile('foo').search

Nice derivation! Like the word-stairs game: love -> rove -> rave -> have
-> hate

--
Gabriel Genellina

Dec 27 '07 #11
I get class Searcher(object), but I can't for the life of me see why
(except to be intentionally obtuse) one would use the def
searcher(rex) pattern, which I assume you would call with
searcher(r)(t), right?

- mdf


Dec 27 '07 #12
On Dec 28, 7:53 am, "Matthew Franz" <mdfr...@gmail.com> wrote:
> I get class Searcher(object) but can't for the life of me see why
> (except to be intentionally obtuse) one would use the def
> searcher(rex) pattern which I assume you would call with
> searcher(r)(t) right?

The whole point of the thread was performance across multiple searches
for the one pattern. Thus one would NOT do
    searcher(r)(t)
each time a search was required; one would do
    s = searcher(r)
ONCE, and then do
    s(t)
each time ...
Dec 27 '07 #13
Thanks, that makes more sense. I got tripped up by the function
returning a function thing and (for a while) thought _ was some sort
of spooky special variable.

- mdf

--
Matthew Franz
http://www.threatmind.net/
Dec 27 '07 #14
