I was testing this piece of code and it takes about 24-30 seconds to do
a look up on a list(m) of size 1000x1000
m -> list of size 1000x1000
import time
print time.ctime()
box = {}
for r,row in enumerate(m):
for c,e in enumerate(row):
if box.has_key(e):
params = box[e]
box[e] = ( min(c, params[0]), min(r, params[1]),
max(c, params[2]), max(r, params[3] ) )
else:
box[e] = [c, r, c, r]
print time.ctime()
Can some one tell me what exactly is taking more time here. Is it
because I am using dictionaries or what is the problem. Can some one
help me improve this .Is there a better way to write this. 9 1458
PyPK wrote: I was testing this piece of code and it takes about 24-30 seconds to do a look up on a list(m) of size 1000x1000 m -> list of size 1000x1000 import time print time.ctime() box = {} for r,row in enumerate(m): for c,e in enumerate(row): if box.has_key(e): params = box[e] box[e] = ( min(c, params[0]), min(r, params[1]), max(c, params[2]), max(r, params[3] ) ) else: box[e] = [c, r, c, r] print time.ctime()
Can some one tell me what exactly is taking more time here. Is it because I am using dictionaries or what is the problem. Can some one help me improve this .Is there a better way to write this.
Without gross changes to the algorithm, and in no particular order:
0. Stop worrying. Find something else to do during the 30 seconds.
1. Use psyco, if your [unspecified] platform supports it.
2. Why has_key()? Try "if e in box:" instead, if your [unspecified]
version of Python supports it. If it doesn't, (a) consider upgrading (b)
try doing box_has_key = box.has_key outside the loops and using the
result inside the loops.
3. Ensure this code is inside a function, not at global level in a
script [not apparent from your post].
4. Outside the loops, put "local_min = min", ditto for max. Then use
box[e] = (local_min(c, etc etc
5. Upgrade your [unspecified] hardware.
6. Use try/except, if indicated by the [unspecified] percentage of times
"e in box" is true. Hint: printing len(box) at the end might be useful.
7. Consider using pyrex.
8. Consider using numarray/mumeric.
Info req'd for discussing algorithm changes:
1. How much free memory do you have?
2. What can you tell us about the distribution of "e"?
3. Is "m" a rectangle, or are the rows of variable length?
4. Do you have control over "m" i.e. are you stuck with that data
structure, or can we design a different one?
>Info req'd for discussing algorithm changes: 1. How much free memory do you have?
- Memory is not a problem but I am working on some timing
constraints.Thats is the reason I am a bit concerned abt these 30
seconds
2. What can you tell us about the distribution of "e"?
- the distribution of e is not fixed.It changes based on what comes
first in the scan of the list.
3. Is "m" a rectangle, or are the rows of variable length?
- m is not a restnagle .its of varied dimensions.
4. Do you have control over "m" i.e. are you stuck with that data
structure, or can we design a different one?
-I can use 'm' as an array or a list. Other than that I am stuck.
I am suspecious abput dictionary approach here. Is this slower than if
i do it with arrays or lists with index of array as my key ?
PyPK wrote: I am suspecious abput dictionary approach here. Is this slower than if i do it with arrays or lists with index of array as my key ?
"with index of array as my key" is either redundant -- indicating that
your answer to my question about "e" should have included something like
"e is an integer that can range from blah0 to blah_max" -- or not
understandable.
In any case, seeing that the code in question is only a few lines, why
don't *you* write an alternative version and see how fast it runs? And
let us know?
PyPK wrote: I was testing this piece of code and it takes about 24-30 seconds to do a look up on a list(m) of size 1000x1000 m -> list of size 1000x1000 import time print time.ctime() box = {} for r,row in enumerate(m): for c,e in enumerate(row): if box.has_key(e): params = box[e] box[e] = ( min(c, params[0]), min(r, params[1]), max(c, params[2]), max(r, params[3] ) ) else: box[e] = [c, r, c, r] print time.ctime()
Can some one tell me what exactly is taking more time here. Is it because I am using dictionaries or what is the problem. Can some one help me improve this .Is there a better way to write this.
AFAICT the row index 'r' can only grow. Therefore one min() and one max()
should be superfluous (untested):
def make_box(m):
box = {}
for r, row in enumerate(m):
for c, e in enumerate(row):
if e in box:
left, top, right, bottom = box[e]
box[e] = (min(c, left), top, max(c, right), r)
else:
box[e] = (c, r, c, r)
return box
Peter
Yep that improved the speed by about 50% now it takes about 10 secs
instead of 24 seconds..Thanks much. I guess that is the best we could
do right.It would be really helpful if I could get it less than 5
seconds. Any suggestions on that??
PyPK wrote: Yep that improved the speed by about 50% now it takes about 10 secs instead of 24 seconds..Thanks much. I guess that is the best we could do right.It would be really helpful if I could get it less than 5 seconds. Any suggestions on that??
Things to try:
* in-lining the min and max expressions
* depending on the distribution of e, it may be faster to catch KeyErrors
def search1(m):
box = {}
for r,row in enumerate(m):
for c,e in enumerate(row):
try:
minc, minr, maxc, maxr = box[e]
box[e] = ( c < minc and c or minc,
r < minr and r or minr,
c > maxc and c or maxc,
r > maxr and r or maxr)
except KeyError:
box[e] = (c, r, c, r)
return box
Michael
Michael Spencer wrote: def search1(m): box = {} for r,row in enumerate(m): for c,e in enumerate(row): try: minc, minr, maxc, maxr = box[e] box[e] = ( c < minc and c or minc, r < minr and r or minr, c > maxc and c or maxc, r > maxr and r or maxr) except KeyError: box[e] = (c, r, c, r) return box
Michael
Sorry, my earlier solution was incorrect since:
c < minc and c or minc # evaluates to minc if c == 0
Two of the tests are unnecessary, as pointed out by Peter
The remaining test:
c > maxc and c or maxc
does not suffer from the same problem, since c cannot both be 0 and greater than
maxc
The following version, still untested ;-) has the correction:
def search2(m):
box = {}
for r,row in enumerate(m):
for c,e in enumerate(row):
try: # use this form if e will be found 90%+ of the time
# otherwise, use: if e in box:
minc, minr, maxc, maxr = box[e]
# note correction for c == 0
# also Peter's simplification
box[e] = ( c and (c < minc and c or minc),
minr,
c > maxc and c or maxc,
r)
except KeyError:
box[e] = (c, r, c, r)
return box
Michael
Michael Spencer wrote: minc, minr, maxc, maxr = box[e] # note correction for c == 0 # also Peter's simplification box[e] = ( c and (c < minc and c or minc), minr, c > maxc and c or maxc, r)
This may be slightly faster (when c > 0), and slightly less opaque:
minc, minr, maxc, maxr = box[e]
# note correction for c == 0
# also Peter's simplification
if c < minc:
minc = c
box[e] = ( minc,
minr,
c > maxc and c or maxc,
r)
Michael Spencer wrote: minc, minr, maxc, maxr = box[e] # note correction for c == 0 # also Peter's simplification box[e] = ( c and (c < minc and c or minc), minr, c > maxc and c or maxc, r)
This may be slightly faster (when c > 0), and slightly less opaque:
minc, minr, maxc, maxr = box[e]
# note correction for c == 0
# also Peter's simplification
if c < minc:
minc = c
box[e] = ( minc,
minr,
c > maxc and c or maxc,
r) This thread has been closed and replies have been disabled. Please start a new discussion. Similar topics
by: Bob |
last post by:
Are there any known applications out there used to test the performance of the .NET garbage collector over a long period of time? Basically I need an application that creates objects, uses them, and...
|
by: bluedolphin |
last post by:
Hello All:
I have been brought onboard to help on a project that had some
performance problems last year. I have taken some steps to address
the issues in question, but a huge question mark...
|
by: Mark Shelor |
last post by:
I've encountered a troublesome inconsistency in the C-language Perl
extension I've written for CPAN (Digest::SHA). The problem involves the
use of a static array within a performance-critical...
|
by: Martin |
last post by:
I am using graphics as backgrounds for forms,buttons,labels etc.
The question is: is it faster to load all graphics from files on app start
or
to use it embeded (places in editor during design)....
|
by: Curtis Justus |
last post by:
Hi,
I've been searching for solutions to two issues that are undoubtedly common
to everybody. The first is how do my team and I adequately perform unit
testing. The second is how can I measure...
|
by: Lakshmi |
last post by:
Hi All,
I am having performance issues with the .NET client calling the Java
Webservice running on axis. Have detailed the problem below. Please
help.
I wrote a webservice in Java. Lets name...
|
by: Brian Tabios |
last post by:
Hello Everyone,
I have a very complex performance issue with our production database.
Here's the scenario. We have a production webserver server and a
development web server. Both are running...
|
by: marcfischman |
last post by:
Please help. I have a website running on a linux/apache/mysql/php
server. I receive about 8,000-10,000 visitors a day with about 200,000
to 300,000 page views. The server is a RedHat Linux...
|
by: Ryan |
last post by:
My apologies if this is not the forum to post questions regarding .NET
Remoting, but I figured WebServices would be the most appropriate forum of
the bunch.
We're currently completely re-arching...
|
by: =?Utf-8?B?bXVkR2ls?= |
last post by:
Hi,
I have the Enterprise Architect version installed on one machine, and the
developper edition on another (VS.Net 2005).
To my surprise, the performance tools are only available (in the menu...
|
by: ryjfgjl |
last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
|
by: nemocccc |
last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
|
by: Sonnysonu |
last post by:
This is the data of csv file
1 2 3
1 2 3
1 2 3
1 2 3
2 3
2 3
3
the lengths should be different i have to store the data by column-wise with in the specific length.
suppose the i have to...
|
by: Hystou |
last post by:
There are some requirements for setting up RAID:
1. The motherboard and BIOS support RAID configuration.
2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
|
by: Hystou |
last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
|
by: Oralloy |
last post by:
Hello folks,
I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>".
The problem is that using the GNU compilers,...
|
by: jinu1996 |
last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
|
by: tracyyun |
last post by:
Dear forum friends,
With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
|
by: agi2029 |
last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...
| |