I was testing this piece of code and it takes about 24-30 seconds to do
a look up on a list(m) of size 1000x1000
m -> list of size 1000x1000
import time
print time.ctime()
box = {}
for r,row in enumerate(m):
for c,e in enumerate(row):
if box.has_key(e):
params = box[e]
box[e] = ( min(c, params[0]), min(r, params[1]),
max(c, params[2]), max(r, params[3] ) )
else:
box[e] = [c, r, c, r]
print time.ctime()
Can some one tell me what exactly is taking more time here. Is it
because I am using dictionaries or what is the problem. Can some one
help me improve this .Is there a better way to write this. 9 1425
PyPK wrote: I was testing this piece of code and it takes about 24-30 seconds to do a look up on a list(m) of size 1000x1000 m -> list of size 1000x1000 import time print time.ctime() box = {} for r,row in enumerate(m): for c,e in enumerate(row): if box.has_key(e): params = box[e] box[e] = ( min(c, params[0]), min(r, params[1]), max(c, params[2]), max(r, params[3] ) ) else: box[e] = [c, r, c, r] print time.ctime()
Can some one tell me what exactly is taking more time here. Is it because I am using dictionaries or what is the problem. Can some one help me improve this .Is there a better way to write this.
Without gross changes to the algorithm, and in no particular order:
0. Stop worrying. Find something else to do during the 30 seconds.
1. Use psyco, if your [unspecified] platform supports it.
2. Why has_key()? Try "if e in box:" instead, if your [unspecified]
version of Python supports it. If it doesn't, (a) consider upgrading (b)
try doing box_has_key = box.has_key outside the loops and using the
result inside the loops.
3. Ensure this code is inside a function, not at global level in a
script [not apparent from your post].
4. Outside the loops, put "local_min = min", ditto for max. Then use
box[e] = (local_min(c, etc etc
5. Upgrade your [unspecified] hardware.
6. Use try/except, if indicated by the [unspecified] percentage of times
"e in box" is true. Hint: printing len(box) at the end might be useful.
7. Consider using pyrex.
8. Consider using numarray/mumeric.
Info req'd for discussing algorithm changes:
1. How much free memory do you have?
2. What can you tell us about the distribution of "e"?
3. Is "m" a rectangle, or are the rows of variable length?
4. Do you have control over "m" i.e. are you stuck with that data
structure, or can we design a different one?
>Info req'd for discussing algorithm changes: 1. How much free memory do you have?
- Memory is not a problem but I am working on some timing
constraints.Thats is the reason I am a bit concerned abt these 30
seconds
2. What can you tell us about the distribution of "e"?
- the distribution of e is not fixed.It changes based on what comes
first in the scan of the list.
3. Is "m" a rectangle, or are the rows of variable length?
- m is not a restnagle .its of varied dimensions.
4. Do you have control over "m" i.e. are you stuck with that data
structure, or can we design a different one?
-I can use 'm' as an array or a list. Other than that I am stuck.
I am suspecious abput dictionary approach here. Is this slower than if
i do it with arrays or lists with index of array as my key ?
PyPK wrote: I am suspecious abput dictionary approach here. Is this slower than if i do it with arrays or lists with index of array as my key ?
"with index of array as my key" is either redundant -- indicating that
your answer to my question about "e" should have included something like
"e is an integer that can range from blah0 to blah_max" -- or not
understandable.
In any case, seeing that the code in question is only a few lines, why
don't *you* write an alternative version and see how fast it runs? And
let us know?
PyPK wrote: I was testing this piece of code and it takes about 24-30 seconds to do a look up on a list(m) of size 1000x1000 m -> list of size 1000x1000 import time print time.ctime() box = {} for r,row in enumerate(m): for c,e in enumerate(row): if box.has_key(e): params = box[e] box[e] = ( min(c, params[0]), min(r, params[1]), max(c, params[2]), max(r, params[3] ) ) else: box[e] = [c, r, c, r] print time.ctime()
Can some one tell me what exactly is taking more time here. Is it because I am using dictionaries or what is the problem. Can some one help me improve this .Is there a better way to write this.
AFAICT the row index 'r' can only grow. Therefore one min() and one max()
should be superfluous (untested):
def make_box(m):
box = {}
for r, row in enumerate(m):
for c, e in enumerate(row):
if e in box:
left, top, right, bottom = box[e]
box[e] = (min(c, left), top, max(c, right), r)
else:
box[e] = (c, r, c, r)
return box
Peter
Yep that improved the speed by about 50% now it takes about 10 secs
instead of 24 seconds..Thanks much. I guess that is the best we could
do right.It would be really helpful if I could get it less than 5
seconds. Any suggestions on that??
PyPK wrote: Yep that improved the speed by about 50% now it takes about 10 secs instead of 24 seconds..Thanks much. I guess that is the best we could do right.It would be really helpful if I could get it less than 5 seconds. Any suggestions on that??
Things to try:
* in-lining the min and max expressions
* depending on the distribution of e, it may be faster to catch KeyErrors
def search1(m):
box = {}
for r,row in enumerate(m):
for c,e in enumerate(row):
try:
minc, minr, maxc, maxr = box[e]
box[e] = ( c < minc and c or minc,
r < minr and r or minr,
c > maxc and c or maxc,
r > maxr and r or maxr)
except KeyError:
box[e] = (c, r, c, r)
return box
Michael
Michael Spencer wrote: def search1(m): box = {} for r,row in enumerate(m): for c,e in enumerate(row): try: minc, minr, maxc, maxr = box[e] box[e] = ( c < minc and c or minc, r < minr and r or minr, c > maxc and c or maxc, r > maxr and r or maxr) except KeyError: box[e] = (c, r, c, r) return box
Michael
Sorry, my earlier solution was incorrect since:
c < minc and c or minc # evaluates to minc if c == 0
Two of the tests are unnecessary, as pointed out by Peter
The remaining test:
c > maxc and c or maxc
does not suffer from the same problem, since c cannot both be 0 and greater than
maxc
The following version, still untested ;-) has the correction:
def search2(m):
box = {}
for r,row in enumerate(m):
for c,e in enumerate(row):
try: # use this form if e will be found 90%+ of the time
# otherwise, use: if e in box:
minc, minr, maxc, maxr = box[e]
# note correction for c == 0
# also Peter's simplification
box[e] = ( c and (c < minc and c or minc),
minr,
c > maxc and c or maxc,
r)
except KeyError:
box[e] = (c, r, c, r)
return box
Michael
Michael Spencer wrote: minc, minr, maxc, maxr = box[e] # note correction for c == 0 # also Peter's simplification box[e] = ( c and (c < minc and c or minc), minr, c > maxc and c or maxc, r)
This may be slightly faster (when c > 0), and slightly less opaque:
minc, minr, maxc, maxr = box[e]
# note correction for c == 0
# also Peter's simplification
if c < minc:
minc = c
box[e] = ( minc,
minr,
c > maxc and c or maxc,
r)
Michael Spencer wrote: minc, minr, maxc, maxr = box[e] # note correction for c == 0 # also Peter's simplification box[e] = ( c and (c < minc and c or minc), minr, c > maxc and c or maxc, r)
This may be slightly faster (when c > 0), and slightly less opaque:
minc, minr, maxc, maxr = box[e]
# note correction for c == 0
# also Peter's simplification
if c < minc:
minc = c
box[e] = ( minc,
minr,
c > maxc and c or maxc,
r) This thread has been closed and replies have been disabled. Please start a new discussion. Similar topics
by: Bob |
last post by:
Are there any known applications out there used to test the performance of the .NET garbage collector over a long period of time? Basically I need an application that creates objects, uses them, and...
|
by: bluedolphin |
last post by:
Hello All:
I have been brought onboard to help on a project that had some
performance problems last year. I have taken some steps to address
the issues in question, but a huge question mark...
|
by: Mark Shelor |
last post by:
I've encountered a troublesome inconsistency in the C-language Perl
extension I've written for CPAN (Digest::SHA). The problem involves the
use of a static array within a performance-critical...
|
by: Martin |
last post by:
I am using graphics as backgrounds for forms,buttons,labels etc.
The question is: is it faster to load all graphics from files on app start
or
to use it embeded (places in editor during design)....
|
by: Curtis Justus |
last post by:
Hi,
I've been searching for solutions to two issues that are undoubtedly common
to everybody. The first is how do my team and I adequately perform unit
testing. The second is how can I measure...
|
by: Lakshmi |
last post by:
Hi All,
I am having performance issues with the .NET client calling the Java
Webservice running on axis. Have detailed the problem below. Please
help.
I wrote a webservice in Java. Lets name...
|
by: Brian Tabios |
last post by:
Hello Everyone,
I have a very complex performance issue with our production database.
Here's the scenario. We have a production webserver server and a
development web server. Both are running...
|
by: marcfischman |
last post by:
Please help. I have a website running on a linux/apache/mysql/php
server. I receive about 8,000-10,000 visitors a day with about 200,000
to 300,000 page views. The server is a RedHat Linux...
|
by: Ryan |
last post by:
My apologies if this is not the forum to post questions regarding .NET
Remoting, but I figured WebServices would be the most appropriate forum of
the bunch.
We're currently completely re-arching...
|
by: =?Utf-8?B?bXVkR2ls?= |
last post by:
Hi,
I have the Enterprise Architect version installed on one machine, and the
developper edition on another (VS.Net 2005).
To my surprise, the performance tools are only available (in the menu...
|
by: Kemmylinns12 |
last post by:
Blockchain technology has emerged as a transformative force in the business world, offering unprecedented opportunities for innovation and efficiency. While initially associated with cryptocurrencies...
|
by: Naresh1 |
last post by:
What is WebLogic Admin Training?
WebLogic Admin Training is a specialized program designed to equip individuals with the skills and knowledge required to effectively administer and manage Oracle...
|
by: WisdomUfot |
last post by:
It's an interesting question you've got about how Gmail hides the HTTP referrer when a link in an email is clicked. While I don't have the specific technical details, Gmail likely implements measures...
|
by: Oralloy |
last post by:
Hello Folks,
I am trying to hook up a CPU which I designed using SystemC to I/O pins on an FPGA.
My problem (spelled failure) is with the synthesis of my design into a bitstream, not the C++...
|
by: Carina712 |
last post by:
Setting background colors for Excel documents can help to improve the visual appeal of the document and make it easier to read and understand. Background colors can be used to highlight important...
|
by: BLUEPANDA |
last post by:
At BluePanda Dev, we're passionate about building high-quality software and sharing our knowledge with the community. That's why we've created a SaaS starter kit that's not only easy to use but also...
|
by: Johno34 |
last post by:
I have this click event on my form. It speaks to a Datasheet Subform
Private Sub Command260_Click()
Dim r As DAO.Recordset
Set r = Form_frmABCD.Form.RecordsetClone
r.MoveFirst
Do
If...
|
by: jack2019x |
last post by:
hello, Is there code or static lib for hook swapchain present?
I wanna hook dxgi swapchain present for dx11 and dx9.
|
by: DizelArs |
last post by:
Hi all)
Faced with a problem, element.click() event doesn't work in Safari browser.
Tried various tricks like emulating touch event through a function:
let clickEvent = new Event('click', {...
| |