473,566 Members | 3,245 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Performance Issues please help

I was testing this piece of code and it takes about 24-30 seconds to do
a look up on a list(m) of size 1000x1000
m -> list of size 1000x1000
import time
print time.ctime()
box = {}
for r,row in enumerate(m):
for c,e in enumerate(row):
if box.has_key(e):
params = box[e]
box[e] = ( min(c, params[0]), min(r, params[1]),
max(c, params[2]), max(r, params[3] ) )
else:
box[e] = [c, r, c, r]
print time.ctime()

Can some one tell me what exactly is taking more time here. Is it
because I am using dictionaries or what is the problem. Can some one
help me improve this .Is there a better way to write this.

Jul 19 '05 #1
9 1466
PyPK wrote:
I was testing this piece of code and it takes about 24-30 seconds to do
a look up on a list(m) of size 1000x1000
m -> list of size 1000x1000
import time
print time.ctime()
box = {}
for r,row in enumerate(m):
for c,e in enumerate(row):
if box.has_key(e):
params = box[e]
box[e] = ( min(c, params[0]), min(r, params[1]),
max(c, params[2]), max(r, params[3] ) )
else:
box[e] = [c, r, c, r]
print time.ctime()

Can some one tell me what exactly is taking more time here. Is it
because I am using dictionaries or what is the problem. Can some one
help me improve this .Is there a better way to write this.


Without gross changes to the algorithm, and in no particular order:
0. Stop worrying. Find something else to do during the 30 seconds.
1. Use psyco, if your [unspecified] platform supports it.
2. Why has_key()? Try "if e in box:" instead, if your [unspecified]
version of Python supports it. If it doesn't, (a) consider upgrading (b)
try doing box_has_key = box.has_key outside the loops and using the
result inside the loops.
3. Ensure this code is inside a function, not at global level in a
script [not apparent from your post].
4. Outside the loops, put "local_min = min", ditto for max. Then use
box[e] = (local_min(c, etc etc
5. Upgrade your [unspecified] hardware.
6. Use try/except, if indicated by the [unspecified] percentage of times
"e in box" is true. Hint: printing len(box) at the end might be useful.
7. Consider using pyrex.
8. Consider using numarray/mumeric.

Info req'd for discussing algorithm changes:
1. How much free memory do you have?
2. What can you tell us about the distribution of "e"?
3. Is "m" a rectangle, or are the rows of variable length?
4. Do you have control over "m" i.e. are you stuck with that data
structure, or can we design a different one?
Jul 19 '05 #2
>Info req'd for discussing algorithm changes:
1. How much free memory do you have?
- Memory is not a problem but I am working on some timing
constraints.Tha ts is the reason I am a bit concerned abt these 30
seconds
2. What can you tell us about the distribution of "e"?
- the distribution of e is not fixed.It changes based on what comes
first in the scan of the list.
3. Is "m" a rectangle, or are the rows of variable length?
- m is not a restnagle .its of varied dimensions.
4. Do you have control over "m" i.e. are you stuck with that data

structure, or can we design a different one?

-I can use 'm' as an array or a list. Other than that I am stuck.

I am suspecious abput dictionary approach here. Is this slower than if
i do it with arrays or lists with index of array as my key ?

Jul 19 '05 #3
PyPK wrote:

I am suspecious abput dictionary approach here. Is this slower than if
i do it with arrays or lists with index of array as my key ?


"with index of array as my key" is either redundant -- indicating that
your answer to my question about "e" should have included something like
"e is an integer that can range from blah0 to blah_max" -- or not
understandable.

In any case, seeing that the code in question is only a few lines, why
don't *you* write an alternative version and see how fast it runs? And
let us know?

Jul 19 '05 #4
PyPK wrote:
I was testing this piece of code and it takes about 24-30 seconds to do
a look up on a list(m) of size 1000x1000
m -> list of size 1000x1000
import time
print time.ctime()
box = {}
for r,row in enumerate(m):
for c,e in enumerate(row):
if box.has_key(e):
params = box[e]
box[e] = ( min(c, params[0]), min(r, params[1]),
max(c, params[2]), max(r, params[3] ) )
else:
box[e] = [c, r, c, r]
print time.ctime()

Can some one tell me what exactly is taking more time here. Is it
because I am using dictionaries or what is the problem. Can some one
help me improve this .Is there a better way to write this.


AFAICT the row index 'r' can only grow. Therefore one min() and one max()
should be superfluous (untested):

def make_box(m):
box = {}
for r, row in enumerate(m):
for c, e in enumerate(row):
if e in box:
left, top, right, bottom = box[e]
box[e] = (min(c, left), top, max(c, right), r)
else:
box[e] = (c, r, c, r)
return box

Peter
Jul 19 '05 #5
Yep that improved the speed by about 50% now it takes about 10 secs
instead of 24 seconds..Thanks much. I guess that is the best we could
do right.It would be really helpful if I could get it less than 5
seconds. Any suggestions on that??

Jul 19 '05 #6
PyPK wrote:
Yep that improved the speed by about 50% now it takes about 10 secs
instead of 24 seconds..Thanks much. I guess that is the best we could
do right.It would be really helpful if I could get it less than 5
seconds. Any suggestions on that??

Things to try:
* in-lining the min and max expressions
* depending on the distribution of e, it may be faster to catch KeyErrors

def search1(m):
box = {}
for r,row in enumerate(m):
for c,e in enumerate(row):
try:
minc, minr, maxc, maxr = box[e]
box[e] = ( c < minc and c or minc,
r < minr and r or minr,
c > maxc and c or maxc,
r > maxr and r or maxr)
except KeyError:
box[e] = (c, r, c, r)
return box

Michael

Jul 19 '05 #7
Michael Spencer wrote:
def search1(m):
box = {}
for r,row in enumerate(m):
for c,e in enumerate(row):
try:
minc, minr, maxc, maxr = box[e]
box[e] = ( c < minc and c or minc,
r < minr and r or minr,
c > maxc and c or maxc,
r > maxr and r or maxr)
except KeyError:
box[e] = (c, r, c, r)
return box

Michael


Sorry, my earlier solution was incorrect since:
c < minc and c or minc # evaluates to minc if c == 0
Two of the tests are unnecessary, as pointed out by Peter
The remaining test:
c > maxc and c or maxc
does not suffer from the same problem, since c cannot both be 0 and greater than
maxc

The following version, still untested ;-) has the correction:

def search2(m):
box = {}
for r,row in enumerate(m):
for c,e in enumerate(row):
try: # use this form if e will be found 90%+ of the time
# otherwise, use: if e in box:
minc, minr, maxc, maxr = box[e]
# note correction for c == 0
# also Peter's simplification
box[e] = ( c and (c < minc and c or minc),
minr,
c > maxc and c or maxc,
r)

except KeyError:
box[e] = (c, r, c, r)
return box

Michael

Jul 19 '05 #8
Michael Spencer wrote:
minc, minr, maxc, maxr = box[e]
# note correction for c == 0
# also Peter's simplification
box[e] = ( c and (c < minc and c or minc),
minr,
c > maxc and c or maxc,
r)


This may be slightly faster (when c > 0), and slightly less opaque:

minc, minr, maxc, maxr = box[e]
# note correction for c == 0
# also Peter's simplification
if c < minc:
minc = c
box[e] = ( minc,
minr,
c > maxc and c or maxc,
r)

Jul 19 '05 #9
Michael Spencer wrote:
minc, minr, maxc, maxr = box[e]
# note correction for c == 0
# also Peter's simplification
box[e] = ( c and (c < minc and c or minc),
minr,
c > maxc and c or maxc,
r)


This may be slightly faster (when c > 0), and slightly less opaque:

minc, minr, maxc, maxr = box[e]
# note correction for c == 0
# also Peter's simplification
if c < minc:
minc = c
box[e] = ( minc,
minr,
c > maxc and c or maxc,
r)
Jul 19 '05 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

1
2300
by: Bob | last post by:
Are there any known applications out there used to test the performance of the .NET garbage collector over a long period of time? Basically I need an application that creates objects, uses them, and then throws them away and then monitors the garbage collection and store statistics on it, preferably in C#. I want to know what is the longest...
9
2801
by: bluedolphin | last post by:
Hello All: I have been brought onboard to help on a project that had some performance problems last year. I have taken some steps to address the issues in question, but a huge question mark remains. Last year, all of the tables and reports were stored in Access. The database was put online so that end users could access the reports...
115
7514
by: Mark Shelor | last post by:
I've encountered a troublesome inconsistency in the C-language Perl extension I've written for CPAN (Digest::SHA). The problem involves the use of a static array within a performance-critical transform function. When compiling under gcc on my big-endian PowerPC (Mac OS X), declaring this array as "static" DECREASES the transform throughput by...
4
3353
by: Martin | last post by:
I am using graphics as backgrounds for forms,buttons,labels etc. The question is: is it faster to load all graphics from files on app start or to use it embeded (places in editor during design). Reason for my question is that application has 5mb, while without graphics it has cca 400kb. Graphic files (bmps) take about 200kb (in program, they...
2
9732
by: Curtis Justus | last post by:
Hi, I've been searching for solutions to two issues that are undoubtedly common to everybody. The first is how do my team and I adequately perform unit testing. The second is how can I measure a load on my application? In the past, we would write our class and then come up with some logic in a console or win app to test it. I have...
0
768
by: Lakshmi | last post by:
Hi All, I am having performance issues with the .NET client calling the Java Webservice running on axis. Have detailed the problem below. Please help. I wrote a webservice in Java. Lets name this WebService1. (using Apache Axis 1.1) Scenario 1: -----------
2
2413
by: Brian Tabios | last post by:
Hello Everyone, I have a very complex performance issue with our production database. Here's the scenario. We have a production webserver server and a development web server. Both are running SQL Server 2000. I encounted various performance issues with the production server with a particular query. It would take approximately 22 seconds...
1
13668
by: marcfischman | last post by:
Please help. I have a website running on a linux/apache/mysql/php server. I receive about 8,000-10,000 visitors a day with about 200,000 to 300,000 page views. The server is a RedHat Linux server running PHP 5.x, MySQL 5.x, Apache 2.x We have been suffering from a number of performance issues. Our hosting company has set our max...
2
1685
by: Ryan | last post by:
My apologies if this is not the forum to post questions regarding .NET Remoting, but I figured WebServices would be the most appropriate forum of the bunch. We're currently completely re-arching some software (currently it's essentially a single tier app - a giant meat ball - we're turning it into an n-tier architecture; gotta love...
3
1257
by: =?Utf-8?B?bXVkR2ls?= | last post by:
Hi, I have the Enterprise Architect version installed on one machine, and the developper edition on another (VS.Net 2005). To my surprise, the performance tools are only available (in the menu system) on the developer instalation. Is that realy so? Is there a way to get them in the Architect version too? Thanks
0
7673
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main...
1
7645
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For...
0
7953
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the...
0
6263
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then...
1
5485
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes...
0
5213
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert...
0
3643
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in...
1
1202
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
0
926
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.