448,956 Members | 1,221 Online Need help? Post your question and get tips & solutions from a community of 448,956 IT Pros & Developers. It's quick & easy.

# Strange Execution Times

 P: n/a I am running two functions in a row that do the same thing. One runs in .14 seconds, the other 56. I'm confused. I wrote another version of the program and couldn't get the slow behavior again, only the fast. I'm not sure what is causing it. Can anyone figure it out? Here is my code (sorry it's a bit of a mess, but my cleaned up version isn't slow!). Just skim to the bottom where the timing is. The first time printed out is .14, the seond is 56.56. f = open("/Users/curi/data.xml") o = open("/Users/curi/out2.xml", "w") import md5 import array p1 = "" p2 = "" cnt = 0 m = md5.new jo = "".join adjust = len(p1) - 1 i = 1 s = f.read() a = array.array('c', s).tolist() spot = 0 k = 0 find = s.find starts = [] ends = [] while k != -1: #print len(s) k = find(p2, spot) if k != -1: starts.append(find(p1, spot) + adjust) ends.append(k) spot = k + 1 #s = "".join([s[:j+1], md5.new(s[j+1:k-1]).hexdigest(), s[k:]]) #if k != -1: a[j+1:k-1] = m(jo(a[j+1:k-1])).hexdigest() r = range(len(starts)) #r = range(20) r.reverse() import time data = a[:] md5 = m join = jo t1 = time.clock() for j in r: #print jo(s[starts[j]+1:ends[j]]) digest = m(jo(s[starts[j]+1:ends[j]])).hexdigest() a[starts[j]+1:ends[j]] = digest #cnt += 1 #if cnt % 100 == 0: print cnt t2 = time.clock() print "time is", round(t2-t1, 5) t1 = time.clock() for i in r: data[starts[i]:ends[i]] = md5(join(s[starts[i]:ends[i]])).hexdigest() t2 = time.clock() print "second time is", round(t2-t1, 5) o.write(jo(a)) Jul 19 '05 #1
3 Replies

 P: n/a wrote: I am running two functions in a row that do the same thing. One runs in .14 seconds, the other 56. I'm confused. I wrote another version of the program and couldn't get the slow behavior again, only the fast. I'm not sure what is causing it. Can anyone figure it out? it would be a lot easier to help if you posted a self-contained example. Jul 19 '05 #2

 P: n/a cu****@gmail.com wrote: I am running two functions in a row that do the same thing. 1. I see no functions here. You should set out a script like this: def main(): your_code_goes_here() if __name__ == '__main__': main() for two reasons (a) your code will be referring to locals instead of globals; this is faster, which might appeal to you (b) if somebody accidentally imports the script, nothing happens. 2. The two loops to which you refer do *not* do the same thing; see later. One runs in .14 seconds, the other 56. I'm confused. I wrote another version of the program and couldn't get the slow behavior again, only the fast. I'm not sure what is causing it. Can anyone figure it out? Here is my code (sorry it's a bit of a mess, but my cleaned up version isn't slow!). Just skim to the bottom where the timing is. The first time printed out is .14, the seond is 56.56. [snip] [following has extraneous blank lines and comments removed] t1 = time.clock() for j in r: digest = m(jo(s[starts[j]+1:ends[j]])).hexdigest() a[starts[j]+1:ends[j]] = digest t2 = time.clock() print "time is", round(t2-t1, 5) t1 = time.clock() for i in r: data[starts[i]:ends[i]] = \ md5(join(s[starts[i]:ends[i]])).hexdigest() t2 = time.clock() print "second time is", round(t2-t1, 5) General questions: what platform? what version of Python? how large is the file? how much free memory do you have? how many passwords are there? what is the average length of a password? Ignoring the superficial-but-meaningless differences (i vs j, md5 [aarrgghh!!] vs m), jo vs join), these two loops differ in the following respects: (1) 'data' is a copy of 'a' (2) the first loop's body is effectively: digest = RHS; LHS = digest whereas the 2nd loop's body is: LHS = RHS (3) the first loop uses starts[j]+1 whereas the second loop uses starts[j] Item (1) may affect the timing if file is large compared with available memory -- could be 'a' has to be swapped out, and 'data' swapped in. Item (2) should make the 2nd loop very slightly faster, so we'll ignore that :-) Item (3) means you are not comparing like with like. It means that the 1st loop has less work to do. So this could make an observable difference for very short passwords -- but still nothing like 0.14 compared with 56. So, some more questions: The 56.56 is suspiciously precise -- you ran it a few times and it printed exactly 56.56 each time? Did you try putting the 2nd loop first [refer to Item (1) above]? Did you try putting in a switch so that your script runs either 1st loop or 2nd loop but not both? Note that each loop is making its target list expand in situ; this may after a while (like inside loop 2) cause the memory arena to become so fragmented that swapping will occur. This of course can vary wildly depending on the platform; Win95 used to be the most usual suspect but you're obviously not running on that. Some observations: (1) 's' is already a string, so ''.join(s[x:y]) is a slow way of doing s[x:y] (2) 'a' ends up as a list of one-byte strings, via a very circuitous process: a = array.array('c', s).tolist() A shorter route would be: a = list(s) However what's wrong with what you presumably tried out first i.e. a = array.array('c', s) ?? It doesn't need the final ''.join() before writing to disk, and it takes up less memory. NOTE: the array variety takes up 1 byte per character. The list variety takes up at least 4 bytes per character (on a machine where sizeof(PyObject *) == 4); to the extent that the file contains characters that are not interned (i.e. not [A-Za-z_] AFAIK), much more memory is required as a separate object will be created for each such character. Was it consistently slower? (3) If memory is your problem, you could rewrite the whole thing to simply do one write per password; that way you only need 1.x copy of the file contents in memory, not 2.x. Hoping some of this helps, John Jul 19 '05 #3

 P: n/a hey FYI i found the problem: i accidentally copied an output file for my test data. so all the passwords were exactly 32 chars long. so when replacing them with new 32 char passwords, it went much much faster, I guess because the list kept the same number of chars in it and didn't have to copy lots of data around. Jul 19 '05 #4

### This discussion thread is closed

Replies have been disabled for this discussion. 