cu****@gmail.com wrote:
I am running two functions in a row that do the same thing.
1. I see no functions here.
You should set out a script like this:
def main():
your_code_goes_here()
if __name__ == '__main__':
main()
for two reasons (a) your code will be referring to locals instead of
globals; this is faster, which might appeal to you (b) if somebody
accidentally imports the script, nothing happens.
2. The two loops to which you refer do *not* do the same thing; see later.
One runs
in .14 seconds, the other 56. I'm confused. I wrote another version
of the program and couldn't get the slow behavior again, only the fast.
I'm not sure what is causing it. Can anyone figure it out?
Here is my code (sorry it's a bit of a mess, but my cleaned up version
isn't slow!). Just skim to the bottom where the timing is. The first
time printed out is .14, the seond is 56.56.
[snip]
[following has extraneous blank lines and comments removed] t1 = time.clock()
for j in r:
digest = m(jo(s[starts[j]+1:ends[j]])).hexdigest()
a[starts[j]+1:ends[j]] = digest
t2 = time.clock()
print "time is", round(t2-t1, 5)
t1 = time.clock()
for i in r:
data[starts[i]:ends[i]] = \
md5(join(s[starts[i]:ends[i]])).hexdigest()
t2 = time.clock()
print "second time is", round(t2-t1, 5)
General questions: what platform? what version of Python? how large is
the file? how much free memory do you have? how many passwords are
there? what is the average length of a password?
Ignoring the superficial-but-meaningless differences (i vs j, md5
[aarrgghh!!] vs m), jo vs join), these two loops differ in the following
respects:
(1) 'data' is a copy of 'a'
(2) the first loop's body is effectively: digest = RHS; LHS = digest
whereas the 2nd loop's body is: LHS = RHS
(3) the first loop uses starts[j]+1 whereas the second loop uses starts[j]
Item (1) may affect the timing if file is large compared with available
memory -- could be 'a' has to be swapped out, and 'data' swapped in.
Item (2) should make the 2nd loop very slightly faster, so we'll ignore
that :-)
Item (3) means you are not comparing like with like. It means that the
1st loop has less work to do. So this could make an observable
difference for very short passwords -- but still nothing like 0.14
compared with 56.
So, some more questions:
The 56.56 is suspiciously precise -- you ran it a few times and it
printed exactly 56.56 each time?
Did you try putting the 2nd loop first [refer to Item (1) above]?
Did you try putting in a switch so that your script runs either 1st loop
or 2nd loop but not both? Note that each loop is making its target list
expand in situ; this may after a while (like inside loop 2) cause the
memory arena to become so fragmented that swapping will occur. This of
course can vary wildly depending on the platform; Win95 used to be the
most usual suspect but you're obviously not running on that.
Some observations:
(1) 's' is already a string, so ''.join(s[x:y]) is a slow way of doing
s[x:y]
(2) 'a' ends up as a list of one-byte strings, via a very circuitous
process: a = array.array('c', s).tolist()
A shorter route would be: a = list(s)
However what's wrong with what you presumably tried out first i.e. a =
array.array('c', s) ?? It doesn't need the final ''.join() before
writing to disk, and it takes up less memory. NOTE: the array variety
takes up 1 byte per character. The list variety takes up at least 4
bytes per character (on a machine where sizeof(PyObject *) == 4); to the
extent that the file contains characters that are not interned (i.e. not
[A-Za-z_] AFAIK), much more memory is required as a separate object
will be created for each such character. Was it consistently slower?
(3) If memory is your problem, you could rewrite the whole thing to
simply do one write per password; that way you only need 1.x copy of the
file contents in memory, not 2.x.
Hoping some of this helps,
John