Re: Script Optimization

On Sun, May 4, 2008 at 4:43 AM, lev <le********@gmail.com> wrote:
> Can anyone provide some advice/suggestions to make a script more
> precise/efficient/concise, etc.?

Hi, I started tidying up the script a bit, but there are some parts that
I don't understand or that look buggy. So I'm forwarding you the version I
have so far. Look for the comments with my e-mail address in them for
more information.

If I have time I'll tidy up the script some more when I have more info
about those issues.

Here are the changes I made to your version:

* Remove newlines introduced by email
* Move imports to start of file
* Change indentation from 8 spaces to 4
* Move main() to bottom of script
* Remove useless "pass" and "return" lines
* Temporarily change broken "chdir" line
* Split lines so they fit into 80 chars
* Add spaces after commas
* Use path.join instead of string interpolation
* Rename rename() to rename_md5() because rename() shadows a function
  imported from os.
* Rename vars shadowing imported names
* Improve logic for checking when to print help
* Create empty md5 listing file if one doesn't exist
* Add a comment for a dodgy-looking section
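
Two of the items above (path.join and the shadowed rename) can be illustrated with a small sketch. It assumes Python 3, whereas the script in this thread targets Python 2, and the directory and file names are made up for illustration only.

```python
import os
from os import path

# Hypothetical names, used only for illustration.
dir_name = "Volume 1"
file_name = "Track.mp3"

# String interpolation hard-codes the separator.
interpolated = "%s/%s" % (dir_name, file_name)

# path.join picks the separator for the current platform.
joined = path.join(dir_name, file_name)


def rename_md5(old_name, new_name):
    """Hypothetical helper, named so that it does not shadow os.rename."""
    os.rename(old_name, new_name)
```

Had the helper been called rename() after a "from os import rename", the later definition would silently replace the imported function, which is why a distinct name is safer.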

David.
Jun 27 '08 #1
lev
> * Remove newlines introduced by email
> * Move imports to start of file
    Used the imports of the edited script you sent.
> * Change indentation from 8 spaces to 4
    I like using tabs because of the text editor I use; the script at
    the end uses 4 spaces, though.
> * Move main() to bottom of script
> * Remove useless "pass" and "return" lines
    I replaced the return-nothing lines with passes, but I like
    keeping them in case the indentation is ever lost - it makes it easy
    to go back to the original indentation.
> * Temporarily change broken "chdir" line
    Removed as many instances of chdir as possible (a few useless ones
    to accommodate the functions - changed functions to not chdir as
    much). That line seems to work... I made it in case the script is
    launched with, say, 'python somedir\someotherdir\script.py' rather
    than 'python script.py', because I need it to work in its own and
    parent directory.
> * Split lines so they fit into 80 chars
> * Add spaces after commas
> * Use path.join instead of string interpolation
    In all cases when possible - done.
> * Rename rename() to rename_md5() because rename() shadows a function
>   imported from os.
    Renamed all functions to more understandable names (without
    collisions).
> * Rename vars shadowing imported names
    Renamed almost all vars to more understandable names.
> * Improve logic for checking when to print help
    The example you gave me does pretty much the exact same thing as
    before... (the options are either false or true depending on whether
    the argument was used; if both are false then no logic was done and
    help is shown, which would be exactly the same if the did_something
    var remained false).
> * Create empty md5 listing file if one doesn't exist
    I intended it to be a script to help rip a specific mp3cd to
    disk, not necessarily to create checksum files, because I intend to
    include the checksums file.
> * Add a comment for a dodgy-looking section
    The 4 folders to be renamed are intentional (this is for a
    specific mp3cd with 4 album folders).
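
The help check discussed above (show help only when neither option was given) can be sketched as follows. This is only an illustration, assuming Python 3, where optparse is still available; build_parser and needs_help are hypothetical helper names that do not appear in the script.

```python
from optparse import OptionParser


def build_parser():
    """Build a parser with the two flags used by the script."""
    parser = OptionParser()
    parser.add_option('-v', '--verify', action='store_true',
                      dest='verify', default=False, help='verify checksums')
    parser.add_option('-r', '--rename', action='store_true',
                      dest='rename', default=False, help='rename files')
    return parser


def needs_help(argv):
    """Return True when neither -v nor -r was given."""
    options, _args = build_parser().parse_args(argv)
    return not (options.verify or options.rename)
```

Testing the options directly gives the same result as tracking a separate did_something flag, as long as every action is tied to exactly one option.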

I added comments to explain what I was doing with dictionary[x][1][1][0],
and also what the string indexes are used for ([3:] removes the 001 in
001Track.mp3, etc.).

Thanks for the advice so far,
lev
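
The indexing described above can be sketched in isolation. This is a hedged illustration assuming Python 3; parse_checksum_line and strip_numbering are hypothetical helper names, and the digit check is a variation on the script's own test for a leading '0'.

```python
def parse_checksum_line(line):
    """Split 'CHECKSUM *dir\\file' into (checksum, [dir, file])."""
    checksum, full_name = line.rstrip('\n').split(' *', 1)
    # The md5 listing uses Windows-style backslashes between path parts.
    return checksum, full_name.split('\\')


def strip_numbering(file_name):
    """Drop a three-digit prefix such as '001' from '001Track.mp3'."""
    if file_name[:3].isdigit():
        return file_name[3:]
    return file_name


checksum, parts = parse_checksum_line(
    'C8109BF6B0EF724770A66CF4ED6251A7 *001Album 1\\001Track.mp3\n')
```

With this layout, parts[0] is the directory name and parts[1] the file name, which corresponds to the [1][0] and [1][1] lookups on each dictionary value in the script.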

#!/usr/bin/env python

import md5
from glob import glob
from optparse import OptionParser
from os import chdir, path, rename, remove
from sys import argv, exit

def verify_checksum_set(checksums):
    checksums = open(checksums, 'r')
    changed_files = {}
    missing_files = []
    for fline in checksums.readlines():
        line = fline.split(' *')
        original_sum = line[0].upper()
        try:
            new_sum = calculate_checksum(line[1].strip())
            if new_sum == original_sum:
                print '.',
                pass
            else:
                changed_files[line[1]] = (original_sum, new_sum)
                pass
        except IOError:
            missing_files.append(line[1])
            pass
        pass
    checksums.close()
    changed_files_keys = changed_files.keys()
    changed_files_keys.sort()
    missing_files.sort()
    print '\n'
    if len(changed_files) != 0:
        print 'File(s) changed:'
        for key in changed_files_keys:
            print key.strip('\n'), 'changed from:\n\t', \
                changed_files[key][0], \
                'to\n\t', changed_files[key][1]
            pass
        print '\n\t', len(changed_files), 'file(s) changed.\n'
        pass
    if len(missing_files) != 0:
        print 'File(s) not found:'
        for x in range(len(missing_files)):
            print '\t', missing_files[x]
            pass
        print '\n\t', len(missing_files), 'file(s) not found.\n'
        pass
    if not len(changed_files) and not len(missing_files):
        print "\n\tChecksums Verified\n"
        pass
    pass

def calculate_checksum(file_name):
    file_to_check = open(file_name, 'rb')
    chunk = 8196
    checksum = md5.new()
    while (True):
        chunkdata = file_to_check.read(chunk)
        if not chunkdata:
            break
        checksum.update(chunkdata)
        pass
    file_to_check.close()
    return checksum.hexdigest().upper()

def rename_file_set(new_dir_names, checksums):
    file_info = md5format(checksums)
    dirlist = glob('00[1-4]Volume [1-4]')
    dirlist.sort()
    for x in range(4):
        rename(dirlist[x], new_dir_names[x])
        print '\t', dirlist[x], 'renamed to:', new_dir_names[x]
        chdir(new_dir_names[x])
        for old_file_name in glob('*.mp3'):
            # old_file_name[3:] is part of removing numbering: '001Track ...'
            new_file_name = old_file_name[3:]
            rename(old_file_name, new_file_name)
            print '\t\t', old_file_name, 'renamed to:', new_file_name
            pass
        chdir('..')
        file_info = md5file_name_edit(file_info, dirlist[x],
                                      new_dir_names[x])
        pass
    md5write(file_info, checksums)
    replace_strings('The American Century.htm', dirlist,
                    new_dir_names)
    print '\n\tDirectories and Files renamed.'
    pass

def md5format(checksums):
    file_info = {}
    checksums = open(checksums, 'r')
    for line in checksums.readlines():
        splitline = line.split(' *')
        # original full filename = (checksum, [directory name, file name])
        file_info[splitline[1]] = (splitline[0], splitline[1].split('\\'))
        pass
    checksums.close()
    return file_info

def md5file_name_edit(file_info, old_dir_name, new_dir_name):
    for x in file_info.keys():
        dir_name_from_file = file_info[x][1][0]
        if dir_name_from_file == old_dir_name:
            checksum = file_info[x][0]
            file_name_from_file = file_info[x][1][1]
            # md5 format:
            # 'C8109BF6B0EF724770A66CF4ED6251A7 *001Album 1\001Track.mp3'
            file_info[x] = (checksum, [new_dir_name,
                                       file_name_from_file])
            # mp3cd numbering: '001Track.mp3, 002Track.mp3...'
            if file_name_from_file[0] == '0':
                file_info[x] = (checksum, [new_dir_name,
                                           file_name_from_file[3:]])
                pass
            pass
        pass
    return file_info

def md5write(file_info, checksums):
    keys = file_info.keys()
    keys.sort()
    checksums = open(checksums, 'w')
    for x in keys:
        checksum = file_info[x][0]
        try:
            # when the file is one directory deep:
            # 'C8109BF6B0EF724770A66CF4ED6251A7 *001Album 1\001Track.mp3'
            dir_name = file_info[x][1][0]
            file_name = file_info[x][1][1]
            # only "path" is imported from os, so call path.join directly
            checksums.writelines('%s *%s' % (checksum,
                                             path.join(dir_name,
                                                       file_name)))
            pass
        except IndexError:
            # when the file is in root dir:
            # '007CC9C12342017709A2F19AF75247BD *010Track.mp3'
            file_name = file_info[x][1][0]
            checksums.writelines('%s *%s' % (checksum, file_name))
            pass
        pass
    checksums.close()
    pass

def replace_strings(file_name, oldlist, newlist):
    try:
        new_file = open(file_name, 'r').read()
        for x in range(4):
            new_file = new_file.replace(oldlist[x], newlist[x], 1)
            pass
        remove(file_name)
        file_name = open(file_name, 'w', len(new_file))
        file_name.write(new_file)
        file_name.close()
        pass
    except IOError:
        print file_name, 'not found'
        pass
    pass

def main():
    full_path = path.abspath(path.dirname(argv[0]))
    chdir(full_path)
    chdir('..')
    checksums = path.join(full_path, 'checksums.md5')
    new_dir_names = ('Volume 1 - 1889-1929', 'Volume 2 - 1929-1945',
                     'Volume 3 - 1945-1965', 'Volume 4 - 1963-1989')
    parser = OptionParser()
    parser.add_option('-v', '--verify', action = 'store_true',
                      dest = 'verify', help = 'verify checksums')
    parser.add_option('-r', '--rename', action = 'store_true',
                      dest = 'rename',
                      help = 'rename files to a more usable form '
                             '(write rights needed)')
    (options, args) = parser.parse_args()
    if options.verify:
        verify_checksum_set(checksums)
        pass
    if options.rename:
        rename_file_set(new_dir_names, checksums)
        pass
    if not options.verify and not options.rename:
        parser.print_help()
        pass
    pass

if __name__ == '__main__':
    main()
Jun 27 '08 #2
On Sun, 04 May 2008 17:01:15 -0300, lev <le********@gmail.com> wrote:

>> * Change indentation from 8 spaces to 4
> I like using tabs because of the text editor I use, the script at
> the end is with 4 though.

Can't you configure it to use 4 spaces per indent - and not use "hard" tabs?

>> * Remove useless "pass" and "return" lines
> I replaced the return nothing lines with passes, but I like
> keeping them in case the indentation is ever lost - makes it easy to
> go back to original indentation

I can't think of a case when only indentation "is lost" - if you have a crash or something, normally you lose much more than indentation... Simple backups or an SCM system like cvs/svn will help. So I don't see the usefulness of those "pass" statements; I think that after some time using Python you'll consider them just garbage, like everyone else.

>> * Temporarily change broken "chdir" line
> removed as many instances of chdir as possible (a few useless ones
> to accommodate the functions - changed functions to not chdir as much),
> that line seems to work... I made it in case the script is launched
> with say: 'python somedir\someotherdir\script.py' rather than 'python
> script.py', because I need it to work in its own and parent
> directory.

You can determine the directory where the script resides using

    import os
    basedir = os.path.dirname(os.path.abspath(__file__))

This way it doesn't matter how it was launched. But execute the above code as soon as possible (before any chdir).
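
A slightly fuller sketch of the same idea, assuming Python 3 (the thread's script is Python 2). script_dir is a hypothetical helper name, and the fallback to the current directory is only there so the snippet also runs in contexts where __file__ is not defined.

```python
import os


def script_dir(script_file):
    """Directory containing script_file, independent of the current dir."""
    return os.path.dirname(os.path.abspath(script_file))


# In a real script, pass __file__ and do this before any os.chdir call.
basedir = script_dir(__file__) if '__file__' in globals() else os.getcwd()

# Build paths from basedir instead of changing the working directory.
checksums_path = os.path.join(basedir, 'checksums.md5')
```

Resolving paths against basedir this way can reduce the need for chdir calls altogether.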

>     checksums = open(checksums, 'r')
>     for fline in checksums.readlines():

You can directly iterate over the file:

    for fline in checksums:

(readlines() reads the whole file contents into memory; I guess this is not an issue here, but in other cases it may be an important difference.)
Although it's perfectly valid, I would not recommend using the same name for two different things (checksums refers to the file name *and* the file itself).
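
Both suggestions can be combined in one short sketch, assuming Python 3. read_checksum_lines is a hypothetical helper name, and the temporary file only stands in for a real checksums.md5 so the snippet is self-contained.

```python
import os
import tempfile


def read_checksum_lines(checksums_path):
    """Yield (checksum, file name) pairs, reading one line at a time."""
    # Separate names: checksums_path is the path, checksums_file the file.
    with open(checksums_path, 'r') as checksums_file:
        for fline in checksums_file:  # no readlines(): lines are read lazily
            checksum, file_name = fline.rstrip('\n').split(' *', 1)
            yield checksum, file_name


# Small demonstration using a temporary file with made-up contents.
with tempfile.NamedTemporaryFile('w', suffix='.md5', delete=False) as tmp:
    tmp.write('AAAA *one.mp3\nBBBB *two.mp3\n')
entries = list(read_checksum_lines(tmp.name))
os.remove(tmp.name)
```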

>     changed_files_keys = changed_files.keys()
>     changed_files_keys.sort()
>     missing_files.sort()
>     print '\n'
>     if len(changed_files) != 0:
>         print 'File(s) changed:'
>         for key in changed_files_keys:

You don't have to copy the keys and sort; use the sorted() builtin:

    for key in sorted(changed_files.iterkeys()):

Also, "if len(changed_files) != 0" is usually written as:

    if changed_files:

The same for missing_files.
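
A minimal sketch of both idioms, assuming Python 3, where iterkeys() no longer exists and sorted() can be applied to the dict directly. The file names and checksum strings are made up for illustration.

```python
# Made-up sample data, shaped like the script's changed_files dict.
changed_files = {
    'b.mp3': ('OLD2', 'NEW2'),
    'a.mp3': ('OLD1', 'NEW1'),
}
missing_files = []

report = []
if changed_files:  # an empty dict is false
    for key in sorted(changed_files):  # iterating a dict yields its keys
        old_sum, new_sum = changed_files[key]
        report.append('%s changed from %s to %s' % (key, old_sum, new_sum))
if not missing_files:  # an empty list is false too
    report.append('no files missing')
```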

>         for x in range(len(missing_files)):
>             print '\t', missing_files[x]

That construct range(len(somelist)) is very rarely used. Either you don't need the index, and write:

    for missing_file in missing_files:
        print '\t', missing_file

Or you want the index too, and write:

    for i, missing_file in enumerate(missing_files):
        print '%2d: %s' % (i, missing_file)
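
The same two loops in a form that can be checked without printing, assuming Python 3. The sample file names are made up, and starting the numbering at 1 via the second argument of enumerate() is an optional variation on the loop above.

```python
# Made-up sample data.
missing_files = ['b.mp3', 'a.mp3']

# No index needed: iterate over the items themselves.
plain = ['\t' + missing_file for missing_file in sorted(missing_files)]

# Index needed: enumerate() yields (index, item) pairs; here counting from 1.
numbered = ['%2d: %s' % (i, missing_file)
            for i, missing_file in enumerate(sorted(missing_files), 1)]
```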

> def calculate_checksum(file_name):
>     file_to_check = open(file_name, 'rb')
>     chunk = 8196

Any reason to use such a number? 8K is 8192; you could use 8*1024 if you don't remember the value. I usually write 1024*1024 when I want exactly 1M.
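
A sketch of the chunked checksum with the chunk size spelled as 8*1024. It assumes Python 3, where the old md5 module is gone and hashlib.md5() takes its place; CHUNK_SIZE is a name introduced here for illustration, and the temporary file only makes the snippet self-contained.

```python
import hashlib
import os
import tempfile

CHUNK_SIZE = 8 * 1024  # 8K, i.e. 8192 bytes


def calculate_checksum(file_name, chunk_size=CHUNK_SIZE):
    """Return the upper-case MD5 hex digest of a file, read in chunks."""
    checksum = hashlib.md5()
    with open(file_name, 'rb') as file_to_check:
        while True:
            chunkdata = file_to_check.read(chunk_size)
            if not chunkdata:
                break
            checksum.update(chunkdata)
    return checksum.hexdigest().upper()


# Demonstration on a temporary file larger than one chunk.
data = b'x' * (CHUNK_SIZE * 2 + 5)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
digest = calculate_checksum(tmp.name)
os.remove(tmp.name)
```

The digest does not depend on the chunk size, so the choice mainly affects how many read calls are made.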

--
Gabriel Genellina

Jun 27 '08 #3
lev
Thank you, Gabriel. I did not know about a number of the commands you
posted, and the use of 8196 was an error on my part. I will change the
script to reflect your corrections later tonight. I have another project
I need to finish/comment/submit for corrections later on, so I will be
using the version of the script that I come up with tonight.

Thank you for your invaluable advice.
The Python community is the first online community that I have had
this much help from. Thank you all.
Jun 27 '08 #4
