473,386 Members | 1,752 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,386 software developers and data experts.

Re: Script Optimization

On Sun, May 4, 2008 at 4:43 AM, lev <le********@gmail.comwrote:
Can anyone provide some advice/suggestions to make a script more
precise/efficient/concise, etc.?
Hi, I started tidying up the script a bit, but there are some parts I
don't understand or look buggy. So I'm forwarding you the version I
have so far. Look for the comments with my e-mail address in them for
more information.

If I have time I'll tidy up the script some more when I have more info
about those issues.

Here are the changes I made to your version:

* Remove newlines introduced by email
* Move imports to start of file
* Change indentation from 8 spaces to 4
* Move main() to bottom of script
* Remove useless "pass" and "return" lines
* Temporarily change broken "chdir" line
* Split lines so they fit into 80 chars
* Add spaces after commas
* Use path.join instead of string interpolation
* rename rename() to rename_md5() because rename() shadows a function
imported from os.
* Rename vars shadowing imported names
* Improve logic for checking when to print help
* Create emtpy md5 listing file if one doesn't exist
* Add a comment for a dodgy-looking section

David.
Jun 27 '08 #1
3 1711
lev
* Remove newlines introduced by email
* Move imports to start of file
used imports of the edited script you sent.
* Change indentation from 8 spaces to 4
I like using tabs because of the text editor I use, the script at
the end is with 4 though.
* Move main() to bottom of script
* Remove useless "pass" and "return" lines
I replaced the return nothing lines with passes, but I like
keeping them in case the indentation is ever lost - makes it easy to
go back to original indentation
* Temporarily change broken "chdir" line
removed as many instances of chdir as possible (a few useless ones
to accomodate the functions - changed functions to not chdir as much),
that line seems to work... I made it in case the script is launched
with say: 'python somedir\someotherdir\script.py' rather than 'python
script.py', because I need it to work in it's own and parent
directory.
* Split lines so they fit into 80 chars
* Add spaces after commas
* Use path.join instead of string interpolation
in all cases when possible - done
* rename rename() to rename_md5() because rename() shadows a function
imported from os.
renamed all functions to more understandable names (without
collisions)
* Rename vars shadowing imported names
renamed almost all vars to more understandable names
* Improve logic for checking when to print help
the example you gave me does pretty much the exact same thing as
before... (the options are either false or true depending on if the
argument was used, if false for both then no logic was done and help
is shown, which would be exactly the same if the did_something var
remained false.
* Create emtpy md5 listing file if one doesn't exist
I intended it to be a script to help ripping a specific mp3cd to
disk, not necessarily create checksum files, because i intend to
include the checksums file.
* Add a comment for a dodgy-looking section
The 4 folders to be renamed are intentional (this is for a
specific mp3cd with 4 album folders)

I added comments to explain what I was doing with the dictionary[x][1]
[1][0], and also what the indexes for the strings are used for ([3:]
to remove the 001 in 001Track.mp3, etc.)
Thanks for the advice so far,
lev

#!/usr/bin/env python

import md5
from glob import glob
from optparse import OptionParser
from os import chdir, path, rename, remove
from sys import argv, exit

def verify_checksum_set(checksums):
checksums = open(checksums, 'r')
changed_files = {}
missing_files = []
for fline in checksums.readlines():
line = fline.split(' *')
original_sum = line[0].upper()
try:
new_sum = calculate_checksum(line[1].strip())
if new_sum == original_sum:
print '.',
pass
else:
changed_files[line[1]] = (original_sum, new_sum)
pass
except IOError:
missing_files.append(line[1])
pass
pass
checksums.close()
changed_files_keys = changed_files.keys()
changed_files_keys.sort()
missing_files.sort()
print '\n'
if len(changed_files) != 0:
print 'File(s) changed:'
for key in changed_files_keys:
print key.strip('\n'), 'changed from:\n\t',
changed_files[key][0], \
'to\n\t', changed_files[key][1]
pass
print '\n\t', len(changed_files), 'file(s) changed.\n'
pass
if len(missing_files) != 0:
print 'File(s) not found:'
for x in range(len(missing_files)):
print '\t', missing_files[x]
pass
print '\n\t', len(missing_files), 'file(s) not found.\n'
pass
if not len(changed_files) and not len(missing_files):
print "\n\tChecksums Verified\n"
pass
pass

def calculate_checksum(file_name):
file_to_check = open(file_name, 'rb')
chunk = 8196
checksum = md5.new()
while (True):
chunkdata = file_to_check.read(chunk)
if not chunkdata:
break
checksum.update(chunkdata)
pass
file_to_check.close()
return checksum.hexdigest().upper()

def rename_file_set(new_dir_names, checksums):
file_info = md5format(checksums)
dirlist = glob('00[1-4]Volume [1-4]')
dirlist.sort()
for x in range(4):
rename(dirlist[x], new_dir_names[x])
print '\t', dirlist[x], 'renamed to:', new_dir_names[x]
chdir(new_dir_names[x])
for old_file_name in glob ('*.mp3'):
# old_file_name[3:] is part of removing numbering:
'001Track ...'
new_file_name = old_file_name[3:]
rename(old_file_name, new_file_name)
print '\t\t', old_file_name, 'renamed to:', new_file_name
pass
chdir('..')
file_info = md5file_name_edit(file_info,dirlist[x],
new_dir_names[x])
pass
md5write(file_info, checksums)
replace_strings('The American Century.htm', dirlist,
new_dir_names)
print '\n\tDirectories and Files renamed.'
pass

def md5format(checksums):
file_info = {}
checksums = open(checksums, 'r')
for line in checksums.readlines():
splitline = line.split(' *')
#original full filename = (checksum, [directory name, file
name])
file_info[splitline[1]] = (splitline[0],splitline[1].split('\
\'))
pass
checksums.close()
return file_info

def md5file_name_edit(file_info, old_dir_name, new_dir_name):
for x in file_info.keys():
dir_name_from_file = file_info[x][1][0]
if dir_name_from_file == old_dir_name:
checksum = file_info[x][0]
file_name_from_file = file_info[x][1][1]
#md5 format: 'C8109BF6B0EF724770A66CF4ED6251A7 *001Album
1\001Track.mp3'
file_info[x] = (checksum, [new_dir_name,
file_name_from_file])
#mp3cd numbering: '001Track.mp3, 002Track.mp3...'
if file_name_from_file[0] == '0':
file_info[x] =(checksum, [new_dir_name,
file_name_from_file[3:]])
pass
pass
pass
return file_info

def md5write(file_info, checksums):
keys = file_info.keys()
keys.sort()
checksums = open(checksums, 'w')
for x in keys:
checksum = file_info[x][0]
try:
#when the file is one directory deep:
#'C8109BF6B0EF724770A66CF4ED6251A7 *001Album
1\001Track.mp3'
dir_name = file_info[x][1][0]
file_name = file_info[x][1][1]
checksums.writelines('%s *%s' % (checksum,
os.path.join(dir_name, \
file_name)))
pass
except IndexError:
#when the file is in root dir:
'007CC9C12342017709A2F19AF75247BD *010Track.mp3'
file_name = file_info[x][1][0]
checksums.writelines('%s *%s' % (checksum, file_name))
pass
pass
checksums.close()
pass

def replace_strings(file_name, oldlist, newlist):
try:
new_file = open(file_name, 'r').read();
for x in range(4):
new_file = new_file.replace(oldlist[x], newlist[x], 1)
pass
remove(file_name)
file_name = open(file_name, 'w', len(new_file))
file_name.write(new_file)
file_name.close()
pass
except IOError:
print file_name, 'not found'
pass
pass

def main():
full_path = path.abspath(path.dirname(argv[0]))
chdir(full_path)
chdir('..')
checksums = path.join(full_path, 'checksums.md5')
new_dir_names = ('Volume 1 - 1889-1929', 'Volume 2 - 1929-1945', \
'Volume 3 - 1945-1965', 'Volume 4 - 1963-1989')
parser = OptionParser()
parser.add_option ('-v', '--verify', action = 'store_true', \
dest = 'verify', help = 'verify checksums')
parser.add_option ('-r', '--rename', action = 'store_true', dest =
\
'rename', help = \
'rename files to a more usable form (write rights needed)')
(options, args) = parser.parse_args()
if options.verify:
verify_checksum_set(checksums)
pass
if options.rename:
rename_file_set(new_dir_names, checksums)
pass
if not options.verify and not options.rename:
parser.print_help()
pass
pass

if __name__ == '__main__':
main()
Jun 27 '08 #2
En Sun, 04 May 2008 17:01:15 -0300, lev <le********@gmail.comescribió:
>* Change indentation from 8 spaces to 4
I like using tabs because of the text editor I use, the script at
the end is with 4 though.
Can't you configure it to use 4 spaces per indent - and not use "hard" tabs?
>* Remove useless "pass" and "return" lines
I replaced the return nothing lines with passes, but I like
keeping them in case the indentation is ever lost - makes it easy to
go back to original indentation
I can't think of a case when only indentation "is lost" - if you have a crash or something, normally you lose much more than indentation... Simple backups or a SCM system like cvs/svn will help. So I don't see the usefulness of those "pass" statements; I think that after some time using Python you'll consider them just garbage, as everyone else.
>* Temporarily change broken "chdir" line
removed as many instances of chdir as possible (a few useless ones
to accomodate the functions - changed functions to not chdir as much),
that line seems to work... I made it in case the script is launched
with say: 'python somedir\someotherdir\script.py' rather than 'python
script.py', because I need it to work in it's own and parent
directory.
You can determine the directory where the script resides using

import os
basedir = os.path.dirname(os.path.abspath(__file__))

This way it doesn't matter how it was launched. But execute the above code as soon as possible (before any chdir)
checksums = open(checksums, 'r')
for fline in checksums.readlines():
You can directly iterate over the file:

for fline in checksums:

(readlines() reads the whole file contents in memory; I guess this is not an issue here, but in other cases it may be an important difference)
Although it's perfectly valid, I would not reccomend using the same name for two different things (checksums refers to the file name *and* the file itself)
changed_files_keys = changed_files.keys()
changed_files_keys.sort()
missing_files.sort()
print '\n'
if len(changed_files) != 0:
print 'File(s) changed:'
for key in changed_files_keys:
You don't have to copy the keys and sort; use the sorted() builtin:

for key in sorted(changed_files.iterkeys()):

Also, "if len(changed_files) != 0" is usually written as:

if changed_files:

The same for missing_files.
for x in range(len(missing_files)):
print '\t', missing_files[x]
That construct range(len(somelist)) is very rarely used. Either you don't need the index, and write:

for missing_file in missing_files:
print '\t', missing_file

Or you want the index too, and write:

for i, missing_file in enumerate(missing_files):
print '%2d: %s' % (i, missing_file)
def calculate_checksum(file_name):
file_to_check = open(file_name, 'rb')
chunk = 8196
Any reason to use such number? 8K is 8192; you could use 8*1024 if you don't remember the value. I usually write 1024*1024 when I want exactly 1M.

--
Gabriel Genellina

Jun 27 '08 #3
lev
On May 4, 10:04 pm, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
wrote:
En Sun, 04 May 2008 17:01:15 -0300, lev <levlozh...@gmail.comescribió:
* Change indentation from 8 spaces to 4
I like using tabs because of the text editor I use, the script at
the end is with 4 though.

Can't you configure it to use 4 spaces per indent - and not use "hard" tabs?
* Remove useless "pass" and "return" lines
I replaced the return nothing lines with passes, but I like
keeping them in case the indentation is ever lost - makes it easy to
go back to original indentation

I can't think of a case when only indentation "is lost" - if you have a crash or something, normally you lose much more than indentation... Simple backups or a SCM system like cvs/svn will help. So I don't see the usefulness of those "pass" statements; I think that after some time using Python you'll consider them just garbage, as everyone else.
* Temporarily change broken "chdir" line
removed as many instances of chdir as possible (a few useless ones
to accomodate the functions - changed functions to not chdir as much),
that line seems to work... I made it in case the script is launched
with say: 'python somedir\someotherdir\script.py' rather than 'python
script.py', because I need it to work in it's own and parent
directory.

You can determine the directory where the script resides using

import os
basedir = os.path.dirname(os.path.abspath(__file__))

This way it doesn't matter how it was launched. But execute the above codeas soon as possible (before any chdir)
checksums = open(checksums, 'r')
for fline in checksums.readlines():

You can directly iterate over the file:

for fline in checksums:

(readlines() reads the whole file contents in memory; I guess this is not an issue here, but in other cases it may be an important difference)
Although it's perfectly valid, I would not reccomend using the same name for two different things (checksums refers to the file name *and* the file itself)
changed_files_keys = changed_files.keys()
changed_files_keys.sort()
missing_files.sort()
print '\n'
if len(changed_files) != 0:
print 'File(s) changed:'
for key in changed_files_keys:

You don't have to copy the keys and sort; use the sorted() builtin:

for key in sorted(changed_files.iterkeys()):

Also, "if len(changed_files) != 0" is usually written as:

if changed_files:

The same for missing_files.
for x in range(len(missing_files)):
print '\t', missing_files[x]

That construct range(len(somelist)) is very rarely used. Either you don't need the index, and write:

for missing_file in missing_files:
print '\t', missing_file

Or you want the index too, and write:

for i, missing_file in enumerate(missing_files):
print '%2d: %s' % (i, missing_file)
def calculate_checksum(file_name):
file_to_check = open(file_name, 'rb')
chunk = 8196

Any reason to use such number? 8K is 8192; you could use 8*1024 if you don't remember the value. I usually write 1024*1024 when I want exactly 1M.

--
Gabriel Genellina
Thank you Gabriel, I did not know about a number of the commands you
posted, the use of 8196 was error on my part. I will change the script
to reflect your corrections later tonight, I have another project I
need to finish/comment/submit for corrections later on, so I will be
using the version of the script that I will come up with tonight.

Thank you for your invaluable advice,
The python community is the first online community that I have had
this much help from, Thank you all.
Jun 27 '08 #4

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

11
by: Jonny | last post by:
Netscape 7.02 is giving me a headache with a downloaded snow script. Starting with a blank page, I inserted the script and checked it in IE 6 and Netscape 7.02. Everything worked and looked fine. A...
1
by: Peter Forthmann | last post by:
Hi, I am testing the Intel 8.0 Compiler for Linux. For all the sources I have compiled It produces a lot faster code than the GNU g++ compiler. However, I must have something wrong, since I can't...
9
by: Rune | last post by:
Is it best to use double quotes and let PHP expand variables inside strings, or is it faster to do the string manipulation yourself manually? Which is quicker? 1) $insert = 'To Be';...
12
by: WantedToBeDBA | last post by:
Hi all, db2 => create table emp(empno int not null primary key, \ db2 (cont.) => sex char(1) not null constraint s_check check \ db2 (cont.) => (sex in ('m','f')) \ db2 (cont.) => not enforced...
14
by: joshc | last post by:
I'm writing some C to be used in an embedded environment and the code needs to be optimized. I have a question about optimizing compilers in general. I'm using GCC for the workstation and Diab...
21
by: mjbackues at yahoo | last post by:
Hello. I'm having a problem with the Visual Studio .net (2003) C++ speed optimization, and hope someone can suggest a workaround. My project includes many C++ files, most of which work fine...
5
by: wkaras | last post by:
I've compiled this code: const int x0 = 10; const int x1 = 20; const int x2 = 30; int x = { x2, x0, x1 }; struct Y {
1
YenRaven
by: YenRaven | last post by:
I have wrote a script that searches through the body's innerHTML to find and highlight each instance string omiting any matches in html tags. it works.. but slowly to the point that if you have more...
1
by: David | last post by:
It's too long to post here (160 lines) so here's the link: Neither link works for me. I get an error page "Error: invalid download linnk". How about you send it to the list as an attachment? ...
20
by: Ravikiran | last post by:
Hi Friends, I wanted know about whatt is ment by zero optimization and sign optimization and its differences.... Thank you...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.