Comments on my first script?

Phillip B Oldham

I'm keen on learning python, with a heavy lean on doing things the
"pythonic" way, so threw the following script together in a few hours
as a first-attempt in programming python.

I'd like the community's thoughts/comments on what I've done;
improvements I can make, "don'ts" I should be avoiding, etc. I'm not
so much bothered about the resulting data - for the moment it meets my
needs. But any comment is welcome!

#!/usr/bin/env python
## Open a file containing a list of domains (1 per line),
## request and parse it's whois record and push to a csv
## file.

import subprocess
import re

src = open('./domains.txt')

dest = open('./whois.csv', 'w');

sep = "|"
headers = ["Domain","Registrant","Registrant's Address","Registrar",
    "Registrant Type","Date Registered","Renewal Date","Last Updated",
    "Name Servers"]

dest.write(sep.join(headers)+"\n")

def trim( txt ):
    x = []
    for line in txt.split("\n"):
        if line.strip() == "":
            continue
        if line.strip().startswith('WHOIS'):
            continue
        if line.strip().startswith('>>>'):
            continue
        if line.strip().startswith('%'):
            continue
        if line.startswith("--"):
            return ''.join(x)
        x.append(" "+line)
    return "\n".join(x)

def clean( txt ):
    x = []
    isok = re.compile("^\s?([^:]+): ").match
    for line in txt.split("\n"):
        match = isok(line)
        if not match:
            continue
        x.append(line)
    return "\n".join(x);

def clean_co_uk( rec ):
    rec = rec.replace('Company number:', 'Company number -')
    rec = rec.replace("\n\n", "\n")
    rec = rec.replace("\n", "")
    rec = rec.replace(": ", ":\n")
    rec = re.sub("([^(][a-zA-Z']+\s?[a-zA-Z]*:\n)", "\n\g<0>", rec)
    rec = rec.replace(":\n", ": ")
    rec = re.sub("^[ ]+\n", "", rec)
    return rec

def clean_net( rec ):
    rec = rec.replace("\n\n", "\n")
    rec = rec.replace("\n", "")
    rec = rec.replace(": ", ":\n")
    rec = re.sub("([a-zA-Z']+\s?[a-zA-Z]*:\n)", "\n\g<0>", rec)
    rec = rec.replace(":\n", ": ")
    return rec

def clean_info( rec ):
    x = []
    for line in rec.split("\n"):
        x.append(re.sub("^([^:]+):", "\g<0", line))
    return "\n".join(x)

def record(domain, record):
    details = ['','','','','','','','','']
    for k, v in record.items():
        try:
            details[0] = domain.lower()
            result = {
                "registrant": lambda: 1,
                "registrant name": lambda: 1,
                "registrant type": lambda: 4,
                "registrant's address": lambda: 2,
                "registrant address1": lambda: 2,
                "registrar": lambda: 3,
                "sponsoring registrar": lambda: 3,
                "registered on": lambda: 5,
                "registered": lambda: 5,
                "domain registeration date": lambda: 5,
                "renewal date": lambda: 6,
                "last updated": lambda: 7,
                "domain last updated date": lambda: 7,
                "name servers": lambda: 8,
                "name server": lambda: 8,
                "nameservers": lambda: 8,
                "updated date": lambda: 7,
                "creation date": lambda: 5,
                "expiration date": lambda: 6,
                "domain expiration date": lambda: 6,
                "administrative contact": lambda: 2
            }[k.lower()]()
            if v != '':
                details[result] = v
        except:
            continue

    dest.write(sep.join(details)+"\n")

## Loop through domains
for domain in src:

    domain = domain.strip()

    if domain == '':
        continue

    rec = subprocess.Popen(["whois",domain],
        stdout=subprocess.PIPE).communicate()[0]

    if rec.startswith("No whois server") == True:
        continue

    if rec.startswith("This TLD has no whois server") == True:
        continue

    rec = trim(rec)

    if domain.endswith(".net"):
        rec = clean_net(rec)

    if domain.endswith(".com"):
        rec = clean_net(rec)

    if domain.endswith(".tv"):
        rec = clean_net(rec)

    if domain.endswith(".co.uk"):
        rec = clean_co_uk(rec)

    if domain.endswith(".info"):
        rec = clean_info(rec)

    rec = clean(rec)

    details = {}

    try:
        for line in rec.split("\n"):
            bits = line.split(': ')
            a = bits.pop(0)
            b = bits.pop(0)
            details[a.strip()] = b.strip().replace("\t", ", ")
    except:
        continue

    record(domain, details)

## Cleanup
src.close()
dest.close()

Jun 27 '08 #1

John Salerno

"Phillip B Oldham" <ph************@gmail.com> wrote in message
news:7e**********************************@26g2000hsk.googlegroups.com...

I'd like the community's thoughts/comments on what I've done;
improvements I can make, "don'ts" I should be avoiding, etc. I'm not
so much bothered about the resulting data - for the moment it meets my
needs. But any comment is welcome!

I'm no expert, but here are a few thoughts. I hope they help.

#!/usr/bin/env python
## Open a file containing a list of domains (1 per line),
## request and parse it's whois record and push to a csv
## file.

You might want to look into doc strings as a method of providing longer
documentation like this about what your program does.

dest = open('./whois.csv', 'w');

Semicolon!!!! :)

def trim( txt ):
    x = []
    for line in txt.split("\n"):
        if line.strip() == "":
            continue
        if line.strip().startswith('WHOIS'):
            continue
        if line.strip().startswith('>>>'):
            continue
        if line.strip().startswith('%'):
            continue
        if line.startswith("--"):
            return ''.join(x)

Is all this properly indented? One thing you can do is put each of these on
one line, since they are fairly simple:

if line.strip().startswith('WHOIS'): continue

although I still like proper indentation. But you have a lot of them so it
might save a good amount of space to do it this way.

Also, just my personal preference, I like to be consistent with the type of
quotes I use for strings. Here, you mix both single and double quotes on
different lines.

return "\n".join(x);

Semicolon!!!! :) :)

details = ['','','','','','','','','']

I don't have Python available to me right now, but I think you can do this
instead:

details = [''] * 9

except:
continue

Non-specific except clauses usually aren't preferred since they catch
everything, even something you might not want to catch.
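To make that concrete, here is a minimal sketch (the `FIELDS` table and the `lookup` helper are made up for illustration and are not part of the original script). Catching only `KeyError` handles the expected "unknown field" case while letting unrelated mistakes surface:

```python
# Hypothetical lookup table and helper, for illustration only.
FIELDS = {"registrar": 3, "renewal date": 6}

def lookup(key):
    """Return the column index for a field name, or None if unknown."""
    try:
        return FIELDS[key.lower()]
    except KeyError:
        # Only the expected "unknown field" case is swallowed here.
        return None

print(lookup("Registrar"))   # a known field
print(lookup("fax number"))  # an unknown field
# lookup(42) raises AttributeError (an int has no .lower()), so the bug
# is visible. A bare "except:" would have hidden it.
```

With a bare `except:` the last case would be skipped silently, which is usually harder to debug than a traceback.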

if domain == '':
continue

You can say:

if not domain:

instead of that equivalence test. But what does this if statement do?

if rec.startswith("No whois server") == True:
    continue

if rec.startswith("This TLD has no whois server") == True:
    continue

Like above, you don't need "== True" here.

if domain.endswith(".net"):
    rec = clean_net(rec)

if domain.endswith(".com"):
    rec = clean_net(rec)

if domain.endswith(".tv"):
    rec = clean_net(rec)

if domain.endswith(".co.uk"):
    rec = clean_co_uk(rec)

if domain.endswith(".info"):
    rec = clean_info(rec)

Hmm, my first thought is to do something like this with all these if tests:

for extension in [<list all the extensions as strings here>]:
    rec = clean_net(rec)

But for that to work, you may need to generalize the clean_net function so
it works for all of them, instead of having to call different functions
depending on the extension.

Anyway, I hope some of that helps!

Jun 27 '08 #2

John Salerno

"John Salerno" <jo******@NOSPAMgmail.com> wrote in message
news:48**********************@news.astraweb.com...

if domain.endswith(".net"):
    rec = clean_net(rec)

if domain.endswith(".com"):
    rec = clean_net(rec)

if domain.endswith(".tv"):
    rec = clean_net(rec)

if domain.endswith(".co.uk"):
    rec = clean_co_uk(rec)

if domain.endswith(".info"):
    rec = clean_info(rec)

Hmm, my first thought is to do something like this with all these if
tests:

for extension in [<list all the extensions as strings here>]:
    rec = clean_net(rec)

Whoops, you'd still need an if test in there I suppose!

for extension in [<list all the extensions as strings here>]:
    if domain.endswith(extension):
        rec = clean_net(rec)

Not sure if this is ideal.
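One possible alternative, offered only as a sketch: `str.endswith` also accepts a tuple of suffixes (Python 2.5 and later), so the extensions that share a cleaner can be tested in one call. The `pick_cleaner` helper and the stand-in cleaner bodies below are hypothetical; only the function names come from the original script.

```python
# Stand-ins for the real cleaners, so the example runs on its own.
def clean_net(rec):
    return "net:" + rec

def clean_co_uk(rec):
    return "co.uk:" + rec

def clean_info(rec):
    return "info:" + rec

def pick_cleaner(domain):
    """Return the cleaner for a domain, or None if no rule matches."""
    # endswith() takes a tuple: True if any of the suffixes matches.
    if domain.endswith((".net", ".com", ".tv")):
        return clean_net
    elif domain.endswith(".co.uk"):
        return clean_co_uk
    elif domain.endswith(".info"):
        return clean_info
    return None

cleaner = pick_cleaner("example.com")
print(cleaner.__name__)
```

This keeps one test per distinct cleaner without having to generalize `clean_net`.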

Jun 27 '08 #3

Chris

On Jun 12, 4:27 pm, Phillip B Oldham <phillip.old...@gmail.com> wrote:

I'm keen on learning python, with a heavy lean on doing things the
"pythonic" way, so threw the following script together in a few hours
as a first-attempt in programming python.

I'd like the community's thoughts/comments on what I've done;
improvements I can make, "don'ts" I should be avoiding, etc. I'm not
so much bothered about the resulting data - for the moment it meets my
needs. But any comment is welcome!

Just a few quick things before I leave work.

#!/usr/bin/env python
"""Open a file containing a list of domains (1 per line),
request and parse it's whois record and push to a csv
file.
""" # Rather use docstrings than multiline commenting like that.

def trim(txt):
    x = []
    for line in txt.splitlines(): # Strings have a built in function
        if not line.strip() or line.startswith('WHOIS') \
                or line.startswith('>>>') or line.startswith('%'):
            continue # you can do them in one if statement
        if line.startswith('--'): return ''.join(x)
        x.append(' '+line)
    return '\n'.join(x)

for domain in src:
    domain = domain.strip() # strip once so endswith() below sees no newline
    if not domain: continue # A line with nothing is False

    rec = subprocess.Popen(["whois",domain],
        stdout=subprocess.PIPE).communicate()[0]
    if rec.startswith('No whois server') \
            or rec.startswith('This TLD has no whois server'):
        continue # Startswith will return True/False so it is enough

    rec = trim(rec)
    if domain.endswith('.net'):
        rec = clean_net(rec)
    elif domain.endswith('.com'):
        # Rather use if/elif statements unless somehow you think you
        # will match more than one.
        ....

    for line in rec.splitlines():
        try:
            a, b = line.split(': ')[:2]
            details[a.strip()] = b.strip().replace('\t', ', ')
        except ValueError: # No matches (unpacking a short list raises ValueError)
            continue

Hope that's a start.

Jun 27 '08 #4

Phillip B Oldham

Thanks guys. Those comments are really helpful. The odd semi-colon is
my PHP background. Will probably be a hard habit to break, that
one! ;) If I do accidentally drop a semi-colon at the end of the line,
will that cause any weird errors?

Also, Chris, can you explain this:
a, b = line.split(': ')[:2]

I understand the first section, but I've not seen [:2] before.

Jun 27 '08 #5

Chris

On Jun 13, 9:38 am, Phillip B Oldham <phillip.old...@gmail.com> wrote:

Thanks guys. Those comments are really helpful. The odd semi-colon is
my PHP background. Will probably be a hard habit to break, that
one! ;) If I do accidentally drop a semi-colon at the end of the line,
will that cause any weird errors?

Also, Chris, can you explain this:
a, b = line.split(': ')[:2]

I understand the first section, but I've not seen [:2] before.

That's slicing at work. What it is doing is only taking the first two
elements of the list that is built by the line.split.
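A small sketch of what that looks like in practice (the sample line and the `split_field` helper are invented for illustration). Note that when the separator is missing, the slice still succeeds and it is the unpacking that fails, with a `ValueError`:

```python
line = "Registrar: Example Ltd: extra"
parts = line.split(": ")
print(parts)

# [:2] keeps at most the first two elements, so extra pieces are dropped.
a, b = parts[:2]
print(a, b)

def split_field(line):
    """Return (key, value), or None when the line has no ': ' separator."""
    try:
        key, value = line.split(": ")[:2]
    except ValueError:
        # Only one element came back, so it cannot be unpacked into two names.
        return None
    return key.strip(), value.strip()

print(split_field("Renewal Date: 12-Jun-2009"))
print(split_field("no separator here"))
```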

Jun 27 '08 #6

Bruno Desthuilliers

Phillip B Oldham wrote:

I'm keen on learning python, with a heavy lean on doing things the
"pythonic" way, so threw the following script together in a few hours
as a first-attempt in programming python.

I'd like the community's thoughts/comments on what I've done;
improvements I can make, "don'ts" I should be avoiding, etc. I'm not
so much bothered about the resulting data - for the moment it meets my
needs. But any comment is welcome!

Ok, since you asked for it, let's go:

#!/usr/bin/env python
## Open a file containing a list of domains (1 per line),
## request and parse it's whois record and push to a csv
## file.

import subprocess
import re

src = open('./domains.txt')

dest = open('./whois.csv', 'w');

Might be better to allow the user to pass source and destination as
arguments, defaulting to stdin and stdout.

Also, you may want to have a look at the csv module in the stdlib.
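As a rough sketch of the csv suggestion (written for Python 3; the `write_rows` helper and the sample rows are made up for illustration), `csv.writer` takes care of the separator and also quotes any field that happens to contain it, which a plain `sep.join(...)` does not:

```python
import csv
import io

def write_rows(dest, rows, sep="|"):
    """Write rows to an open file-like object using sep as the delimiter."""
    writer = csv.writer(dest, delimiter=sep, lineterminator="\n")
    writer.writerows(rows)

# An in-memory file stands in for the real destination here.
buf = io.StringIO()
write_rows(buf, [
    ["Domain", "Registrar"],
    ["example.com", "Foo | Bar Ltd"],  # value contains the separator
])
print(buf.getvalue(), end="")
```

Because `write_rows` only needs something with a `write` method, passing `sys.stdout` or a file opened in `'w'` mode works the same way.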

sep = "|"
headers = ["Domain","Registrant","Registrant's Address","Registrar",
    "Registrant Type","Date Registered","Renewal Date","Last Updated",
    "Name Servers"]

dest.write(sep.join(headers)+"\n")

def trim( txt ):
    x = []
    for line in txt.split("\n"):
        if line.strip() == "":
            continue
        if line.strip().startswith('WHOIS'):
            continue
        if line.strip().startswith('>>>'):
            continue
        if line.strip().startswith('%'):
            continue
        if line.startswith("--"):
            return ''.join(x)
        x.append(" "+line)
    return "\n".join(x)

You're doing way too many calls to line.strip(). Call it once and store
the result.

def trim_test(line):
    line = line.strip()
    if not line:
        return False
    for test in ("WHOIS", ">>>", "%",):
        if line.startswith(test):
            return False
    return True

def trim(txt):
    lines = []
    for line in txt.splitlines():
        if trim_test(line):
            if line.startswith("--"):
                return "".join(lines)
            lines.append(" " + line)
    return "\n".join(lines)

def clean( txt ):
    x = []
    isok = re.compile("^\s?([^:]+): ").match

Would be better to extract the regex compilation out of the function.

    for line in txt.split("\n"):
        match = isok(line)
        if not match:
            continue
        x.append(line)

If you don't use the match object itself, don't even bother to bind it:

    for line in txt.split("\n"):
        if not isok(line):
            continue
        x.append(line)

Then, you may find the intent and flow more obvious if you get rid of
the double negation (the not and the continue):

    for line in txt.splitlines():
        if isok(line):
            x.append(line)

which is easy to rewrite as either a list comprehension:

    x = [line for line in txt.splitlines() if isok(line)]

or in a more lispish/functional style:

    x = filter(isok, txt.splitlines())

Either way, you can now get rid of the binding to 'x' (a very bad name
for a list of lines BTW - what about something more explicit, like
'lines'?)

    return "\n".join(x);

isok = re.compile("^\s?([^:]+): ").match

def clean(txt):
    return "\n".join(filter(isok, txt.splitlines()))

def clean_co_uk( rec ):
    rec = rec.replace('Company number:', 'Company number -')
    rec = rec.replace("\n\n", "\n")

Given the following, the statement above is useless.

    rec = rec.replace("\n", "")

    rec = rec.replace(": ", ":\n")
    rec = re.sub("([^(][a-zA-Z']+\s?[a-zA-Z]*:\n)", "\n\g<0>", rec)
    rec = rec.replace(":\n", ": ")
    rec = re.sub("^[ ]+\n", "", rec)

All this could probably be simplified.

    return rec

def clean_net( rec ):
    rec = rec.replace("\n\n", "\n")
    rec = rec.replace("\n", "")
    rec = rec.replace(": ", ":\n")
    rec = re.sub("([a-zA-Z']+\s?[a-zA-Z]*:\n)", "\n\g<0>", rec)
    rec = rec.replace(":\n", ": ")
    return rec

Ditto.

def clean_info( rec ):
    x = []
    for line in rec.split("\n"):
        x.append(re.sub("^([^:]+):", "\g<0", line))
    return "\n".join(x)

def record(domain, record):
    details = ['','','','','','','','','']

details = [''] * 9

    for k, v in record.items():
        try:
            details[0] = domain.lower()
            result = {
                "registrant": lambda: 1,
                "registrant name": lambda: 1,
                "registrant type": lambda: 4,
                "registrant's address": lambda: 2,
                "registrant address1": lambda: 2,
                "registrar": lambda: 3,
                "sponsoring registrar": lambda: 3,
                "registered on": lambda: 5,
                "registered": lambda: 5,
                "domain registeration date": lambda: 5,
                "renewal date": lambda: 6,
                "last updated": lambda: 7,
                "domain last updated date": lambda: 7,
                "name servers": lambda: 8,
                "name server": lambda: 8,
                "nameservers": lambda: 8,
                "updated date": lambda: 7,
                "creation date": lambda: 5,
                "expiration date": lambda: 6,
                "domain expiration date": lambda: 6,
                "administrative contact": lambda: 2
            }[k.lower()]()

Ok, let's summarize. On each iteration, you define a dict with the very
same 21 key:value pairs. Isn't it a bit wasteful? What about defining
the dict only once, outside the function?

Also, the values in the dict are constant functions. Why not just use
the constant results of the functions then? I mean: what's wrong with
just:

{
    "registrant": 1,
    "registrant name": 1,
    "registrant type": 4,
    (etc...)
}

            if v != '':
                details[result] = v

As icing on the cake, you build this whole dict, look up a function
in it, and call the function *before* you even decide if you need that
result.

        except:
            continue

Friendly advice: *never* use a bare except clause that discards the
exception. Never ever do that.

Your except clause here should specifically catch KeyError. But anyway
you don't even need to worry about exceptions here, you just have to use
dict.get(key, default) instead.

FIELDS_POSITIONS = {
    "registrant": 1,
    "registrant name": 1,
    "registrant type": 4,
    "registrant's address": 2,
    # (etc...)
}

def record(domain, rec):
    details = [domain.lower()] + [''] * 8
    for k, v in rec.items():
        if v:
            pos = FIELDS_POSITIONS.get(k.lower(), None)
            if pos is not None:
                details[pos] = v

    # I'm leaving this here, but I'd personally split the
    # two unrelated concerns of formatting the record and
    # writing it somewhere.

    dest.write(sep.join(details)+"\n")

## Loop through domains
for domain in src:

    domain = domain.strip()

    if domain == '':
        continue

    rec = subprocess.Popen(["whois",domain],
        stdout=subprocess.PIPE).communicate()[0]

    if rec.startswith("No whois server") == True:
        continue

    if rec.startswith("This TLD has no whois server") == True:
        continue

    rec = trim(rec)

    if domain.endswith(".net"):
        rec = clean_net(rec)

    if domain.endswith(".com"):
        rec = clean_net(rec)

    if domain.endswith(".tv"):
        rec = clean_net(rec)

    if domain.endswith(".co.uk"):
        rec = clean_co_uk(rec)

    if domain.endswith(".info"):
        rec = clean_info(rec)

Since the domain is very unlikely to match more than one test, at least
use if/elif/.../else to avoid redundant useless tests.

Now *this* would have been a good use of a dict of functions:

REC_CLEANERS = {
    'net' : clean_net,
    'com' : clean_net,
    'tv' : clean_net,
    'uk' : clean_co_uk,
    # (etc...)
}

for domain in src:
    # code here
    ext = domain.rsplit('.', 1)[1]
    cleaner = REC_CLEANERS.get(ext, None)
    if cleaner:
        rec = cleaner(rec)

    rec = clean(rec)

    details = {}

    try:
        for line in rec.split("\n"):
            bits = line.split(': ')
            a = bits.pop(0)
            b = bits.pop(0)

If you expect only one ': ', then:
a, b = line.split(': ')

If you can have many but don't care about the others:
bits = line.split(': ')
a, b = bits[0], bits[1]

            details[a.strip()] = b.strip().replace("\t", ", ")
    except:

cf. above. Please, *don't* do that.

        continue

    record(domain, details)

## Cleanup
src.close()
dest.close()

There are other possible improvements of course. Like:

- putting the main loop in its own function taking source and dest (two
opened (resp. in 'r' and 'w' mode) file-like objects)
- conditionally calling it from the top-level *if* the script has been
called as a script (vs imported as a module) so you can reuse this code
from another script.

The test is:

if __name__ == '__main__':
    # has been called as a script
    pass
else:
    # has been imported
    pass
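
Put together, the two points above might look roughly like this. It is only a sketch: the `process` function and its placeholder body (lower-casing each domain) are invented for illustration, and the real whois lookup is deliberately left out.

```python
import io

def process(src, dest):
    """Read domains from src, write one cleaned line per domain to dest.

    Placeholder logic only: the real script would run whois here.
    Returns the number of domains written.
    """
    count = 0
    for line in src:
        domain = line.strip()
        if not domain:
            continue
        dest.write(domain.lower() + "\n")
        count += 1
    return count

if __name__ == "__main__":
    # Called as a script: run a tiny in-memory demo.
    out = io.StringIO()
    process(io.StringIO("Example.COM\n\nexample.net\n"), out)
    print(out.getvalue(), end="")
```

Since `process` only relies on iteration and `write`, a real script could pass it `sys.stdin` and `sys.stdout`, or two files it has opened, and another module could import it without triggering the demo.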

HTH

Jun 27 '08 #7

Aidan

Chris wrote:

On Jun 13, 9:38 am, Phillip B Oldham <phillip.old...@gmail.com> wrote:
Thanks guys. Those comments are really helpful. The odd semi-colon is
my PHP background. Will probably be a hard habit to break, that
one! ;) If I do accidentally drop a semi-colon at the end of the line,
will that cause any weird errors?

Also, Chris, can you explain this:
a, b = line.split(': ')[:2]

I understand the first section, but I've not seen [:2] before.

That's slicing at work. What it is doing is only taking the first two
elements of the list that is built by the line.split.

slicing is a very handy feature... I'll expand on it a little

OK so, first I'll create a sequence of integers

>>> seq = range(10)
>>> seq
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

"take element with index 4 and everything after it"

>>> seq[4:]
[4, 5, 6, 7, 8, 9]

"take everything up to, but not including, the element with index 4"

>>> seq[:4]
[0, 1, 2, 3]

"take the element with index 3 and everything up to, but not including,
the element with index 6"

>>> seq[3:6]
[3, 4, 5]

then there's the step argument

"take every second element from the whole sequence"

>>> seq[::2]
[0, 2, 4, 6, 8]

"take every second element from the element with index 2 up to, but not
including, the element with index 8"

>>> seq[2:8:2]
[2, 4, 6]

Hope that helps.

Jun 27 '08 #8

D'Arcy J.M. Cain

On Fri, 13 Jun 2008 10:19:38 +0200
Bruno Desthuilliers <br********************@websiteburo.invalid> wrote:

Ok, since you asked for it, let's go:

Good commentary. One small improvement:

REC_CLEANERS = {
    'net' : clean_net,
    'com' : clean_net,
    'tv' : clean_net,
    'uk' : clean_co_uk,
    # (etc...)
}

for domain in src:
    # code here
    ext = domain.rsplit('.', 1)[1]
    cleaner = REC_CLEANERS.get(ext, None)
    if cleaner:
        rec = cleaner(rec)

How about this?

for domain in src:
    # code here
    ext = domain.rsplit('.', 1)[1]
    rec = REC_CLEANERS.get(ext, lambda x: x)(rec)

I suppose you could predefine the default function as well. This saves
a binding and a test at the expense of a possible lambda call.
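
A sketch of the predefined-default variant, under a few assumptions: the `identity` and `clean_for` names are invented here, `clean_net` is a simplified stand-in, and `[-1]` is used instead of `[1]` so that a name without a dot does not raise `IndexError`.

```python
def identity(rec):
    """Default cleaner: return the record unchanged."""
    return rec

def clean_net(rec):
    # Simplified stand-in for the real cleaner.
    return rec.replace("\n\n", "\n")

REC_CLEANERS = {
    "net": clean_net,
    "com": clean_net,
}

def clean_for(domain, rec):
    """Look up the cleaner by extension and apply it in one step."""
    ext = domain.rsplit(".", 1)[-1]
    return REC_CLEANERS.get(ext, identity)(rec)

print(repr(clean_for("example.com", "a\n\nb")))
print(repr(clean_for("example.org", "a\n\nb")))
```

The trailing `(rec)` is what actually calls the function that `get` returns; without it, `rec` would end up bound to the function itself.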

--
D'Arcy J.M. Cain <da***@druid.net> | Democracy is three wolves
http://www.druid.net/darcy/ | and a sheep voting on
+1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.

Jun 27 '08 #9

Lie

On Jun 12, 10:10 pm, "John Salerno" <johnj...@NOSPAMgmail.com> wrote:

details = ['','','','','','','','','']

I don't have Python available to me right now, but I think you can do this
instead:

details = [''] * 9

Be careful with this. Since Python strings are immutable, this is OK, but
if you replicate a mutable item this way, the result can be nasty: every
slot ends up referring to the same object.
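
A short illustration of the difference (the variable names are arbitrary):

```python
# Replicating a mutable item: all three slots are the SAME list object.
rows = [[]] * 3
rows[0].append("x")
print(rows)

# A list comprehension builds a fresh list for each slot instead.
safe = [[] for _ in range(3)]
safe[0].append("x")
print(safe)

# With immutable strings the shortcut is fine: assignment rebinds one slot.
details = [""] * 9
details[0] = "example.com"
print(details[:2])
```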

Jun 27 '08 #10

Similar topics

1823

Why are comments preceeded by // not ignored?

by: deko | last post by:

Problem occurs only when running php script at the command line. Here is an example: <?php //myscript.php //-should be called from some page. //-sets cookie on visitor's first visit - expires in 2 days (default). //-does this and that (and also the other thing). //records special data in /var/blah.txt (when using another-script.php).

PHP

1706

Coding comments/suggestions - first python script - sshd/ftpd blocking

by: avinashc | last post by:

If anyone is interested in a /etc/hosts.deny automatic update script (Unix only) based on sshd/vsftpd attacks, here's a python script: http://www.aczoom.com/tools/blockhosts/ This is a beta release, and my first attempt at Python coding. Any comments, suggestions, pointers on using more common Python idioms or example coding snippets, etc, welcome! Thanks!

Python

5224

Javascript form validation - comments please

by: Stephen Poley | last post by:

I have quite often (as have probably many of you) come across HTML forms with irritating bits of Javascript attached. The last straw on this particular camel's back was a large form I was asked to complete in connection with attendance at a seminar. After spending more than 15 minutes on it, I clicked on the submit button - and nothing happened. Looking round the pages on Javascript form validation that Google produced for me (well,...

Javascript

2548

Comments and scripts

by: Safalra | last post by:

If I understand the specification corrently, comments are delimitted by -- and --, but these delimitters can only occur in a block marked by <! and >. Is this correct? If I try to validate something like  then the validator obviously complains about b: "invalid comment declaration: found name start character outside comment but inside comment declaration". Is there anything that can occur in such a place in HTML? I...

HTML / CSS

