I am trying to read a spreadsheet of possible domain names and output the length of each web page's source code (if the page exists). In testing this, I've come across a lot of errors: for example, when websites ask for a username/password, take too long to load, or no longer work. I am a newbie and am looking for help dealing with multiple errors.
First, is the basic code setup below the best way to deal with multiple errors? I would like the program to read the lines if possible, and return null (or string='') if something goes wrong. Do I have to do this ad hoc, adding new errors to my list each time I find one, or is there some way to say 'return null if any error comes up'?
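To make the "return null if any error comes up" idea concrete, here is a minimal sketch of a catch-all wrapper. It is an illustration, not a tested solution: `fetch_lines`, its 10-second default `timeout`, and the `opener` parameter are hypothetical names introduced here (the `opener` hook exists only so the function can be tried without a network connection), and the import fallback is there so the sketch runs on Python 2.6+ as well as Python 3.

```python
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2.6+


def fetch_lines(url, timeout=10, opener=urlopen):
    """Return the page's lines, or an empty list if anything goes wrong.

    `opener` is a hypothetical hook (defaulting to urlopen) so the
    function can be exercised with a fake opener instead of the network.
    """
    try:
        response = opener(url, timeout=timeout)
        try:
            return response.readlines()
        finally:
            # Always release the connection, even if the read fails.
            response.close()
    except Exception:
        # Catches ordinary errors (bad URL, timeout, reset connection, ...)
        # but still lets Ctrl-C (KeyboardInterrupt) stop the program.
        return []
```

Catching `Exception` rather than using a bare `except:` is a deliberate choice in this sketch: it avoids maintaining a growing list of error types, at the cost of also hiding genuine programming mistakes inside the `try` block, so it may be worth logging the exception before returning.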
Second, how do I handle this error?
InvalidURL: nonnumeric port: ''
It seems to happen on z=urlopen('http://www.'+domainTerm)
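One possible cause, offered as a guess rather than a confirmed diagnosis: the spreadsheet cell may already contain a scheme or a stray colon (for example `http://example.com`), so that prepending `'http://www.'` produces a host that ends in `:` with nothing after it. The sketch below cleans a raw cell before building the URL. `build_url` is a hypothetical helper, and the cleaning rules are assumptions about what the data might look like, not something taken from the actual file.

```python
def build_url(domain_term):
    """Turn a raw spreadsheet cell into 'http://www.<host>', or '' if unusable."""
    # Trim whitespace and any quote marks wrapped around the cell.
    term = domain_term.strip().strip('"\'')
    # Drop a scheme that is already present, e.g. 'http://example.com'.
    if '://' in term:
        term = term.split('://', 1)[1]
    # Keep only the host: discard any path, then any ':port' suffix.
    host = term.split('/', 1)[0].split(':', 1)[0]
    # Avoid doubling the 'www.' prefix.
    if host.startswith('www.'):
        host = host[4:]
    if not host:
        return ''
    return 'http://www.' + host
```

Printing `repr(domainTerm)` just before the `urlopen` call would show whether the failing row really has this shape.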
Third, how do I handle this error?
error: (10054, 'Connection reset by peer')
It seems to happen on lineList=z.readlines()
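Because this error is raised while reading rather than while opening, the read itself has to sit inside the `try`. A minimal sketch of catching it by name follows; `read_lines_or_empty` is a hypothetical helper, and it assumes the response object has `readlines()` and `close()` methods, as the object returned by `urlopen` does.

```python
import socket


def read_lines_or_empty(response):
    """Read all lines from an open response; return [] if the read fails."""
    try:
        return response.readlines()
    except (socket.error, IOError):
        # On Python 2, socket.error is what 'Connection reset by peer'
        # is raised as; on Python 3 both names are aliases of OSError.
        return []
    finally:
        # Close the response whether or not the read succeeded.
        response.close()
```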
Thank you,
Mitch
from mechanize import Browser
from urllib import urlopen
import re, time, urllib2, string

# Read the input spreadsheet; the first line is the header row.
f = open('rawST2.csv','r')
lines = f.readlines()
f.close()

# Write the header with two extra columns for the results.
f2 = open('toy4.csv','w')
f2.write(lines[0].rstrip()+',sourcelines'+',sourcecharacters'+"\n")
print(lines[0].rstrip()+",sourcelines"+",sourcecharacters")

# The domain name is in the sixth column of each data row.
domainTerms=[]
for i in range(1,len(lines)):
    domainTerms.append( lines[i].split(',')[5].rstrip() )

# Test with a single row for now.
x = domainTerms[59]
# Remove every occurrence of the term's first character.
domainTerm=string.replace(x,x[0],"")
try:
    z=urlopen('http://www.'+domainTerm)
    lineList=z.readlines()
except (urllib2.URLError, ValueError, IOError, AttributeError, TypeError):
    lineList=""