Hello,
I am trying to read in a spreadsheet of possible domain names and write out the length of each webpage's source code (if the page exists). In doing this, I have three small questions (I am a newbie and apologize if the questions are simple):
1. How do I convert the length of the page to a string? I have looked around the web for Python 'tostring' and found several individually created functions, but I tried a few and had problems.
2. What is the best way to handle errors when a domain phrase doesn't lead to a good website? This will happen (I think) with the line z=br.open('http://www.'+domainTerm), for which the domainTerm might not lead to an active website.
3. Instead of getting the total number of characters in the page source (which I get from len(page)), is there any way to get the number of lines?
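For questions 1 and 3, here is a minimal sketch of what might work, using only built-ins. The sample page text and the domain name are made up for illustration; in the script, page would come from z.read().

```python
# Hypothetical stand-in for the text returned by z.read().
page = "<html>\n<body>hello</body>\n</html>\n"

# Question 1: str() turns the integer from len() into a string,
# so it can be joined to other strings with +.
size_field = str(len(page))
row = "example.com" + "," + size_field + "\n"

# Question 3: splitlines() breaks the text at line boundaries,
# so the length of the resulting list is the number of lines.
line_count = len(page.splitlines())
```

Here row is a complete CSV line ready for f2.write(), and line_count could be written out the same way after passing it through str().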
Thank you,
Mitch
from mechanize import Browser
import re, time, urllib2

def MakeBrowser():
    # Build a mechanize Browser that sends a Firefox-like User-agent
    # and does not consult robots.txt.
    b = Browser()
    headerString = 'mozilla/5.0 (x11; u; linux i686; en-us; rv:1.7.12) ' + \
                   'gecko/20050922 firefox/1.0.7 (debian package 1.0.7-1)'
    h = [('User-agent', headerString)]
    b.addheaders = h
    b.set_handle_robots(False)
    return(b)

f = open('bizornot1.csv','r')
lines = f.readlines()
f.close()
f2 = open('bizornot1_new.csv','w')
# Copy the header row and append the new column name.
f2.write(lines[0].rstrip()+',PageSize'+"\n")
print(lines[0].rstrip()+",PageSize")
for i in range(len(lines)-1):
    domainTerm = domainTerms[i]  # note: domainTerms is not defined in this snippet
    br = MakeBrowser()
    z=br.open('http://www.'+domainTerm)  # question 2: may fail for a dead domain
    page=z.read()
    f2.write(lines[i+1].rstrip()+','+len(page)+"\n")  # question 1: str + int
    print(lines[i+1].rstrip()+','+len(page))
f2.close()
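
For question 2, one possible approach is to wrap the open and read calls in try/except and write an empty field when the domain cannot be reached. The sketch below is only an illustration: fetch_page_size is a hypothetical helper name, and it assumes the failure surfaces as an IOError (in Python 2, urllib2.URLError and urllib2.HTTPError are both IOError subclasses). Other failures, such as a malformed response, may need their own except clause.

```python
def fetch_page_size(open_url, url):
    """Return len(page) as a string, or '' if the URL cannot be opened.

    open_url is any callable that takes a URL and returns an object
    with a read() method, for example br.open from the script.
    """
    try:
        response = open_url(url)
        page = response.read()
    except IOError:
        # Assumption: a dead domain, refused connection or HTTP error
        # code is reported as an IOError subclass.
        return ''
    return str(len(page))
```

Inside the loop this would replace the br.open / z.read() pair, e.g. size = fetch_page_size(br.open, 'http://www.'+domainTerm), and size can then be joined to the rest of the row with +. Passing the opener in as an argument also makes it easy to try the function with a stub instead of a live connection.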