Getting the size of sourcecode

mh121
7 New Member
Hello,

I am trying to input a spreadsheet of possible domain names and output the length of the sourcecode of the webpage (if it exists). In doing this, I have three small questions (I am a newbie and apologize if the questions are simple):

1. How do I convert the length of the page to a string? I have looked around the web for Python 'tostring' and found several individually created functions, but I tried a few and had problems.

2. What is the best way to handle errors when a domain phrase doesn't lead to a good website? This will happen (I think) with the line z=br.open('http://www.'+domainTerm), for which the domainTerm might not lead to an active website.

3. Instead of getting the total number of characters on the sourcepage (which I get by looking at len(page) ), is there any way to get the number of lines?

Thank you,
Mitch

    from mechanize import Browser
    import re, time, urllib2

    def MakeBrowser():
        b = Browser()
        headerString = 'mozilla/5.0 (x11; u; linux i686; en-us; rv:1.7.12) ' + \
                       'gecko/20050922 firefox/1.0.7 (debian package 1.0.7-1)'
        h = [('User-agent', headerString)]
        b.addheaders = h
        b.set_handle_robots(False)
        return(b)

    f = open('bizornot1.csv','r')
    lines = f.readlines()
    f.close()
    f2 = open('bizornot1_new.csv','w')
    f2.write(lines[0].rstrip()+',PageSize'+"\n")
    print(lines[0].rstrip()+",PageSize")

    for i in range(len(lines)-1):
        domainTerm = domainTerms[i]
        br = MakeBrowser()
        z=br.open('http://www.'+domainTerm)
        page=z.read()
        f2.write(lines[i+1].rstrip()+','+len(page)+"\n")
        print(lines[i+1].rstrip()+','+len(page))

    f2.close()
Aug 9 '07 #1
bvdet
2,851 Recognized Expert Moderator Specialist
I am not familiar with the 'mechanize' module. You can do the following with the 'urllib' module:
    from urllib import urlopen

    h = urlopen('http://www.somewebsite.com/')

    # source = h.read() # read page into a string
    lineList = h.readlines() # read page into a list of strings
    info = h.info()
    trueURL = h.geturl()

    print 'The number of lines is %d' % len(lineList)

    print 'The number of words is %d' % sum([len(line.strip().split()) for line in lineList])

    h.close()

    try:
        h = urlopen('http://www.invalidURL.com/')
    except IOError, e:
        print e

Output:

    >>> The number of lines is 110
    The number of words is 807
    [Errno socket error] (7, 'getaddrinfo failed')

    >>> info
    <httplib.HTTPMessage instance at 0x00E7A3C8>
    >>> print info
    Date: Thu, 09 Aug 2007 02:25:43 GMT
    Server: Apache
    Last-Modified: Tue, 07 Aug 2007 02:41:23 GMT
    ETag: "744141-2b29-46b7dbd3"
    Accept-Ranges: bytes
    Content-Length: 11049
    Connection: close
    Content-Type: text/html

    >>> trueURL
    'http://www.bvdetailing.com/'
    >>>
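
For anyone reading this on Python 3: urlopen has moved to urllib.request, and readlines() returns bytes rather than str. A rough port of the line and word count above might look like the sketch below. The function name count_lines_and_words, the opener parameter, and the canned fake_opener are my own assumptions, added so the example runs without a network connection; they are not part of urllib.

```python
import io
from urllib.request import urlopen  # Python 3 location of urlopen


def count_lines_and_words(url, opener=urlopen, timeout=10):
    """Return (line_count, word_count) for the page at url.

    opener is a hypothetical hook: any callable that takes (url, timeout=...)
    and returns a file-like object. It defaults to the real urlopen.
    """
    with opener(url, timeout=timeout) as response:
        line_list = response.readlines()  # list of bytes, one item per line
    word_count = sum(len(line.split()) for line in line_list)
    return len(line_list), word_count


def fake_opener(url, timeout=None):
    """Stand-in for urlopen that serves canned bytes (illustration only)."""
    return io.BytesIO(b"<html>\n<body>Hello world</body>\n</html>\n")


line_count, word_count = count_lines_and_words(
    "http://www.somewebsite.com/", opener=fake_opener
)
print("The number of lines is %d" % line_count)
print("The number of words is %d" % word_count)
```

Dropping the opener argument makes the function fetch the real page with urlopen.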
Aug 9 '07 #2
robin746
5 New Member
1. How do I convert the length of the page to a string? I have looked around the web for Python 'tostring' and found several individually created functions, but I tried a few and had problems.
Please clarify. If you have read the page into a variable, it is (likely) already a string. The str() function converts to a string but I do not think that is what you want.

2. What is the best way to handle errors when a domain phrase doesn't lead to a good website?
What error does your module spit back? Wrap the code you have in a try/except block specifying this error, and then do what you want when it happens. For example, if the error is BadSillyError, do this:
    try:
        pass  # code to run
    except BadSillyError:
        pass  # what to do on failure
    else:
        pass  # continue with rest of code if error does not happen

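To make the pattern concrete, here is a minimal Python 3 sketch that uses the standard library instead of mechanize (I have not checked which exception mechanize raises, so treat the exception class as an assumption to verify against your own traceback). The names page_size, opener and failing_opener are made up for the example.

```python
from urllib.error import URLError
from urllib.request import urlopen


def page_size(url, opener=urlopen, timeout=10):
    """Return the length of the page in bytes, or None if the fetch fails.

    opener is a hypothetical hook so the function can be tried offline.
    """
    try:
        with opener(url, timeout=timeout) as response:
            page = response.read()
    except OSError as exc:
        # In Python 3, URLError and HTTPError are subclasses of OSError,
        # so this also covers DNS failures and refused connections.
        print("Skipping %s: %s" % (url, exc))
        return None
    else:
        # Only reached when no exception was raised.
        return len(page)


def failing_opener(url, timeout=None):
    """Simulate a domain that does not resolve (illustration only)."""
    raise URLError("getaddrinfo failed")


size = page_size("http://www.invalidURL.com/", opener=failing_opener)
```

Returning None for a dead domain lets the calling loop write an empty PageSize cell and carry on with the next row instead of crashing.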
3. Instead of getting the total number of characters on the sourcepage (which I get by looking at len(page) ), is there any way to get the number of lines?
If you know lines are separated by newline characters you can do something like:

    lines = page.split('\n')
    number_of_lines = len(lines)

But then you might want to eliminate blank lines. Or comments.
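
A small sketch of the difference, using a made-up page string: split('\n') leaves an empty trailing item when the text ends with a newline and keeps stray '\r' characters, while str.splitlines() handles '\n', '\r\n' and '\r' alike. Filtering on line.strip() is one way (an assumption about what "blank" should mean) to drop blank lines.

```python
# Made-up page content mixing Windows and Unix line endings.
page = "<html>\r\n\r\n<body>Hello</body>\n   \n</html>\n"

# Naive split: counts the empty string after the final newline as a line.
split_count = len(page.split("\n"))

# splitlines() understands \n, \r\n and \r, and adds no trailing empty item.
line_count = len(page.splitlines())

# Keep only lines that contain something other than whitespace.
non_blank = [line for line in page.splitlines() if line.strip()]

print(split_count, line_count, len(non_blank))  # prints: 6 5 3
```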
Aug 9 '07 #3
robin746
5 New Member
I have just published a full Line Of Code Counter that you can adapt to your purpose.
Aug 9 '07 #4
mh121
7 New Member
Thank you very much for your comments. What I meant by the 'tostring' function was actually just the str() function you mentioned. I tried out many of your suggestions today and, after trying out more tomorrow, if I continue to have questions, I will repost.
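
For later readers, a minimal sketch of that str() fix applied to the write line from the first post. The values of page and row are stand-ins I made up for z.read() and lines[i+1].rstrip().

```python
page = "<html>\n<body>Hello</body>\n</html>\n"  # stand-in for z.read()
row = "example.com"                             # stand-in for lines[i+1].rstrip()

# len() returns an int, and an int cannot be concatenated to a str directly;
# str() converts it to text first.
out_line = row + "," + str(len(page)) + "\n"

# String formatting performs the same conversion implicitly.
same_line = "%s,%d\n" % (row, len(page))

print(out_line, end="")  # prints: example.com,34
```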
Aug 10 '07 #5
