Link Checking Issues - Sub domains

Hi,

I have written this script to run as a cron job that loops through a
text file containing a list of URLs. It works fine for most of the links.
However, a number of the URLs are subdomains (they are government
sites) such as http://basename.airforce.mil, and these links always
return 400 errors even though the sites exist.

Is there a way to get around this?

Here is the script:

import httplib
from urlparse import urlparse

class LinkChecker:

    def oldStuff():
        p = urlparse(url)
        h = HTTP(p[1])
        h.putrequest('HEAD', p[2])
        h.endheaders()
        if h.getreply()[0] == 200: return 1
        else: return 0

    def check(self):
        print "\nLooping through the file, line by line."

        # define default values for the parameters
        text_file = open("/home/jjaffe/pythonModules/JAMRSscripts/urls.txt", "r")
        output = ""
        errors = "=================== ERRORS (website exists but 404, 503 etc ): ===================\n"
        failures = "\n=================== FAILURES (cannot connect to website at all): ===================\n"
        eCount = 0
        fCount = 0

        # loop through each line and see what the response code is
        for line in text_file:
            p = urlparse(line)
            try:
                conn = httplib.HTTPConnection(p[1])
                conn.request("GET", p[2])
                r1 = conn.getresponse()
                if r1.status != 200:  # if the response code was not success (200) then report the error
                    errors += "\n "+str(r1.status)+" error for: "+p[1]+p[2]
                    eCount = (eCount + 1)
                data1 = r1.read()
                conn.close()
            except:  # the connection attempt timed out - hence the website doesn't even exist
                failures += "\n Could not create connection object: "+p[1]+p[2]
                fCount = (fCount + 1)
        text_file.close()

        # see if there were errors and create output string
        if (eCount == 0) and (fCount == 0):
            output = "No errors or failures to report"
        else:
            output = errors+"\n\n"+failures

        print output

if __name__ == '__main__':
    lc = LinkChecker()
    lc.check()
    del lc

Thanks in advance.
Aug 5 '08 #1


rpupkin77 wrote:
> Hi,
>
> I have written this script to run as a cron job that loops through a
> text file containing a list of URLs. It works fine for most of the links.
> However, a number of the URLs are subdomains (they are government
> sites) such as http://basename.airforce.mil, and these links always
> return 400 errors even though the sites exist.

Have you looked at urllib/urllib2 (urllib.request in 3.0)
for checking links?
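
A minimal sketch of that suggestion, written for Python 3, where urllib2 became urllib.request. The function name check_url, the injectable opener parameter, and the returned tuples are assumptions made for illustration and are not part of the original script. It strips the trailing newline that every line read from the URL file carries, then maps urllib.error.HTTPError onto the script's "errors" bucket and urllib.error.URLError onto its "failures" bucket.

```python
import urllib.error
import urllib.request


def check_url(line, opener=urllib.request.urlopen, timeout=10):
    """Return ("ok" | "error" | "failure", detail) for one line of the URL file.

    `opener` is a hypothetical hook so the function can be exercised
    without network access; by default it is urllib.request.urlopen.
    """
    url = line.strip()  # lines read from a file keep their trailing "\n"
    try:
        response = opener(url, timeout=timeout)
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status.
        # HTTPError is a subclass of URLError, so it must be caught first.
        return ("error", exc.code)
    except urllib.error.URLError as exc:
        # No usable answer at all (DNS failure, refused connection, ...).
        return ("failure", str(exc.reason))
    try:
        return ("ok", response.status)
    finally:
        response.close()
```

Looping over the file then reduces to calling check_url(line) for each line and appending to the matching report section.
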
If 'http://basename.airforce.mil' works when typed into your browser,
this passage from the documentation for urllib.request.Request might be
relevant:

"headers should be a dictionary, and will be treated as if add_header()
was called with each key and value as arguments. This is often used to
“spoof” the User-Agent header, which is used by a browser to identify
itself – some HTTP servers only allow requests coming from common
browsers as opposed to scripts. For example, Mozilla Firefox may
identify itself as "Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127
Firefox/2.0.0.11", while urllib's default user agent string is
"Python-urllib/2.6" (on Python 2.6)."
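
As a rough illustration of the passage quoted above, the header can be set like this in Python 3. The helper name build_request and the default User-Agent value (copied from the Firefox example in the quote) are assumptions; whether the airforce.mil hosts actually filter on User-Agent is not established in this thread.

```python
import urllib.request

# Browser-like value copied from the documentation excerpt above;
# any realistic string could be substituted.
BROWSER_UA = "Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"


def build_request(url, user_agent=BROWSER_UA, method="HEAD"):
    """Build a Request that identifies itself with `user_agent`.

    HEAD asks only for the status line and headers, which is usually enough
    for a link checker; some servers reject HEAD, in which case pass "GET".
    """
    return urllib.request.Request(
        url.strip(),  # drop the newline left over from reading the file
        headers={"User-Agent": user_agent},
        method=method,
    )


# Constructing the Request does not touch the network.
req = build_request("http://basename.airforce.mil/")
print(req.get_method(), req.get_header("User-agent"))
```

The resulting object can be passed to urllib.request.urlopen(req) in place of a bare URL string.
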
Aug 5 '08 #2
