Getting HTTP responses - a python linkchecking script.

Hi Folks,

I'm thinking about writing a script that can be run over a whole site
and produce a report about broken links etc...
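
Running over a whole site means collecting the links on each page before checking them. A minimal sketch of that step, assuming Python 3's standard library (html.parser and urllib.parse); the names LinkCollector and extract_links are made up for illustration and are not from any post in this thread:

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collect the href targets of <a> tags as absolute URLs (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL and drop any
                # "#fragment", since it does not change the resource requested.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def extract_links(html, base_url):
    """Return the absolute link targets found in an HTML string."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

This only looks at <a href="..."> and does not filter out schemes such as mailto:, so a real checker would probably need to skip those and also look at images, stylesheets and so on.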

I've been playing with the urllib2 and httplib modules as a starting
point and have found that with urllib2 it doesn't seem possible to get
HTTP status codes.

I've had more success with httplib...
First I create a new HTTPConnection object with a given hostname and
port, then I try connecting to the host and catch any socket errors,
which I assume mean the server is either down or no longer exists at
that address.
If the connection succeeds, I request the resource in question, get
the response and check the status code.
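
The steps above could be sketched roughly as follows. This assumes Python 3, where httplib was renamed http.client; the function name check_link, the use of a HEAD request and the timeout value are illustrative assumptions, not details given in the post:

```python
import http.client  # Python 3 name for the httplib module


def check_link(host, path="/", port=80, timeout=10):
    """Return the HTTP status code for path on host, or None if no answer."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.connect()
    except OSError:
        # Socket-level failure (refused, DNS error, timeout): the server
        # is down or no longer exists at this address.
        return None
    try:
        # HEAD is an assumption made here to avoid downloading the body;
        # some servers handle it badly, in which case GET could be used.
        conn.request("HEAD", path)
        return conn.getresponse().status
    except (http.client.HTTPException, OSError):
        # Connected, but the exchange itself failed.
        return None
    finally:
        conn.close()
```

A call such as check_link("www.example.com", "/index.html") would then give back a number like 200 or 404, or None when the host could not be reached. Unlike urllib2, this does not follow redirects, so a 301 or 302 shows up as such.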

So, I've got the tools I need to do the job sufficiently. Just
wondering whether anybody can recommend any alternatives.

Cheers,
-Blair

May 8 '06 #1
bl*************@gmail.com:
> with urllib2 it doesn't seem possible to get HTTP status codes.

except urllib2.HTTPError, e:
    if e.code == 403:
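
For context, a fuller version of that fragment might look like the sketch below. It assumes Python 3, where urllib2 was split into urllib.request and urllib.error; the function name fetch_status and the timeout value are made up for illustration:

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as e:
        # Raised for error statuses such as 403 or 404; the code is on
        # the exception object. Must be caught before the broader OSError.
        return e.code
    except OSError:
        # URLError is a subclass of OSError, so this also covers
        # unreachable hosts, refused connections and timeouts.
        return None
```

Note that urlopen follows redirects by default, so this reports the status of the final page rather than an intermediate 301 or 302.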

--
René Pijlman
May 8 '06 #2
Rene Pijlman wrote:
> bl*************@gmail.com:
>> with urllib2 it doesn't seem possible to get HTTP status codes.
>
> except urllib2.HTTPError, e:
>     if e.code == 403:

Thanks. Is there documentation for this available somewhere online? I
can't see it anywhere obvious in the library reference.

Cheers,
-Blair

May 8 '06 #3
bl*************@gmail.com:
> Rene Pijlman wrote:
>> bl*************@gmail.com:
>>> with urllib2 it doesn't seem possible to get HTTP status codes.
>>
>> except urllib2.HTTPError, e:
>>     if e.code == 403:
>
> Thanks. Is there documentation for this available somewhere online? I
> can't see it anywhere obvious in the library reference.

No, this seems to be missing from the documentation.

--
René Pijlman
May 8 '06 #4
bl*************@gmail.com wrote:
> Rene Pijlman wrote:
>> bl*************@gmail.com:
>>> with urllib2 it doesn't seem possible to get HTTP status codes.
>>
>> except urllib2.HTTPError, e:
>>     if e.code == 403:
>
> Thanks. Is there documentation for this available somewhere online? I
> can't see it anywhere obvious in the library reference.

You can help by mentioning where you'd most expect to find it in a
Python documentation bug (or enhancement) report. Then you too can be a
Python contributor.

--Scott David Daniels
sc***********@acm.org
May 8 '06 #5
bl*************@gmail.com wrote:
> Hi Folks,
>
> I'm thinking about writing a script that can be run over a whole site
> and produce a report about broken links etc...
>
> I've been playing with the urllib2 and httplib modules as a starting
> point and have found that with urllib2 it doesn't seem possible to get
> HTTP status codes.
>
> I've had more success with httplib...
> First I create a new HTTPConnection object with a given hostname and
> port, then I try connecting to the host and catch any socket errors,
> which I assume mean the server is either down or no longer exists at
> that address.
> If the connection succeeds, I request the resource in question, get
> the response and check the status code.
>
> So, I've got the tools I need to do the job sufficiently. Just
> wondering whether anybody can recommend any alternatives.
>
> Cheers,
> -Blair

have a look at

urllib2 - The Missing Manual

http://www.voidspace.org.uk/python/a.../urllib2.shtml
May 8 '06 #6
