Determine Whether File Exists On HTTP Server

Hi, I'm trying to determine whether a given URL exists. I'm new to Python
but I think that urllib is the tool for the job. However, if I give it a
non-existent file, it simply returns the 404 page. Aside from grepping this
for '404', is there a better way to do this? (Preferably, there is a
solution that can be applied to both HTTP and FTP.) Thanks in advance.
Jul 18 '05 #1
On Saturday 22 May 2004 12:28 am, OvErboRed wrote:
> Hi, I'm trying to determine whether a given URL exists. I'm new to Python
> but I think that urllib is the tool for the job. However, if I give it a
> non-existent file, it simply returns the 404 page. Aside from grepping this
> for '404', is there a better way to do this? (Preferably, there is a
> solution that can be applied to both HTTP and FTP.) Thanks in advance.


Try urllib2.urlopen, and put a try/except block around it. Here's what an
unhandled exception from a 404 response looks like:

Python 2.3.3 (#1, May 14 2004, 09:49:22)
[GCC 3.3.2 20031218 (Gentoo Linux 3.3.2-r5, propolice-3.3-7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib2
>>> handle = urllib2.urlopen('http://google.com/this_page_doesnt_exist')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.3/urllib2.py", line 129, in urlopen
    return _opener.open(url, data)
  File "/usr/lib/python2.3/urllib2.py", line 326, in open
    '_open', req)
  File "/usr/lib/python2.3/urllib2.py", line 306, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.3/urllib2.py", line 901, in http_open
    return self.do_open(httplib.HTTP, req)
  File "/usr/lib/python2.3/urllib2.py", line 895, in do_open
    return self.parent.error('http', req, fp, code, msg, hdrs)
  File "/usr/lib/python2.3/urllib2.py", line 346, in error
    result = self._call_chain(*args)
  File "/usr/lib/python2.3/urllib2.py", line 306, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.3/urllib2.py", line 472, in http_error_302
    return self.parent.open(new)
  File "/usr/lib/python2.3/urllib2.py", line 326, in open
    '_open', req)
  File "/usr/lib/python2.3/urllib2.py", line 306, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.3/urllib2.py", line 901, in http_open
    return self.do_open(httplib.HTTP, req)
  File "/usr/lib/python2.3/urllib2.py", line 895, in do_open
    return self.parent.error('http', req, fp, code, msg, hdrs)
  File "/usr/lib/python2.3/urllib2.py", line 352, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.3/urllib2.py", line 306, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.3/urllib2.py", line 412, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found
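
The try/except idea can be sketched roughly as follows. This is only an illustration, not code from the thread: it assumes Python 3, where urllib2 was split into urllib.request and urllib.error, and the function name url_exists and its timeout parameter are made up for the example. It treats a 404 as "does not exist" and lets every other error propagate, since a 500 or a refused connection says nothing about whether the file is there.

```python
import urllib.error
import urllib.request


def url_exists(url, timeout=10):
    """Return True if the URL opens, False if the server answers 404.

    Hypothetical helper for illustration. Any other failure (other HTTP
    status codes, DNS or connection errors) is re-raised to the caller.
    """
    try:
        response = urllib.request.urlopen(url, timeout=timeout)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # some other HTTP error: existence is unknown
    response.close()
    return True
```

urlopen also accepts ftp:// URLs, but there a missing file generally shows up as a plain URLError rather than an HTTPError with a status code, so the 404 check above would not apply as written.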

--
Troy Melhase, tr**@gci.net
--
When Christ calls a man, he bids him come and die. - Dietrich Bonhoeffer
Jul 18 '05 #2
This works with HTTP:

import sys      # exc_info
import httplib  # HTTPConnection

HOST = "www.python.org"
PAGE = "/path/to/some/file.html"

try:
    c = httplib.HTTPConnection(HOST)
    # c._http_vsn = 10; c._http_vsn_str = "HTTP/1.0"
    c.connect()
    c.putrequest("GET", PAGE)
    c.endheaders()
    r = c.getresponse()
    print "%s\n%s\n%s\n" % (r.status, r.reason, r.msg)
    if r.status == 200:  # OK
        print "%s exists" % PAGE
        PageContent = r.read()  # this is the requested html file in a string
    elif r.status == 404:  # not found
        print "%s does not exist" % PAGE
        Page404 = r.read()  # this is the 404 page in a string
    else:
        print "%s : status %s %s %s" % (PAGE, r.status, r.reason, r.msg)
except:
    print sys.exc_info()[1]
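
If only the existence check matters, a HEAD request avoids downloading the page body, since the server answers with just the status line and headers. A minimal sketch of that variation, assuming Python 3 (where httplib is named http.client); the function name head_status and its parameters are hypothetical, and some servers may answer HEAD differently from GET.

```python
import http.client


def head_status(host, path, port=80, timeout=10):
    """Send a HEAD request and return the (status, reason) pair.

    Hypothetical helper for illustration; no response body is transferred.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.reason
    finally:
        conn.close()  # always release the socket, even on errors
```

A caller would then compare the returned status against 200 and 404 the same way the code above does with r.status.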

Greetings
Harald Walter

Jul 18 '05 #3
