
urlretrieve() questions

I'm building an app that needs to download a file from the
web.

I'm trying to make sure I catch any issues with the download
but I've run into a problem.

Here's what I have so far:

    import urllib

    try:
        urllib.urlretrieve(url, filename)
        print "File: ", filename, " downloaded"
    except IOError:
        print "IOError File Not Found: ", url

Pretty straightforward... but what I'm finding is if the
url is pointing to a file that is not there, the server
returns a file that's a web page displaying a 404 error.

Anyone have any recommendations for handling this?
--

Rene
Dec 23 '05 #1


> Pretty straightforward... but what I'm finding is if the
> url is pointing to a file that is not there, the server
> returns a file that's a web page displaying a 404 error.
>
> Anyone have any recommendations for handling this?

You're right, that is NOT documented in a way that's easy to find!

What I was able to find is how to do what you want using urllib2 instead of
urllib. I found an old message thread that touches on the topic:
http://groups.google.com/group/comp....c7bfec87e18ba9
(also accessible as http://tinyurl.com/952dw). Here's a quick summary:
-----------------------------------------------------------------------

From: Ivan Karajas <my_full_name_concatena...@myrealbox.com>
Newsgroups: comp.lang.python
Date: Wed, 28 Apr 2004 23:03:54 -0800
Subject: Re: 404 errors

On Tue, 27 Apr 2004 10:46:47 +0200, Tut wrote:
> On Tue, 27 Apr 2004 11:00:57 +0800, Derek Fountain wrote:
>> Some servers respond with a nicely formatted bit of HTML explaining the
>> problem, which is fine for a human, but not for a script. Is there some
>> flag or something definitive on the response which says "this is a 404
>> error"?
>
> Maybe catch the urllib2.HTTPError?


This kind of answers the question. urllib will let you read whatever it
receives, regardless of the HTTP status; you need to use urllib2 if you
want to find out the status code when a request results in an error (any
HTTP status beginning with a 4 or 5). This can be done like so:

    import urllib2

    try:
        asock = urllib2.urlopen("http://www.foo.com/qwerty.html")
    except urllib2.HTTPError, e:
        # e.code holds the numeric HTTP status, e.g. 404
        print e.code

The value in urllib2.HTTPError.code comes from the first line of the web
server's HTTP response, just before the headers begin, e.g. "HTTP/1.1 200
OK", or "HTTP/1.1 404 Not Found".
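
Applied to the download case from the original question, the same idea might look roughly like the sketch below. It is written for Python 3, where urllib2 was split into urllib.request and urllib.error. The function name download and its opener parameter are made up for illustration; opener is there only so the function can be exercised without a network connection.

```python
import shutil
import urllib.error
import urllib.request


def download(url, filename, opener=urllib.request.urlopen):
    """Save url to filename. Return True on success, False on any error.

    `download` and `opener` are hypothetical names for this sketch;
    by default the real urllib.request.urlopen is used.
    """
    try:
        response = opener(url)
    except urllib.error.HTTPError as err:
        # Raised for 4xx and 5xx responses; err.code is the numeric status.
        print("HTTP error", err.code, "for", url)
        return False
    except urllib.error.URLError as err:
        # No HTTP response at all (DNS failure, refused connection, ...).
        print("Could not reach", url, "-", err.reason)
        return False

    # The output file is only created once a non-error response exists.
    with response, open(filename, "wb") as out:
        shutil.copyfileobj(response, out)
    print("File:", filename, "downloaded")
    return True
```

With this shape, download(url, filename) returns False for a missing page instead of saving the server's 404 page to disk, because urlopen raises HTTPError before anything is written.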

One thing you need to be aware of is that some web sites don't behave as
you would expect them to; e.g. responding with a redirection rather than a
404 error when you request a page that doesn't exist. In these
cases you might still have to rely on some clever scripting.
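
One possible piece of such scripting is to compare the URL that was finally served with the URL that was requested, since urlopen follows redirects silently. The sketch below assumes Python 3 (urllib.request in place of urllib2). The name fetch_unless_redirected and the opener parameter are invented for illustration, and the check is only a heuristic: a site may redirect for legitimate reasons, such as moving from http to https.

```python
import urllib.request


def fetch_unless_redirected(url, opener=urllib.request.urlopen):
    """Return the response body, or None if the server redirected elsewhere.

    Hypothetical helper for this sketch; `opener` defaults to the real
    urllib.request.urlopen and can be swapped out for testing.
    """
    with opener(url) as response:
        # geturl() reports the URL actually retrieved, after any redirects.
        final_url = response.geturl()
        if final_url != url:
            print("Redirected from", url, "to", final_url)
            return None
        return response.read()
```

A None result here means "the server sent us somewhere else", which the caller can then treat as a probable missing page or inspect further.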
----------------------------------------------------------------------

I hope that helps.

Dan
Dec 23 '05 #2
