urllib.urlretrieve problem

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello Everybody,

I've got a small problem with urlretrieve.
Even passing a bad url to urlretrieve doesn't raise an exception. Or does
it?

If yes, what exception is it? And how do I use it in my program? I've
searched a lot but haven't found anything helpful.

Example:
import urllib

try:
    urllib.urlretrieve("http://security.debian.org/pool/updates/main/p/perl/libparl5.6_5.6.1-8.9_i386.deb")
except IOError, X:
    DoSomething(X)
except OSError, X:
    DoSomething(X)

urllib.urlretrieve doesn't raise an exception even though there is no
package named libparl5.6

Please Help!

rrs
- --
Ritesh Raj Sarraf
RESEARCHUT -- http://www.researchut.com
Gnupg Key ID: 04F130BC
"Stealing logic from one person is plagiarism, stealing from many is
research".
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)

iD8DBQFCRcCk4Rhi6gTxMLwRAlb2AJ0fB3V5ZpwdAiCxfl/rGBWU92YBEACdFYIJ
8bGZMJ5nuKAqvjO0KEAylUg=
=eaHC
-----END PGP SIGNATURE-----

Jul 18 '05 #1
I noticed you hadn't gotten a reply. When I execute this it puts the following
in the retrieved file:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>404 Not Found</TITLE>
</HEAD><BODY>
<H1>Not Found</H1>
The requested URL /pool/updates/main/p/perl/libparl5.6_5.6.1-8.9_i386.deb was
not found on this server.<P>
</BODY></HTML>

You will probably need to use something else to first determine if the URL
actually exists.

Larry Bates

Jul 18 '05 #2
Mertz' "Text Processing in Python" book had a good discussion about
trapping 403 and 404's.

http://gnosis.cx/TPiP/

Jul 18 '05 #3
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Larry Bates wrote:
> I noticed you hadn't gotten a reply. When I execute this it puts the
> following in the retrieved file:
>
> <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
> <HTML><HEAD>
> <TITLE>404 Not Found</TITLE>
> </HEAD><BODY>
> <H1>Not Found</H1>
> The requested URL /pool/updates/main/p/perl/libparl5.6_5.6.1-8.9_i386.deb
> was not found on this server.<P>
> </BODY></HTML>
>
> You will probably need to use something else to first determine if the URL
> actually exists.


I'm happy that at least someone responded as this was my first post to the
python mailing list.

I'm coding a program for offline package management.
The link that I provided could be made obsolete by newer packages. That is where
my problem is. I wanted to know how to raise an exception here so that
my program can react depending on the type of exception.

For example, for Temporary Name Resolution Failure, python raises an
exception which I've handled well. The problem lies with obsolete urls
where no exception is raised and I end up having a 404 error page as my
data.

Can we have an exception for that? Or can urllib.urlretrieve return a
status that tells me whether it downloaded the desired file?
I think my problem is fixable with urllib.urlopen; I just find
urllib.urlretrieve more convenient and want to know if it can be done with
it.

Thanks for responding.

rrs
- --
Ritesh Raj Sarraf
RESEARCHUT -- http://www.researchut.com
Gnupg Key ID: 04F130BC
"Stealing logic from one person is plagiarism, stealing from many is
research".
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)

iD8DBQFCSuYS4Rhi6gTxMLwRAu0FAJ9R0s4TyB7zHcvDFTflOp 2joVkErQCfU4vG
8U0Ah5WTdTQHKRkmPsZsHdE=
=OMub
-----END PGP SIGNATURE-----

Jul 18 '05 #4
> I'm coding a program for offline package management.
> The link that I provided could be obsolete by newer packages. That is
> where my problem is. I wanted to know how to raise an exception here so
> that depending on the type of exception I could make my program function.
>
> For example, for Temporary Name Resolution Failure, python raises an
> exception which I've handled well. The problem lies with obsolete urls
> where no exception is raised and I end up having a 404 error page as my
> data.
>
> Can we have an exception for that ? Or can we have the exit status of
> urllib.urlretrieve to know if it downloaded the desired file.
> I think my problem is fixable in urllib.urlopen, I just find
> urllib.urlretrieve more convenient and want to know if it can be done with
> it.


It makes no sense for urllib to raise an exception in such a case. From
its point of view, things worked perfectly: it got a result. There was no
network error whatsoever.

It's your application that is not happy with the result, but it has to
figure that out by itself.

You could for instance check what kind of result you got using the
unix file command: it will tell you that you received an HTML file, not a
deb.

Or check the MIME type returned: it's text/html in your error case,
and most probably something like application/octet-stream otherwise.

Regards,

Diez
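
A minimal sketch of the MIME type check, assuming Python 3, where urllib.request.urlretrieve returns a (filename, headers) pair. The helper name looks_like_html is made up for illustration, and the sample Message only stands in for real response headers:

```python
from email.message import Message


def looks_like_html(headers):
    """Return True if the response headers describe an HTML page."""
    # get_content_type() gives the lowercased MIME type without parameters
    # such as "; charset=...".
    return headers.get_content_type() in ("text/html", "application/xhtml+xml")


# Stand-in for the headers object that urlretrieve returns (assumption:
# it behaves like an email.message.Message, as it does in Python 3).
sample = Message()
sample["Content-Type"] = "text/html; charset=iso-8859-1"
print(looks_like_html(sample))  # True
```

In real use the headers would come from `filename, headers = urllib.request.urlretrieve(url)`; a True result for a .deb URL suggests that an error page was saved instead of the package.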

Jul 18 '05 #5
> For example, for Temporary Name Resolution Failure, python raises an
> exception which I've handled well. The problem lies with obsolete
> urls where no exception is raised and I end up having a 404 error
> page as my data.

Diez> It makes no sense having urllib generating exceptions for such a
Diez> case. From its point of view, things work perfectly - it got a
Diez> result. No network error whatsoever.

You can subclass FancyURLopener and define a method to handle 404s, 403s,
401s, etc. There should be no need to resort to grubbing around with file
extensions and such.

Skip
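
The suggestion above targets the Python 2 urllib module. As a rough sketch of a different route to the same goal, assuming Python 3: urllib.request.urlopen raises urllib.error.HTTPError for responses such as 404, so a small wrapper can keep the convenience of urlretrieve without saving the error page. The name retrieve_strict is hypothetical, not part of any library:

```python
import shutil
import urllib.request


def retrieve_strict(url, filename):
    """Download url to filename, letting HTTP errors (404, 403, ...) propagate."""
    # urlopen is evaluated first; if it raises, the output file is never
    # opened, so no error page ends up on disk.
    with urllib.request.urlopen(url) as response, open(filename, "wb") as out:
        shutil.copyfileobj(response, out)
    return filename
```

The caller would wrap the call in try/except urllib.error.HTTPError and inspect the exception's code attribute to tell a 404 from a 403.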

Jul 18 '05 #6
from urllib2 import urlopen

try:
    urlopen(someURL)  # someURL is a placeholder for the URL to fetch
except IOError, errobj:
    # URLError carries 'reason'; HTTPError carries the HTTP status in 'code'
    if hasattr(errobj, 'reason'):
        print "server doesn't exist, is down, DNS problem, or we don't have an internet connection"
    if hasattr(errobj, 'code'):
        print errobj.code
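
For readers on Python 3, a hedged sketch of the same distinction: urllib2 was split into urllib.request and urllib.error, and HTTPError is a subclass of URLError, so it has to be tested first. The helper name describe_fetch_error is an illustration only:

```python
import urllib.error


def describe_fetch_error(exc):
    """Return a short description of an exception raised by urlopen."""
    # HTTPError must be checked first because it subclasses URLError.
    if isinstance(exc, urllib.error.HTTPError):
        return "server answered with HTTP status %d" % exc.code
    if isinstance(exc, urllib.error.URLError):
        return "could not reach server: %s" % (exc.reason,)
    return "unexpected error: %r" % (exc,)
```

A 404 for an obsolete package then shows up as the first case, while a name resolution failure falls into the second.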

Jul 18 '05 #7

Diez B. Roggisch wrote:
> It makes no sense having urllib generating exceptions for such a case. From its point of view, things work perfectly - it got a result. No network error whatsoever.
>
> It's your application that is not happy with the result - but it has to figure that out by itself.
>
> You could for instance try and see what kind of result you got using the unix file command - it will tell you that you received an HTML file, not a deb.
>
> Or check the mimetype returned - it's text/html in the error case of yours, and most probably something like application/octet-stream otherwise.
>
> Regards,
>
> Diez


Also be aware that many webservers (especially IIS ones) are configured
to return some kind of custom page instead of a stock 404, and you
might be getting a 200 status code even though the page you requested
is not there. So depending on what site you are scraping, you might
have to read the page you got back to figure out if it's what you
wanted.

-- Wade Leftwich
Ithaca, NY
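
One way to read back what was downloaded without any platform-specific tool, sketched in Python 3: a .deb file is an ar archive, and ar archives begin with the 8-byte signature "!<arch>" followed by a newline. The function name looks_like_deb is hypothetical, and the check only looks at the signature, not at whether the package is complete or valid:

```python
DEB_MAGIC = b"!<arch>\n"  # signature at the start of every ar archive


def looks_like_deb(path):
    """Return True if the file at path starts with the ar archive signature."""
    with open(path, "rb") as f:
        return f.read(len(DEB_MAGIC)) == DEB_MAGIC
```

An HTML error page, whether served with a 404 or a 200 status, starts with different bytes and fails this check.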

Jul 18 '05 #8
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Diez B. Roggisch wrote:
> You could for instance try and see what kind of result you got using the
> unix file command - it will tell you that you received an HTML file, not a
> deb.
>
> Or check the mimetype returned - it's text/html in the error case of yours,
> and most probably something like application/octet-stream otherwise.


Using the unix file command is not possible at all. The whole goal of the
program is to help people get their packages downloaded from some other
(high speed) machine which could be running Windows/Mac OSX/Linux et
cetera. That is why I'm sticking strictly to python libraries.

The second suggestion sounds good. I'll look into that.

Thanks,

rrs
- --
Ritesh Raj Sarraf
RESEARCHUT -- http://www.researchut.com
Gnupg Key ID: 04F130BC
"Stealing logic from one person is plagiarism, stealing from many is
research".
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)

iD8DBQFCTDhV4Rhi6gTxMLwRAi2BAJ4zp7IsQNMZ1zqpF/hGUAjUyYwKigCeKaqO
FbGuuFOIHawZ8y/ICf87wOI=
=btA5
-----END PGP SIGNATURE-----

Jul 18 '05 #9
