urlretrieve get file name

Hi guys and gals,

I'm wrestling with the urlretrieve function in the urllib module. I
want to download a file from a web server and save it locally with the
same name. The problem is the URL - it's of the form
http://www.page.com/?download=12345. It doesn't reveal the file name.
Some hints to point me in the right direction are greatly appreciated.

Sven

Nov 9 '06 #1
6 Replies


At Thursday 9/11/2006 19:11, Sven wrote:
> I'm wrestling with the urlretrieve function in the urllib module. I
> want to download a file from a web server and save it locally with the
> same name. The problem is the URL - it's on the form
> http://www.page.com/?download=12345. It doesn't reveal the file name.
> Some hints to point me in the right direction are greatly appreciated.

The file name *may* come in the Content-Disposition header (ex:
Content-Disposition: attachment; filename="budget.xls")
Use urlopen to obtain a file-like object; its info() method gives you
those headers.
--
Gabriel Genellina
Softlab SRL

Nov 9 '06 #2
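
As an illustration of the suggestion above, here is a minimal sketch of reading the file name from a Content-Disposition header. It assumes Python 3, where urlopen lives in urllib.request (the thread predates this and used the Python 2 urllib module). The helper names filename_from_headers and suggested_filename are made up for this example. As the reply says, the header is optional, so the result may be None.

```python
from email.message import Message
from urllib.request import urlopen


def filename_from_headers(headers):
    """Return the file name from a Content-Disposition header, or None."""
    # get_filename() reads the 'filename' parameter of Content-Disposition
    # (and falls back to the 'name' parameter of Content-Type).
    return headers.get_filename()


def suggested_filename(url):
    """Open url and return the server-suggested file name, if any."""
    with urlopen(url) as response:
        # info() returns the response headers as an email.message.Message
        # subclass, so get_filename() is available on it.
        return filename_from_headers(response.info())


# Offline check using the example header quoted in the post above.
example = Message()
example["Content-Disposition"] = 'attachment; filename="budget.xls"'
print(filename_from_headers(example))
```

Note that suggested_filename needs network access and reports the headers of the final response, after any redirects have been followed.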

Hello Gabriel,

Thanks for your help, but I'm a guy with no luck. :-) I can't get the
file name from the response headers...

Nov 9 '06 #3

At Thursday 9/11/2006 20:52, Sven wrote:
> Thanks for your help, but I'm a guy with no luck. :-) I can't get the
> file name from response header...

Try using a browser and "Save as..."; if it suggests a file name, it
*must* be in the headers - so look again carefully.
If it does not suggest a file name, the server is not providing one
(there is no obligation to do so).
--
Gabriel Genellina
Softlab SRL

Nov 10 '06 #4

Yes, the browser suggests a file name, but I did a little research using
http://web-sniffer.net/. The Response Header contains roughly this:

HTTP Status Code: HTTP/1.1 302 Found
Location: http://page.com/filename.zip
Content-Length: 0
Connection: close
Content-Type: text/html

The status code 302 tells the browser where to find the file. The funny
thing is that calling info() on the file-like response object in
Python doesn't return the same headers. I'm so stuck. :-)
Thanks for your help.

Nov 10 '06 #5
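
The raw 302 response described above can also be inspected from Python. The sketch below assumes Python 3 and uses http.client, which sends a single request and does not follow redirects. The function names fetch_without_redirect and redirect_target are hypothetical, chosen only for this example.

```python
import http.client
from urllib.parse import urljoin, urlsplit


def fetch_without_redirect(url):
    """Send one GET request and return (status, headers), following no redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc)
    else:
        conn = http.client.HTTPConnection(parts.netloc)
    try:
        # Rebuild the request target from the path and query string.
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.headers
    finally:
        conn.close()


def redirect_target(url, status, headers):
    """Return the absolute redirect target, or None if this is not a redirect."""
    location = headers.get("Location")
    if status in (301, 302, 303, 307, 308) and location:
        # Location may be relative, so resolve it against the request URL.
        return urljoin(url, location)
    return None
```

Calling fetch_without_redirect on the download URL and passing the result to redirect_target should give the address from the Location header, assuming the server answers as shown in the post.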

At Friday 10/11/2006 16:58, Sven wrote:
> Yes the browser suggests a file name, but I did a little research using
> http://web-sniffer.net/. The Response Header contains roughly this:
>
> HTTP Status Code: HTTP/1.1 302 Found
> Location: http://page.com/filename.zip
> Content-Length: 0
> Connection: close
> Content-Type: text/html
>
> The status code 302 tells the browser where to find the file. The funny
> thing is that calling the info() function, on the file-like response
> object, in Python doesn't return the same header. I'm so stuck. :-)
> Thanks for your help.

That happens because urlopen is smart enough to detect the redirection
and do a second request, so info() shows the headers of the final
response rather than those of the 302.
You can use the geturl() method to obtain the true URL used (that
would be http://page.com/filename.zip) and then rename the file.
Or, you can install your own URLopener (I think a FancyURLopener set
up not to follow redirects would be OK) and process the Location
header yourself. See the urllib documentation.
--
Gabriel Genellina
Softlab SRL

Nov 10 '06 #6
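
A minimal sketch of the geturl() approach follows, assuming Python 3 (urllib.request instead of the Python 2 urllib module used in the thread). The helper names filename_from_url and download_with_server_name, and the fallback name "download", are assumptions made for illustration.

```python
import posixpath
import shutil
from urllib.parse import unquote, urlsplit
from urllib.request import urlopen


def filename_from_url(url, default="download"):
    """Return the last path segment of url, or default if the path has none."""
    path = unquote(urlsplit(url).path)
    return posixpath.basename(path) or default


def download_with_server_name(url, default="download"):
    """Download url and save it under the file name of the final URL."""
    with urlopen(url) as response:
        # geturl() returns the URL actually fetched, after any redirects.
        name = filename_from_url(response.geturl(), default)
        with open(name, "wb") as out:
            # Stream the body to disk instead of reading it all into memory.
            shutil.copyfileobj(response, out)
    return name
```

Under these assumptions, download_with_server_name("http://www.page.com/?download=12345") would save the file as filename.zip if the server redirects as described. Writing the file directly under its final name avoids the separate rename step. The name comes from the server, so real code may want to sanitize it before writing.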

> You can use the geturl() method to obtain the true URL used (that
> would be http://page.com/filename.zip) and then rename the file.

Thanks mate, this was exactly what I needed. A really clean and simple
solution to my problem. :-)

Nov 11 '06 #7
