
urllib download insanity

OK, what I am seeing is impossible.
I DELETED the file from my webserver and uploaded the new one. When my app
logs in it checks the file, and if it has changed it downloads it. The
impossible part is that my PC is downloading the OLD file I've
deleted! If I download it via IE, I get the new file. So my only
conclusion is that urllib is caching it somewhere. But I'm already
calling urlcleanup(), so what else can I do?
Here is the code:

import os
import urllib
import urllib2

# Compare the server's Content-Length with the size of the local copy
remote = urllib2.urlopen('http://x.x.x.x/library.zip')
remote_size = int(remote.headers['Content-Length'])

local_size = os.stat('library.zip').st_size
if local_size != remote_size:
    # Sizes differ, so clear urllib's cache and fetch the new copy
    urllib.urlcleanup()
    urllib.urlretrieve('http://x.x.x.x/library.zip', 'library.zip')

As a test I got someone in the office to log in and try it, and it worked
properly for them. I'm on a different ISP from them, however, so my other
idea is that my ISP might have a transparent proxy set up that urllib
is using but IE isn't?
Jul 19 '05 #1


Timothy Smith wrote:
> OK, what I am seeing is impossible.
> I DELETED the file from my webserver and uploaded the new one. When my app
> logs in it checks the file, and if it has changed it downloads it. The
> impossible part is that my PC is downloading the OLD file I've
> deleted! If I download it via IE, I get the new file. So my only
> conclusion is that urllib is caching it somewhere. But I'm already
> calling urlcleanup(), so what else can I do?


Here are some ideas to use in your hunt.

- If you are getting a cached local file then the returned object
will have a "name" attribute.

result = urllib.urlopen(".....")
print result.fp.name

As far as I can tell, this will only occur if you use
a tempcache or a file URL.
- You can force some debugging of the open calls, to see if
your program is dealing with a local file.
>>> old_open = open
>>> def my_open(*args):
...     print "opening", args
...     return old_open(*args)
...
>>> open("/etc/passwd")
<open file '/etc/passwd', mode 'r' at 0x60da0>
>>> import __builtin__
>>> __builtin__.open = my_open
>>> open("/etc/passwd")
opening ('/etc/passwd',)
<open file '/etc/passwd', mode 'r' at 0x60c20>


You may also need to wrap os.fdopen in the same way, because that's
used by urlretrieve if it needs a tempfile.

If you want to see where the open is being called from,
use one of the functions in the traceback module to print
the stack trace.

- To be sure, also do

import webbrowser
webbrowser.open(url)

just before you do

urllib.urlretrieve(url, filename)

This will double check that your program is using the URL you
expect it to use.

- Beyond that, check that you've got network activity.

You could check the router lights, use a network sniffer like
Ethereal, or set up a debugging proxy.

- check the headers. If your ISP is using a cache then
it might insert a header into what it returns. But if
it was caching then your IE view should have seen the cached
version as well.
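
The header check could be sketched roughly as below. This assumes Python 3, where `urllib2` became `urllib.request`; the helper names `cache_hints`, `no_cache_request` and `fetch_and_report` are hypothetical. `Via` and `Age` are standard HTTP headers that intermediaries may add, while `X-Cache` is a common but non-standard one, so finding none of them does not prove there is no proxy. Likewise, a misbehaving cache is free to ignore the no-cache request headers.

```python
import urllib.request

# Response headers that a cache or proxy along the way may add.
CACHE_HINT_HEADERS = ("Via", "Age", "X-Cache")


def cache_hints(headers):
    """Return the cache-related headers present in a response, as a dict."""
    return {name: headers[name] for name in CACHE_HINT_HEADERS
            if headers.get(name) is not None}


def no_cache_request(url):
    """Build a request that asks caches not to answer from a stored copy."""
    req = urllib.request.Request(url)
    req.add_header("Cache-Control", "no-cache")
    req.add_header("Pragma", "no-cache")  # older HTTP/1.0 caches
    return req


def fetch_and_report(url):
    """Fetch url with HTTP debug output; return (cache hints, body bytes)."""
    # debuglevel=1 makes the handler print the request and reply headers.
    opener = urllib.request.build_opener(
        urllib.request.HTTPHandler(debuglevel=1))
    with opener.open(no_cache_request(url)) as resp:
        return cache_hints(resp.headers), resp.read()
```

Something like `hints, body = fetch_and_report('http://x.x.x.x/library.zip')` would then show both the raw exchange and any cache headers. Comparing `len(body)` with the size of the file on the server says whether the old copy is still being served.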

Andrew
da***@dalkescientific.com

Jul 19 '05 #2
