urllib download insanity

OK, what I am seeing is impossible.
I DELETED the file from my webserver and uploaded the new one. When my app
logs in it checks the file, and if it's changed it downloads it. The
impossible part is that my PC is downloading the OLD file I've
deleted! If I download it via IE, I get the new file. So my only
conclusion is that urllib is caching it somewhere. BUT I'm already
calling urlcleanup(), so what else can I do?
Here is the code:

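import os
import urllib
import urllib2

# Despite the name, this holds the server's Content-Length, not a date.
LastModified = urllib2.urlopen('http://x.x.x.x/library.zip')
LastModified = LastModified.headers['Content-Length']

LocalFile = os.stat('library.zip')
LocalFile = int(LocalFile.st_size)
if LocalFile != int(LastModified):
    # Sizes differ, so clear urllib's cache and fetch the new file.
    urllib.urlcleanup()
    urllib.urlretrieve('http://x.x.x.x/library.zip', 'library.zip')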

As a test I got someone in the office to log in and try it, and it worked
properly for them. I'm on a different ISP to them, however, so my other
idea is that my ISP possibly has a transparent proxy set up that urllib
is using but IE isn't???
Jul 19 '05 #1
Timothy Smith wrote:
OK, what I am seeing is impossible.
I DELETED the file from my webserver and uploaded the new one. When my app
logs in it checks the file, and if it's changed it downloads it. The
impossible part is that my PC is downloading the OLD file I've
deleted! If I download it via IE, I get the new file. So my only
conclusion is that urllib is caching it somewhere. BUT I'm already
calling urlcleanup(), so what else can I do?


Here are some ideas to use in your hunt.

- If you are getting a cached local file then the returned object
will have a "name" attribute.

result = urllib.urlopen(".....")
print result.fp.name

As far as I can tell, this will only occur if you use
a tempcache or a file URL.
- You can force some debugging of the open calls, to see if
your program is dealing with a local file.
>>> old_open = open
>>> def my_open(*args):
...     print "opening", args
...     return old_open(*args)
...
>>> open("/etc/passwd")
<open file '/etc/passwd', mode 'r' at 0x60da0>
>>> import __builtin__
>>> __builtin__.open = my_open
>>> open("/etc/passwd")
opening ('/etc/passwd',)
<open file '/etc/passwd', mode 'r' at 0x60c20>


You may also need to wrap os.fdopen, because urlretrieve uses it
when it needs a temporary file.

If you want to see where the open is being called from,
use one of the functions in the traceback module to print
the stack trace.

- for surety's sake, also do

import webbrowser
webbrowser.open(url)

just before you do

urllib.urlretrieve(url, filename)

This will double check that your program is using the URL you
expect it to use.

- Beyond that, check that you've got network activity.

You could check the router lights, use a network sniffer like
Ethereal, or set up a debugging proxy.
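
A lighter-weight check than a sniffer, sketched here for Python 3 (where urllib2 became urllib.request), is to ask urllib which proxies it has picked up and to print the HTTP exchange as it happens. The helper name `build_debug_opener` is an assumption for this example. Keep in mind that a truly transparent ISP proxy works at the network level, so it will not show up in getproxies() and cannot be bypassed from the client.

```python
import urllib.request

# Proxies urllib found in the environment (and in system settings on
# Windows and macOS). An empty dict means none were configured.
print("proxies urllib would use:", urllib.request.getproxies())


def build_debug_opener(bypass_proxies=True):
    """Return an opener that prints HTTP traffic and can skip proxies."""
    # debuglevel=1 makes the underlying connection print the request
    # and the response headers to stdout.
    handlers = [urllib.request.HTTPHandler(debuglevel=1)]
    if bypass_proxies:
        # An empty mapping means "use no configured proxies at all".
        handlers.append(urllib.request.ProxyHandler({}))
    return urllib.request.build_opener(*handlers)


# Usage (needs network access, so it is left commented out):
# opener = build_debug_opener()
# response = opener.open("http://x.x.x.x/library.zip")
# print(response.headers)
```

If the download behaves differently with and without the configured proxies, that points at a proxy setting rather than at urllib's own cache.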

- check the headers. If your ISP is using a cache then
it might insert a header into what it returns. But if
it was caching then your IE view should have seen the cached
version as well.
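
The header check can be sketched like this in Python 3. Both helper names and the header list are assumptions made for illustration: Via and Age are standard HTTP headers that intermediaries may add, and X-Cache is a common but non-standard one, so the list is neither complete nor authoritative. The second helper builds a request that asks any cache along the way for a fresh copy.

```python
import urllib.request

# Headers that caches and proxies commonly add (illustrative list only).
CACHE_HINT_HEADERS = ("via", "age", "x-cache", "x-cache-lookup")


def cache_hints(headers):
    """Return the headers that suggest an intermediate cache answered."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() in CACHE_HINT_HEADERS
    }


def no_cache_request(url):
    """Build a request asking intermediaries not to serve a stored copy."""
    return urllib.request.Request(
        url,
        headers={"Cache-Control": "no-cache", "Pragma": "no-cache"},
    )


# Usage (needs network access, so it is left commented out):
# response = urllib.request.urlopen(no_cache_request("http://x.x.x.x/library.zip"))
# print(cache_hints(response.headers))
```

An empty result does not prove there is no cache, since a proxy is free to add no headers at all, but a Via or Age header on the response would be a strong hint.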

Andrew
da***@dalkescientific.com

Jul 19 '05 #2
