urllib download insanity

Timothy Smith

ok, what i am seeing is impossible.
i DELETED the file from my webserver and uploaded the new one. when my app
logs in it checks the file, and if it's changed it downloads it. the
impossible part is that my pc is downloading the OLD file i've
deleted! if i download it via IE, i get the new file. SO, my only
conclusion is that urllib is caching it somewhere. BUT i'm already
calling urlcleanup(), so what else can i do?
here is the code:

import os
import urllib
import urllib2

# note: despite the name, this holds the remote Content-Length (file size)
LastModified = urllib2.urlopen('http://x.x.x.x/library.zip')
LastModified = LastModified.headers['Content-Length']

LocalFile = os.stat('library.zip')
LocalFile = int(LocalFile.st_size)
if LocalFile != int(LastModified):
    urllib.urlcleanup()
    urllib.urlretrieve('http://x.x.x.x/library.zip','library.zip')

as a test i got someone in the office to log in and try it - it worked
properly for them. i'm on a different ISP to them however, so my other
idea is that possibly my isp has a transparent proxy set up that urllib
is using, but IE isn't???

Jul 19 '05 #1


Andrew Dalke

Timothy Smith wrote:

ok, what i am seeing is impossible.
i DELETED the file from my webserver and uploaded the new one. when my app
logs in it checks the file, and if it's changed it downloads it. the
impossible part is that my pc is downloading the OLD file i've
deleted! if i download it via IE, i get the new file. SO, my only
conclusion is that urllib is caching it somewhere. BUT i'm already
calling urlcleanup(), so what else can i do?

Here are some ideas to use in your hunt.

- If you are getting a cached local file then the returned object
will have a "name" attribute.

result = urllib.urlopen(".....")
print result.fp.name

As far as I can tell, this will only occur if you use
a tempcache or a file URL.
- You can force some debugging of the open calls, to see if
your program is dealing with a local file.

>>> old_open = open
>>> def my_open(*args):
...     print "opening", args
...     return old_open(*args)
...
>>> open("/etc/passwd")
<open file '/etc/passwd', mode 'r' at 0x60da0>
>>> import __builtin__
>>> __builtin__.open = my_open
>>> open("/etc/passwd")
opening ('/etc/passwd',)
<open file '/etc/passwd', mode 'r' at 0x60c20>

You may also need to change os.fdopen because that's used
by urlretrieve if it needs a tempfile.

If you want to see where the open is being called from,
use one of the functions in the traceback module to print
the stack trace.

- For surety's sake, also do

import webbrowser
webbrowser.open(url)

just before you do

urllib.urlretrieve(url, filename)

This will double check that your program is using the URL you
expect it to use.

- Beyond that, check that you've got network activity.

You could check the router lights, or use a packet sniffer like
Ethereal, or set up a debugging proxy.

- Check the headers. If your ISP is using a cache then
it might insert a header into what it returns. But if
it were caching then your IE view should have seen the cached
version as well.
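
A rough sketch of the header check, assuming Python 3's urllib.request rather than the urllib/urllib2 modules used in this thread. The helper names cache_hints and fetch_cache_hints and the sample header values are hypothetical. Via and Age are standard HTTP headers; X-Cache is a non-standard header that some proxies add, so its absence proves nothing.

```python
import urllib.request

# Headers that shared caches and proxies commonly add or rely on.
CACHE_HINT_HEADERS = ("Via", "Age", "X-Cache",
                      "Last-Modified", "ETag", "Content-Length")

def cache_hints(headers):
    """Return the cache-related headers present in a header mapping."""
    found = {}
    for name in CACHE_HINT_HEADERS:
        value = headers.get(name)
        if value is not None:
            found[name] = value
    return found

def fetch_cache_hints(url):
    """Open url and report its cache-related response headers (needs network)."""
    with urllib.request.urlopen(url) as response:
        return cache_hints(response.headers)

# Offline demonstration with made-up headers, as a caching proxy might send.
sample = {"Content-Length": "1024", "Via": "1.1 proxy.example", "Age": "3600"}
print(cache_hints(sample))
```

A Via header or a non-zero Age in the real response would suggest that something between the program and the webserver answered the request. Comparing the output from both ISPs is probably the quickest way to tell.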

Andrew
da***@dalkescientific.com

Jul 19 '05 #2
