
When should caching proxies check for changes?

I tried posting this on comp.infosystems.www.misc, but that group
appears to get very little traffic. So now I'm trying here. If there
is a more appropriate group, please let me know.

I'm interested in how caching web proxies are expected to behave with
regard to pages that change. After changing the content of a simple
web site I have, I discovered that the caching proxies of at least
one ISP I have access to did not refresh their caches. No matter what
I did (even Ctrl+F5 in IE 6.0 to try to force a reload), they just
carried on serving up the old content. I had always assumed that when
such a caching proxy was asked for a page, it should check back with
the original web server to see if the page had been updated (based on
the cached value of the page's Last-Modified HTTP header); I believe
this is a GET with an If-Modified-Since header. After asking about
this on the online forum of the ISP in question, I was told that a
caching proxy is not actually required to check back, and that many
administrators set up their proxies to only check back now and again.

I understand that by skipping these checks the ISP saves a little
external bandwidth (although the amount saved seems so trifling as to
be irrelevant), and that it reduces latency (obviously a good idea),
but if it comes at the expense of serving up stale pages, it just
seems wrong. Is it true that a caching proxy isn't expected to check
whether a page has been modified before serving it out?

As the HTML author, I don't see how I can be expected to have any
control over this, short of changing the name of the page each time I
update it (which, apart from being a hassle to do, would render
browser bookmarks out of date).

(Before advising me to use the HTTP headers to assist the proxy, I
will just point out that the web site is hosted at a "normal" ISP, and
I have no control over the HTTP headers their web servers send out.
But I have checked the headers, and they do include a Last-Modified
value).
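
(For the record, the kind of conditional request I mean looks roughly
like this - a minimal sketch in Python; the host, path and date are
made up:

    import http.client

    # Hypothetical host and path - substitute the page in question.
    conn = http.client.HTTPConnection("www.example.com")
    conn.request("GET", "/index.html",
                 headers={"If-Modified-Since":
                          "Tue, 30 Sep 2003 12:00:00 GMT"})
    resp = conn.getresponse()
    # 304 Not Modified: the cached copy is still current.
    # 200 OK: the page changed; the new version is in the body.
    print(resp.status, resp.reason)

A cache that revalidates would send something like this to the origin
server, using the Last-Modified date it stored, before serving up its
copy.)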
Jul 20 '05 #1
Clive Backham wrote:
> (Before advising me to use the HTTP headers to assist the proxy...


Well HTTP headers are the correct tool for this job. If you can't set
'Expires', then ask your ISP to do it for you. If they won't, and this is
important to you, invest in better webhosting.
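
If you want to double-check what your host currently sends, something
like this prints the response headers (a rough sketch in Python; swap
in your own host and page path):

    import http.client

    # Substitute your own host and page path here.
    conn = http.client.HTTPConnection("www.example.com")
    conn.request("HEAD", "/")
    resp = conn.getresponse()
    # Look for Last-Modified, Expires and Cache-Control in the output.
    for name, value in resp.getheaders():
        print(name + ": " + value)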

--
David Dorward http://dorward.me.uk/
Jul 20 '05 #2
In article <3f***************@news.nildram.co.uk>,
Clive Backham <cl***@capita.nildram.co.uk> wrote:
> I'm interested in how caching web proxies are expected to behave
> with regard to pages that change. After changing the content of a
> simple web site I have, I discovered that the caching proxies of at
> least one ISP I have access to did not refresh their caches. No
> matter what I did (even Ctrl+F5 in IE 6.0 to try to force a
> reload), they just carried on serving up the old content. I had
> always assumed that


In my experience IE only sends If-Modified-Since rather than appropriate
Cache-Control headers. Try a standards-compliant browser like lynx or
Mozilla/Firebird to force caches to refresh. (I think Opera will do it
on *two* shift/refreshes.)

nhoJ

--
John P Baker
Jul 20 '05 #3
In article <3f***************@news.nildram.co.uk>, one of infinite monkeys
at the keyboard of cl***@capita.nildram.co.uk (Clive Backham) wrote:
> I tried posting this on comp.infosystems.www.misc, but that group
> appears to get very little traffic. So now I'm trying here. If
> there is a more appropriate group, please let me know.

Well, it's one of those issues that fall between groups. I'd expect
to find more relevant expertise on *.servers.*.

> I'm interested in how caching web proxies are expected to behave
> with regard to pages that change.

Conservative in what you send, liberal in what you accept?

Have you read the extensive discussion of this in the HTTP RFC, or in
documentation of proxying software?

> After changing the content of a simple web site I have, I
> discovered that the caching proxies of at least one ISP I have
> access to did not refresh their caches. No matter what I did (even
> Ctrl+F5 in IE 6.0 to try to force a reload), they just carried on
> serving up the old content.

IE is well-known for only supporting a very limited subset of HTTP.
It's the last thing you should ever use for diagnostics!

> I had always assumed that when such a caching proxy was asked for a
> page, it should check back with the original web server to see if
> the page had been updated

That depends on the headers from the page and from the browser. See
[HTTP].

> (based on the cached value of the page's Last-Modified HTTP
> header); I believe this is a GET with an If-Modified-Since header.
> After asking about this on the online forum of the ISP in question,
> I was told that a caching proxy is not actually required to check
> back, and that many administrators set up their proxies to only
> check back now and again.

That's true, but it's not the whole story. HTTP/1.1 gives a lot of
control over this to browsers and servers. But that sometimes gets
abused by ignorant authors (see the "how do I prevent my page getting
cached" threads in the newsgroups - the right answer is almost always
to do nothing, because the server will default to correct behaviour),
and in turn proxies may sometimes be over-aggressive. One could
suggest a loose analogy to the search-engine spamming that devalued
"keywords".

> I understand that by skipping these checks the ISP saves a little
> external bandwidth (although the amount saved seems so trifling as
> to be irrelevant),

Erm, it may be very relevant indeed if you're huge and have millions
of users behind a proxy. Think AOL: if they stopped running their
big caching proxies, the increased traffic would be significant
throughout the 'net backbone.

> and that it reduces latency (obviously a good idea), but if it
> comes at the expense of serving up stale pages, it just seems
> wrong. Is it true that a caching proxy isn't expected to check
> whether a page has been modified before serving it out?

See [HTTP].

> As the HTML author, I don't see how I can be expected to have any
> control over this, short of changing the name of the page each time
> I update it (which, apart from being a hassle to do, would render
> browser bookmarks out of date).

And would make you part of the problem in more ways than that.

> (Before advising me to use the HTTP headers to assist the proxy, I
> will just point out that the web site is hosted at a "normal" ISP,
> and I have no control over the HTTP headers their web servers send
> out. But I have checked the headers, and they do include a
> Last-Modified value).

So you're doing it right from where you are (apart from your use of
MSIE). Stick with that. There is a lot of brokenness out there;
don't make it worse.

--
Nick Kew

In urgent need of paying work - see http://www.webthing.com/~nick/cv.html
Jul 20 '05 #4
Thanks to everyone who has helped me on this. Just a bit of feedback:

On Wed, 1 Oct 2003 17:45:43 +0100, ni**@fenris.webthing.com (Nick Kew)
wrote:
> Have you read the extensive discussion of this in the HTTP RFC, or
> in documentation of proxying software?

Forgive me when I say that the RFC is hardly light reading and that I
had difficulty following the fine detail. But I have read a more
informal paper I found online about it. It looks like I should make
another effort to understand the RFC.
>> I understand that by skipping these checks the ISP saves a little
>> external bandwidth (although the amount saved seems so trifling as
>> to be irrelevant),

> Erm, it may be very relevant indeed if you're huge and have
> millions of users behind a proxy. Think AOL: if they stopped
> running their big caching proxies, the increased traffic would be
> significant throughout the 'net backbone.

Point taken.
>> (Before advising me to use the HTTP headers to assist the proxy, I
>> will just point out that the web site is hosted at a "normal" ISP,
>> and I have no control over the HTTP headers their web servers send
>> out. But I have checked the headers, and they do include a
>> Last-Modified value).

> So you're doing it right from where you are (apart from your use of
> MSIE). Stick with that. There is a lot of brokenness out there;
> don't make it worse.


So the bottom line seems to be that any mechanism which would
guarantee that caching proxies never serve stale pages will
inevitably generate a significant amount of extra backbone traffic,
and on balance it's better for the net community to put up with some
stale pages getting served up than to cope with all that extra
traffic. As the website author, I just have to accept this pragmatic
approach. Have I got that right?
Jul 20 '05 #5
On Thu, 2 Oct 2003, Clive Backham wrote:
> Forgive me when I say that the RFC is hardly light reading and that
> I had difficulty following the fine detail. But I have read a more
> informal paper I found online about it.
But you're not telling us what it was: that's unhelpful of you.

If it was http://www.mnot.net/cache_docs/ , then good: if it wasn't,
then I suggest you read that one.
> So the bottom line seems to be that any mechanism which would
> guarantee that caching proxies never serve stale pages will
> inevitably generate a significant amount of extra backbone traffic,

That's the long and the short of it, yes. Once a cache has got the
idea that a resource is stable and has decided not to check it for
freshness for a while, there's no way that you, from the server side,
can contradict that retrospectively, because the cache won't ask the
server again until it's good and ready.

If you expect a page to change e.g. daily then you could send it out
with appropriate cacheability suggestions (Apache has some handy
features for doing this via configuration statements). Caches aren't
compelled to take those suggestions, even so. And as Nick says, it
would be rude and counterproductive, in general, to pretend that a
document is uncacheable or short-lived even though it doesn't normally
change from one month to the next.
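
For instance, with Apache's mod_expires (assuming the module is
enabled and per-directory overrides are allowed - which, as Clive has
said, they aren't at his host), a page expected to change daily could
be sent out with something like:

    # .htaccess sketch - requires mod_expires and AllowOverride
    ExpiresActive On
    # HTML pages get an Expires time one day after each request
    ExpiresByType text/html "access plus 1 day"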

The client, on the other hand, has technical mechanisms available to
"punch through" a cache when necessary. That's if their client
software supports the mechanism (and the punter knows how to use it
;-), and, of course, provided that the cache server also honours that
mechanism (I think most do, but there have been rumors in the past
that certain big providers would never access a given server more
often than e.g. once a day, no matter how hard the client tried).
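
The usual punch-through is the client sending Cache-Control: no-cache
(or Pragma: no-cache for HTTP/1.0 caches) with its request. A minimal
sketch in Python of what a browser's forced reload amounts to (the
host and path are hypothetical):

    import http.client

    # A compliant shared cache seeing Cache-Control: no-cache on the
    # request must revalidate with the origin server rather than
    # answer from its store.
    conn = http.client.HTTPConnection("www.example.com")
    conn.request("GET", "/index.html",
                 headers={"Cache-Control": "no-cache",
                          "Pragma": "no-cache"})
    resp = conn.getresponse()
    print(resp.status, resp.reason)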
> and on balance it's better for the net community to put up with
> some stale pages getting served up than to cope with all that extra
> traffic. As the website author, I just have to accept this
> pragmatic approach. Have I got that right?


Right-ish. As I say, if you have better knowledge of when your next
update will be due, you can suggest an expiry date/time for your
current page. But see that tutorial by Mark Nottingham, it's good.
Jul 20 '05 #6
On Thu, 2 Oct 2003 10:47:27 +0100, "Alan J. Flavell"
<fl*****@ph.gla.ac.uk> wrote:
> On Thu, 2 Oct 2003, Clive Backham wrote:
>> Forgive me when I say that the RFC is hardly light reading and
>> that I had difficulty following the fine detail. But I have read
>> a more informal paper I found online about it.
>
> But you're not telling us what it was: that's unhelpful of you.


Yes, I know. I found it several weeks ago, printed it out, and now
it's at home (and I'm currently at work). I did try to find it again
so I could reference it, but failed.
> If it was http://www.mnot.net/cache_docs/ , then good: if it
> wasn't, then I suggest you read that one.

That's the one; thank you for the URL (which I will now bookmark!).
> If you expect a page to change e.g. daily then you could send it
> out with appropriate cacheability suggestions (Apache has some
> handy features for doing this via configuration statements). Caches
> aren't compelled to take those suggestions, even so. And as Nick
> says, it would be rude and counterproductive, in general, to
> pretend that a document is uncacheable or short-lived even though
> it doesn't normally change from one month to the next.

Quite so. The pages in my web site typically stay static for several
weeks or months, and I have no idea in advance when they will next
change. So this rules out using an Expires: header. The problem I
have is that when I do make a new release of the software and
mailshot my users, I invariably get dozens of replies that they
can't get to the new version. True, most of them just need to
refresh their own browser's cache, but it seems that some of them
are stuck behind transparent caching proxies that simply refuse to
refresh themselves until they're good and ready. Perhaps it would be
better to wait a couple of days after updating the website before
announcing the new release, to give proxies time to refresh?
> The client, on the other hand, has technical mechanisms available
> to "punch through" a cache when necessary. That's if their client
> software supports the mechanism

...which rules out IE prior to 5.5sp1, I believe (i.e. probably the
majority of browsers in use around the world)...

> (and the punter knows how to use it ;-)

...which rules out the majority of punters (but they probably use IE
anyway, so we shouldn't count them twice :-)

> and, of course, provided that the cache server also honours that
> mechanism (I think most do, but there have been rumors in the past
> that certain big providers would never access a given server more
> often than e.g. once a day, no matter how hard the client tried).


Indeed. From the proxy behaviour I saw, it seems that NTL in the UK
may be guilty of this.

Thanks again to everyone for their comments.
Jul 20 '05 #7
Tim
On Thu, 02 Oct 2003 12:19:31 GMT,
cl***@capita.nildram.co.uk (Clive Backham) wrote:

> The pages in my web site typically stay static for several weeks or
> months, and I have no idea in advance when they will next change.
> So this rules out using an Expires: header.


Not really. You can have the Expires header set x days after the
document was last modified. Say you set it to 15 days, giving
moderately reasonable cacheability; as soon as you modify the
document, the counter is reset. You can set other things (e.g.
navigational icons, logos, etc.) to expire later.

Then browsers are supposed to check that a document is unexpired
before re-getting it (intervening proxies likewise): you request a
resource, the cache checks for a newer version and fetches it, or
uses what's already there. Of course, some will get it wrong, but
some will get it wrong no matter what you do.
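
In Apache this sort of modification-relative expiry is what
mod_expires' "modification" base does (a sketch, assuming the module
is enabled and the page is a plain file on disk):

    # .htaccess sketch - expiry counted from the file's own mtime,
    # so editing the page restarts the 15-day clock
    ExpiresActive On
    ExpiresDefault "modification plus 15 days"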

One of my ISPs stupidly sets expiry times to about 3 minutes. That's
not even long enough for wading back and forth through the site in
one session. Cluelessness knows no bounds.

--
My "from" address is totally fake. (Hint: If I wanted e-mails from
complete strangers, I'd have put a real one, there.) Reply to usenet
postings in the same place as you read the message you're replying to.
Jul 20 '05 #8
cl***@capita.nildram.co.uk (Clive Backham) wrote in
news:3f***************@news.nildram.co.uk:
> On Thu, 2 Oct 2003 10:47:27 +0100, "Alan J. Flavell"
> <fl*****@ph.gla.ac.uk> wrote:
>
>> If you expect a page to change e.g. daily then you could send it
>> out with appropriate cacheability suggestions (Apache has some
>> handy features for doing this via configuration statements).
>> Caches aren't compelled to take those suggestions, even so. And as
>> Nick says, it would be rude and counterproductive, in general, to
>> pretend that a document is uncacheable or short-lived even though
>> it doesn't normally change from one month to the next.
>
> Quite so. The pages in my web site typically stay static for
> several weeks or months, and I have no idea in advance when they
> will next change. So this rules out using an Expires: header. The
> problem I have is that when I do make a new release of the software
> and mailshot my users, I invariably get dozens of replies that they
> can't get to the new version. True, most of them just need to
> refresh their own browser's cache, but it seems that some of them
> are stuck behind transparent caching proxies that simply refuse to
> refresh themselves until they're good and ready.


Some (most?) caching proxies only cache "HTML documents", so in at
least some cases content that is presumed not to be as static as a
'regular web page' isn't cached; I believe that in some cases this
includes PHP pages. You might try making your download page a PHP
page and see if that helps with your problem.

--
Dave Patton
Canadian Coordinator, the Degree Confluence Project
http://www.confluence.org dpatton at confluence dot org
My website: http://members.shaw.ca/davepatton/
Vancouver/Whistler - host of the 2010 Winter Olympics
Jul 20 '05 #9
Tim wrote:

> One of my ISPs stupidly sets expiry times to about 3 minutes.
> That's not even long enough for wading back and forth through the
> site in one session. Cluelessness knows no bounds.


Ouch. And you're stuck with it?

--
Brian
follow the directions in my address to email me

Jul 20 '05 #10
Tim
> Tim wrote:
>> One of my ISPs stupidly sets expiry times to about 3 minutes.
>> That's not even long enough for wading back and forth through the
>> site in one session. Cluelessness knows no bounds.

Brian <us*****@mangymutt.com.invalid-remove-this-part> wrote:
> Ouch. And you're stuck with it?


Yes, unless I change services. Theirs is one of those hideous IIS
systems, with no user overrides.

--
My "from" address is totally fake. (Hint: If I wanted e-mails from
complete strangers, I'd have put a real one, there.) Reply to usenet
postings in the same place as you read the message you're replying to.
Jul 20 '05 #11
