urllib to cache 301 redirections?

Hi,
There is an open tracker item against the urllib2 library, python.org/sf/735515,
which states:

  urllib / urllib2 should cache the results of 301 (permanent) redirections.
  This shouldn't break anything, since it's just an internal optimisation
  from one point of view -- but it's also what the RFC (2616, section 10.3.2,
  first para) says SHOULD happen.

I am trying to understand what this means.
Should the original URL be available to the user on request, given that urllib
automatically calls redirect_request and provides only the redirected URL?

I do not completely understand what "caching a redirection" implies or what
should be done in the urllib2 module. Any pointers?

Thanks,
--
O.R.Senthil Kumaran
http://uthcode.sarovar.org
Jul 6 '07 #1
"O.R.Senthil Kumaran" <or*******@users.sourceforge.net> writes:
> Hi,
> There is an Open Tracker item against urllib2 library python.org/sf/735515
> which states that.
> urllib / urllib2 should cache the results of 301 (permanent) redirections.
> This shouldn't break anything, since it's just an internal optimisation
> from one point of view -- but it's also what the RFC (2616, section 10.3.2,
> first para) says SHOULD happen.
>
> I am trying to understand, what does it mean.
> Should the original url be available to the user upon request as urllib
> automatically calls the redirect_request and provides the redirected url only?

urllib2, you mean.

Regardless of this bug, Request.get_full_url() should be (and is)
whatever URL the request instance was originally constructed with.
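
A small sketch of that point, assuming Python 3's urllib.request (the successor to urllib2, which is what this thread discusses). The example.com URL is a placeholder, and no network request is made:

```python
from urllib.request import Request

# Placeholder URL, chosen only for illustration.
original_url = "http://example.com/old-location"
req = Request(original_url)

# The request object reports the URL it was constructed with.
assert req.get_full_url() == original_url
print(req.get_full_url())
```

After a real fetch, the URL that was finally retrieved is reported separately by the response object's geturl() method.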

> I am not completely getting what "cache - redirection" implies and what should
> be done with the urllib2 module. Any pointers?

When a 301 redirect occurs after a request for URL U, via
urllib2.urlopen(U), urllib2 should remember the result of that
redirection, viz a second URL, V. Then, when another
urllib2.urlopen(U) takes place, urllib2 should send an HTTP request
for V, not U. urllib2 does not currently do this. (Obviously the
cache -- that is, the dictionary or whatever that stores the mapping
from URLs U to V -- should not be maintained by function urlopen
itself. Perhaps it should live on the redirect handler.)

302 redirections are temporary and are handled correctly in this
respect already by urllib2.
John
Jul 6 '07 #2
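
The U-to-V mapping described in #2 could be sketched roughly as follows. This is only an illustration, not the fix that went into the library: it assumes Python 3's urllib.request rather than urllib2, and the class name CachingRedirectHandler and its resolve() method are hypothetical. It records the mapping in redirect_request(), the hook that HTTPRedirectHandler calls for every redirect, and leaves it to the caller to consult the cache before opening a URL.

```python
from urllib.request import HTTPRedirectHandler, Request, build_opener


class CachingRedirectHandler(HTTPRedirectHandler):
    """Hypothetical handler that remembers 301 redirects (U -> V)."""

    def __init__(self):
        # Maps an original URL string U to its permanent target V.
        self.cache = {}

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Let the stock handler build (or refuse) the follow-up request.
        new_req = super().redirect_request(req, fp, code, msg, headers, newurl)
        # Only permanent redirects are remembered; 302 and others are not.
        if code == 301 and new_req is not None:
            self.cache[req.full_url] = new_req.full_url
        return new_req

    def resolve(self, url):
        """Follow cached redirects from url, stopping if a loop is found."""
        seen = set()
        while url in self.cache and url not in seen:
            seen.add(url)
            url = self.cache[url]
        return url


handler = CachingRedirectHandler()
opener = build_opener(handler)  # replaces the default redirect handler

# Intended use (needs network access, so it is left commented out):
# response = opener.open(handler.resolve("http://example.com/old"))
```

Keeping the cache keyed by URL string on the handler means the Request object the caller built is never modified, so get_full_url() still returns the original URL.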
Thank you for the reply, Mr. John and I apologize for a very late response
from my end.

* John J. Lee <jj*@pobox.com> [2007-07-06 18:53:09]:
> "O.R.Senthil Kumaran" <or*******@users.sourceforge.net> writes:
>> Hi,
>> There is an Open Tracker item against urllib2 library python.org/sf/735515
>> I am not completely getting what "cache - redirection" implies and what should
>> be done with the urllib2 module. Any pointers?
>
> When a 301 redirect occurs after a request for URL U, via
> urllib2.urlopen(U), urllib2 should remember the result of that
> redirection, viz a second URL, V. Then, when another
> urllib2.urlopen(U) takes place, urllib2 should send an HTTP request
> for V, not U. urllib2 does not currently do this. (Obviously the
> cache -- that is, the dictionary or whatever that stores the mapping
> from URLs U to V -- should not be maintained by function urlopen
> itself. Perhaps it should live on the redirect handler.)

I spent a little time thinking about a solution and figured out that the
following changes to HTTPRedirectHandler might be helpful in implementing
this.

class HTTPRedirectHandler(BaseHandler):
    # ... omitted ...

    def __init__(self):
        # Initialize a dictionary to hold the cache.
        self.cache = {}

    # Handle 301 errors in a separate method that maintains the cache.
    def http_error_301(self, req, fp, code, msg, headers):
        if req in self.cache:
            # Look for a loop: if a particular URL appears as both a key
            # and a value, there is a loop, so raise HTTPError.
            if len(set(self.cache.keys()) & set(self.cache.values())) > 0:
                raise HTTPError(req.get_full_url(), code,
                                self.inf_msg + msg, headers, fp)
            return self.cache[req]

        self.cache[req] = self.http_error_302(req, fp, code, msg, headers)
        return self.cache[req]

John, let me know your comments on this approach.
I have not yet tested this code in a real scenario with a 301 redirect.
If it's okay, I shall test it and submit a patch for the tracker item.

Thanks,
Senthil

--
O.R.Senthil Kumaran
http://uthcode.sarovar.org
Jul 16 '07 #3
On Tue, 17 Jul 2007, O.R.Senthil Kumaran wrote:
> [...]
> I spent a little time thinking about a solution and figured out that the
> following changes to HTTPRedirectHandler, might be helpful in implementing
> this.
> [...]

Did you post it on the Python SF patch tracker?

If not, please do, and point us at it. I'll comment there.
John

Jul 16 '07 #4
O.R.Senthil Kumaran wrote:
> Thank you for the reply, Mr. John and I apologize for a very late response
> from my end.
>
> * John J. Lee <jj*@pobox.com> [2007-07-06 18:53:09]:
>
>> "O.R.Senthil Kumaran" <or*******@users.sourceforge.net> writes:
>>
>>> Hi,
>>> There is an Open Tracker item against urllib2 library python.org/sf/735515
>>> I am not completely getting what "cache - redirection" implies and what should
>>> be done with the urllib2 module. Any pointers?
>>
>> When a 301 redirect occurs after a request for URL U, via
>> urllib2.urlopen(U), urllib2 should remember the result of that
>> redirection, viz a second URL, V. Then, when another
>> urllib2.urlopen(U) takes place, urllib2 should send an HTTP request
>> for V, not U. urllib2 does not currently do this. (Obviously the
>> cache -- that is, the dictionary or whatever that stores the mapping
>> from URLs U to V -- should not be maintained by function urlopen
>> itself. Perhaps it should live on the redirect handler.)
>
> I spent a little time thinking about a solution and figured out that the
> following changes to HTTPRedirectHandler, might be helpful in implementing
> this.

> class HTTPRedirectHandler(BaseHandler):
>     # ... omitted ...
>
>     def __init__(self):
>         # Initialize a dictionary to hold the cache.
>         self.cache = {}
>
>     # Handle 301 errors in a separate method that maintains the cache.
>     def http_error_301(self, req, fp, code, msg, headers):
>         if req in self.cache:
>             # Look for a loop: if a particular URL appears as both a key
>             # and a value, there is a loop, so raise HTTPError.
>             if len(set(self.cache.keys()) & set(self.cache.values())) > 0:
>                 raise HTTPError(req.get_full_url(), code,
>                                 self.inf_msg + msg, headers, fp)
>             return self.cache[req]
>
>         self.cache[req] = self.http_error_302(req, fp, code, msg, headers)
>         return self.cache[req]
>
> John, let me know your comments on this approach.
> I have not tested this code in real scenario yet with a 301 redirect.
> If its okay, I shall test it and submit a patch for the tracker item.

That assumes you're reusing the same object to reopen another URL.

Is this thread-safe?

That's also an inefficient way to test for an empty dictionary.

John Nagle
Jul 16 '07 #5
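
The first two concerns in #5 (a cache tied to one handler object, and thread safety) could be addressed along these lines. This is a minimal sketch under stated assumptions, not code from the tracker patch: the RedirectCache class, its method names, and the module-level SHARED_CACHE are all hypothetical. The lock is there because walking a chain of redirects takes several dictionary lookups, and another thread could change the dictionary between them.

```python
import threading


class RedirectCache:
    """Hypothetical thread-safe map of permanently redirected URLs."""

    def __init__(self):
        self._lock = threading.Lock()
        self._targets = {}  # original URL -> redirect target

    def remember(self, source, target):
        # Record that source permanently redirects to target.
        with self._lock:
            self._targets[source] = target

    def resolve(self, url):
        # Follow the chain under the lock so it is read consistently,
        # and stop if the chain loops back on itself.
        with self._lock:
            seen = set()
            while url in self._targets and url not in seen:
                seen.add(url)
                url = self._targets[url]
            return url


# A single shared instance, so the cache does not depend on reusing
# one particular handler object.
SHARED_CACHE = RedirectCache()


def worker(n):
    # Each thread records one placeholder redirect.
    SHARED_CACHE.remember("http://example.com/old%d" % n,
                          "http://example.com/new%d" % n)


threads = [threading.Thread(target=worker, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(SHARED_CACHE.resolve("http://example.com/old2"))
```

Whether a process-wide cache is desirable is a separate design question; an opener-level cache would keep unrelated users of the module from affecting each other.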
* John J Lee <jj*@pobox.com> [2007-07-16 20:17:40]:
>> I spent a little time thinking about a solution and figured out that the
>> following changes to HTTPRedirectHandler, might be helpful in implementing
>> this.
>
> Did you post it on the Python SF patch tracker?
>
> If not, please do, and point us at it. I'll comment there.

Posted: http://www.python.org/sf/1755841
Thanks,
--
O.R.Senthil Kumaran
http://uthcode.sarovar.org
Jul 18 '07 #6
* John Nagle <na***@animats.com> [2007-07-16 12:34:00]:
> That assumes you're reusing the same object to reopen another URL.
>
> Is this thread-safe?

I don't know. I looked into a few other caches (the FTP cache) and saw how
they were implemented. I do not see why this would not be thread-safe.

> That's also an inefficient way to test for an empty dictionary.

How should it be done otherwise? I am looking for alternative methods as
well.

--
O.R.Senthil Kumaran
http://uthcode.sarovar.org
Jul 18 '07 #7
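
On the question in #7, a few alternatives can be sketched in plain Python. The helper names has_overlap and has_loop are hypothetical, and the single-letter URLs are placeholders. An empty dictionary is simply false in a boolean context; an overlap between keys and values can be checked with set.isdisjoint() without building the intersection; and an overlap alone does not prove a loop, since an ordinary chain U -> V -> W also has V as both a key and a value. Walking the chain from the requested URL detects an actual loop.

```python
def has_overlap(cache):
    """True if some URL appears both as a key and as a value."""
    return not set(cache).isdisjoint(cache.values())


def has_loop(cache, start):
    """True if following redirects from start revisits a URL."""
    seen = set()
    url = start
    while url in cache:
        if url in seen:
            return True
        seen.add(url)
        url = cache[url]
    return False


empty = {}
chain = {"U": "V", "V": "W"}   # two redirects in a row, no loop
loop = {"U": "V", "V": "U"}    # a genuine redirect loop

# Emptiness test: no len() or set() needed.
if not empty:
    print("cache is empty")

print(has_overlap(chain), has_loop(chain, "U"))
print(has_overlap(loop), has_loop(loop, "U"))
```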
