urllib2 rate limiting

Hello list,

I want to limit the download speed when using urllib2. In particular,
having several parallel downloads, I want to make sure that their total
speed doesn't exceed a maximum value.

I can't find a simple way to achieve this. After some research I can try
a few things, but I'm stuck on the details:

1) Can I overload some method in _socket.py to achieve this, and perhaps
make this generic enough to work even with other libraries than urllib2?

2) There is the urllib.urlretrieve() function which accepts a reporthook
parameter. Perhaps I can have reporthook to increment a global counter and
sleep as necessary when a threshold is reached.
However, there is nothing similar in urllib2. Isn't urllib2 supposed
to be a superset of urllib in functionality? Why is there no reporthook
parameter in any of urllib2's functions?
Moreover, even the existing way reporthook can be used doesn't seem quite
right: reporthook(blocknum, bs, size) is always called with bs=8K, even
for the last block, and sometimes (blocknum*bs > size) is possible if the
server sends a wrong Content-Length HTTP header.

3) Perhaps I can use filehandle.read(1024) and manually read as many
chunks of data as I need. However I think this would generally be
inefficient and I'm not sure how it would work because
of internal buffering of urllib2.
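
On point 2, one way a reporthook could guard against the over-count is to clamp the running total to the reported size. This is only a sketch: `bytes_received` is a hypothetical helper name, and it assumes that urlretrieve passes -1 as the size when the server sends no Content-Length. If the header itself is wrong, the clamped value is still only an estimate.

```python
def bytes_received(blocknum, bs, size):
    """Estimate the bytes received so far from the reporthook arguments.

    blocknum * bs counts whole blocks, so it can overshoot on the last
    block; clamp it to size when a positive size was reported.
    """
    estimate = blocknum * bs
    if size > 0:
        estimate = min(estimate, size)
    return estimate
```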

So how do you think I can achieve rate limiting in urllib2?
Thanks in advance,
Dimitris

P.S. And something simpler: How can I disallow urllib2 to follow
redirections to foreign hosts?
Jan 10 '08 #1
Dimitrios Apostolou <ji***@gmx.net> writes:
> P.S. And something simpler: How can I disallow urllib2 to follow
> redirections to foreign hosts?

You need to subclass `urllib2.HTTPRedirectHandler`, override its
`http_error_301` and `http_error_302` methods, and raise a
`urllib2.HTTPError` exception.
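
A minimal sketch of that suggestion follows. `NoRedirectHandler` is a hypothetical name, and the try/except import is only there so the same code also runs on Python 3, where these classes live in urllib.request and urllib.error.

```python
try:                                   # Python 2, as used in this thread
    from urllib2 import HTTPRedirectHandler, HTTPError, Request, build_opener
except ImportError:                    # Python 3 moved these names
    from urllib.request import HTTPRedirectHandler, Request, build_opener
    from urllib.error import HTTPError


class NoRedirectHandler(HTTPRedirectHandler):
    """Turn a 301 or 302 response into an HTTPError instead of following it."""

    def http_error_302(self, req, fp, code, msg, headers):
        raise HTTPError(req.get_full_url(), code, msg, headers, fp)

    http_error_301 = http_error_302


# Passing the subclass to build_opener replaces the default redirect handler.
opener = build_opener(NoRedirectHandler())
```

Note that this refuses every 301/302 redirect, not only those to foreign hosts, and that 303 and 307 responses are still handled by the base class unless they are overridden in the same way.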

http://diveintopython.org/http_web_s...redirects.html

HTH,
Rob
Jan 10 '08 #2
On Thu, 10 Jan 2008, Rob Wolfe wrote:
> Dimitrios Apostolou <ji***@gmx.net> writes:
>> P.S. And something simpler: How can I disallow urllib2 to follow
>> redirections to foreign hosts?
>
> You need to subclass `urllib2.HTTPRedirectHandler`, override
> `http_error_301` and `http_error_302` methods and throw
> `urllib2.HTTPError` exception.

Thanks! I think for my case it's better to override the redirect_request
method and return a Request only when the redirection goes to the
same site. Just another question, because I can't find the meaning of the
(req, fp, code, msg, hdrs) parameters in the docs: to read the URL I get
redirected to (the 'Location:' HTTP header?), should I check the hdrs
parameter, or is there a better way?
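
For what it's worth, redirect_request takes a sixth argument, newurl, after (req, fp, code, msg, hdrs). As far as I can tell from the urllib2 source, it holds the redirect target already resolved to an absolute URL, so hdrs does not need to be parsed. Here req is the Request being redirected, fp the response object, code and msg the HTTP status and reason, and hdrs the response headers. A sketch of the same-host check, with `SameHostRedirectHandler` as a hypothetical name and a fallback import for Python 3:

```python
try:                                   # Python 2
    from urllib2 import HTTPRedirectHandler, HTTPError, Request
    from urlparse import urlparse
except ImportError:                    # Python 3
    from urllib.request import HTTPRedirectHandler, Request
    from urllib.error import HTTPError
    from urllib.parse import urlparse


class SameHostRedirectHandler(HTTPRedirectHandler):
    """Follow a redirect only if it stays on the host of the request."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        old_host = urlparse(req.get_full_url()).netloc.lower()
        new_host = urlparse(newurl).netloc.lower()
        if new_host != old_host:
            # Refuse: raising HTTPError stops other handlers from trying.
            raise HTTPError(req.get_full_url(), code,
                            "redirect to foreign host refused: %s" % newurl,
                            headers, fp)
        # Same host: let the base class build the follow-up Request.
        return HTTPRedirectHandler.redirect_request(
            self, req, fp, code, msg, headers, newurl)
```

In a chain of redirects the comparison is made hop by hop, because req is the request for the current hop rather than the very first one.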
Thanks,
Dimitris

Jan 10 '08 #3
Dimitrios Apostolou <ji***@gmx.net> wrote:
> I want to limit the download speed when using urllib2. In particular,
> having several parallel downloads, I want to make sure that their total
> speed doesn't exceed a maximum value.
>
> I can't find a simple way to achieve this. After some research I can try
> a few things, but I'm stuck on the details:
>
> 1) Can I overload some method in _socket.py to achieve this, and perhaps
> make this generic enough to work even with other libraries than urllib2?
>
> 2) There is the urllib.urlretrieve() function which accepts a reporthook
> parameter.

Here is an implementation based on that idea. I've used urllib rather
than urllib2 as that is what I'm familiar with.

------------------------------------------------------------
#!/usr/bin/python

"""
Fetch a url rate limited

Syntax: rate URL local_file_name
"""

import os
import sys
import urllib
from time import time, sleep

class RateLimit(object):
    """Rate limit a url fetch"""
    def __init__(self, rate_limit):
        """rate limit in kBytes / second"""
        self.rate_limit = rate_limit
        self.start = time()
    def __call__(self, block_count, block_size, total_size):
        # Called by urlretrieve as reporthook(block_count, block_size, total_size)
        total_kb = total_size / 1024
        downloaded_kb = (block_count * block_size) / 1024
        elapsed_time = time() - self.start
        if elapsed_time != 0:
            rate = downloaded_kb / elapsed_time
            # "%f.1" prints the full float followed by a literal ".1";
            # "%.1f" was probably intended
            print "%d kb of %d kb downloaded %f.1 kBytes/s\n" % (downloaded_kb ,total_kb, rate),
        # How long the download should have taken so far at the target rate
        expected_time = downloaded_kb / self.rate_limit
        sleep_time = expected_time - elapsed_time
        print "Sleep for", sleep_time
        # Only sleep when we are ahead of schedule
        if sleep_time > 0:
            sleep(sleep_time)

def main():
    """Fetch the contents of urls"""
    if len(sys.argv) != 4:
        print 'Syntax: %s "rate in kBytes/s" URL "local output path"' % sys.argv[0]
        raise SystemExit(1)
    rate_limit, url, out_path = sys.argv[1:]
    rate_limit = float(rate_limit)
    print "Fetching %r to %r with rate limit %.1f" % (url, out_path, rate_limit)
    urllib.urlretrieve(url, out_path, reporthook=RateLimit(rate_limit))

if __name__ == "__main__": main()
------------------------------------------------------------

Use it like this

$ ./rate-limited-fetch.py 16 http://some/url/or/other z
Fetching 'http://some/url/or/other' to 'z' with rate limit 16.0
0 kb of 10118 kb downloaded 0.000000.1 kBytes/s
Sleep for -0.0477550029755
8 kb of 10118 kb downloaded 142.073242.1 kBytes/s
Sleep for 0.443691015244
16 kb of 10118 kb downloaded 32.130966.1 kBytes/s
Sleep for 0.502038002014
24 kb of 10118 kb downloaded 23.952789.1 kBytes/s
Sleep for 0.498028993607
32 kb of 10118 kb downloaded 21.304672.1 kBytes/s
Sleep for 0.497982025146
40 kb of 10118 kb downloaded 19.979510.1 kBytes/s
Sleep for 0.497948884964
48 kb of 10118 kb downloaded 19.184721.1 kBytes/s
Sleep for 0.498008966446
....
1416 kb of 10118 kb downloaded 16.090774.1 kBytes/s
Sleep for 0.499262094498
1424 kb of 10118 kb downloaded 16.090267.1 kBytes/s
Sleep for 0.499293088913
1432 kb of 10118 kb downloaded 16.089760.1 kBytes/s
Sleep for 0.499292135239
1440 kb of 10118 kb downloaded 16.089254.1 kBytes/s
Sleep for 0.499267101288
....
--
Nick Craig-Wood <ni**@craig-wood.com> -- http://www.craig-wood.com/nick
Jan 11 '08 #4
On Fri, 11 Jan 2008, Nick Craig-Wood wrote:
> Here is an implementation based on that idea. I've used urllib rather
> than urllib2 as that is what I'm familiar with.

Thanks! Really nice implementation. However I'm stuck with urllib2 because
of its extra functionality so I'll try to implement something similar
using handle.read(1024) to read in small chunks.
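
A rough sketch of what that might look like, including the shared limit across parallel downloads asked about in the first post. `SharedRateLimiter` and `limited_copy` are hypothetical names, and `src` can be any object with a read() method, such as the handle returned by urllib2.urlopen().

```python
import threading
import time


class SharedRateLimiter(object):
    """Cap the combined throughput of every thread sharing this object."""

    def __init__(self, bytes_per_second):
        self.rate = float(bytes_per_second)
        self.lock = threading.Lock()
        self.next_free = time.time()   # when the bytes already read are "paid for"

    def throttle(self, nbytes):
        """Account for nbytes just read and sleep if ahead of the budget."""
        with self.lock:
            now = time.time()
            start = max(now, self.next_free)
            self.next_free = start + nbytes / self.rate
            delay = self.next_free - now
        if delay > 0:
            time.sleep(delay)          # sleep outside the lock


def limited_copy(src, dst, limiter, chunk_size=1024):
    """Copy src to dst in small chunks, throttling after each read."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
        limiter.throttle(len(chunk))
    return total
```

Each download thread would call limited_copy() with the same limiter instance, so the budget is shared. Because of buffering in the socket layer, the rate on the wire may be bursty at first; the sketch only paces how fast the data is consumed and relies on TCP flow control to slow the sender down.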

It really seems weird that urllib2 is missing reporthook functionality!
Thank you,
Dimitris

Jan 12 '08 #5
