
How to prevent the script from stopping before it should

I have a script that downloads some web pages. The problem is that, sometimes, after downloading a few pages the script hangs (stops).
(But sometimes it runs all the way to the end and downloads all the pages I want.)
I think the script stops when the connection to the server I am downloading from is poor.
Is there a way to prevent the script from hanging before all the pages are downloaded?

Thanks for help
Lad.

Jul 18 '05 #1
7 Replies


import urllib, sys

pages = ['http://www.python.org', 'http://xxx']
for i in pages:
    try:
        u = urllib.urlopen(i)
        print u.geturl()
    except Exception, e:
        print >> sys.stderr, '%s: %s' % (e.__class__.__name__, e)

This prints an error if a page fails to open; the rest open fine.

Jul 18 '05 #2

wi******@hotmail.com wrote:
import urllib, sys

pages = ['http://www.python.org', 'http://xxx']
for i in pages:
    try:
        u = urllib.urlopen(i)
        print u.geturl()
    except Exception, e:
        print >> sys.stderr, '%s: %s' % (e.__class__.__name__, e)

This prints an error if a page fails to open; the rest open fine.

More generally you may wish to use the timeout features of TCP sockets.
These were introduced in Python 2.3, though Tim O'Malley's module
"timeoutsocket" (which was the inspiration for the 2.3 upgrade) was
available for earlier versions.

You will need to import the socket module and then call
socket.setdefaulttimeout() to ensure that communication with
non-responsive servers results in a socket exception that you can trap.
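For example, something along these lines, as a sketch only (the 30-second value and the URLs are just illustrations):

import socket
import urllib
import sys

socket.setdefaulttimeout(30)    # seconds; blocking socket operations that exceed this raise socket.timeout

pages = ['http://www.python.org', 'http://xxx']
for page in pages:
    try:
        data = urllib.urlopen(page).read()
        print page, len(data)
    except (IOError, socket.timeout), e:
        print >> sys.stderr, '%s failed: %s' % (page, e)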

regards
Steve
--
Steve Holden http://www.holdenweb.com/
Python Web Programming http://pydish.holdenweb.com/
Holden Web LLC +1 703 861 4237 +1 800 494 3119
Jul 18 '05 #3


Steve Holden wrote:
[wi******@hotmail.com's urllib example snipped]

More generally you may wish to use the timeout features of TCP sockets. These were introduced in Python 2.3, though Tim O'Malley's module "timeoutsocket" (which was the inspiration for the 2.3 upgrade) was available for earlier versions.

You will need to import the socket module and then call socket.setdefaulttimeout() to ensure that communication with non-responsive servers results in a socket exception that you can trap.


Thank you wi******@hotmail.com and Steve for the ideas. Detecting that the script has hung is not a big problem.
What I need, however, is a solution where I do not have to start the script again by hand; the script should restart itself. I am thinking of two threads: a main (master) thread that supervises a slave thread. The slave thread downloads the pages, and whenever there is a timeout the master thread restarts the slave thread.
Is that a good solution? Or is there a better one?
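Roughly what I have in mind, as a sketch only (the URLs and the 60-second limit are made up, and I realise Python cannot forcibly kill a thread that is truly stuck, so the slave would still need its own socket timeout):

import threading
import urllib

pages = ['http://www.python.org', 'http://xxx']
results = {}

def fetch(url):
    # slave thread: download one page
    try:
        results[url] = urllib.urlopen(url).read()
    except Exception:
        results[url] = None

for url in pages:
    worker = threading.Thread(target=fetch, args=(url,))
    worker.start()
    worker.join(60)              # master waits at most 60 seconds for the slave
    if worker.isAlive():
        print 'gave up on', url  # slave is stuck; retry this page later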
Thanks for help
Lad

Jul 18 '05 #4

Steve Holden wrote:
You will need to import the socket module and then call socket.setdefaulttimeout() to ensure that
communication with non-responsive servers results in a socket exception that you can trap.


Or you can use asynchronous sockets, so your program can keep processing the sites that do respond while it's waiting for the ones that don't. For one way to do that, see "Using HTTP to Download Files" here:

http://effbot.org/zone/effnews-1.htm

(make sure you read the second and third articles as well)
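a minimal sketch of the idea (not the code from those articles; the host, path, and loop timeout below are made up):

import asyncore
import socket

class HTTPClient(asyncore.dispatcher):
    # tiny asynchronous HTTP/1.0 GET client
    def __init__(self, host, path):
        asyncore.dispatcher.__init__(self)
        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        self.connect((host, 80))
        self.request = 'GET %s HTTP/1.0\r\nHost: %s\r\n\r\n' % (path, host)
        self.data = ''

    def handle_connect(self):
        pass

    def writable(self):
        return len(self.request) > 0

    def handle_write(self):
        sent = self.send(self.request)
        self.request = self.request[sent:]

    def handle_read(self):
        self.data = self.data + self.recv(8192)

    def handle_close(self):
        self.close()
        print 'got %d bytes' % len(self.data)

# several of these can be created before the loop starts
HTTPClient('www.python.org', '/')
asyncore.loop(timeout=1)   # 1 second is just the select() interval, not an overall limit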

</F>

Jul 18 '05 #5


Fredrik Lundh wrote:
[Steve Holden's timeout advice and the asynchronous-sockets suggestion snipped; see the previous post]

Dear Fredrik Lundh,
Thank you for the link. I checked it, but I have not found an answer to my question.
My problem is that sometimes I cannot finish downloading all the pages. Sometimes my script freezes and I can do nothing but restart it from the last successfully downloaded web page. There is no error message saying that something went wrong. I do not know why; maybe the server is programmed to limit the number of connections, or there may be other reasons. So my idea was two threads: a master supervising a slave thread that would do the downloading, and if the slave thread stopped, the master thread would start another slave. Is that a good solution? Or is there a better one?
Thanks for help
Lad

Jul 18 '05 #6


Steve Holden wrote:
[wi******@hotmail.com's urllib example snipped]

More generally you may wish to use the timeout features of TCP sockets. These were introduced in Python 2.3, though Tim O'Malley's module "timeoutsocket" (which was the inspiration for the 2.3 upgrade) was available for earlier versions.

You will need to import the socket module and then call socket.setdefaulttimeout() to ensure that communication with non-responsive servers results in a socket exception that you can trap.
So adding:

import socket
socket.setdefaulttimeout(30)   # timeout in seconds; the value is up to you

is that *necessary* in order to avoid hangs when using urllib2 to fetch web resources?
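The pattern I have in mind is roughly this, as a sketch only (the URL and the 30-second value are just examples):

import socket
import urllib2

socket.setdefaulttimeout(30)   # applies to the sockets urllib2 creates

try:
    data = urllib2.urlopen('http://www.python.org/').read()
except (urllib2.URLError, socket.timeout), e:
    print 'download failed:', e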

Regards,

Fuzzy
http://www.voidspace.org.uk/python/index.shtml


Jul 18 '05 #7


Fuzzyman wrote:
[Steve Holden's timeout advice and the socket.setdefaulttimeout()/urllib2 question snipped; see the previous post]


Fuzzy,
I use httplib with timeoutsocket, but no timeout is ever raised; the script just freezes sometimes. I suspect the server I download the pages from does that to limit traffic. I must restart my script by hand.
Do you think urllib2 would be better?
Or is there a better solution?
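What I would like to end up with is roughly this, as a sketch only (the host, path, timeout, and retry count are placeholders):

import socket
import httplib

socket.setdefaulttimeout(30)     # seconds; also covers httplib connections

def fetch(host, path, retries=3):
    # retry a few times instead of restarting the whole script by hand
    for attempt in range(retries):
        try:
            conn = httplib.HTTPConnection(host)
            conn.request('GET', path)
            return conn.getresponse().read()
        except (socket.error, socket.timeout), e:
            print 'attempt %d for %s failed: %s' % (attempt + 1, path, e)
    return None

page = fetch('www.python.org', '/')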
Regards,
Lad

Jul 18 '05 #8
