How to prevent the script from stopping before it should

I have a script that downloads some web pages. The problem is that,
sometimes, after downloading a few pages, the script hangs (stops).
(At other times it runs to the end and downloads all the pages I want.)
I think the script hangs when the internet connection to the server I am
downloading from is rather poor.
Is there a way to prevent the script from hanging before all the pages
are downloaded?

Thanks for help
Lad.

Jul 18 '05 #1
import urllib, sys
pages = ['http://www.python.org', 'http://xxx']
for i in pages:
    try:
        u = urllib.urlopen(i)
        print u.geturl()
    except Exception, e:
        print >> sys.stderr, '%s: %s' % (e.__class__.__name__, e)

This will print an error when a page fails to open; the rest open fine.

Jul 18 '05 #2
wi******@hotmail.com wrote:
import urllib, sys
pages = ['http://www.python.org', 'http://xxx']
for i in pages:
    try:
        u = urllib.urlopen(i)
        print u.geturl()
    except Exception, e:
        print >> sys.stderr, '%s: %s' % (e.__class__.__name__, e)

This will print an error when a page fails to open; the rest open fine.

More generally you may wish to use the timeout features of TCP sockets.
These were introduced in Python 2.3, though Tim O'Malley's module
"timeoutsocket" (which was the inspiration for the 2.3 upgrade) was
available for earlier versions.

You will need to import the socket module and then call
socket.setdefaulttimeout() to ensure that communication with
non-responsive servers results in a socket exception that you can trap.
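
A minimal sketch of this (not from the original post), assuming Python 2.3+;
the timeout value and page list are only examples:

import socket, sys, urllib

socket.setdefaulttimeout(30)        # seconds; applies to newly created sockets

pages = ['http://www.python.org', 'http://xxx']
for url in pages:
    try:
        data = urllib.urlopen(url).read()
        print '%s: %d bytes' % (url, len(data))
    except (IOError, socket.error), e:   # timeouts surface as socket.timeout / IOError
        print >> sys.stderr, '%s failed: %s' % (url, e)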

regards
Steve
--
Steve Holden http://www.holdenweb.com/
Python Web Programming http://pydish.holdenweb.com/
Holden Web LLC +1 703 861 4237 +1 800 494 3119
Jul 18 '05 #3

Steve Holden wrote:
wi******@hotmail.com wrote:
import urllib, sys
pages = ['http://www.python.org', 'http://xxx']
for i in pages:
    try:
        u = urllib.urlopen(i)
        print u.geturl()
    except Exception, e:
        print >> sys.stderr, '%s: %s' % (e.__class__.__name__, e)

This will print an error when a page fails to open; the rest open fine.

More generally you may wish to use the timeout features of TCP sockets.
These were introduced in Python 2.3, though Tim O'Malley's module
"timeoutsocket" (which was the inspiration for the 2.3 upgrade) was
available for earlier versions.

You will need to import the socket module and then call
socket.setdefaulttimeout() to ensure that communication with
non-responsive servers results in a socket exception that you can trap.
regards
Steve
--
Steve Holden http://www.holdenweb.com/
Python Web Programming http://pydish.holdenweb.com/
Holden Web LLC +1 703 861 4237 +1 800 494 3119


Thank you wi******@hotmail.com and Steve for the ideas. Detecting that the
script has hung is not a big problem.
However, I need a solution where I do not have to restart the script by
hand; the script should restart itself. I am thinking about two threads:
a main (master) thread that supervises a slave thread. The slave thread
downloads the pages, and whenever there is a timeout the master thread
restarts the slave thread.
Is that a good solution, or is there a better one?
Thanks for help
Lad
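
A rough sketch of the master/slave idea above (not from the thread), assuming
Python 2.3+. Note that Python cannot kill a hung thread from outside: the
master can only wait with a timeout, give up on that page, and move on. The
fetch function, timeout values and URLs are made up for illustration:

import socket, threading, urllib

socket.setdefaulttimeout(30)            # seconds; helps the slave give up by itself

def fetch(url, results):
    try:
        results[url] = urllib.urlopen(url).read()
    except Exception:
        results[url] = None

def supervised_download(urls, per_page_timeout=60):
    results = {}
    for url in urls:
        slave = threading.Thread(target=fetch, args=(url, results))
        slave.setDaemon(True)           # a hung slave won't block interpreter exit
        slave.start()
        slave.join(per_page_timeout)    # the master waits at most this long
        if slave.isAlive():
            print 'gave up on', url     # the slave may still be hung; skip this page
    return results

pages = supervised_download(['http://www.python.org', 'http://xxx'])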

Jul 18 '05 #4
Steve Holden wrote:
You will need to import the socket module and then call socket.setdefaulttimeout() to ensure that
communication with non-responsive servers results in a socket exception that you can trap.


or you can use asynchronous sockets, so your program can keep processing
the sites that do respond at once while it's waiting for the ones that don't. for
one way to do that, see "Using HTTP to Download Files" here:

http://effbot.org/zone/effnews-1.htm

(make sure you read the second and third article as well)

</F>
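
This is not the code from the effbot articles, just a rough, self-contained
illustration of driving several downloads from one loop with the standard
asyncore module (plain HTTP on port 80 only; the URL list is made up):

import asyncore, socket, urlparse

class HTTPClient(asyncore.dispatcher):
    def __init__(self, url):
        asyncore.dispatcher.__init__(self)
        parts = urlparse.urlparse(url)
        self.host = parts[1]
        self.path = parts[2] or '/'
        self.data = ''
        self.buffer = 'GET %s HTTP/1.0\r\nHost: %s\r\n\r\n' % (self.path, self.host)
        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        self.connect((self.host, 80))

    def handle_connect(self):
        pass

    def handle_read(self):
        self.data = self.data + self.recv(8192)

    def handle_close(self):
        print '%s: received %d bytes' % (self.host, len(self.data))
        self.close()

    def writable(self):
        return len(self.buffer) > 0

    def handle_write(self):
        sent = self.send(self.buffer)
        self.buffer = self.buffer[sent:]

clients = [HTTPClient(u) for u in ['http://www.python.org/', 'http://effbot.org/']]
asyncore.loop(timeout=5)    # 5 s is the select() interval, not an overall deadline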

Jul 18 '05 #5

Fredrik Lundh wrote:
Steve Holden wrote:
You will need to import the socket module and then call socket.setdefaulttimeout() to ensure that communication with non-responsive servers results in a socket
exception that you can trap.
or you can use asynchronous sockets, so your program can keep processing the sites that do respond at once while it's waiting for the ones that don't. for one way to do that, see "Using HTTP to Download Files" here:

http://effbot.org/zone/effnews-1.htm

(make sure you read the second and third article as well)

Dear Fredrik Lundh,
Thank you for the link. I checked it, but I did not find an answer to my
question.
My problem is that sometimes I cannot finish downloading all the pages.
Sometimes my script freezes, and I can do nothing but restart it from the
last successfully downloaded web page. There is no message saying that an
error occurred. I do not know why; maybe the server is programmed to limit
the number of connections, or there may be other reasons. So my idea was two
threads: one master supervising a slave thread that would do the
downloading, and if the slave thread stopped, the master thread would start
another slave. Is that a good solution, or is there a better one?
Thanks for help
Lad

Jul 18 '05 #6

Steve Holden wrote:
wi******@hotmail.com wrote:
import urllib, sys
pages = ['http://www.python.org', 'http://xxx']
for i in pages:
    try:
        u = urllib.urlopen(i)
        print u.geturl()
    except Exception, e:
        print >> sys.stderr, '%s: %s' % (e.__class__.__name__, e)

This will print an error when a page fails to open; the rest open fine.

More generally you may wish to use the timeout features of TCP sockets.
These were introduced in Python 2.3, though Tim O'Malley's module
"timeoutsocket" (which was the inspiration for the 2.3 upgrade) was
available for earlier versions.

You will need to import the socket module and then call
socket.setdefaulttimeout() to ensure that communication with
non-responsive servers results in a socket exception that you can trap.
So adding:

import socket
socket.setdefaulttimeout(30)   # timeout in seconds (example value)

is *necessary* in order to avoid hangs when using urllib2 to fetch web
resources?

Regards,

Fuzzy
http://www.voidspace.org.uk/python/index.shtml


Jul 18 '05 #7

Fuzzyman wrote:
Steve Holden wrote:
wi******@hotmail.com wrote:
import urllib, sys
pages = ['http://www.python.org', 'http://xxx']
for i in pages:
    try:
        u = urllib.urlopen(i)
        print u.geturl()
    except Exception, e:
        print >> sys.stderr, '%s: %s' % (e.__class__.__name__, e)

This will print an error when a page fails to open; the rest open fine.

More generally you may wish to use the timeout features of TCP sockets.
These were introduced in Python 2.3, though Tim O'Malley's module
"timeoutsocket" (which was the inspiration for the 2.3 upgrade) was
available for earlier versions.

You will need to import the socket module and then call
socket.setdefaulttimeout() to ensure that communication with
non-responsive servers results in a socket exception that you can trap.


So adding:

import socket
socket.setdefaulttimeout(30)   # timeout in seconds (example value)

is *necessary* in order to avoid hangs when using urllib2 to fetch web
resources?

Regards,

Fuzzy
http://www.voidspace.org.uk/python/index.shtml


Fuzzy,
I use httplib with timeoutsocket, but no timeout is raised; the script just
freezes sometimes. I suspect the server I download pages from does this to
limit heavy traffic. I then have to restart my script.
Do you think urllib2 would be better?
Or is there a better solution?
Regards,
Lad
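
A minimal sketch (not from the thread) of the restart idea without threads:
set a global socket timeout so urllib2 raises an exception instead of
hanging, and retry each page a few times before giving up. The retry count,
pause, timeout and URLs are only examples:

import socket, sys, time, urllib2

socket.setdefaulttimeout(30)            # seconds; urllib2's sockets inherit this

def fetch_with_retries(url, retries=3, pause=5):
    for attempt in range(retries):
        try:
            return urllib2.urlopen(url).read()
        except Exception, e:            # URLError, socket.timeout, ...
            print >> sys.stderr, '%s (attempt %d): %s' % (url, attempt + 1, e)
            time.sleep(pause)           # be gentle if the server is throttling
    return None                         # give up after `retries` attempts

for url in ['http://www.python.org', 'http://xxx']:
    data = fetch_with_retries(url)
    if data is not None:
        print '%s: %d bytes' % (url, len(data))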

Jul 18 '05 #8
