Error with long running web spider

Hi everyone:

I have a spider that is relatively long running (somewhere between
12-24 hours). My problem is that I keep having an issue where the
program appears to freeze. Once this freezing happens the activity of
the program drops to zero. No exception is thrown or caught. The
program simply stops doing anything. It even stops printing out its
activity to stdout. The program itself appears to run in about 14
megs of memory. Basically, the program looks up pages on a particular
website, and then reads the HTML of those pages, parses it (lots of
long regular expressions are used), and saves the found information to
an object (which is later translated to SQL and the SQL is written to
a file).

I've actually had this same problem with several long running Python
programs. Any ideas?

Thanks in advance.

Aug 22 '07 #1
On Aug 22, 10:58 am, Josh Volz <jdv...@gmail.com> wrote:

I'm running this program on Windows XP, using Python 2.5. I'm using
ActiveState Komodo IDE 4.0 as the run environment.

Thanks,
J.

Aug 22 '07 #2
Josh Volz <jd****@gmail.com> wrote:
> I have a spider that is relatively long running (somewhere between
> 12-24 hours). My problem is that I keep having an issue where the
> program appears to freeze. Once this freezing happens the activity of
> the program drops to zero. No exception is thrown or caught. The
> program simply stops doing anything. It even stops printing out its
> activity to stdout. The program itself appears to run in about 14
> megs of memory. Basically, the program looks up pages on a particular
> website, and then reads the HTML of those pages, parses it (lots of
> long regular expressions are used), and saves the found information to
> an object (which is later translated to SQL and the SQL is written to
> a file).
>
> I've actually had this same problem with several long running Python
> programs. Any ideas?

If you were running under Unix, I'd suggest you "strace" the process to
see what it is doing. There are Windows strace programs (which I've
never tried) too!
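
A Python-level stand-in for that idea, not an equivalent: strace shows system calls, while this sketch only shows Python stack traces. It assumes Python 3.3 or later, since the standard-library `faulthandler` module does not exist in Python 2.5. `hang_watchdog`, `crawl`, `handle_page` and the 300-second limit are hypothetical names and values chosen for illustration.

```python
import faulthandler
import sys
from contextlib import contextmanager


@contextmanager
def hang_watchdog(seconds, file=None):
    """Dump every thread's traceback to `file` if the block runs longer than `seconds`.

    `file` must be a real file with a file descriptor; it defaults to sys.stderr.
    """
    target = file if file is not None else sys.stderr
    # repeat=True keeps dumping every `seconds` for as long as the block is stuck.
    faulthandler.dump_traceback_later(seconds, repeat=True, file=target)
    try:
        yield
    finally:
        # The block finished (or raised), so disarm the pending dump.
        faulthandler.cancel_dump_traceback_later()


def crawl(urls, handle_page, stall_seconds=300):
    """Hypothetical driver loop: arm the watchdog around each page."""
    for url in urls:
        with hang_watchdog(stall_seconds):
            handle_page(url)
```

If a page stalls, the dumped traceback names the file and line each thread is sitting on, which would show whether the program is blocked in socket code or somewhere else.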

You'll probably find it is wedged in TCP socket code.
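
If the hang really is a blocked socket read, a timeout turns the silent stall into an exception that can be logged and skipped. The following is a minimal sketch under assumptions, not the original spider's code: it is written for Python 3, `fetch` and the 30-second value are made up for illustration, and the `opener` parameter exists only so the function can be exercised without a network. On Python 2.5 `urllib2.urlopen` has no `timeout` argument (it was added in 2.6), so only the `socket.setdefaulttimeout` line would carry over.

```python
import socket
import urllib.request

# Safety net for any socket that gets created without an explicit timeout.
socket.setdefaulttimeout(30)


def fetch(url, timeout=30, opener=urllib.request.urlopen):
    """Return the page body as bytes, or None if the request fails or stalls."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.read()
    except OSError as err:  # covers socket.timeout and urllib.error.URLError
        print(f"giving up on {url}: {err!r}", flush=True)
        return None
```

A caller would treat `None` as "skip this page and move on", so one unresponsive server no longer stops the whole run.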

--
Nick Craig-Wood <ni**@craig-wood.com> -- http://www.craig-wood.com/nick
Aug 22 '07 #3
In message <11**********************@l22g2000prc.googlegroups.com>, Josh
Volz wrote:
> My problem is that I keep having an issue where the
> program appears to freeze. Once this freezing happens the activity of
> the program drops to zero. No exception is thrown or caught. The
> program simply stops doing anything. It even stops printing out its
> activity to stdout.

What happens afterwards? Does it continue running as though nothing had
happened? Throw an exception?

From the output that appears beforehand, does it look like the freeze is
always happening in the same place?
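
One way to answer that question from the output alone is to log a timestamped line before each stage, so the last line printed names the stage that never finished. This is a rough sketch in Python 3, not the poster's code: `process_page` and the `fetch`, `parse` and `save` callables are hypothetical stand-ins for the spider's own steps.

```python
import logging
import sys

# StreamHandler flushes after every record, so the final line before a hang is visible.
logging.basicConfig(stream=sys.stdout, level=logging.INFO,
                    format="%(asctime)s %(message)s")
log = logging.getLogger("spider")


def process_page(url, fetch, parse, save):
    """Run one page through the pipeline, logging before each stage begins."""
    log.info("fetch start %s", url)
    html = fetch(url)
    log.info("parse start %s (%d bytes)", url, len(html))
    record = parse(html)
    log.info("save start %s", url)
    save(record)
    log.info("done %s", url)
```

If the last line is always "fetch start", the network is the likely suspect; if it is "parse start" on the same kind of page, the parsing step deserves a closer look.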
Aug 27 '07 #4
