Error with long running web spider

Hi everyone:

I have a spider that is relatively long-running (somewhere between
12 and 24 hours). My problem is that the program keeps appearing to
freeze. Once this happens, the activity of the program drops to zero.
No exception is thrown or caught. The program simply stops doing
anything. It even stops printing its activity to stdout. The program
itself appears to run in about 14 MB of memory. Basically, the program
looks up pages on a particular website, reads the HTML of those pages,
parses it (using lots of long regular expressions), and saves the
extracted information to an object (which is later translated to SQL
and written to a file).

I've actually had this same problem with several long-running Python
programs. Any ideas?

Thanks in advance.

Aug 22 '07 #1
On Aug 22, 10:58 am, Josh Volz <jdv...@gmail.com> wrote:

I'm running this program on Windows XP, using Python 2.5. I'm using
ActiveState Komodo IDE 4.0 as the run environment.

Thanks,
J.

Aug 22 '07 #2
If you were running under Unix I'd suggest you "strace" the process to
see what it is doing. There are Windows strace programs (which I've
never tried) too!

You'll probably find it is wedged in TCP socket code.
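
If the hang really is in socket code, one common cause is a blocking read with no timeout: the remote server stops sending and the read waits forever without raising anything. Here is a minimal sketch of guarding against that, assuming the spider fetches pages over HTTP with the standard library (the thread doesn't say which module is used). It is written for Python 3; the function name `fetch` and the 30-second default are made up for illustration. On Python 2.5, urlopen has no timeout argument, so calling socket.setdefaulttimeout() once at startup would be the rough equivalent.

```python
import urllib.request


def fetch(url, timeout=30.0):
    """Return the page body as bytes, or None if the request fails or stalls.

    `fetch` and the default timeout are illustrative, not from the thread.
    """
    try:
        # The timeout applies to connecting and to each blocking socket read,
        # so a server that goes silent raises an error instead of hanging.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except OSError as exc:
        # URLError and socket.timeout are both subclasses of OSError.
        print("giving up on %s: %s" % (url, exc), flush=True)
        return None
```

The spider can then skip or retry any URL for which `fetch` returns None rather than stalling on it.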

--
Nick Craig-Wood <ni**@craig-wood.com> -- http://www.craig-wood.com/nick
Aug 22 '07 #3
In message <11**********************@l22g2000prc.googlegroups.com>, Josh
Volz wrote:
My problem is that I keep having an issue where the
program appears to freeze. Once this freezing happens the activity of
the program drops to zero. No exception is thrown or caught. The
program simply stops doing anything. It even stops printing out its
activity to stdout.
What happens afterwards? Does it continue running as though nothing had
happened? Throw an exception?

From the output that appears beforehand, does it look like the freeze is
always happening in the same place?
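
One way to answer that question without guessing is to have the program report where it is stuck. The following is a rough sketch for Python 3 (the faulthandler module does not exist in Python 2.5, which the original poster uses). The names `crawl`, `handle_page` and `watchdog_seconds` are hypothetical stand-ins for the spider's own loop. Each iteration arms a watchdog; if the iteration takes longer than the limit, the traceback of every thread is written to the log file, showing the exact line where the program is waiting.

```python
import faulthandler
import sys
import time


def crawl(urls, handle_page, watchdog_seconds=300, log=sys.stderr):
    """Process each URL, dumping all thread tracebacks if one step stalls.

    `handle_page` is a hypothetical callable that fetches and parses one URL.
    `log` must be a real file (it needs a file descriptor).
    """
    for url in urls:
        # Timestamped, flushed progress line: shows the last URL started.
        print(time.strftime("%Y-%m-%d %H:%M:%S"), "fetching", url, flush=True)
        # Arm the watchdog for this iteration only.
        faulthandler.dump_traceback_later(watchdog_seconds, file=log)
        try:
            handle_page(url)
        finally:
            # Disarm it so a finished iteration never triggers a dump.
            faulthandler.cancel_dump_traceback_later()
```

Comparing the dumped tracebacks from several runs should show whether the freeze always happens at the same call.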
Aug 27 '07 #4
