help: SIGABRT intermittent crash for threaded website crawler on Python 2.4.4c1

I've been experiencing an intermittent crash where no Python
stack trace is provided. It happens in a URL downloading process that
can last up to 12 hours and crawls about 50,000 URLs.

I'm using urllib2 for the downloads. There are 5-10 downloading
threads, and some custom website exploration code for providing the
urls to crawl.

The downloads are completed in memory (not piped), then saved to a
file. The crawler also upholds per-domain / per-IP niceness guidelines,
so many concurrent downloads and exploration tasks are either waiting
or in progress, sometimes up to 40 at once. As a result, I've seen the
process memory footprint climb upwards of 800 MB.

About 20-40% of the time, the entire process bails out with no
stack trace, at seemingly random memory footprints and running times,
sometimes after as little as 2 hours. My guess is that there is a bug in
urllib2 or some third-party software I'm using, or that it was not meant
to be run in a multithreaded environment. Decreasing the
bandwidth/aggressiveness of the crawler MAY seem to have an effect on
the frequency, but I haven't done any formal 'studies' on that yet. My
current solution is to restart the crawler, but this is bad for the
websites (recrawling) and means extra crawl time on my part.

I bet if I switch to a 1-download-per-process scenario with pyro for
IPC (to uphold niceness rules, etc), I will fix this situation as I
suspect from reading similar SIGABRT issues that it has something to
do with the multi-threading. But I figured I'd ask around before I
take such drastic measures.

Since the process is so long-running, I have not tried running strace,
and I'm not even sure if it would make sense to me or someone else.
Let me know if you have a method of catching just the last 1000 calls
and not saving earlier ones or whatever, if that would be useful.
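
One way to keep only the tail of a long trace is a fixed-size ring buffer. The following is a minimal sketch, not something from the original setup: the function name `keep_last_lines` is hypothetical, and it relies on `collections.deque(maxlen=...)`, which needs Python 2.6 or later (so not 2.4.4c1 itself, though the helper can run under a separate, newer interpreter).

```python
import sys
from collections import deque


def keep_last_lines(stream, limit=1000):
    """Read `stream` to the end, holding at most `limit` lines in memory.

    A deque with maxlen silently drops the oldest entry whenever a new one
    is appended to a full buffer, so memory use stays bounded no matter how
    long the traced process runs.
    """
    return list(deque(stream, maxlen=limit))


def main():
    # Hypothetical entry point: filter stdin and print only the tail.
    for line in keep_last_lines(sys.stdin):
        sys.stdout.write(line)
```

Assuming the sketch is saved as a script that calls `main()` (the file name `tail_buffer.py` is made up here), it could be fed with something like `strace -f -p PID 2>&1 | python tail_buffer.py > last_calls.txt`. The standard `tail -n 1000` at the end of the same pipe should have the same effect without any Python.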

I'm using an older version of Python, 2.4.4c1. Since the bug is
intermittent, I'm not sure yet whether an upgrade to Python 2.5 has
solved my problem.

Does anyone have any clues for me to try? My threading code uses a
messaging queue per thread, and one notification queue that the main
thread checks and assigns new crawls back to free threads. No other
variables are referenced by multiple threads other than the thread
objects themselves (to my knowledge).
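
For reference, the layout described above looks roughly like the following. This is a simplified sketch, not the actual crawler code: `worker`, `crawl`, and the `fetch` callable are hypothetical names, `fetch` stands in for the real urllib2 download, and the module is spelled `queue` as in Python 3 (on Python 2.4 it is `Queue`).

```python
import queue
import threading


def worker(inbox, notifications, worker_id, fetch):
    """Process URLs from this worker's own inbox until a None sentinel arrives."""
    while True:
        url = inbox.get()
        if url is None:
            break
        try:
            result = fetch(url)
        except Exception as exc:
            # Report the failure instead of letting the thread die silently.
            result = exc
        notifications.put((worker_id, url, result))


def crawl(urls, fetch, num_workers=5):
    """Main thread hands out URLs and reassigns work as workers report back."""
    notifications = queue.Queue()  # single queue read only by the main thread
    inboxes = [queue.Queue() for _ in range(num_workers)]  # one per worker
    threads = []
    for i in range(num_workers):
        t = threading.Thread(target=worker,
                             args=(inboxes[i], notifications, i, fetch))
        t.daemon = True
        t.start()
        threads.append(t)

    pending = list(urls)
    results = {}
    in_flight = 0
    # Give every worker one URL to start with.
    for inbox in inboxes:
        if not pending:
            break
        inbox.put(pending.pop())
        in_flight += 1
    # Each notification frees one worker, which gets the next pending URL.
    while in_flight:
        worker_id, url, result = notifications.get()
        results[url] = result
        in_flight -= 1
        if pending:
            inboxes[worker_id].put(pending.pop())
            in_flight += 1
    # Tell the workers to exit and wait for them.
    for inbox in inboxes:
        inbox.put(None)
    for t in threads:
        t.join()
    return results
```

In this sketch the only objects shared between threads are the queues themselves, which matches the claim that no other variables are referenced by multiple threads.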
Oct 3 '08 #1