Library for crawling forums

I'm trying to write a utility that crawls forums and extracts the posts
so they can be read offline. I only need the post content; I don't need
who posted, signatures, or any other identifying info.

Can anyone suggest a library that is already geared toward this?

Oct 11 '07 #1

Maybe a combination of mechanize [1] and BeautifulSoup [2]?

[1] http://wwwsearch.sourceforge.net/mechanize/
[2] http://www.crummy.com/software/BeautifulSoup/
Thomas Wittek
Web: http://gedankenkonstrukt.de/
Jabber: st*********@jabber.i-pobox.net
GPG: 0xF534E231
Oct 11 '07 #2
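
To make the suggestion above a bit more concrete, here is a rough sketch of the "keep the post text, drop author and sig" step. It is not the mechanize/BeautifulSoup code itself: it swaps in the standard library's html.parser and urllib so it runs without installing anything. The class names "post-body", "sig" and "author" are assumptions about the forum's markup, and fetch() and extract_posts() are made-up helper names; real forums will need their own selectors.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# Assumed (hypothetical) markup: each post body is <div class="post-body">,
# with author and signature blocks in <div class="author"> / <div class="sig">.
POST_CLASS = "post-body"
SKIP_CLASSES = {"sig", "author"}


class PostExtractor(HTMLParser):
    """Collect the text of each post body, skipping sig/author blocks."""

    def __init__(self):
        super().__init__()
        self.posts = []    # finished post texts, in page order
        self._stack = []   # one entry per open <div>: "post", "skip" or "other"
        self._chunks = []  # text pieces of the post currently being read

    def _collecting(self):
        return "post" in self._stack and "skip" not in self._stack

    def handle_starttag(self, tag, attrs):
        if tag == "br" and self._collecting():
            self._chunks.append("\n")  # keep line breaks as whitespace
        if tag != "div":
            return
        classes = set((dict(attrs).get("class") or "").split())
        if POST_CLASS in classes and "post" not in self._stack:
            self._stack.append("post")
            self._chunks = []
        elif classes & SKIP_CLASSES:
            self._stack.append("skip")
        else:
            self._stack.append("other")

    def handle_endtag(self, tag):
        if tag != "div" or not self._stack:
            return
        if self._stack.pop() == "post":
            # Collapse runs of whitespace into single spaces.
            self.posts.append(" ".join("".join(self._chunks).split()))

    def handle_data(self, data):
        if self._collecting():
            self._chunks.append(data)


def extract_posts(html):
    """Return a list with the plain text of every post body in html."""
    parser = PostExtractor()
    parser.feed(html)
    parser.close()
    return parser.posts


def fetch(url):
    """Download one page as text (no login or cookie handling here)."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


SAMPLE = """
<div class="post">
  <div class="author">alice</div>
  <div class="post-body">First post.<br>Second line.
    <div class="sig">-- alice's sig</div>
  </div>
</div>
<div class="post">
  <div class="author">bob</div>
  <div class="post-body">A reply.</div>
</div>
"""

if __name__ == "__main__":
    for text in extract_posts(SAMPLE):
        print(text)
```

With the libraries from the reply, mechanize would take the place of fetch() (it is the part that helps with logins, cookies and following "next page" links), and the parsing would probably shrink to something like soup.find_all("div", class_="post-body") followed by get_text() on each match, after removing the sig blocks. Treat that mapping as a guess until checked against the forum's actual HTML.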
