pagecrawling websites with Python

Hi all,

We've got an application we wrote in Python called pagecrawler that
generates a list of URLs based on SQL queries. It then runs through
this list of URLs, 'browsing' one of our staging servers for each of those
URLs. We do this to build the site dynamically, but each page
generated by a URL is saved as a static HTML file. The
pagecrawler program uses Python threads to try to build the pages as
fast as it can. The list of URLs is stored in a queue, and the thread
objects take URLs from the queue and fetch them until the queue is empty.
This works okay, but it still seems to take a long time to build the
site this way, even though the individual pages only take milliseconds to
render (the pages are generated with PHP on a separate server). Does anyone
have any insight into whether this is a reasonable approach to building web
pages, or whether we should look at another design?

Thanks in advance,
Doug
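The queue-and-threads design described in the post can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the actual pagecrawler code: `crawl`, `fetch_page`, and `num_threads` are hypothetical names, pages are fetched with the standard library's `urllib.request`, and writing each page out as a static HTML file is left out.

```python
import queue
import threading
import urllib.request


def fetch_page(url, timeout=10):
    """Hypothetical fetcher: request one URL and return its body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(urls, fetch=fetch_page, num_threads=8):
    """Fetch every URL with a fixed pool of worker threads sharing one queue.

    Returns a dict mapping each URL to its page body, or to the exception
    raised while fetching it.
    """
    work = queue.Queue()
    for url in urls:
        work.put(url)  # the queue is fully populated before any worker starts

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                url = work.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker is finished
            try:
                body = fetch(url)
            except Exception as exc:  # record the failure; keep the thread alive
                body = exc
            with lock:
                results[url] = body

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every worker has emptied the queue
    return results
```

Calling `crawl(urls, num_threads=16)` would fetch the whole list. Because the queue is filled before any worker starts, a worker can stop as soon as `get_nowait()` raises `queue.Empty`. Threads suit this I/O-bound job because CPython releases the GIL while a thread waits on the network. If each page renders in milliseconds but the crawl is still slow, it may be worth timing the per-request overhead (`urlopen` opens a fresh connection for every URL) and trying different `num_threads` values before changing the overall design.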

Jul 18 '05 #1
On 1 Apr 2005 11:58:11 -0800, writeson <wr******@charter.net> wrote:
> We've got an application we wrote in Python called pagecrawler that
> [snip]
> Does anyone have any insight if this is a reasonable approach to build web pages,
> or if we should look at another design?


I don't have an answer to your particular question, but maybe you could
look at how HarvestMan works:

http://freshmeat.net/projects/harvestman
Regards,
--
Swaroop C H
Blog: http://www.swaroopch.info
Book: http://www.byteofpython.info
Jul 18 '05 #2
Swaroop,

Thanks for the reply, I'll take a look at HarvestMan and see if we can
use it directly, or get some ideas from the source code. :)

Doug

Jul 18 '05 #3
