Pagecrawling websites with Python

writeson

Hi all,

We've got an application we wrote in Python called pagecrawler that
generates a list of URLs based on SQL queries. It then runs through
this list of URLs, 'browsing' one of our staging servers for each of those
URLs. We do this to build the site dynamically, but each page
generated by a URL is saved as a static HTML file. Anyway, the
pagecrawler program uses Python threads to try to build the pages as
fast as it can. The list of URLs is stored in a queue, and the thread
objects get URLs from the queue and fetch them until the queue is empty.
This works okay, but it still seems to take a long time to build the
site this way, even though the actual pages take only milliseconds to
run (the pages are generated with PHP on a separate server). Does anyone
have any insight into whether this is a reasonable approach to building web pages,
or whether we should look at another design?

Thanks in advance,
Doug
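For readers who want something concrete to compare against, here is a minimal sketch of the queue-and-threads design described above. It is not pagecrawler's actual code: the function names (`fetch_url`, `make_saver`, `crawl`), the URL-to-filename mapping, and the default thread count are all hypothetical, and the URL list is assumed to have already been produced by the SQL step.

```python
import queue
import threading
import urllib.parse
import urllib.request
from pathlib import Path
from typing import Callable, Iterable, List, Tuple

FetchFn = Callable[[str], bytes]
SaveFn = Callable[[str, bytes], None]


def fetch_url(url: str, timeout: float = 30.0) -> bytes:
    """Request one page from the staging server and return the raw body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def make_saver(out_dir: Path) -> SaveFn:
    """Return a function that writes each fetched page as a static .html file.

    Hypothetical mapping: http://staging/a/b.php -> out_dir/a/b.html
    (query strings are ignored in this sketch).
    """
    def save(url: str, body: bytes) -> None:
        rel = urllib.parse.urlsplit(url).path.lstrip("/") or "index"
        target = out_dir / Path(rel).with_suffix(".html")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(body)
    return save


def crawl(urls: Iterable[str], fetch: FetchFn, save: SaveFn,
          num_threads: int = 8) -> List[Tuple[str, Exception]]:
    """Fetch and save every URL using a fixed pool of worker threads.

    Returns (url, exception) pairs for the pages that failed.
    """
    work: "queue.Queue[str]" = queue.Queue()
    for url in urls:
        work.put(url)  # queue is fully loaded before any worker starts

    errors: List[Tuple[str, Exception]] = []
    errors_lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                url = work.get_nowait()
            except queue.Empty:
                return  # queue drained, so this thread is finished
            try:
                save(url, fetch(url))
            except Exception as exc:  # record the failure and keep going
                with errors_lock:
                    errors.append((url, exc))

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every worker has exited
    return errors
```

A call would look like `crawl(urls, fetch_url, make_saver(Path("site_out")))`. Because each worker spends most of its time blocked on network I/O, CPython releases the GIL while it waits, so the threads do overlap the HTTP requests. `num_threads` is the main knob to experiment with, bounded by how many concurrent requests the staging server can handle. Passing `fetch` and `save` in as functions also lets the crawl loop be tested without a network.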

Jul 18 '05 #1

Swaroop C H

On 1 Apr 2005 11:58:11 -0800, writeson <wr******@charter.net> wrote:

> We've got an application we wrote in Python called pagecrawler that
> [snip]
> Does anyone have any insight if this is a reasonable approach to build web pages,
> or if we should look at another design?

I don't have an answer to your particular question, but maybe you can
have a look at how HarvestMan works:

http://freshmeat.net/projects/harvestman
Regards,
--
Swaroop C H
Blog: http://www.swaroopch.info
Book: http://www.byteofpython.info

Jul 18 '05 #2

writeson

Swaroop,

Thanks for the reply. I'll take a look at HarvestMan and see if we can
use it directly or get some ideas from its source code. :)

Doug

Jul 18 '05 #3
