Bytes | Software Development & Data Engineering Community

Indexing web pages in Python?


Are there any open-source search engines written in Python for indexing a
given collection of (internal-only) HTML pages? Right now I'm talking
about dozens, but hopefully it'll be hundreds or thousands at some point.
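For a sense of what such an indexer does at its core, here is a minimal sketch of an in-memory inverted index that uses only the standard library (`html.parser` and `re`). It is not taken from any existing search engine. The names `TextExtractor`, `words_in_html`, `build_index`, `index_directory` and `search` are hypothetical. Tokenization is a naive lowercase alphanumeric split, and queries use AND semantics.

```python
import re
from html.parser import HTMLParser
from pathlib import Path

WORD_RE = re.compile(r"[a-z0-9]+")  # naive tokenizer (assumption)


class TextExtractor(HTMLParser):
    """Collect visible text from an HTML document, skipping <script>/<style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def words_in_html(html):
    """Return the set of lowercase words in the page's visible text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return set(WORD_RE.findall(" ".join(parser.chunks).lower()))


def build_index(pages):
    """pages: {page name: HTML source} -> inverted index {word: set of page names}."""
    index = {}
    for name, html in pages.items():
        for word in words_in_html(html):
            index.setdefault(word, set()).add(name)
    return index


def index_directory(root):
    """Index every *.html file under `root`, keyed by its path."""
    pages = {
        str(p): p.read_text(encoding="utf-8", errors="replace")
        for p in Path(root).rglob("*.html")
    }
    return build_index(pages)


def search(index, query):
    """Return the pages containing every word of the query (AND semantics)."""
    terms = WORD_RE.findall(query.lower())
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

Intersecting posting sets keeps multi-word queries cheap. At "hundreds or thousands" of pages, the whole index fits comfortably in memory.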

I'm thinking some sort of CGI script, with perhaps a cron job that updates
the indexes.
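Here is a rough sketch of the CGI side. It assumes the cron job has already written the index as a JSON file that maps each word to a list of page paths. `INDEX_PATH`, the `q` query parameter and the helper names are hypothetical. The script reads the query from the `QUERY_STRING` environment variable using `urllib.parse.parse_qs` rather than the `cgi` module, which was deprecated in Python 3.11 and removed in 3.13.

```python
import html
import json
import os
import re
import sys
from urllib.parse import parse_qs

# Hypothetical location of the index file written by the cron job.
INDEX_PATH = "/var/lib/site-search/index.json"
WORD_RE = re.compile(r"[a-z0-9]+")  # must match the indexer's tokenizer (assumption)


def load_index(path):
    """Load {word: [page, ...]} from JSON; empty if the cron job hasn't run yet."""
    try:
        with open(path, encoding="utf-8") as f:
            return {word: set(pages) for word, pages in json.load(f).items()}
    except FileNotFoundError:
        return {}


def search(index, query):
    """Return pages containing every query word (AND semantics)."""
    terms = WORD_RE.findall(query.lower())
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


def handle_request(environ, index_path=INDEX_PATH):
    """Build a complete CGI response (header block + HTML body) for one request."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    query = params.get("q", [""])[0]
    hits = sorted(search(load_index(index_path), query))
    items = "".join(
        f'<li><a href="{html.escape(p)}">{html.escape(p)}</a></li>' for p in hits
    )
    body = (
        f"<html><body><p>{len(hits)} result(s) for {html.escape(query)}</p>"
        f"<ul>{items}</ul></body></html>"
    )
    return "Content-Type: text/html; charset=utf-8\r\n\r\n" + body


if __name__ == "__main__":
    sys.stdout.write(handle_request(os.environ))
```

The cron half could be a crontab entry such as `0 * * * * /usr/bin/python3 /path/to/build_index.py` (hypothetical script path), which rebuilds the index at the top of every hour.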

I'm not particularly looking for something that has a full RDBMS behind
it - just a file that stores indexes. I'll go with an RDBMS-based
solution if I must, but I don't think that's really needed at this point.
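Keeping the index in a plain file is straightforward at this scale. This minimal sketch assumes the index is a dict mapping each word to a set of page paths (a hypothetical layout). It serializes the index as JSON to a temporary file, then swaps it in with `os.replace`. A CGI reader running while the cron job rewrites the index therefore sees either the old file or the new one, never a partial write. On POSIX, `os.replace` is atomic when both paths are on the same filesystem. `save_index` and `load_index` are illustrative names.

```python
import json
import os
import tempfile


def save_index(index, path):
    """Atomically write {word: set_of_pages} to `path` as JSON."""
    data = {word: sorted(pages) for word, pages in index.items()}
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory, so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
        # mkstemp creates the file as 0600; let the web server user read it.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise


def load_index(path):
    """Read the JSON index back into {word: set_of_pages}."""
    with open(path, encoding="utf-8") as f:
        return {word: set(pages) for word, pages in json.load(f).items()}
```

If loading the whole index on every request becomes slow, the standard library's `shelve` or `dbm` modules provide an on-disk key-value store that still needs no RDBMS.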

TIA

Apr 19 '07 #1
In reply to Dan Stromberg <dstromb...@datallegro.com> (Apr 18, 8:55 pm):
You could try:

http://gnosis.cx/download/indexer.py

There is an extensive write-up by the author at:

http://gnosis.cx/publish/programming..._python_15.txt

Might be something you'd be interested in ...

Apr 19 '07 #2


Similar topics

21
by: Hilde Roth | last post by:
This may have been asked before but I can't find it. If I have a rectangular list of lists, say, l = ,,], is there a handy syntax for retrieving the ith item of every sublist? I know about for i...
108
by: Bryan Olson | last post by:
The Python slice type has one method 'indices', and reportedly: This method takes a single integer argument /length/ and computes information about the extended slice that the slice object would...
3
by: Lauchlan M | last post by:
Hi I have a grid that displays ok. I have activated page indexing, the code to implement it is: private void dGridSessionData_PageIndexChanged(object source,...
0
by: Z D | last post by:
Hello, I've installed and setup MS indexing service on my webserver (IIS v6). It seems to work fine as basic searches are indeed working. The problem, however, is that indexing service does...
1
by: Byron | last post by:
Hey, I'm fussing around with a first attempt at using IIS6's indexing service for a web site search page. The trouble is, my site, while not using a database, is largely dynamic, with much of...
4
by: Emin | last post by:
Dear Experts, How much slower is dict indexing vs. list indexing (or indexing into a numpy array)? I realize that looking up a value in a dict should be constant time, but does anyone have a...
6
by: 78ncp | last post by:
hi... how to implementation algorithm latent semantic indexing in python programming...?? thank's for daniel who answered my question before.. -- View this message in context:...
3
by: Rüdiger Werner | last post by:
Hello! Out of curiosity and to learn a little bit about the numpy package I've tried to implement a vectorised version of the 'Sieve of Zakiya'. While the code itself works fine it is...
9
by: maheswaran | last post by:
Hi all, I developed one application. From that application i created dynamic pages contact us , about us...(like joomla, but application is not in joomla)... These all are comes from database.In...
0
by: DolphinDB | last post by:
Tired of spending countless minutes downsampling your data? Look no further! In this article, you’ll learn how to efficiently downsample 6.48 billion high-frequency records to 61 million...
0
by: ryjfgjl | last post by:
ExcelToDatabase: batch import excel into database automatically...
0
by: jfyes | last post by:
As a hardware engineer, after seeing that CEIWEI recently released a new tool for Modbus RTU Over TCP/UDP filtering and monitoring, I actively went to its official website to take a look. It turned...
1
by: PapaRatzi | last post by:
Hello, I am teaching myself MS Access forms design and Visual Basic. I've created a table to capture a list of Top 30 singles and forms to capture new entries. The final step is a form (unbound)...
1
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
1
by: Defcon1945 | last post by:
I'm trying to learn Python using Pycharm but import shutil doesn't work
1
by: Shællîpôpï 09 | last post by:
If u are using a keypad phone, how do u turn on JavaScript, to access features like WhatsApp, Facebook, Instagram....
0
by: Faith0G | last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...
0
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome former...
