Bytes IT Community

Using regular expressions in internet searches

What is the best way to use regular expressions to extract information
from the internet if one wants to search multiple pages? Let's say I
want to search all of www.cnn.com and get a list of all the words that
follow "Michael."

(1) Is Python the best language for this? (Plus is it time-efficient?)
Is there already a search engine that can do this?

(2) How can I search multiple web pages within a single location or
path?

TIA,

Mike

Jul 21 '05 #1
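For the single-page version of what the question asks, a minimal sketch might look like this (the `words_after` helper and the sample HTML string are illustrative, not from the thread; a real run would first fetch the page, e.g. with urllib):

```python
import re

def words_after(text, name="Michael"):
    """Return the words that immediately follow *name* in *text*."""
    return re.findall(re.escape(name) + r"\s+(\w+)", text)

html = "<p>Michael Jordan met Michael Phelps.</p>"
print(words_after(html))  # -> ['Jordan', 'Phelps']
```

Crawling all of www.cnn.com is the harder part; the replies below address that.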
2 Replies


mi***********@gmail.com wrote:
What is the best way to use regular expressions to extract information
from the internet if one wants to search multiple pages? Let's say I
want to search all of www.cnn.com and get a list of all the words that
follow "Michael."

(1) Is Python the best language for this? (Plus is it time-efficient?)
Is there already a search engine that can do this?

(2) How can I search multiple web pages within a single location or
path?


You'd probably be better off using htdig.

Diez
Jul 21 '05 #2

Python would be good for this, but if you just want a quick and dirty
solution, this might be enough (note: wget has no --ignore-robots flag;
-e robots=off is the way to skip robots.txt):

bash $ wget -r -e robots=off -l 0 -c -t 3 http://www.cnn.com/
bash $ grep -r "Michael.*" ./www.cnn.com/*

Or you could do a wget/Python mix, like:

import os
import re

# Mirror the site locally first (sys.os.command does not exist;
# os.system runs the shell command).
os.system("wget -r -e robots=off -l 0 -c -t 3 http://www.cnn.com/")

# Capture "iraq" plus the word that follows it.
re_iraq = re.compile(r"iraq \S+", re.IGNORECASE)

# Walk every file wget saved under ./www.cnn.com/
for dirpath, _, filenames in os.walk("./www.cnn.com"):
    for name in filenames:
        with open(os.path.join(dirpath, name), errors="ignore") as f:
            iraqs = re_iraq.findall(f.read())
            print(iraqs)

Jul 21 '05 #3

This discussion thread is closed.