impressed with its possibilities. I do, however, need some help
with certain things I'm trying to do, for which I haven't yet
managed to find the answers myself. Hopefully, someone will be
able to give me some pointers :)
First, my background: I haven't programmed seriously in over 5
years, but I recently started programming again in
Delphi/Pascal scripting, and that's what I'm most familiar with
right now. I'm also much more comfortable with structured
programming than with OO (which isn't helping much with
Python :))
Anyway, I have a very specific project in mind which I've mostly
implemented in Pascal and I'd like to implement it in Python
since the possibilities after that are much more interesting.
Basically, I'm getting the HTML source from a URL and need to:
a.) find specific URLs
b.) find specific data
c.) with those specific URLs, load new HTML pages and repeat.
I've managed to load the HTML source I want into a variable
called htmlSource using:
import urllib
sock = urllib.urlopen("URL Link")
htmlSource = sock.read()
sock.close()
I'm assuming that htmlSource is a string with \n at the end of
each line.
NOTE: I've become very accustomed to the TStringList class in
Delphi, so forgive me if I'm trying to work in that way with
Python...
Basically, I want to search through the whole string
(htmlSource) for a specific keyword. When it's found, I want to
know which line it's on so that I can retrieve that line, and
then I should be able to parse/extract what I need using regular
expressions (which I'm getting quite comfortable with). So how
can this be accomplished?
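
To make the goal concrete, here is a minimal sketch of one way this could
work, assuming the page is already held in a string and that splitting it
into a list of lines is an acceptable stand-in for a TStringList. The
function name find_keyword_lines and the sample HTML are made up for
illustration; they are not from any library.

```python
import re

def find_keyword_lines(html_source, keyword):
    """Return (line_number, line) pairs for every line containing keyword."""
    matches = []
    # splitlines() copes with \n, \r\n and \r line endings
    for number, line in enumerate(html_source.splitlines(), start=1):
        if keyword in line:
            matches.append((number, line))
    return matches

# Hypothetical sample standing in for the downloaded htmlSource
sample = '<html>\n<a href="/items/1">Widget</a>\n<p>Price: 42</p>\n</html>'

hits = find_keyword_lines(sample, "Price")

# Once the matching line is known, a regular expression can pull out the data
price = re.search(r"Price:\s*(\d+)", hits[0][1]).group(1)
```

Line numbers here start at 1. If the same page is searched many times, it
is probably worth keeping the list returned by splitlines() around rather
than splitting again on every search.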
The second main thing I'd like to know has to do with urllister. I'm
very intrigued by its ability to automatically grab URL links
from the source, but I've only managed to get it to retrieve
everything, which is a lot. What are my options in terms of
getting it to be more specific? Can I tell it to retrieve a URL
only IF a keyword is found on the same line of the string?
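
As a rough illustration of the kind of filtering meant here, the sketch
below does not use urllister at all. It assumes that a simple regular
expression is good enough to spot href attributes (it will not cope with
every HTML quirk) and that "same line" means a line of the raw source. The
names HREF_RE and urls_on_keyword_lines, and the sample HTML, are
hypothetical.

```python
import re

# Deliberately simple pattern: href="..." or href='...'
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)

def urls_on_keyword_lines(html_source, keyword):
    """Collect href values, but only from lines that contain keyword."""
    urls = []
    for line in html_source.splitlines():
        if keyword in line:
            urls.extend(HREF_RE.findall(line))
    return urls

# Hypothetical sample standing in for the downloaded htmlSource
sample = ('<a href="/news/1">News</a>\n'
          '<a href="/items/1">Widget details</a>\n'
          '<a href="/items/2">Gadget details</a> <a href="/about">About</a>\n')

item_urls = urls_on_keyword_lines(sample, "details")
```

Note that every link on a matching line is returned, including ones the
keyword does not belong to (such as /about in the sample), so a tighter
pattern may be needed when several links share a line.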
Hopefully someone will be able/willing to give me a hand. I
think with these roadblocks out of the way, I should be able to
figure out the rest of what I need. Thanks in advance!
Benji99