Bill Latvin (in
44************* ****@news.verizon.net) said:
| On Sun, 11 Jun 2006 21:46:03 -0500, "Morris Dovey"
| <mr*****@iedu.com> wrote:
|
|| Medros (in
|| 11************* *********@j55g200...legroups.com)
|| said:
||
||| I understand that you can strip the HTML out of a text file so that
||| all that is left is the visible information that is needed (e.g.
||| everything that has < > around it is gone). My question is that I
||| have a table of information that I need to feed into a program as
||| such. Well, kind of: I need the program to read it just as you
||| would on paper and be able to use that information as if it had
||| been entered. I am unsure how to strip so much away to leave just
||| the information I want and then use it the way I want. Any help?
||
|| Start with a simple program that reads and saves one character at a
|| time looking for a '<' character. When it finds a '<', it should
|| throw it (and following characters) away until it finds a '>'.
|| When the program reaches end-of-file, hopefully it's saved what
|| you want to keep.
||
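A minimal sketch of the loop described above, for anyone who wants
something concrete. The thread names no language, so Python is an
assumption here, as is the function name strip_tags; it works on a
string rather than a file to keep it short.

```python
def strip_tags(text):
    """Copy characters, discarding everything from '<' through the next '>'."""
    kept = []
    in_tag = False              # True while we are between '<' and '>'
    for ch in text:
        if in_tag:
            if ch == '>':       # end of the tag: resume saving characters
                in_tag = False
        elif ch == '<':         # start of a tag: stop saving characters
            in_tag = True
        else:
            kept.append(ch)     # ordinary visible character
    return ''.join(kept)
```

For example, strip_tags("<p>Hello <b>world</b></p>") returns
"Hello world". An unterminated '<' causes the rest of the input to be
thrown away, which is the behaviour the description implies.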
| I remember starting with a simple program like that, and finding to
| my dismay that between the "script" and "/script" tags the '<' and
| '>' characters are used not as tag delimiters but as "greater than"
| and "less than" comparison operators. I had to check for those
| particular tags and discard everything between them, and not let
| the presence of a lone unbalanced '<' in the script cause my logic
| to miss finding the "/script" tag.
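One way that check might look, again as a Python sketch and not
anyone's actual code from this thread. The names SCRIPT_OPEN,
SCRIPT_CLOSE and strip_tags_skip_script are made up for illustration.
The idea is to treat a "script" element as a special case and jump
straight to its closing tag, so a lone '<' inside the script is never
mistaken for the start of a tag.

```python
import re

# Hypothetical helper patterns; case-insensitive because HTML tag names are.
SCRIPT_OPEN = re.compile(r'<script\b', re.IGNORECASE)
SCRIPT_CLOSE = re.compile(r'</script\s*>', re.IGNORECASE)

def strip_tags_skip_script(text):
    """Strip tags, and discard everything between <script> and </script>."""
    kept = []
    i = 0
    while i < len(text):
        if text[i] != '<':
            kept.append(text[i])            # ordinary visible character
            i += 1
        elif SCRIPT_OPEN.match(text, i):
            # Ignore any '<' or '>' inside the script; look only for the close.
            close = SCRIPT_CLOSE.search(text, i)
            if close is None:
                break                       # unterminated script: drop the rest
            i = close.end()
        else:
            end = text.find('>', i)
            if end == -1:
                break                       # unterminated tag: drop the rest
            i = end + 1
    return ''.join(kept)
```

Given '<p>A</p><script>if (a < b) { x = 1; }</script><p>B</p>' this
returns "AB". A plain '<'-to-'>' stripper would leak "if (a " into the
output for the same input.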
Welcome to the club. It's because of things like that that I added my
second paragraph:
"You'll probably discover that you want to add refinements (perhaps to
deal with HTML encodings like &nbsp; and &lt;) - but those can wait on
getting the initial version working."
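As a rough illustration of that refinement, here is a Python sketch
(the language and the name visible_text are assumptions). It strips
tags first and decodes the encodings second, using html.unescape from
the standard library. The order matters: decoding first would turn an
encoded less-than sign into a real '<' that then looks like a tag.

```python
import html
import re

def visible_text(markup):
    """Remove tags, then decode character references such as the
    non-breaking space and less-than encodings."""
    # Same effect as the character-at-a-time loop for well-formed tags.
    without_tags = re.sub(r'<[^>]*>', '', markup)
    decoded = html.unescape(without_tags)
    # Treat non-breaking spaces as ordinary spaces for later parsing.
    return decoded.replace('\xa0', ' ')
```

With this, a cell such as '<td>1&nbsp;&lt;&nbsp;2</td>' comes out as
"1 < 2".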
The refinements will depend on whether the OP wants a general solution
or just enough to extract data from one particular page. On
re-reading, I'd guess that <table>, <tr>, and <td> tags may be his
first refinement - but the question indicated that he'll probably need
to start at the most basic level.
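If the table tags do turn out to be the next step, something along
these lines might serve as a starting point. It is a Python sketch
built on the standard html.parser module; the class name TableRows and
the function table_rows are invented for the example, and it assumes
every cell has an explicit closing tag, which real pages do not always
provide.

```python
from html.parser import HTMLParser

class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()      # character references are decoded by default
        self.rows = []
        self._row = None        # cells of the row being read, or None
        self._cell = None       # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
            self._cell = None
        elif tag in ('td', 'th') and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ('td', 'th') and self._cell is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None

def table_rows(markup):
    """Return the table contents as a list of rows, each a list of strings."""
    parser = TableRows()
    parser.feed(markup)
    parser.close()
    return parser.rows
```

Each row comes back as a list of strings, so the program can use the
values much as if they had been typed in.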
--
Morris Dovey
DeSoto Solar
DeSoto, Iowa USA
http://www.iedu.com/DeSoto