Bytes IT Community

Need to extract a portion of an HTML page...

I have a few hundred HTML pages.
I need to extract a portion of each HTML page into text, database or HTML
files. As you can imagine, it is very tedious to do this one by one.
Is there any automatic process, software or tool available that could help
me extract information from a mass of HTML files?

I can specify what portion of each file to take or leave. I have some tags
like:

<!--topic start-->

<!--topic End-->

I need to get the information between these two tags. I hope you understand
what I am trying to do here. If you need any explanation or clarification,
let me know.

Any help will be highly appreciated.
Thanks,
Rahman15

Jan 23 '06 #1
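A minimal sketch of the extraction the question describes, written in Python for illustration (the replies below suggest sed or Perl instead). It assumes the markers appear exactly as shown above and matches them case-insensitively, because the start marker says "start" while the end marker says "End". The name extract_topics and the sample page are hypothetical.

```python
import re

# Assumption: markers look like "<!--topic start-->" and "<!--topic End-->".
# IGNORECASE tolerates the start/End capitalization mismatch; DOTALL lets
# the captured body span several lines.
TOPIC_RE = re.compile(r"<!--\s*topic start\s*-->(.*?)<!--\s*topic end\s*-->",
                      re.IGNORECASE | re.DOTALL)

def extract_topics(html: str) -> list[str]:
    """Return the text between each start/end marker pair, whitespace-stripped."""
    return [body.strip() for body in TOPIC_RE.findall(html)]

# Hypothetical sample page with two marked sections.
sample = """<html><body>
<p>header</p>
<!--topic start-->
<h2>First topic</h2>
<!--topic End-->
<p>sidebar</p>
<!--topic start--><p>Second</p><!--topic End-->
</body></html>"""

print(extract_topics(sample))  # ['<h2>First topic</h2>', '<p>Second</p>']
```

The non-greedy `(.*?)` stops at the first end marker after each start marker, so several marked sections in one page come back as separate items; nested marker pairs would not be handled correctly.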
3 Replies


In article <11**********************@g14g2000cwa.googlegroups.com>,
"rahman" <ra******@gmail.com> wrote:
> I can specify what portion of each file to take or leave. I have some tags
> like:
>
> <!--topic start-->
>
> <!--topic End-->
>
> I need to get the information between these two tags.


Sounds like a job for sed (stream editor), starting with a command like

  sed '/begin-marker/,/end-marker/s/this/that/g' inputfile

How elaborate you get after that depends on how much processing you want
to do on the text between the start and end tags. Sed is already
installed on Unix, Linux and Macintosh, and free versions are available
for Windows (search for "sed stream editor Windows"). Not really an HTML
question (although, being lazy, I now generate most of my web page table
of contents links with a line of sed).

--
http://www.ericlindsay.com
Jan 23 '06 #2
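The command in the reply above edits text inside the range (`s/this/that/g`). To print only the marked section, the usual form suppresses sed's automatic output with `-n` and prints the range with `p`, for example `sed -n '/<!--topic start-->/,/<!--topic End-->/p' page.html`, where page.html is a placeholder. The Python sketch below mimics that range-address behaviour line by line so its semantics are easy to check; it is an illustration of how sed ranges work, not sed itself, and the name sed_range is hypothetical.

```python
import re
from typing import Iterable, Iterator

def sed_range(lines: Iterable[str], start_re: str, end_re: str) -> Iterator[str]:
    """Mimic `sed -n '/start_re/,/end_re/p'`: yield every line from one that
    matches start_re through the next line that matches end_re, inclusive.
    As in sed, the end pattern is only tested on lines after the start line,
    and a range with no closing match runs to the end of the input."""
    start, end = re.compile(start_re), re.compile(end_re)
    in_range = False
    for line in lines:
        if in_range:
            yield line
            if end.search(line):
                in_range = False  # range closed; look for the next start
        elif start.search(line):
            in_range = True
            yield line

page = ["<p>header</p>", "<!--topic start-->", "<h2>Topic</h2>",
        "<!--topic End-->", "<p>footer</p>"]
print(list(sed_range(page, r"<!--topic start-->", r"<!--topic End-->")))
# ['<!--topic start-->', '<h2>Topic</h2>', '<!--topic End-->']
```

Unlike a regex over the whole file, this line-based approach keeps the marker lines themselves and always outputs whole lines, which is exactly what the sed one-liner would print; like sed's default regexes, the matching here is case-sensitive.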

rahman wrote:
> I have a few hundred HTML pages.
> I need to extract a portion of each HTML page into text, database or HTML
> files. As you can imagine, it is very tedious to do this one by one.
> Is there any automatic process, software or tool available that could help
> me extract information from a mass of HTML files?
>
> I can specify what portion of each file to take or leave. I have some tags
> like:
>
> <!--topic start-->
>
> <!--topic End-->


Perl is particularly good at that sort of thing. You can
download it for just about any operating system.
Jan 23 '06 #3
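Perl would serve well for this; the sketch below does the same kind of batch job in Python instead, to keep one language across these examples. Assuming the pages sit as *.html files in a single directory, it stores every marked section in an SQLite database, covering the "database" option from the question. The function extract_dir_to_sqlite and the topics table layout are hypothetical choices, not anything taken from the thread.

```python
import re
import sqlite3
from pathlib import Path

# Assumption: markers match the question; case-insensitive for start/End.
TOPIC_RE = re.compile(r"<!--\s*topic start\s*-->(.*?)<!--\s*topic end\s*-->",
                      re.IGNORECASE | re.DOTALL)

def extract_dir_to_sqlite(html_dir: str, db_path: str) -> int:
    """Scan html_dir for *.html files, store every marked section in a
    hypothetical 'topics' table (file, idx, body), and return the row count."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS topics ("
            "file TEXT, idx INTEGER, body TEXT, PRIMARY KEY (file, idx))"
        )
        rows = []
        for path in sorted(Path(html_dir).glob("*.html")):
            # errors="replace" keeps one badly encoded page from stopping the batch
            html = path.read_text(encoding="utf-8", errors="replace")
            for idx, body in enumerate(TOPIC_RE.findall(html)):
                rows.append((path.name, idx, body.strip()))
        # INSERT OR REPLACE keyed on (file, idx) makes re-runs safe
        conn.executemany("INSERT OR REPLACE INTO topics VALUES (?, ?, ?)", rows)
        conn.commit()
        return len(rows)
    finally:
        conn.close()
```

For example, `extract_dir_to_sqlite("pages", "topics.db")` would scan a hypothetical pages directory. Writing one text file per section instead would only mean replacing the `executemany` call with `Path.write_text` calls.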

HTML Table Extractor
www.ieext.com

Jan 24 '06 #4
