  #1  
Old January 23rd, 2006, 05:15 AM
rahman
Need to extract portion of HTML page...

I have a few hundred HTML pages.
I need to extract a portion of each HTML page into text, database, or HTML
files. As you can imagine, it is very tedious to do this one by one.
Is there any automatic process/software/tool available that could help
me extract information from a large number of HTML files?

I can specify which portion of each file to take or leave. I have marker
comments like:

<!--topic start-->

<!--topic End-->

I need to get the information between these two markers. I hope you
understand what I am trying to do here. If you need any
explanation/clarification, let me know.

Any help will be highly appreciated.
Thanks,
Rahman15

  #2  
Old January 23rd, 2006, 06:45 AM
Eric Lindsay
Re: Need to extract portion of HTML page...

In article <1137992589.882961.140340@g14g2000cwa.googlegroups.com>,
"rahman" <rahman15@gmail.com> wrote:
> I can specify what portion of file to take or leave. I have some tag
> like:
>
> <!--topic start-->
>
> <!--topic End-->
>
> I need to get information between these two tags.

Sounds like a job for sed (stream editor). Start with a command like
sed '/begin-marker/,/end-marker/s/this/that/g' inputfile
which applies the substitution only to the lines from the begin marker
through the end marker. How elaborate you get after that depends on how
much processing you want to do on the text between the start and end
tags. Sed is already installed on Unix, Linux and Macintosh, and free
versions are available for Windows: Google "sed stream editor version
for Windows". Not really an HTML question (although, being lazy, I now
generate most of my web page table of contents links with a line of sed).
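
For the markers in the original question, a minimal sketch of that idea might look like the following. It assumes each marker comment sits on a line of its own (anything sharing a line with a marker would be dropped), and the function names extract_topic and extract_all, as well as the .txt output naming, are made up for illustration.

```shell
#!/bin/sh
# Print the lines strictly between the two marker comments of one file.
# Assumption: each marker is on its own line.
extract_topic() {
    sed -n '/<!--topic start-->/,/<!--topic End-->/{
        /<!--topic start-->/d
        /<!--topic End-->/d
        p
    }' "$1"
}

# Hypothetical batch driver: for every page.html in directory $1,
# write the extracted portion to page.txt next to it.
extract_all() {
    for f in "$1"/*.html; do
        [ -e "$f" ] || continue   # skip if the glob matched nothing
        extract_topic "$f" > "${f%.html}.txt"
    done
}
```

Calling extract_all on the directory that holds the pages would then process all of them in one go. The -n switch suppresses sed's default output, so only lines inside the range that survive the two d commands get printed.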

--
http://www.ericlindsay.com
  #3  
Old January 23rd, 2006, 07:15 AM
mbstevens
Re: Need to extract portion of HTML page...

rahman wrote:
> I have few hundred HTML pages.
> I need to extract portion of each HTML page into a text/database/HTML
> files format. You can imagine it is very tedious to do one by one.
> Is there any automatic process/software/tool available that could help
> me extract information form mass HTML files?
>
> I can specify what portion of file to take or leave. I have some tag
> like:
>
> <!--topic start-->
>
> <!--topic End-->

Perl is particularly good at that sort of thing. You can
download it for just about any operating system.
  #4  
Old January 24th, 2006, 10:05 AM
ieext@gawab.com
Re: Need to extract portion of HTML page...

HTML Table Extractor
www.ieext.com

 
