
[TABLE NOT SHOWN] problem with HTML::Parse

Mitchua
#1: Jul 19 '05
When I run this oft-quoted line:
my $ascii =
HTML::FormatText->new->format(HTML::Parse::parse_html($html));
to remove HTML tags from an HTML document, it replaces all tables with
"[TABLE NOT SHOWN]". Is there a quick and easy way to get the table content
parsed too?

Thanks a lot,
Mitchua



James E Keenan
#2: Jul 19 '05


"Mitchua" <mitchua@yahoo.com> wrote in message
news:EJiPa.115702$2ay.14173@news01.bloor.is.net.cable.rogers.com...
> When I run the well quoted line:
> my $ascii =
> HTML::FormatText->new->format(HTML::Parse::parse_html($html));
> to remove HTML tags from an html document, it replaces all tables with
> "[TABLE NOT SHOWN]". Is there a quick and easy way to get the table
> content parsed too?
>
The documentation for HTML::FormatText states: "Formatting of HTML tables
and forms is not implemented." So not with that module. The documentation
points to HTML::Formatter
(http://search.cpan.org/author/SBURKE...L/Formatter.pm), which in turn
refers to other modules that may help.
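If all you need is the table text, one workaround (my own sketch, not something the HTML::FormatText docs suggest) is to pull the cells out yourself. This assumes HTML::TreeBuilder from the HTML-Tree distribution on CPAN is installed; `tables_as_text` is a hypothetical helper name. Nested tables and colspan/rowspan are not handled.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;   # from the HTML-Tree distribution on CPAN (assumed installed)

# Hypothetical helper: return the text of every <table> in $html,
# one line per row, with cells separated by tabs.
sub tables_as_text {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new;
    $tree->parse($html);
    $tree->eof;

    my @lines;
    for my $table ($tree->look_down(_tag => 'table')) {
        for my $row ($table->look_down(_tag => 'tr')) {
            # Collect <td> and <th> cells, trimming surrounding whitespace.
            my @cells = map {
                (my $t = $_->as_text) =~ s/^\s+|\s+$//g;
                $t;
            } $row->look_down(_tag => qr/^t[dh]$/);
            push @lines, join("\t", @cells);
        }
        push @lines, '';    # blank line after each table
    }
    $tree->delete;          # HTML::Element trees hold circular refs; free explicitly
    return join("\n", @lines);
}

my $html = '<table><tr><th>Name</th><th>Qty</th></tr>'
         . '<tr><td>apples</td><td>3</td></tr></table>';
print tables_as_text($html), "\n";
```

You could then print this alongside (or in place of) the HTML::FormatText output.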


James E Keenan
#3: Jul 19 '05


"Mitchua" <mitchua@yahoo.com> wrote in message
news:YRHPa.6477$sI91.949@news04.bloor.is.net.cable.rogers.com...
>
> Are there any other (easy) ways to remove all html tags (including tricky
> tags like comments, etc.) from a web page without using those modules? I'm
> looking for a solution beyond a regular expression.
>
"Easy": no. That's why we have all those modules in the HTML section of
CPAN -- the solution is always difficult, messy and "beyond a regular
expression."

I note that in your OP you used HTML::Parse. The 1-line description of this
indicates that it is deprecated. Have you looked into HTML::Parser? People
speak highly of that module.
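For the comments question, here is a minimal sketch of the usual HTML::Parser (version 3 API) approach: register only a text handler, so tags and comments simply produce no output. It assumes HTML::Parser 3.x is installed; `html_to_text` is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;   # version 3 API with handler callbacks (assumed installed)

# Hypothetical helper: return the visible text of $html, dropping all tags,
# comments, and the contents of <script> and <style> elements.
sub html_to_text {
    my ($html) = @_;
    my $text = '';
    my $p = HTML::Parser->new(
        api_version => 3,
        # 'dtext' passes the text with entities decoded, so &amp; becomes &
        text_h => [ sub { $text .= shift }, 'dtext' ],
    );
    # Skip these elements and everything inside them.
    $p->ignore_elements(qw(script style));
    # No start, end, or comment handlers are registered,
    # so tags and comments contribute nothing to $text.
    $p->parse($html);
    $p->eof;
    return $text;
}

my $html = '<p>Fish &amp; chips<!-- secret --></p><script>var x;</script>';
print html_to_text($html), "\n";   # prints "Fish & chips"
```

One catch: adjacent blocks run together (`<p>a</p><p>b</p>` becomes `ab`), so you may want a start handler that appends a newline for block-level tags.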


Mitchua
#4: Jul 19 '05


"James E Keenan" <jkeen@concentric.net> wrote in message
news:beovoq$k2f@dispatch.concentric.net...
>
> "Mitchua" <mitchua@yahoo.com> wrote in message
> news:YRHPa.6477$sI91.949@news04.bloor.is.net.cable.rogers.com...
> >
> > Are there any other (easy) ways to remove all html tags (including tricky
> > tags like comments, etc.) from a web page without using those modules? I'm
> > looking for a solution beyond a regular expression.
> >
> "Easy": no. That's why we have all those modules in the HTML section of
> CPAN -- the solution is always difficult, messy and "beyond a regular
> expression."
>
> I note that in your OP you used HTML::Parse. The 1-line description of this
> indicates that it is deprecated. Have you looked into HTML::Parser? People
> speak highly of that module.
>

I found this code on the web that uses it:

use HTML::Parser;
$p = HTML::Parser->new;
$p->parse($notes); # parse the HTML in notes
$p->eof; # signal end of parse file
print $p->as_string; # print out the parsed text

But I get the error "Can't locate ../HTML/Parser/as_string.al". I'm looking
for that file now.
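As far as I can tell, HTML::Parser does not define an `as_string` method, so the AutoLoader goes looking for an `as_string.al` file that was never installed, and hunting for that file won't help. The snippet seems to confuse HTML::Parser with the tree modules built on top of it. Below is a sketch of the same idea using HTML::TreeBuilder (assumed installed from the HTML-Tree distribution), which is a subclass of HTML::Parser and whose `as_text` method returns the text content; the sample `$notes` value is a placeholder.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;   # HTML-Tree distribution; subclasses HTML::Parser

# Placeholder input standing in for the $notes variable in the snippet above.
my $notes = '<p>Meeting at <b>10am</b></p><!-- draft -->';

my $tree = HTML::TreeBuilder->new;
$tree->parse($notes);         # parse the HTML in $notes
$tree->eof;                   # signal end of input
print $tree->as_text, "\n";   # text content only; comments are not stored by default
$tree->delete;                # free the tree's memory
```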

Jonathan

