
HTML to Plain Text

Newbie
 
Join Date: Feb 2007
Posts: 2
#1: Feb 19 '07
Can anybody suggest how to remove HTML tags when displaying output?

Consider this code:
<b> name </b>

I want to display the output in Perl as:
name


Please reply as soon as possible.
KevinADC
Expert
 
Join Date: Jan 2007
Location: Southern California USA
Posts: 4,091
#2: Feb 19 '07

re: HTML to Plain Text


If you don't need or want to go the full HTML parser route:

http://perldoc.perl.org/perlfaq9.htm...om-a-string%3F

But if you do:

http://search.cpan.org/author/GAAS/H...3.56/Parser.pm
miller
Moderator
 
Join Date: Oct 2006
Location: San Francisco, CA
Posts: 830
#3: Feb 19 '07

re: HTML to Plain Text


You basically have three options (and variations thereof). Which method you choose depends on the content of your HTML and the type of output you want. Most likely, you'll use either method #1 or #3.

1) Remove Tags using a regular expression (Quick and Dirty) (Kevin's Suggestion #1)

Perl FAQ 9 - Removing HTML from String

2) HTML::Parser - Tag-by-tag parsing of HTML (Complicated and Verbose) (Kevin's Suggestion #2)

cpan HTML::Parser

3) HTML::FormatText - Generic Parsing of HTML to Plain Text

cpan HTML::FormatText
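
For option 3, here is a minimal sketch, assuming HTML::TreeBuilder and HTML::FormatText are installed from CPAN. The helper name html_to_text and the margin values are only for illustration; if the modules are missing, the helper returns nothing instead of dying.

```perl
use strict;
use warnings;

# Hypothetical helper: returns plain text, or undef when the CPAN
# modules are not available on this machine.
sub html_to_text {
    my ($html) = @_;

    my $loaded = eval {
        require HTML::TreeBuilder;
        require HTML::FormatText;
        1;
    };
    return unless $loaded;

    # Parse the markup into a tree, then render the tree as text.
    my $tree      = HTML::TreeBuilder->new_from_content($html);
    my $formatter = HTML::FormatText->new(leftmargin => 0, rightmargin => 50);
    my $text      = $formatter->format($tree);

    $tree->delete;    # release the parse tree
    return $text;
}

my $text = html_to_text('<p><b> name </b> &amp; more</p>');
print defined $text ? $text : "HTML::FormatText is not installed\n";
```

Unlike a bare regex, this route also decodes entities such as &amp; and wraps the text, so the whitespace in the result will not match the input exactly.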
Newbie
 
Join Date: Feb 2007
Posts: 2
#4: Feb 21 '07

re: HTML to Plain Text


Thanks, friends, for your replies.



My problem was solved by using the following line of code:

$record=~s/<.*?>/ /g;
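
For anyone finding this later, a small sketch of that line applied to the sample from the first post. The whitespace cleanup afterwards is an assumed extra step, not part of the original solution: each removed tag leaves a space behind, so the result needs trimming to get a bare "name".

```perl
use strict;
use warnings;

my $record = '<b> name </b>';

# Replace every tag with a single space (the substitution from the post).
$record =~ s/<.*?>/ /g;

# Assumed extra step: collapse runs of whitespace, then trim both ends.
$record =~ s/\s+/ /g;
$record =~ s/^\s+|\s+$//g;

print "$record\n";    # prints "name"
```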
miller
Moderator
 
Join Date: Oct 2006
Location: San Francisco, CA
Posts: 830
#5: Feb 21 '07

re: HTML to Plain Text


Quote:

Originally Posted by prachi10

Thanks, friends, for your replies.

My problem was solved by using the following line of code:

$record=~s/<.*?>/ /g;

Well done.

That was the first method suggested in the FAQ link. Read the first paragraph (excerpted below) to make sure that you aren't overlooking anything if this is a repeated need:

Quote:

Originally Posted by perldoc

Many folks attempt a simple-minded regular expression approach, like s/<.*?>//g, but that fails in many cases because the tags may continue over line breaks, they may contain quoted angle-brackets, or HTML comments may be present. Plus, folks forget to convert entities--like &lt; for example.
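
To make those caveats concrete, here is a rough sketch that runs the naive substitution over one input per failure case. The helper name naive_strip and the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the naive substitution from the FAQ excerpt.
sub naive_strip {
    my ($html) = @_;
    $html =~ s/<.*?>//g;
    return $html;
}

my %cases = (
    # "." does not match a newline without /s, so the opening tag survives.
    'line break'   => "<a\nhref=\"x.html\">link</a>",
    # The ">" inside the quoted attribute ends the match too early.
    'quoted angle' => '<img alt="a > b" src="x.png">text',
    # The comment is cut at the first ">", so its contents leak through.
    'comment'      => '<!-- <b>hidden</b> not shown -->visible',
    # Entities are not tags, so they are left untouched.
    'entity'       => '1 &lt; 2',
);

for my $name (sort keys %cases) {
    printf "%-12s => [%s]\n", $name, naive_strip($cases{$name});
}
```

In each case some markup or an undecoded entity is left in the output, which is why the FAQ points to a real parser when the input is not under your control.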
