using HTML::Parser

Hi,
I need to parse an HTML file and extract all the text in it (not the
images or tags). I cannot figure out how to do it. I have the HTML file
saved in my local directory. I need to have the text printed or saved
to my local directory. I would really appreciate any help in this regard.

Thanks,
Divya Rao
Jul 19 '05 #1
Divya Rao wrote:
> I need to parse a HTML file, and extract all the text in it (not the
> images, tags). I cannot figure out how to do it. I have the HTML file
> saved in my local directory. I need to have the text printed/saved in
> my local directory. I would really appreciate any help in this regard.


HTML::Parser comes with an example application that does exactly that.
Unfortunately, the examples are not included in the standard Perl
installation, so you will have to download the module and unpack it manually
to find the example programs.

jue
Jul 19 '05 #2
Divya Rao wrote:
> Hi,
> I need to parse a HTML file, and extract all the text in it (not the
> images, tags). I cannot figure out how to do it. I have the HTML file
> saved in my local directory. I need to have the text printed/saved in
> my local directory. I would really appreciate any help in this regard.


unix% cat /usr/local/bin/nohtml
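#!/usr/bin/perl -w
# Name: nohtml Author: Jo*******@inwap.com 07-Nov-2001
# Purpose: Extracts just the text portions of a document.

use strict;
use HTML::Parser ();

# Called for each run of ordinary text; "dtext" passes it with entities decoded.
sub text_handler {
    print @_;
}

my $p = HTML::Parser->new(api_version => 3);
$p->handler(text => \&text_handler, "dtext");

# Read the named file. With no argument, or with "-", read standard input;
# the handle is passed explicitly rather than relying on open() to interpret "-".
my $file  = shift;
my $input = (!defined $file || $file eq "-") ? \*STDIN : $file;
$p->parse_file($input) or die "nohtml: cannot parse input: $!\n";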

unix% cat /usr/local/bin/nh
#!/bin/sh
PATH=$PATH:/usr/local/bin; export PATH
nohtml - | less -s

Usage: while reading e-mail, pipe the message into '|nh'.
-Joe
Jul 19 '05 #3
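
The original question also asks about saving the text to a file rather than paging it. A variation along the following lines might do that. It is only a sketch, not part of either reply above: the script name `savetext.pl` and the helper `extract_text` are made up for illustration, the input is assumed to be UTF-8, and `<script>` and `<style>` contents are dropped with `ignore_elements` on the assumption that they are not wanted as text.

```perl
#!/usr/bin/perl
# Sketch: save the text of an HTML file to a plain-text file.
# Usage (hypothetical script name): perl savetext.pl page.html page.txt
use strict;
use warnings;
use HTML::Parser ();

# Return the text of an HTML string, leaving out <script> and <style> bodies.
# extract_text is an illustrative helper, not part of HTML::Parser.
sub extract_text {
    my ($html) = @_;
    my $text = '';
    my $p = HTML::Parser->new(
        api_version => 3,
        # Append each run of entity-decoded text to $text.
        text_h      => [ sub { $text .= shift }, 'dtext' ],
    );
    $p->ignore_elements(qw(script style));
    $p->parse($html);
    $p->eof;    # flush any text still buffered in the parser
    return $text;
}

if (@ARGV) {
    die "usage: $0 input.html output.txt\n" unless @ARGV == 2;
    my ($in, $out) = @ARGV;

    # Assumption: the input file is UTF-8 encoded.
    open my $ifh, '<:encoding(UTF-8)', $in or die "Cannot read $in: $!\n";
    my $html = do { local $/; <$ifh> };    # slurp the whole file
    close $ifh;

    open my $ofh, '>:encoding(UTF-8)', $out or die "Cannot write $out: $!\n";
    print {$ofh} extract_text($html);
    close $ofh or die "Cannot close $out: $!\n";
}
```

Collecting the text in a string instead of printing from the handler keeps the parsing separate from the file handling, so the same helper could feed a file, a pager, or further processing.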