using HTML::Parser

Divya Rao

Hi,
I need to parse an HTML file and extract all the text in it (not the
images or tags). I cannot figure out how to do it. I have the HTML file
saved in my local directory. I need to have the text printed/saved in
my local directory. I would really appreciate any help in this regard.

Thanks,
Divya Rao

Jul 19 '05 #1

Jürgen Exner

Divya Rao wrote:

I need to parse an HTML file and extract all the text in it (not the
images or tags). I cannot figure out how to do it. I have the HTML file
saved in my local directory. I need to have the text printed/saved in
my local directory. I would really appreciate any help in this regard.

The HTML::Parser distribution comes with an example program that does exactly
that. Unfortunately, the examples are not included in the standard Perl
installation, so you will have to download the module from CPAN and unpack it
manually to find the example programs.

jue
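
For readers who do not want to unpack the distribution first, here is a minimal sketch of the same idea. It is not the bundled example verbatim (that lives in the distribution's eg/ directory; htext, if memory serves). It assumes HTML::Parser is installed from CPAN, and html_to_text is a hypothetical helper name chosen for illustration. Besides dropping tags, it also skips the contents of script and style elements, which would otherwise be reported as text.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser ();

# html_to_text is a hypothetical helper name, not part of HTML::Parser.
# It takes an HTML string and returns only the visible text.
sub html_to_text {
    my ($html) = @_;
    my $text = '';
    my %inside;    # nesting depth per element name

    my $p = HTML::Parser->new(
        api_version => 3,
        # Track how deep we are inside each element.
        start_h => [ sub { $inside{ $_[0] }++ }, 'tagname' ],
        end_h   => [ sub { $inside{ $_[0] }-- if $inside{ $_[0] } }, 'tagname' ],
        # Keep decoded text unless it belongs to a script or style element.
        text_h => [
            sub {
                return if $inside{script} || $inside{style};
                $text .= $_[0];
            },
            'dtext',
        ],
    );

    $p->parse($html);
    $p->eof;    # signal end of input so buffered text is flushed
    return $text;
}

print html_to_text('<p>Hello &amp; <b>welcome</b></p><script>var x = 1;</script>'), "\n";
```

The "dtext" argument spec asks the parser to hand over text with entities such as &amp; already decoded; "text" would pass it through unchanged.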

Jul 19 '05 #2

Joe Smith

Divya Rao wrote:

Hi,
I need to parse an HTML file and extract all the text in it (not the
images or tags). I cannot figure out how to do it. I have the HTML file
saved in my local directory. I need to have the text printed/saved in
my local directory. I would really appreciate any help in this regard.

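unix% cat /usr/local/bin/nohtml
#!/usr/bin/perl -w
# Name: nohtml Author: Jo*******@inwap.com 07-Nov-2001
# Purpose: Extracts just the text portions of a document.

use strict;
use HTML::Parser ();

sub text_handler {    # Ordinary text, entities already decoded ("dtext")
    print @_;
}

my $p = HTML::Parser->new(api_version => 3);
$p->handler( text => \&text_handler, "dtext");

# Read the named file, or STDIN when no argument (or "-") is given.
# STDIN is passed as a filehandle because parse_file() may treat "-"
# as a literal file name rather than as standard input.
my $file  = shift;
my $input = (!defined $file || $file eq "-") ? \*STDIN : $file;
$p->parse_file($input) || die $!;

1;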

unix% cat /usr/local/bin/nh
#!/bin/sh
PATH=$PATH:/usr/local/bin; export PATH
nohtml - | less -s

Usage: while reading e-mail, pipe the message into '|nh'.
-Joe
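
Since the original question also asks for the text to be saved rather than just printed, here is a rough variant that writes to a file. It is a sketch under assumptions: save_text is a hypothetical helper, both file names come from the command line and are placeholders, and the input is assumed to be UTF-8 encoded (hence utf8_mode, which makes decoded entities come out as UTF-8 bytes matching the surrounding text).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser ();

# save_text is a hypothetical helper: it reads $html_file and writes
# the text portions to $text_file.
sub save_text {
    my ($html_file, $text_file) = @_;

    open my $out, '>', $text_file
        or die "Cannot write $text_file: $!";

    my $p = HTML::Parser->new(
        api_version => 3,
        # Print each chunk of decoded text straight to the output file.
        text_h => [ sub { print {$out} $_[0] }, 'dtext' ],
    );
    $p->utf8_mode(1);    # assumption: the input file is UTF-8 encoded

    # parse_file returns undef if the input file cannot be opened.
    $p->parse_file($html_file)
        or die "Cannot read $html_file: $!";

    close $out or die "Cannot close $text_file: $!";
    return;
}

# Usage: perl savetext.pl input.html output.txt
save_text(@ARGV) if @ARGV == 2;
```

Like nohtml, this keeps the contents of script and style elements; filter those in the text handler if they are unwanted.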

Jul 19 '05 #3
