How to use HTML::Parser to remove HTML tags and print result

I am trying to use HTML::Parser to parse an HTML file, remove all HTML tags
(including comments, etc.), replace all entities (e.g. &amp;), and put the
result into a variable as a string. I figure HTML::Parser itself can
somehow perform the filtering, but how do I get the result back as a string?
I'd appreciate some sample code if anyone has any. Sorry if this is a real
n00b question.

Thanks a lot,
Mitchua

Jul 19 '05 #1

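Try this for a sample of parsing a webpage:
http://www.wdvl.com/Authoring/Langua...ummarizer.html
If you are just trying to remove all the HTML tags, you could just do this
(the /s modifier lets a tag that spans more than one line match):
$webpage =~ s/<.*?>//gs;
Note that this only strips tags. It does not decode entities, and it can
misfire on comments or attribute values that contain a ">".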
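
For the HTML::Parser route the original question asks about, here is a minimal sketch, assuming the version 3 API of HTML::Parser is installed. The helper name html_to_text and the sample string are made up for illustration and are not from the thread. The idea is to register only a text handler that appends to a variable; tags and comments have no handler, so they are dropped, and the 'dtext' argspec delivers text with entities already decoded.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper (not from the thread): returns the text content of
# $html with tags and comments dropped and entities decoded.
sub html_to_text {
    my ($html) = @_;
    my $text = '';

    my $parser = HTML::Parser->new(
        api_version => 3,
        # 'dtext' passes the callback text with entities already decoded.
        text_h => [ sub { $text .= $_[0] }, 'dtext' ],
    );
    # No start, end, or comment handlers are registered, so those
    # events are simply discarded.
    $parser->ignore_elements(qw(script style));    # skip non-visible content
    $parser->parse($html);
    $parser->eof;    # signal end of input so buffered text is flushed

    return $text;
}

# Illustrative input only.
my $sample = '<p>Fish &amp; chips<!-- not shown --> <b>today</b></p>';
print html_to_text($sample), "\n";
```

To read from a file rather than a string, the same parser object also offers parse_file, which takes a file name or an open filehandle; the accumulated string is built the same way.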

Ice Demon
Jul 19 '05 #2
