remove all html tags by perl

Could someone tell me how to remove all HTML tags (and anything inside the tags)
with Perl? Some people suggested that I use HTML::TagFilter, but I could not
find a Windows version. Thanks very much for your help.

JJL
Jul 19 '05 #1
jjliu wrote:
> Could someone tell me how to remove all HTML tags (and anything
> inside the tags) with Perl?

Sure.

s/.*//s;

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

Jul 19 '05 #2
Thanks. What I wanted is to remove the head tag and anything inside it. Could you
help me out?

"Gunnar Hjalmarsson" <no*****@gunnar.cc> wrote in message
news:KD********************@newsc.telia.net...
> jjliu wrote:
>> Could someone tell me how to remove all HTML tags (and anything
>> inside the tags) with Perl?
>
> Sure.
>
> s/.*//s;
>
> --
> Gunnar Hjalmarsson
> Email: http://www.gunnar.cc/cgi-bin/contact.pl

Jul 19 '05 #3

"Gunnar Hjalmarsson" <no*****@gunnar.cc> wrote in message
news:KD********************@newsc.telia.net...
> jjliu wrote:
>> Could someone tell me how to remove all HTML tags (and anything
>> inside the tags) with Perl?
>
> Sure.
>
> s/.*//s;


That will remove ALL characters. He really needs something along the lines
of:

s/\<[^\<]+\>//;

This only works if the entire tag is within the same string. If a tag
spans multiple lines, the lines will need to be concatenated into one string.
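
As a rough sketch of that idea (the sample lines and variable names are made up for illustration, and the character class here excludes > rather than <, which is an assumption about what was intended), the lines can be joined first and the substitution applied with /g so that every tag is removed, not just the first:

```perl
use strict;
use warnings;

# Illustrative input: the <a> tag is split across two lines.
my @lines = (
    "<p>Hello <a\n",
    "   href=\"x.html\">world</a></p>\n",
);

# Join the lines into one string so a tag that spans a line break is seen whole.
my $html = join '', @lines;

# Strip every tag. The negated class [^>] also matches newlines, and /g
# repeats the substitution for all tags rather than only the first one.
(my $text = $html) =~ s/<[^>]+>//g;

print $text;    # prints "Hello world" followed by a newline
```

When the input comes from a file, setting `local $/;` before reading the filehandle slurps the whole file into one string and serves the same purpose as the join.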
Jul 19 '05 #4
"Kris Wempa" <calmincents(NO_SPAM)@yahoo.com> wrote in
news:bm*********@kcweb01.netnews.att.com:

> "Gunnar Hjalmarsson" <no*****@gunnar.cc> wrote in message
> news:KD********************@newsc.telia.net...
>> jjliu wrote:
>>> Could someone tell me how to remove all HTML tags (and anything
>>> inside the tags) with Perl?
>>
>> Sure.
>>
>> s/.*//s;
>
> That will remove ALL characters.


Gunnar knows that. :-)

> He really needs something along the
> lines of:
>
> s/\<[^\<]+\>//;

Why all the backslashes?
Also, I suspect you meant the second < to be a >.
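
To illustrate both points, here is a small sketch (the sample string is invented for the demonstration). In Perl, < and > are not regex metacharacters, so the escaped and unescaped spellings behave the same; and a class that excludes < instead of > lets a match run past the end of the tag:

```perl
use strict;
use warnings;

# Illustrative input: the text between the tags contains a bare '>'.
my $sample = '<b>if a > b</b>';

# The backslashed and plain spellings of the posted pattern are equivalent.
(my $escaped = $sample) =~ s/\<[^\<]+\>//g;
(my $plain   = $sample) =~ s/<[^<]+>//g;

# Excluding '>' instead of '<' stops each match at the first closing bracket.
(my $fixed = $sample) =~ s/<[^>]+>//g;

print "escaped: '$escaped'\n";    # escaped: ' b'
print "plain:   '$plain'\n";      # plain:   ' b'
print "fixed:   '$fixed'\n";      # fixed:   'if a > b'
```

With [^<], the first match swallows "<b>if a >", so part of the text is lost along with the tag.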

> This only works if the entire tag is within the same string. If a
> tag spans multiple lines, the lines will need to be concatenated into
> one string.


It also doesn't work if anything within the tag or its attributes contains
a > symbol. Example:

<img src="mathexpression.gif" alt="5 is > 4" />
<input type="submit" onclick="if (count > 1) true else false" />
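
A short sketch of the failure on the first example, plus one possible workaround: a pattern that consumes quoted attribute values as a unit. The trailing word "Caption" and the variable names are added for illustration only, and the quote-aware pattern is an assumption about how one might patch the regexp, not a full HTML parser:

```perl
use strict;
use warnings;

# The first example tag, followed by some illustrative text.
my $html = q{<img src="mathexpression.gif" alt="5 is > 4" />Caption};

# Simple pattern: the match ends at the '>' inside the alt attribute.
(my $naive = $html) =~ s/<[^>]+>//g;

# Quote-aware pattern: inside a tag, a double- or single-quoted value is
# matched as a whole, so a '>' inside it does not end the tag.
(my $aware = $html) =~ s/<(?:[^>"']|"[^"]*"|'[^']*')*>//g;

print "naive: [$naive]\n";    # naive: [ 4" />Caption]
print "aware: [$aware]\n";    # aware: [Caption]
```

Even the quote-aware pattern does not understand comments, script blocks, or malformed quoting, which is one reason a parsing module is usually the safer choice.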

--
Eric
$_ = reverse sort $ /. r , qw p ekca lre uJ reh
ts p , map $ _. $ " , qw e p h tona e and print

Jul 19 '05 #5
jjliu wrote:
> Thanks. What I wanted is to remove the head tag and anything inside it.
> Could you help me out?

Only the head tag? Well, in that case a regexp similar to what Kris
suggested might be sufficient. But please note that normally you'd
do better to use a module when dealing with HTML code, and even though I
have never used the one you mentioned, it appears to be a good suggestion.
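
For the head-only case, such a regexp might look like the following sketch. The sample document and variable names are illustrative, and it assumes a single, well-formed head element with no "</head>" text hidden inside a script or comment:

```perl
use strict;
use warnings;

# Illustrative document held in one string.
my $html = <<'END_HTML';
<html>
<head>
  <title>Example</title>
  <meta name="author" content="nobody">
</head>
<body><p>Kept.</p></body>
</html>
END_HTML

# /i matches <HEAD> as well as <head>; /s lets '.' cross newlines;
# .*? is non-greedy, so the match ends at the first closing </head>;
# \b keeps the pattern from matching other tags such as <header>.
(my $stripped = $html) =~ s{<head\b[^>]*>.*?</head\s*>}{}is;

print $stripped;
```

The body and the outer html tags are left untouched; only the head element and everything inside it are removed.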

> Some people suggested that I use HTML::TagFilter, but I could not
> find a Windows version.


What do you mean by Windows version? What makes you think that
HTML::TagFilter doesn't work on Windows?

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

Jul 19 '05 #6
