
Removing HTML tags

Hi guys,
I have an HTML document from which I want to remove the tags, leaving only the text. Can anyone help me out?
Sep 17 '07 #1
numberwhun
3,509 Expert Mod 2GB
It is always a good idea to look on CPAN before posting questions. Most of us are going to look there for answers first, since that is where Perl modules are published and can be searched.

If you go to CPAN and type "remove html" into the search box, you will be presented with a link to HTML::Obliterate. Now, I don't know how well the module works, but it's always worth a shot to try something that sounds like it will do what you want.

Regards,

Jeff
Sep 17 '07 #2
KevinADC
4,059 Expert 2GB
Jeff,

A quick look at that module's source code is revealing:

    sub remove_html_from_string {
        my($string) = @_;
        $string =~ s{ < \W* \w+ [^>]* > }{}xmsg;
        return $string;
    }
So it's just using that one regexp to try to remove HTML tags. It probably works OK for the most part, but it is not a real HTML parser the way HTML::Parser is.
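
To make the difference concrete, here is a rough sketch that runs the regexp quoted above next to a small HTML::Parser based stripper. The helper names strip_with_regex and strip_with_parser and the sample input are made up for illustration and are not part of either module. It assumes HTML::Parser is installed from CPAN.

```perl
use strict;
use warnings;
use HTML::Parser ();

# The regexp quoted above, copied unchanged for comparison.
sub strip_with_regex {
    my ($string) = @_;
    $string =~ s{ < \W* \w+ [^>]* > }{}xmsg;
    return $string;
}

# Hypothetical helper (not from any module): keep only text events.
sub strip_with_parser {
    my ($html) = @_;
    my $text = '';
    my $parser = HTML::Parser->new(
        api_version => 3,
        # 'dtext' hands the callback text with entities already decoded.
        text_h => [ sub { $text .= shift }, 'dtext' ],
    );
    # Skip the contents of <script> and <style>, which are not page text.
    $parser->ignore_elements(qw(script style));
    $parser->parse($html);
    $parser->eof;
    return $text;
}

# Made-up input: an attribute value containing ">" and an inline script.
my $html = '<p>Hi <a title="a > b">there</a></p><script>var x = 1;</script>';
print "regex:  ", strip_with_regex($html), "\n";
print "parser: ", strip_with_parser($html), "\n";
```

With input like this, the regexp stops at the first ">" inside the quoted attribute and so leaves part of the attribute behind, and it also keeps the script body as if it were text. The parser version is only a starting point; it does not add whitespace between block elements, so real documents may need more handling.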
Sep 17 '07 #3
