reg exp to clean html tags

Hi all!
I've tried everything but I can't find a solution.
Here is an example.
I have the following HTML code:

<div id="aaa">text inside<br/> a div</div><span class="bbb"> text
inside a span</span><img src="blabla"/><p class='ccc'
style='blablabla'>text inside a <b style="FONT-SIZE:
14px">paragraph </b> with some<!--Hey I'm a f$*!#ng comment--> bold
text</p>

I need to transform this text into clean HTML, like this:

text inside a div text inside a span<p>text inside a paragraph with
some bold text</p>

I've tried this:

function cleareTags(){
  var stringToReplace = String(document.theform.ArticleText.value);
  var re=/(<\/?p)(?:\s[^>]*)?(>)|<[^>]*>/gi;
  document.theform.ArticleText.value = stringToReplace.replace(re,'');
}

but no luck: it removes all the tags.

I've also tried it this way:

function cleareTags(){
  var r=/<(\w{1})([^>]*)>(.*)<\/\1>/gmi;
  document.theform.ArticleText.value = stringToReplace.replace(r,
    function(s,tag,attr,inner) {
      //alert(inner)
      return (tag.toUpperCase()=="P"?"<"+tag+">"+inner+"</"+tag+">":inner);
    }
  );
}

but nothing happens...

Please help me, I've run out of ideas.
Bye,
marco

Jul 23 '05 #1
Jc
You can probably do this with regular expressions and a replace()
call, using capture groups and non-greedy quantifiers (as one option), but
rather than write such a regex for you, it would be best to let an
actual HTML parser (such as a browser) do the work for you.

If you don't care about browser compatibility (you are using this just
for yourself), you could use innerHTML and innerText in IE. For
example, add the HTML to an element in the document using innerHTML or
insertAdjacentHTML, and then read it back using innerText.
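
For reference, the regular-expression route mentioned at the top of this reply could look roughly like the sketch below. It reuses the pattern from the original post and only changes the replacement string to "$1$2", so the two captured pieces of a <p> or </p> tag are written back while every other tag is replaced with nothing. The function name stripTagsExceptP and the sample string are made up for illustration, and like any regex approach it assumes there is no ">" inside an attribute value.

```javascript
// Sketch only: keep <p> and </p> (without attributes), drop every other tag.
// stripTagsExceptP is a hypothetical helper name, not part of the thread.
function stripTagsExceptP(html) {
  // Alternative 1 captures "<p" or "</p" in $1 and the closing ">" in $2,
  // skipping any attributes in between. Alternative 2 matches any other tag,
  // in which case $1 and $2 are empty and the tag disappears.
  var re = /(<\/?p)(?:\s[^>]*)?(>)|<[^>]*>/gi;
  return html.replace(re, "$1$2");
}

var sample =
  '<div id="aaa">text inside<br/> a div</div>' +
  '<p class="ccc">text inside a <b>paragraph</b></p>';

console.log(stripTagsExceptP(sample));
// text inside a div<p>text inside a paragraph</p>
```

With an empty replacement string, as in the original attempt, the captured "<p" and ">" are thrown away too, which is why every tag vanished.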

Jul 23 '05 #2
Thanks for the answer.
Compatibility is only needed with IE6.

Sorry, but I didn't quite understand your answer. JavaScript is
not my language, but my boss gave me this problem to solve anyway.

How do I do this with innerHTML?

Jul 23 '05 #3
Jc
Here's an example:

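<body>

<div id="divContainer"></div>

<script>
// Let the browser parse the markup, then read back only the text.
var container = document.getElementById("divContainer");
var sHTML = "<div>a<div>b</div></div>";
container.innerHTML = sHTML;
alert(container.innerText); // innerText contains no tags at all
</script>

</body>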

Jul 23 '05 #4
I assume you're trying to remove all tags except <p></p>.
Unless someone knows another way, you can do it in two separate operations.
The first removes all tags except <p></p> but leaves the attributes inside the opening <p> tag; the second strips those attributes.

str=str.replace(/<(?!\s*p\s*|\w>|\s*\/\s*p\s*>)[^>]+>/gi, "");

str=str.replace(/<\s*p[^>]+>/gi, "<p>");

This isn't a universal solution, but on recent browsers it works on your example text.
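
The lookahead idea can be tightened and generalised a little. The sketch below is a variation, not the code posted above: it requires the allowed tag name to be followed by whitespace, "/" or ">", so that <p> is kept while <pre> and <b> are removed, and it takes the list of tags to keep as a parameter. The names stripTagsExcept, html and allowed are hypothetical, and the allowed names are assumed to be plain letters and digits.

```javascript
// Sketch only: strip every tag whose name is not in `allowed`, then drop the
// attributes of the tags that were kept. stripTagsExcept and its parameters
// are hypothetical names; `allowed` must contain plain tag names (letters and
// digits only), because they are pasted into the pattern unescaped.
function stripTagsExcept(html, allowed) {
  var names = allowed.join("|");
  // Negative lookahead: skip tags whose name is an allowed name followed by
  // whitespace, "/" or ">", so <p> is kept while <pre> and <param> are not.
  var others = new RegExp("<(?!\\s*\\/?\\s*(?:" + names + ")[\\s\\/>])[^>]*>", "gi");
  // Opening tags that survived but still carry attributes.
  var withAttrs = new RegExp("<\\s*(" + names + ")\\s[^>]*>", "gi");
  return html.replace(others, "").replace(withAttrs, "<$1>");
}

console.log(stripTagsExcept('<div>a<p class="x">b<pre>c</pre></p></div>', ["p"]));
// a<p>bc</p>
```

Like the two replace() calls above, this still treats the first ">" it meets as the end of a tag.
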
--
Stephen Chalmers http://makeashorterlink.com/?H3E82245A

547265617375726520627572696564206174204F2E532E207265663A205451323437393134

Jul 23 '05 #5
Here is an alternative. The first operation removes all tags except <p></p>; the second removes the attributes within the opening <p> tag.
It won't handle < or > characters nested inside a tag, such as a > inside an attribute value.
I have avoided the use of lookahead assertions.

str=str.replace(/<\s*([a-oq-z]|p\w|\!)[^>]*>|<\s*\/\s*([a-oq-z]|p\w)[^>]*>/gi, "");

str=str.replace(/<\s*p[^>]+>/gi, "<p>");
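
As a quick way to try this out, the two calls can be wrapped in a function and run on a couple of strings. This is only a sketch: the name stripAllButP and the test strings are invented for the example, and the second call illustrates the nested ">" limitation mentioned above.

```javascript
// Sketch only: the two replace() calls from this reply wrapped in a function.
// stripAllButP is a hypothetical name chosen for this example.
function stripAllButP(str) {
  // Pass 1: remove tags (and <!-- --> comments) whose name starts with a
  // letter other than p, or with p followed by another word character.
  str = str.replace(/<\s*([a-oq-z]|p\w|!)[^>]*>|<\s*\/\s*([a-oq-z]|p\w)[^>]*>/gi, "");
  // Pass 2: drop the attributes from any remaining opening <p ...> tag.
  return str.replace(/<\s*p[^>]+>/gi, "<p>");
}

console.log(stripAllButP('<div>a<p class="x">b<pre>c</pre></p></div>'));
// a<p>bc</p>

// The limitation: a ">" inside an attribute value ends the match early,
// so the tail of the tag is left behind in the output.
console.log(stripAllButP('<img alt="1 > 0">x'));
```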

--
Stephen Chalmers


Jul 23 '05 #6
