Regular expression to remove all html tags except for p and br

Hi all

Can someone help me out with a regex to remove all HTML tags except for <p>, </p>, <br>, and <br/> from a string?

Thanks

Jim
Jul 21 '05 #1
Hi Jim,

Thanks for posting in the community.

I am currently looking for someone who can help you with this. We will reply
here with more information as soon as possible.
If you have any further questions, please feel free to post here.
Thanks!

Best regards,

Gary Chang
Microsoft Online Partner Support

Get Secure! - www.microsoft.com/security
This posting is provided "AS IS" with no warranties, and confers no rights.
--------------------

Jul 21 '05 #2
Hi Gary

I'm just following up to see if you have had any luck with this.

Thanks

Jim
Jul 21 '05 #3
Hello Jim,

Thanks for your post. I wrote the following pattern, which will remove all
HTML tags except for <p>, </p>, <br> and </br>:

<[^/bp][^>]*>|<p[a-z][^>]*>|<b[^r][^>]*>|<br[a-z][^>]*>|</[^bp]+>|</p[a-z]+>|</b[^r]+>|</br[a-z]+>

Please check it on your side and let me know your result.
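
One way to check the pattern is to run it over a few sample strings. The sketch below uses Python's `re` module rather than .NET (an assumption made for illustration; the pattern uses only syntax common to both engines). The helper name `strip_tags` and the sample inputs are hypothetical.

```python
import re

# The pattern from the post above, joined onto one line.
PATTERN = re.compile(
    r"<[^/bp][^>]*>|<p[a-z][^>]*>|<b[^r][^>]*>|<br[a-z][^>]*>"
    r"|</[^bp]+>|</p[a-z]+>|</b[^r]+>|</br[a-z]+>"
)


def strip_tags(html):
    """Replace every match of PATTERN with an empty string (hypothetical helper)."""
    return PATTERN.sub("", html)


# Simple case: <div> and </div> are removed, the p and br tags stay.
print(strip_tags("<div><p>Hello<br/>world</p></div>"))

# Edge case: [^r] in <b[^r][^>]*> can match the '>' that closes <b>,
# so the match runs on to the next '>' and swallows the text as well.
print(strip_tags("<b>bold</b>"))

# Edge case: </[^bp]+> rejects any closing tag whose name contains b or p.
print(strip_tags("<span>x</span>"))
```

On these inputs the first call returns `<p>Hello<br/>world</p>`, the second returns an empty string, and the third returns `x</span>`, so the pattern is worth testing against real markup before relying on it.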

Have a nice day!

Regards,

HuangTM
Microsoft Online Partner Support
MCSE/MCSD

Get Secure! -- www.microsoft.com/security
This posting is provided "as is" with no warranties and confers no rights.

Jul 21 '05 #4
With negative lookahead in .NET regular expressions, you can write this in a
much simpler form:

<(?!br|/br|p|/p).+?>

That will match any tag that does not start with br, /br, p, or /p, and
you can use that to replace all those tags with an empty string. This is
also more robust, as you don't have to make sure you hit all the tags. I
noticed that <script> is absent from the earlier pattern, which could
possibly lead to a security exploit (somebody enters script code, and when
it gets echoed back, it executes on a user's computer).
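
A minimal sketch of the lookahead approach, written with Python's `re` module rather than .NET (an assumption for illustration; the `(?!...)` syntax is the same in both). The names `LOOKAHEAD`, `STRICT`, and `strip_tags` are hypothetical. Because the lookahead only checks a prefix, it also spares tags such as `<pre>`; `STRICT` is one possible tightening and is not part of the original post.

```python
import re

# Pattern from the post: skip any tag that starts with br, /br, p or /p.
LOOKAHEAD = re.compile(r"<(?!br|/br|p|/p).+?>")

# Hypothetical stricter variant: \b requires the tag name to end right
# after p or br, so <pre> or <param> are no longer spared.
STRICT = re.compile(r"<(?!/?(?:p|br)\b)[^>]*>")


def strip_tags(html, pattern=LOOKAHEAD):
    """Replace every tag matched by `pattern` with an empty string."""
    return pattern.sub("", html)


sample = "<div><p>Hi<br/>there</p><script>alert(1)</script></div>"
print(strip_tags(sample))                  # script tags go, their text stays

print(strip_tags("<pre>x</pre>"))          # prefix match keeps <pre>
print(strip_tags("<pre>x</pre>", STRICT))  # stricter variant removes it
```

Note that stripping the tags leaves the script body (`alert(1)`) behind as plain text, so this removes markup but does not by itself remove the content between the tags.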

You will want to use a case-insensitive match; otherwise the uppercase
versions of the allowed tags (<P>, <BR>) will be stripped as well.
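
The effect of case sensitivity can be seen in a short Python sketch. Python's `re.IGNORECASE` stands in here for .NET's `RegexOptions.IgnoreCase`; the variable names and the sample string are hypothetical.

```python
import re

TAG = r"<(?!br|/br|p|/p).+?>"  # the lookahead pattern from this post

case_sensitive = re.compile(TAG)
case_insensitive = re.compile(TAG, re.IGNORECASE)

sample = "<DIV><P>Hello<BR>world</P></DIV>"

# Without the flag, the lookahead does not recognise <P>, <BR> or </P>,
# so they are stripped along with everything else.
print(case_sensitive.sub("", sample))

# With the flag, the uppercase p and br tags are kept.
print(case_insensitive.sub("", sample))
```

The first call prints `Helloworld`; the second prints `<P>Hello<BR>world</P>`.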

--
Eric Gunnerson

Visit the C# product team at http://www.csharp.net
Eric's blog is at http://weblogs.asp.net/ericgu/

This posting is provided "AS IS" with no warranties, and confers no rights.

Jul 21 '05 #5
