Regular expression to remove all html tags except for p and br

Hi all,

Can someone help me out with a regex to remove all HTML tags except for <p>, </p>, <br>, and <br/> from a string?

Thanks,

Jim
Jul 21 '05 #1
Hi Jim,

Thanks for posting in the community.

I am currently looking for somebody who can help you with this. We will reply
here with more information as soon as possible.
If you have any further concerns, please feel free to post here.
Thanks!

Best regards,

Gary Chang
Microsoft Online Partner Support

Get Secure! - www.microsoft.com/security
This posting is provided "AS IS" with no warranties, and confers no rights.
--------------------

Jul 21 '05 #2
Hi Gary,

I'm just following up to see if you have had any luck with this.

Thanks,

Jim
Jul 21 '05 #3
Hello Jim,

Thanks for your post. I wrote the following pattern, which will remove all
HTML tags except for <p>, </p>, <br> and </br>:

<[^/bp][^>]*>|<p[a-z][^>]*>|<b[^r][^>]*>|<br[a-z][^>]*>|</[^bp]+>|</p[a-z]+>|</b[^r]+>|</br[a-z]+>

Please check it on your side and let me know your result.

Have a nice day!

Regards,

HuangTM
Microsoft Online Partner Support
MCSE/MCSD

Get Secure! -- www.microsoft.com/security
This posting is provided "as is" with no warranties and confers no rights.

Jul 21 '05 #4
With negative lookahead in .NET regular expressions, you can write this in a
much simpler form:

<(?!br|/br|p|/p).+?>

That will match any <...> tag that does not start with br, /br, p, or /p, and
you can use that to replace all those tags with an empty string. This is
also more robust, as you don't have to enumerate every tag you want to
remove. I noticed that <script> is absent from HuangTM's pattern, which could
possibly lead to a security exploit (somebody enters script code, and when
it gets echoed back, it executes on a user's computer).

You will want to use a case-insensitive match (RegexOptions.IgnoreCase in
.NET), or the uppercase versions of the allowed tags (<P>, <BR>) will be
stripped along with everything else.
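
For anyone who wants to try the lookahead approach quickly, here is a minimal sketch. It uses Python's re module rather than .NET (the (?!...) syntax is the same in both), and it assumes the lookahead group in the pattern above is meant to close right after /p. The names LOOSE, STRICT, and strip_tags are made up for illustration. STRICT is a hypothetical tighter variant that is not from this thread: the loose pattern keeps anything that merely starts with p or br, such as <pre>, while the strict one keeps only bare <p>, </p>, <br>, <br/>, and </br>, and removes tags that carry attributes.

```python
import re

# Pattern from the post above, with the lookahead group closed after "/p".
# IGNORECASE keeps <P> and <BR> from being stripped.
LOOSE = re.compile(r"<(?!br|/br|p|/p).+?>", re.IGNORECASE)

# Hypothetical stricter variant (an assumption, not from the thread):
# the lookahead only protects a bare p or br tag, optionally closing
# or self-closing, so <pre> and <p onclick="..."> are removed.
STRICT = re.compile(r"<(?!/?(?:p|br)\s*/?>)[^>]+>", re.IGNORECASE)


def strip_tags(html, pattern=STRICT):
    """Replace every tag matched by `pattern` with an empty string."""
    return pattern.sub("", html)


sample = "<div><P>Hi<br/>there</P><script>alert(1)</script><pre>x</pre></div>"

print(strip_tags(sample, LOOSE))   # <P>Hi<br/>there</P>alert(1)<pre>x</pre>
print(strip_tags(sample, STRICT))  # <P>Hi<br/>there</P>alert(1)x
```

Note that both patterns remove only the tags themselves: the text between <script> and </script> is left in place as plain text. Regex stripping like this is best treated as a rough filter, not as a complete defence against script injection.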

--
Eric Gunnerson

Visit the C# product team at http://www.csharp.net
Eric's blog is at http://weblogs.asp.net/ericgu/

This posting is provided "AS IS" with no warranties, and confers no rights.

Jul 21 '05 #5
