Bytes | Software Development & Data Engineering Community

Regular expression for safely cleaning HTML

Hi,

I'm building a web site that renders HTML from various user input.
The problem is that the HTML cannot be trusted, so I need to make sure it
does not contain script injection attacks.
That's why I'd like to define a set of allowed tags and remove all the
others.
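A minimal sketch of the allowlist idea, in Python (the thread names no language, so that choice is an assumption). Note that it swaps the regex for Python's standard-library tokenizer, html.parser.HTMLParser, so tags are recognized the way a parser sees them rather than by pattern matching. ALLOWED_TAGS, VOID_TAGS, DROP_CONTENT_TAGS and sanitize_tags are hypothetical names, and the tag list is only an example. The sketch drops every attribute and is an illustration, not a hardened sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist: adjust to the markup the site actually needs.
ALLOWED_TAGS = {"p", "br", "b", "i", "em", "strong", "ul", "ol", "li", "span"}
VOID_TAGS = {"br"}                       # allowed tags that never take a closing tag
DROP_CONTENT_TAGS = {"script", "style"}  # drop the tag *and* everything inside it


class TagAllowlistSanitizer(HTMLParser):
    """Rebuilds the input keeping only allowlisted tags (all attributes are dropped)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is re-escaped, so a stray "<" can never turn back into markup.
        if self.skip_depth == 0:
            self.out.append(escape(data, quote=False))

    # Comments, doctypes and processing instructions fall through to the
    # base class's no-op handlers, so they are silently discarded.


def sanitize_tags(html_text: str) -> str:
    parser = TagAllowlistSanitizer()
    parser.feed(html_text)
    parser.close()  # an unclosed <script> just swallows the rest of the input
    return "".join(parser.out)


print(sanitize_tags('<p>Hi <b>there</b><script>alert(1)</script></p>'))
# <p>Hi <b>there</b></p>
print(sanitize_tags('ok<script>alert(1)'))
# ok
```

Because anything not on the list is dropped, a tag nobody thought of (or a malformed one) is removed by default instead of slipping through.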

I thought about regular expressions. I was able to find some regex
samples that remove a set of untrusted tags (script, iframe, etc.), but I'd
rather allow only a set of safe tags, because those regexes only remove
"well-formed" tags: a <script> without a matching </script> won't be removed.
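To make that limitation concrete, here is a small Python sketch of a typical blocklist pattern (the pattern and the strip_script_blocks name are illustrative assumptions, not taken from any particular sample). It removes a complete <script>...</script> pair, but an unclosed <script>, or script that never uses a <script> tag at all, passes through untouched.

```python
import re

# A typical blocklist-style pattern: it only matches a *complete*
# <script ...>...</script> pair, case-insensitively and across newlines.
SCRIPT_BLOCK = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)


def strip_script_blocks(html_text: str) -> str:
    return SCRIPT_BLOCK.sub("", html_text)


print(strip_script_blocks('hi<script>alert(1)</script>'))
# hi
print(strip_script_blocks('hi<script>alert(1)'))
# hi<script>alert(1)
print(strip_script_blocks('<img src=x onerror="alert(1)">'))
# <img src=x onerror="alert(1)">
```

The last case shows the other weakness of a blocklist: the pattern only knows about <script>, so an event-handler attribute on an ordinary tag is left alone.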

So does anyone have a regex that removes any tag that is not in a safe
list, along with the content between its opening and closing tags?
And if possible, can it also remove any potentially dangerous attributes,
such as <span onload="javascript:attack(...)">?
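For the attribute part, a sketch along the same lines, again in Python and assuming attributes arrive as (name, value) pairs the way html.parser.HTMLParser reports them. ALLOWED_ATTRS, URL_ATTRS, SAFE_SCHEMES, is_safe_url and filter_attrs are hypothetical names, and the per-tag list is only an example. Rather than trying to recognize dangerous attributes, it keeps only listed ones, so onload, onclick, onerror and other event handlers are dropped by default; URL-valued attributes are additionally rejected unless their scheme is http, https, mailto or relative.

```python
from html import escape
from typing import List, Optional, Tuple
from urllib.parse import urlsplit

# Hypothetical per-tag allowlist. Anything not listed is dropped, which
# includes every on* event handler (onload, onclick, onerror, ...).
ALLOWED_ATTRS = {
    "a": {"href", "title"},
    "img": {"src", "alt"},
    "span": {"title"},
}
URL_ATTRS = {"href", "src"}                     # attributes holding a URL
SAFE_SCHEMES = {"http", "https", "mailto", ""}  # "" means a relative URL


def is_safe_url(value: str) -> bool:
    # Drop spaces and ASCII control characters (U+0000 to U+0020) first,
    # so "java\tscript:" is still recognized as the javascript: scheme.
    cleaned = "".join(ch for ch in value if ch > " ").lower()
    try:
        scheme = urlsplit(cleaned).scheme
    except ValueError:  # e.g. a malformed IPv6 host
        return False
    return scheme in SAFE_SCHEMES


def filter_attrs(tag: str, attrs: List[Tuple[str, Optional[str]]]) -> str:
    """Return the allowlisted attributes of `tag` as a re-quoted string."""
    allowed = ALLOWED_ATTRS.get(tag.lower(), set())
    parts = []
    for name, value in attrs:
        name = name.lower()
        if name not in allowed or value is None:
            continue
        if name in URL_ATTRS and not is_safe_url(value):
            continue
        # quote=True also escapes quotes, so a value cannot break out of "...".
        parts.append(f' {name}="{escape(value, quote=True)}"')
    return "".join(parts)


print(repr(filter_attrs("span", [("onload", "javascript:attack(...)"), ("title", "hi")])))
# ' title="hi"'
print(repr(filter_attrs("a", [("href", "JaVaScript:alert(1)")])))
# ''
```

In a parser-based sanitizer, the string returned by filter_attrs would be placed inside the opening tag that is re-emitted for an allowed element.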

Thanks in advance
Sep 4 '06 #1
You may give www.regexlib.com a shot.

"Steve B." <st**********@com.msn_swap_msn_and_com> wrote in message
news:%2****************@TK2MSFTNGP03.phx.gbl...
Sep 4 '06 #2
