
regex/replace white list

Hi,

What is the best way to white list a set of allowable characters using
regex or replace? I understand it is safer to whitelist than to
blacklist, but am not sure how to go about it.

Many thanks!

Feb 17 '06 #1
jg*****@gmail.com wrote:
Hi,

What is the best way to white list a set of allowable characters using
regex or replace? I understand it is safer to whitelist than to
blacklist, but am not sure how to go about it.


Whether to use a white list (i.e. list of allowed characters) or a black
list (list of not allowed characters) is probably best decided by which
one gives the smaller list. I'm not sure 'safety' is an issue.

As far as a regular expression is concerned, the difference between the
two is whether to use the NOT (!) operator or not (or use an else
statement).
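As a minimal sketch of the whitelist case: one common approach is a negated character class, which matches any character that is not on the allowed list. The allowed set and the function names below are assumptions made up for illustration, not something taken from the original question.

```javascript
// Hypothetical whitelist: letters, digits, space, and a few punctuation marks.
// The negated class [^...] matches any ONE character that is NOT on the list.
var notAllowed = /[^A-Za-z0-9 .,_-]/;

// Validate: the string passes only if no disallowed character is found.
function isClean(str) {
  return !notAllowed.test(str);
}

// Sanitize: remove every disallowed character (the 'g' flag replaces all).
function stripDisallowed(str) {
  return str.replace(/[^A-Za-z0-9 .,_-]/g, '');
}
```

Here isClean('Cost: $6') is false because ':' and '$' are not on the list, and stripDisallowed('Cost: $6') gives 'Cost 6'. Whether to reject or to strip depends on the application.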

To build the white/black list, use a string of characters and the
RegExp() function as a constructor, e.g. if you want to disallow the
letter 'a' in a string, then:

var re = new RegExp('a');

will create a regular expression that can be used to match the letter
'a' anywhere, e.g.:

if ( re.test(someString) )
{
  // someString contains the letter 'a'
} else {
  // someString doesn't contain the letter 'a'
}

or:

if ( ! re.test(someString) )
{
  // someString doesn't contain the letter 'a'
}

To make the regular expression case-insensitive, add the 'i' flag:

var re = new RegExp('a','i');

To match any word character or the '$' character:

var re = new RegExp('[\\w$]');

To match any non-word character (anything not in a-z, A-Z, 0-9 or the
underscore):

var re = new RegExp('\\W');

You can build the expression and flags as string variables and use those:

var reString = '\\W'; // Expression string
var flString = 'g'; // Flag string
var re = new RegExp(reString, flString);

and so on... Search the archives for lots of examples.

--
Rob
Feb 17 '06 #2
RobG wrote:
To build the white/black list, use a string of characters and the
RegExp() function as a constructor, e.g. if you want to disallow the
letter 'a' in a string, then:

var re = new RegExp('a');

will create a regular expression that can be used to match the letter
'a' anywhere, [...]


There is not much point in using the RegExp() constructor instead
of a Regular Expression literal when the expression is invariant. As was
discussed here recently, efficiency and compatibility are seldom an issue:

As for efficiency, the RegExp object created by a RegExp literal is created
before execution, and the literal is then merely a reference to that
object. The RegExp object is not recreated by repeated use of the same
literal (say, in a loop). (This must be considered regarding efficiency,
though, since a new RegExp object is always created for each distinct
literal, unconditionally, even if the object is used only when a certain
condition applies.)

As for compatibility, even though RegExp literals were not specified
before ECMAScript Edition 3 (issued 1999, seven years ago already),
they have been supported since JavaScript 1.2 (Netscape 4.0, June 1997),
except for the `m' modifier. They have been supported including the `m'
modifier since JavaScript 1.5 (Mozilla/5.0 rv:0.6, November 2000) and
JScript 3.0 (Internet Explorer 4.0, and Internet Information Server 4.0,
October 1997). (The problems that remain compared to ECMAScript Edition 3
are non-capturing parentheses and non-greedy expressions, which are not
universally supported, but you have to deal with those problems with the
RegExp() constructor as well.)

However, using the RegExp constructor removes one maintenance problem and
introduces another. It removes the problem that Regular Expression literals
cannot span lines, because string concatenation serves the purpose. It
introduces the problem that one has to escape the expression twice: once to
avoid escape sequences in the string literal, and again to have RegExp
special characters parsed as expression atoms instead. (This is often very
confusing to people who are fairly new to the language.)

var re = /a/;

and the like certainly suffices here.
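
To illustrate the double escaping described above, here is a small sketch that writes one pattern (digits, a literal dot, digits) three ways. The variable names are made up for illustration.

```javascript
// Literal: each special character is escaped once, for the regex parser.
var literal = /\d+\.\d+/;

// Constructor: escaped twice, once for the string literal, once for the regex.
var constructed = new RegExp('\\d+\\.\\d+');

// Common mistake: single backslashes are consumed by the string literal,
// so the constructor receives d+.d+ (letter 'd', any character, letter 'd').
var mistaken = new RegExp('\d+\.\d+');
```

The first two behave identically and have the same source text; the third silently becomes a different pattern that no longer matches a string such as '3.14'.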

As a final note, I want to add that if special features of Regular
Expressions compared to strings are not used, it is probably more
efficient not to use Regular Expressions at all. Instead of writing

if (re.test(someString))

using the RegExp() constructor or the above RegExp object initializer,
it is probably more efficient to write

if (someString.indexOf("a") > -1)

instead.
PointedEars
Feb 17 '06 #3
Thomas 'PointedEars' Lahn wrote:
RobG wrote:

To build the white/black list, use a string of characters and the
RegExp() function as a constructor, e.g. if you want to disallow the
letter 'a' in a string, then:

var re = new RegExp('a');

will create a regular expression that can be used to match the letter
'a' anywhere, [...]

While there is not much point in using the RegExp() constructor instead
of a Regular Expression literal when the expression is invariant.


My understanding of the request is that the string *is* variant. The OP
wishes to build a list of characters to allow/disallow, I presumed it
would not be hard-coded - though it might be built that way at the
server where the value is extracted from a database and the appropriate
value hard-coded into the script.

But I supposed that the value would be written to some variable, which is
then accessed by the script, e.g.

var blackList = '$%#';

and then later:

var re = new RegExp('[' + blackList + ']');

of a Regular Expression literal when the expression is invariant. As was
discussed here recently, efficiency and compatibility are seldom an issue:

As for efficiency, the RegExp object created by a RegExp literal is created
before execution, and the literal is then merely a reference to that
object. The RegExp object is not recreated by repeated use of the same
literal (say, in a loop). (Which must be considered regarding efficiency,
though, since this will create a new RegExp object always if the expression
differs, unconditionally. Even if the object is used only when a certain
condition applies.)
Quite true; I was addressing efficiency from the point of view of the
length of the expression. E.g. to allow only letters, digits and the
underscore, \w will do the trick. To disallow only '@#$', [@#$] is much
shorter than a list of everything else.

The difference in efficiency between using RegExp as a constructor and
using a literal in the above scenario is likely irrelevant (though I
understand your point and in general much prefer to use literals).

[...] However, using the RegExp constructor removes and introduces a maintenance
problem. It removes the problem that Regular Expressions cannot span lines
because string concatenation serves the purpose. It introduces the problem
that one has to escape the expression twice: one time to avoid escape
sequences in the string literal, and again to have RegExp special
characters parsed as expression atoms instead.
Escaping characters is always an issue, especially if multi-line input
is accepted. Should new lines & line feeds be allowed? The solution is
for the OP to learn about matching characters and apply that to their
particular circumstance.
[...]
var re = /a/;

and the like certainly suffices here.
Probably a result of my trivial example - a better example is below.

As a final note, I want to add that if special features of Regular
Expressions compared to strings are not used, it is probably more
efficient not to use Regular Expressions at all. Instead of writing

if (re.test(someString))

using the RegExp() constructor or the above RegExp object initializer,
it is probably more efficient to write

if (someString.indexOf("a") > -1)


If the need was a test for a specific character, then that would be
fine. Maybe you could use it with a loop to go through each character
in the black list, but how many characters/loops would it take before a
regular expression was faster?
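
For comparison, the loop mentioned above might look like the following sketch. The function name is hypothetical, and no claim is made here about which variant is faster; that would have to be measured for the list sizes in question.

```javascript
// Returns true if str contains any character listed in blackList.
// No regular expression is involved, so nothing needs to be escaped.
function containsAny(str, blackList) {
  for (var i = 0; i < blackList.length; i++) {
    if (str.indexOf(blackList.charAt(i)) > -1) {
      return true;
    }
  }
  return false;
}
```

For example, containsAny('Cost: $6', '$%#') is true, while containsAny('Cost: 6', '$%#') is false.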

The following example may be better:

<script type="text/javascript">

function checkList(blID, strID)
{
  var blackList = document.getElementById(blID).value;
  var inString = document.getElementById(strID).value;
  var re = new RegExp('[' + blackList + ']');
  document.getElementById('xx').innerHTML = re.test(inString);
}
</script>
<label for="blackList">Blacklist characters:<input
type="text" id="blackList" value="\^\]$#@"></label><br>

<label for="inputText">String to check:<input
type="text" id="inputText" value="Cost: $6"></label>

<input type="button" value="Check input with blacklist"
onclick="checkList('blackList','inputText');">

<div>Result: <span id="xx" style="font-weight: bold;">
<i>no check done yet...</i></span></div>

If new lines, line feeds, etc. need to be tested too, use a textarea
instead of a text input for the input string. Variations on how
browsers represent new lines may need to be accommodated too.
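
One way to accommodate the newline variations is to normalize them before testing. This is only a sketch, assuming that every line break should be treated as a single "\n"; the function name is made up for illustration.

```javascript
// Convert Windows (\r\n) and old Mac (\r) line breaks to a single \n,
// so that the black/white list only has to deal with one newline form.
function normalizeNewlines(str) {
  return str.replace(/\r\n?/g, '\n');
}
```

After normalizing, a character class only needs to mention \n to allow or disallow line breaks.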

--
Rob
Feb 20 '06 #4
RobG wrote:
Thomas 'PointedEars' Lahn wrote:
However, using the RegExp constructor removes and introduces a
maintenance problem. It removes the problem that Regular Expressions
cannot span lines because string concatenation serves the purpose. It
introduces the problem that one has to escape the expression twice: one
time to avoid escape sequences in the string literal, and again to have
RegExp special characters parsed as expression atoms instead.
Escaping characters is always an issue, especially if multi-line input
is accepted. Should new lines & line feeds be allowed?


You misunderstood. This was not about matching newline in the input.
The solution is for the OP to learn about matching characters and apply
that to their particular circumstance.
My point was that

var rx = /very_long_Regular_Expression.a.b.c.d.e.f.g.h.i.j.k.l.m.n.o.p.
r.s.t.u.v.w.x.y.z.\..#.#.4.2.1.3.3.7./

is not possible (consider the above a _hard_ line break to avoid crossing
the 80-columns border), but

var rx = new RegExp(
  "very_long_Regular_Expression.a.b.c.d.e.f.g.h.i.j.k.l.m.n.o.p."
  + "r.s.t.u.v.w.x.y.z.\\..#.#.4.2.1.3.3.7.");

(and the like) is. The latter introduces the maintenance problem that the
literal "." must be escaped twice, but it removes the maintenance problem
that literals are not allowed to span lines (in the source code).
As a final note, I want to add that if special features of Regular
Expressions compared to strings are not used, it is probably more
efficient not to use Regular Expressions at all. Instead of writing

if (re.test(someString))

using the RegExp() constructor or the above RegExp object initializer,
it is probably more efficient to write

if (someString.indexOf("a") > -1)


If the need was a test for a specific character, then that would be
fine. Maybe you could use it with a loop to go through each character
in the black list, but how many characters/loops would it take before a
regular expression was faster?


I do not know. This was a general note.
The following example may be better:
Maybe not :)
<script type="text/javascript">

function checkList(blID, strID)
{
var blackList = document.getElementById(blID).value;
var inString = document.getElementById(strID).value;
A `form' element would have avoided the inefficient and not downwards
compatible referencing.

function checkList(f, blID, strID)
{
  var es;
  if (blID && strID
      && f && (es = f.elements)
      && es[blID] && es[strID])
  {
    var blackList = es[blID].value;
    var inString = es[strID].value;

    // ...
  }
  else
  {
    window.alert("foobar!");
  }

  // false cancels the submission if the handler returns this value
  return false;
}

<form action="..."
onsubmit="return checkList(this, 'blackList', 'inputText');">
...
<input type="submit" value="Check input with blacklist">
</form>
var re = new RegExp('[' + blackList + ']');
What about the escaping part? You do not want the user to handle that,
do you?
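
A minimal sketch of handling the escaping in the script rather than leaving it to the user: inside a character class only the backslash, ']', '^' and '-' are special, so those can be escaped before the class is built. Both function names are hypothetical; with this approach the user would type the raw characters (e.g. ^]$#@) without backslashes.

```javascript
// Escape the characters that are special inside [...]: \ ] ^ -
function escapeForCharClass(chars) {
  return chars.replace(/[\\\]\^-]/g, '\\$&');
}

// Build a regex matching any single character from the raw list.
function buildBlacklist(chars) {
  return new RegExp('[' + escapeForCharClass(chars) + ']');
}
```

For example, buildBlacklist('^]$#@').test('Cost: $6') is true, and buildBlacklist('a-z') matches only 'a', '-' and 'z' rather than the whole range.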
document.getElementById('xx').innerHTML = re.test(inString);
Mixing standards compliant and proprietary DOM features unnecessarily.

es["xx"].style.fontStyle = "normal"; // I prefer setStyleProperty()[1]
es["xx"].value = re.test(inString);

<form ...>
...
<div>Result: <input id="xx"
value="no check done yet..."
style="border:0; font-weight:bold; font-style:italic"></div>
</form>
[...]

PointedEars
___________
[1] <URL:http://pointedears.de/scripts/dhtml.js>
Feb 20 '06 #5
