regex/replace white list

Hi,

What is the best way to white list a set of allowable characters using
regex or replace? I understand it is safer to whitelist than to
blacklist, but am not sure how to go about it.

Many thanks!

Feb 17 '06 #1
jg*****@gmail.com wrote:
Hi,

What is the best way to white list a set of allowable characters using
regex or replace? I understand it is safer to whitelist than to
blacklist, but am not sure how to go about it.


Whether to use a white list (i.e. list of allowed characters) or a black
list (list of not allowed characters) is probably best decided by which
one gives the smaller list. I'm not sure 'safety' is an issue.

As far as a regular expression is concerned, the difference between the
two is whether to use the NOT (!) operator or not (or use an else
statement).

To build the white/black list, use a string of characters and the
RegExp() function as a constructor, e.g. if you want to disallow the
letter 'a' in a string, then:

var re = new RegExp('a');

will create a regular expression that can be used to match the letter
'a' anywhere, e.g.:

if ( re.test(someString) )
{
  // someString contains the letter 'a'
} else {
  // someString doesn't contain the letter 'a'
}

or:

if ( ! re.test(someString) )
{
  // someString doesn't contain the letter 'a'
}

To make the regular expression case-insensitive, add the 'i' flag:

var re = new RegExp('a','i');

To match any word character or the '$' character:

var re = new RegExp('[\\w$]');

To match any non-word character (anything other than a-z, A-Z, 0-9 and
the underscore):

var re = new RegExp('\\W');

You can build the expression and flags as string variables and use those:

var reString = '\\W'; // Expression string
var flString = 'g'; // Flag string
var re = new RegExp(reString, flString);

and so on... Search the archives for lots of examples.
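
For the white list the original question asks about, one possible approach is a negated character class: [^...] matches any character that is not listed. The sketch below is only an illustration. The allowed set (ASCII letters, digits and the space) and the names isAllowed and stripDisallowed are assumptions, not something taken from this thread.

```javascript
// Assumed white list: ASCII letters, digits and the space character.
var notAllowed = /[^A-Za-z0-9 ]/;      // matches any character NOT in the list
var notAllowedAll = /[^A-Za-z0-9 ]/g;  // same pattern, 'g' flag for replace()

// True if every character of str is in the white list.
function isAllowed(str) {
  return !notAllowed.test(str);
}

// Returns str with every character outside the white list removed.
function stripDisallowed(str) {
  return str.replace(notAllowedAll, '');
}
```

Testing with isAllowed() rejects the input outright, while stripDisallowed() silently cleans it; which of the two is appropriate depends on the application.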

--
Rob
Feb 17 '06 #2
RobG wrote:
To build the white/black list, use a string of characters and the
RegExp() function as a constructor, e.g. if you want to disallow the
letter 'a' in a string, then:

var re = new RegExp('a');

will create a regular expression that can be used to match the letter
'a' anywhere, [...]


There is not much point in using the RegExp() constructor instead
of a Regular Expression literal when the expression is invariant. As was
discussed here recently, efficiency and compatibility are seldom an issue:

As for efficiency, the RegExp object created by a RegExp literal is created
before execution, and the literal is then merely a reference to that
object. The RegExp object is not recreated by repeated use of the same
literal (say, in a loop). (Which must be considered regarding efficiency,
though, since this will create a new RegExp object always if the expression
differs, unconditionally. Even if the object is used only when a certain
condition applies.)

As for compatibility, even though RegExp literals were not specified
before ECMAScript Edition 3 (issued 1999, seven years ago already, though),
they are supported since JavaScript 1.2 (Netscape 4.0, June 1997) except
for the `m' modifier. They are supported including the `m' modifier since
JavaScript 1.5 (Mozilla/5.0 rv:0.6, November 2000) and JScript 3.0
(Internet Explorer 4.0, and Internet Information Server 4.0, October 1997).
(The problems that remain compared to ECMAScript Edition 3 are non-capturing
parentheses and non-greedy expressions that are not universally supported,
but you have to deal with those problems with the RegExp() constructor as
well.)

However, using the RegExp constructor removes and introduces a maintenance
problem. It removes the problem that Regular Expressions cannot span lines
because string concatenation serves the purpose. It introduces the problem
that one has to escape the expression twice: one time to avoid escape
sequences in the string literal, and again to have RegExp special
characters parsed as expression atoms instead. (This is often very
confusing to people who are fairly new to the language.)
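
To illustrate the double escaping, here is a small sketch with the same pattern written both ways. The pattern itself (one or more digits followed by a literal dot) and the variable names are made up for the example.

```javascript
// One or more digits followed by a literal dot, as a literal and as a string.
var fromLiteral = /\d+\./;
var fromString = new RegExp('\\d+\\.');  // every backslash doubled for the string literal

// Forgetting to double the backslashes silently changes the pattern:
// the string literal '\d+\.' is just "d+.", so this matches one or more
// letters 'd' followed by any character.
var mistaken = new RegExp('\d+\.');
```

Both fromLiteral and fromString match "42."; mistaken does not, and no error is reported when it is created.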

var re = /a/;

and the like certainly suffices here.

As a final note, I want to add that if special features of Regular
Expressions compared to strings are not used, it is probably more
efficient not to use Regular Expressions at all. Instead of writing

if (re.test(someString))

using the RegExp() constructor or the above RegExp object initializer,
it is probably more efficient to write

if (someString.indexOf("a") > -1)

instead.
PointedEars
Feb 17 '06 #3
Thomas 'PointedEars' Lahn wrote:
RobG wrote:

To build the white/black list, use a string of characters and the
RegExp() function as a constructor, e.g. if you want to disallow the
letter 'a' in a string, then:

var re = new RegExp('a');

will create a regular expression that can be used to match the letter
'a' anywhere, [...]

While there is not much point in using the RegExp() constructor instead
of a Regular Expression literal when the expression is invariant.


My understanding of the request is that the string *is* variant. The OP
wishes to build a list of characters to allow/disallow, I presumed it
would not be hard-coded - though it might be built that way at the
server where the value is extracted from a database and the appropriate
value hard-coded into the script.

But I supposed that the value would be written to some variable, which is
then accessed by the script, e.g.

var blackList = '$%#';

and then later:

var re = new RegExp('[' + blackList + ']');
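
When the list arrives in a variable like this, characters that are special inside a character class (backslash, ']', '^' and '-') need escaping before the class is assembled. A minimal sketch of one way to do that follows; the helper names escapeForClass and makeListRegExp and the sample list are hypothetical.

```javascript
// Escape the characters that have a special meaning inside [...]:
// backslash, closing bracket, caret and hyphen.
function escapeForClass(chars) {
  return chars.replace(/[\\\]\^\-]/g, '\\$&');
}

// Build a RegExp matching any single character from the list.
// Pass negate = true to match any character NOT in the list (white list check).
function makeListRegExp(chars, negate) {
  return new RegExp('[' + (negate ? '^' : '') + escapeForClass(chars) + ']');
}

var blackList = '$%#^]-';             // sample list, includes special characters
var re = makeListRegExp(blackList);   // matches if any listed character occurs
```

With negate set, re.test(someString) returning true means the string contains at least one character outside the list.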

[...] As was
discussed here recently, efficiency and compatibility are seldom an issue:

As for efficiency, the RegExp object created by a RegExp literal is created
before execution, and the literal is then merely a reference to that
object. The RegExp object is not recreated by repeated use of the same
literal (say, in a loop). (Which must be considered regarding efficiency,
though, since this will create a new RegExp object always if the expression
differs, unconditionally. Even if the object is used only when a certain
condition applies.)
Quite true, I was addressing efficiency from the point of view of the
length of the expression. e.g. to allow only letters, digits and the
underscore, \w will do the trick. To disallow only '@#$', [@#$] is much
shorter than a list of everything else.

The difference in efficiency between using RegExp as a constructor and
using a literal in the above scenario is likely irrelevant (though I
understand your point and in general much prefer to use literals).

[...] However, using the RegExp constructor removes and introduces a maintenance
problem. It removes the problem that Regular Expressions cannot span lines
because string concatenation serves the purpose. It introduces the problem
that one has to escape the expression twice: one time to avoid escape
sequences in the string literal, and again to have RegExp special
characters parsed as expression atoms instead.
Escaping characters is always an issue, especially if multi-line input
is accepted. Should new lines & line feeds be allowed? The solution is
for the OP to learn about matching characters and apply that to their
particular circumstance.
[...]
var re = /a/;

and the like certainly suffices here.
Probably a result of my trivial example - a better example is below.

As a final note, I want to add that if special features of Regular
Expressions compared to strings are not used, it is probably more
efficient not to use Regular Expressions at all. Instead of writing

if (re.test(someString))

using the RegExp() constructor or the above RegExp object initializer,
it is probably more efficient to write

if (someString.indexOf("a") > -1)


If the need was a test for a specific character, then that would be
fine. Maybe you could use it with a loop to go through each character
in the black list, but how many characters/loops would it take before a
regular expression was faster?
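
For comparison, the loop version could look roughly like the sketch below. The function name containsAny is made up for the example, and which approach is actually faster depends on the script engine and the inputs, so it would have to be measured.

```javascript
// Returns true if str contains at least one character from chars.
// Uses only indexOf(), so nothing in chars needs escaping.
function containsAny(str, chars) {
  for (var i = 0; i < chars.length; i++) {
    if (str.indexOf(chars.charAt(i)) > -1) {
      return true;
    }
  }
  return false;
}
```

One side benefit is that the list can contain ']', '^' or a backslash without any special treatment.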

The following example may be better:

<script type="text/javascript">

function checkList(blID, strID)
{
  var blackList = document.getElementById(blID).value;
  var inString = document.getElementById(strID).value;
  var re = new RegExp('[' + blackList + ']');
  document.getElementById('xx').innerHTML = re.test(inString);
}
</script>
<label for="blackList">Blacklist characters:<input
type="text" id="blackList" value="\^\]$#@"></label><br>

<label for="inputText">String to check:<input
type="text" id="inputText" value="Cost: $6"></label>

<input type="button" value="Check input with blacklist"
onclick="checkList('blackList','inputText');">

<div>Result: <span id="xx" style="font-weight: bold;">
<i>no check done yet...</i></span></div>
If new lines, line feeds, etc. need to be tested too, use a textarea
instead of a text input for the input string. Variations on how
browsers represent new lines may need to be accommodated too.
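
One way to accommodate those variations, sketched here as an assumption rather than a tested cross-browser recipe, is to convert all line endings to a single '\n' before running the check. The name normalizeNewlines is hypothetical.

```javascript
// Convert Windows (\r\n) and lone carriage return (\r) line endings to \n.
function normalizeNewlines(str) {
  return str.replace(/\r\n?/g, '\n');
}
```

After that, the black or white list only has to deal with '\n'.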

--
Rob
Feb 20 '06 #4
RobG wrote:
Thomas 'PointedEars' Lahn wrote:
However, using the RegExp constructor removes and introduces a
maintenance problem. It removes the problem that Regular Expressions
cannot span lines because string concatenation serves the purpose. It
introduces the problem that one has to escape the expression twice: one
time to avoid escape sequences in the string literal, and again to have
RegExp special characters parsed as expression atoms instead.
Escaping characters is always an issue, especially if multi-line input
is accepted. Should new lines & line feeds be allowed?


You misunderstood. This was not about matching newline in the input.
The solution is for the OP to learn about matching characters and apply
that to their particular circumstance.
My point was that

var rx = /very_long_Regular_Expression.a.b.c.d.e.f.g.h.i.j.k.l.m.n.o.p.
r.s.t.u.v.w.x.y.z.\..#.#.4.2.1.3.3.7./

is not possible (consider the above a _hard_ line break to avoid crossing
the 80-columns border), but

var rx = new RegExp(
  "very_long_Regular_Expression.a.b.c.d.e.f.g.h.i.j.k.l.m.n.o.p."
  + "r.s.t.u.v.w.x.y.z.\\..#.#.4.2.1.3.3.7.");

(and the like) is. The latter introduces the maintenance problem that the
literal "." must be escaped twice, but it removes the maintenance problem
that literals are not allowed to span lines (in the source code).
As a final note, I want to add that if special features of Regular
Expressions compared to strings are not used, it is probably more
efficient not to use Regular Expressions at all. Instead of writing

if (re.test(someString))

using the RegExp() constructor or the above RegExp object initializer,
it is probably more efficient to write

if (someString.indexOf("a") > -1)


If the need was a test for a specific character, then that would be
fine. Maybe you could use it with a loop to go through each character
in the black list, but how many characters/loops would it take before a
regular expression was faster?


I do not know. This was a general note.
The following example may be better:
Maybe not :)
<script type="text/javascript">

function checkList(blID, strID)
{
var blackList = document.getElementById(blID).value;
var inString = document.getElementById(strID).value;
A `form' element would have avoided the inefficient and not
backward-compatible referencing.

function checkList(f, blID, strID)
{
  var es;
  if (blID && strID
      && f && (es = f.elements)
      && es[blID] && es[strID])
  {
    var blackList = es[blID].value;
    var inString = es[strID].value;

    // ...
  }
  else
  {
    window.alert("foobar!");
  }

  // Returning false is meant to cancel the form submission
  return false;
}

<form action="..."
onsubmit="return checkList(this, 'blackList', 'inputText');">
...
<input type="submit" value="Check input with blacklist">
</form>
var re = new RegExp('[' + blackList + ']');
What about the escaping part? You do not want the user to handle that,
do you?
document.getElementById('xx').innerHTML = re.test(inString);
Mixing standards compliant and proprietary DOM features unnecessarily.

es["xx"].style.fontStyle = "normal"; // I prefer setStyleProperty()[1]
es["xx"].value = re.test(inString);

<form ...>
...
<div>Result: <input id="xx"
value="no check done yet..."
style="border:0; font-weight:bold; font-style:italic"></div>
</form>
[...]

PointedEars
___________
[1] <URL:http://pointedears.de/scripts/dhtml.js>
Feb 20 '06 #5
