Validate URL script help

MJ
For some reason the following script does not work in Netscape/Mozilla, but
works fine in IE and Opera. It is supposed to check the syntax, make sure
there is a valid TLD (yes, those are all of the current TLDs), and allow for
addresses with or without trailing slashes or page addresses.

Anybody have any ideas on how to get this to work in Netscape? I suspect it
has something to do with the regular expression, but I can't get it to work.
Any help would be GREATLY appreciated!

// Validate URL
re3 =
/^(http|https):\/\/\S+\.(ac|ad|ae|aero|af|ag|ai|al|am|an|ao|aq|ar|arpa|as|at
|au|aw|az|ba|bb|bd|be|bf|bg|bh|bi|biz|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|ca|cc
|cd|cf|cg|ch|ci|ck|cl|cm|cn|co|com|coop|cr|cu|cv|cx|cy|cz|de|dj|dk|dm|do|dz|
ec|edu|ee|eg|er|es|et|fi|fj|fk|fm|fo|fr|ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|gov
|gp|gq|gr|gs|gt|gu|gw|gy|hk|hm|hn|hr|ht|hu|id|ie|il|im|in|info|int|io|iq|ir|
is|it|je|jm|jo|jp|ke|kg|kh|ki|km|kn|kr|kw|ky|kz|la|lb|lc|li|lk|lr|ls|lt|lu|l
v|ly|ma|mc|md|mg|mh|mil|mk|ml|mm|mn|mo|mp|mq|mr|ms|mt|mu|museum|mv|mw|mx|my|
mz|na|name|nc|ne|net|nf|ng|ni|nl|no|np|nr|nu|nz|om|org|pa|pe|pf|pg|ph|pk|pl|
pm|pn|pr|pro|ps|pt|pw|py|qa|re|ro|ru|rw|sa|sb|sc|sd|se|sg|sh|si|sj|sk|sl|sm|
sn|so|sr|st|su|sv|sy|sz|tc|td|tf|tg|th|tj|tk|tm|tn|to|tp|tr|tt|tv|tw|tz|ua|u
g|uk|um|us|uy|uz|va|vc|ve|vg|vi|vn|vu|wf|ws|ye|yt|yu|za|zm|zw)([/]\S+|)$/i;

function validateURL(textfield){
  if (textfield.value == ""){
    return true;
  } else {
    if (textfield.value.substring(0,7) != "http://" &&
        textfield.value.substring(0,8) != "https://") {
      textfield.value = "http://" + textfield.value;
    }
    if (!re3.test(textfield.value)){
      alert("Invalid web site address");
      textfield.focus();
    }
    return false;
  }
}

It is being called by a simple:

<input name="Website" type="text" onBlur="validateURL(this)">
Jul 23 '05 #1
"MJ" <no*****@thank.you> writes:
For some reason the following script does not work in Netscape/Mozilla, but
works fine in IE and Opera.
"does not work" how? Do you get an error message or does it accept the wrong
strings?
/^(http|https):\/\/\S+\.(ac|ad|ae|aero|af|ag|ai|al|am|an|ao|aq|ar|arpa|as|at
|au|aw|az|ba|bb|bd|be|bf|bg|bh|bi|biz|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|ca|cc
Your newsclient has wrapped the line. It should be on one line to work.
g|uk|um|us|uy|uz|va|vc|ve|vg|vi|vn|vu|wf|ws|ye|yt|yu|za|zm|zw)([/]\S+|)$/i;
                                                                ^
That slash should be escaped. Change "[/]" to "\/".
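
A minimal sketch of that change, assuming a shortened TLD list (com|org|net is only an illustrative subset, not the list from the post) so that the pattern fits on one line. As far as I know, engines that follow the ECMAScript 3 grammar strictly can treat the unescaped slash in "[/]" as the end of the literal, while later editions of the language allow it; writing "\/" should work in both cases.

```javascript
// Illustrative subset of TLDs; the pattern in the post lists many more.
var reEscaped = /^(http|https):\/\/\S+\.(com|org|net)(\/\S+|)$/i;

var samples = [
  "http://example.com",
  "https://example.org/path/page.html",
  "http://example.invalid"
];

// Print each sample with the result of the test.
for (var i = 0; i < samples.length; i++) {
  console.log(samples[i] + " -> " + reEscaped.test(samples[i]));
}
```

The first two samples pass and the third is rejected, because "invalid" is not in the (shortened) TLD group.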

Not tested (didn't want to rewrap the lines :)
/L
--
Lasse Reichstein Nielsen - lr*@hotpop.com
DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
'Faith without judgement merely degrades the spirit divine.'
Jul 23 '05 #2
MJ
Ah, it was the regular expression. Escaping the slash fixed it. I could
have sworn I had tried that before, but I guess not.

Thanks for the help! You're a life saver.

Jul 23 '05 #3
MJ wrote:
// Validate URL re3 = /^(http|https):
The alternation can be written as /https?/, which is generally
more efficient ("http" then has to be matched only once).
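
A small sketch comparing the two spellings of the scheme prefix. The variable names and the sample inputs are made up for illustration; the point is only that both patterns accept and reject the same strings.

```javascript
var reAlternation = /^(http|https):\/\//;
var reOptional = /^https?:\/\//;

var inputs = ["http://a", "https://a", "httpss://a", "ftp://a"];

// Both patterns should agree on every input.
for (var i = 0; i < inputs.length; i++) {
  console.log(
    inputs[i],
    reAlternation.test(inputs[i]),
    reOptional.test(inputs[i])
  );
}
```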
\/\/\S+\.(ac|ad|ae|aero|af|ag|ai|al|am|an|ao|aq|ar|arpa|as|at
|au|aw|az|ba|bb|bd|be|bf|bg|bh|bi|biz|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|ca|cc
|cd|cf|cg|ch|ci|ck|cl|cm|cn|co|com|coop|cr|cu|cv|cx|cy|cz|de|dj|dk|dm|do|dz|
ec|edu|ee|eg|er|es|et|fi|fj|fk|fm|fo|fr|ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|gov
|gp|gq|gr|gs|gt|gu|gw|gy|hk|hm|hn|hr|ht|hu|id|ie|il|im|in|info|int|io|iq|ir|
is|it|je|jm|jo|jp|ke|kg|kh|ki|km|kn|kr|kw|ky|kz|la|lb|lc|li|lk|lr|ls|lt|lu|l
v|ly|ma|mc|md|mg|mh|mil|mk|ml|mm|mn|mo|mp|mq|mr|ms|mt|mu|museum|mv|mw|mx|my|
mz|na|name|nc|ne|net|nf|ng|ni|nl|no|np|nr|nu|nz|om|org|pa|pe|pf|pg|ph|pk|pl|
pm|pn|pr|pro|ps|pt|pw|py|qa|re|ro|ru|rw|sa|sb|sc|sd|se|sg|sh|si|sj|sk|sl|sm|
sn|so|sr|st|su|sv|sy|sz|tc|td|tf|tg|th|tj|tk|tm|tn|to|tp|tr|tt|tv|tw|tz|ua|u
g|uk|um|us|uy|uz|va|vc|ve|vg|vi|vn|vu|wf|ws|ye|yt|yu|za|zm|zw)
That can be shortened considerably to

\/\/\S+\.(aero|arpa|a[c-gilm-oq-uwz]|biz|b[abd-jm-or-tvwyz]|com|coop
|c[acdf-ik-oruvx-z]|d[ejkmoz]|edu|e[cegr-t]|f[i-kmor]|gov|g[abd-il-np-uwy]
|h[kmnrtu]|info|int|i[del-oqr-t]|j[emop]|k[eghimnrwyz]|l[abcikr-vy]
|museum|mil|m[acdghkl-z]|name|net|n[acefgilopruz]|org|om|pro|p[aefghk-nr-twy]
|qa|r[eouw]|s[a-eg-ort-vyz]|t[cdf-hjkm-prtvwz]|u[agkmsyz]|v[aceginu]
|w[fs]|y[etu]|z[amw])

(if I have not missed a character or two, but I think you get the
idea). That is not only shorter but can also be more efficient than
complete alternation, depending on the type of RegExp engine used.
With an NFA, character classes are much more efficient than alternation
because matching can be done in parallel and thus much faster. (See
<http://www.oreilly.com/catalog/regex/chapter/ch04.html>, "Character
Classes vs. Alternation".) For ECMAScript and its implementations, an NFA
is clearly involved in the matching process, if not the only engine
type used, as backreferences and capturing parentheses are supported.
So it is clearly a Good Thing to replace alternation with character
classes here and avoid alternation where possible.
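
Since a hand-compressed pattern is easy to get wrong, here is a sketch of how one might check a character-class version against the full list instead of trusting it. The compact pattern below is a variant written for this sketch, and the names tlds, reFull, reCompact and mismatches are made up for illustration.

```javascript
// Full TLD list as posted in the question, split into an array.
var tlds = (
  "ac|ad|ae|aero|af|ag|ai|al|am|an|ao|aq|ar|arpa|as|at|au|aw|az|" +
  "ba|bb|bd|be|bf|bg|bh|bi|biz|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|" +
  "ca|cc|cd|cf|cg|ch|ci|ck|cl|cm|cn|co|com|coop|cr|cu|cv|cx|cy|cz|" +
  "de|dj|dk|dm|do|dz|ec|edu|ee|eg|er|es|et|fi|fj|fk|fm|fo|fr|" +
  "ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|gov|gp|gq|gr|gs|gt|gu|gw|gy|" +
  "hk|hm|hn|hr|ht|hu|id|ie|il|im|in|info|int|io|iq|ir|is|it|" +
  "je|jm|jo|jp|ke|kg|kh|ki|km|kn|kr|kw|ky|kz|" +
  "la|lb|lc|li|lk|lr|ls|lt|lu|lv|ly|" +
  "ma|mc|md|mg|mh|mil|mk|ml|mm|mn|mo|mp|mq|mr|ms|mt|mu|museum|mv|mw|mx|my|mz|" +
  "na|name|nc|ne|net|nf|ng|ni|nl|no|np|nr|nu|nz|om|org|" +
  "pa|pe|pf|pg|ph|pk|pl|pm|pn|pr|pro|ps|pt|pw|py|qa|re|ro|ru|rw|" +
  "sa|sb|sc|sd|se|sg|sh|si|sj|sk|sl|sm|sn|so|sr|st|su|sv|sy|sz|" +
  "tc|td|tf|tg|th|tj|tk|tm|tn|to|tp|tr|tt|tv|tw|tz|" +
  "ua|ug|uk|um|us|uy|uz|va|vc|ve|vg|vi|vn|vu|wf|ws|ye|yt|yu|za|zm|zw"
).split("|");

// Reference pattern: plain alternation over every entry.
var reFull = new RegExp("^(" + tlds.join("|") + ")$", "i");

// Compact variant using character classes (written for this sketch).
var reCompact = new RegExp("^(" +
  "aero|arpa|a[c-gilm-oq-uwz]|biz|b[abd-jm-or-tvwyz]|com|coop|" +
  "c[acdf-ik-oruvx-z]|d[ejkmoz]|edu|e[cegr-t]|f[i-kmor]|gov|" +
  "g[abd-il-np-uwy]|h[kmnrtu]|info|int|i[del-oqr-t]|j[emop]|" +
  "k[eghimnrwyz]|l[abcikr-vy]|museum|mil|m[acdghkl-z]|name|net|" +
  "n[acefgilopruz]|org|om|pro|p[aefghk-nr-twy]|qa|r[eouw]|" +
  "s[a-eg-ort-vyz]|t[cdf-hjkm-prtvwz]|u[agkmsyz]|v[aceginu]|" +
  "w[fs]|y[etu]|z[amw]" +
  ")$", "i");

// Candidates: every listed name plus every two-letter combination.
var letters = "abcdefghijklmnopqrstuvwxyz";
var candidates = tlds.slice();
for (var i = 0; i < letters.length; i++) {
  for (var j = 0; j < letters.length; j++) {
    candidates.push(letters.charAt(i) + letters.charAt(j));
  }
}

// Collect every candidate on which the two patterns disagree.
var mismatches = [];
for (var k = 0; k < candidates.length; k++) {
  if (reFull.test(candidates[k]) !== reCompact.test(candidates[k])) {
    mismatches.push(candidates[k]);
  }
}

console.log("mismatches: " + mismatches.length, mismatches);
```

An empty mismatch list means the two patterns agree on the whole candidate set; any entry that shows up points at a character class that is missing a letter or has one too many.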

But:

Are you sure you need the top-level domain checked this precisely while
the rest is checked rather sloppily? Are you prepared to maintain
that script as top-level domains evolve? Why don't you allow
IPv4 addresses in URLs? They can be static. Why don't you stick
(closely) to RFC 2396? The BNF grammar can be easily implemented
as a RegExp.
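
As a rough illustration, RFC 2396 itself gives a regular expression in Appendix B for splitting a URI reference into its components. The sketch below wraps it in a hypothetical helper called splitUri; note that it only splits the string and does not validate each component against the grammar, so it is a starting point rather than a complete check.

```javascript
// Regular expression from RFC 2396, Appendix B, with slashes escaped
// so that it can be written as a RegExp literal.
var reUri = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

// Hypothetical helper: returns the five components named in the RFC.
// Every group is optional, so the pattern matches any input string.
function splitUri(uri) {
  var m = reUri.exec(uri);
  return {
    scheme: m[2],
    authority: m[4],
    path: m[5],
    query: m[7],
    fragment: m[9]
  };
}

// An IPv4 address in the authority part is split like any host name.
var parts = splitUri("http://192.0.2.10/docs/index.html?lang=en#top");
console.log(parts.scheme, parts.authority, parts.path, parts.query, parts.fragment);
```

With the components separated, the scheme and authority can be checked on their own, which avoids tying the whole test to a hard-coded TLD list.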
([/]\S+|)$/i;


As Lasse already pointed out, in RegExp literals every forward slash
must be escaped, even in character classes. In fact, the character
class does not help here. But you probably meant /((\/\S+)?)$/i.
Rule of thumb: Do not use alternation when not necessary, see above.
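
Putting the suggestions from this thread together, a sketch of the check might look like the following. It assumes a shortened TLD group (an illustrative subset only) and uses a hypothetical helper, normalizeUrl, that works on a plain string rather than a form field, so it can be tried outside a browser. Unlike the posted function, it tests the scheme prefix case-insensitively and reports success or failure through its return value.

```javascript
// Illustrative subset of TLDs; substitute the full group as needed.
var reUrl = /^https?:\/\/\S+\.(com|org|net|de|uk)(\/\S+)?$/i;

// Hypothetical helper: returns the (possibly prefixed) URL when it
// passes the check, an empty string for empty input, or null otherwise.
function normalizeUrl(value) {
  if (value === "") {
    return "";
  }
  // Prepend a default scheme when none is given.
  if (!/^https?:\/\//i.test(value)) {
    value = "http://" + value;
  }
  return reUrl.test(value) ? value : null;
}

console.log(normalizeUrl("example.com"));
console.log(normalizeUrl("https://example.org/a/b"));
console.log(normalizeUrl("example.invalid"));
```

In a page, the onBlur handler could call such a helper with the field's value, write a non-null result back into the field, and show the alert only when the result is null.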
PointedEars

P.S.
Please take heed of Usenet/Internet standards and use an existing "From:"
address. Avoiding spam is no excuse for breaking standards and thus helping
to destroy the functionality of the involved media:
<http://www.interhack.net/pubs/munging-harmful/>
Jul 23 '05 #4
