Regular Expression for validating a url field

What is wrong with that?

regex =
/^(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(([0-9]{1,5})?\/.*)?$/

if(field.hpage.value != regex.test(field.hpage.value)){
alert("Bad Homepage")
field.hpage.focus()
field.hpage.select()
return false
}
return true
}

The regex should be all right... also the names of the fields are 100%
okay.. if I change the != for ==, it will go through... which is
completely wrong =o(

Thanks guys.. cheers
Tizzah
tizzah.co.nr

Mar 1 '06 #1
Tizzah wrote:
What is wrong with that?

regex =
/^(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(([0-9]{1,5})?\/.*)?$/

if(field.hpage.value != regex.test(field.hpage.value)){
alert("Bad Homepage")
field.hpage.focus()
field.hpage.select()
return false
}
return true
}
(Do not use the tab character for indentation, at least in postings.
Use multiples of two or four space characters instead.)

Plenty of things are wrong with this. From top to bottom, and left to
right:

- `(https?)' is equivalent to `(http|https)' and more efficient than
the latter.
- Valid domain names may contain uppercase ASCII characters.
- Valid domain names may contain more than one consecutive hyphen (`-'),
ref. IDN, and they may begin or end with a hyphen or a dot.
- The literal hyphen does not need to be escaped at the beginning
or the end of a character class (`[...]').
- The literal dot (`.') does not need to be escaped in a character
class.
- The {1} quantifier is always redundant.
- ([0-9]{1,5})? is equivalent to \d{0,5} (not considering backreferences).
- Valid domain names may contain more than 5 consecutive decimal digits.
- Valid top-level domain names must not contain any decimal digit.
- Valid top-level domain names are not restricted to five letters, and
the .test TLD specified in RFC2606 for testing purposes has only four
letters.
- A URI does not need to include the path delimiter `/' if there are no
further path components.
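
The points above can be read as a recipe for a looser pattern. A minimal sketch, not a complete RFC 3986 validator: the name `looseHttpUrl' is hypothetical, and the optional `:port' part is only an assumption about what the digit group in the original was meant to match.

```javascript
// Hypothetical, deliberately permissive pattern; not a full URI validator.
// - `https?` instead of `(http|https)`
// - `i` flag, so uppercase letters are accepted
// - hyphen placed last in the character class, so it needs no escape
// - no {1} quantifier and no length limit on the last label
// - optional `:port` (assumed intent of the original digit group)
// - optional path
const looseHttpUrl =
  /^https?:\/\/[a-z0-9-]+(?:\.[a-z0-9-]+)*(?::\d{1,5})?(?:\/.*)?$/i;

console.log(looseHttpUrl.test("http://x.org"));               // true
console.log(looseHttpUrl.test("HTTPS://Example.COM:8080/a")); // true
console.log(looseHttpUrl.test("http://my--host.test"));       // true
console.log(looseHttpUrl.test("ftp://x.org"));                // false
```

It still says nothing about whether the host exists; it only rejects strings that cannot be an http(s) URL of this shape.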
The regex should be all right...

For appropriate values of "all right".

also the names of the fields are 100% okay.. if I change the != for ==,
it will go through... which is completely wrong =o(


Your code simply does not make sense. RegExp.prototype.test() returns a
boolean value, either `true' or `false'. You are comparing that value
against a supposed string value, and since you do not perform a strict
comparison (`!==' or `==='), you are forcing implicit type conversion on
both operands. (Skip the following section if you are not interested in
the inner workings of the language.)

,-[ECMAScript 3 Final]
|
| 11.9.2 The Does-not-equals Operator ( != )
|
| The production
| EqualityExpression : EqualityExpression != RelationalExpression
| is evaluated as follows:
|
| 1. Evaluate EqualityExpression.
| 2. Call GetValue(Result(1)).
| 3. Evaluate RelationalExpression.
| 4. Call GetValue(Result(3)).
| 5. Perform the comparison Result(4) == Result(2). (Section 11.9.3.)
| [...]
| 11.9.3 The Abstract Equality Comparison Algorithm
|
| The comparison x == y, where x and y are values, produces true or false.
| Such a comparison is performed as follows:
|
| 1. If Type(x) is different from Type(y), go to step 14.

Type(x) = String, Type(y) = Boolean. Condition applies, go to step 14.

| [...]
| 14. If x is null and y is undefined, return true.
| 15. If x is undefined and y is null, return true.
| 16. If Type(x) is Number and Type(y) is String,
| return the result of the comparison x == ToNumber(y).
| 17. If Type(x) is String and Type(y) is Number,
| return the result of the comparison ToNumber(x) == y.
| 18. If Type(x) is Boolean, return the result of the comparison
| ToNumber(x) == y.

None of the above applies, continue.

| 19. If Type(y) is Boolean, return the result of the comparison
| x == ToNumber(y).

Condition applies. Return the result of x == ToNumber(y).

x_1 := x
y_1 := y

| The comparison x == y, where x and y are values, produces true or
| false.

x := x_1
y := ToNumber(y_1).

| 1. If Type(x) is different from Type(y), go to step 14.

Case 1: y_1 = false (no match). ToNumber(false) = 0 --> y := 0.
Case 2: y_1 = true (match). ToNumber(true) = 1 --> y := 1.

In both cases:

Type(x) = String, Type(y) = Number. Condition applies, go to step 14.

| 14. If x is null and y is undefined, return true.
| 15. If x is undefined and y is null, return true.
| 16. If Type(x) is Number and Type(y) is String,
| return the result of the comparison x == ToNumber(y).

None of the above applies, continue.

| 17. If Type(x) is String and Type(y) is Number,
| return the result of the comparison ToNumber(x) == y.

x_2 := x
y_2 := y

| The comparison x == y, where x and y are values, produces true or
| false.

x := ToNumber(x_2)
y := y_2.

| 1. If Type(x) is different from Type(y), go to step 14.

Case 1: x_2 = "" (empty string). ToNumber("") = 0 --> x := 0.

Case 2: x_2 = "N" (not empty). In that case, ToNumber("N") always
returns a number value. If "N" is not the string representation
of a numeric literal, that value is NaN.

In both cases:

Type(x) = Number, Type(y) = Number. Condition does not apply,
continue.

| 2. If Type(x) is Undefined, return true.
| 3. If Type(x) is Null, return true.
| 4. If Type(x) is not Number, go to step 11.
| 5. If x is NaN, return false.

This condition applies if x_1 is not a string representation
of a numeric literal, read: could be a URI. In that case,
`false' is returned to the calling algorithm, so ultimately
`false' is returned to the algorithm of `!=', its Result(5)
being `false':

| 5. Perform the comparison Result(4) == Result(2). (Section 11.9.3.)

Result(5) := false

| 6. If Result(5) is true, return false. Otherwise, return true.

Therefore, `true' is returned then!
___________

If the condition ("x is NaN") does not apply, i.e. x_1 can be
interpreted as a number (read: is definitely not a URI), continue.

| 6. If y is NaN, return false.

This never applies here; always continue.

| 7. If x is the same number value as y, return true.

Case 1: x = y. Applies if

- x_1 (being the value of field.hpage.value) is the empty string,
because "" is converted to 0, and there can be no match for
/^(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(([0-9]{1,5})?\/.*)?$/
in "", so `false' [being the result of
regex.test(field.hpage.value)] is converted to 0. x = y = 0.

- x_1 is "0...0" or "0x0...0", and there is no match, because x_1
is converted to 0, and `false' is converted to 0. x = y = 0.

- x_1 is "0...01" or "0x0...1", and there is a match, because
x_1 is converted to 1, and `true' is converted to 1. x = y = 1.
Since the Regular Expression never matches "0...01" or "0x0...1",
this sub-case never applies.

`true' is returned here to the calling algorithm, and to its calling
algorithm, so ultimately `true' is returned to the algorithm of
`!=', its Result(5) being `true':

| 5. Perform the comparison Result(4) == Result(2). (Section 11.9.3.)

Result(5) := true

| 6. If Result(5) is true, return false. Otherwise, return true.

Therefore, `false' is returned then!
___________

Case 2: x != y. Applies if

- x_1 is different from "", "0...0" and "0x0...0", and there is
no match, because x_1 is then converted to a value n != 0, and
`false' is converted to 0. 0 != n = x != y = 0.

The condition

| 7. If x is the same number value as y, return true.

would not apply in case 2, therefore we continue.

| 8. If x is +0 and y is -0, return true.
| 9. If x is -0 and y is +0, return true.

None of the above applies, continue.

| 10. Return false.

`false' is returned here to the calling algorithm, and to its calling
algorithm, so ultimately `false' is returned to the algorithm of `!=',
its Result(5) being `false':

| 5. Perform the comparison Result(4) == Result(2). (Section 11.9.3.)

Result(5) := false

| 6. If Result(5) is true, return false. Otherwise, return true.

Therefore, it returns `true' then!
_______________________________________________________________________
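
The walk through steps 19, 17 and 5 to 10 can be sketched in code. This is only an illustration for the String-versus-Boolean case: the function name is hypothetical, and `Number()' is used as a stand-in for the specification's internal ToNumber.

```javascript
// Emulates the path through 11.9.3 for a String x and a Boolean y.
// Hypothetical helper; Number() stands in for the internal ToNumber.
function looseEqualsStringBoolean(x, y) {
  const yNum = Number(y);   // step 19: false -> 0, true -> 1
  const xNum = Number(x);   // step 17: "" -> 0, "http://x.org" -> NaN
  if (Number.isNaN(xNum)) { // step 5: NaN is never equal to anything
    return false;
  }
  return xNum === yNum;     // steps 7 to 10
}

const samples = [
  ["", false],
  ["0x0", false],
  ["00001", false],
  ["http://f/", false],
  ["http://x.org", true],
];

for (const [x, y] of samples) {
  // The last two columns should agree for every row.
  console.log(JSON.stringify(x), y, looseEqualsStringBoolean(x, y), x == y);
}
```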

The outcome of the algorithm for the `==' operator is (of course) the
boolean opposite of the algorithm result for the `!=' operator, and
vice-versa.

So if the control's value is "", the (equals-)condition

field.hpage.value == regex.test(field.hpage.value)

is true:

0. "" == false ("no match")
1. "" == ToNumber(false)
2. "" == 0
3. ToNumber("") == 0
4. 0 == 0
5. true

If the control's value is "0...0" or "0x0...0", the condition is true:

0. "0...0" == false ("no match")
1. "0...0" == ToNumber(false)
2. "0...0" == 0
3. ToNumber("0...0") == 0
4. 0 == 0
5. true

If the control's value is "0...1" or "0x0...1", or another value that
can be interpreted as a number different from 0, the condition is false:

0. "0...1" == false ("no match")
1. "0...1" == ToNumber(false)
2. "0...1" == 0
3. ToNumber("0...1") == 0
4. 1 == 0
5. false

If the control's value is "http://f/" (not a URL, according to your
standards), the condition is false.

0. "http://f/" == false ("no match")
1. "http://f/" == ToNumber(false)
2. "http://f/" == 0
3. ToNumber("http://f/") == 0
4. NaN == 0
5. false (according to 11.9.3, step 5)

If the control's value is "http://x.org" (a URL, according to your
standards), the condition is _false_:

0. "http://x.org" == true ("match")
1. "http://x.org" == ToNumber(true)
2. "http://x.org" == 1
3. ToNumber("http://x.org") == 1
4. NaN == 1
5. false

You are looking for

  if (!regex.test(field.hpage.value))
  {
    alert("Bad Homepage");
    // ...
    return false;
  }
  return true;

and probably a Regular Expression for matching URLs that makes sense,
see RFC3986.
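
To try the difference without a form, here is a minimal sketch that keeps the original pattern unchanged and replaces the form field with a plain string argument. The helper names `isBadHomepage' and `buggyCondition' are hypothetical.

```javascript
// The original poster's pattern, unchanged, for illustration only.
const regex =
  /^(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(([0-9]{1,5})?\/.*)?$/;

// Corrected check: reject exactly when the pattern does not match.
function isBadHomepage(value) {
  return !regex.test(value);
}

// The condition from the original post, kept to show why it misfires:
// it compares a string against a boolean with implicit type conversion.
function buggyCondition(value) {
  return value != regex.test(value);
}

console.log(isBadHomepage("http://x.org"));  // false (accepted)
console.log(isBadHomepage("http://f/"));     // true  (rejected)
console.log(buggyCondition("http://x.org")); // true  (wrongly rejected)
console.log(buggyCondition("http://f/"));    // true
console.log(buggyCondition(""));             // false (empty field slips through)
```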
PointedEars
Mar 1 '06 #2
Tizzah wrote:
What is wrong with that?

regex =
/^(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(([0-9]{1,5})?\/.*)?$/

if(field.hpage.value != regex.test(field.hpage.value)){
alert("Bad Homepage")
field.hpage.focus()
field.hpage.select()
return false
}
return true
}

The regex should be all right... also the names of the fields are 100%
okay.. if I change the != for ==, it will go through... which is
completely wrong =o(


The problem with using a regular expression to check URLs (or any other
address such as an e-mail address) is that even if the string fits
within the range of valid addresses, you don't know if it's actually valid.

The only real way to validate a URL is to test it - ping comes to mind.
Let the user enter whatever they want at the client. When they send
the data to the server, validate it there and if the URL is bogus, tell
the user in the subsequent page.

If they want to persist with an invalid address, you can either assume
the site isn't up but might be later, or that the user doesn't want to
enter a valid address - how you deal with that is up to you.
--
Rob
Mar 2 '06 #3
RobG wrote:
The problem with using a regular expression to check URLs (or any other
address such as an e-mail address) is that even if the string fits
within the range of valid addresses, you don't know if it's actually
valid.
Exactly.
The only real way to validate a URL is to test it - ping comes to mind.


Only if you do not think it through. For example, have you ever tried to
ping(1) microsoft.com or one of their subdomains? ;-) (They, among others,
are DROPping or filtering ICMP requests, which is considered antisocial.)

What comes to /my/ mind here is of course to use DNS directly, therefore
host(1) or nslookup(1) (from the BIND9 host utilities), where the latter
is deprecated.
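
Server-side, the same DNS idea could look roughly like this. It is a sketch under assumptions: the helper name `hostResolves' is hypothetical, and the lookup function is passed in rather than hard-wired, so nothing here depends on a particular runtime.

```javascript
// Hypothetical helper: resolves to true when the URL's host name can be
// looked up. `lookup` is any function that takes a host name and returns
// a promise which rejects when the name cannot be resolved.
async function hostResolves(urlString, lookup) {
  let hostname;
  try {
    hostname = new URL(urlString).hostname; // throws on unparsable input
  } catch {
    return false;
  }
  if (hostname === "") {
    return false; // e.g. mailto: URLs have no host
  }
  try {
    await lookup(hostname);
    return true;
  } catch {
    return false; // name could not be resolved
  }
}
```

Under Node.js, `require("node:dns").promises.lookup' is one candidate for the `lookup' argument. A positive answer only says that the name resolves, not that anything is listening there.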
PointedEars
Mar 2 '06 #4
Thomas 'PointedEars' Lahn wrote:
RobG wrote:

The problem with using a regular expression to check URLs (or any other
address such as an e-mail address) is that even if the string fits
within the range of valid addresses, you don't know if it's actually
valid.

Exactly.

The only real way to validate a URL is to test it - ping comes to mind.

Only if you do not think it through. For example, have you ever tried to
ping(1) microsoft.com or one of their subdomains? ;-) (They, among others,
are DROPping or filtering ICMP requests, which is considered antisocial.)


Ping returns resolved domain names if they can be found (e.g. ping
www.microsoft.com and the resolved address is returned, even though the
request will time out).

I don't presume it is the best strategy, just one that came quickly to
mind. Without knowing what the OP's criteria are for a valid address,
all we can do is toss up a few possibilities.

[...]
--
Rob
Mar 2 '06 #5
RobG <rg***@iinet.net.au> writes:
Ping returns resolved domain names if they can be found (e.g. ping
www.microsoft.com and the resolved address is returned, even though
the request will time out).


If that's what you want, you could just use nslookup instead.

Neither pinging nor DNS resolution will tell you if a web server is
running, though.
/L
--
Lasse Reichstein Nielsen - lr*@hotpop.com
DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
'Faith without judgement merely degrades the spirit divine.'
Mar 2 '06 #6
Lasse Reichstein Nielsen wrote:
RobG <rg***@iinet.net.au> writes:
Ping returns resolved domain names if they can be found (e.g. ping
www.microsoft.com and the resolved address is returned, even though
the request will time out).


If that's what you want, you could just use nslookup instead.

Neither pinging nor DNS resolution will tell you if a web server is
running, though.


True, to test this, one would have to make an HTTP request to a standard HTTP
port (using nc/netcat(1), HEAD(1) from the libwww-perl package, server-side
XMLHttpRequest and the like). Which would still include the possibility of
a Web server that was temporarily down for maintenance being recognized as
not existing.
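
As a rough sketch of such a request in script form: the helper name `httpStatusOf' is hypothetical, and the default argument assumes a global `fetch' as found in current browsers and Node.js 18 or later. A different fetch-like function can be passed in instead.

```javascript
// Hypothetical helper: sends a HEAD request and resolves to the HTTP
// status code, or to null when no response arrived at all.
async function httpStatusOf(urlString, fetchImpl = fetch) {
  try {
    const response = await fetchImpl(urlString, { method: "HEAD" });
    return response.status;
  } catch {
    // Network error, unresolvable host, malformed URL, ...
    return null;
  }
}
```

A `null' result cannot tell a server that is down for maintenance from one that never existed, which is exactly the caveat above.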
PointedEars
Mar 3 '06 #7
JRS: In article <11*********************@i40g2000cwc.googlegroups.com>,
dated Tue, 28 Feb 2006 20:30:58 remote, seen in
news:comp.lang.javascript, Tizzah <ti****@gmail.com> posted :
What is wrong with that?

One can, in principle, validate the full grammar of a URL against all
current applicable RFCs - but one then needs to watch for new RFCs which
may change the situation - and it's possible that there are sites
offering URLs that work but do not comply perfectly with the grammar.

AFAICS, only two forms of test are reasonable.

One can attempt to access the alleged URL in some manner, and see what
reply that gives; one learns something about the validity of that URL at
that instant.

Or one can look at a string to see whether there's a reasonable chance
of it being a valid URL or whether it cannot be but may be some other
form of data; that's easier of course if context permits testing for a
specific type of protocol.

Example : http://xxx and https://xxx and mailto:xxx can certainly start
a URL; faxto:xxx might well indicate a new protocol; c:xxx and c:/xxx
almost certainly indicate a mistake.

It's reasonable to check that mailto: is followed by a match for
.+@.+\..+ and that http:// is followed by .+\..*/.+ though, maybe with
another \..+ .
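
A small sketch of that plausibility test, built from the two patterns just given. The function name `mightBeUrl' is hypothetical, and anchoring the patterns with ^ and $ is an assumption, not something stated above.

```javascript
// Hypothetical plausibility check. It only asks "could this be a URL?",
// never "does this URL work?".
const plausible = [
  /^mailto:.+@.+\..+$/,    // mailto: followed by .+@.+\..+
  /^http:\/\/.+\..*\/.+$/, // http:// followed by .+\..*/.+
];

function mightBeUrl(s) {
  return plausible.some((re) => re.test(s));
}

console.log(mightBeUrl("mailto:a@b.c"));                            // true
console.log(mightBeUrl("http://www.merlyn.demon.co.uk/astro.htm")); // true
console.log(mightBeUrl("c:/xxx"));                                  // false
```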

For the OP's purpose, there's no need to validate the alleged URL
locally. A URL-fetching agent needs to survive being offered any string
whatsoever; and if it works, the URL was right. No validator can
possibly check whether a URL is actually right; for example,
<URL:http://www.merlyn.demon.co.uk/astro.htm> is grammatically valid,
and can be fetched. However, it can hardly be what the summoner
actually will want.

--
John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v4.00 IE 4
<URL:http://www.jibbering.com/faq/> JL/RC: FAQ of news:comp.lang.javascript
<URL:http://www.merlyn.demon.co.uk/js-index.htm> jscr maths, dates, sources.
<URL:http://www.merlyn.demon.co.uk/> TP/BP/Delphi/jscr/&c, FAQ items, links.
Mar 3 '06 #8
