
preg_match() regex to validate URL

As I understand it, the characters that make up an Internet domain name can
consist of only alpha-numeric characters and a hyphen
(http://tools.ietf.org/html/rfc3696)

So I'm trying to write regex that will provide a basic url format validation:

starts with http or https (the only two protocols I'm interested in), is followed by
'://', then ([any alphanumeric character or hyphen] followed by a '.', appearing one or
more times), then followed by anything (*), and is case-insensitive.

I tried this:

if (preg_match('/^(http|https):\/\/([a-z0-9-]\.+)*/i', $urlString))
{
    $valid = true;
}
else
{
    $valid = false;
}

but no luck.

Any suggestions welcome...

Thanks in advance.

Feb 12 '07 #1
Rik
deko wrote:

Deko, while your enthusiasm is appreciated, please stay in the same thread
when posting about the same subject. Starting several threads not only
creates confusion about answers already given and about context, it also
comes across as very pushy.
>As I understand it, the characters that make up an Internet domain name
>can consist of only alpha-numeric characters and a hyphen
>(http://tools.ietf.org/html/rfc3696)
...."Any characters, or combination of bits (as octets), are permitted in
DNS names. However, there is a preferred form that is required by most
applications.".....
>So I'm trying to write regex that will provide a basic url format
>validation:
>
>starts with http or https (the only 2 prots I'm interested in), is
>followed by '://', then ([any alpha-numeric or hyphen] followed by a '.'
>appearing 1 or more times), then followed by anything *, and is
>case-insensitive.
>
>I tried this:
>
>if (preg_match('/^(http|https):\/\/([a-z0-9-]\.+)*/i', $urlString))
This bit, "([a-z0-9-]\.+)", does not do what you think it does. It matches
_one_ single character in the [a-z0-9-] range, followed by one or more
(arbitrarily many) literal dots, and that whole group is repeated zero or
more times. So 'http://a.b.c.d......d..e....a......' would match.
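To make this concrete, the original pattern can be run against a few strings. A minimal PHP sketch; the sample strings are made up for illustration. Because the group may repeat zero times and the pattern has no closing `$` anchor, the result really only depends on the `http://` or `https://` prefix.

```php
<?php
// The pattern exactly as posted in the original question.
$original = '/^(http|https):\/\/([a-z0-9-]\.+)*/i';

// Illustrative inputs. The group ([a-z0-9-]\.+) may repeat zero times and
// there is no closing $ anchor, so only the "http(s)://" prefix really matters.
$samples = [
    'http://a.b.c.d......d..e....a......', // Rik's example
    'http://',                              // no host at all
    'https://!!!not a host!!!',             // junk after the scheme
    'ftp://example.com',                    // wrong scheme: the only rejection here
];

foreach ($samples as $s) {
    // preg_match() returns 1 on a match, 0 on no match (false on error).
    printf("%-38s %d\n", $s, preg_match($original, $s));
}
```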

Furthermore, you seem to have anchored this with ^, so it will only
match if http(s):// is at the very beginning of the string. Is that what
you want?

'/^https?:\/\/[a-z0-9-]+(\.[a-z0-9-]+)+/i'
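A short PHP check of the suggested pattern, covering the anchoring point as well. The sample URLs and the expected values in the array are illustrative assumptions, worked out by hand from the pattern.

```php
<?php
// Rik's suggested pattern: "http" or "https", "://", then two or more
// dot-separated labels made of letters, digits and hyphens.
$pattern = '/^https?:\/\/[a-z0-9-]+(\.[a-z0-9-]+)+/i';

// Illustrative inputs mapped to the result worked out by hand (1 = match).
$samples = [
    'https://www.example.com/path?q=1' => 1, // ordinary URL, path allowed
    'HTTP://Example.COM'               => 1, // the /i modifier ignores case
    'http://localhost'                 => 0, // single label, no dot
    'http://a..b'                      => 0, // empty label between dots
    'ftp://example.com'                => 0, // scheme not accepted
    'see http://example.com'           => 0, // ^ demands the scheme first
];

foreach ($samples as $url => $expected) {
    printf("%-34s got %d, expected %d\n", $url, preg_match($pattern, $url), $expected);
}
```

Since there is no trailing `$`, anything may follow the host, so a string such as `http://example.com_junk` also matches. That fits the "followed by anything" requirement, but is worth keeping in mind.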

--
Rik Wasmus
Feb 12 '07 #2
>>As I understand it, the characters that make up an Internet domain name can
>>consist of only alpha-numeric characters and a hyphen
>>(http://tools.ietf.org/html/rfc3696)
>..."Any characters, or combination of bits (as octets), are permitted in DNS
>names. However, there is a preferred form that is required by most
>applications.".....
I just tried registering various domain names with an underscore. The
registrar's system rejected it. While this may not be the best verification, I
have yet to see a valid Internet domain with an underscore or any other
non-alphanumeric character (other than a hyphen).
>'/^https?:\/\/[a-z0-9-]+(\.[a-z0-9-]+)+/i'
Thanks, Rik

Feb 12 '07 #3
Rik
On Mon, 12 Feb 2007 10:29:26 +0100, deko <de**@nospam.com> wrote:
>>>As I understand it, the characters that make up an Internet domain
>>>name can consist of only alpha-numeric characters and a hyphen
>>>(http://tools.ietf.org/html/rfc3696)
>>..."Any characters, or combination of bits (as octets), are permitted
>>in DNS names. However, there is a preferred form that is required by
>>most applications.".....

>I just tried registering various domain names with an underscore. The
>registrar's system rejected it. While this may not be the best
>verification, I have yet to see a valid Internet domain with an
>underscore or any other non-alphanumeric character (other than a hyphen).
There are efforts to fully internationalise DNS entries, so even non-roman
based character sets are allowed. See for instance
<http://www.ietf.org/rfc/rfc4185.txt>. We're not there yet by a long shot,
but there's no doubt it will happen.
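If internationalised names are to be accepted, one possible approach (an assumption, not something proposed in this thread) is to widen the character classes to Unicode letters and digits with PCRE's `\p{L}` / `\p{N}` properties and the `/u` modifier. A rough PHP sketch, assuming the input is UTF-8:

```php
<?php
// Rik's ASCII-only pattern, for comparison.
$ascii = '/^https?:\/\/[a-z0-9-]+(\.[a-z0-9-]+)+/i';

// Hypothetical Unicode-aware variant: \p{L} is any letter and \p{N} any
// number, in any script; /u makes PCRE treat pattern and subject as UTF-8.
$unicode = '/^https?:\/\/[\p{L}\p{N}-]+(\.[\p{L}\p{N}-]+)+/iu';

// The IDN example from this thread (this source file must be saved as UTF-8).
$idn = 'http://www.한글.kr';

printf("ascii:   %d\n", preg_match($ascii, $idn));   // 0: the Hangul label is rejected
printf("unicode: %d\n", preg_match($unicode, $idn)); // 1
```

This is deliberately loose: it accepts letters from any script, including mixed-script look-alikes, so it is only a format check.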

--
Rik Wasmus
Feb 12 '07 #4
>>>>As I understand it, the characters that make up an Internet domain name
>>>>can consist of only alpha-numeric characters and a hyphen
>>>>(http://tools.ietf.org/html/rfc3696)
>>>..."Any characters, or combination of bits (as octets), are permitted in
>>>DNS names. However, there is a preferred form that is required by most
>>>applications.".....

>>I just tried registering various domain names with an underscore. The
>>registrar's system rejected it. While this may not be the best
>>verification, I have yet to see a valid Internet domain with an underscore
>>or any other non-alphanumeric character (other than a hyphen).

>There are efforts to fully internationalise DNS entries, so even non-roman
>based character sets are allowed. See for instance
><http://www.ietf.org/rfc/rfc4185.txt>. We're not there yet by a long shot,
>but there's no doubt it will happen.
Eventually, I'm sure.

Getting back to my regex question, I wonder if it would be better to check for
illegal characters:

if (preg_match('/(`|~|!|@|#|$|%|^|&|*|(|\)|_|\+|=|\[|\{|\]|\}|\||;|\:|\'|\"|\<|\>|\?|)/', $url_a['host'])) ???

I'm not having much luck catching invalid hostnames otherwise...
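An alternative to listing forbidden characters is a whitelist: split the URL with `parse_url()` (presumably where `$url_a['host']` comes from) and accept the host only if it consists entirely of allowed characters. Incidentally, the blacklist pattern as posted has unescaped metacharacters inside the alternation (for example a bare `*` right after `|`), so PCRE should refuse to compile it and `preg_match()` would return `false` with a warning. A minimal sketch; `validate_http_url()` is a hypothetical helper name:

```php
<?php
// Hypothetical helper: accept only http/https URLs whose host consists of
// two or more dot-separated labels of letters, digits and hyphens.
function validate_http_url(string $url): bool
{
    // parse_url() splits a URL but does not validate it, hence the checks below.
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false; // seriously malformed, or no scheme/host at all
    }
    if (!in_array(strtolower($parts['scheme']), ['http', 'https'], true)) {
        return false; // only the two protocols of interest
    }
    // Whitelist, anchored at both ends: any other character fails the match.
    return preg_match('/^[a-z0-9-]+(\.[a-z0-9-]+)+$/i', $parts['host']) === 1;
}

$samples = ['http://www.example.com/page', 'https://exa_mple.com', 'ftp://example.com', 'http://'];
foreach ($samples as $u) {
    printf("%-30s %s\n", $u, validate_http_url($u) ? 'valid' : 'invalid');
}
```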

Feb 12 '07 #5
Rik wrote:
>There are efforts to fully internationalise DNS entries, so even non-roman
>based character sets are allowed. See for instance
><http://www.ietf.org/rfc/rfc4185.txt>. We're not there yet by a long shot,
>but there's no doubt it will happen.
Not there yet?!

Try telling that to "www.한글.kr"!

--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact
Geek of ~ HTML/SQL/Perl/PHP/Python*/Apache/Linux

* = I'm getting there!
Feb 12 '07 #6
..oO(Rik)
>'/^https?:\/\/[a-z0-9-]+(\.[a-z0-9-]+)+/i'
With another delimiter you could avoid the escaping of slashes and make
the regexp a bit more readable (IMHO):

'#^https?://[a-z0-9-]+(\.[a-z0-9-]+)+#i'

Just my 2 cents.
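A quick PHP check that the two delimiter styles describe the same pattern; the sample strings are arbitrary. One thing to watch with `#` as the delimiter: a literal `#` inside the pattern then has to be escaped instead.

```php
<?php
// The same pattern with two delimiters: with "/" every literal slash needs
// escaping, with "#" it does not (but a literal "#" then would).
$slashes = '/^https?:\/\/[a-z0-9-]+(\.[a-z0-9-]+)+/i';
$hashes  = '#^https?://[a-z0-9-]+(\.[a-z0-9-]+)+#i';

// Arbitrary sample strings; both patterns should give the same result.
$samples = ['http://example.com', 'https://a.b.c/x', 'http://localhost', 'mailto:x@example.com'];
foreach ($samples as $s) {
    printf("%-22s %d %d\n", $s, preg_match($slashes, $s), preg_match($hashes, $s));
}
```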

Micha
Feb 12 '07 #7
Rik
On Mon, 12 Feb 2007 16:39:24 +0100, Toby A Inkster
<us**********@tobyinkster.co.uk> wrote:
>Rik wrote:
>>There are efforts to fully internationalise DNS entries, so even non-roman
>>based character sets are allowed. See for instance
>><http://www.ietf.org/rfc/rfc4185.txt>. We're not there yet by a long shot,
>>but there's no doubt it will happen.

>Not there yet?!
>
>Try telling that to "www.한글.kr"!
Yup, it works. It isn't understood by a lot of programs, though. Most
browsers will handle it just fine, but browsing is not the only thing we
want to use it for.

Simple example: I cannot easily ping this from my version of Windows...
--
Rik Wasmus
Feb 12 '07 #8
>>'/^https?:\/\/[a-z0-9-]+(\.[a-z0-9-]+)+/i'
>
>With another delimiter you could avoid the escaping of slashes and make
>the regexp a bit more readable (IMHO):
>
>'#^https?://[a-z0-9-]+(\.[a-z0-9-]+)+#i'
Thanks for the tip.

I recently found this: http://baseclass.modulweb.dk/urlvali...viewsource.php

which looks interesting, if not overkill.
Feb 13 '07 #9
Rik wrote:
>Yup, works. Isn't understood by a lot of programs though, most browsers
>will handle it just fine, but browsing is not the only thing we want to
>use it for.
>
>Simple example: I cannot ping this with ease in my Windows version...
There are IDN-enabled versions of ping available, but few if any operating
systems ship with them as standard yet.

The hope, though, is that operating systems will integrate IDN support
directly into their own gethostbyname()-type functions, so that there is
no need to compile IDN support explicitly into every piece of software
that uses domain names.

(On the other hand, much software has to parse URLs too, in which case
it would probably need its URL-parsing code updated to cope with IDN.)
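In PHP, one way to make URL-handling code cope with IDN along these lines is to convert the host to its ASCII (Punycode, `xn--`) form before any ASCII-only check. The sketch assumes the intl extension is installed; `idn_to_ascii()` comes from intl (which uses ICU rather than libidn), and `host_to_ascii()` is a hypothetical wrapper that returns `false` when intl is missing or the conversion fails.

```php
<?php
// Hypothetical wrapper around idn_to_ascii() from PHP's intl extension.
// Returns the ASCII (Punycode) form of $host, or false if intl is not
// installed or the conversion fails.
function host_to_ascii(string $host)
{
    if (!function_exists('idn_to_ascii')) {
        return false; // intl extension not available
    }
    return idn_to_ascii($host, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
}

$ascii = host_to_ascii('www.한글.kr'); // IDN example from this thread (UTF-8)

if ($ascii === false) {
    echo "conversion unavailable or failed\n";
} else {
    // Expected to look like "www.xn--....kr", which an ASCII-only
    // hostname check can then handle.
    echo $ascii, "\n";
    printf("ascii check: %d\n", preg_match('/^[a-z0-9-]+(\.[a-z0-9-]+)+$/i', $ascii));
}
```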

libidn exists, which makes it really easy to drop in support for
internationalised domain names into existing network apps. It's LGPL too,
which even makes it available for use by closed-source software.

--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact
Geek of ~ HTML/SQL/Perl/PHP/Python*/Apache/Linux

* = I'm getting there!
Feb 13 '07 #10
