As I understand it, the characters that make up an Internet domain name can
consist of only alpha-numeric characters and a hyphen
(http://tools.ietf.org/html/rfc3696).
So I'm trying to write a regex that will provide basic URL format validation:
starts with http or https (the only 2 protocols I'm interested in), is followed by
'://', then ([any alpha-numeric or hyphen] followed by a '.', appearing 1 or more
times), then followed by anything (*), and is case-insensitive.
I tried this:
if (preg_match('/^(http|https):\/\/([a-z0-9-]\.+)*/i', $urlString))
{
    $valid = true;
}
else
{
    $valid = false;
}
but no luck.
Any suggestions welcome...
Thanks in advance.
deko wrote:
Deko, while your enthusiasm is appreciated, please stay in the same thread
when posting about the same subject. Starting several threads not only
creates confusion about which answers have already been given and in what
context, it also comes across as very pushy.
>As I understand it, the characters that make up an Internet domain name
>can consist of only alpha-numeric characters and a hyphen
>(http://tools.ietf.org/html/rfc3696)
..."Any characters, or combination of bits (as octets), are permitted in
DNS names. However, there is a preferred form that is required by most
applications."...
>So I'm trying to write a regex that will provide basic URL format
>validation:
>starts with http or https (the only 2 protocols I'm interested in), is
>followed by '://', then ([any alpha-numeric or hyphen] followed by a '.',
>appearing 1 or more times), then followed by anything (*), and is
>case-insensitive.
>I tried this:
>if (preg_match('/^(http|https):\/\/([a-z0-9-]\.+)*/i', $urlString))
This bit, "([a-z0-9-]\.+)", does not do what you think it does. It matches
_one_ single character in the [a-z0-9-] range, followed by one or more
literal dots, and that group is repeated zero or more times. Since it may
repeat zero times and nothing anchors the end of the pattern, any string
that starts with http:// or https:// matches, so even
'http://a.b.c.d......d. .e....a......' would match.
Furthermore, you seem to have anchored this with ^, so it will only
match if http(s):// is at the very beginning of the string. Is that what
you want?
'/^https?:\/\/[a-z0-9-]+(\.[a-z0-9-]+)+/i'
--
Rik Wasmus
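To make the difference concrete, here is a minimal PHP sketch (not code from the thread) that runs the pattern from the original post and the suggested pattern against a few sample strings. The helper name `isBasicHttpUrl()`, the constant names and the sample URLs are illustrative assumptions.

```php
<?php
// Pattern from the original post: the group ([a-z0-9-]\.+) may repeat zero
// times and nothing anchors the end, so anything starting with "http://" or
// "https://" (case-insensitively) matches.
const ORIGINAL_PATTERN = '/^(http|https):\/\/([a-z0-9-]\.+)*/i';

// Pattern suggested in the reply: at least two labels of [a-z0-9-]
// separated by single dots. Whatever follows the host is not checked.
const SUGGESTED_PATTERN = '/^https?:\/\/[a-z0-9-]+(\.[a-z0-9-]+)+/i';

// Hypothetical helper. preg_match() returns 1 on a match, 0 on no match and
// false on error, so compare strictly against 1.
function isBasicHttpUrl(string $url): bool
{
    return preg_match(SUGGESTED_PATTERN, $url) === 1;
}

$samples = [
    'http://www.example.com/path?x=1',
    'HTTPS://example.co.uk',
    'http://',
    'http://localhost',
    'http://under_score.com',
    'see http://www.example.com',
    'ftp://example.com',
];

foreach ($samples as $url) {
    // Print the result of both patterns side by side (1 = match, 0 = no match).
    printf(
        "%-35s original=%d suggested=%d\n",
        $url,
        preg_match(ORIGINAL_PATTERN, $url),
        isBasicHttpUrl($url) ? 1 : 0
    );
}
```

Neither pattern is anchored at the end, which fits the "followed by anything" requirement, but it also means the suggested pattern only checks a well-formed prefix: 'http://a.b.c.d......' still passes because 'http://a.b.c.d' does. Single-label hosts such as http://localhost are rejected, because the pattern requires at least one dot.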
>..."Any characters, or combination of bits (as octets), are permitted in DNS
>names. However, there is a preferred form that is required by most
>applications."...
I just tried registering various domain names with an underscore. The
registrar's system rejected it. While this may not be the best verification, I
have yet to see a valid Internet domain with an underscore or any other
non-alphanumeric character (other than a hyphen).
>'/^https?:\/\/[a-z0-9-]+(\.[a-z0-9-]+)+/i'
Thanks, Rik
On Mon, 12 Feb 2007 10:29:26 +0100, deko <de**@nospam.com> wrote:
>I just tried registering various domain names with an underscore. The
>registrar's system rejected it. While this may not be the best
>verification, I have yet to see a valid Internet domain with an
>underscore or any other non-alphanumeric character (other than a hyphen).
There are efforts to fully internationalise DNS entries, so that even
non-Roman-based character sets are allowed. See for instance
<http://www.ietf.org/rfc/rfc4185.txt>. We're not there yet by a long shot,
but there's no doubt it will happen.
--
Rik Wasmus
>There are efforts to fully internationalise DNS entries, so that even
>non-Roman-based character sets are allowed. See for instance
><http://www.ietf.org/rfc/rfc4185.txt>. We're not there yet by a long shot,
>but there's no doubt it will happen.
Eventually, I'm sure.
Getting back to my regex question, I wonder if it would be better to check for
illegal characters:
if
(preg_match('/(`|~|!|@|#|$|%| ^|&|*|(|\)|_|\+ |=|\[|\{|\]|\}|\||;|\:|\'| \"|\<|\>|\?| )/',
$url_a['host'])) ???
I'm not having much luck catching invalid hostnames otherwise...
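As written, the pattern above leaves `*` and `(` unescaped (and `$` and `^` act as anchors rather than literal characters), so PCRE will most likely refuse to compile it; an unescaped `*` straight after `|` has nothing to repeat. In that case preg_match() returns false and emits a warning instead of matching anything. A sketch of the opposite approach is to list the characters that *are* allowed in a negated character class, so nothing needs individual escaping. It assumes `$url_a` comes from `parse_url()`, which the post implies but does not show; `hostHasIllegalCharacters()` and `isAcceptableHttpUrl()` are hypothetical helper names.

```php
<?php
// Hypothetical helper: true if the host contains anything other than ASCII
// letters, digits, dots and hyphens. Inside a character class, '.' is
// literal and a trailing '-' is literal, so no escaping is required.
function hostHasIllegalCharacters(string $host): bool
{
    return preg_match('/[^a-z0-9.-]/i', $host) === 1;
}

// Hypothetical helper: split the URL with parse_url() first, then check the
// scheme and inspect only the host component.
function isAcceptableHttpUrl(string $url): bool
{
    $url_a = parse_url($url);
    if ($url_a === false || !isset($url_a['scheme'], $url_a['host'])) {
        return false; // seriously malformed, or no scheme/host component
    }
    if (!in_array(strtolower($url_a['scheme']), ['http', 'https'], true)) {
        return false; // only http and https are of interest here
    }
    return !hostHasIllegalCharacters($url_a['host']);
}

foreach (['http://www.example.com/a?b=c', 'http://bad_host.com/', 'ftp://example.com'] as $url) {
    echo $url, ' => ', isAcceptableHttpUrl($url) ? 'ok' : 'rejected', "\n";
}
```

This only checks which characters appear; hosts such as "a..b" or "-a.com" still pass, so it is best combined with a label-structure check like the `[a-z0-9-]+(\.[a-z0-9-]+)+` pattern suggested earlier in the thread.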
Rik wrote:
>There are efforts to fully internationalise DNS entries, so that even
>non-Roman-based character sets are allowed. See for instance
><http://www.ietf.org/rfc/rfc4185.txt>. We're not there yet by a long shot,
>but there's no doubt it will happen.
Not there yet?!
Try telling that to "www.한글.kr"!
--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact
Geek of ~ HTML/SQL/Perl/PHP/Python*/Apache/Linux
* = I'm getting there!
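For a PHP script like the one in the original question, one way to cope with a host such as "www.한글.kr" is to convert it to its ASCII (Punycode, "xn--...") form before applying an ASCII-only pattern. This sketch assumes the intl extension is installed, since that is what provides `idn_to_ascii()`; the helper `hostToAscii()` and the `HOST_PATTERN` constant are hypothetical names, and the pattern is an end-anchored variant of the host part of the regex suggested earlier in the thread.

```php
<?php
// Host-only pattern: labels of [a-z0-9-] separated by single dots,
// anchored at both ends. It cannot match raw non-ASCII bytes.
const HOST_PATTERN = '/^[a-z0-9-]+(\.[a-z0-9-]+)+$/i';

// Hypothetical helper: returns the ASCII (Punycode) form of the host, or
// null if the intl extension is missing or the conversion fails.
function hostToAscii(string $host): ?string
{
    if (!function_exists('idn_to_ascii')) {
        return null; // intl extension not available
    }
    $ascii = idn_to_ascii($host, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
    return $ascii === false ? null : $ascii;
}

$host = 'www.한글.kr';

// The raw Unicode host fails the ASCII-only pattern.
var_dump(preg_match(HOST_PATTERN, $host));

// After conversion, the Hangul label becomes an "xn--" label made only of
// letters, digits and hyphens, so the same pattern accepts it.
$ascii = hostToAscii($host);
if ($ascii !== null) {
    echo $ascii, "\n";
    var_dump(preg_match(HOST_PATTERN, $ascii));
}
```

Converting first keeps the validation pattern simple and ASCII-only; the trade-off is a dependency on intl, which is why the helper falls back to null when it is absent.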
..oO(Rik)
>'/^https?:\/\/[a-z0-9-]+(\.[a-z0-9-]+)+/i'
With another delimiter you could avoid the escaping of slashes and make
the regexp a bit more readable (IMHO):
'#^https?://[a-z0-9-]+(\.[a-z0-9-]+)+#i'
Just my 2 cents.
Micha
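A quick way to confirm that the delimiter changes only how the pattern is written, not what it matches, is to run both spellings side by side. The sample URLs are illustrative.

```php
<?php
// The same pattern written with two different delimiters. With '#' as the
// delimiter, the slashes in "://" no longer need escaping.
$slashDelimited = '/^https?:\/\/[a-z0-9-]+(\.[a-z0-9-]+)+/i';
$hashDelimited  = '#^https?://[a-z0-9-]+(\.[a-z0-9-]+)+#i';

foreach (['http://www.example.com/', 'https://example', 'see http://example.com'] as $url) {
    // Both columns should be identical for every input.
    printf(
        "%-25s %d %d\n",
        $url,
        preg_match($slashDelimited, $url),
        preg_match($hashDelimited, $url)
    );
}
```

The trade-off is that a literal `#` inside the pattern would then need escaping instead; `preg_quote($text, '#')` escapes the given delimiter along with the regex metacharacters when building a pattern from user text.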
On Mon, 12 Feb 2007 16:39:24 +0100, Toby A Inkster
<us**********@tobyinkster.co.uk> wrote:
>Not there yet?!
>Try telling that to "www.한글.kr"!
Yup, it works. It isn't understood by a lot of programs, though. Most
browsers will handle it just fine, but browsing is not the only thing we
want to use it for.
Simple example: I cannot easily ping it from my version of Windows...
--
Rik Wasmus
>With another delimiter you could avoid the escaping of slashes and make
>the regexp a bit more readable (IMHO):
>'#^https?://[a-z0-9-]+(\.[a-z0-9-]+)+#i'
Thanks for the tip.
I recently found this: http://baseclass.modulweb.dk/urlvali...viewsource.php
which looks interesting, if not overkill.
Rik wrote:
>Yup, it works. It isn't understood by a lot of programs, though. Most
>browsers will handle it just fine, but browsing is not the only thing we
>want to use it for.
>Simple example: I cannot easily ping it from my version of Windows...
There are IDN-enabled versions of ping available, but few if any operating
systems ship with them as standard yet.
The hope, though, is that operating systems will integrate IDN support
directly into their own gethostbyname()-type functions, so that there is
no need to explicitly compile IDN support into every piece of software
that uses domain names.
(On the other hand, a lot of software has to parse URLs too, in which case
its URL-parsing code would probably need updating to cope with IDN.)
libidn exists, which makes it really easy to drop support for
internationalised domain names into existing network apps. It's LGPL too,
which makes it available even for use by closed-source software.
--
Toby A Inkster BSc (Hons) ARCS