Best way to parse a url for validity?

I have checkURL(http://globalwarmingawareness2007.org.uk,
globalwarmingawareness2007.org.uk)

I see almost everyone using regular expressions. But I don't completely
trust them. Don't know if this code is the best way to find if a user
entered a valid URL and to avoid SQL injection from the URL.

function checkURL($url, $name)
{
    global $incorrect_input;

    $data = parse_url("http://" . $url);
    if (!$data)
        die($incorrect_input[1] . $name);

    // parse_url() only returns the components that are present,
    // so default the missing ones to an empty string
    $host     = isset($data['host'])     ? $data['host']     : '';
    $path     = isset($data['path'])     ? $data['path']     : '';
    $query    = isset($data['query'])    ? $data['query']    : '';
    $fragment = isset($data['fragment']) ? $data['fragment'] : '';

    //url does not start with a letter, number
    if (!preg_match('/^[A-Za-z0-9]/i', $host))
        die($incorrect_input[1] . $name);

    //url does not contain a .
    if (!preg_match('/([A-Za-z0-9]+\.)+/i', $host))
        die($incorrect_input[1] . $name);

    //url ends with .
    if (preg_match('/\.$/i', $host))
        die($incorrect_input[1] . $name);

    // explode() instead of split(): split() was removed in PHP 7
    $array = explode('.', $host);
    $arraysize = count($array);

    for ($i = 0; $i < $arraysize; $i++)
    {
        if (preg_match('/[^A-Za-z0-9\-\_]+/i', $array[$i]))
            die($incorrect_input[1] . $name);
    }

    //Only allow alphanumeric letters, _,-,.,/
    if ($path)
    {
        $len = strlen($path);
        for ($i = 0; $i < $len; $i++)
        {
            $ascii = ord($path[$i]);
            // not A-Z, not 0-9 and not a-z
            if (($ascii < 65 || $ascii > 90) &&
                ($ascii < 48 || $ascii > 57) &&
                ($ascii < 97 || $ascii > 122))
                // 45 is -, 46 is ., 95 is _, 47 is /
                if ($ascii != 45 && $ascii != 46 && $ascii != 95 && $ascii != 47)
                    die($incorrect_input[1] . $name);
        }
    }

    //Do not allow more than one consecutive slash for the path
    if (preg_match('/[\/]{2,}/i', $path))
        die($incorrect_input[1] . $name);

    if ($query)
    {
        if (preg_match('/[^A-Za-z0-9\/\-\_\=\&]+/i', $query))
            die($incorrect_input[1] . $name);
        if (preg_match('/[\=\&]{2,}/i', $query))
            die($incorrect_input[1] . $name);
    }

    if ($fragment)
    {
        if (preg_match('/[^A-Za-z0-9\-\_\.]+/i', $fragment))
            die($incorrect_input[1] . $name);
    }

    return($url);
}
Apr 26 '07 #1
On Apr 26, 11:52 pm, Rick Stem <ricks...@yahoo.com> wrote:
> I have checkURL(http://globalwarmingawareness2007.org.uk,
> globalwarmingawareness2007.org.uk)
>
> I see almost everyone using regular expressions. But I don't completely
> trust them. Don't know if this code is the best way to find if a user
> entered a valid URL and to avoid SQL injection from the URL.

No, it isn't the best way. The code above restricts the URL to a small
subset of valid URLs, and it doesn't prevent SQL injection, which can
arrive in a POST payload as well as in a GET request.
Architecturally it isn't the right way to think about the problem
either, IMHO. It's the easy answer (restrict, restrict, restrict), but
it's no substitute for allowing all the valid URLs, even ones with
injection attempts in them, and then filtering the input/output of
your scripts.
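
To make "accept the URL, then filter at the boundaries" concrete, here is a rough sketch rather than a definitive implementation. It swaps in a specific technique that is not named anywhere in this thread: PDO prepared statements on the way into the database and htmlspecialchars() on the way out. The links table, its url column and the in-memory SQLite database are all assumptions made up for illustration.

```php
<?php
// Sketch only: the table name, column name and in-memory SQLite database
// are assumptions for illustration, not part of the code posted above.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE links (url TEXT NOT NULL)');

// A URL-like string whose query part looks like an injection attempt.
$url = "http://example.org/page?id=1'; DROP TABLE links; --";

// The placeholder keeps the value out of the SQL text entirely,
// so the quote and the semicolon are stored as plain data.
$insert = $pdo->prepare('INSERT INTO links (url) VALUES (:url)');
$insert->execute([':url' => $url]);

$stored = $pdo->query('SELECT url FROM links')->fetchColumn();

// Escape on output as well, in case the value is echoed into HTML.
$safeHtml = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');
```

With this shape the URL check only has to decide whether the string is a URL you want to accept. It no longer has to double as the SQL injection defence.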
This kind of approach can have some validity, though. Have you tried
using mod_security?
Doing it within PHP means you will be restricting yourself when it
comes to application adjustments, rewrites, and non-ASCII language
support. Besides all this, the approach above doesn't lend itself to
easy adjustment, whereas a simple block of more readable regexps
would. You just have to make the leap of faith (shown by others to be
a worthwhile leap) into the world of regexps, which you can indeed
trust despite their complexity.
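
As one possible illustration of a "more readable regexp", the host checks could be folded into a single commented pattern using the x modifier. This is a sketch under assumptions: isPlausibleHost() is a hypothetical name, the pattern follows the usual letters, digits and hyphens convention for host names (so, unlike the posted code, it rejects underscores), and it only accepts an all-letter final label, which leaves out punycode TLDs.

```php
<?php
// Hypothetical helper: one commented pattern instead of several checks.
function isPlausibleHost($host)
{
    $pattern = '/^
        (?: [a-z0-9]                # each label starts with a letter or digit
            (?: [a-z0-9-]{0,61}     # then letters, digits or hyphens
                [a-z0-9] )?         # and ends with a letter or digit
            \.                      # labels are separated by dots
        )+
        [a-z]{2,63}                 # final label: letters only
    \z/ix';

    // 253 characters is the usual upper bound for a full host name.
    return strlen($host) <= 253 && preg_match($pattern, $host) === 1;
}
```

Adjusting the rules then means editing one line of the pattern, for example adding _ to the character class if you really want to allow it.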

Apr 27 '07 #2

"Rick Stem" <ri******@yahoo.com> wrote in message
news:f0*********@news4.newsguy.com...
|I have checkURL(http://globalwarmingawareness2007.org.uk,
| globalwarmingawareness2007.org.uk)
|
| I see almost everyone using regular expressions. But I don't completely
| trust them. Don't know if this code is the best way to find if a user
| entered a valid URL and to avoid SQL injection from the URL.

JESUS CHRIST!!!

'don't trust them'? you mean 'i couldn't write one if it meant i'd get
laid'.

i don't 'trust' the code you've just written! have you completely overlooked
the fact that php has built-in functions that break out a url into the
pieces you're looking for? do you not know that even if it 'looks' valid, it
may point to nowhere?

'don't trust them'...i'm still laughing.
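
For reference, a minimal sketch of what leaning on the built-ins might look like. It assumes filter_var() with FILTER_VALIDATE_URL for the syntax check, parse_url() for the pieces, and checkdnsrr() for the "may point to nowhere" part. The function name looksLikeHttpUrl() and the $checkDns flag are made up for this example, and the URL is expected to include its scheme.

```php
<?php
// Hypothetical helper; the name and the $checkDns flag are made up here.
function looksLikeHttpUrl($url, $checkDns = false)
{
    // filter_var() does the syntax check without a hand-written pattern.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    // parse_url() breaks the URL into scheme, host, path, query, fragment.
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false;
    }
    if (!in_array(strtolower($parts['scheme']), ['http', 'https'], true)) {
        return false;
    }

    // Syntactically valid is not the same as reachable: optionally ask DNS
    // whether the host has an address record at all.
    if ($checkDns
        && !checkdnsrr($parts['host'], 'A')
        && !checkdnsrr($parts['host'], 'AAAA')) {
        return false;
    }

    return true;
}
```

Even with the DNS lookup switched on, this only shows that the host resolves, not that anything answers at that address.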
Apr 27 '07 #3
