URL character check

I am looking for a function that checks that there are no invalid URL
characters in some input:

http://www.blooberry.com/indexdot/ht...rlencoding.htm

Not sure what happened to the % character here.

I am having trouble crafting the regular expression, because of course:

if (preg_match("/[^a-zA-Z0-9$-_.+!*'(),]/", $uri))

doesn't work and no end of escaping seems to work.

Dec 8 '05 #1
*** hendry wrote (7 Dec 2005 17:46:44 -0800):
> if (preg_match("/[^a-zA-Z0-9$-_.+!*'(),]/", $uri))
>
> doesn't work and no end of escaping seems to work.

This means: given a list of allowed characters, report a match (preg_match()
returns 1) if $uri contains a character that is not in the list.

You must be aware that certain characters have a special meaning in regular
expressions, just like '*' has a special meaning in a Windows or Linux
directory listing. You need to escape them when you want them treated as
literal characters.
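
As a minimal illustration of that point (a sketch added for clarity, not part of the original post): outside a character class the dot is a metacharacter, so it has to be escaped to match a literal dot. The variable names are made up for the example.

```php
<?php
// Unescaped '.' matches any single character (except a newline by default).
$unescaped = preg_match('/a.c/', 'abc');      // 1: '.' matched 'b'

// Escaped '\.' matches only a literal dot.
$escaped_miss = preg_match('/a\.c/', 'abc');  // 0: there is no dot in 'abc'
$escaped_hit  = preg_match('/a\.c/', 'a.c');  // 1

var_dump($unescaped, $escaped_miss, $escaped_hit);
```

Inside a character class the rules differ: there the characters that usually need attention are the hyphen, the caret, the closing bracket and the backslash.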

http://www.php.net/manual/en/ref.pcre.php
http://www.php.net/manual/en/referen...ern.syntax.php
http://www.php.net/manual/en/referen....modifiers.php
--
-+ Álvaro G. Vicario - Burgos, Spain
++ http://bits.demogracia.com is my site for web programmers
+- http://www.demogracia.com is my chlorine-free humor site
--
Dec 8 '05 #2
Yes, and I do escape them, and it still doesn't work. *sigh* Regexes suck.

Dec 9 '05 #3
*** hendry wrote (8 Dec 2005 18:37:41 -0800):
> Yes and I do escape them and it doesn't work. *sigh* regex suck.

If unsure about how to escape them, preg_quote() will do the trick
wonderfully.
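
A minimal sketch of how preg_quote() could be applied to the pattern from the question, assuming PHP 5.3 or later (where preg_quote() also escapes the hyphen). The variable names and the helper has_disallowed_char() are hypothetical, chosen only for this example.

```php
<?php
// Assumption: PHP 5.3+, where preg_quote() also escapes '-'.
// The punctuation the original pattern meant to allow, as a plain string.
$punctuation = '$-_.+!*\'(),';

// preg_quote() escapes regex metacharacters; the second argument makes it
// escape the '/' delimiter as well.
$class = '/[^a-zA-Z0-9' . preg_quote($punctuation, '/') . ']/';

// Hypothetical helper: true if $input contains a character outside the list.
function has_disallowed_char(string $pattern, string $input): bool
{
    return preg_match($pattern, $input) === 1;
}

var_dump(has_disallowed_char($class, 'a-b_c'));   // bool(false)
var_dump(has_disallowed_char($class, 'abc<def')); // bool(true)
```

Because the hyphen comes out escaped, it no longer forms a range between its neighbours.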

--
-+ Álvaro G. Vicario - Burgos, Spain
Dec 9 '05 #4
hendry wrote:
> I am looking for a function that checks that there are no invalid URL
> characters in some input:
>
> http://www.blooberry.com/indexdot/ht...rlencoding.htm
>
> if (preg_match("/[^a-zA-Z0-9$-_.+!*'(),]/", $uri))

Remember that in a character class the hyphen (-) has a special meaning: it
defines a range. So escape it, or move it to the front or back of the class:

$-_

means all the characters between $ and _, including $ and _ themselves. In
ASCII that range also covers characters such as %, /, :, < and >, so the
class lets them through.

Also escape the $ character to be sure PHP doesn't try to parse it as a
variable:

/[^a-zA-Z0-9\$_.+!*'(),-]/
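
To make the difference concrete, here is a small sketch comparing the pattern from the question with the corrected one. Single-quoted strings are used so PHP does no variable interpolation; the helper has_invalid_char() is a made-up name for this example.

```php
<?php
// Pattern from the question: '$-_' is read as a range from '$' to '_'.
$original = '/[^a-zA-Z0-9$-_.+!*\'(),]/';

// Corrected pattern: the hyphen is moved to the end of the class,
// where it is a literal character.
$fixed = '/[^a-zA-Z0-9$_.+!*\'(),-]/';

// Hypothetical helper: true if $input contains a character outside the class.
function has_invalid_char(string $pattern, string $input): bool
{
    return preg_match($pattern, $input) === 1;
}

// '<' lies between '$' and '_' in ASCII, so the original pattern accepts it.
var_dump(has_invalid_char($original, 'abc<def')); // bool(false)
var_dump(has_invalid_char($fixed, 'abc<def'));    // bool(true)

// A literal hyphen is still accepted by the corrected pattern.
var_dump(has_invalid_char($fixed, 'a-b'));        // bool(false)
```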
Dec 9 '05 #5
hendry wrote:
> I am looking for a function that checks that there are no invalid URL
> characters in some input:

That's harder than you make out, even if your only concern is HTTP URLs,
because not all parts allow the same set of characters. For example, a
blanket ban on <?> because it can't occur in paths doesn't take into
account that it can occur in queries and fragments; likewise, a free pass
to digits because they can occur almost anywhere doesn't take into account
that they can't occur first in a scheme.

You'd have to, for starters, slice the URL into its components -
scheme, authority, path, etc. - before performing any checks, unless of
course your check is simply for characters that aren't allowed at all
(spaces, double quotes, and the like). Then, and I'm afraid there's no
getting around this, you'd have to examine the relevant sections of RFCs
3986 and 2616 to find out exactly what can go where.
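
A rough sketch of that approach, under stated assumptions: the splitting regex is the one given in RFC 3986, Appendix B, and the scheme check follows the RFC 3986 rule that a scheme starts with a letter. The function names are hypothetical, and the per-component checks shown are only a starting point, not a complete validator.

```php
<?php
// Hypothetical helper: split a URI into components using the regex from
// RFC 3986, Appendix B. Missing components come back as empty strings.
function split_uri(string $uri): array
{
    $pattern = '~^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?~';
    preg_match($pattern, $uri, $m);
    return [
        'scheme'    => $m[2] ?? '',
        'authority' => $m[4] ?? '',
        'path'      => $m[5] ?? '',
        'query'     => $m[7] ?? '',
        'fragment'  => $m[9] ?? '',
    ];
}

// A scheme is a letter followed by letters, digits, '+', '.' or '-'.
function scheme_is_valid(string $scheme): bool
{
    return preg_match('/^[A-Za-z][A-Za-z0-9+.-]*\z/', $scheme) === 1;
}

// Blanket check for characters that are not allowed anywhere, using the
// two examples named above: whitespace and double quotes.
function has_forbidden_chars(string $uri): bool
{
    return preg_match('/[\s"]/', $uri) === 1;
}

$parts = split_uri('http://example.com/a?b=c#d');
var_dump($parts['path'], $parts['query']);   // string(2) "/a", string(3) "b=c"
var_dump(scheme_is_valid('1http'));          // bool(false)
```

Once the URI is split, the '?' that introduces the query never appears in the path component, so each part can be checked against its own character set.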

> http://www.blooberry.com/indexdot/ht...rlencoding.htm

(That page is long past its best-by. Burn that bookmark!)

--
Jock
Dec 11 '05 #6
