String Validation With UTF-8 Support

Hello,

I am looking for a way to check whether a string contains only word
characters and the plain space character (not just any whitespace
character), *regardless of the current locale*. In other words, any
character that is a word character in any locale should be allowed.
This check:

preg_match("/^[\w ]*$/", $_GET['whatever']);

in which the $_GET variable contains a UTF-8 encoded string, only
seems to work for the locale that is currently set. Of course, I
could change the locale using setlocale(), but that would still limit
the check to a subset of all possible input values.

I also created this function from information that I found on the web:

--------------------------------
function is_utf8($_string) {
    // True if the whole string consists of well-formed UTF-8 byte sequences.
    return preg_match('/^([\x00-\x7f]|'
        . '[\xc2-\xdf][\x80-\xbf]|'
        . '\xe0[\xa0-\xbf][\x80-\xbf]|'
        . '[\xe1-\xec][\x80-\xbf]{2}|'
        . '\xed[\x80-\x9f][\x80-\xbf]|'
        . '[\xee-\xef][\x80-\xbf]{2}|'
        . '\xf0[\x90-\xbf][\x80-\xbf]{2}|'
        . '[\xf1-\xf3][\x80-\xbf]{3}|'
        . '\xf4[\x80-\x8f][\x80-\xbf]{2})*$/',
        $_string) > 0;
}
--------------------------------

However, this does not seem to be completely accurate, as it still
allows characters such as these:

http://debain.org/software/tefinch/d...214&forum_id=1
(sorry for the external link, I just don't know how to create such
characters here.)

According to the W3C Validator, those characters are still invalid.
http://validator.w3.org/check?uri=ht...tomatically%29

I know there must be an answer somewhere on the web already, but I have
not found any reference either on Google or in the archives of this
newsgroup.

Any help appreciated.

-Samuel

Oct 6 '05 #1
Hi!

I hope I understood your problem correctly. In the contributed notes of
the PHP Manual there's a very good function to validate UTF-8 encoded data.

http://de3.php.net/manual/en/functio...code.php#48160

It works perfectly for me. The function returns false when the given
text contains bytes that do not form valid UTF-8 sequences, for example
characters above 127 in a single-byte ISO/ANSI encoding. If your web page
declares the correct meta tag (charset UTF-8) or sends the corresponding
header (check default_charset in php.ini, there's a default setting!),
the browser should then send you UTF-8 encoded data.
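
As a rough illustration of the two steps discussed in this thread (first confirm the input is valid UTF-8, then check for word characters and spaces independent of the locale), here is a minimal sketch. It is not the function from the manual note linked above. The helper names is_valid_utf8() and is_words_and_spaces() are made up for this example, and treating "word character" as Unicode letters, combining marks, digits and the underscore is an assumption about what the original question needs. It relies on PCRE's /u modifier and assumes PHP 7 or later for the type declarations.

```php
<?php
// Hypothetical helper: true if $s is well-formed UTF-8.
// With the /u modifier, preg_match() returns false (not 0) when the
// subject is not valid UTF-8, so an empty pattern is enough here.
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

// Hypothetical helper: true if $s contains only letters, combining
// marks, digits, underscores and the plain space character (U+0020).
// \p{L}, \p{M} and \p{N} are Unicode property classes, so the result
// does not depend on setlocale(). \z anchors at the very end, so a
// trailing newline is rejected as well.
function is_words_and_spaces(string $s): bool
{
    return preg_match('/^[\p{L}\p{M}\p{N}_ ]*\z/u', $s) === 1;
}

// Usage: "h\xc3\xa9llo w\xc3\xb6rld" is "héllo wörld" in UTF-8 bytes.
var_dump(is_words_and_spaces("h\xc3\xa9llo w\xc3\xb6rld")); // bool(true)
var_dump(is_words_and_spaces("hello!"));                    // bool(false)
var_dump(is_words_and_spaces("caf\xe9"));                   // bool(false), Latin-1 byte
```

Because invalid UTF-8 makes preg_match() return false, the second helper also rejects badly encoded input; the separate validity check is only needed if you want to tell the two failure cases apart.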

By the way, have a look at the mbstring extension. It provides a set of
multi-byte-aware string functions that can replace the standard PHP
functions, which do not support multi-byte character strings.
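
To show the kind of difference meant here, a small sketch comparing the byte-oriented strlen() with mb_strlen() from mbstring. It assumes the mbstring extension may or may not be loaded, so the call is guarded with function_exists(); the variable names are only for illustration.

```php
<?php
// "héllo" written as explicit UTF-8 bytes: the é takes two bytes.
$word = "h\xc3\xa9llo";

// strlen() counts bytes, not characters.
$byteCount = strlen($word); // 6

// mb_strlen() counts characters when told the encoding is UTF-8.
// Guarded because mbstring is an optional extension.
$charCount = null;
if (function_exists('mb_strlen')) {
    $charCount = mb_strlen($word, 'UTF-8'); // 5
}

echo "bytes: $byteCount\n";
if ($charCount !== null) {
    echo "characters: $charCount\n";
}
```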

Hope that helped you a bit.

Greetings,
Benjamin Wilger

Oct 7 '05 #2
