Marcus wrote:
Quote:
I am trying to determine if data entered in a $_POST variable in a form
contains all ASCII (0 - 127) characters or not. To do this I am using
mb_detect_encoding(). I am running into problems with non-English
characters, however - for example, I translated the word 'test' into
Russian and got 'испытание'. If I feed this into the function as:
>
$_POST['var'] = 'испытание'; // from form
echo mb_detect_encoding($_POST['var']);
>
it returns ASCII. After thinking about it and running some tests I
figured out that it is doing this because PHP is feeding
mb_detect_encoding the string after it is converted to its html
representation, i.e. instead of 'испытание' mb_detect_encoding() is
getting
'&#1080;&#1089;&#1087;&#1099;&#1090;&#1072;&#1085;&#1080;&#1077;'.
Obviously all of these characters are ASCII, and as far as I can tell
this is what's happening.
>
Is there a way that I can tell if data entered is ASCII or not BEFORE it
is converted? With the example above, I would want this test to fail
(not return ASCII). Thanks in advance.
|
The browser converts the form data into HTML entities before PHP
receives it, so convert the entities back into a string using
html_entity_decode().
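
As a rough sketch of that decoding step (the $raw value is a hypothetical
stand-in for what arrives in $_POST['var'], and passing 'UTF-8' explicitly
is my assumption, since older PHP versions default to ISO-8859-1, which
cannot represent Cyrillic):

```php
<?php
// Hypothetical stand-in for the entity-encoded value PHP receives from the form.
$raw = '&#1080;&#1089;&#1087;&#1099;&#1090;&#1072;&#1085;&#1080;&#1077;';

// Decode the numeric entities back into real characters.
// The charset is given explicitly so the result is UTF-8 on any PHP version.
$decoded = html_entity_decode($raw, ENT_QUOTES, 'UTF-8');

echo $decoded, "\n"; // prints the original Russian word
```

After this step the string is no longer pure ASCII, so any check run on
$decoded sees the characters the user actually typed.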
Even then mb_detect_encoding() might not work; the user notes in the
PHP manual aren't encouraging. Someone gave a regular expression
for detecting UTF-8 that can be adapted.
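I think preg_match( '/[^\x09\x0A\x0D\x20-\x7E]/xs',
html_entity_decode($_POST['var']) ) will work: it returns 1 if the
decoded string contains anything other than tab, line feed, carriage
return, or printable ASCII, and 0 otherwise.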
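
Wrapped up as a function, the check might look something like this. It is
only a sketch: the name is_plain_ascii() is made up for illustration, the
explicit 'UTF-8' argument is my assumption, and the type declarations need
PHP 7 or later. Note that the pattern also rejects control characters other
than tab, line feed, and carriage return, so it is slightly stricter than
"0 - 127".

```php
<?php
// Hypothetical helper; the name is not from the thread or from PHP itself.
function is_plain_ascii(string $input): bool
{
    // Undo the entity encoding first, otherwise everything looks like ASCII.
    $decoded = html_entity_decode($input, ENT_QUOTES, 'UTF-8');

    // The character class lists the allowed bytes: tab, LF, CR, and
    // printable ASCII. preg_match() returns 1 if any other byte is found.
    return preg_match('/[^\x09\x0A\x0D\x20-\x7E]/', $decoded) === 0;
}

var_dump(is_plain_ascii('test'));
var_dump(is_plain_ascii('&#1080;&#1089;&#1087;&#1099;&#1090;&#1072;&#1085;&#1080;&#1077;'));
```

The first call should report true and the second false, which is the
"fail on non-ASCII" behaviour asked for in the question.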
Tim Hunt