Marcus wrote:
Quote:
I am trying to determine if data entered in a $_POST variable in a form
contains all ASCII (0 - 127) characters or not. To do this I am using
mb_detect_encoding(). I am running into problems with non-English
characters, however - for example, I translated the word 'test' into
Russian and got 'испытание'. If I feed this into the function as:
>
$_POST['var'] = 'испытание'; // from form
echo mb_detect_encoding($_POST['var']);
>
it returns ASCII. After thinking about it and running some tests I
figured out that it is doing this because PHP is feeding
mb_detect_encoding the string after it is converted to its html
representation, i.e. instead of 'испытание' mb_detect_encoding() is
getting
'&#1080;&#1089;&#1087;&#1099;&#1090;&#1072;&#1085;&#1080;&#1077;'.
Obviously all of these characters are ASCII, and as far as I can tell
this is what's happening.
>
Is there a way that I can tell if data entered is ASCII or not BEFORE it
is converted? With the example above, I would want this test to fail
(not return ASCII). Thanks in advance.
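The form data is converted into HTML entities on the client side before
PHP receives it, so convert the entities back into a string using
html_entity_decode() first.
Even then mb_detect_encoding() might not work; the user notes in the
PHP manual aren't encouraging. Someone gave a regular expression
for detecting UTF-8 that can be adapted.
I think preg_match( '/[^\x09\x0A\x0D\x20-\x7E]/xs',
html_entity_decode($_POST['var'], ENT_QUOTES, 'UTF-8') ) will work. It
returns 1 when the decoded string contains anything other than tab,
line feed, carriage return or printable ASCII. The charset is passed
explicitly because older PHP versions default to ISO-8859-1, which
cannot represent Cyrillic.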
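Here is a minimal sketch of that check wrapped in a function, in case it helps. The name contains_non_ascii() is made up for illustration, and it assumes PHP 7 or later for the type declarations. It tests against the full 0 - 127 range asked about in the question, whereas the pattern above also flags control characters other than tab, LF and CR.

```php
<?php
// Hypothetical helper (not part of PHP): true when the submitted value
// contains anything outside 7-bit ASCII once HTML entities are decoded.
function contains_non_ascii(string $raw): bool
{
    // Decode entities such as &#1080; into real UTF-8 bytes. The charset is
    // given explicitly so the result does not depend on the PHP default.
    $decoded = html_entity_decode($raw, ENT_QUOTES, 'UTF-8');

    // Without the /u modifier the pattern works on bytes, so any byte
    // above 0x7F means the string is not pure ASCII (0 - 127).
    return preg_match('/[^\x00-\x7F]/', $decoded) === 1;
}

var_dump(contains_non_ascii('test'));            // bool(false)
var_dump(contains_non_ascii('&#1080;&#1089;'));  // bool(true)
var_dump(contains_non_ascii('испытание'));       // bool(true)
```

Called as contains_non_ascii($_POST['var']), this should return true for the Russian example whether the browser sent raw UTF-8 or numeric entities.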
Tim Hunt