problem with mb_detect_encoding

I am trying to determine if data entered in a $_POST variable in a form
contains all ASCII (0 - 127) characters or not. To do this I am using
mb_detect_encoding(). I am running into problems with non-English
characters, however - for example, I translated the word 'test' into
Russian and got 'испытание'. If I feed this into the function as:

$_POST['var'] = 'испытание'; // from form
echo mb_detect_encoding($_POST['var']);

it returns ASCII. After thinking about it and running some tests I
figured out that it is doing this because PHP is feeding
mb_detect_encoding() the string after it is converted to its HTML
representation, i.e. instead of 'испытание' mb_detect_encoding() is
getting
'&#1080;&#1089;&#1087;&#1099;&#1090;&#1072;&#1085;&#1080;&#1077;'.
Obviously all of these characters are ASCII, and as far as I can tell
this is what's happening.
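
To illustrate the diagnosis, here is a minimal sketch. It assumes the browser submitted numeric character references and that the mbstring extension is loaded; the entity string is hard-coded in a made-up variable instead of being read from $_POST.

```php
<?php
// Hypothetical stand-in for what arrives in $_POST['var'] when the browser
// has replaced each Cyrillic letter with a numeric character reference.
$received = '&#1080;&#1089;&#1087;&#1099;&#1090;&#1072;&#1085;&#1080;&#1077;';

// Every byte of the entity string is below 128, so ASCII is a valid match
// and, being first in the candidate list, it is what gets reported.
$detected = mb_detect_encoding($received, ['ASCII', 'UTF-8'], true);

echo $detected, "\n"; // ASCII
```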

Is there a way that I can tell if data entered is ASCII or not BEFORE it
is converted? With the example above, I would want this test to fail
(not return ASCII). Thanks in advance.
Aug 9 '06 #1

Marcus wrote:
> I am trying to determine if data entered in a $_POST variable in a form
> contains all ASCII (0 - 127) characters or not. To do this I am using
> mb_detect_encoding(). I am running into problems with non-English
> characters, however - for example, I translated the word 'test' into
> Russian and got 'испытание'. If I feed this into the function as:
>
> $_POST['var'] = 'испытание'; // from form
> echo mb_detect_encoding($_POST['var']);
>
> it returns ASCII. After thinking about it and running some tests I
> figured out that it is doing this because PHP is feeding
> mb_detect_encoding the string after it is converted to its html
> representation, i.e. instead of 'испытание' mb_detect_encoding() is
> getting
> '&#1080;&#1089;&#1087;&#1099;&#1090;&#1072;&#1085;&#1080;&#1077;'.
> Obviously all of these characters are ASCII, and as far as I can tell
> this is what's happening.
>
> Is there a way that I can tell if data entered is ASCII or not BEFORE it
> is converted? With the example above, I would want this test to fail
> (not return ASCII). Thanks in advance.

The browser converts the form data into HTML entities on the client
side, before PHP receives it. Convert the entities back into characters
using html_entity_decode().
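
A minimal sketch of that decoding step, assuming the entities are numeric character references and that UTF-8 is the encoding you want back. The variable names are illustrative only. The charset is passed explicitly because older PHP versions defaulted to ISO-8859-1, which cannot represent Cyrillic.

```php
<?php
// Hypothetical stand-in for the value of $_POST['var'] as PHP receives it.
$received = '&#1080;&#1089;&#1087;&#1099;&#1090;&#1072;&#1085;&#1080;&#1077;';

// Decode the entities into real characters, encoded as UTF-8.
$decoded = html_entity_decode($received, ENT_QUOTES, 'UTF-8');

echo $decoded, "\n";         // испытание
echo strlen($decoded), "\n"; // 18 (nine letters, two bytes each in UTF-8)
```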

Even then mb_detect_encoding() might not work; the user notes in the
PHP manual aren't encouraging, anyway. Someone gave a regular expression
for detecting UTF-8 that can be adapted.

I think preg_match( '/[^\x09\x0A\x0D\x20-\x7E]/xs',
html_entity_decode($_POST['var']) ) will work: it returns 1 if the
decoded input contains anything other than tab, LF, CR or printable ASCII.
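
As a rough sketch, the check could be wrapped in a small function. The name is_plain_ascii() is made up for this example, and the explicit 'UTF-8' argument is an assumption about the encoding you want after decoding. The x and s modifiers are left out here because the pattern contains no whitespace and no dot, so they have no effect.

```php
<?php
// Hypothetical helper: true when the value, after entity decoding, contains
// only tab, LF, CR and printable ASCII (0x20-0x7E).
function is_plain_ascii(string $value): bool
{
    // Turn entities such as &#1080; back into real (multi-byte) characters.
    $decoded = html_entity_decode($value, ENT_QUOTES, 'UTF-8');

    // preg_match() returns 1 as soon as it finds a byte outside the allowed
    // set, and 0 when there is no such byte.
    return preg_match('/[^\x09\x0A\x0D\x20-\x7E]/', $decoded) === 0;
}

var_dump(is_plain_ascii('test'));
// bool(true)
var_dump(is_plain_ascii('&#1080;&#1089;&#1087;&#1099;&#1090;&#1072;&#1085;&#1080;&#1077;'));
// bool(false)
```

One caveat: a user who literally types something like &#1080; into the field would also be rejected, since the server cannot tell typed entities from ones the browser generated.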

Tim Hunt

Aug 10 '06 #2
