problem with mb_detect_encoding

#1
August 9th, 2006, 07:55 PM
Marcus
I am trying to determine whether data submitted through a form in a
$_POST variable contains only ASCII (0-127) characters. To do this I am
using mb_detect_encoding(). However, I am running into problems with
non-English characters. For example, I translated the word 'test' into
Russian and got 'испытание'. If I feed this into the function as:

$_POST['var'] = 'испытание'; // from form
echo mb_detect_encoding($_POST['var']);

it returns ASCII. After thinking about it and running some tests I
figured out that it is doing this because PHP is feeding
mb_detect_encoding the string after it is converted to its html
representation, i.e. instead of 'испытание' mb_detect_encoding() is
getting
'&#1080;&#1089;&#1087;&#1099;&#1090;&#1072;&#1085;&#1080;&#1077;'.
Obviously all of these characters are ASCII, and as far as I can tell
this is what's happening.
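
A minimal sketch that reproduces the behaviour described above, assuming the mbstring extension is loaded and the script is saved as UTF-8. The variable names and the explicit candidate list are illustrative and not from the original post. With the raw word the detector should report UTF-8, while the entity form consists only of ASCII bytes and is reported as ASCII:

```php
<?php
// Assumption: this file is saved as UTF-8, so $raw holds multi-byte UTF-8 data.
$raw = 'испытание';

// The same word written as decimal numeric character references (ASCII bytes only).
$encoded = '&#1080;&#1089;&#1087;&#1099;&#1090;&#1072;&#1085;&#1080;&#1077;';

// An explicit candidate list plus strict mode avoids depending on the
// server's configured detection order.
$candidates = ['ASCII', 'UTF-8'];

echo mb_detect_encoding($raw, $candidates, true), "\n";
echo mb_detect_encoding($encoded, $candidates, true), "\n";
```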

Is there a way that I can tell if data entered is ASCII or not BEFORE it
is converted? With the example above, I would want this test to fail
(not return ASCII). Thanks in advance.

#2
August 10th, 2006, 06:05 AM
Tim Hunt

re: problem with mb_detect_encoding

The form data is converted into HTML entities on the client side, before
PHP receives it. Convert the HTML entities back into a string using
html_entity_decode().
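
A rough sketch of that decoding step. The sample input is a stand-in for what the server might receive, and the variable names are made up for illustration. The target charset is passed explicitly because PHP versions before 5.4 default to ISO-8859-1, which cannot represent Cyrillic:

```php
<?php
// Hypothetical sample of a submitted value for 'испытание',
// written as decimal numeric character references.
$submitted = '&#1080;&#1089;&#1087;&#1099;&#1090;&#1072;&#1085;&#1080;&#1077;';

// Decode the entities into real UTF-8 characters.
$decoded = html_entity_decode($submitted, ENT_QUOTES, 'UTF-8');

echo $decoded, "\n";

// The decoded string now contains bytes above 0x7F, so a byte-level
// ASCII check can tell it apart from plain English input.
echo strlen($submitted), ' bytes before, ', strlen($decoded), " bytes after\n";
```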

Even then, mb_detect_encoding() might not work; the user notes in the
PHP manual aren't encouraging, anyway. Someone there gave a regular
expression for detecting UTF-8 that can be adapted.

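I think preg_match( '/[^\x09\x0A\x0D\x20-\x7E]/xs',
html_entity_decode($_POST['var']) ) will work: it returns 1 when the
decoded string contains any byte other than tab, line feed, carriage
return, or printable ASCII.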
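
One way this check might be wrapped up, as a sketch only. The function name is_plain_ascii() is hypothetical, and the explicit 'UTF-8' argument is an assumption added here. Note that the character class also rejects control characters such as NUL, so it is slightly stricter than the 0-127 range in the question:

```php
<?php
/**
 * Hypothetical helper (not from the thread): true when the input, after
 * entity decoding, holds only tab, LF, CR and printable ASCII bytes.
 */
function is_plain_ascii(string $input): bool
{
    // Turn numeric and named entities back into real characters first.
    $decoded = html_entity_decode($input, ENT_QUOTES, 'UTF-8');

    // Without the /u modifier the pattern is matched byte by byte, so any
    // byte of a multi-byte UTF-8 character falls outside the allowed set.
    return preg_match('/[^\x09\x0A\x0D\x20-\x7E]/', $decoded) === 0;
}

var_dump(is_plain_ascii('test'));
var_dump(is_plain_ascii('&#1080;&#1089;&#1087;&#1099;&#1090;&#1072;&#1085;&#1080;&#1077;'));
var_dump(is_plain_ascii('испытание'));
```

With a form, the call would be is_plain_ascii($_POST['var']).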

Tim Hunt
