ng**********@rediffmail.com (R. Rajesh Jeba Anbiah) wrote in message news:<ab**************************@posting.google.com>...

Polite cross-post to comp.programming. The original thread was in
comp.lang.php <http://groups.google.com/groups?threadm=abc4d8b8.0406090601.66a05a58%40posting.google.com>
I am posting here to get some ideas on the logic.

<Previous Post>

ng**********@rediffmail.com (R. Rajesh Jeba Anbiah) wrote in message news:<ab**************************@posting.google.com>...
I have found some nice code to convert ordinal values to UTF-8 (code2utf(),
<http://in2.php.net/utf8_encode#34214>), but I couldn't find the
reverse of it. The uniord() function found in the user notes
<http://in.php.net/ord#42778> comes close, but it uses mbstring.
Does anyone have an idea how to do it without mbstring? TIA

</Previous Post>

So, I'm looking for a way to get the ordinal value (code point) of a
given Unicode character. For example, if we pass the character 'A' to
the function, it has to return 65; if we pass 'க', it has to return
0x0B95; and so on.

Any help on logic or concept is highly appreciated. TIA
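To show why plain ord() is not enough here, the following is a small sketch that assumes the character arrives as a UTF-8 encoded string. The variable name $ka is only for illustration; the byte values are the UTF-8 encoding of U+0B95.

```php
<?php
// 'க' (U+0B95) written out as its three UTF-8 bytes, so the example does
// not depend on the encoding this file is saved in.
$ka = "\xE0\xAE\x95";

echo strlen($ka), "\n";   // 3: PHP strings are byte strings
echo ord($ka), "\n";      // 224: ord() only looks at the first byte
echo bin2hex($ka), "\n";  // e0ae95
```

So the function being asked for has to combine all three bytes to arrive at 0x0B95.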

Never mind. After scratching my head last night, I found that this
is nothing but a UTF-8 to Unicode code point conversion. I found a nice
algorithm at <http://www1.tip.nl/~t876506/utf8tbl.html> and, based on
that, have written some code which seems to work fine.

<?php

//http://www1.tip.nl/~t876506/utf8tbl.html

//ordinal value for given Unicode character (in UTF-8)

//Logic: UTF-8 to Unicode conversion

function uniord($c)
{
    // Stays 0 if the first byte is 128 - 191 (a continuation byte),
    // because none of the ranges below match it.
    $ud = 0;
    $z = ord($c[0]); // value of the first (lead) byte
    if ($z >= 0 && $z <= 127) {
        // If z is between and including 0 - 127, then there is 1 byte z.
        // The decimal Unicode value ud = the value of z.
        $ud = $z;
    }
    if ($z >= 192 && $z <= 223) {
        // If z is between and including 192 - 223, then there are 2 bytes
        // z y; ud = (z-192)*64 + (y-128)
        $ud = ($z-192)*64 + (ord($c[1])-128);
    }
    if ($z >= 224 && $z <= 239) {
        // If z is between and including 224 - 239, then there are 3 bytes
        // z y x; ud = (z-224)*4096 + (y-128)*64 + (x-128)
        $ud = ($z-224)*4096 + (ord($c[1])-128)*64 + (ord($c[2])-128);
    }
    if ($z >= 240 && $z <= 247) {
        // If z is between and including 240 - 247, then there are 4 bytes
        // z y x w; ud = (z-240)*262144 + (y-128)*4096 + (x-128)*64 + (w-128)
        $ud = ($z-240)*262144 + (ord($c[1])-128)*4096 +
              (ord($c[2])-128)*64 + (ord($c[3])-128);
    }
    if ($z >= 248 && $z <= 251) {
        // If z is between and including 248 - 251, then there are 5 bytes
        // z y x w v; ud = (z-248)*16777216 + (y-128)*262144 + (x-128)*4096 +
        // (w-128)*64 + (v-128)
        $ud = ($z-248)*16777216 + (ord($c[1])-128)*262144 +
              (ord($c[2])-128)*4096 + (ord($c[3])-128)*64 + (ord($c[4])-128);
    }
    if ($z >= 252 && $z <= 253) {
        // If z is 252 or 253, then there are 6 bytes z y x w v u; ud =
        // (z-252)*1073741824 + (y-128)*16777216 + (x-128)*262144 +
        // (w-128)*4096 + (v-128)*64 + (u-128)
        $ud = ($z-252)*1073741824 + (ord($c[1])-128)*16777216 +
              (ord($c[2])-128)*262144 + (ord($c[3])-128)*4096 +
              (ord($c[4])-128)*64 + (ord($c[5])-128);
    }
    if ($z >= 254 && $z <= 255) {
        // If z = 254 or 255 then there is something wrong!
        // die('Error');
        $ud = false;
    }
    return $ud;
}

?>
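The same table can also be expressed with bit masks instead of subtraction and multiplication. What follows is only a sketch of that idea, not the code from the page linked above: the name uniord_bits() is hypothetical, it stops at 4-byte sequences (the limit set by RFC 3629), it returns false for a stray continuation byte or a truncated sequence, and it does not reject overlong encodings.

```php
<?php
// Hypothetical helper (not part of the original post): decode the first
// UTF-8 character of $s to its code point using bit masks.
function uniord_bits($s)
{
    if ($s === '') {
        return false;                    // nothing to decode
    }
    $z = ord($s[0]);
    if ($z < 0x80) {
        return $z;                       // 0xxxxxxx: plain ASCII
    } elseif ($z >= 0xC0 && $z <= 0xDF) {
        $len = 2; $cp = $z & 0x1F;       // 110xxxxx
    } elseif ($z >= 0xE0 && $z <= 0xEF) {
        $len = 3; $cp = $z & 0x0F;       // 1110xxxx
    } elseif ($z >= 0xF0 && $z <= 0xF7) {
        $len = 4; $cp = $z & 0x07;       // 11110xxx
    } else {
        return false;                    // continuation byte or unsupported lead byte
    }
    if (strlen($s) < $len) {
        return false;                    // truncated sequence
    }
    for ($i = 1; $i < $len; $i++) {
        $b = ord($s[$i]);
        if (($b & 0xC0) !== 0x80) {
            return false;                // not a 10xxxxxx continuation byte
        }
        $cp = ($cp << 6) | ($b & 0x3F);  // append the 6 payload bits
    }
    return $cp;
}

var_dump(uniord_bits("A"));              // int(65)
var_dump(uniord_bits("\xE0\xAE\x95"));   // int(2965), i.e. 0x0B95
var_dump(uniord_bits("\x95"));           // bool(false)
```

Masking with 0x3F and shifting left by 6 is equivalent to the (y-128)*64 arithmetic in the table, as long as every trailing byte really is in the range 128 - 191, which this version checks explicitly.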

--

| Just another PHP saint |

Email: rrjanbiah-at-Y!com