chinese and arrays

Kobi Lurie
#1: Jul 17 '05

Hello all,
I'm trying to write a simple, beginner-level script that uses only
functions.

It uses:
file_get_contents
substr
(each piece of text that substr returns goes into an array)
then
array_count_values
and a sort by value

The text is Chinese. After it has been put into the array, or maybe
already in file_get_contents, I think it is no longer Chinese but has
been converted somehow.

Does anybody know how to deal with this?
Do I need to convert the text first, or do something else to it?

I echo the results to the screen, but I can also write them to a file.
The output doesn't look like Chinese.
Any help is appreciated. Thanks in advance, Kobi.
You can email me directly.

Henk Verhoeven
#2: Jul 17 '05

Hi Kobi,

I do not know how Chinese is represented in the bytes of your file, but
I guess that file_get_contents and substr work on the bytes, not on the
Chinese characters/signs. So to use substr you need to use byte indexes.
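
To illustrate the byte-versus-character point, here is a minimal sketch. It assumes the text is UTF-8 and that the mbstring extension is available; neither is stated in this thread, and mb_strlen / mb_substr are an addition to the functions mentioned so far.

```php
<?php
// Assumption: UTF-8 text and the mbstring extension (not mentioned in the thread).
// "\xE4\xB8\xAD" and "\xE6\x96\x87" are the UTF-8 bytes of two Chinese characters.
$text = "\xE4\xB8\xAD\xE6\x96\x87";

// strlen() and substr() count bytes; each character here takes 3 bytes.
$byteCount = strlen($text);
$firstByte = substr($text, 0, 1);   // a fragment of a character, not valid text

// mb_strlen() and mb_substr() count characters in the given encoding.
$charCount = mb_strlen($text, 'UTF-8');
$firstChar = mb_substr($text, 0, 1, 'UTF-8');

echo $byteCount, " bytes, ", $charCount, " characters\n";
echo bin2hex($firstByte), " vs ", bin2hex($firstChar), "\n";
```

With byte indexes, substr($text, 0, 3) would return the same first character here, but only because every character in this sample happens to be three bytes long.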

Once you have the correct substrings, putting them into an array should
not change anything.
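
The steps from the original post could be sketched roughly like this. It is not the original poster's code: count_characters is a hypothetical helper, the text is assumed to be UTF-8, and preg_split with the "u" modifier stands in for the repeated substr calls. Note that it counts every character, including spaces and line breaks.

```php
<?php
// Hypothetical helper, assuming UTF-8 input; not the original poster's code.
function count_characters(string $text): array
{
    // The "u" modifier makes preg_split treat $text as UTF-8, so the empty
    // pattern splits between characters rather than between bytes.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return []; // input was not valid UTF-8
    }
    $counts = array_count_values($chars); // character => number of occurrences
    arsort($counts);                      // highest count first, keys preserved
    return $counts;
}

// "input.txt" is a placeholder file name:
// $counts = count_characters(file_get_contents('input.txt'));
$counts = count_characters("\xE4\xB8\xAD\xE6\x96\x87\xE4\xB8\xAD");
foreach ($counts as $char => $count) {
    echo $char, ": ", $count, "\n";
}
```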

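I guess sorting by the Chinese characters themselves will not work,
except with uksort and a custom comparison function (after
array_count_values the characters are the keys; sorting by the counts is
an ordinary numeric sort and should be fine). To write a comparison
function you need to know how to compare the bytes that represent your
characters/signs.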

I hope someone reacts and tells me I am wrong, that there is a locale
setting for Chinese and that it actually works properly (you may try it
with the strcoll function in your string comparison function). But I am
not optimistic, given the mess I got myself into with European numbers,
dates, automatic type conversion and MySQL. My solution was to use US
locale settings and US numbers and dates in literals, and to code the
conversions myself in the user interface. But I admit that may be
substantially more work with Chinese than with Dutch...

For what it is worth, a link to the setlocale function in the manual:
http://www.php.net/manual/en/function.setlocale.php (sorry for the
English; there were three kinds of Chinese, and
http://www.php.net/manual/zh/function.setlocale.php does not look very
Chinese anyhow)
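
A sketch of the setlocale / strcoll idea, for anyone who wants to try it. sort_keys_by_locale is a hypothetical helper, and the locale names in the example call are assumptions: whether any of them is installed depends on the system, so the code falls back to a plain byte comparison when setlocale fails.

```php
<?php
// Hypothetical helper. The locale names passed in are assumptions: whether
// any of them is installed depends on the system.
function sort_keys_by_locale(array $counts, array $locales): array
{
    $previous = setlocale(LC_COLLATE, '0');    // "0" only reads the current locale
    $active = setlocale(LC_COLLATE, $locales); // false if none is available

    uksort($counts, function ($a, $b) use ($active) {
        // Array keys that look like integers come back as int, so cast first.
        $a = (string) $a;
        $b = (string) $b;
        return $active === false ? strcmp($a, $b) : strcoll($a, $b);
    });

    if ($previous !== false) {
        setlocale(LC_COLLATE, $previous);      // restore the earlier setting
    }
    return $counts;
}

$sorted = sort_keys_by_locale(['b' => 1, 'a' => 2], ['zh_CN.UTF-8', 'zh_CN.utf8']);
print_r($sorted);
```

Whether the resulting order is what a Chinese reader would expect depends entirely on the collation rules of the installed locale, so this is something to test rather than rely on.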

Greetings,

Henk Verhoeven,
www.phppeanuts.org

Henk Verhoeven
#3: Jul 17 '05

Kobi,

I came across another function that may be relevant to your problem:

htmlentities ( string string [, int quote_style [, string charset]])

The third parameter, charset, can be set to:
BIG5: Traditional Chinese, mainly used in Taiwan.
GB2312: Simplified Chinese, national standard character set.
BIG5-HKSCS: Big5 with Hong Kong extensions, Traditional Chinese.

see http://www.php.net/manual/en/function.htmlentities.php

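Probably your file contains Chinese text in one of these encodings (or
in Unicode), while the browser has to be told which encoding it is
getting; otherwise it shows the bytes as something else. As far as I can
tell, htmlentities does not convert between encodings: the charset
parameter only tells it how to read the bytes of the string, so it has
to match the encoding of your file.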
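
One possible way to get such text to the browser intact is sketched below. It is an assumption-heavy example, not something confirmed in this thread: to_utf8_html is a hypothetical helper, it relies on the mbstring extension, it uses htmlspecialchars rather than htmlentities, and it assumes the source file is GB2312 (which mbstring names EUC-CN) and that you want to send UTF-8.

```php
<?php
// Hypothetical helper; assumes the mbstring extension is loaded.
function to_utf8_html(string $raw, string $sourceEncoding): string
{
    // Convert the file's bytes to UTF-8 ...
    $utf8 = mb_convert_encoding($raw, 'UTF-8', $sourceEncoding);
    // ... then escape only the HTML special characters (&, <, >, quotes).
    return htmlspecialchars($utf8, ENT_QUOTES, 'UTF-8');
}

// Tell the browser which encoding follows (skipped if output already started).
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

// "\xD6\xD0" is one Chinese character in GB2312; mbstring calls that encoding EUC-CN.
$html = to_utf8_html("<\xD6\xD0>", 'EUC-CN');
echo $html, "\n";
```

If the file is already in the encoding you want to send, the conversion step can be dropped; what matters is that the charset in the Content-Type header matches the bytes actually sent.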

html_entity_decode ( string string [, int quote_style [, string charset]])

will do the opposite: decode from HTML back to a normal string that you
can put in a file.
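
A small sketch of what the two functions do with a Chinese character, assuming UTF-8 text (the sample strings are made up for illustration):

```php
<?php
// Assumption: UTF-8 text. "\xE4\xB8\xAD" is one Chinese character (U+4E2D).
$char = "\xE4\xB8\xAD";

// There is no named entity for this character, so htmlentities leaves it as is
// and only escapes the ampersand.
$encoded = htmlentities($char . ' & co', ENT_QUOTES, 'UTF-8');

// html_entity_decode turns entities, including numeric ones, back into characters.
$decoded = html_entity_decode('&#20013; &amp; co', ENT_QUOTES, 'UTF-8');

echo $encoded, "\n", $decoded, "\n";
```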

I hope this helps.

Greetings,

Henk Verhoeven,
www.phpPeanuts.org.

