
Chinese and arrays


Hello all,
I'm trying to write a simple, beginner-level script
that uses just functions.

It does the following, in order:
file_get_contents
substr
put the text that substr returned into an array
array_count_values
sort by value

The text is Chinese. After it has been put into an array, or
maybe already in file_get_contents, I think it is no longer Chinese
but has been converted somehow.

Does anybody know how to deal with this?
Do I need to convert the text first, or do something else?

I echo the results to the screen, but I could also write them to a file.
The output doesn't look like Chinese.
Any help is appreciated. Thanks in advance, Kobi.
You can email me directly.
Jul 17 '05 #1


Hi Kobi,

I do not know how Chinese is represented in the bytes of your file, but
I guess that file_get_contents and substr work on bytes, not on
Chinese characters. So to use substr you need to use byte indexes.
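To make the byte-versus-character point concrete, here is a minimal sketch of the counting script. It assumes the file is UTF-8, where most Chinese characters take three bytes, so substr with a length of 1 returns a fragment of a character. The function name count_characters and the sample string are made up for illustration. A file in Big5 or GB2312 would need converting to UTF-8 first.

```php
<?php
// Assumption: the text is UTF-8 and this script file is saved as UTF-8.

/**
 * Split a UTF-8 string into characters (not bytes) and count them.
 * Returns character => count, most frequent first.
 */
function count_characters(string $text): array
{
    // The /u modifier makes PCRE treat the string as UTF-8, so the empty
    // pattern splits between characters instead of between bytes.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return []; // the input was not valid UTF-8
    }
    $counts = array_count_values($chars);
    arsort($counts); // sort by count, descending, keeping the keys
    return $counts;
}

$text = '中文中文字'; // stand-in for file_get_contents('input.txt')
$counts = count_characters($text);
foreach ($counts as $char => $count) {
    echo $char, ' ', $count, "\n";
}
```

Note that whitespace and punctuation are counted as characters too. If the mbstring extension is available, mb_substr is the character-aware counterpart of substr.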

Once you have the correct substrings, putting them into an array should
not change anything.

I guess sorting the array will not work with Chinese, except with uksort
and a custom comparison function. To write a comparison function you
need to know how to compare the bytes that represent your characters.
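As a rough sketch of such a comparison function, assuming the array keys are UTF-8 strings (as they would be after array_count_values on characters from a UTF-8 file): a plain byte comparison with strcmp orders UTF-8 strings by code point. That order is consistent, but it is not a dictionary order such as pinyin or stroke count. The function name compare_utf8_bytes and the sample data are hypothetical.

```php
<?php
// Assumption: keys are UTF-8 strings and this file is saved as UTF-8.

// Hypothetical comparison function: compares the raw bytes. For UTF-8
// this is the same as comparing Unicode code points.
function compare_utf8_bytes(string $a, string $b): int
{
    return strcmp($a, $b);
}

$counts = ['文' => 2, '中' => 3, '字' => 1];

// uksort sorts by key using the callback and keeps the values attached.
uksort($counts, 'compare_utf8_bytes');
print_r($counts);
```

To sort by the counts rather than by the characters, no custom function is needed: asort and arsort compare the integer values directly.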

I hope someone reacts and tells me I am wrong, that there is a locale
setting for Chinese and that it actually works properly (you may try it
with the strcoll function in your string comparison function). But I am
not optimistic, given the mess I got myself into with European numbers,
dates, automatic type conversion and MySQL. My solution was to use US
locale settings and US numbers and dates in literals, and to code the
conversions myself in the user interface. But I admit that may be
substantially more work with Chinese than with Dutch...
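If you want to try the locale route, something like the following sketch may be a starting point. The locale names 'zh_CN.UTF-8' and 'zh_CN.utf8' are only examples: which locales exist depends on the operating system, and setlocale returns false when none of the requested ones is installed. The function name sort_with_locale is made up for illustration.

```php
<?php
// Assumption: the locale names below are examples and may not be installed.

function sort_with_locale(array $words): array
{
    // Passing "0" changes nothing and returns the current setting.
    $previous = setlocale(LC_COLLATE, '0');

    // Try each candidate in turn; false means none could be set.
    $locale = setlocale(LC_COLLATE, ['zh_CN.UTF-8', 'zh_CN.utf8']);
    if ($locale === false) {
        sort($words, SORT_STRING);        // fallback: plain byte order
    } else {
        usort($words, 'strcoll');         // comparison using the locale
        setlocale(LC_COLLATE, $previous); // restore the earlier setting
    }
    return $words;
}

print_r(sort_with_locale(['文', '中', '字']));
```

The resulting order depends on the collation data the system ships for that locale, so it is worth checking the output against a few words whose order you know.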

For what it is worth, here is a link to the setlocale function in the manual:
http://www.php.net/manual/en/function.setlocale.php (sorry for the
English; there were three kinds of Chinese, and
http://www.php.net/manual/zh/function.setlocale.php does not look very
Chinese anyhow)

Greetings,

Henk Verhoeven,
www.phppeanuts.org


Jul 17 '05 #2

Kobi,

I came across another function that may be relevant to your problem:

htmlentities ( string string [, int quote_style [, string charset]])

The third parameter, charset, can be set to:
BIG5: Traditional Chinese, mainly used in Taiwan.
GB2312: Simplified Chinese, national standard character set.
BIG5-HKSCS: Big5 with Hong Kong extensions, Traditional Chinese.

See http://www.php.net/manual/en/function.htmlentities.php

Probably your file contains a normally encoded Chinese string, while the
browser needs it to be encoded for one of the above variants of Chinese
HTML. This is what htmlentities does (if you use the right charset
parameter).

html_entity_decode ( string string [, int quote_style [, string charset]])

will do the opposite: decode from HTML to a normal string you can put in a
file.
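A small round trip with the two functions might look like the sketch below. It assumes the text is UTF-8; 'BIG5' or 'GB2312' could be passed instead if the file really uses one of those encodings. The sample string is made up. Note that htmlentities only replaces characters that have an HTML entity, so the markup characters change while the Chinese characters pass through as they are.

```php
<?php
// Assumption: the text is UTF-8 and this file is saved as UTF-8.
$charset = 'UTF-8';
$text = '<b>中文</b>';

// Encode for inclusion in an HTML page: < and > become &lt; and &gt;.
$html = htmlentities($text, ENT_QUOTES, $charset);
echo $html, "\n";

// Decode back to the plain string, for example before writing to a file.
$plain = html_entity_decode($html, ENT_QUOTES, $charset);

// Numeric entities decode to characters as well: &#20013; is U+4E2D.
$zhong = html_entity_decode('&#20013;', ENT_QUOTES, $charset);
echo $zhong, "\n";
```

The browser still has to be told which encoding the page uses, for instance through the charset in the Content-Type header. Otherwise correctly encoded bytes can still be displayed as the wrong characters.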

I hope this helps.

Greetings,

Henk Verhoeven,
www.phpPeanuts.org.


Jul 17 '05 #3
