Hi Kobi,
I do not know how Chinese is represented in the bytes of your file, but
I guess that file_get_contents and substr work on the bytes, not on the
Chinese characters/signs, and one character may take more than one byte.
So to use substr you need to use byte offsets.
Once you have the correct substrings, putting them into an array should
not change anything.
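A rough sketch of the byte/character difference, under the assumption that
your file is UTF-8 (I do not know that it is). The literal string stands in
for the result of file_get_contents, and the variable names are made up:

```php
<?php
// Assumption: the text is UTF-8 encoded. Other encodings (GBK, Big5) need other handling.
$text = "中文中";  // stand-in for file_get_contents() on your file

// strlen() and substr() count bytes; each of these characters is 3 bytes in UTF-8.
$byteLength = strlen($text);
$firstChar  = substr($text, 0, 3);  // 3 bytes = exactly one character here

// The /u modifier makes PCRE treat the string as UTF-8, so this splits
// the text into whole characters instead of single bytes.
$chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);

// Count how often each character occurs, then sort by count (highest first)
// while keeping the characters as array keys.
$counts = array_count_values($chars);
arsort($counts);

print_r($counts);
```

If the mbstring extension is installed, mb_substr with an explicit encoding
argument is another way to take substrings by character instead of by byte.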
I guess sorting the array will not work with Chinese, except with uksort
and a custom comparison function. To write a comparison function you
need to know how to compare the bytes that represent your characters/signs.
I hope someone reacts and tells me I am wrong, that there is a locale
setting for Chinese and that it actually works properly (you may try it
with the strcoll function in your string comparison function). But I am
not optimistic, given the mess I got myself into with European numbers,
dates, automatic type conversion and MySQL. My solution was to use US
locale settings, US numbers and dates in literals, and code the
conversions myself in the user interface. But I admit that may be
substantially more work with Chinese than with Dutch...
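In case you want to try the locale route, something like this might do as a
starting point. The locale name 'zh_CN.UTF-8' is only my guess; it has to be
installed on your system, and setlocale returns false if it is not. The
sample array stands in for the output of array_count_values:

```php
<?php
// Hypothetical sample data: characters as keys, counts as values.
$counts = array('文' => 1, '中' => 2, '字' => 1);

// Assumption: 'zh_CN.UTF-8' is a guessed locale name, not a guaranteed one.
$locale = setlocale(LC_COLLATE, 'zh_CN.UTF-8');

if ($locale === false) {
    // No such locale on this system: fall back to plain byte-wise comparison.
    uksort($counts, 'strcmp');
} else {
    // strcoll() compares two strings using the current LC_COLLATE locale.
    uksort($counts, 'strcoll');
}

print_r($counts);
```

uksort sorts by key, which fits here because the characters are the keys.
What order the locale actually produces I cannot tell you, so check the
result against what you expect.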
For what it is worth, a link to the setlocale function in the manual:
http://www.php.net/manual/en/function.setlocale.php (sorry for the
English; there were three kinds of Chinese, and
http://www.php.net/manual/zh/function.setlocale.php does not look very
Chinese anyhow).
Greetings,
Henk Verhoeven,
www.phppeanuts.org
Kobi Lurie wrote:
Hello all,
I'm trying to make a simple script
beginner level script, with just functions.
it uses the functions:
file_get_contents
substr
taking into an array the text substr took
then
array_count_values
and sort by value
the text used is chinese text, and after it is taken into an array or
maybe even in file_get_contents, I think it's no longer chinese
but converted somehow.
anybody knows how to deal with this?
do i need to convert before, or perform something?
I use echo to screen, but can also write to file the results.
it doesn't look like chinese.
any help is appreciated. thanks in advance, kobi.
you can email me directly