Chinese and arrays


Hello all,
I'm trying to write a simple, beginner-level script
that uses only built-in functions.

It uses these steps:
file_get_contents to read the text
substr to cut out pieces of it
an array to collect the pieces substr returned
array_count_values to count them
a sort by value

The text is Chinese, and after it is put into the array, or
maybe already in file_get_contents, I think it is no longer Chinese
but has been converted somehow.

Does anybody know how to deal with this?
Do I need to convert the text first, or do something else?

I echo the results to the screen, but I can also write them to a file.
The output doesn't look like Chinese.
Any help is appreciated. Thanks in advance, Kobi.
You can email me directly.
Jul 17 '05 #1
Hi Kobi,

I do not know how Chinese is represented in the bytes of your file, but
I guess that file_get_contents and substr work on bytes, not on
Chinese characters. So with substr you need to use byte offsets, and a
single character may take up more than one byte.

Once you have the correct substrings, putting them into an array should
not change anything.
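
A minimal sketch of the byte-versus-character point, assuming the file is UTF-8 (the thread never establishes the encoding) and that the goal is to count single characters. count_characters is a hypothetical helper name, and preg_split with the /u modifier stands in for substr so that the split happens between characters instead of between bytes:

```php
<?php
// Assumption: this source file and the text are UTF-8. For BIG5 or GB2312
// input, one option is to convert first, for example with
// mb_convert_encoding($text, 'UTF-8', 'BIG5') from the mbstring extension.

/**
 * Count how often each character occurs in a UTF-8 string.
 * count_characters() is a hypothetical helper, not a PHP built-in.
 */
function count_characters(string $text): array
{
    // The /u modifier makes PCRE treat the subject as UTF-8, so the empty
    // pattern splits between characters rather than between bytes.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return []; // the input was not valid UTF-8
    }
    $counts = array_count_values($chars);
    arsort($counts); // sort by count, highest first, keeping the keys
    return $counts;
}

// In the real script this would come from file_get_contents().
$text = "中文中文字";

// Five characters, but strlen() counts bytes and reports 15.
$byteLength = strlen($text);

$counts = count_characters($text);
print_r($counts);
```

If the mbstring extension is available, mb_substr($text, $i, 1, 'UTF-8') is the closer drop-in replacement for substr.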

I guess sorting the array by its Chinese strings will not work out of the
box, except with a custom comparison function (uksort for keys, usort or
uasort for values). To write a comparison function you need to know how to
compare the bytes that represent your characters.
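
As a rough illustration of such a comparison function, here is a sketch that assumes the keys are UTF-8 strings. compare_utf8_bytes is a hypothetical name. In UTF-8 a plain byte comparison happens to give code point order, which is consistent but is not a dictionary order such as pinyin or stroke count:

```php
<?php
// Assumption: array keys are UTF-8 encoded characters, as produced by
// array_count_values() on a list of characters.

/**
 * Compare two UTF-8 strings byte by byte.
 * compare_utf8_bytes() is a hypothetical helper, not a PHP built-in.
 */
function compare_utf8_bytes(string $a, string $b): int
{
    // strcmp() is binary safe; for UTF-8 its result matches code point order.
    return strcmp($a, $b);
}

$counts = ['文' => 2, '中' => 3, '字' => 1];

// uksort() sorts by key with a user-supplied comparison and keeps the values.
uksort($counts, 'compare_utf8_bytes');

$sortedKeys = array_keys($counts);
print_r($sortedKeys);
```

For a real dictionary order, a locale-aware collator would be needed; the Collator class from PHP's intl extension is one candidate, if that extension is installed.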

I hope someone reacts and tells me I am wrong, that there is a locale
setting for Chinese and that it actually works properly (you could try it
with the strcoll function in your string comparison function). But I am
not optimistic, given the mess I got myself into with European numbers,
dates, automatic type conversion and MySQL. My solution was to use US
locale settings and US-format numbers and dates in literals, and to code
the conversions myself in the user interface. But I admit that may be
substantially more work with Chinese than with Dutch...

For what it is worth, here is a link to the setlocale function in the manual:
http://www.php.net/manual/en/function.setlocale.php (sorry for the
English; there were three kinds of Chinese, and
http://www.php.net/manual/zh/function.setlocale.php does not look very
Chinese anyhow).
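
The strcoll idea could be tried along these lines. This is only a sketch: the locale names are assumptions that depend on what is installed on the server, and whether the resulting order is useful depends on the system's collation data, so the code falls back to a plain byte-wise sort when no Chinese locale is found:

```php
<?php
// Assumption: the strings are UTF-8 and the system may or may not provide
// one of these locale names. setlocale() tries each in turn and returns
// false if none of them is available.
$locale = setlocale(LC_COLLATE, 'zh_CN.UTF-8', 'zh_CN.utf8', 'zh_CN');

$words = ['文', '中', '字'];

if ($locale !== false) {
    // strcoll() compares two strings according to the current LC_COLLATE.
    usort($words, 'strcoll');
} else {
    // No Chinese locale available: fall back to byte-wise string order.
    sort($words, SORT_STRING);
}

print_r($words);
```

Either branch leaves the same three strings in $words; only their order may differ between systems.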

Greetings,

Henk Verhoeven,
www.phppeanuts.org

Jul 17 '05 #2
Kobi,

I came across another function that may be relevant to your problem:

htmlentities ( string string [, int quote_style [, string charset]])

The third parameter, charset, can be set to:
BIG5: Traditional Chinese, mainly used in Taiwan.
GB2312: Simplified Chinese, national standard character set.
BIG5-HKSCS: Big5 with Hong Kong extensions, Traditional Chinese.

See http://www.php.net/manual/en/function.htmlentities.php

Probably your file contains a Chinese string in one of these encodings,
while the browser is not interpreting it as such. htmlentities may help
here if you use the right charset parameter, since that parameter tells
it which encoding the bytes of the input string are in.

html_entity_decode ( string string [, int quote_style [, string charset]])

will do the opposite: decode from HTML back to a normal string you can put
in a file.
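
A small sketch of how the charset parameter might be used, assuming both the page and the text are UTF-8 (substitute 'BIG5' or 'GB2312' in both places if that is what the file really contains). It uses htmlspecialchars, the narrower sibling of htmlentities, and also sends a Content-Type header, on the assumption that the browser needs to be told the encoding; neither choice comes from the thread itself:

```php
<?php
// Assumption: this source file, the text, and the page are all UTF-8.
$charset = 'UTF-8';

if (!headers_sent()) {
    // Tell the browser how to decode the bytes that follow.
    header('Content-Type: text/html; charset=' . $charset);
}

$text = '中文 <b>&</b>';

// Escape only the HTML special characters; with the matching charset the
// Chinese characters pass through unchanged.
$escaped = htmlspecialchars($text, ENT_QUOTES, $charset);
echo $escaped, "\n";

// html_entity_decode() reverses the escaping, giving back the plain string.
$decoded = html_entity_decode($escaped, ENT_QUOTES, $charset);
```

If the charset passed here does not match the real encoding of the bytes, the output can still come out garbled, so it is worth checking the file's encoding first.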

I hope this helps.

Greetings,

Henk Verhoeven,
www.phpPeanuts.org.
Jul 17 '05 #3