chinese and arrays


Hello all,
I'm trying to write a simple, beginner-level script using just functions.

It uses the functions:
file_get_contents
substr
an array that collects the text substr took
then
array_count_values
and a sort by value

The text used is Chinese, and after it is taken into an array, or
maybe even in file_get_contents, I think it's no longer Chinese
but has been converted somehow.

Does anybody know how to deal with this?
Do I need to convert the text first, or do something else to it?

I use echo to print to the screen, but I can also write the results to a file.
It doesn't look like Chinese.
Any help is appreciated. Thanks in advance, Kobi.
you can email me directly
Jul 17 '05 #1
Hi Kobi,

I do not know how Chinese is represented in the bytes of your file, but
I guess file_get_contents and substr work on the bytes, not on the
Chinese characters/signs. So to use substr you need to use byte indexes.
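
The byte-versus-character point can be sketched as follows. This is only an illustration under one assumption: that the text is UTF-8, which the thread does not confirm. The sample string and the variable names are made up; in the real script the text would come from file_get_contents.

```php
<?php
// Assumption: the text is UTF-8. In the real script it would come from
// file_get_contents() on the poster's file rather than a literal.
$text = "中文中";

// substr() counts bytes. Each of these characters takes 3 bytes in UTF-8,
// so a 1-byte slice is only a fragment of a character.
$fragment = substr($text, 0, 1);

// A 3-byte slice happens to line up with the first character here.
$firstByBytes = substr($text, 0, 3);

// preg_split() with the /u modifier treats the string as UTF-8 and
// splits it into whole characters, whatever their byte length.
$chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);

echo strlen($text), " bytes, ", count($chars), " characters\n";
```

Echoing $fragment would print a broken character, which may be what "no longer Chinese" looks like on screen. If the file turns out to be Big5 or GB2312 instead, the byte widths differ and the /u split above does not apply.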

Once you have the correct substrings, putting them into an array should
not change anything.

I guess sorting the array will not work with Chinese, except for uksort
with a custom comparison function. To write a comparison function you
need to know how to compare the bytes that represent your characters/signs.
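
A minimal sketch of the counting and sorting steps, assuming the characters are UTF-8 and have already been split into whole characters (the sample array is made up). It uses array_key_first, which needs PHP 7.3 or later, and plain strcmp as the comparison function, which is a stand-in and not a proper Chinese collation.

```php
<?php
// Assumption: $chars already holds whole UTF-8 characters, one per element.
$chars = ['中', '文', '中', '字', '中', '文'];

// array_count_values() uses each character as a key and its count as the value.
$counts = array_count_values($chars);

// Sorting by value compares integers, so the encoding does not matter here.
arsort($counts);
$mostFrequent = array_key_first($counts);

// Sorting by key compares the characters themselves. strcmp() compares bytes,
// which for UTF-8 gives code point order, not a dictionary order.
$byKey = $counts;
uksort($byKey, 'strcmp');

foreach ($counts as $char => $count) {
    echo $char, ' ', $count, "\n";
}
```

Since the original question only asks for a sort by value, arsort on the counts may be all that is needed; the uksort part matters only if the characters themselves have to be ordered.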

I hope someone reacts and tells me I am wrong: that there is a locale
setting for Chinese and that it actually works properly (you may try it
with the strcoll function in your string comparison function). But I am
not optimistic, given the mess I got myself into with European numbers,
dates, automatic type conversion and MySQL. My solution was to use US
locale settings, US numbers and dates in literals, and code the
conversions myself in the user interface. But I admit, that may be
substantially more work with Chinese than with Dutch...

For what it is worth, here is a link to the setlocale function in the manual:
http://www.php.net/manual/en/function.setlocale.php (sorry for the
English: there were three kinds of Chinese, and
http://www.php.net/manual/zh/function.setlocale.php does not look very
Chinese anyhow).

Greetings,

Henk Verhoeven,
www.phppeanuts.org

Jul 17 '05 #2
Kobi,

I came across another function that may be relevant to your problem:

htmlentities ( string string [, int quote_style [, string charset]])

The third parameter, charset, can be set to:
BIG5        Traditional Chinese, mainly used in Taiwan.
GB2312      Simplified Chinese, national standard character set.
BIG5-HKSCS  Big5 with Hong Kong extensions, Traditional Chinese.

See http://www.php.net/manual/en/function.htmlentities.php

Probably your file contains a normal Chinese-encoded string, while the
browser needs it to be encoded for one of the above variants of Chinese
HTML. This is what htmlentities does (if you use the right charset
parameter).

html_entity_decode ( string string [, int quote_style [, string charset]])

will do the opposite: decode from HTML to a normal string you can put in a
file.
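
A small round-trip sketch of the two functions, assuming the string is UTF-8 (for a Big5 or GB2312 file, that charset name would go in the third argument instead). The sample string is made up. Note that with a matching charset htmlentities leaves the Chinese characters as they are and only escapes the markup characters, so the browser still has to be told the page encoding, for example through the Content-Type header.

```php
<?php
// Assumption: the string is UTF-8; swap in 'BIG5' or 'GB2312' as the
// third argument if the file uses one of those encodings.
$raw = '<b>中文</b> & more';

// Escape the characters that are special in HTML (<, >, &, quotes).
$encoded = htmlentities($raw, ENT_QUOTES, 'UTF-8');

// Decode back to the plain string, e.g. before writing it to a file.
$decoded = html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');

echo $encoded, "\n";
```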

I hope this helps.

Greetings,

Henk Verhoeven,
www.phpPeanuts.org.
Jul 17 '05 #3
