Hi, everyone:
Have you any ideas?
Say whatever you know about this.
Thanks.
many_years_after wrote:
Hi,everyone:
Have you any ideas?
Say whatever you know about this.
Perhaps you had better explain what you mean by "ascii code of Chinese
characters". Chinese characters ("hanzi") can be represented in many
ways on a computer, in Unicode as well as many different "legacy"
encodings, such as GB, GBK, big5, two different 4-digit telegraph
codes, etc etc. They can also be spelled out in "roman" letters with or
without tone indications (digits or "accents") in the pinyin system --
is that what you mean by "ascii code"?
Perhaps you might like to tell us what you want to do in Python with
hanzi and "ascii codes", so that we can give you a specific answer.
With examples, please -- like what are the "ascii codes" for the two
characters in the common greeting that comes across in toneless pinyin
as "ni hao"?
Cheers,
John
Philippe Martin wrote:
many_years_after wrote:
>Hi,everyone:
Have you any ideas?
Say whatever you know about this.
thanks.
Hi,
You mean Unicode, I assume: http://www.rikai.com/library/kanjita....unicode.shtml
Regards,
Philippe
Hi,
I have received a personal email on this:
Kanji is indeed a Japanese subset of the Chinese character set.
I just thought it would be relevant as it includes ~47000 characters.
If I hurt anyone's feelings, sorry.
Regards,
Philippe
Hi:
What I want to do is just to generate numbers when people input some Chinese
characters (hanzi, I mean). The same character should produce the same
number. So I think ASCII code can do this very well.
In <11**********************@i42g2000cwa.googlegroups.com>,
many_years_after wrote:
what I want to do is just to make numbers as people input some Chinese
character(hanzi,i mean).The same character will create the same
number.So I think ascii code can do this very well.
No it can't. ASCII doesn't contain Chinese characters. http://en.wikipedia.org/wiki/ASCII
Ciao,
Marc 'BlackJack' Rintsch
* many_years_after (2006-08-19 12:18 +0100)
Hi,everyone:
Have you any ideas?
Say whatever you know about this.
contradictio in adiecto
On 2006-08-19 12:42:31, Marc 'BlackJack' Rintsch wrote:
many_years_after wrote:
>what I want to do is just to make numbers as people input some Chinese character(hanzi,i mean).The same character will create the same number.So I think ascii code can do this very well.
No it can't. ASCII doesn't contain Chinese characters.
Well, ASCII can represent the Unicode numerically -- if that is what the OP
wants. For example, "U+81EC" (all ASCII) is one possible -- not very
readable though <g> -- representation of a Hanzi character (see http://www.cojak.org/index.php?funct...kup&term=81EC).
(I don't know anything about Hanzi or Mandarin... But that's Unicode, so
this works :)
Gerhard
Gerhard Fiedler wrote:
Well, ASCII can represent the Unicode numerically -- if that is what the OP
wants.
No. ASCII characters range is 0..127 while Unicode characters range is
at least 0..65535.
For example, "U+81EC" (all ASCII) is one possible -- not very
readable though <g> -- representation of a Hanzi character (see http://www.cojak.org/index.php?funct...kup&term=81EC).
U+81EC means a Unicode character which is represented by the number
0x81EC. There are some encodings defined which map Unicode sequences
to byte sequences: UTF-8 maps Unicode strings to sequences of bytes in
the range 0..255, UTF-7 maps Unicode strings to sequences of bytes in
the range 0..127. You *could* read the latter as ASCII sequences
but this is not correct.
How to do it in Python? Let chinesePhrase be a Unicode string with
Chinese content. Then
    chinesePhrase_7bit = chinesePhrase.encode('utf-7')
will produce a sequence of bytes in the range 0..127 representing
chinesePhrase and *looking like* a (meaningless) ASCII sequence.
    chinesePhrase_16bit = chinesePhrase.encode('utf-16be')
will produce a sequence with Unicode numbers packed in a byte
string in big endian order. This is probably closest to what
the OP wants.
Peter Maas, Aachen
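For readers on Python 3, where every str is already a Unicode string, the two encode calls above can be tried roughly as follows. This is a minimal sketch rather than Peter's exact code: the sample text (the "ni hao" greeting mentioned earlier in the thread) and the variable names are assumptions chosen for illustration.

```python
# Assumed sample text: the two hanzi of the greeting "ni hao".
chinese_phrase = "\u4f60\u597d"

# UTF-7 yields only bytes in the range 0..127.
utf7_bytes = chinese_phrase.encode("utf-7")

# UTF-16BE packs each of these code points into two bytes, high byte first.
utf16_bytes = chinese_phrase.encode("utf-16-be")

# Split the UTF-16BE bytes back into 16-bit numbers.
code_units = [
    int.from_bytes(utf16_bytes[i:i + 2], "big")
    for i in range(0, len(utf16_bytes), 2)
]

print(utf7_bytes)
print(utf16_bytes.hex())             # 4f60597d
print([hex(u) for u in code_units])  # ['0x4f60', '0x597d']
```

The 16-bit values printed on the last line are the Unicode numbers of the two characters, which is why the UTF-16BE form is the closer match to "a number per character".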
many_years_after wrote:
hi:
what I want to do is just to make numbers as people input some Chinese
character(hanzi,i mean).The same character will create the same
number.So I think ascii code can do this very well.
*What* characters make *what* numbers? Stop thinking and give us some
*examples*
On 2006-08-19 16:54:36, Peter Maas wrote:
Gerhard Fiedler wrote:
>Well, ASCII can represent the Unicode numerically -- if that is what the OP wants.
No. ASCII characters range is 0..127 while Unicode characters range is
at least 0..65535.
Actually, Unicode goes beyond 65535. But right in this sentence, you
represented the number 65535 with ASCII characters, so it doesn't seem to
be impossible.
>For example, "U+81EC" (all ASCII) is one possible -- not very readable though <g> -- representation of a Hanzi character (see http://www.cojak.org/index.php?funct...kup&term=81EC).
U+81EC means a Unicode character which is represented by the number
0x81EC.
Exactly. Both versions represented in ASCII right in your message :)
UTF-8 maps Unicode strings to sequences of bytes in the range 0..255,
UTF-7 maps Unicode strings to sequences of bytes in the range 0..127.
You *could* read the latter as ASCII sequences but this is not correct.
Of course not "correct". I guess the only "correct" representation is the
original Chinese character. But the OP doesn't seem to want this... so a
non-"correct" representation is necessary anyway.
How to do it in Python? Let chinesePhrase be a Unicode string with
Chinese content. Then
chinesePhrase_7bit = chinesePhrase.encode('utf-7')
will produce a sequences of bytes in the range 0..127 representing
chinesePhrase and *looking like* a (meaningless) ASCII sequence.
Actually, no. There are quite a few code positions in the range 0..127 that
don't "look like" anything (non-printable). And, as you say, this is rather
meaningless.
chinesePhrase_16bit = chinesePhrase.encode('utf-16be')
will produce a sequence with Unicode numbers packed in a byte
string in big endian order. This is probably closest to what
the OP wants.
That's what you think... but it's not really ASCII. If you want this in
ASCII, and readable, I still suggest to transform this sequence of 2-byte
values (for Chinese characters it will be 2 bytes per character) into a
sequence of something like U+81EC (or 0x81EC if you are a C fan or 81EC if
you can imply the rest)... that's where we come back to my original
suggestion :)
Gerhard
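Gerhard's suggestion of writing each character as readable ASCII text such as U+81EC could be sketched in Python 3 along these lines. The function names to_u_notation and from_u_notation are hypothetical helpers made up for this illustration, and separating the tokens with spaces is an assumption.

```python
def to_u_notation(text):
    """Return an ASCII-only 'U+XXXX' token for each character in text."""
    return " ".join("U+%04X" % ord(ch) for ch in text)


def from_u_notation(notation):
    """Rebuild the original text from space-separated 'U+XXXX' tokens."""
    return "".join(chr(int(token[2:], 16)) for token in notation.split())


encoded = to_u_notation("\u81ec")
print(encoded)                       # U+81EC
print(to_u_notation("\u4f60\u597d")) # U+4F60 U+597D
```

Because the hexadecimal digits are just the code point, the mapping is reversible and the same character always yields the same token.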
many_years_after wrote:
hi:
what I want to do is just to make numbers as people input some Chinese
character(hanzi,i mean).The same character will create the same
number.So I think ascii code can do this very well.
Possibly you have "create" upside-down. Could you possibly be talking
about an "input method", in which people type in ascii letters (and
maybe numbers) and the *result* is a Chinese character? In other words,
what *everybody* uses to input Chinese characters?
Perhaps you could ask on the Chinese Python newsgroup.
*GIVE* *EXAMPLES* of what you want to do.
John Machin wrote:
many_years_after wrote:
hi:
what I want to do is just to make numbers as people input some Chinese
character(hanzi,i mean).The same character will create the same
number.So I think ascii code can do this very well.
Possibly you have "create" upside-down. Could you possibly be talking
about an "input method", in which people type in ascii letters (and
maybe numbers) and the *result* is a Chinese character? In other words,
what *everybody* uses to input Chinese characters?
Perhaps you could ask on the Chinese Python newsgroup.
*GIVE* *EXAMPLES* of what you want to do.
Well, people may input from keyboard. They input some Chinese
characters, then, I want to create a number. The same number will be
created if they input the same Chinese characters.
"many_years_after" <sh*****@gmail.com> writes:
Well, people may input from keyboard. They input some Chinese
characters, then, I want to create a number. The same number will be
created if they input the same Chinese characters.
You seem to be looking for a hash.
<URL:http://docs.python.org/lib/module-md5>
<URL:http://docs.python.org/lib/module-sha>
If not, please tell us what your *purpose* is. It's not at all clear
from your questions what you are trying to achieve.
--
\ "I was in a bar the other night, hopping from barstool to |
`\ barstool, trying to get lucky, but there wasn't any gum under |
_o__) any of them." -- Emo Philips |
Ben Finney
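The md5 and sha modules linked above were later superseded by hashlib, so in current Python the hash idea might look roughly like this. It is only a sketch under assumptions: the function name text_to_number is hypothetical, and the choice of SHA-256 and of UTF-8 as the byte encoding is arbitrary.

```python
import hashlib


def text_to_number(text):
    """Map a string to a large integer; equal strings give equal numbers."""
    # Hash functions work on bytes, so the text is encoded first.
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


first = text_to_number("\u4f60\u597d")
second = text_to_number("\u4f60\u597d")
other = text_to_number("\u597d")
print(first == second)  # True
print(first == other)   # False
```

Unlike a code point, a hash covers a whole string at once and cannot be turned back into the characters that produced it.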
Gerhard Fiedler wrote:
>No. ASCII characters range is 0..127 while Unicode characters range is at least 0..65535.
Actually, Unicode goes beyond 65535.
you may want to look up "at least" in a dictionary.
</F>
many_years_after wrote:
Well, people may input from keyboard. They input some Chinese
characters, then, I want to create a number. The same number will be
created if they input the same Chinese characters.
assuming you mean "code point" rather than "ASCII code" (ASCII is a
specific encoding that *doesn't* include Chinese characters), "ord" is
what you want:
    char = read_from_some_input_device()
    code = ord(char)
see: http://pyref.infogami.com/ord
</F>
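As a concrete illustration of the ord approach in Python 3, here is a small sketch. The helper name code_points and the sample text are assumptions; read_from_some_input_device in the post above is a placeholder, so a literal string stands in for the input.

```python
def code_points(text):
    """Return the Unicode code point of each character as an integer."""
    return [ord(ch) for ch in text]


numbers = code_points("\u4f60\u597d")
print(numbers)  # [20320, 22909]

# chr is the inverse of ord, so the characters can be recovered.
restored = "".join(chr(n) for n in numbers)
print(restored == "\u4f60\u597d")  # True
```

The same character always gives the same number, which matches what the original poster asked for.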
In message <ma***************************************@python.org>, Fredrik
Lundh wrote:
Gerhard Fiedler wrote:
>>No. ASCII characters range is 0..127 while Unicode characters range is at least 0..65535.
Actually, Unicode goes beyond 65535.
you may want to look up "at least" in a dictionary.
Maybe you need to do the same for "actually".
On 2006-08-20 05:56:05, Fredrik Lundh wrote:
>>No. ASCII characters range is 0..127 while Unicode characters range is at least 0..65535.
Actually, Unicode goes beyond 65535.
you may want to look up "at least" in a dictionary.
As a homework, try to parse "at least until" and "goes beyond" and compare
the two (a dictionary is not necessarily of help with this :)
"range is at least 0..65535" : upper_bound >= 65535
"goes beyond 65535" : upper_bound > 65535
For some discussions (like how to represent code points etc) this
distinction is crucial.
Gerhard
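Whether the upper bound matters in practice can be checked with a hanzi outside the first 65536 code points. The sketch below is an illustration for Python 3.3 or later; the choice of U+20000, the first character of the CJK Unified Ideographs Extension B block, is an assumption made for the example.

```python
import sys

# Assumed sample: U+20000, a hanzi above the 16-bit range.
rare_hanzi = "\U00020000"

code_point = ord(rare_hanzi)
utf16_bytes = rare_hanzi.encode("utf-16-be")

print(hex(code_point))      # 0x20000, which is above 0xFFFF (65535)
print(len(utf16_bytes))     # 4: a surrogate pair, not one 2-byte unit
print(utf16_bytes.hex())    # d840dc00
print(hex(sys.maxunicode))  # 0x10ffff on Python 3.3 and later
```

So a scheme that assumes exactly two bytes per Chinese character works for the common hanzi but not for every one.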
Gerhard Fiedler wrote:
>>Actually, Unicode goes beyond 65535.
you may want to look up "at least" in a dictionary.
As a homework, try to parse "at least until" and "goes beyond" and compare
the two (a dictionary is not necessarily of help with this :)
"range is at least 0..65535" : upper_bound >= 65535
"goes beyond 65535" : upper_bound > 65535
For some discussions (like how to represent code points etc) this
distinction is crucial.
do you know anything about how Unicode is used in real life, or are you
just squabbling ?
</F>
On 2006-08-20 10:31:20, Fredrik Lundh wrote:
>"range is at least 0..65535" : upper_bound >= 65535 "goes beyond 65535" : upper_bound > 65535
For some discussions (like how to represent code points etc) this distinction is crucial.
do you know anything about how Unicode is used in real life, or are you
just squabbling ?
Your point is?
Gerhard