
unicode converting


there are a few questions i can't find answers to in the manual:
1. how is the internal encoding of python unicode strings defined (UTF-8, UTF-16 ...)
2. how to convert string to UCS-2

(Python 2.2.3 on freebsd4)
--
Best regards,
Maxim
Jul 18 '05 #1
10 Replies

Maxim Kasimov wrote:

there are a few questions i can't find answers to in the manual:
1. how is the internal encoding of python unicode strings defined
(UTF-8, UTF-16 ...)
It shouldn't be your concern - but you can specify it using " ./configure
--enable-unicode=ucs2" or --enable-unicode=ucs4. You can't set it to utf-8
or utf-16.
2. how to convert string to UCS-2


s = ... # some ucs-2 string
s.decode("utf-16")

might give you the right results for most cases:

http://mail.python.org/pipermail/pyt...ay/024193.html
--
Regards,

Diez B. Roggisch
Jul 18 '05 #2

Diez B. Roggisch wrote:
Maxim Kasimov wrote:

there are a few questions i can't find answers to in the manual:
1. how is the internal encoding of python unicode strings defined
(UTF-8, UTF-16 ...)

It shouldn't be your concern - but you can specify it using " ./configure
--enable-unicode=ucs2" or --enable-unicode=ucs4. You can't set it to utf-8
or utf-16.

does that mean that python's internal unicode format is ucs2 or ucs4?
i'm concerned with this question because i need to send data to an external
application in ucs2 encoding
2. how to convert string to UCS-2

s = ... # some ucs-2 string
s.decode("utf-16")

not _from_ ucs2, but _to_ ucs2, for example:
s = ... # some utf-16 string
d = encode_to_ucs2(s)

might give you the right results for most cases:

http://mail.python.org/pipermail/pyt...ay/024193.html

--
Best regards,
Maxim
Jul 18 '05 #3

On Tue, 15 Mar 2005 18:54:20 +0200, rumours say that Maxim Kasimov
<ka*****@i.com.ua> might have written:
It shouldn't be your concern - but you can specify it using " ./configure
--enable-unicode=ucs2" or --enable-unicode=ucs4. You can't set it to utf-8
or utf-16.
does that mean that python's internal unicode format is ucs2 or ucs4?
i'm concerned with this question because i need to send data to an external
application in ucs2 encoding


If unicode_data references your unicode data, all you have to send is:

unicode_data.encode('utf-16') # maybe utf-16be for network order

You should not care about internal encoding of unicode objects.
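
A minimal sketch of how the codec names mentioned above differ. It is written in Python 3 syntax as an assumption (the thread is about Python 2, where the text would be a u"..." literal), and the variable names are illustrative only:

```python
# Illustrative only: compare the three UTF-16 codec variants.
text = "Hi"  # in Python 2 this would be u"Hi"

with_bom = text.encode("utf-16")          # BOM followed by platform byte order
big_endian = text.encode("utf-16-be")     # network order, no BOM
little_endian = text.encode("utf-16-le")  # little-endian, no BOM

print(repr(big_endian))     # b'\x00H\x00i'
print(repr(little_endian))  # b'H\x00i\x00'
print(len(with_bom))        # 6: two BOM bytes plus two bytes per character
```

If the receiving application does not expect a byte order mark, one of the explicit-order codecs is probably the safer choice.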

--
TZOTZIOY, I speak England very best.
"Be strict when sending and tolerant when receiving." (from RFC1958)
I really should keep that in mind when talking with people, actually...
Jul 18 '05 #4

Maxim Kasimov wrote:
Diez B. Roggisch wrote:
Maxim Kasimov wrote:

there are a few questions i can't find answers to in the manual:
1. how is the internal encoding of python unicode strings defined
(UTF-8, UTF-16 ...)


It shouldn't be your concern - but you can specify it using " ./configure
--enable-unicode=ucs2" or --enable-unicode=ucs4. You can't set it to
utf-8
or utf-16.


does that mean that python's internal unicode format is ucs2 or ucs4?
i'm concerned with this question because i need to send data to an external
application in ucs2 encoding


The internal format Python stores Unicode strings in is an
implementation detail; it has nothing to do with how you send data. To
do that, you encode your string into a suitable encoding:
>>> s = u"Some Unicode text."
>>> s
u'Some Unicode text.'
>>> s.encode('utf-16')
'\xff\xfeS\x00o\x00m\x00e\x00 \x00U\x00n\x00i\x00c\x00o\x00d\x00e\x00 \x00t\x00e\x00x\x00t\x00.\x00'
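
To illustrate that only the encoded bytes matter to the other side, here is a small round-trip sketch. It assumes Python 3 syntax rather than the Python 2 used in the session above, and the names are made up for the example:

```python
# Illustrative round trip: encode for sending, decode on receipt.
original = "Some Unicode text."

wire_bytes = original.encode("utf-16")  # BOM followed by 16-bit code units
restored = wire_bytes.decode("utf-16")  # the BOM tells the decoder the byte order

print(restored == original)  # True
print(len(wire_bytes))       # 38: 2 BOM bytes + 2 bytes for each of 18 characters
```

Nothing here depends on how the interpreter stores the string internally.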
Jul 18 '05 #5

Christos TZOTZIOY Georgiou wrote:

If unicode_data references your unicode data, all you have to send is:

unicode_data.encode('utf-16') # maybe utf-16be for network order

is a utf-16 string the same as ucs-2? my question is how to get a string encoded as UCS-2

--
Best regards,
Maxim
Jul 18 '05 #6

Maxim Kasimov wrote:
Christos TZOTZIOY Georgiou wrote:

If unicode_data references your unicode data, all you have to send is:
unicode_data.encode('utf-16') # maybe utf-16be for network order

is a utf-16 string the same as ucs-2? my question is how to get a string
encoded as UCS-2


utf-16 is basically a superset of ucs-2. See here for more detail:
http://www.azillionmonkeys.com/qed/unicode.html
If you ensure that ord() of each output character is < 0x10000,
you'll get valid ucs-2 output if you use utf-16 encoding. If you
build python with --enable-unicode=ucs2, no character can be >= 0x10000,
so you don't have to check. On the other hand, 1) you won't even be able
to input characters >= 0x10000 into your application, 2) premature
optimization is bad, and 3) there is a note in the README: to compile
Python 2.3 with Tkinter, you will need to pass the --enable-unicode=ucs4
flag to ./configure
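
The ord() check described above could be sketched roughly as follows. The helper name encode_to_ucs2 is borrowed from the hypothetical call earlier in the thread and is not a real library function; the byteorder parameter is likewise an assumption, and the code uses Python 3 syntax:

```python
def encode_to_ucs2(text, byteorder="little"):
    """Encode text as UCS-2, refusing anything outside the BMP.

    Hypothetical helper: applies the ord() < 0x10000 check, then
    uses a UTF-16 codec that does not emit a BOM.
    """
    for ch in text:
        if ord(ch) >= 0x10000:
            raise ValueError("not representable in UCS-2: U+%04X" % ord(ch))
    codec = "utf-16-le" if byteorder == "little" else "utf-16-be"
    return text.encode(codec)


print(encode_to_ucs2("abc"))      # b'a\x00b\x00c\x00'
try:
    encode_to_ucs2("\U0001D11E")  # MUSICAL SYMBOL G CLEF, above U+FFFF
except ValueError as exc:
    print(exc)                    # not representable in UCS-2: U+1D11E
```

Raising an error is only one option; replacing the offending characters with a placeholder would be another reasonable design.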

Serge.

Jul 18 '05 #7

On 16 Mar 2005 02:53:12 -0800, rumours say that "Serge Orlov"
<Se*********@gmail.com> might have written:
3) There is a note in README: To compile
Python2.3 with Tkinter, you will need to pass --enable-unicode=ucs4
flag to ./configure


I thought this applied to Tkinter as pre-built on recent RedHat systems. Does
it also apply to FreeBSD? On Windoze, Mandrake and SuSE python has UCS-2
unicode and Tkinter is working just fine.
--
TZOTZIOY, I speak England very best.
"Be strict when sending and tolerant when receiving." (from RFC1958)
I really should keep that in mind when talking with people, actually...
Jul 18 '05 #8

Serge Orlov wrote:

utf-16 is basically a superset of ucs-2. See here for more detail:
http://www.azillionmonkeys.com/qed/unicode.html
If you ensure that ord() of each output character is < 0x10000
you'll get valid ucs-2 output if you use utf-16 encoding. If you
build python with --enable-unicode=ucs2 no character can be >= 0x10000
so you don't have to check.


thank you very much! that's exactly what i need

--
Best regards,
Maxim
Jul 18 '05 #9

Christos TZOTZIOY Georgiou wrote:
On 16 Mar 2005 02:53:12 -0800, rumours say that "Serge Orlov"
<Se*********@gmail.com> might have written:
3) There is a note in README: To compile
Python2.3 with Tkinter, you will need to pass --enable-unicode=ucs4
flag to ./configure
I thought this applied to Tkinter as pre-built on recent RedHat
systems. Does it also apply to FreeBSD?


I don't know. I didn't notice that it was about RedHat.
On Windoze, Mandrake and SuSE python has UCS-2
unicode and Tkinter is working just fine.


Did you build python on Mandrake and SuSE yourself? I was under the
impression that ucs-4 builds are preferred on Linux. At least python on
RedHat EL3 and SUSE ES9 is built with --enable-unicode=ucs4.

Serge.

Jul 18 '05 #10

On 16 Mar 2005 04:21:16 -0800, rumours say that "Serge Orlov"
<Se*********@gmail.com> might have written:
On Windoze, Mandrake and SuSE python has UCS-2
unicode and Tkinter is working just fine.


Did you build python on Mandrake and SuSE yourself? I was under the
impression that ucs-4 builds are preferred on Linux. At least python on
RedHat EL3 and SUSE ES9 is built with --enable-unicode=ucs4.


tzot@tril/home/tzot/tmp
$ py
Python 2.4 (#8, Mar 2 2005, 11:12:44)
[GCC 3.3.3 (SuSE Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.maxunicode
65535
>>> import Tkinter
>>>
You have new mail in /var/mail/tzot
tzot@tril/home/tzot/tmp
$ python
Python 2.3.3 (#1, Aug 31 2004, 13:51:39)
[GCC 3.3.3 (SuSE Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys, Tkinter
>>> sys.maxunicode
1114111
>>>

2.4 built by me, 2.3.3 by SuSE.

I see. So on SuSE 9.1 professional too, Python and Tcl/Tk are pre-built with
ucs-4. My Mandrake installation is at home and I can't check now. Sorry for
the misinformation about SuSE.
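
The sys.maxunicode check used in the sessions above can be wrapped in a small helper. This is only a sketch: the function name unicode_build is made up, and on Python 3.3 and later sys.maxunicode always reports 1114111, so the distinction only matters on older interpreters like the ones in this thread:

```python
import sys


def unicode_build(maxunicode=None):
    """Classify a sys.maxunicode value as a 'ucs2' or 'ucs4' build."""
    if maxunicode is None:
        maxunicode = sys.maxunicode  # value for the running interpreter
    return "ucs2" if maxunicode == 0xFFFF else "ucs4"


print(unicode_build(65535))    # ucs2
print(unicode_build(1114111))  # ucs4
print(unicode_build())         # whatever the running interpreter uses
```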
--
TZOTZIOY, I speak England very best.
"Be strict when sending and tolerant when receiving." (from RFC1958)
I really should keep that in mind when talking with people, actually...
Jul 18 '05 #11
