unicode converting


There are a few questions I can't find answers to in the manual:
1. How to define which internal encoding Python unicode strings use (UTF-8, UTF-16 ...)
2. How to convert a string to UCS-2

(Python 2.2.3 on freebsd4)
--
Best regards,
Maxim
Jul 18 '05 #1
Maxim Kasimov wrote:

There are a few questions I can't find answers to in the manual:
1. how to define which is internal encoding of python unicode strings
(UTF-8, UTF-16 ...)
It shouldn't be your concern - but you can specify it using " ./configure
--enable-unicode=ucs2" or --enable-unicode=ucs4. You can't set it to utf-8
or utf-16.
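
A minimal sketch of how to check which of the two build options an interpreter was compiled with, using sys.maxunicode (the same attribute inspected later in this thread). The helper name unicode_build_width is made up for illustration. Note that on Python 3.3 and later the build option no longer exists and sys.maxunicode is always 1114111.

```python
import sys


def unicode_build_width():
    """Return 'ucs2' for a narrow build and 'ucs4' for a wide one.

    Hypothetical helper, not part of the standard library.
    """
    # sys.maxunicode is the largest code point one string element can hold:
    # 0xFFFF on a --enable-unicode=ucs2 build, 0x10FFFF on a ucs4 build.
    if sys.maxunicode == 0xFFFF:
        return "ucs2"
    return "ucs4"


print(unicode_build_width())
```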
2. how to convert string to UCS-2


s = ... # some ucs-2 string
s.decode("utf-16")

might give you the right results for most cases:

http://mail.python.org/pipermail/pyt...ay/024193.html
--
Regards,

Diez B. Roggisch
Jul 18 '05 #2
Diez B. Roggisch wrote:
Maxim Kasimov wrote:

There are a few questions I can't find answers to in the manual:
1. how to define which is internal encoding of python unicode strings
(UTF-8, UTF-16 ...)

It shouldn't be your concern - but you can specify it using " ./configure
--enable-unicode=ucs2" or --enable-unicode=ucs4. You can't set it to utf-8
or utf-16.

Does that mean that Python's internal unicode format is ucs2 or ucs4?
I'm concerned with the question because I need to send data to an external
application in ucs2 encoding.
2. how to convert string to UCS-2

s = ... # some ucs-2 string
s.decode("utf-16")

not _from_ ucs2, but _to_ ucs2, for example:
s = ... # some utf-16 string
d = encode_to_ucs2(s)

might give you the right results for most cases:

http://mail.python.org/pipermail/pyt...ay/024193.html

--
Best regards,
Maxim
Jul 18 '05 #3
On Tue, 15 Mar 2005 18:54:20 +0200, rumours say that Maxim Kasimov
<ka*****@i.com.ua> might have written:
It shouldn't be your concern - but you can specify it using " ./configure
--enable-unicode=ucs2" or --enable-unicode=ucs4. You can't set it to utf-8
or utf-16.
Does that mean that Python's internal unicode format is ucs2 or ucs4?
I'm concerned with the question because I need to send data to an external
application in ucs2 encoding.


If unicode_data references your unicode data, all you have to send is:

unicode_data.encode('utf-16') # maybe utf-16be for network order

You should not care about internal encoding of unicode objects.
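
As a rough illustration of the byte-order remark above, here is a sketch of the three UTF-16 codec variants. It is written for a current Python 3, where every str is Unicode; on the 2.x interpreters discussed in this thread the text would be a u"..." literal and the results would be plain str objects. The variable names are only for illustration.

```python
text = "Some Unicode text."

# "utf-16" prepends a byte order mark (BOM) and then uses the machine's
# native byte order, so the receiver can detect the order from the BOM.
with_bom = text.encode("utf-16")

# The explicit variants write no BOM; both sides must agree on the order.
big_endian = text.encode("utf-16-be")     # network order
little_endian = text.encode("utf-16-le")

# Every character here is in the Basic Multilingual Plane: 2 bytes each.
assert len(big_endian) == 2 * len(text)
assert len(with_bom) == len(big_endian) + 2   # the BOM adds two bytes
```

Which variant the external application expects is something only its documentation can settle.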

--
TZOTZIOY, I speak England very best.
"Be strict when sending and tolerant when receiving." (from RFC1958)
I really should keep that in mind when talking with people, actually...
Jul 18 '05 #4
Maxim Kasimov wrote:
Diez B. Roggisch wrote:
Maxim Kasimov wrote:

There are a few questions I can't find answers to in the manual:
1. how to define which is internal encoding of python unicode strings
(UTF-8, UTF-16 ...)


It shouldn't be your concern - but you can specify it using " ./configure
--enable-unicode=ucs2" or --enable-unicode=ucs4. You can't set it to
utf-8
or utf-16.


Does that mean that Python's internal unicode format is ucs2 or ucs4?
I'm concerned with the question because I need to send data to an external
application in ucs2 encoding.


The internal format Python stores Unicode strings in is an
implementation detail; it has nothing to do with how you send data. To
do that, you encode your string into a suitable encoding:
>>> s = u"Some Unicode text."
>>> s
u'Some Unicode text.'
>>> s.encode('utf-16')
'\xff\xfeS\x00o\x00m\x00e\x00 \x00U\x00n\x00i\x00c\x00o\x00d\x00e\x00 \x00t\x00e\x00x\x00t\x00.\x00'
Jul 18 '05 #5
Christos TZOTZIOY Georgiou wrote:

If unicode_data references your unicode data, all you have to send is:

unicode_data.encode('utf-16') # maybe utf-16be for network order

Is a utf-16 string the same as ucs-2? My question is how to get a string encoded as UCS-2.

--
Best regards,
Maxim
Jul 18 '05 #6
Maxim Kasimov wrote:
Christos TZOTZIOY Georgiou wrote:

If unicode_data references your unicode data, all you have to send is:
unicode_data.encode('utf-16') # maybe utf-16be for network order

Is a utf-16 string the same as ucs-2? My question is how to get a string
encoded as UCS-2.


utf-16 is basically a superset of ucs-2. See here for more detail:
http://www.azillionmonkeys.com/qed/unicode.html
If you ensure that ord() of each output character is < 0x10000
you'll get valid ucs-2 output if you use utf-16 encoding. If you
build python with --enable-unicode=ucs2 no character can be >= 0x10000
so you don't have to check. On the other hand, 1) you won't even be able to
input characters >= 0x10000 into your application, 2) premature
optimization is bad, and 3) there is a note in README: To compile
Python2.3 with Tkinter, you will need to pass --enable-unicode=ucs4
flag to ./configure
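
The ord() check described above could be sketched roughly as follows. The function name encode_to_ucs2 is borrowed from the question earlier in the thread; the byteorder parameter and the choice to raise ValueError are assumptions made for this example. It is written for Python 3, where iterating over a str yields whole code points.

```python
def encode_to_ucs2(text, byteorder="little"):
    """Encode text as UCS-2, rejecting characters outside the BMP.

    Hypothetical helper: UCS-2 has no surrogate pairs, so any code point
    >= 0x10000 cannot be represented and is reported as an error.
    """
    for ch in text:
        if ord(ch) >= 0x10000:
            raise ValueError("U+%04X does not fit in UCS-2" % ord(ch))
    # With only BMP characters, UTF-16 output is byte-for-byte valid UCS-2.
    codec = "utf-16-le" if byteorder == "little" else "utf-16-be"
    return text.encode(codec)


print(encode_to_ucs2("abc"))          # two bytes per character, no BOM
print(encode_to_ucs2("abc", "big"))
```

Whether the receiving application wants a BOM or a particular byte order is not stated in the thread, so that part would need to be adjusted.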

Serge.

Jul 18 '05 #7
On 16 Mar 2005 02:53:12 -0800, rumours say that "Serge Orlov"
<Se*********@gmail.com> might have written:
3) There is a note in README: To compile
Python2.3 with Tkinter, you will need to pass --enable-unicode=ucs4
flag to ./configure


I thought this applied to Tkinter as pre-built on recent RedHat systems. Does
it also apply to FreeBSD? On Windoze, Mandrake and SuSE python has UCS-2
unicode and Tkinter is working just fine.
--
TZOTZIOY, I speak England very best.
"Be strict when sending and tolerant when receiving." (from RFC1958)
I really should keep that in mind when talking with people, actually...
Jul 18 '05 #8
Serge Orlov wrote:

utf-16 is basically a superset of ucs-2. See here for more detail:
http://www.azillionmonkeys.com/qed/unicode.html
If you ensure that ord() of each output character is < 0x10000
you'll get valid ucs-2 output if you use utf-16 encoding. If you
build python with --enable-unicode=ucs2 no character can be >= 0x10000
so you don't have to check.


Thank you very much! That's exactly what I need.

--
Best regards,
Maxim
Jul 18 '05 #9
Christos TZOTZIOY Georgiou wrote:
On 16 Mar 2005 02:53:12 -0800, rumours say that "Serge Orlov"
<Se*********@gmail.com> might have written:
3) There is a note in README: To compile
Python2.3 with Tkinter, you will need to pass --enable-unicode=ucs4
flag to ./configure
I thought this applied to Tkinter as pre-built on recent RedHat
systems. Does it also apply to FreeBSD?


I don't know. I didn't notice that it was about RedHat.
On Windoze, Mandrake and SuSE python has UCS-2
unicode and Tkinter is working just fine.


Did you build python on Mandrake and SuSE yourself? I had the impression
that ucs-4 builds are preferred on Linux. At least python on RedHat EL3
and SUSE ES9 is built with --enable-unicode=ucs4.

Serge.

Jul 18 '05 #10
On 16 Mar 2005 04:21:16 -0800, rumours say that "Serge Orlov"
<Se*********@gmail.com> might have written:
On Windoze, Mandrake and SuSE python has UCS-2
unicode and Tkinter is working just fine.


Did you build python on Mandrake and SuSE yourself? I had the impression
that ucs-4 builds are preferred on Linux. At least python on RedHat EL3
and SUSE ES9 is built with --enable-unicode=ucs4.


tzot@tril/home/tzot/tmp
$ py
Python 2.4 (#8, Mar 2 2005, 11:12:44)
[GCC 3.3.3 (SuSE Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.maxunicode
65535
>>> import Tkinter
>>>
You have new mail in /var/mail/tzot
tzot@tril/home/tzot/tmp
$ python
Python 2.3.3 (#1, Aug 31 2004, 13:51:39)
[GCC 3.3.3 (SuSE Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys, Tkinter
>>> sys.maxunicode
1114111
>>>

2.4 built by me, 2.3.3 by SuSE.

I see. So on SuSE 9.1 professional too, Python and Tcl/Tk are pre-built with
ucs-4. My Mandrake installation is at home and I can't check now. Sorry for
the misinformation about SuSE.
--
TZOTZIOY, I speak England very best.
"Be strict when sending and tolerant when receiving." (from RFC1958)
I really should keep that in mind when talking with people, actually...
Jul 18 '05 #11
