unicode converting


There are a few questions I can't find answers to in the manual:
1. How to define which internal encoding Python unicode strings use (UTF-8, UTF-16, ...)
2. How to convert a string to UCS-2

(Python 2.2.3 on freebsd4)
--
Best regards,
Maxim
Jul 18 '05 #1
Maxim Kasimov wrote:

There are a few questions I can't find answers to in the manual:
1. How to define which internal encoding Python unicode strings use
(UTF-8, UTF-16, ...)
It shouldn't be your concern, but you can specify it at build time using
"./configure --enable-unicode=ucs2" or "--enable-unicode=ucs4". You can't
set it to utf-8 or utf-16.
2. How to convert a string to UCS-2


s = ... # some ucs-2 string
s.decode("utf-16")

might give you the right results for most cases:

http://mail.python.org/pipermail/pyt...ay/024193.html
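
A minimal sketch of the decoding direction described above, written for a current Python 3 interpreter rather than the Python 2.2 the thread is about. The sample bytes and variable names are made up for illustration.

```python
# Illustrative UCS-2 data: a little-endian BOM followed by two 16-bit code units.
ucs2_bytes = b"\xff\xfeH\x00i\x00"

# Every valid UCS-2 string is also valid UTF-16, so the utf-16 codec can read it.
# The codec consumes the BOM and uses it to pick the byte order.
text = ucs2_bytes.decode("utf-16")
print(text)
```

If the data carries no BOM, the byte order has to be known in advance and named explicitly, for example with the "utf-16-le" or "utf-16-be" codec.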
--
Regards,

Diez B. Roggisch
Jul 18 '05 #2
Diez B. Roggisch wrote:
Maxim Kasimov wrote:

There are a few questions I can't find answers to in the manual:
1. How to define which internal encoding Python unicode strings use
(UTF-8, UTF-16, ...)

It shouldn't be your concern, but you can specify it at build time using
"./configure --enable-unicode=ucs2" or "--enable-unicode=ucs4". You can't
set it to utf-8 or utf-16.

Does that mean that Python's internal unicode format is UCS-2 or UCS-4?
I'm concerned with the question because I need to send data to an external
application in UCS-2 encoding.
2. How to convert a string to UCS-2

s = ... # some ucs-2 string
s.decode("utf-16")

not _from_ ucs2, but _to_ ucs2, for example:
s = ... # some utf-16 string
d = encode_to_ucs2(s)

might give you the right results for most cases:

http://mail.python.org/pipermail/pyt...ay/024193.html

--
Best regards,
Maxim
Jul 18 '05 #3
On Tue, 15 Mar 2005 18:54:20 +0200, rumours say that Maxim Kasimov
<ka*****@i.com.ua> might have written:
It shouldn't be your concern, but you can specify it at build time using
"./configure --enable-unicode=ucs2" or "--enable-unicode=ucs4". You can't
set it to utf-8 or utf-16.
Does that mean that Python's internal unicode format is UCS-2 or UCS-4?
I'm concerned with the question because I need to send data to an external
application in UCS-2 encoding.


If unicode_data references your unicode data, all you have to send is:

unicode_data.encode('utf-16') # maybe utf-16be for network order

You should not care about internal encoding of unicode objects.
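
To illustrate the "maybe utf-16be" remark, here is a small sketch of how the codec name affects the bytes that go on the wire. It assumes a current Python 3 interpreter, where every str is Unicode. The sample string is hypothetical.

```python
unicode_data = "A\u20ac"  # hypothetical sample: 'A' followed by the euro sign

# Explicit byte orders: no byte order mark (BOM) is written.
big_endian = unicode_data.encode("utf-16-be")     # network order
little_endian = unicode_data.encode("utf-16-le")

# Plain "utf-16" prepends a BOM and uses the machine's native byte order.
with_bom = unicode_data.encode("utf-16")

print(big_endian)
print(little_endian)
print(with_bom[:2])  # the BOM: FF FE on little-endian machines, FE FF otherwise
```

Whether the receiving application expects a BOM is something only its documentation can answer. If it does not, one of the explicit "-be" or "-le" codecs is the safer choice.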

--
TZOTZIOY, I speak England very best.
"Be strict when sending and tolerant when receiving." (from RFC1958)
I really should keep that in mind when talking with people, actually...
Jul 18 '05 #4
Maxim Kasimov wrote:
Diez B. Roggisch wrote:
Maxim Kasimov wrote:

There are a few questions I can't find answers to in the manual:
1. How to define which internal encoding Python unicode strings use
(UTF-8, UTF-16, ...)


It shouldn't be your concern, but you can specify it at build time using
"./configure --enable-unicode=ucs2" or "--enable-unicode=ucs4". You can't
set it to utf-8 or utf-16.


Does that mean that Python's internal unicode format is UCS-2 or UCS-4?
I'm concerned with the question because I need to send data to an external
application in UCS-2 encoding.


The internal format Python stores Unicode strings in is an
implementation detail; it has nothing to do with how you send data. To
do that, you encode your string into a suitable encoding:
>>> s = u"Some Unicode text."
>>> s
u'Some Unicode text.'
>>> s.encode('utf-16')
'\xff\xfeS\x00o\x00m\x00e\x00 \x00U\x00n\x00i\x00c\x00o\x00d\x00e\x00 \x00t\x00e\x00x\x00t\x00.\x00'
Jul 18 '05 #5
Christos TZOTZIOY Georgiou wrote:

If unicode_data references your unicode data, all you have to send is:

unicode_data.encode('utf-16') # maybe utf-16be for network order

Is a UTF-16 string the same as UCS-2? My question is how to get a string encoded as UCS-2.

--
Best regards,
Maxim
Jul 18 '05 #6
Maxim Kasimov wrote:
Christos TZOTZIOY Georgiou wrote:

If unicode_data references your unicode data, all you have to send is:
unicode_data.encode('utf-16') # maybe utf-16be for network order

Is a UTF-16 string the same as UCS-2? My question is how to get a string
encoded as UCS-2.


utf-16 is basically a superset of ucs-2. See here for more detail:
http://www.azillionmonkeys.com/qed/unicode.html
If you ensure that ord() of each output character is < 0x10000,
you'll get valid ucs-2 output if you use the utf-16 encoding. If you
build Python with --enable-unicode=ucs2, no character can be >= 0x10000,
so you don't have to check. On the other hand: 1) you won't even be able
to input characters >= 0x10000 into your application, 2) premature
optimization is bad, and 3) there is a note in the README: To compile
Python2.3 with Tkinter, you will need to pass the --enable-unicode=ucs4
flag to ./configure
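
The check described above can be sketched as follows. This is only an illustration for a current Python 3 interpreter. encode_ucs2 is a hypothetical helper named after the one asked for earlier in the thread, not a standard library function, and the byteorder parameter is an assumption of this sketch.

```python
def encode_ucs2(text, byteorder="little"):
    """Encode text as UCS-2, refusing anything outside the 16-bit range.

    Hypothetical helper: relies on UTF-16 and UCS-2 producing the same
    bytes as long as every code point is below 0x10000.
    """
    for ch in text:
        if ord(ch) >= 0x10000:
            raise ValueError("character U+%06X does not fit in UCS-2" % ord(ch))
    codec = "utf-16-le" if byteorder == "little" else "utf-16-be"
    return text.encode(codec)  # the -le/-be codecs do not write a BOM


print(encode_ucs2("Hi"))
print(encode_ucs2("Hi", byteorder="big"))
```

Raising an error is one possible policy. Replacing the offending characters with a placeholder such as "?" would be another, depending on what the receiving application tolerates.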

Serge.

Jul 18 '05 #7
On 16 Mar 2005 02:53:12 -0800, rumours say that "Serge Orlov"
<Se*********@gmail.com> might have written:
3) There is a note in README: To compile
Python2.3 with Tkinter, you will need to pass --enable-unicode=ucs4
flag to ./configure


I thought this applied to Tkinter as pre-built on recent RedHat systems. Does
it also apply to FreeBSD? On Windoze, Mandrake and SuSE python has UCS-2
unicode and Tkinter is working just fine.
--
TZOTZIOY, I speak England very best.
"Be strict when sending and tolerant when receiving." (from RFC1958)
I really should keep that in mind when talking with people, actually...
Jul 18 '05 #8
Serge Orlov wrote:

utf-16 is basically a superset of ucs-2. See here for more detail:
http://www.azillionmonkeys.com/qed/unicode.html
If you ensure that ord() of each output character is < 0x10000
you'll get valid ucs-2 output if you use utf-16 encoding. If you
build python with --enable-unicode=ucs2 no character can be >= 0x10000
so you don't have to check.


Thank you very much! That's exactly what I need.

--
Best regards,
Maxim
Jul 18 '05 #9
Christos TZOTZIOY Georgiou wrote:
On 16 Mar 2005 02:53:12 -0800, rumours say that "Serge Orlov"
<Se*********@gmail.com> might have written:
3) There is a note in README: To compile
Python2.3 with Tkinter, you will need to pass --enable-unicode=ucs4
flag to ./configure
I thought this applied to Tkinter as pre-built on recent RedHat
systems. Does it also apply to FreeBSD?


I don't know. I didn't notice that it was about RedHat.
On Windoze, Mandrake and SuSE python has UCS-2
unicode and Tkinter is working just fine.


Did you build Python on Mandrake and SuSE yourself? I had the impression
that ucs-4 builds are preferred on Linux. At least Python on RedHat EL3
and SUSE ES9 is built with --enable-unicode=ucs4.

Serge.

Jul 18 '05 #10
On 16 Mar 2005 04:21:16 -0800, rumours say that "Serge Orlov"
<Se*********@gmail.com> might have written:
On Windoze, Mandrake and SuSE python has UCS-2
unicode and Tkinter is working just fine.


Did you build Python on Mandrake and SuSE yourself? I had the impression
that ucs-4 builds are preferred on Linux. At least Python on RedHat EL3
and SUSE ES9 is built with --enable-unicode=ucs4.


tzot@tril/home/tzot/tmp
$ py
Python 2.4 (#8, Mar 2 2005, 11:12:44)
[GCC 3.3.3 (SuSE Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.maxunicode
65535
>>> import Tkinter
>>>
You have new mail in /var/mail/tzot
tzot@tril/home/tzot/tmp
$ python
Python 2.3.3 (#1, Aug 31 2004, 13:51:39)
[GCC 3.3.3 (SuSE Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys, Tkinter
>>> sys.maxunicode
1114111
>>>

2.4 built by me, 2.3.3 by SuSE.

I see. So on SuSE 9.1 professional too, Python and Tcl/Tk are pre-built with
ucs-4. My Mandrake installation is at home and I can't check now. Sorry for
the misinformation about SuSE.
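
The sys.maxunicode check used in the session above can be wrapped in a small function. This is a sketch only. unicode_build is a hypothetical helper, and the labels it returns are made up for illustration. On Python 3.3 and later the narrow/wide build distinction no longer exists, and sys.maxunicode is always 1114111.

```python
import sys


def unicode_build(maxunicode=sys.maxunicode):
    """Classify an interpreter by its sys.maxunicode value (hypothetical helper)."""
    if maxunicode == 0xFFFF:
        return "narrow (ucs2)"   # built with --enable-unicode=ucs2
    if maxunicode == 0x10FFFF:
        return "wide (ucs4)"     # built with --enable-unicode=ucs4, or Python 3.3+
    return "unknown"


print(unicode_build())
```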
--
TZOTZIOY, I speak England very best.
"Be strict when sending and tolerant when receiving." (from RFC1958)
I really should keep that in mind when talking with people, actually...
Jul 18 '05 #11
