unicode, C++, python 2.2

I am currently writing a python interface to a C++ library. Some of the
functions in this library take unicode strings (UTF-8, mostly) as arguments.

However, when getting this data I run into a problem on Python 2.2
(RHEL3): while the data is all nice UCS4 in 2.3, in 2.2 it seems to be
UTF-8 on top of UCS4. That is, UTF-8 encoded in UCS4, meaning that 3 bytes
of each UCS4 char are 0 and the first one contains a byte of the string's
UTF-8 encoding.

Is there a trick to get python 2.2 to do UCS4 more cleanly?
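
The pattern described above (one UTF-8 byte per UCS4 character, with the other three bytes zero) can be reproduced if UTF-8 bytes get turned into a Unicode string one byte per code point, for example through a Latin-1 decode. The thread never confirms that this is the actual cause, so treat the following as a sketch of one possible explanation. It is written for Python 3 so it can be run today, and the sample text is a hypothetical stand-in.

```python
# Python 3 sketch; the sample text is a hypothetical stand-in.
text = "Glomsr\u00f8d"
utf8_bytes = text.encode("utf-8")

# Assumed failure mode: each UTF-8 byte becomes its own code point
# (Latin-1 maps byte values 0-255 to code points 0-255 one to one).
mangled = utf8_bytes.decode("latin-1")

# Viewed as UCS4 (UTF-32, little-endian here), every character is one
# non-zero byte followed by three zero bytes.
ucs4 = mangled.encode("utf-32-le")

# Repair: map the code points back to bytes, then decode them as UTF-8.
repaired = mangled.encode("latin-1").decode("utf-8")
```

If the data really is mangled this way, the last line shows that it can be recovered without loss, because the Latin-1 round trip preserves every byte value.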

--
Trond Eivind Glomsrød
Senior Software Engineer
Scali - www.scali.com
Scaling the Linux Datacenter

Sep 9 '05 #1


Trond Eivind Glomsrød wrote:
I am currently writing a python interface to a C++ library. Some of the
functions in this library take unicode strings (UTF-8, mostly) as
arguments.

However, when getting this data I run into a problem on Python 2.2
(RHEL3): while the data is all nice UCS4 in 2.3, in 2.2 it seems to be
UTF-8 on top of UCS4. That is, UTF-8 encoded in UCS4, meaning that 3 bytes
of each UCS4 char are 0 and the first one contains a byte of the string's
UTF-8 encoding.

Is there a trick to get python 2.2 to do UCS4 more cleanly?


It's hard to tell from your message what your problem really is, as we
have no clue what "these data" are. How do you know they are "nice
UCS4" in 2.3? Are you looking at the internal representation at the
C level, or are you looking at something else? Do you use byte strings
or Unicode strings?

You tried to explain what "UTF8 encoded in UCS4" might be, but I'm
not sure I understand the explanation: what precise sequence of
statements did you use to create such a thing, and what precisely
does it look like (what exact byte is first, what is second, and so
on)?
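
One way to answer the questions above is to dump the value element by element and report whether it is a byte string or a Unicode string. The helper below is a hypothetical sketch in Python 3 (the name `describe` is not from any library); on Python 2.2 the type check would have to distinguish `str` from `unicode` instead.

```python
def describe(value):
    """Return the kind of string and the numeric value of each element."""
    if isinstance(value, bytes):
        kind = "byte string"
        units = list(value)               # iterating bytes yields integers
    else:
        kind = "Unicode string"
        units = [ord(ch) for ch in value]  # code point of each character
    return kind, ["0x%02X" % u for u in units]
```

For a correctly decoded "ø" this should report a single element, 0xF8, while the raw UTF-8 form reports the two bytes 0xC3 and 0xB8. A Unicode string that reports 0xC3 and 0xB8 as two separate characters would match the symptom in the question.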

Regards,
Martin
Sep 11 '05 #2
