Bytes IT Community

UTF Questions

I have a couple of questions about the UTF encodings.

The codecs module has constants defined for the UTF32 encoding, yet
this encoding isn't supported as a standard encoding. Why isn't it
supported?

It possibly has something to do with my next question. I know that
Unicode has (recently?) been expanded to include new character sets.
This means that the latest Unicode standard can't be fully supported
with 2 bytes per character. As far as I know, though, Python doesn't
(yet) support the extended version of Unicode anyway? Am I correct?

Best Regards,

Fuzzyman
http://www.voidspace.org.uk/python/index.shtml

Jul 18 '05 #1


Fuzzyman wrote:
> I have a couple of questions about the UTF encodings.
>
> The codecs module has constants defined for the UTF32 encoding, yet
> this encoding isn't supported as a standard encoding. Why isn't it
> supported?
Probably because there is little demand for it. The most widespread
Unicode encodings are UTF-8 and UTF-16.
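As a rough illustration of one practical reason UTF-32 sees less use (an illustration only, not a claim about why it was missing from the standard codecs), the sketch below compares how many bytes the same string takes in each encoding. It assumes a modern Python 3; the helper name `encoded_sizes` and the sample string are hypothetical, and the `-le` codec variants are used so that no byte order mark is counted.

```python
def encoded_sizes(text: str) -> dict:
    """Return the number of bytes `text` occupies in each UTF encoding (no BOM)."""
    return {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


# ASCII, a Latin-1 letter, a CJK character and one character outside the BMP (U+1D11E).
sample = "Python \u00e9\u4e2d\U0001D11E"
print(encoded_sizes(sample))
# → {'utf-8': 16, 'utf-16-le': 22, 'utf-32-le': 40}
```

UTF-32 spends four bytes on every code point, so for mostly-ASCII text it is two to four times larger than the alternatives.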

> It possibly has something to do with my next question. I know that
> Unicode has (recently?) been expanded to include new character sets.
> This means that the latest Unicode standard can't be fully supported
> with 2 bytes per character. As far as I know, though, Python doesn't
> (yet) support the extended version of Unicode anyway? Am I correct?


Python does support them. PEP 261 ("Support for 'wide' Unicode characters") has the answers to your questions.
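To make the "2 bytes per character" point concrete: code points above U+FFFF cannot fit in one 16-bit unit, so UTF-16 stores them as a surrogate pair. Under PEP 261, a "narrow" build (`sys.maxunicode == 0xFFFF`) exposed such characters as two string elements, while a "wide" build stored them as one. The sketch below assumes Python 3.3 or later, where PEP 393 makes every build cover the full range; the helper `to_utf16_units` is a hypothetical name used only for illustration.

```python
import sys


def to_utf16_units(code_point: int) -> list[int]:
    """Split a code point into UTF-16 code units (a surrogate pair above U+FFFF)."""
    if code_point <= 0xFFFF:
        return [code_point]
    offset = code_point - 0x10000       # 20 significant bits remain
    high = 0xD800 + (offset >> 10)      # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits -> low (trail) surrogate
    return [high, low]


clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the 16-bit range
print(hex(sys.maxunicode))                            # 0x10ffff on Python 3.3+
print(len(clef))                                      # 1: one code point, one str element
print([hex(u) for u in to_utf16_units(ord(clef))])    # ['0xd834', '0xdd1e']
print(clef.encode("utf-16-be").hex())                 # d834dd1e
```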

Serge.

Jul 18 '05 #2

Thanks Serge.

Regards,

Fuzzy
http://www.voidspace.org.uk/python/index.shtml

Jul 18 '05 #3

Fuzzyman wrote:
> Thanks Serge.


You're welcome. While we're at it, iconvcodec supports UTF-32 and more. I have sent
a 2.4 Windows build of the iconvcodec module to its author. He promised to publish it
soon.

Serge.
Jul 18 '05 #4

Fuzzyman wrote:
> The codecs module has constants defined for the UTF32 encoding, yet
> this encoding isn't supported as a standard encoding. Why isn't it
> supported?


Because nobody has contributed such an implementation.

Notice that this is really trivial to implement, with very few lines
of pure Python code. In fact, given a Unicode string s, the line

codecs.BOM_UTF32 + array.array("i", map(ord, s)).tostring()

generates UTF-32 for the string s. Creating a codec on top of this
approach is left as an exercise for the reader.
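Current Python 3 versions ship a built-in "utf-32" codec, and `array.tostring()` was removed in Python 3.9 in favour of `tobytes()`, so the following is only an illustration of the exercise. It is a minimal sketch, assuming Python 3.9+ and that array typecode "I" (unsigned, swapped in for Martin's signed "i") is 4 bytes, which holds on common platforms but is not guaranteed. The codec name `utf32sketch` and the functions `utf32_encode`, `utf32_decode` and `_search` are hypothetical; only strict error handling is implemented.

```python
import array
import codecs
import sys

# Hypothetical codec name, chosen so it cannot shadow the built-in "utf-32".
CODEC_NAME = "utf32sketch"


def utf32_encode(text, errors="strict"):
    """BOM + native-order 32-bit code points; `errors` is accepted but only strict is implemented."""
    for i, ch in enumerate(text):
        if 0xD800 <= ord(ch) <= 0xDFFF:
            # Lone surrogates are not valid Unicode scalar values.
            raise UnicodeEncodeError(CODEC_NAME, text, i, i + 1, "surrogates not allowed")
    units = array.array("I", map(ord, text))
    if units.itemsize != 4:
        raise RuntimeError("array typecode 'I' is not 4 bytes on this platform")
    # BOM_UTF32 is in native byte order, matching the array's native layout.
    return codecs.BOM_UTF32 + units.tobytes(), len(text)


def utf32_decode(data, errors="strict"):
    """Decode UTF-32 bytes; without a BOM, native byte order is assumed."""
    data = bytes(data)  # bytes.decode() may hand the codec a memoryview
    if data.startswith(codecs.BOM_UTF32_LE):
        order, start = "little", 4
    elif data.startswith(codecs.BOM_UTF32_BE):
        order, start = "big", 4
    else:
        order, start = sys.byteorder, 0
    if (len(data) - start) % 4:
        raise UnicodeDecodeError(CODEC_NAME, data, start, len(data), "truncated data")
    chars = []
    for pos in range(start, len(data), 4):
        cp = int.from_bytes(data[pos:pos + 4], order)
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise UnicodeDecodeError(CODEC_NAME, data, pos, pos + 4, "invalid code point")
        chars.append(chr(cp))
    return "".join(chars), len(data)


def _search(name):
    # Search functions receive the normalized, lower-case encoding name.
    if name == CODEC_NAME:
        return codecs.CodecInfo(utf32_encode, utf32_decode, name=CODEC_NAME)
    return None


codecs.register(_search)

s = "A\U0001D11E"
encoded = s.encode(CODEC_NAME)
print(encoded == s.encode("utf-32"))    # True: same BOM and native byte order
print(encoded.decode(CODEC_NAME) == s)  # True
```

Registering through `codecs.register` is what makes the ordinary `str.encode`/`bytes.decode` calls find the new codec by name; a full implementation would also provide incremental and stream classes.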

Regards,
Martin
Jul 18 '05 #5
