
Dr. Dobb's Python-URL! - weekly Python news and links (Dec 30)

QOTW: "I found the discussion of unicode, in any python book I have,
insufficient." -- Thomas Heller

"If you develop on a Mac, ... Objective-C could come in handy. . . .
PyObjC makes mixing the two languages dead easy and more convenient than
indoor plumbing." -- Robert Kern
Among other activities, the PSF aggregates donors with dollars
destined to do good Python works, and developers expert in
obscure corners of Pythonia.
http://groups-beta.google.com/group/...5bfe05419aa0b3
http://groups-beta.google.com/group/...22f3e14752ce5/

Yippee! The martellibot promises to explain Unicode for Pythoneers.
http://groups-beta.google.com/group/...15a5a05c206712

The glorious SciPy project supports *multiple* worthwhile Wikis.
http://www.scipy.org/wikis

Good style in Python does not generally include "in-place"
operations on lists. Several cleaner idioms are possible.
http://groups-beta.google.com/group/...4559f53d25474e

Assume you're comfortable with tuples' semantics, immutability,
and so on. Do you correctly understand the basics of their
syntax, though? This is another opportunity to think about
Unicode, by the way.
http://groups-beta.google.com/group/...0049d7adb1bcce

Robert Kern, Paul Rubin, Mike Meyer, Alex Martelli, and others
provide disproportionately high-quality advice (and tangents!)
on the subject of languages which complement Python.
http://groups-beta.google.com/group/...c1c6d9d87049b6
========================================================================
Everything Python-related you want is probably one or two clicks away in
these pages:

Python.org's Python Language Website is the traditional
center of Pythonia
http://www.python.org
Notice especially the master FAQ
http://www.python.org/doc/FAQ.html

PythonWare complements the digest you're reading with the
marvelous daily python url
http://www.pythonware.com/daily
Mygale is a news-gathering webcrawler that specializes in (new)
World-Wide Web articles related to Python.
http://www.awaretek.com/nowak/mygale.html
While cosmetically similar, Mygale and the Daily Python-URL
are utterly different in their technologies and generally in
their results.

comp.lang.python.announce announces new Python software. Be
sure to scan this newsgroup weekly.
http://groups.google.com/groups?oi=d...ython.announce

Brett Cannon continues the marvelous tradition established by
Andrew Kuchling and Michael Hudson of intelligently summarizing
action on the python-dev mailing list once every other week.
http://www.python.org/dev/summary/

The Python Package Index catalogues packages.
http://www.python.org/pypi/

The somewhat older Vaults of Parnassus ambitiously collects references
to all sorts of Python resources.
http://www.vex.net/~x/parnassus/

Much of Python's real work takes place on Special-Interest Group
mailing lists
http://www.python.org/sigs/

The Python Business Forum "further[s] the interests of companies
that base their business on ... Python."
http://www.python-in-business.org

Python Success Stories--from air-traffic control to on-line
match-making--can inspire you or decision-makers to whom you're
subject with a vision of what the language makes practical.
http://www.pythonology.com/success

The Python Software Foundation (PSF) has replaced the Python
Consortium as an independent nexus of activity. It has official
responsibility for Python's development and maintenance.
http://www.python.org/psf/
Among the ways you can support PSF is with a donation.
http://www.python.org/psf/donate.html

Kurt B. Kaiser publishes a weekly report on faults and patches.
http://www.google.com/groups?as_usub...python%20patch

Cetus collects Python hyperlinks.
http://www.cetus-links.org/oo_python.html

Python FAQTS
http://python.faqts.com/

The Cookbook is a collaborative effort to capture useful and
interesting recipes.
http://aspn.activestate.com/ASPN/Cookbook/Python

Among several Python-oriented RSS/RDF feeds available are
http://www.python.org/channews.rdf
http://bootleg-rss.g-blog.net/pythonware_com_daily.pcgi
http://python.de/backend.php
For more, see
http://www.syndic8.com/feedlist.php?...ShowStatus=all
The old Python "To-Do List" now lives principally in a
SourceForge reincarnation.
http://sourceforge.net/tracker/?atid...70&func=browse
http://python.sourceforge.net/peps/pep-0042.html

The online Python Journal is posted at pythonjournal.cognizor.com.
ed****@pythonjournal.com and ed****@pythonjournal.cognizor.com
welcome submission of material that helps people's understanding
of Python use, and offer Web presentation of your work.

del.icio.us presents an intriguing approach to reference commentary.
It already aggregates quite a bit of Python intelligence.
http://del.icio.us/tag/python

*Py: the Journal of the Python Language*
http://www.pyzine.com

Archive probing tricks of the trade:
http://groups.google.com/groups?oi=d...python&num=100
http://groups.google.com/groups?meta....lang.python.*

Previous - (U)se the (R)esource, (L)uke! - messages are listed here:
http://www.ddj.com/topics/pythonurl/
http://purl.org/thecliff/python/url.html (dormant)
or
http://groups.google.com/groups?oi=djq&as_q=+Python-URL!&as_ugroup=comp.lang.python
Suggestions/corrections for next week's posting are always welcome.
E-mail to <Py********@phaseit.net> should get through.

To receive a new issue of this posting in e-mail each Monday morning
(approximately), ask <cl****@phaseit.net> to subscribe. Mention
"Python-URL!".
-- The Python-URL! Team--

Dr. Dobb's Journal (http://www.ddj.com) is pleased to participate in and
sponsor the "Python-URL!" project.
Jul 18 '05 #1
16 Replies


Cameron Laird <py********@phaseit.net> wrote:
...
Yippee! The martellibot promises to explain Unicode for Pythoneers.
http://groups-beta.google.com/group/...15a5a05c206712


Uh -- _did_ I? Eeep... I guess I did... mostly, I was pointing to
Holger Krekel's very nice recipe (not sure he posted it to the site as
well as submitting it for the printed edition, but, lobby _HIM_ about
that;-).
Alex
Jul 18 '05 #2

On Fri, Dec 31, 2004 at 19:18 +0100, Alex Martelli wrote:
Cameron Laird <py********@phaseit.net> wrote:
...
Yippee! The martellibot promises to explain Unicode for Pythoneers.
http://groups-beta.google.com/group/...15a5a05c206712


Uh -- _did_ I? Eeep... I guess I did... mostly, I was pointing to
Holger Krekel's very nice recipe (not sure he posted it to the site as
well as submitting it for the printed edition, but, lobby _HIM_ about
that;-).


FWIW, i added the recipe back to the online cookbook. It's not perfectly
formatted but still useful, i hope.

http://aspn.activestate.com/ASPN/Coo.../Recipe/361742

cheers,

holger

P.S: happy new year.
Jul 18 '05 #3

Holger:
FWIW, i added the recipe back to the online cookbook. It's not perfectly formatted but still useful, i hope. http://aspn.activestate.com/ASPN/Coo.../Recipe/361742


Uhm... on my system I get:

>>> german_ae = unicode('\xc3\xa4', 'utf8')
>>> print german_ae # dunno if it will appear right on Google groups
>>> german_ae.decode('latin1')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 0: ordinal not in range(128)
?? What's wrong?

Michele Simionato

Jul 18 '05 #4

On Tue, 04 Jan 2005 05:43:32 -0800, michele.simionato wrote:
Holger:
FWIW, i added the recipe back to the online cookbook. It's not perfectly
formatted but still useful, i hope.

http://aspn.activestate.com/ASPN/Coo.../Recipe/361742


Uhm... on my system I get:

>>> german_ae = unicode('\xc3\xa4', 'utf8')
>>> print german_ae # dunno if it will appear right on Google groups
>>> german_ae.decode('latin1')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 0: ordinal not in range(128)

?? What's wrong?


I'd rather use german_ae.encode('latin1')
                         ^^^^^^

which returns '\xe4'.
Michele Simionato


Jul 18 '05 #5

Stephan:
I'd rather use german_ae.encode('latin1')
                         ^^^^^^
which returns '\xe4'.


uhm ... then there is a misprint in the discussion of the recipe;
BTW what's the difference between .encode and .decode ?
(yes, I have been living in happy ASCII-land until now ... ;)
I should probably ask for a Unicode primer; I have found the
one by Marc-André Lemburg
http://www.reportlab.com/i18n/python..._tutorial.html
and I am reading it right now.
Michele Simionato

Jul 18 '05 #6

In article <11*********************@c13g2000cwb.googlegroups.com>,
<mi***************@gmail.com> wrote:

BTW what's the difference between .encode and .decode ?
(yes, I have been living in happy ASCII-land until now ... ;)


Here's the stark simple recipe: when you use Unicode, you *MUST* switch
to a Unicode-centric view of the universe. Therefore you encode *FROM*
Unicode and you decode *TO* Unicode. Period. It's similar to the way
floating point contaminates ints.
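
A minimal sketch of that rule, using the recipe's 'german_ae' value. It is written in Python 3 syntax, which is an assumption about the reader's interpreter: the thread itself concerns Python 2, where the text type was called unicode and the byte type was called str. The names utf8_bytes and latin1_bytes are illustrative only.

```python
# Python 3 sketch: `str` holds Unicode text, `bytes` holds encoded data.
utf8_bytes = b"\xc3\xa4"            # U+00E4 as it is stored in UTF-8

# decode: bytes -> Unicode text (decode *TO* Unicode)
german_ae = utf8_bytes.decode("utf-8")

# encode: Unicode text -> bytes (encode *FROM* Unicode)
latin1_bytes = german_ae.encode("latin-1")

print(ascii(german_ae))             # '\xe4'
print(latin1_bytes)                 # b'\xe4'

# Going the other way round with the same codec restores the text.
assert latin1_bytes.decode("latin-1") == german_ae
```

In Python 3 the two methods live on different types (str has only encode, bytes has only decode), so the mistake from the recipe cannot be written at all.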
--
Aahz (aa**@pythoncraft.com) <*> http://www.pythoncraft.com/

"19. A language that doesn't affect the way you think about programming,
is not worth knowing." --Alan Perlis
Jul 18 '05 #7


michele> BTW what's the difference between .encode and .decode ?

I started to answer, then got confused when I read the docstrings for
unicode.encode and unicode.decode:
>>> help(u"\xe4".decode)
Help on built-in function decode:

decode(...)
S.decode([encoding[,errors]]) -> string or unicode

Decodes S using the codec registered for encoding. encoding defaults
to the default encoding. errors may be given to set a different error
handling scheme. Default is 'strict' meaning that encoding errors raise
a UnicodeDecodeError. Other possible values are 'ignore' and 'replace'
as well as any other name registerd with codecs.register_error that is
able to handle UnicodeDecodeErrors.
>>> help(u"\xe4".encode)

Help on built-in function encode:

encode(...)
S.encode([encoding[,errors]]) -> string or unicode

Encodes S using the codec registered for encoding. encoding defaults
to the default encoding. errors may be given to set a different error
handling scheme. Default is 'strict' meaning that encoding errors raise
a UnicodeEncodeError. Other possible values are 'ignore', 'replace' and
'xmlcharrefreplace' as well as any other name registered with
codecs.register_error that can handle UnicodeEncodeErrors.

It probably makes sense to one who knows, but for the feeble-minded like
myself, they seem about the same.

I'd be happy to add a couple examples to the string methods section of the
docs if someone will produce something simple that makes the distinction
clear.
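
One possible example of the kind asked for here, sketched in Python 3 syntax (an assumption; in the Python 2 of this thread the text type is unicode and the byte type is str). It shows encode going from text to bytes, decode going from bytes to text, and the effect of the error-handling schemes that both docstrings name.

```python
text = "\xe4"          # U+00E4, LATIN SMALL LETTER A WITH DIAERESIS

# encode: text -> bytes. ASCII cannot represent U+00E4, so the default
# 'strict' scheme raises UnicodeEncodeError.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    print("encode, strict:", exc.reason)

print(text.encode("ascii", "ignore"))             # b''
print(text.encode("ascii", "replace"))            # b'?'
print(text.encode("ascii", "xmlcharrefreplace"))  # b'&#228;'

# decode: bytes -> text. Byte 0xe4 is not valid ASCII, so 'strict'
# raises UnicodeDecodeError this time.
data = b"\xe4"
try:
    data.decode("ascii")
except UnicodeDecodeError as exc:
    print("decode, strict:", exc.reason)

print(ascii(data.decode("ascii", "replace")))     # '\ufffd'
print(ascii(data.decode("latin-1")))              # '\xe4'
```

The error class tells you which direction failed: UnicodeEncodeError for text to bytes, UnicodeDecodeError for bytes to text.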

Skip

Jul 18 '05 #8

Yep, I did the same and got confused :-/

Michele

Jul 18 '05 #9

aahz> Here's the stark simple recipe: when you use Unicode, you *MUST*
aahz> switch to a Unicode-centric view of the universe. Therefore you
aahz> encode *FROM* Unicode and you decode *TO* Unicode. Period. It's
aahz> similar to the way floating point contaminates ints.

That's what I do in my code. Why do Unicode objects have a decode method
then?

Skip

Jul 18 '05 #10

Skip Montanaro <sk**@pobox.com> writes:
michele> BTW what's the difference between .encode and .decode ?

I started to answer, then got confused when I read the docstrings for
unicode.encode and unicode.decode:
>>> help(u"\xe4".decode)
Help on built-in function decode:

decode(...)
S.decode([encoding[,errors]]) -> string or unicode

Decodes S using the codec registered for encoding. encoding defaults
to the default encoding. errors may be given to set a different error
handling scheme. Default is 'strict' meaning that encoding errors raise
a UnicodeDecodeError. Other possible values are 'ignore' and 'replace'
as well as any other name registerd with codecs.register_error that is
able to handle UnicodeDecodeErrors.
>>> help(u"\xe4".encode)
Help on built-in function encode:

encode(...)
S.encode([encoding[,errors]]) -> string or unicode

Encodes S using the codec registered for encoding. encoding defaults
to the default encoding. errors may be given to set a different error
handling scheme. Default is 'strict' meaning that encoding errors raise
a UnicodeEncodeError. Other possible values are 'ignore', 'replace' and
'xmlcharrefreplace' as well as any other name registered with
codecs.register_error that can handle UnicodeEncodeErrors.

It probably makes sense to one who knows, but for the feeble-minded like
myself, they seem about the same.


It seems also the error messages aren't too helpful:

>>> "ä".encode("latin-1")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0x84 in position 0: ordinal not in range(128)

Hm, why does the 'encode' call complain about decoding?

Why do string objects have an encode method, and why do unicode objects
have a decode method, and what does this error message want to tell me:

>>> u"ä".decode("latin-1")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 0: ordinal not in range(128)


Thomas
Jul 18 '05 #11

mi***************@gmail.com wrote:
uhm ... then there is a misprint in the discussion of the recipe;
BTW what's the difference between .encode and .decode ?
(yes, I have been living in happy ASCII-land until now ... ;)

# -*- coding: latin-1 -*-
# Here I make a unicode string.
unicode_file = u'Some danish characters æøå' #.encode('hex')
print type(unicode_file)
print repr(unicode_file)
print ''
# I can convert this unicode string to an ordinary string.
# Because æøå are in the latin-1 charmap, it can be understood as
# a latin-1 string.
# The characters even have the same values in both.
latin1_file = unicode_file.encode('latin-1')
print type(latin1_file)
print repr(latin1_file)
print latin1_file
print ''
## I can *not* convert it to ascii
#ascii_file = unicode_file.encode('ascii')
#print ''
# I can also convert it to utf-8
utf8_file = unicode_file.encode('utf-8')
print type(utf8_file)
print repr(utf8_file)
print utf8_file
print ''
# utf8_file is now an ordinary string. Again it can help to think of it
# as a file format.
#
# I can convert this file/string back to unicode again by using the
# decode method. It tells python to decode this "file format" as utf-8
# when it loads it onto a unicode string. And we are back where we started.
unicode_file = utf8_file.decode('utf-8')
print type(unicode_file)
print repr(unicode_file)
print ''
# So basically you can encode a unicode string into a special
# string/file format, and you can decode a string from a special
# string/file format back into unicode.
###################################
<type 'unicode'>
u'Some danish characters \xe6\xf8\xe5'

<type 'str'>
'Some danish characters \xe6\xf8\xe5'
Some danish characters æøå

<type 'str'>
'Some danish characters \xc3\xa6\xc3\xb8\xc3\xa5'
Some danish characters æøå

<type 'unicode'>
u'Some danish characters \xe6\xf8\xe5'
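
The garbled third line of that output is presumably what UTF-8 bytes look like when the display treats them as Latin-1. A small sketch of that effect, in Python 3 syntax (an assumption; the script above is Python 2), with illustrative variable names:

```python
text = "\xe6\xf8\xe5"                  # the three Danish letters, as escapes

utf8_bytes = text.encode("utf-8")
print(utf8_bytes)                      # b'\xc3\xa6\xc3\xb8\xc3\xa5'

# Reading those UTF-8 bytes as if they were Latin-1 yields two characters
# for every original letter, which is the garbled form seen in the output.
misread = utf8_bytes.decode("latin-1")
print(ascii(misread))                  # '\xc3\xa6\xc3\xb8\xc3\xa5'
print(len(text), len(misread))         # 3 6

# Decoding with the codec that was actually used restores the text.
assert utf8_bytes.decode("utf-8") == text
```

The bytes themselves are fine in both cases; only the codec used to interpret them differs.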

--

hilsen/regards Max M, Denmark

http://www.mxm.dk/
IT's Mad Science
Jul 18 '05 #12

Thomas Heller wrote:
It seems also the error messages aren't too helpful:
>>> "ä".encode("latin-1")


Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0x84 in position 0: ordinal not in range(128)

Hm, why does the 'encode' call complain about decoding?


Because it tries to print it out to your console and fails. While writing
to the console it tries to convert to ascii.

Besides, you should write:

u"ä".encode("latin-1") to get a latin-1 encoded string.
--

hilsen/regards Max M, Denmark

http://www.mxm.dk/
IT's Mad Science
Jul 18 '05 #13

Max M <ma**@mxm.dk> writes:
Thomas Heller wrote:
It seems also the error messages aren't too helpful:

>>> "ä".encode("latin-1")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0x84 in position 0: ordinal not in range(128)

Hm, why does the 'encode' call complain about decoding?


Because it tries to print it out to your console and fails. While
writing to the console it tries to convert to ascii.


Wrong, same error without trying to print something:

>>> x = "ä".encode("latin-1")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0x84 in position 0: ordinal not in range(128)


Besides, you should write:

u"ä".encode("latin-1") to get a latin-1 encoded string.


I know, but the question was: why does a unicode string have an encode
method, and why does it complain about decoding (which has already been
answered in the meantime).

Thomas
Jul 18 '05 #14

Skip Montanaro wrote:
aahz> Here's the stark simple recipe: when you use Unicode, you *MUST*
aahz> switch to a Unicode-centric view of the universe. Therefore you
aahz> encode *FROM* Unicode and you decode *TO* Unicode. Period. It's
aahz> similar to the way floating point contaminates ints.

That's what I do in my code. Why do Unicode objects have a decode method
then?


Because MAL implemented it! >;->

It first encodes in the default encoding and then decodes the result
with the specified encoding, so if u is a unicode object
u.decode("utf-16")
is an abbreviation of
u.encode().decode("utf-16")

In the same way str has an encode method, so
s.encode("utf-16")
is an abbreviation of
s.decode().encode("utf-16")
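
The two abbreviations can be imitated in Python 3, where the implicit step no longer exists. This is only a sketch: py2_unicode_decode and py2_str_encode are hypothetical helper names, and DEFAULT_ENCODING assumes the usual Python 2 default of 'ascii'. It reproduces the puzzling error types reported earlier in the thread.

```python
DEFAULT_ENCODING = "ascii"   # assumed: Python 2's usual default encoding


def py2_unicode_decode(u, encoding):
    """Hypothetical helper imitating Python 2's u.decode(encoding)."""
    # Implicit step first: text -> bytes with the default encoding.
    return u.encode(DEFAULT_ENCODING).decode(encoding)


def py2_str_encode(s, encoding):
    """Hypothetical helper imitating Python 2's s.encode(encoding)."""
    # Implicit step first: bytes -> text with the default encoding.
    return s.decode(DEFAULT_ENCODING).encode(encoding)


# Pure-ASCII data passes through the implicit step unnoticed.
print(py2_unicode_decode("abc", "latin-1"))    # abc
print(py2_str_encode(b"abc", "latin-1"))       # b'abc'

# Non-ASCII data fails in the implicit step, so a "decode" call raises
# an *encode* error and an "encode" call raises a *decode* error.
try:
    py2_unicode_decode("\xe4", "latin-1")
except UnicodeEncodeError as exc:
    print("decode raised", type(exc).__name__)

try:
    py2_str_encode(b"\x84", "latin-1")
except UnicodeDecodeError as exc:
    print("encode raised", type(exc).__name__)
```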

Bye,
Walter Dörwald
Jul 18 '05 #15

Skip Montanaro wrote:
I started to answer, then got confused when I read the docstrings for
unicode.encode and unicode.decode:

[snip]
It certainly is confusing. When I first started Unicoding, I pretty
much stuck to Aahz's rule of thumb, without understanding the details,
and still do that. But now I do understand it.

Although encodings are bijective (i.e., equivalent one-to-one
mappings), they are not apolar. One side of the encoding is
arbitrarily labeled the encoded form; the other is arbitrarily labeled
the decoded form. (This is not a relativistic system, here.) The
encode method maps from the decoded to the encoded set. The decode
method does the inverse.

That's it. The only real technical difference between encode and
decode is the direction they map in.

By convention, the decoded form is a Python unicode string, and the
encoded form is the byte string.

I believe it's technically possible (but very rude) to write an
"inverse encoding", where the "encoded" form is a unicode string, and
the decoded form is UTF-8 byte string.

Also, note that there are some encodings unrelated to Unicode. For
example, try this:

>>> "abcd".encode("base64")
This is an encoding between two byte strings.
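
For readers on Python 3 (an assumption; the line above is Python 2), byte strings no longer have an encode method, but the same bytes-to-bytes codec should still be reachable through the codecs module. A short sketch:

```python
import codecs

# base64 maps a byte string to another byte string; no Unicode involved.
encoded = codecs.encode(b"abcd", "base64")
print(encoded)                                # b'YWJjZA==\n'

# The inverse direction of the same mapping.
decoded = codecs.decode(encoded, "base64")
print(decoded)                                # b'abcd'
```

Both the "encoded" and the "decoded" side are bytes here, which illustrates that the labels name a direction rather than a type.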
--
CARL BANKS

Jul 18 '05 #16

Carl Banks wrote:
Also, note that there are some encodings unrelated to Unicode. For
example, try this:

>>> "abcd".encode("base64")
This is an encoding between two byte strings.


Yes. This can be especially nice when you need to use restricted charsets.

I needed to use unicode objects as Zope ids. But Zope only accepts a
subset of ascii as ids.

So I used:

>>> hex_id = u'INBOX'.encode('utf-8').encode('hex')
>>> print hex_id
494e424f58

And I can get the unicode representation back with:

>>> unicode_id = hex_id.decode('hex').decode('utf-8')
>>> unicode_id
u'INBOX'

In that case hex_id.decode('hex') doesn't return a unicode, but a utf-8
encoded string.
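
The same two-step idea can be sketched in Python 3 (an assumption; the session above is Python 2). The helper names to_hex_id and from_hex_id are hypothetical, and nothing here touches Zope itself.

```python
def to_hex_id(name):
    """Text -> UTF-8 bytes -> hex digits (only 0-9 and a-f)."""
    return name.encode("utf-8").hex()


def from_hex_id(hex_id):
    """Hex digits -> UTF-8 bytes -> text."""
    return bytes.fromhex(hex_id).decode("utf-8")


hex_id = to_hex_id("INBOX")
print(hex_id)                          # 494e424f58
print(from_hex_id(hex_id))             # INBOX

# Non-ASCII names survive the round trip as well.
danish = "\xe6\xf8\xe5"
print(to_hex_id(danish))               # c3a6c3b8c3a5
assert from_hex_id(to_hex_id(danish)) == danish
```

As in the original, the intermediate value after the hex step is a UTF-8 byte string, and only the final decode produces text.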

--

hilsen/regards Max M, Denmark

http://www.mxm.dk/
IT's Mad Science
Jul 18 '05 #17
