
Patch to pydoc (partial) to handle encodings other than ascii

Hello Pythonists:
I am using SPE as my Python IDE on Windows, with Python 2.5.1 installed
(official distro). As my mother tongue is Spanish, I had documented
some modules in it (I know, I should have documented everything in
English unless I were 110% sure that nobody else would read my docs,
but they are only for in-house use). When I tried to use the pydoc tab
that SPE attaches to every source file, I only got a message saying
that my accented text could not be decoded.

Browsing SPE's sources, I found that pydoc's HTMLDoc class could
not handle non-ASCII characters. I then patched the Doc class (the
parent of HTMLDoc) to look for the encoding declared in the
source of the module being documented, and (in HTMLDoc) to decode the
source with it. As the function that writes the HTML file uses the
same class and choked when writing the file, I re-encode the text with
the same encoding on writing.
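
The lookup described above can be sketched on its own, outside pydoc. This is only an illustration written for current Python 3, not the patch itself: the function name declared_encoding and its default argument are made up for the example, and the pattern follows the "coding" comment convention from PEP 263, which also matches the vim fileencoding form.

```python
import codecs
import re

# A "coding" declaration in a comment, in the style described by PEP 263.
# It matches both "# -*- coding: latin-1 -*-" and "# vim:fileencoding=utf-8".
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)")


def declared_encoding(source_bytes, default="ascii"):
    """Return the encoding declared in the first two lines, or `default`.

    `declared_encoding` and `default` are hypothetical names for this sketch.
    """
    # A UTF-8 byte order mark wins over anything else.
    if source_bytes.startswith(codecs.BOM_UTF8):
        return "utf-8"
    # Only the first two lines may carry the declaration.
    for line in source_bytes.splitlines()[:2]:
        match = CODING_RE.match(line.decode("ascii", "replace"))
        if match:
            return match.group(1)
    return default


# A Latin-1 module with an accented docstring.
sample = b"# -*- coding: latin-1 -*-\n'''Documentaci\xf3n del m\xf3dulo.'''\n"
encoding = declared_encoding(sample)
text = sample.decode(encoding)
```

On Python 3 the standard library function tokenize.detect_encoding does this job, so a hand-written search like this one should only be needed for illustration.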

As I could not find the e-mail address of pydoc's maintainer (the
source code states that the author is Ka-Ping Yee, but the original
date is from 2001, and I could not find out whether he is still
maintaining it), I want to make this patch available so that pydoc can
be used on non-ASCII sources (at least to generate HTML documentation
programmatically). If the solution is useful (please don't hesitate to
criticize it), maybe it can be incorporated into a future pydoc version.

I don't know how to make a patch file (I usually don't do co-op
programming; I code as a hobby), and of course I wouldn't even
think of sending 90 k of code to the newsgroup, so I am sending the
modified code here, with an indication of where the modifications go:

After line 323, replace

if inspect.ismodule(object): return self.docmodule(*args)

with:

if inspect.ismodule(object):
    remarks = inspect.getcomments(object) or ''  # getcomments may return None
    start = remarks.find(' -*- coding: ') + 13
    if start == 12:  # -1 + 13: no emacs-style declaration found
        start = remarks.find('# vim:fileencoding=') + 19
        if start == 18:  # -1 + 19: no vim-style declaration either
            if inspect.getsource(object)[:3] == '\xef\xbb\xbf':  # UTF-8 BOM
                self.encoding = 'utf_8'
            else:
                self.encoding = sys.getdefaultencoding()
        else:
            end = remarks.find(' ', start)
            self.encoding = remarks[start:end]
    else:
        end = remarks.find('-*-', start)
        self.encoding = remarks[start:end].strip()
    return self.docmodule(*args)

After line 421 (moved to 437 by the previous insert), insert

title = title.decode(self.encoding)
contents = contents.decode(self.encoding)

And finally replace line 1491 (now 1509):

file.write(page)

with:

file.write(page.encode(html.encoding))
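
The idea behind the last two changes is to decode once when the text comes in and encode once when the page goes out, both with the same encoding. A minimal stand-alone sketch of that round trip, written for current Python 3 and not taken from pydoc (the file name, the sample title and the fixed "latin-1" value are assumptions made for the example):

```python
import os
import tempfile

# Assumed for the example: the value the patch would store in self.encoding.
encoding = "latin-1"

# Bytes as they would come from a Latin-1 source file.
raw_title = b"M\xf3dulo de facturaci\xf3n"

# Decode once on the way in, so the page is built from text, not raw bytes.
title = raw_title.decode(encoding)
page = "<html><head><title>%s</title></head><body></body></html>" % title

# Encode once on the way out, with the same encoding, and write in binary mode.
path = os.path.join(tempfile.mkdtemp(), "modulo.html")
with open(path, "wb") as handle:
    handle.write(page.encode(encoding))

# Read the file back to check that the accented bytes survived unchanged.
with open(path, "rb") as handle:
    written = handle.read()
```

Writing the page without the final encode step is what makes an ASCII-only writer choke on the accented characters.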

The code doesn't solve the encoding issue on consoles (just try to
document utf-8 sources and see what funny things appear!), but if the
approach helps, maybe something can be worked out to use it in a
general way (I just don't know how to get the console encoding, and I
don't use consoles most of the time).
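
On the console question, one possible starting point is the encoding attribute of sys.stdout, falling back to the locale's preferred encoding when the stream does not report one. The sketch below is for current Python 3 and is only a guess at how it could be wired in. The helper names console_encoding and to_console_bytes are made up here, and how well the reported value matches what a given Windows console really displays is something I have not verified.

```python
import locale
import sys


def console_encoding(stream=None):
    """Best guess at the encoding the stream (default: sys.stdout) expects.

    Hypothetical helper for this sketch, not part of pydoc.
    """
    stream = sys.stdout if stream is None else stream
    # Redirected or replaced streams may have no encoding, or None.
    reported = getattr(stream, "encoding", None)
    return reported or locale.getpreferredencoding(False) or "ascii"


def to_console_bytes(text, stream=None):
    """Encode text for the console, replacing characters it cannot show."""
    return text.encode(console_encoding(stream), "replace")
```

With "replace" as the error handler, a character the console cannot represent turns into a question mark instead of raising an exception, which seems the safer behaviour for documentation output.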
I hope this helps other non-ASCII users like me.
Cheers (and sorry for my English).
Walter Gardella

May 29 '07 #1