Patch to pydoc (partial) to handle encodings other than ascii

w.m.gardella.sambeth

Hello Pythonists:
I am using SPE as my Python IDE on Windows, with Python 2.5.1 installed
(official distro). As my mother tongue is Spanish, I had documented
some modules in it (I know, I should have documented everything in English
unless I were 110% sure that nobody else would read my docs, but
they are only for in-house use). When I tried to use the pydoc tab
that SPE attaches to every source file, I only found a message saying
that my accented text could not be decoded.

Browsing SPE's sources, I found that pydoc's HTMLDoc class could
not handle non-ASCII characters. So I patched the code of the Doc class (the
parent of HTMLDoc) to look for the encoding declared in the
source of the module being documented, and (in HTMLDoc) to decode the source
with it. As the function that writes the HTML file used the same class and
choked when writing the file, I re-encoded the text with the same
encoding on writing.

As I could not find the e-mail address of pydoc's maintainer (the source code
states that the author is Ka-Ping Yee, but the original date is
2001, and I could not find out whether he is still maintaining it), I want to
make this patch available so that pydoc can be used on non-ASCII
sources (at least to generate HTML documentation programmatically). If
the solution is useful (please don't hesitate to criticize it), maybe
it can be incorporated into a future pydoc version.

I don't know how to make a patch file (I don't usually do collaborative
programming; I code as a hobby), but of course I wouldn't even
think of sending 90 KB of code to the newsgroup, so I am sending the
modified code here, with an indication of where the modifications go:

After line 323, replace

if inspect.ismodule(object): return self.docmodule(*args)

with:

if inspect.ismodule(object):
    remarks = inspect.getcomments(object) or ''  # getcomments() may return None
    start = remarks.find(' -*- coding: ') + 13
    if start == 12:  # find() returned -1: no Emacs-style declaration
        start = remarks.find('# vim:fileencoding=') + 19
        if start == 18:  # find() returned -1: no vim-style declaration either
            if inspect.getsource(object)[:3] == '\xef\xbb\xbf':  # UTF-8 BOM
                self.encoding = 'utf_8'
            else:
                self.encoding = sys.getdefaultencoding()
        else:
            end = remarks.find(' ', start)
            self.encoding = remarks[start:end]
    else:
        end = remarks.find('-*-', start)
        self.encoding = remarks[start:end].strip()
    return self.docmodule(*args)
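
For readers on current Python versions, the lookup above can be sketched more compactly. This is only a minimal sketch in Python 3 syntax, not the patch itself: it assumes the raw source bytes of the module are at hand, uses a regular expression modelled on the one in PEP 263, and declared_encoding is a hypothetical helper name. Unlike the patch, it checks for the UTF-8 byte order mark first and only looks at the first two lines, as PEP 263 specifies.

```python
import codecs
import re

# Modelled on the PEP 263 pattern: "coding:" or "coding=" inside a comment line.
CODING_RE = re.compile(br"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)")


def declared_encoding(source_bytes, default="ascii"):
    """Return the encoding declared in raw module source (hypothetical helper)."""
    # A UTF-8 byte order mark at the start of the file implies UTF-8.
    if source_bytes.startswith(codecs.BOM_UTF8):
        return "utf-8"
    # PEP 263 only honours a declaration on the first or second line.
    for line in source_bytes.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
    # No declaration found: fall back to the caller's default.
    return default
```

The same function covers both the Emacs style (-*- coding: name -*-) and the vim style (vim:fileencoding=name), because both contain the text "coding" followed by ":" or "=".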

After line 421 (now 437 because of the previous insertion), insert:

title = title.decode(self.encoding)
contents = contents.decode(self.encoding)

And finally replace line 1491 (now 1509):

file.write(page)

with:

file.write(page.encode(html.encoding))
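
The idea behind the last two changes is to decode byte strings on the way in and encode again only on the way out. A minimal sketch of that round trip, in Python 3 syntax and independent of pydoc, could look like this; render_page and the sample strings are hypothetical, and the in-memory buffer merely stands in for the HTML file.

```python
import io


def render_page(title_bytes, contents_bytes, encoding):
    """Decode byte strings, build a page as text, and re-encode it for writing."""
    # Decode early so that all string formatting happens on text.
    title = title_bytes.decode(encoding)
    contents = contents_bytes.decode(encoding)
    page = u"<html><head><title>%s</title></head><body>%s</body></html>" % (
        title, contents)
    # Encode only at the output boundary, with the same encoding.
    return page.encode(encoding)


# Hypothetical sample: Spanish text stored as Latin-1 bytes
# (\u00f3 is the accented "o").
raw_title = u"m\u00f3dulo".encode("latin-1")
raw_body = u"Documentaci\u00f3n de la funci\u00f3n".encode("latin-1")

buffer = io.BytesIO()  # stands in for the HTML file opened in binary mode
buffer.write(render_page(raw_title, raw_body, "latin-1"))
```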

The code doesn't solve the encoding issue on consoles (just try to
document UTF-8 sources and see what funny things appear!), but if the
approach helps, maybe something can be worked out to use it in a
general way (I just don't know how to get the console encoding, and I
don't use consoles most of the time).
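
On the console question, one possible starting point is the encoding attribute of sys.stdout, with locale.getpreferredencoding() as a fallback. The following is only a sketch under those assumptions, not something wired into pydoc; console_encoding and to_console_bytes are hypothetical helper names.

```python
import locale
import sys


def console_encoding(stream=None):
    """Best guess at the encoding a text stream expects (hypothetical helper)."""
    stream = sys.stdout if stream is None else stream
    # The attribute may be missing or None, for example when output is piped.
    return getattr(stream, "encoding", None) or locale.getpreferredencoding(False)


def to_console_bytes(text, stream=None):
    """Encode text for the console, replacing characters it cannot represent."""
    return text.encode(console_encoding(stream), "replace")
```

Using the "replace" error handler trades accuracy for robustness: characters the console cannot represent come out as "?" instead of raising an exception.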
I hope this can help some other non-ASCII users like me.
Cheers (and sorry for my English).
Walter Gardella

May 29 '07 #1
