Re: convert xhtml back to html

On 2008-04-24 19:16, John Krukoff wrote:
> -----Original Message-----
> From: py*********************************** [mailto:python-li****************************] On Behalf Of Tim Arnold
> Sent: Thursday, April 24, 2008 9:34 AM
> To: py*********
> Subject: convert xhtml back to html
>
> hi, I've got lots of xhtml pages that need to be fed to MS HTML Workshop to
> create CHM files. That application really hates xhtml, so I need to convert
> self-ending tags (e.g. <br />) to plain html (e.g. <br>).
>
> Seems simple enough, but I'm having some trouble with it. regexps trip up
> because I also have to take into account 'img', 'meta', 'link' tags, not
> just the simple 'br' and 'hr' tags. Well, maybe there's a simple way to do
> that with regexps, but my simpleminded <img[^(/>)]+/> doesn't work. I'm not
> enough of a regexp pro to figure out that lookahead stuff.
>
> I'm not sure where to start now; I looked at BeautifulSoup and
> BeautifulStoneSoup, but I can't see how to modify the actual tag.

You could filter the XHTML through mxTidy and set the hide_endtags option to 1.

Marc-Andre Lemburg
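
For comparison with the regexp attempt quoted above: the pattern <img[^(/>)]+/> fails because [^(/>)] is a character class that excludes "(", "/", ">" and ")" individually, so it stops at the first slash inside something like src="a/b.png". The following is only a rough sketch using the standard re module, not the mxTidy route suggested here and not anything posted in the thread. The names VOID_TAGS and xhtml_to_html are made up for illustration, and the tag list is an assumption that may need extending for your pages. It treats quoted attribute values as units, so no lookahead is needed.

```python
import re

# Elements that take no end tag in HTML 4 (assumed list; extend as needed).
VOID_TAGS = ("area", "base", "br", "col", "hr", "img",
             "input", "link", "meta", "param")

SELF_CLOSING = re.compile(
    r"""
    <(?P<name>%s)\b          # tag name taken from the void list
    (?P<attrs>               # attributes, matched lazily
        (?:[^<>"']           #   an ordinary character (slashes allowed)
          |"[^"]*"           #   or a whole double-quoted value
          |'[^']*'           #   or a whole single-quoted value
        )*?
    )
    \s*/>                    # optional whitespace, then the XHTML-style close
    """ % "|".join(VOID_TAGS),
    re.IGNORECASE | re.VERBOSE,
)


def xhtml_to_html(text):
    """Rewrite self-closed void tags such as <br /> as plain <br>."""
    return SELF_CLOSING.sub(
        lambda m: "<%s%s>" % (m.group("name"), m.group("attrs")), text
    )


if __name__ == "__main__":
    sample = '<p>one<br />two <img src="a/b.png" alt="x" /></p>'
    print(xhtml_to_html(sample))
```

This only touches tags in the list that already end in "/>", and it is not a real parser: comments, CDATA sections or script bodies containing tag-like text could still be rewritten, which is where a tool such as Tidy is the safer choice.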

Jun 27 '08 #1