UTF8 encoded XML file. Here's what I've tried so far:

>>> xml_utf8 = """<?xml version="1.0" encoding="UTF-8" ?>
<msg>Simon\xe2\x80\x99s XML nightmare</msg>
"""
>>> from xml.dom import pulldom
>>> parser = pulldom.parseString(xml_utf8)
>>> parser.next()
('START_DOCUMENT', <xml.dom.minidom.Document instance at 0x6f06c0>)
>>> parser.next()
('START_ELEMENT', <DOM Element: msg at 0x6f0710>)
>>> parser.next()
....
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in
position 21: ordinal not in range(128)

xml.dom.minidom can handle the string just fine:

>>> from xml.dom import minidom
>>> dom = minidom.parseString(xml_utf8)
>>> dom.toxml()
u'<?xml version="1.0" ?><msg>Simon\u2019s XML nightmare</msg>'

If I pass a unicode string to pulldom instead of a utf8 encoded
bytestring it still breaks:

>>> xml_unicode = u'<?xml version="1.0" ?><msg>Simon\u2019s XML nightmare</msg>'
>>> parser = pulldom.parseString(xml_unicode)
....
/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/
xml/dom/pulldom.py in parseString(string, parser)
    346
    347     bufsize = len(string)
--> 348     buf = StringIO(string)
    349     if not parser:
    350         parser = xml.sax.make_parser()
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in
position 32: ordinal not in range(128)

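One thing that might be worth checking, as a rough sketch rather than a confirmed fix: the first error may come from the interactive shell displaying the text node rather than from the parse itself, and the second from StringIO being handed a unicode string. Both of those are guesses. The snippet below hands pulldom a byte stream through pulldom.parse() and reads the text through node.data instead of echoing the node. The helper name collect_text is made up for illustration, and io.BytesIO and the b'' literal assume Python 2.6 or later.

```python
import io
from xml.dom import pulldom

# Same document as in the sessions above, as a UTF-8 encoded byte string.
xml_utf8 = (b'<?xml version="1.0" encoding="UTF-8" ?>\n'
            b'<msg>Simon\xe2\x80\x99s XML nightmare</msg>\n')


def collect_text(xml_bytes):
    """Hypothetical helper: return all character data pulldom reports."""
    # pulldom.parse() accepts a file-like object, so wrap the bytes
    # in an in-memory byte stream instead of calling parseString().
    events = pulldom.parse(io.BytesIO(xml_bytes))
    chunks = []
    for event, node in events:
        if event == pulldom.CHARACTERS:
            # Read the text via node.data rather than displaying the node.
            chunks.append(node.data)
    return "".join(chunks)


text = collect_text(xml_utf8)
```

For the unicode case, the same helper could be called as collect_text(xml_unicode.encode("utf-8")), so that the parser only ever sees UTF-8 bytes.
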
Is it possible to consume utf8 or unicode using xml.dom.pulldom or
should I try something else?
Thanks,
Simon Willison