Martin v. Löwis wrote:
<...snip...>
I find that hard to believe. There is no code in Python that does
removal of characters, and I can't see any other reason why it gets
removed.
OTOH, what I do get when writing to a file is a UnicodeError, when
it tries to convert the Unicode string that toxml gives to a byte
string.
So I recommend you pass encoding="utf-8" to the toprettyxml invocation
also.
Regards,
Martin
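
A minimal sketch of the suggestion above, written for Python 3 (an assumption; the script in this thread is Python 2). When `encoding` is passed, `toprettyxml` returns an encoded byte string instead of text, so it can be written to a file opened in binary mode without any implicit conversion. The `render` helper and the sample city name are illustrative only, not part of the original script.

```python
from xml.dom.minidom import Document


def render(city):
    """Build a tiny <boxes> document and serialize it as UTF-8 bytes."""
    doc = Document()
    boxes = doc.createElement("boxes")
    doc.appendChild(boxes)

    box = doc.createElement("box")
    box.setAttribute("city", city)  # non-ASCII text is fine here
    boxes.appendChild(box)

    # With encoding given, the result is bytes and the XML declaration
    # names that encoding.
    return doc.toprettyxml(encoding="utf-8")


# Hypothetical sample value containing a non-ASCII character.
xml_bytes = render("Z\u00fcrich")
```

The resulting bytes would then be written with something like `open("test.xml", "wb")`, so no second encoding step happens on the way to disk.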
OK, now I am really confused. After spending hours trying every variation
of opening, writing and encoding, plus all the other voodoo I could find on
the web, I decided to put the script back to how it was when it did
everything except keep the Unicode characters.
And now it just works...
I hate it when that happens. In case you are wondering, here is the code
that caused me all this (seemingly odd) pain:
import csv
import codecs
from xml.dom.minidom import Document

out = open("test.xml", "w")

# Create the minidom document
doc = Document()

# Create the <boxes> base element
boxes = doc.createElement("boxes")

myfile = open('ClientsXMLUpdate.txt')
csvreader = csv.reader(myfile)
for row in csvreader:
    mainbox = doc.createElement("box")
    doc.appendChild(boxes)
    r2 = csv.reader(myfile)
    b = r2.next()
    mainbox.setAttribute("city", b[10])
    mainbox.setAttribute("country", b[9])
    mainbox.setAttribute("phone", b[8])
    mainbox.setAttribute("address", b[7])
    mainbox.setAttribute("name", b[6])
    mainbox.setAttribute("pl_heartbeat", b[5])
    mainbox.setAttribute("sw_ver", b[4])
    mainbox.setAttribute("hw_ver", b[3])
    mainbox.setAttribute("date_activated", b[2])
    mainbox.setAttribute("mac_address", b[1])
    mainbox.setAttribute("boxid", b[0])
    boxes.appendChild(mainbox)

# Print our newly created XML
out.write(doc.toprettyxml())
And it just works...
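
For comparison, here is a hedged sketch of how the same job might look in Python 3. It is not the script above: it reads each row once from a single `csv.reader`, attaches the root element once, and asks `toprettyxml` for UTF-8 bytes. The column order is taken from the `b[0]`..`b[10]` indices in the posted code; the `rows_to_xml` name and the sample row are made up for illustration.

```python
import csv
import io
from xml.dom.minidom import Document

# Column order inferred from the b[0]..b[10] indices in the posted script.
FIELDS = ["boxid", "mac_address", "date_activated", "hw_ver", "sw_ver",
          "pl_heartbeat", "name", "address", "phone", "country", "city"]


def rows_to_xml(lines):
    """Turn CSV lines into a <boxes> document, one <box> per row."""
    doc = Document()
    boxes = doc.createElement("boxes")
    doc.appendChild(boxes)  # attach the root element once, before the loop

    for row in csv.reader(lines):
        box = doc.createElement("box")
        for name, value in zip(FIELDS, row):
            box.setAttribute(name, value)
        boxes.appendChild(box)

    # Bytes out, so the caller can write them to a file opened with "wb".
    return doc.toprettyxml(encoding="utf-8")


# Invented sample row with non-ASCII text, standing in for the real file.
sample = io.StringIO("1,00:11:22:33:44:55,2008-01-01,A,1.0,60,"
                     "Caf\u00e9,1 Main St,555-0100,CH,Z\u00fcrich\n")
xml_bytes = rows_to_xml(sample)
```

With the real data, the input would be something like `open("ClientsXMLUpdate.txt", newline="", encoding="utf-8")` (the file's actual encoding is an assumption here), and the returned bytes would go to `open("test.xml", "wb")`.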