minidom utf-8 encoding

Hi guys/gals.

I am trying to write an XML file from data parsed from a CSV file.

I can get everything to work except that I cannot get minidom to output
non-ASCII characters, which needless to say is driving me nuts.

Any suggestions?

What it ends up doing is just removing the character from the
datastream.

Jan 3 '07 #1
5 Replies


fscked wrote:
> Hi guys/gals.
>
> I am trying to write an XML file from data parsed from a CSV file.
> <...snip...>
> Any suggestions?

Works fine for me:

py> from xml.dom import minidom
py> d = minidom.Document()
py> r = d.createElement("root")
py> r.appendChild(d.createTextNode(u"\xf6"))
<DOM Text node "\xf6">
py> d.appendChild(r)
<DOM Element: root at -0x482ab614>
py> d.toxml()
u'<?xml version="1.0" ?>\n<root>\xf6</root>'
py> print d.toxml()
<?xml version="1.0" ?>
<root>ö</root>

Regards,
Martin
Jan 4 '07 #2

Martin v. Löwis wrote:
> Works fine for me:
> <...snip...>

Well, let me clarify. If I just print it to the screen/console it works
fine, but when I do:

out.write( doc.toprettyxml())

it just removes the non-ASCII character.

I can post the code if anyone wants to see it, but it is fairly
straightforward.

Jan 4 '07 #3

fscked wrote:
> Well, let me clarify. If I just print it to the screen/console it works
> fine, but when I do:
>
> out.write( doc.toprettyxml())
>
> it just removes the non-ASCII character.
>
> I can post the code if anyone wants to see it, but it is fairly
> straightforward.

I find that hard to believe. There is no code in Python that does
removal of characters, and I can't see any other reason why it gets
removed.

OTOH, what I do get when writing to a file is a UnicodeError, when
it tries to convert the Unicode string that toxml gives to a byte
string.

So I recommend you pass encoding="utf-8" to the toprettyxml invocation
also.

Regards,
Martin
Jan 4 '07 #4
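
For readers on Python 3, the suggestion in reply #4 might look roughly like the sketch below. The thread itself uses Python 2; this assumes Python 3, where toprettyxml(encoding="utf-8") returns bytes, so the output file is opened in binary mode. The helper name write_utf8_xml and the temporary file path are made up for illustration and do not come from the thread.

```python
import os
import tempfile
from xml.dom.minidom import Document, parse


def write_utf8_xml(path, text):
    """Build <root>text</root> and write it to `path` as UTF-8 bytes."""
    doc = Document()
    root = doc.createElement("root")
    root.appendChild(doc.createTextNode(text))
    doc.appendChild(root)
    # With an encoding argument, toprettyxml() returns bytes and the
    # XML declaration names that encoding.
    data = doc.toprettyxml(encoding="utf-8")
    # Binary mode: the bytes are written unchanged, with no implicit
    # re-encoding by the file object.
    with open(path, "wb") as out:
        out.write(data)
    return data


if __name__ == "__main__":
    # Hypothetical output location, used only for this demonstration.
    target = os.path.join(tempfile.mkdtemp(), "test.xml")
    write_utf8_xml(target, "\xf6")
    # Parsing the file back should recover the same character.
    print(repr(parse(target).documentElement.firstChild.data))
```

Doing the encoding explicitly, rather than leaving it to the file object, keeps the bytes on disk consistent with the encoding named in the XML declaration.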

Martin v. Löwis wrote:
> <...snip...>
> I find that hard to believe. There is no code in Python that does
> removal of characters, and I can't see any other reason why it gets
> removed.
>
> OTOH, what I do get when writing to a file is a UnicodeError, when
> it tries to convert the Unicode string that toxml gives to a byte
> string.
>
> So I recommend you pass encoding="utf-8" to the toprettyxml invocation
> also.

OK, now I am really confused. After hours of trying every variation of
opening, writing and encoding, and all the other voodoo I could find on
the web, I decided to put the script back to how it was when it did
everything correctly except that it dropped the Unicode characters.

And now it just works...

I hate it when that happens. In case you are wondering here is the code
that caused me all this (seemingly odd) pain:

import csv
import codecs
from xml.dom.minidom import Document

out = open("test.xml", "w")

# Create the minidom document
doc = Document()

# Create the <boxes> base element and attach it to the document once
boxes = doc.createElement("boxes")
doc.appendChild(boxes)

myfile = open('ClientsXMLUpdate.txt')
csvreader = csv.reader(myfile)
# Iterate with a single reader; a second reader on the same file
# inside the loop would consume (and so skip) every other row.
for b in csvreader:
    # One <box> element per CSV row
    mainbox = doc.createElement("box")
    mainbox.setAttribute("city", b[10])
    mainbox.setAttribute("country", b[9])
    mainbox.setAttribute("phone", b[8])
    mainbox.setAttribute("address", b[7])
    mainbox.setAttribute("name", b[6])
    mainbox.setAttribute("pl_heartbeat", b[5])
    mainbox.setAttribute("sw_ver", b[4])
    mainbox.setAttribute("hw_ver", b[3])
    mainbox.setAttribute("date_activated", b[2])
    mainbox.setAttribute("mac_address", b[1])
    mainbox.setAttribute("boxid", b[0])
    boxes.appendChild(mainbox)
myfile.close()

# Write our newly created XML
out.write(doc.toprettyxml())
out.close()
And it just works...

Jan 4 '07 #5

fscked wrote:
> <...snip...>
> mainbox.setAttribute("city", b[10])
>
> And it just works...

You should not use it like that: it will only work if the CSV file is
encoded in UTF-8. If the CSV file uses any other encoding, the resulting
XML file will be ill-formed.

What you should do instead is

....
encoding_of_csv_file = some_value_that_the_producer_of_the_file_told_you
....
...
mainbox.setAttribute("city", b[10].decode(encoding_of_csv_file))

Regards,
Martin
Jan 5 '07 #6
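
The advice in reply #6 can be sketched for Python 3 as follows. This is an illustration under assumptions, not the poster's script: in Python 3 the decoding happens when the CSV file is opened with an explicit encoding, instead of calling .decode() on each field as in the Python 2 code above. The function name csv_to_xml, the two-column layout, the sample row and the choice of Latin-1 are all hypothetical.

```python
import csv
import os
import tempfile
from xml.dom.minidom import Document, parse


def csv_to_xml(csv_path, xml_path, csv_encoding):
    """Convert rows of (boxid, city) into <box> elements, written as UTF-8."""
    doc = Document()
    boxes = doc.createElement("boxes")
    doc.appendChild(boxes)
    # Decode with the encoding the producer of the file specified;
    # newline="" is what the csv module asks for when reading files.
    with open(csv_path, encoding=csv_encoding, newline="") as src:
        for row in csv.reader(src):
            box = doc.createElement("box")
            box.setAttribute("boxid", row[0])
            box.setAttribute("city", row[1])
            boxes.appendChild(box)
    # Encode explicitly so the bytes match the XML declaration.
    with open(xml_path, "wb") as out:
        out.write(doc.toprettyxml(encoding="utf-8"))


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    csv_path = os.path.join(workdir, "clients.csv")
    xml_path = os.path.join(workdir, "test.xml")
    # Hypothetical sample data, stored as Latin-1 rather than UTF-8.
    with open(csv_path, "w", encoding="latin-1", newline="") as f:
        csv.writer(f).writerow(["42", "K\xf6ln"])
    csv_to_xml(csv_path, xml_path, "latin-1")
    # Reading the attribute back should give the original city name.
    box = parse(xml_path).getElementsByTagName("box")[0]
    print(repr(box.getAttribute("city")))
```

The input encoding is a parameter on purpose: as the reply says, it has to come from whoever produced the file and cannot be reliably inferred from the data.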
