minidom utf-8 encoding

Hi guys/gals.

I am trying to write an XML file from data parsed from a CSV.

I can get everything to work except that I cannot get minidom to do -->
ö which needless to say is driving me nuts.

Any suggestions?

What it ends up doing is just removing the character from the
datastream.

Jan 3 '07 #1
fscked wrote:
Hi guys/gals.

I am trying to write an XML file from data parsed from a CSV.

I can get everything to work except that I cannot get minidom to do -->
ö which needless to say is driving me nuts.

Any suggestions?
Works fine for me:

py> d = minidom.Document()
py> r = d.createElement("root")
py> r.appendChild(d.createTextNode(u"\xf6"))
<DOM Text node "\xf6">
py> d.appendChild(r)
<DOM Element: root at -0x482ab614>
py> d.toxml()
u'<?xml version="1.0" ?>\n<root>\xf6</root>'
py> print d.toxml()
<?xml version="1.0" ?>
<root>ö</root>

Regards,
Martin
Jan 4 '07 #2

Martin v. Löwis wrote:
fscked wrote:
Hi guys/gals.

I am trying to write an XML file from data parsed from a CSV.

I can get everything to work except that I cannot get minidom to do -->
ö which needless to say is driving me nuts.

Any suggestions?

Works fine for me:

py> d = minidom.Document()
py> r = d.createElement("root")
py> r.appendChild(d.createTextNode(u"\xf6"))
<DOM Text node "\xf6">
py> d.appendChild(r)
<DOM Element: root at -0x482ab614>
py> d.toxml()
u'<?xml version="1.0" ?>\n<root>\xf6</root>'
py> print d.toxml()
<?xml version="1.0" ?>
<root>ö</root>

Regards,
Martin
Well, let me clarify. If I just print it to the screen/console it works
fine, but when I do:

out.write(doc.toprettyxml())

it just removes the character that would be the "ö".

I can post the code if anyone wants to see it, but it is fairly
straightforward.

Jan 4 '07 #3
fscked wrote:
Well, let me clarify. If I just print it to the screen/console it works
fine, but when I do:

out.write(doc.toprettyxml())

it just removes the character that would be the "ö".

I can post the code if anyone wants to see it, but it is fairly
straightforward.

I find that hard to believe. There is no code in Python that does
removal of characters, and I can't see any other reason why it gets
removed.

OTOH, what I do get when writing to a file is a UnicodeError, when
it tries to convert the Unicode string that toxml gives to a byte
string.

So I recommend you pass encoding="utf-8" to the toprettyxml invocation
also.
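
For anyone reading this later, here is a rough sketch of that suggestion in present-day Python 3 (the thread itself is from the Python 2 era, so this is an adaptation, not the code discussed above). It assumes a throwaway temporary directory and made-up file names. In Python 3, toprettyxml() without an encoding returns text, and with encoding="utf-8" it returns bytes that can be written to a file opened in binary mode.

```python
import os
import tempfile
from xml.dom.minidom import Document

# Build the same tiny document as in the interactive session earlier.
doc = Document()
root = doc.createElement("root")
root.appendChild(doc.createTextNode("\xf6"))  # the character "ö"
doc.appendChild(root)

workdir = tempfile.mkdtemp()  # hypothetical scratch location

# Writing the text form through an ASCII-only file fails with a
# UnicodeEncodeError, which is a kind of UnicodeError.
failed = False
try:
    with open(os.path.join(workdir, "ascii.xml"), "w", encoding="ascii") as out:
        out.write(doc.toprettyxml())
except UnicodeEncodeError:
    failed = True

# With an explicit encoding the result is bytes, and the XML declaration
# names that encoding, so the file and its declaration agree.
xml_bytes = doc.toprettyxml(encoding="utf-8")
xml_path = os.path.join(workdir, "test.xml")  # hypothetical file name
with open(xml_path, "wb") as out:
    out.write(xml_bytes)
```

Opening the output in binary mode keeps the encoding decision in one place, namely the encoding argument passed to toprettyxml().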

Regards,
Martin
Jan 4 '07 #4

Martin v. Löwis wrote:
<...snip...>
I find that hard to believe. There is no code in Python that does
removal of characters, and I can't see any other reason why it gets
removed.

OTOH, what I do get when writing to a file is a UnicodeError, when
it tries to convert the Unicode string that toxml gives to a byte
string.

So I recommend you pass encoding="utf-8" to the toprettyxml invocation
also.

Regards,
Martin
OK, now I am really confused. After hours of trying every variation of
opening, writing, and encoding, plus all the other voodoo I could find
on the web, I decided to put the script back to how it was when
everything worked except that the Unicode characters were dropped.

And now it just works...

I hate it when that happens. In case you are wondering here is the code
that caused me all this (seemingly odd) pain:

import csv
import codecs
from xml.dom.minidom import Document

out = open("test.xml", "w")

# Create the minidom document
doc = Document()

# Create the <boxes> base element
boxes = doc.createElement("boxes")
myfile = open('ClientsXMLUpdate.txt')
csvreader = csv.reader(myfile)
for row in csvreader:
    mainbox = doc.createElement("box")
    doc.appendChild(boxes)
    r2 = csv.reader(myfile)
    b = r2.next()
    mainbox.setAttribute("city", b[10])
    mainbox.setAttribute("country", b[9])
    mainbox.setAttribute("phone", b[8])
    mainbox.setAttribute("address", b[7])
    mainbox.setAttribute("name", b[6])
    mainbox.setAttribute("pl_heartbeat", b[5])
    mainbox.setAttribute("sw_ver", b[4])
    mainbox.setAttribute("hw_ver", b[3])
    mainbox.setAttribute("date_activated", b[2])
    mainbox.setAttribute("mac_address", b[1])
    mainbox.setAttribute("boxid", b[0])
    boxes.appendChild(mainbox)
# Print our newly created XML
out.write(doc.toprettyxml())

And it just works...

Jan 4 '07 #5
fscked wrote:
# Create the <boxes> base element
boxes = doc.createElement("boxes")
myfile = open('ClientsXMLUpdate.txt')
csvreader = csv.reader(myfile)
for row in csvreader:
    mainbox = doc.createElement("box")
    doc.appendChild(boxes)
    r2 = csv.reader(myfile)
    b = r2.next()
    mainbox.setAttribute("city", b[10])

And it just works...

You should not use it like that: it will only work if the CSV file is
encoded in UTF-8. If the CSV file uses any other encoding, the resulting
XML file will be ill-formed.
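
To make the ill-formedness concrete, here is a small sketch in present-day Python 3 (an adaptation, since the thread predates it). It assumes a Latin-1 encoded "ö" ends up as a raw byte in a document that declares no encoding, so the parser treats the input as UTF-8. The element and attribute names are only for illustration.

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError

# "Köln" as Latin-1 bytes: the single byte 0xF6 is not valid UTF-8 here.
bad_xml = b'<?xml version="1.0" ?>\n<box city="K\xf6ln"/>'

# The same document with the attribute value properly encoded as UTF-8.
good_xml = '<?xml version="1.0" ?>\n<box city="K\xf6ln"/>'.encode("utf-8")


def is_well_formed(data):
    """Return True if minidom can parse the given XML bytes."""
    try:
        parseString(data)
        return True
    except ExpatError:
        return False


bad_ok = is_well_formed(bad_xml)
good_ok = is_well_formed(good_xml)
print(bad_ok, good_ok)
```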

What you should do instead is

....
encoding_of_csv_file = some_value_that_the_producer_of_the_file_told_you
....
...
mainbox.setAttribute("city", b[10].decode(encoding_of_csv_file))
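
For readers on Python 3, the same idea might look roughly like the sketch below. Decoding moves from a per-field .decode() call to the encoding argument of open(). Everything specific here is an assumption made for illustration: the Latin-1 encoding, the two-column sample file, and the file names.

```python
import csv
import os
import tempfile
from xml.dom.minidom import Document

# Assumption: the producer of the CSV file said it is Latin-1 encoded.
encoding_of_csv_file = "latin-1"

workdir = tempfile.mkdtemp()
csv_path = os.path.join(workdir, "clients.csv")  # hypothetical file name
xml_path = os.path.join(workdir, "test.xml")     # hypothetical file name

# Write a tiny sample file so the sketch is self-contained.
with open(csv_path, "w", encoding=encoding_of_csv_file, newline="") as f:
    csv.writer(f).writerow(["42", "K\xf6ln"])

doc = Document()
boxes = doc.createElement("boxes")
doc.appendChild(boxes)

# The file object decodes bytes to str using the stated encoding,
# so every field handed to minidom is already proper text.
with open(csv_path, encoding=encoding_of_csv_file, newline="") as f:
    for row in csv.reader(f):
        box = doc.createElement("box")
        box.setAttribute("boxid", row[0])
        box.setAttribute("city", row[1])
        boxes.appendChild(box)

# Serialize as UTF-8 bytes and write them unchanged.
xml_bytes = doc.toprettyxml(encoding="utf-8")
with open(xml_path, "wb") as out:
    out.write(xml_bytes)
```

The input encoding and the output encoding are independent: the CSV is read as Latin-1 while the XML is written as UTF-8.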

Regards,
Martin
Jan 5 '07 #6
