467,145 Members | 944 Online
Bytes | Developer Community
Ask Question

Home New Posts Topics Members FAQ

Post your question to a community of 467,145 developers. It's quick & easy.

Problem with processing XML

Hi.

I'm new to Python and trying to use it to solve a specific problem. I
have an XML file in which I need to locate a specific text node and
replace the contents with some other text. The text in question is
actually about 70k of base64 encoded data.

I wrote some code that works on my Linux box using xml.dom.minidom, but
it will not run on the windows box that I really need it on. Python
2.5.1 on both.

On the windows machine, it's a clean install of the Python .msi from
python.org. The linux box is Ubuntu 7.10, which has some Python XML
packages installed which can't easily be removed (namely python-libxml2
and python-xml).

I have boiled the code down to its simplest form which shows the problem:-

import xml.dom.minidom
import sys

input_file = sys.argv[1];
output_file = sys.argv[2];

doc = xml.dom.minidom.parse(input_file)
file = open(output_file, "w")
doc.writexml(file)

The error is:-

$ python test2.py input2.xml output.xml
Traceback (most recent call last):
File "test2.py", line 9, in <module>
doc.writexml(file)
File "c:\Python25\lib\xml\dom\minidom.py", line 1744, in writexml
node.writexml(writer, indent, addindent, newl)
File "c:\Python25\lib\xml\dom\minidom.py", line 814, in writexml
node.writexml(writer,indent+addindent,addindent,ne wl)
File "c:\Python25\lib\xml\dom\minidom.py", line 809, in writexml
_write_data(writer, attrs[a_name].value)
File "c:\Python25\lib\xml\dom\minidom.py", line 299, in _write_data
data = data.replace("&", "&amp;").replace("<", "&lt;")
AttributeError: 'NoneType' object has no attribute 'replace'

As I said, this code runs fine on the Ubuntu box. If I could work out
why the code runs on this box, that would help because then I call set
up the windows box the same way.

The input file contains an <xsd:schemablock which is what actually
causes the problem. If you remove that node and subnodes, it works
fine. For a while at least, you can view the input file at
http://rafb.net/p/5R1JlW12.html

Someone suggested that I should try xml.etree.ElementTree, however
writing the same type of simple code to import and then write the file
mangles the xsd:schema stuff because ElementTree does not understand
namespaces.

By the way, is pyxml a live project or not? Should it still be used?
It's odd that if you go to http://www.python.org/ and click the link
"Using python for..." XML, it leads you to
http://pyxml.sourceforge.net/topics/

If you then follow the download links to
http://sourceforge.net/project/showf...?group_id=6473 you see that
the latest file is 2004, and there are no versions for newer pythons.
It also says "PyXML is no longer maintained". Shouldn't the link be
removed from python.org?

Thanks in advance!
Jan 22 '08 #1
  • viewed: 2357
Share:
3 Replies
On Jan 22, 8:11*am, John Carlyle-Clarke <j...@nowhere.orgwrote:
Hi.

I'm new to Python and trying to use it to solve a specific problem. *I
have an XML file in which I need to locate a specific text node and
replace the contents with some other text. *The text in question is
actually about 70k of base64 encoded data.
Here is a pyparsing hack for your problem. I normally advise against
using literal strings like "<value>" to match XML or HTML tags in a
parser, since this doesn't cover variations in case, embedded
whitespace, or unforeseen attributes, but your example was too simple
to haul in the extra machinery of an expression created by pyparsing's
makeXMLTags.

Also, I don't generally recommend pyparsing for working on XML, since
there are so many better and faster XML-specific modules available.
But if this does the trick for you for your specific base64-removal
task, great.

-- Paul

# requires pyparsing 1.4.8 or later
from pyparsing import makeXMLTags, withAttribute, keepOriginalText,
SkipTo

xml = """
... long XML string goes here ...
"""

# define a filter that will key off of the <datatag with the
# attribute 'name="PctShow.Image"', and then use suppress to filter
the
# body of the following <valuetag
dataTag = makeXMLTags("data")[0]
dataTag.setParseAction(withAttribute(name="PctShow .Image"),
keepOriginalText)

filter = dataTag + "<value>" + SkipTo("</value>").suppress() + "</
value>"

xmlWithoutBase64Block = filter.transformString(xml)
print xmlWithoutBase64Block

Jan 22 '08 #2
On Jan 22, 9:11*am, John Carlyle-Clarke <j...@nowhere.orgwrote:
By the way, is pyxml a live project or not? *Should it still be used?
It's odd that if you go tohttp://www.python.org/and click the link
"Using python for..." XML, it leads you tohttp://pyxml.sourceforge.net/topics/

If you then follow the download links tohttp://sourceforge.net/project/showfiles.php?group_id=6473you see that
the latest file is 2004, and there are no versions for newer pythons.
It also says "PyXML is no longer maintained". *Shouldn't the link be
removed from python.org?
I was wondering that myself. Any answer yet?
Jan 22 '08 #3
Paul McGuire wrote:
>
Here is a pyparsing hack for your problem.
Thanks Paul! This looks like an interesting approach, and once I get my
head around the syntax, I'll give it a proper whirl.
Jan 22 '08 #4

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

3 posts views Thread by Mark Smithers | last post: by
11 posts views Thread by Chris | last post: by
7 posts views Thread by hierro | last post: by
3 posts views Thread by Kent | last post: by
8 posts views Thread by Ian Mackenzie | last post: by
3 posts views Thread by Rinaldo | last post: by
4 posts views Thread by Jimmy | last post: by
By using this site, you agree to our Privacy Policy and Terms of Use.