Bytes | Software Development & Data Engineering Community
request for advice - possible ElementTree nexus

Situation is this:
1) I have inherited some python code that accepts a string object, the
contents of which is an XML document, and produces a data structure
that represents some of the content of the XML document
2) The inherited code is somewhat 'brittle' in that some well-formed
XML documents are not correctly processed by the code; the brittleness
is caused by how the parser portion of the code handles whitespace.
3) I would like to change the code to make it less brittle. Whatever
changes I make must continue to produce the same data structure that is
currently being produced.
4) Rather than attempt to fix the parser portion of the code, I would
prefer to use ElementTree. ElementTree handles parsing XML documents
flawlessly, so the brittle portion of the code goes away. In addition,
the ElementTree model is very sweet to work with, so it is a relatively
easy task using the information in ElementTree to produce the same data
structure that is currently being produced.
5) The existing data structure (the one that must be maintained) does
NOT include any xmlns=<whatever> information that may appear in the
source XML document.
6) Based on a review of several posts in this group, I understand why
ElementTree handles xmlns=<whatever> information the way it does. This
is an oversimplification, but one of the things it does is to
incorporate the {whatever} within the tag property of the element and
of any descendant elements.
7) One of the pieces of information in the data structure that gets
produced by this code is the tag...the tag in the data structure should
not have any xmlns=<whatever> information.
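
To illustrate point 6, here is a minimal sketch of how a default namespace ends up in the tag property. The namespace URI and element names are made up for the example, and it assumes the standard-library xml.etree.ElementTree module rather than the separate elementtree package.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the namespace URI and element names are made up.
doc = '<root xmlns="http://example.com/ns"><child>text</child></root>'

root = ET.XML(doc)

# The xmlns declaration is folded into every tag as a "{uri}" prefix,
# on the element that declares it and on its descendants.
print(root.tag)     # {http://example.com/ns}root
print(root[0].tag)  # {http://example.com/ns}child
```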

So, given that the goal is to produce the same data structure and given
that I really want to use ElementTree, I need to find a way to remove
the xmlns=<whatever> information. It seems like there are 2 general
methods for accomplishing this:
1) before feeding the string object to the ElementTree.XML() method,
remove the xmlns=<whatever> information from the string.
2) keep the xmlns=<whatever> information in the string that feeds
ElementTree.XML(), but when building the data structure, leave the
{whatever} part of each element's tag property out of the data
structure.

My requests for advice are:
a) What are the pros/cons of each of the 2 general methods described
above?
b) If I want to remove the xmlns information before feeding it to the
ElementTree.XML() method, and I don't want to be aware of what is to
the right of the equal sign, what is the best way to remove all the
substrings that are of the form xmlns=<whatever>? Would this require
learning the nuances of regular expressions?
c) If I want to leave the xmlns information in the string that gets fed
to ElementTree.XML(), and I want to remove the {whatever} from the tag
before building the data structure, what is the best way to find
{whatever} in the tag property...is this another case where one
should be using regular expressions?

Jul 4 '06 #1

mi************@ yahoo.com wrote:
Situation is this:
1) I have inherited some python code that accepts a string object, the
contents of which is an XML document, and produces a data structure
that represents some of the content of the XML document
2) The inherited code is somewhat 'brittle' in that some well-formed
XML documents are not correctly processed by the code; the brittleness
is caused by how the parser portion of the code handles whitespace.
3) I would like to change the code to make it less brittle. Whatever
changes I make must continue to produce the same data structure that is
currently being produced.
4) Rather than attempt to fix the parser portion of the code, I would
prefer to use ElementTree. ElementTree handles parsing XML documents
flawlessly, so the brittle portion of the code goes away. In addition,
the ElementTree model is very sweet to work with, so it is a relatively
easy task using the information in ElementTree to produce the same data
structure that is currently being produced.
5) The existing data structure (the one that must be maintained) does
NOT include any xmlns=<whatever> information that may appear in the
source XML document.
6) Based on a review of several posts in this group, I understand why
ElementTree handles xmlns=<whatever> information the way it does. This
is an oversimplification, but one of the things it does is to
incorporate the {whatever} within the tag property of the element and
of any descendant elements.
7) One of the pieces of information in the data structure that gets
produced by this code is the tag...the tag in the data structure should
not have any xmlns=<whatever> information.

So, given that the goal is to produce the same data structure and given
that I really want to use ElementTree, I need to find a way to remove
the xmlns=<whatever> information. It seems like there are 2 general
methods for accomplishing this:
1) before feeding the string object to the ElementTree.XML() method,
remove the xmlns=<whatever> information from the string.
2) keep the xmlns=<whatever> information in the string that feeds
ElementTree.XML(), but when building the data structure, leave the
{whatever} part of each element's tag property out of the data
structure.
[snip]

maybe transform the document with XSLT before processing?

google: xslt remove namespaces

eg. http://www.tei-c.org/wiki/index.php/...Namespaces.xsl

eg. http://www.thescripts.com/forum/thread86057.html

hth

Gerard

Jul 5 '06 #2
mi************@ yahoo.com wrote:
c) If I want to leave the xmlns information in the string that gets fed
to ElementTree.XML(), and I want to remove the {whatever} from the tag
before building the data structure, what is the best way to find
{whatever} in the tag property...is this another case where one
should be using regular expressions?
if the "whatever" in {whatever} is known in advance, you can use the
approach described here:

http://effbot.org/zone/element-tidyl...-xhtml-to-html
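
For the known-in-advance case, the general idea can be sketched as below. This is not copied from the linked page; it is only an illustration, and it assumes the known namespace is XHTML and that the standard-library xml.etree.ElementTree module is in use. The helper name strip_known_namespace is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed for illustration: the namespace known in advance is XHTML.
XHTML = "{http://www.w3.org/1999/xhtml}"

def strip_known_namespace(root, prefix=XHTML):
    """Remove one known "{uri}" prefix from every element tag, in place."""
    for elem in root.iter():  # root itself plus all descendants
        if elem.tag.startswith(prefix):
            elem.tag = elem.tag[len(prefix):]
    return root

root = ET.XML(
    '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>hi</p></body></html>'
)
strip_known_namespace(root)
tags = [elem.tag for elem in root.iter()]
print(tags)  # ['html', 'body', 'p']
```

Tags in any other namespace are left untouched, which may or may not be what you want.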

if the "whatever" is not known, you can do e.g.

# "{whatever}name" -> "name"; tags without a namespace are left alone
if elem.tag.startswith("{"):
    elem.tag = elem.tag.split("}", 1)[1]
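
The snippet above handles a single element. A minimal sketch of using the same test while building a data structure from the whole tree follows. The (tag, text, children) tuple is a made-up stand-in for the real target structure, the helper names local_name and to_structure are hypothetical, and the standard-library xml.etree.ElementTree module is assumed.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Return the tag without a leading "{uri}" part, if it has one."""
    if tag.startswith("{"):
        return tag.split("}", 1)[1]
    return tag

def to_structure(elem):
    """Build a hypothetical (tag, text, children) tuple for elem."""
    return (
        local_name(elem.tag),
        (elem.text or "").strip(),
        [to_structure(child) for child in elem],
    )

# Made-up document with a default namespace and a prefixed one.
doc = '<a xmlns="urn:x" xmlns:y="urn:y"><y:b>1</y:b><c/></a>'
result = to_structure(ET.XML(doc))
print(result)  # ('a', '', [('b', '1', [])  , ('c', '', [])])
```

This leaves the tree itself unmodified and needs no regular expressions. Note that namespaced attribute names are stored in the same "{uri}name" form, so the same helper applies to them if the data structure includes attributes.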

</F>

Jul 5 '06 #3
