> - don't use SAX unless your document is huge
- don't use DOM unless someone is putting a gun to your head
What I say is: use what works for you. I think SAX would be fine for
this task, but, hey, I personally would use Amara (
http://uche.ogbuji.net/tech/4suite/amara/ ), of course. The following
does the trick:
import sets
import amara
from amara import binderytools
#element_skeleton_rule suppresses char data from the resulting binding
#tree. If you have a large document and only care about element/attr
#structure and not text, this saves a lot of memory
rules = [binderytools.element_skeleton_rule()]
#XML can be a file path, URI, string, or even an open-file-like object
doc = amara.parse(XML, rules=rules)
elems = {}
for e in doc.xml_xpath('//*'):
paths = elems.setdefault((e.namespaceURI, e.localName), sets.Set())
path = u'/'.join([n.nodeName for n in e.xml_xpath(u'ancestor::*')])
paths.add(u'/' + path)
#Pretty-print output
for name in elems:
print name, '\n\t\t\t', '\n\t\t\t'.join(elems[name])
--
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://fourthought.com http://copia.ogbuji.net http://4Suite.org
Articles:
http://uche.ogbuji.net/tech/publications/