elementtree question

Hi, I'm using elementtree and elementtidy to work with some HTML files. For
some of these files I need to enclose the body content in a new div tag,
like this:
<body>
<div class="remapped">
original contents...
</div>
</body>

I figure there must be a way to do it by creating a 'div' SubElement to the
'body' tag and somehow copying the rest of the tree under that SubElement,
but it's beyond my comprehension.

How can I accomplish this?
(I know I could put the class on the body tag itself, but that won't satisfy
the powers-that-be).

thanks,
--Tim Arnold
Sep 21 '07 #1
Ivo
You could also try something like this:

from sgmllib import SGMLParser  # Python 2 only; sgmllib was removed in Python 3

class IParse(SGMLParser):
    """Copy the input through, wrapping the body contents in a div."""

    def __init__(self, verbose=0):
        SGMLParser.__init__(self, verbose)
        self.data = ""

    def _start_tag(self, tag, attrs):
        # Rebuild a start tag; attrs is a list of (name, value) pairs.
        # Joining this way avoids a stray space when there are no attributes.
        return "<%s>" % " ".join([tag] + ['%s="%s"' % a for a in attrs])

    def start_body(self, attrs):
        self.data += self._start_tag("body", attrs)
        print("remapping")
        self.data += '<div class="remapped">'  # open the wrapper div

    def end_body(self):
        self.data += "</div>"  # end remapping
        self.data += "</body>"

    def handle_data(self, data):
        self.data += data

    def unknown_starttag(self, tag, attrs):
        self.data += self._start_tag(tag, attrs)

    def unknown_endtag(self, tag):
        self.data += "</%s>" % tag

if __name__ == "__main__":
    i = IParse()
    i.feed('''
<html>
<body bgcolor="#ffffff">
original
<i>italic</i>
<b class="test">contents</b>...
</body>
</html>''')
    i.close()  # flush any buffered input before reading the result
    print(i.data)
Just look at the code of sgmllib (it is in the standard library); it is very
easy to make a parser with it. The code above still needs some refactoring.

Sep 21 '07 #2
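
For readers on Python 3, where sgmllib no longer exists, the same streaming idea can be sketched with html.parser from the standard library. This is only an illustration, not Ivo's code: the class name BodyWrapper and the sample markup are made up here, and comments and doctype declarations are silently dropped because the sketch does not handle them.

```python
from html.parser import HTMLParser


class BodyWrapper(HTMLParser):
    """Echo the input markup, wrapping the contents of <body> in a div."""

    def __init__(self):
        # convert_charrefs=False so entities are passed through untouched
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the start tag exactly as written
        self.out.append(self.get_starttag_text())
        if tag == "body":
            self.out.append('<div class="remapped">')

    def handle_startendtag(self, tag, attrs):
        # self-closing tags such as <br/> are echoed once, unchanged
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "body":
            self.out.append("</div>")
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)


parser = BodyWrapper()
parser.feed('<html><body bgcolor="#ffffff">original <i>italic</i></body></html>')
parser.close()
result = "".join(parser.out)
```

Because the parser never builds a tree, this works on markup that is not well-formed XML, at the cost of doing no validation at all.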
Give lxml.etree (or lxml.html) a try:

from lxml import etree

tree = etree.parse("http://url.to/some.html", etree.HTMLParser())
body = tree.find("body")

and then:

div = etree.Element("div", {"class" : "remapped"})
div.extend(body)
body.append(div)

or alternatively:

children = list(body)
div = etree.SubElement(body, "div", {"class" : "remapped"})
div.extend(children)

http://codespeak.net/lxml/

and for lxml.html, which is currently in alpha status:

http://codespeak.net/lxml/dev/

ET 1.3 will also support the extend() function, BTW.

Stefan
Sep 24 '07 #3
Thanks for the great answers--I learned a lot. I'm looking forward to the ET
1.3 version. I'm currently working on some older HP-UX 10.20 machines and
haven't been able to compile lxml all the way through yet.

thanks again,
--Tim Arnold
Sep 24 '07 #4
Stefan Behnel wrote:
ET 1.3 will also support the extend() function, BTW.

div.extend(seq) can be trivially rewritten as

div[len(div):] = seq

and in this case, you know that len(div) is 0, so you can simply do:

div[:] = seq

(this recent lxml habit of using lxml-specific versions of things that
are trivial to do with the standard API is a bit disappointing. kind of
defeats the purpose of having a standard API...)

</F>

Sep 26 '07 #5
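
The slice-assignment equivalence can be checked with a small sketch against the standard library's xml.etree.ElementTree. It assumes Python 3, which the thread predates, and the element names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A div that already has one child, plus a sequence of elements to add
div = ET.Element("div")
ET.SubElement(div, "p")
seq = [ET.Element("i"), ET.Element("b")]

# Same effect as div.extend(seq): assign to the empty slice at the end
div[len(div):] = seq
tags = [child.tag for child in div]

# For an element with no children, len(div) is 0, so div[:] = seq is enough
empty = ET.Element("div")
empty[:] = [ET.Element("i"), ET.Element("b")]
empty_tags = [child.tag for child in empty]
```

Note that div[:] = seq replaces whatever children are already there, so the short form is only equivalent to extend() when the element starts out empty.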
Fredrik Lundh wrote:
(this recent lxml habit of using lxml-specific versions of things that
are trivial to do with the standard API is a bit disappointing. kind of
defeats the purpose of having a standard API...)

ElementTree is not the only standard API that lxml is following. Another one
is the standard API of the "list" builtin type, which has an extend() method.

ah-you're-just-jealous-we-had-it-first-ly,

Stefan :)
Sep 26 '07 #6
Tim Arnold wrote:
Thanks for the great answers--I learned a lot. I'm looking forward to the ET
1.3 version.

Note that there is a difference in behaviour, though. lxml.etree forces
Elements to be uniquely positioned in a tree, so the code I posted relies on
the "side effect" of automatically removing an Element from the old position
when inserting it at a different place. ElementTree does not do that, so this
code is not portable between the two libraries.

Stefan
Sep 26 '07 #7
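
To make that difference concrete, here is a small sketch using the standard library's xml.etree.ElementTree (Python 3 is assumed, and the sample markup is made up). Running the lxml-style steps there leaves the children in both places, so they must be removed from body explicitly.

```python
import xml.etree.ElementTree as ET

body = ET.fromstring("<body><p>one</p><p>two</p></body>")
div = ET.Element("div", {"class": "remapped"})

# In the standard library this only adds references to div;
# the <p> elements are NOT removed from body.
div.extend(list(body))
body.append(div)
duplicated = ET.tostring(body, encoding="unicode")

# Remedy: replace all of body's children with the single div
body[:] = [div]
fixed = ET.tostring(body, encoding="unicode")
```

In duplicated each paragraph is serialized twice, once directly under body and once inside the div; fixed contains them only inside the div.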
For completeness, here's an efficient and fairly straightforward way to
do it with the plain xml.etree that ships with Python 2.5:

import copy

# doc is the already parsed document tree
body = doc.find(".//body")

# clone and mutate the body element
div = copy.copy(body)
div.tag = "div"
div.set("class", "remapped")

# replace the body contents with the new div
body.clear()
body[:] = [div]

</F>

Sep 26 '07 #8
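
Pulling the thread together, a variant that does not rely on lxml's automatic re-parenting or on copy semantics could look like the sketch below. It assumes Python 3 and the standard library's xml.etree.ElementTree; the helper name wrap_body and the sample document are hypothetical. Leading text directly inside body is stored in body.text rather than in a child element, so it is moved by hand.

```python
import xml.etree.ElementTree as ET


def wrap_body(root, css_class="remapped"):
    """Move everything inside <body> into a new <div class=...>."""
    body = root.find(".//body")
    if body is None:
        return None
    div = ET.Element("div", {"class": css_class})
    div.text = body.text      # leading text before the first child
    div[:] = list(body)       # children; their tail text travels with them
    body.text = None
    body[:] = [div]           # body now holds only the wrapper div
    return div


html = ('<html><body bgcolor="#ffffff">original <i>italic</i> '
        '<b class="test">contents</b>...</body></html>')
root = ET.fromstring(html)
wrap_body(root)
result = ET.tostring(root, encoding="unicode")
```

The attributes of body are left untouched, and the input must be well-formed XML for ET.fromstring to accept it, which is what a tidy step such as elementtidy is meant to guarantee.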
