elementtree question

Hi, I'm using elementtree and elementtidy to work with some HTML files. For
some of these files I need to enclose the body content in a new div tag,
like this:
<body>
<div class="remapped">
original contents...
</div>
</body>

I figure there must be a way to do it by creating a 'div' SubElement to the
'body' tag and somehow copying the rest of the tree under that SubElement,
but it's beyond my comprehension.

How can I accomplish this?
(I know I could put the class on the body tag itself, but that won't satisfy
the powers-that-be).

thanks,
--Tim Arnold
Sep 21 '07 #1
Ivo
Tim Arnold wrote:
Hi, I'm using elementtree and elementtidy to work with some HTML files. For
some of these files I need to enclose the body content in a new div tag,
like this:
<body>
<div class="remapped">
original contents...
</div>
</body>

I figure there must be a way to do it by creating a 'div' SubElement to the
'body' tag and somehow copying the rest of the tree under that SubElement,
but it's beyond my comprehension.

How can I accomplish this?
(I know I could put the class on the body tag itself, but that won't satisfy
the powers-that-be).

thanks,
--Tim Arnold

You could also try something like this:

# Python 2 only: sgmllib was removed from the standard library in Python 3.
from sgmllib import SGMLParser

class IParse(SGMLParser):
    def __init__(self, verbose=0):
        SGMLParser.__init__(self, verbose)
        self.data = ""

    def _attr_to_str(self, attrs):
        # Render (name, value) pairs as ' name="value"'; empty string if none.
        return ''.join([' %s="%s"' % a for a in attrs])

    def start_body(self, attrs):
        self.data += "<body%s>" % self._attr_to_str(attrs)
        print("remapping")
        self.data += '<div class="remapped">'

    def end_body(self):
        self.data += "</div>"  # end remapping
        self.data += "</body>"

    def handle_data(self, data):
        self.data += data

    def unknown_starttag(self, tag, attrs):
        self.data += "<%s%s>" % (tag, self._attr_to_str(attrs))

    def unknown_endtag(self, tag):
        self.data += "</%s>" % tag

if __name__ == "__main__":
    i = IParse()
    i.feed('''
<html>
<body bgcolor="#fffff">
original
<i>italic</i>
<b class="test">contents</b>...
</body>
</html>''')
    i.close()  # flush any buffered input before reading the result
    print(i.data)

Just look at the code of sgmllib (in the standard library) and you will see
that it is very easy to make a parser. The example above could use some
much-needed refactoring.
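
Since sgmllib was removed from the standard library in Python 3, the same idea can be sketched with html.parser.HTMLParser. This is only a minimal sketch, not a drop-in replacement: the names BodyWrapper and wrap_body are made up for illustration, and comments, doctype declarations and processing instructions are silently dropped.

```python
from html.parser import HTMLParser


class BodyWrapper(HTMLParser):
    """Re-emit the parsed HTML, wrapping the body contents in a div."""

    def __init__(self):
        # Keep entity and character references as written in the input.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the start tag exactly as it appeared.
        self.out.append(self.get_starttag_text())
        if tag == "body":
            self.out.append('<div class="remapped">')

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are passed through unchanged.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "body":
            self.out.append("</div>")
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)


def wrap_body(html):
    """Return html with the body contents enclosed in the wrapper div."""
    parser = BodyWrapper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(wrap_body('<html><body bgcolor="#ffffff">original '
                    '<i>italic</i></body></html>'))
```

Like the sgmllib version, this works on the tag stream rather than on a tree, so it relies on the input actually containing explicit body start and end tags.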

Sep 21 '07 #2
Tim Arnold wrote:
Hi, I'm using elementtree and elementtidy to work with some HTML files. For
some of these files I need to enclose the body content in a new div tag,
like this:
<body>
<div class="remapped">
original contents...
</div>
</body>

Give lxml.etree (or lxml.html) a try:

from lxml import etree

tree = etree.parse("http://url.to/some.html", etree.HTMLParser())
body = tree.find("body")

and then:

div = etree.Element("div", {"class" : "remapped"})
div.extend(body)
body.append(div)

or alternatively:

children = list(body)
div = etree.SubElement(body, "div", {"class" : "remapped"})
div.extend(children)

http://codespeak.net/lxml/

and for lxml.html, which is currently in alpha status:

http://codespeak.net/lxml/dev/

ET 1.3 will also support the extend() function, BTW.

Stefan
Sep 24 '07 #3
Thanks for the great answers--I learned a lot. I'm looking forward to the ET
1.3 version. I'm currently working on some older HP10.20ux machines and
haven't been able to compile lxml all the way through yet.

thanks again,
--Tim Arnold
Sep 24 '07 #4
Stefan Behnel wrote:
ET 1.3 will also support the extend() function, BTW.

div.extend(seq) can be trivially rewritten as

div[len(div):] = seq

and in this case, you know that len(div) is 0, so you can simply do:

div[:] = seq

(this recent lxml habit of using lxml-specific versions of things that
are trivial to do with the standard API is a bit disappointing. kind of
defeats the purpose of having a standard API...)
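
The equivalence of extend() and slice assignment can be checked with the standard library's xml.etree.ElementTree, which has Element.extend() in current Python versions. A small sketch; the sample document and the variable names are made up for illustration:

```python
import xml.etree.ElementTree as ET

body = ET.fromstring("<body><p>one</p><p>two</p></body>")
children = list(body)

# Variant 1: extend()
a = ET.Element("div")
a.extend(children)

# Variant 2: slice assignment at the end of the element
b = ET.Element("div")
b[len(b):] = children

# Variant 3: the div is known to be empty, so assign the whole slice
c = ET.Element("div")
c[:] = children

# All three divs now hold the same child elements in the same order.
same = list(a) == list(b) == list(c) == children
print(same)
```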

</F>

Sep 26 '07 #5
Fredrik Lundh wrote:
(this recent lxml habit of using lxml-specific versions of things that
are trivial to do with the standard API is a bit disappointing. kind of
defeats the purpose of having a standard API...)

ElementTree is not the only standard API that lxml is following. Another one
is the standard API of the "list" builtin type, which has an extend() method.

ah-you're-just-jealous-we-had-it-first-ly,

Stefan :)
Sep 26 '07 #6
Tim Arnold wrote:
Thanks for the great answers--I learned a lot. I'm looking forward to the ET
1.3 version.

Note that there is a difference in behaviour, though. lxml.etree forces
Elements to be uniquely positioned in a tree, so the code I posted relies on
the "side effect" of automatically removing an Element from the old position
when inserting it at a different place. ElementTree does not do that, so this
code is not portable between the two libraries.
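
One way to avoid depending on that side effect is to remove each child explicitly before appending it to the new element. The sketch below uses the standard library's xml.etree.ElementTree; the helper name wrap_children is made up for illustration, and it is an assumption that the same function also behaves correctly with lxml.etree, since it only uses remove() and append(). It also moves the parent's leading text, which is not a child element and so is not touched by extend().

```python
import xml.etree.ElementTree as ET


def wrap_children(parent, tag, attrib):
    """Move parent's text and children into a new child element."""
    wrapper = ET.Element(tag, attrib)

    # Text directly inside the parent lives in parent.text, not in a child.
    wrapper.text = parent.text
    parent.text = None

    # Copy the child list first, because we modify parent while iterating.
    for child in list(parent):
        parent.remove(child)   # explicit removal from the old position
        wrapper.append(child)  # a child's tail text travels with it

    parent.append(wrapper)
    return wrapper


doc = ET.fromstring("<html><body>text<i>a</i><b>b</b></body></html>")
wrap_children(doc.find("body"), "div", {"class": "remapped"})
result = ET.tostring(doc, encoding="unicode")
print(result)
```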

Stefan
Sep 26 '07 #7
Tim Arnold wrote:

I figure there must be a way to do it by creating a 'div' SubElement to the
'body' tag and somehow copying the rest of the tree under that SubElement,
but it's beyond my comprehension.

How can I accomplish this?
(I know I could put the class on the body tag itself, but that won't satisfy
the powers-that-be).

for completeness, here's an efficient and fairly straightforward way to
do it under the plain xml.etree that ships with Python 2.5:

import copy

body = doc.find(".//body")

# clone and mutate the body element
div = copy.copy(body)
div.tag = "div"
div.set("class", "remapped")

# replace the body contents with the new div
# (clear() drops body's children, attributes and text)
body.clear()
body[:] = [div]
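
For reference, here is a self-contained run of the clone-and-mutate approach under Python 3's xml.etree.ElementTree. It is a sketch with a made-up sample document; behaviour under the Python 2.5 version of the library has not been checked here. Because the div starts out as a copy of body and body is then cleared, any attributes the body had end up on the div.

```python
import copy
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<html><body bgcolor="#ffffff">original <i>italic</i> contents</body></html>'
)
body = doc.find(".//body")

# Shallow copy: the clone carries over body's attributes, text and children.
div = copy.copy(body)
div.tag = "div"
div.set("class", "remapped")

# Empty the body (children, attributes and text), then insert the div.
body.clear()
body[:] = [div]

result = ET.tostring(doc, encoding="unicode")
print(result)
```

If the attributes should stay on the body, they can be saved before clear() and restored afterwards.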

</F>

Sep 26 '07 #8
