elementtree question

Tim Arnold

Hi, I'm using elementtree and elementtidy to work with some HTML files. For
some of these files I need to enclose the body content in a new div tag,
like this:
<body>
<div class="remapped">
original contents...
</div>
</body>

I figure there must be a way to do it by creating a 'div' SubElement to the
'body' tag and somehow copying the rest of the tree under that SubElement,
but it's beyond my comprehension.

How can I accomplish this?
(I know I could put the class on the body tag itself, but that won't satisfy
the powers-that-be).

thanks,
--Tim Arnold

Sep 21 '07 #1

Ivo

You could also try something like this:

from sgmllib import SGMLParser  # Python 2 only; sgmllib was removed in Python 3

class IParse(SGMLParser):
    def __init__(self, verbose=0):
        SGMLParser.__init__(self, verbose)
        self.data = ""

    def _attr_to_str(self, attrs):
        # rebuild ' name="value"' pairs; empty string when there are none
        return ''.join([' %s="%s"' % a for a in attrs])

    def start_body(self, attrs):
        # re-emit <body>, then open the wrapper div immediately inside it
        self.data += "<body%s>" % self._attr_to_str(attrs)
        self.data += '<div class="remapped">'

    def end_body(self):
        self.data += "</div>"  # end remapping
        self.data += "</body>"

    def handle_data(self, data):
        self.data += data

    def unknown_starttag(self, tag, attrs):
        self.data += "<%s%s>" % (tag, self._attr_to_str(attrs))

    def unknown_endtag(self, tag):
        self.data += "</%s>" % tag

if __name__ == "__main__":
    i = IParse()
    i.feed('''
<html>
<body bgcolor="#ffffff">
original
<i>italic</i>
<b class="test">contents</b>...
</body>
</html>''')
    i.close()  # flush anything still buffered before reading the result
    print i.data
Just look at the sgmllib source in the standard library; it is very easy to
build a parser on top of it. (The code above could use some much-needed refactoring.)

Sep 21 '07 #2
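sgmllib only exists in Python 2 (it was removed in Python 3). As a minimal sketch of the same streaming idea for Python 3, here is a version built on the standard library's html.parser.HTMLParser instead; that is a swapped-in module, not Ivo's code, and the names BodyWrapper and wrap_body are hypothetical. Because it re-serialises each tag as it arrives, details such as attribute quoting and self-closing syntax are normalised rather than preserved byte for byte.

```python
from html import escape
from html.parser import HTMLParser


class BodyWrapper(HTMLParser):
    """Re-serialise HTML, wrapping everything inside <body> in a <div>."""

    def __init__(self, div_class="remapped"):
        super().__init__(convert_charrefs=True)
        self.div_class = div_class
        self.out = []

    @staticmethod
    def _attrs(attrs):
        # Rebuild ' name="value"' pairs; value is None for bare attributes.
        return "".join(
            f" {name}" if value is None else f' {name}="{escape(value)}"'
            for name, value in attrs
        )

    def handle_starttag(self, tag, attrs):
        self.out.append(f"<{tag}{self._attrs(attrs)}>")
        if tag == "body":
            # open the wrapper right after <body>
            self.out.append(f'<div class="{escape(self.div_class)}">')

    def handle_startendtag(self, tag, attrs):
        # keep <br/> style tags as a single self-closing tag
        self.out.append(f"<{tag}{self._attrs(attrs)} />")

    def handle_endtag(self, tag):
        if tag == "body":
            self.out.append("</div>")  # close the wrapper before </body>
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # convert_charrefs=True hands us decoded text, so escape it again.
        self.out.append(escape(data, quote=False))

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def wrap_body(html_text, div_class="remapped"):
    parser = BodyWrapper(div_class)
    parser.feed(html_text)
    parser.close()  # flush any buffered text
    return "".join(parser.out)


if __name__ == "__main__":
    print(wrap_body('<html><body bgcolor="#ffffff">original <i>italic</i> '
                    '<b class="test">contents</b>...</body></html>'))
```

Re-escaping in handle_data matters here: with convert_charrefs=True the handler receives already-decoded text, so without it an `&amp;` in the input would come out as a bare `&`.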

Stefan Behnel

Give lxml.etree (or lxml.html) a try:

from lxml import etree

tree = etree.parse("http://url.to/some.html", etree.HTMLParser())
body = tree.find("body")

and then:

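div = etree.Element("div", {"class": "remapped"})
div.extend(body)    # lxml moves each child (with its tail) out of body
body.append(div)    # note: body.text, if any, stays in front of the div

or alternatively:

children = list(body)
div = etree.SubElement(body, "div", {"class": "remapped"})
div.extend(children)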

http://codespeak.net/lxml/

and for lxml.html, which is currently in alpha status:

http://codespeak.net/lxml/dev/

ET 1.3 will also support the extend() function, BTW.

Stefan

Sep 24 '07 #3

Tim Arnold

Thanks for the great answers--I learned a lot. I'm looking forward to the ET
1.3 version. I'm currently working on some older HP-UX 10.20 machines and
haven't been able to compile lxml all the way through yet.

thanks again,
--Tim Arnold

Sep 24 '07 #4

Fredrik Lundh

Stefan Behnel wrote:

> ET 1.3 will also support the extend() function, BTW.

div.extend(seq) can be trivially rewritten as

div[len(div):] = seq

and in this case, you know that len(div) is 0, so you can simply do:

div[:] = seq

(this recent lxml habit of using lxml-specific versions of things that
are trivial to do with the standard API is a bit disappointing. kind of
defeats the purpose of having a standard API...)

</F>

Sep 26 '07 #5
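Fredrik's equivalence is easy to check with the current standard-library xml.etree.ElementTree (Python 3 assumed here), which also provides Element.extend(), so both spellings can be compared side by side. A minimal sketch:

```python
import xml.etree.ElementTree as ET

# A parent that already has one child.
parent = ET.Element("div")
ET.SubElement(parent, "p")

# Appending a sequence through slice assignment at the end ...
parent[len(parent):] = [ET.Element("span"), ET.Element("em")]

# ... matches what extend() does on an identically built element.
via_extend = ET.Element("div")
ET.SubElement(via_extend, "p")
via_extend.extend([ET.Element("span"), ET.Element("em")])

print([child.tag for child in parent])      # ['p', 'span', 'em']
print([child.tag for child in via_extend])  # ['p', 'span', 'em']

# When the target has no children yet, len(div) is 0 and div[:] = seq suffices.
div = ET.Element("div")
div[:] = [ET.Element("span"), ET.Element("em")]
print([child.tag for child in div])         # ['span', 'em']
```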

Stefan Behnel

Fredrik Lundh wrote:

> (this recent lxml habit of using lxml-specific versions of things that
> are trivial to do with the standard API is a bit disappointing. kind of
> defeats the purpose of having a standard API...)

ElementTree is not the only standard API that lxml is following. Another one
is the standard API of the "list" builtin type, which has an extend() method.

ah-you're-just-jealous-we-had-it-first-ly,

Stefan :)

Sep 26 '07 #6

Stefan Behnel

Tim Arnold wrote:

> Thanks for the great answers--I learned a lot. I'm looking forward to the ET
> 1.3 version.

Note that there is a difference in behaviour, though. lxml.etree forces
Elements to be uniquely positioned in a tree, so the code I posted relies on
the "side effect" of automatically removing an Element from the old position
when inserting it at a different place. ElementTree does not do that, so this
code is not portable between the two libraries.

Stefan

Sep 26 '07 #7
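Stefan's portability point can be seen directly in the standard library. A minimal sketch, assuming Python 3's xml.etree.ElementTree: first it shows that extend() leaves the elements in their original parent as well, then a hypothetical helper, wrap_children, that does not rely on lxml's move-on-insert behaviour. It snapshots the children, moves body.text into the wrapper, and replaces body's children with a slice assignment. The same steps should also work with lxml.etree elements, but only the stdlib path is exercised here.

```python
import xml.etree.ElementTree as ET


def wrap_children(body, tag="div", attrib=None):
    """Move everything inside `body` into a new child element and return it.

    Hypothetical helper: it does not rely on an insert detaching the element
    from its previous parent (lxml does that, ElementTree does not).
    """
    wrapper = ET.Element(tag, attrib or {})
    children = list(body)                       # snapshot before mutating
    wrapper.text, body.text = body.text, None   # text before the first child
    wrapper.extend(children)                    # tails travel with each child
    body[:] = [wrapper]                         # drop the old references
    return wrapper


if __name__ == "__main__":
    # ElementTree does not detach: after extend() the <p>s are in both parents.
    body = ET.fromstring("<body><p>one</p><p>two</p></body>")
    div = ET.Element("div")
    div.extend(body)
    print(len(body), len(div))  # 2 2

    body = ET.fromstring('<body bgcolor="#ffffff">original <i>italic</i> '
                         '<b class="test">contents</b>...</body>')
    wrap_children(body, attrib={"class": "remapped"})
    print(ET.tostring(body, encoding="unicode"))
```

The second print yields `<body bgcolor="#ffffff"><div class="remapped">original <i>italic</i> <b class="test">contents</b>...</div></body>`: body keeps its own attributes, and the leading text moves into the div.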

Fredrik Lundh

for completeness, here's an efficient and fairly straightforward way to
do it under plain 2.5 xml.etree:

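import copy

# doc is the already-parsed tree
body = doc.find(".//body")

# clone and mutate the body element
# (a shallow copy keeps body's text and children, but also its attributes and tail)
div = copy.copy(body)
div.tag = "div"
div.set("class", "remapped")

# replace the body contents with the new div
body.clear()  # also drops body's own attributes (div inherited them above)
body[:] = [div]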

</F>

Sep 26 '07 #8
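Fredrik's snippet assumes `doc` is an already-parsed tree. Because copy.copy() carries over body's attributes and tail along with its text and children, and body.clear() wipes body's own attributes, input such as `<body bgcolor="...">` ends up with the attribute on the div instead. A minimal variant, assuming Python 3's xml.etree.ElementTree and a hypothetical helper name remap_body, that keeps body's attributes and tail where they were:

```python
import copy
import xml.etree.ElementTree as ET


def remap_body(doc, div_class="remapped"):
    """Variant of the copy-based approach that keeps <body>'s own attributes.

    `doc` may be an ElementTree or an Element; remap_body is a hypothetical name.
    """
    body = doc.find(".//body")
    attrib, tail = dict(body.attrib), body.tail  # clear() would wipe these

    # shallow clone: same text and child list, then turn it into the wrapper div
    div = copy.copy(body)
    div.tag = "div"
    div.attrib = {"class": div_class}  # don't inherit body's attributes
    div.tail = None

    # empty body, restore its attributes and tail, then insert the div
    body.clear()
    body.attrib.update(attrib)
    body.tail = tail
    body[:] = [div]
    return div


if __name__ == "__main__":
    doc = ET.fromstring('<html><body bgcolor="#ffffff">original <i>italic</i> '
                        '<b class="test">contents</b>...</body></html>')
    remap_body(doc)
    print(ET.tostring(doc, encoding="unicode"))
```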
