DOM text

Hello Pythoners,

I'm currently writing some Python to manipulate a semi-structured XML
document. I'm using DOM (minidom) and I've got working code for
transforming the document to HTML files and for adding the 'structured'
elements which populate the higher regions of the tree (i.e. near the
root).

What I have to do next is write some code for working with the 'less
structured' elements towards the 'leaf ends' of the tree. These are
rather like little sub-documents and contain a mixture of text with
inline formatting (for links, font styles, headings, paragraphs etc.)
and objects (images, media files etc.).

I admit I haven't tried very much code yet, but I'm not sure how I'm
going to handle situations like: the user wants to insert a link in the
middle of a paragraph. How can I use the DOM to insert a node into the
middle of some text? Am I right in thinking that the DOM will reference
a whole text node but nothing smaller?
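
For reference, a small sketch of how minidom represents mixed content. The `<p>` fragment and the `describe` helper are made up for illustration and are not from the poster's document. Each run of character data becomes one Text node, and inline elements sit between the Text nodes as siblings.

```python
from xml.dom import minidom, Node

# Hypothetical fragment; the element names are illustrative only.
doc = minidom.parseString("<p>Some <b>bold</b> text</p>")
para = doc.documentElement

def describe(node):
    """Return a short label: the data for Text nodes, else the tag name."""
    if node.nodeType == Node.TEXT_NODE:
        return "text:" + repr(node.data)
    return "element:" + node.tagName

children = [describe(child) for child in para.childNodes]
print(children)
```

The DOM addresses each Text node as a whole; a position inside it is just a string offset into its `data`.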

Any thoughts or suggestions would be very welcome!

Cheers,
Richard
Aug 26 '05 #1
Richard Lewis wrote:

> I admit I haven't tried very much code yet, but I'm not sure how I'm
> going to handle situations like: the user wants to insert a link in the
> middle of a paragraph. How can I use the DOM to insert a node into the
> middle of some text? Am I right in thinking that the DOM will reference
> a whole text node but nothing smaller?


You have to split the text node, and add the two resulting nodes
together with the new link node (or whatever node you want there; it can
be a whole tree) in the correct order to the parent of the two nodes. If
unsure what that means, create two simple documents and parse these to
DOM to see how that works.
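
A minimal sketch of the split-and-insert approach described above, assuming minidom's `Text.splitText()` is acceptable for the split. The helper name `insert_element_in_text`, the sample paragraph and the URL are hypothetical.

```python
from xml.dom import minidom

def insert_element_in_text(text_node, offset, new_node):
    """Split text_node at offset and place new_node between the two halves."""
    # splitText() truncates text_node and inserts the remainder after it.
    tail = text_node.splitText(offset)
    text_node.parentNode.insertBefore(new_node, tail)
    return tail

doc = minidom.parseString("<p>Hello world</p>")
para = doc.documentElement

# Build the node to insert; this could equally be a whole subtree.
link = doc.createElement("a")
link.setAttribute("href", "http://example.org/")
link.appendChild(doc.createTextNode("link"))

insert_element_in_text(para.firstChild, 6, link)
result = para.toxml()
print(result)
```

Because `splitText()` keeps the original Text node in the tree, any reference to it stays valid after the insertion.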

Diez
Aug 26 '05 #2

On Fri, 26 Aug 2005 12:13:10 +0200, "Diez B. Roggisch"
<de***@nospam.web.de> said:
> Richard Lewis wrote:
>
>> I admit I haven't tried very much code yet, but I'm not sure how I'm
>> going to handle situations like: the user wants to insert a link in the
>> middle of a paragraph. How can I use the DOM to insert a node into the
>> middle of some text? Am I right in thinking that the DOM will reference
>> a whole text node but nothing smaller?
>
> You have to split the text node, and add the two resulting nodes
> together with the new link node (or whatever node you want there; it can
> be a whole tree) in the correct order to the parent of the two nodes. If
> unsure what that means, create two simple documents and parse these to
> DOM to see how that works.

Thanks. I was kind of worried it might be like that!

I'm implementing a Cursor class now which keeps track of the current
parent Element, text node and character position so that I can easily (I
hope ;-) work out where the splitting and inserting needs to occur. Wish
me luck!!

Cheers,
Richard
Aug 26 '05 #3

On Fri, 26 Aug 2005 11:43:18 +0100, "Richard Lewis"
<ri**********@fastmail.co.uk> said:

> I'm implementing a Cursor class now which keeps track of the current
> parent Element, text node and character position so that I can easily (I
> hope ;-) work out where the splitting and inserting needs to occur. Wish
> me luck!!

Sorry to revive this thread, but there's something else that's causing me
confusion now!

My cursor class is going quite well and I can insert text and element
nodes. It also has methods to 'move' the 'cursor' forward and backward
by a node at a time. It keeps the current_node in an instance variable
which is initially assigned an element from a DOM tree instance created
elsewhere.

The problem I've come up against is when I use the next_node() method,
and the current_node is a (leaf) Text node, the nextSibling property of
current_node is None, where I know (from the document structure) that it
shouldn't be. To make matters more confusing, if I manually create an
instance of my DOM tree (interactively) and check the nextSibling of the
same Text node, it is the correct value (another Element node) while the
nextSibling property of the SectionCursor instance's current_node
property (referring to the same node) is None. I *think* it only applies
to leaf Text nodes.

Here is the *complete* code for my SectionCursor class:
(note that 'sections' are large(ish) document fragments from the main
document)
==========================================
from xml.dom import Node


class SectionCursor:
    def __init__(self, section_element):
        """Create a SectionCursor instance using the 'section_element' as
        the parent element."""
        self.section_element = section_element
        self.current_node = self.section_element.firstChild
        self.char_pos = 0

    def forward(self, skip=1):
        """Move the cursor forward 'skip' character positions."""
        if self.current_node.nodeType == Node.TEXT_NODE:
            self.char_pos += skip
            if self.char_pos > len(self.current_node.data):
                self.next_node()
        else:
            self.next_node()

    def backward(self, skip=1):
        """Move the cursor backward 'skip' character positions."""
        if self.current_node.nodeType == Node.TEXT_NODE:
            self.char_pos -= skip
            if self.char_pos < 0:
                self.previous_node()
        else:
            self.previous_node()

    def next_node(self):
        """Move the cursor to the next node; either the first child or next
        sibling."""
        if self.current_node.hasChildNodes():
            self.current_node = self.current_node.firstChild
        elif self.current_node.nextSibling is not None:
            self.current_node = self.current_node.nextSibling
        else:
            return False
        self.char_pos = 0
        return True

    def previous_node(self):
        """Move the cursor to the previous node; either the previous sibling
        or the parent."""
        if self.current_node.previousSibling is not None:
            self.current_node = self.current_node.previousSibling
        elif self.current_node.parentNode != self.section_element:
            self.current_node = self.current_node.parentNode
        else:
            return False
        if self.current_node.nodeType == Node.TEXT_NODE:
            self.char_pos = len(self.current_node.data) - 1
        else:
            self.char_pos = 0
        return True

    def jump_to(self, node, char_pos=0):
        """Jump to a node and character position."""
        self.current_node = node
        self.char_pos = char_pos

    def insert_node(self, ref_doc, new_node):
        """Insert a node (new_node); ref_doc is an instance of the Document
        class."""
        if self.current_node.nodeType == Node.TEXT_NODE:
            parent_node = self.current_node.parentNode
            text_node = self.current_node
            next_node = text_node.nextSibling

            preceeding_portion = ref_doc.createTextNode(
                text_node.data[:self.char_pos])
            proceeding_portion = ref_doc.createTextNode(
                text_node.data[self.char_pos:])

            parent_node.replaceChild(preceeding_portion, text_node)
            parent_node.insertBefore(new_node, next_node)
            parent_node.insertBefore(proceeding_portion, next_node)
            # where is the cursor?
        else:
            parent_node = self.current_node.parentNode
            parent_node.insertBefore(new_node, self.current_node)
            # where is the cursor?

    def append_child_node(self, ref_doc, new_node):
        pass

    def insert_element(self, ref_doc, tag_name, attrs=None):
        """Insert an element called tag_name and with the attributes in the
        attrs dictionary; ref_doc is an instance of the Document class."""
        new_element = ref_doc.createElement(tag_name)
        if attrs is not None:
            for name, value in attrs.items():
                new_element.setAttribute(name, value)
        self.insert_node(ref_doc, new_element)

    def insert_text(self, ref_doc, text):
        """Insert the text in 'text'; ref_doc is an instance of the Document
        class."""
        new_text = ref_doc.createTextNode(text)
        self.insert_node(ref_doc, new_text)

    def remove_node(self):
        """Remove the current node."""
        condemned_node = self.current_node
        if not self.next_node():
            self.previous_node()
        parent_node = condemned_node.parentNode
        old_child = parent_node.removeChild(condemned_node)
        old_child.unlink()

    def remove_text(self, ref_doc, count=None):
        """Remove count (or all) characters from the current cursor
        position."""
        if self.current_node.nodeType != Node.TEXT_NODE:
            return False

        text = self.current_node.data
        new_text = text[:self.char_pos]
        if count is not None:
            new_text += text[self.char_pos + count:]

        new_text_node = ref_doc.createTextNode(new_text)
        parent_node = self.current_node.parentNode
        self.current_node = parent_node.replaceChild(new_text_node,
                                                     self.current_node)
        #self.char_pos = 0
==========================================

I've noticed that when you print any minidom node (except a Text node)
it shows the node's memory address. But it doesn't do this with Text
nodes. Does anyone know why this is? If I assign a Text node from one
DOM tree to a variable, I don't get a copy, do I? I hope I just get
another reference to the original node.
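
A short sketch bearing on the two questions above; the sample document is made up. Plain assignment in Python binds another name to the same object, so no copy of the node is made. The difference in printed output comes from minidom's Text class defining its own `__repr__`, which shows a preview of the data instead of an address.

```python
from xml.dom import minidom

doc = minidom.parseString("<p>hello<b/></p>")
text_node = doc.documentElement.firstChild

alias = text_node        # plain assignment: a second reference, not a copy
alias.data = "changed"   # the change is visible through both names

same_object = alias is text_node
print(same_object, text_node.data)

print(repr(text_node))                      # Text: preview of the data
print(repr(doc.documentElement.lastChild))  # Element: includes an address
```

A copy is only produced when it is asked for explicitly, for example with `cloneNode()`.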

Cheers,
Richard
Aug 30 '05 #4

On Tue, 30 Aug 2005 11:17:25 +0100, "Richard Lewis"
<ri**********@fastmail.co.uk> said:

> Here is the *complete* code for my SectionCursor class:


In case anyone's interested, I've just noticed a logical error in the
next_node() method:
=================================
def next_node(self):
    if self.current_node.hasChildNodes():
        self.current_node = self.current_node.firstChild
    elif self.current_node.nextSibling is not None:
        self.current_node = self.current_node.nextSibling
    else:
        while (self.current_node.parentNode.nextSibling is None
               and self.current_node != self.section_element):
            self.current_node = self.current_node.parentNode
        if self.current_node != self.section_element:
            self.current_node = self.current_node.parentNode.nextSibling
        else:
            return False
    self.char_pos = 0
    return True
=================================

which doesn't solve the original problem. Though I think it may be
causing a (related) problem: it says the self.current_node.parentNode is
of NoneType. If there is a problem with assigning parts of an existing
DOM tree to other variables, might this be another symptom?
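
For comparison, a sketch of one way to write a "next node in document order" step that never dereferences a missing parent. This is an alternative formulation, not the poster's method; the function name `next_in_document_order` and the sample document are hypothetical.

```python
from xml.dom import minidom

def next_in_document_order(node, root):
    """Return the node following `node` in pre-order within `root`, or None."""
    # Descend first if there is anything below the current node.
    if node.firstChild is not None:
        return node.firstChild
    # Otherwise climb until some ancestor (or the node itself) has a
    # next sibling, stopping at the root or at a detached node.
    while node is not None and node is not root:
        if node.nextSibling is not None:
            return node.nextSibling
        node = node.parentNode
    return None

doc = minidom.parseString("<s><p>a<b>c</b></p><q/></s>")
root = doc.documentElement

visited = []
node = root.firstChild
while node is not None:
    visited.append(node.nodeName)
    node = next_in_document_order(node, root)
print(visited)
```

The loop checks `node is not None` before touching `parentNode`, so a node that has been detached from the tree ends the walk instead of raising an error.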

Cheers,
Richard
Aug 30 '05 #5

On Tue, 30 Aug 2005 12:05:38 +0100, "Richard Lewis"
<ri**********@fastmail.co.uk> said:

> On Tue, 30 Aug 2005 11:17:25 +0100, "Richard Lewis"
> <ri**********@fastmail.co.uk> said:
>
>> Here is the *complete* code for my SectionCursor class:
>
> In case anyone's interested, I've just noticed a logical error in the
> next_node() method:

OK, I'm beginning to wish I hadn't mentioned this now; I've changed the
insert_node() method as well:
================================
def insert_node(self, ref_doc, new_node):
    if self.current_node.nodeType == Node.TEXT_NODE:
        parent_node = self.current_node.parentNode
        text_node = self.current_node
        next_node = text_node.nextSibling

        preceeding_portion = ref_doc.createTextNode(
            text_node.data[:self.char_pos])
        proceeding_portion = ref_doc.createTextNode(
            text_node.data[self.char_pos:])

        parent_node.replaceChild(preceeding_portion, text_node)
        if next_node is None:
            parent_node.appendChild(new_node)
            parent_node.appendChild(proceeding_portion)
        else:
            parent_node.insertBefore(new_node, next_node)
            parent_node.insertBefore(proceeding_portion, next_node)
        # where is the cursor?
    else:
        parent_node = self.current_node.parentNode
        next_node = self.current_node.nextSibling
        if next_node is None:
            parent_node.appendChild(new_node)
        else:
            parent_node.insertBefore(new_node, self.current_node)
        # where is the cursor?
================================

I've done some more testing and it seems that, after a call to
insert_node() when the current_node is a Text node, current_node's
parentNode, nextSibling and firstChild properties become None (assuming
they weren't None before, which firstChild was).
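
One possible explanation for these symptoms, offered as a sketch rather than a confirmed diagnosis: `insert_node()` swaps the current Text node out with `replaceChild()`, but `self.current_node` still refers to the replaced node, which minidom detaches from the tree. The fragment and variable names below are illustrative and not taken from the poster's document.

```python
from xml.dom import minidom

doc = minidom.parseString("<p>some text<b>bold</b></p>")
para = doc.documentElement
old_text = para.firstChild    # what a cursor might hold as current_node

# Same pattern as insert_node(): swap the Text node for a new, shorter one.
head = doc.createTextNode(old_text.data[:4])
para.replaceChild(head, old_text)

# The replaced node is no longer in the tree, so its links are cleared.
detached = old_text.parentNode is None and old_text.nextSibling is None

# The node that is actually in the tree still has the expected sibling.
current_node = head           # re-point the cursor at the live node
sibling_name = current_node.nextSibling.tagName
print(detached, sibling_name)
```

If this is the cause, assigning one of the newly created nodes to `self.current_node` at the `# where is the cursor?` comments would keep the cursor inside the tree.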

Hmm. Um...er, yeah. I don't think anyone's following me anyway....

I'll keep fiddling with it.
Aug 30 '05 #6
