Normalizing XHTML with XML

I'm getting XHTML input that can be in a number of formats, and I'm
trying to get it into a consistent format for later use. "Consistent"
in this case means everything in the root/body is in either a p, table,
img, ol, or ul tag. I'm processing just the body text. There is no head
section or anything. So the body is the root of the tree that I'm
processing. I've got almost everything working except one thing. If I
get input like the following:
some text<br/>some more text

then I need that to become two paragraphs, like:
<p>some text</p>
<p>some more text</p>

That's easy enough. But if I get this input:
some text <a href="blah">link</a> some more text

that should all become one paragraph:
<p>some text <a href="blah">link</a> some more text</p>

And if a table, list, or image is encountered, that should be the end
of a paragraph if there is one:
some text<table> ... </table>some more text

becomes
<p>some text</p>
<table> ... </table>
<p>some more text</p>

Again, simply placing the text nodes inside p tags is simple, but a
problem arises if there is a link or other tag inside some of that
text. (At this point other tags don't actually matter because I'm
stripping them out, but links need to be passed through.)

Basically, my problem boils down to this:
1) I need to select any text node child of the root and surround it
with p tags, but
2) if an a element is a child of the root, it should be joined with any
adjacent text nodes and the whole thing should be surrounded with p
tags.

Can someone give me an example of how to do this with XSL?

May 11 '06 #1
> 1) I need to select any text node child of the root and surround it
> with p tags, but
> 2) if an a element is a child of the root, it should be joined with any
> adjacent text nodes and the whole thing should be surrounded with p
> tags.


If I put those two rules together, I get "I want to wrap a <p>
element around all the root's children". Since that's trivial, I presume
there's some case where you don't want to do that?
May 11 '06 #2
Yes, only text nodes and links should be inside p tags. Tables, lists,
and images will also be present and must not be wrapped, especially
since tables and lists are block elements and p tags may only contain
inline elements. Maybe a more complex example:
some text <a href="blah">a link</a> some more text<br/>
third text node<table>...</table>final text node

should become:
<p>some text <a href="blah">a link</a> some more text</p>
<p>third text node</p>
<table>...</table>
<p>final text node</p>

Notice that the <br/> causes a new p element, the first two root-level
text nodes and the a element in between them become one paragraph, the
third text node becomes a paragraph, the table is not touched, and the
last text node becomes a paragraph.

May 11 '06 #3
From looking around some more, I'm seeing that XSLT should be viewed as
transforming nodes from a source tree into nodes in a result tree. So a
different way of looking at my problem might be, "How do I grab
consecutive text and inline nodes (besides the br and img elements)
that are children of the root node from the source tree and put them
inside one node (a p element) in the result tree?"

May 11 '06 #4
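
The thread ends without a posted stylesheet. As a rough illustration of the grouping described in #4 (collect consecutive text and a nodes into one p, close the paragraph at br, and pass table, img, ol, ul, and p through untouched), here is a minimal sketch in Python using only the standard library. It is not XSLT and not anyone's posted solution; the function name normalize, the BLOCK and INLINE sets, and the choice to drop every other element entirely are assumptions made for the example. In XSLT 2.0 the same idea is usually written with xsl:for-each-group and group-adjacent.

```python
from xml.dom import minidom

# Assumed classification, taken from the description in the thread.
BLOCK = {"p", "table", "img", "ol", "ul"}  # copied as-is; each ends the open paragraph
INLINE = {"a"}                             # kept inside the open paragraph


def normalize(body_xml):
    """Return the normalized children of the root element as one string."""
    root = minidom.parseString(body_xml).documentElement
    out = []  # serialized top-level result nodes
    run = []  # serialized pieces of the paragraph being collected

    def flush():
        # Close the current run of text/inline nodes, if it has any content.
        content = "".join(run).strip()
        if content:  # whitespace-only runs produce no paragraph
            out.append("<p>" + content + "</p>")
        run.clear()

    for node in root.childNodes:
        if node.nodeType == node.TEXT_NODE:
            run.append(node.toxml())
        elif node.nodeType == node.ELEMENT_NODE:
            if node.tagName in INLINE:
                run.append(node.toxml())
            elif node.tagName == "br":
                flush()  # <br/> only separates paragraphs
            elif node.tagName in BLOCK:
                flush()
                out.append(node.toxml())
            # Any other element is dropped here (an assumption).
    flush()  # close a paragraph still open at the end of the input
    return "\n".join(out)


if __name__ == "__main__":
    sample = ('<body>some text <a href="blah">a link</a> some more text<br/>\n'
              'third text node<table>...</table>final text node</body>')
    print(normalize(sample))
```

Run on the example from #3, this should print the four result nodes listed there: two paragraphs, the untouched table, and a final paragraph. The key point is that the paragraph boundary is decided by what interrupts a run (br or a block element), not by the text nodes themselves.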
