Looking for suggestions (xslt?) on stripping specified elements/attributes from XHTML

Given some arbitrary XHTML, I'd like to obtain a 'simplified' XHTML
result which strips out a large subset of standard elements and
attributes - but not all. The main things I would like to accomplish:

1) Provide a list of elements/attributes to be stripped (i.e. everything
else should be passed through) or those that should be passed through
(i.e. everything else should be stripped) which would be applied
recursively.
2) If an element is to be stripped, pass through any enclosed text
and/or elements (the elements should in turn be processed recursively by
step 1.)
3) If after stripping the resulting element is empty, eliminate it
completely.

For example, this snippet:

<h1>
<a href='chap2.htm'>
<img src="image.gif" alt="Thumbnail" border=0>
</a>
</h1>
<table width=515 border=0 cellpadding=0 cellspacing=0>
<tr>
<td width=172 align=left valign=top>
<a href="chap1.htm">
<img src="prev.gif" alt="Previous" border=0>
</a>
</td>
<td>
<style type="text/css">
</style>
</td>
<td width=171 align=center valign=top>
<b>
<font face="ariel,helvetica,helv,san serif" size="-1">Chapter 2 Getting
Started</font>
</b>
</td>
<td width=172 align=right valign=top>
<a href="chap3.htm">
<img src="next.gif" alt="Next" border=0>
</a>
</td>
</tr>
</table>

Would become:

<a href='chap2.htm'>
<img src="image.gif" >
</a>
<table>
<tr>
<td>
<a href="chap1.htm">
<img src="prev.gif" alt="Previous">
</a>
</td>
<td>
Chapter 2 Getting Started
</td>
<td>
<a href="chap3.htm">
<img src="next.gif" alt="Next">
</a>
</td>
</tr>
</table>

Is XSLT the best means to accomplish this? Suggestions on how to get
this done (esp. examples that could be used as a starting point) are
appreciated.

Thanks,
Phil
Jul 26 '06 #1
Search for and read about "XSLT identity rule" or "XSLT identity
transformation".

Overriding the identity rule is the most fundamental design pattern in
XSLT: it lets you globally delete, replace, or otherwise change a chosen
subset of nodes while leaving the general structure and all other nodes of
the document the same.
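
As a rough starting point, here is a minimal sketch of that pattern in
XSLT 1.0. It rests on several assumptions and is not a definitive solution:
the element and attribute names in the match patterns are chosen only to fit
the sample in the question, the input is assumed to be a single well-formed
XML document (quoted attribute values, closed <img/> tags, one root element)
with elements in no namespace, and "empty" is approximated as "contains no
kept text and no img". For real XHTML you would bind a prefix to
http://www.w3.org/1999/xhtml and use it in the patterns.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Ignore whitespace-only text nodes in the input. -->
  <xsl:strip-space elements="*"/>

  <!-- Identity rule: by default, copy every node and attribute as-is. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Override 1: unwrap these elements. The tags and their attributes
       are dropped, but the children are still processed recursively.
       (Names are assumptions taken from the sample.) -->
  <xsl:template match="h1|b|font">
    <xsl:apply-templates select="node()"/>
  </xsl:template>

  <!-- Override 2: delete these elements together with their content. -->
  <xsl:template match="style"/>

  <!-- Override 3: delete unwanted attributes wherever they occur. -->
  <xsl:template
      match="@width|@border|@cellpadding|@cellspacing|@align|@valign"/>

  <!-- Override 4: delete elements that would end up empty. A single pass
       cannot inspect its own output, so emptiness is predicted from the
       input: no img inside, and no non-blank text outside style. -->
  <xsl:template match="*[not(self::img) and not(.//img)
      and not(.//text()[normalize-space()][not(ancestor::style)])]"/>

</xsl:stylesheet>
```

Each override is an empty or nearly empty template that is more specific
than the identity rule, so it takes precedence for the nodes it matches and
everything else is copied through. To pass through a fixed list of elements
instead of stripping a fixed list, one option is to invert the defaults:
let the general rule for elements unwrap them, and add a template that
copies only the listed elements.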
Cheers,
Dimitre Novatchev
Jul 27 '06 #2
