Flat HTML headers to nested XML sections

I am working on an XSLT stylesheet that transforms HTML into an XML
format that can be imported into FrameMaker. The challenge, it turns
out, is correctly transforming the flat HTML header tags (<h1>, <h2>,
etc.) into nested sections inside the XML. I have made significant
progress, but have run into a roadblock.

Here is an example of my input HTML:

<html><body>
<p>abc abc</p>
<h1 class='header'> A</h1>
<p>A abc abc</p>
<h2 class='header'> B</h2>
<p>B abc abc</p>
<h3 class='header'> C</h3>
<p>Cabc abc</p>
<h2 class='header'> D</h2> <!-- this is missing in the output -->
<p>D abc abc</p> <!-- this is missing in the output -->
<h1 class='header'> E</h1>
<p>E abc abc</p>
</body></html>

Here is the output I get. Notice that the section for <h2>D</h2>
is missing.

<?xml version="1.0" encoding="UTF-8"?>
<article>
<title/>
<para>abc abc</para>
<section depth="1" id="A">
<title>A</title>
<para>A abc abc</para>
<section depth="2" id="B">
<title>B</title>
<para>B abc abc</para>
<section depth="3" id="C">
<title>C</title>
<para>C abc abc</para>
</section>
</section>
</section>
<section depth="1" id="E">
<title>E</title>
<para>E abc abc</para>
</section>
</article>

The problem is that my code currently applies templates to all
nodes following a header whose nearest preceding header is that same
header. For this reason, when content follows a header that isn't
its own header (like an <h2> following an <h3>), it doesn't get shown.
What I don't understand is how to fix it. Any help would be much
appreciated. I'm not really an XSL guru, so I'm doing the best I can
to get through this.

Here is the relevant code from my xsl:

<xsl:template match="body">
  <article>
    <title>
      <xsl:value-of select="$docTitle" />
    </title>

    <xsl:for-each select='child::*[not(preceding-sibling::*[@class="header"])][not(@class="header")]'>
      <xsl:apply-templates select="."/>
    </xsl:for-each>

    <xsl:variable name='depth'
        select='substring(name(child::*[@class="header"][1]),2)'/>
    <xsl:for-each select='child::*[@class="header"][substring(name(),2)&lt;=$depth]'>
      <xsl:apply-templates select="."/>
    </xsl:for-each>

  </article>
</xsl:template>

<xsl:template match="h1 | h2 | h3 | h4 | h5">
  <xsl:call-template name="header">
    <xsl:with-param name="depth" select="substring(name(),2)"/>
  </xsl:call-template>
</xsl:template>

<xsl:template name="header">
  <xsl:param name="depth"/>
  <section>
    <xsl:attribute name="depth">
      <xsl:value-of select="$depth"/>
    </xsl:attribute>

    <xsl:attribute name="id">
      <xsl:value-of select="translate(.,' ','')" />
    </xsl:attribute>
    <title><xsl:value-of select="."/></title>

    <xsl:variable name='thisHeader' select='generate-id(.)'/>
    <xsl:for-each select='following-sibling::*[$thisHeader=generate-id(preceding-sibling::*[@class="header"][last()])]
        [not(@class="header") or (@class="header" and substring(name(),2)>=$depth)]'>
      <xsl:apply-templates select="."/>
    </xsl:for-each>

  </section>

</xsl:template>

May 16 '07 #1
CrazyAtlantaGuy wrote:
> I am working on creating an XSLT that transforms Html into an XML
> format that can be imported into Framemaker. The challenge, it turns
> out, is correctly transforming the flat html header tags (<H1>, <H2>,
> etc) into nested sections inside the xml.

This is called encapsulation, and there's a much neater way than writing
XSLT to try and reach-forward-down-the-tree-up-to-but-not-including the
next H1/H2/H3/etc.

1. Run Tidy to make the HTML into well-formed XHTML (tidy -nc -asxml)

2. Write a short script to turn the XHTML back into valid SGML
(remove NETs, namespaces)

3. Apply a DocType Declaration for the ISO 15445 HTML DTD, which
includes a DIV1/DIV2 containment structure, in "preparation" mode
(declare % Preparation as INCLUDE in the internal subset and use
pre-html as the declared root element type)

4. Run osgmlnorm to normalize the document: this adds the missing
markup, switches single quotes to double where possible, etc

<!doctype pre-html
public "ISO/IEC 15445:2000//DTD HyperText Markup Language//EN" [
<!entity % Preparation "include" >
]>
<PRE-HTML>
<HEAD>
<META CONTENT="HTML Tidy for Linux/x86 (vers 1 September 2005), see www.w3.org" NAME="GENERATOR">
<TITLE></TITLE>
</HEAD>
<BODY>
<P>abc abc</P>
<H1 CLASS="header"> A</H1>
<DIV1>
<P>A abc abc</P>
<H2 CLASS="header"> B</H2>
<DIV2>
<P>B abc abc</P>
<H3 CLASS="header"> C</H3>
<DIV3>
<P>Cabc abc</P>
</DIV3>
</DIV2>
<H2 CLASS="header"> D</H2>
<DIV2>
<P>D abc abc</P>
</DIV2>
</DIV1>
<H1 CLASS="header"> E</H1>
<DIV1>
<P>E abc abc</P>
</DIV1>
</BODY>
</PRE-HTML>

You can easily mess with the Preparation structure in the DTD if you
don't like the way they did it (I don't).
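
Once the containment is in place, getting to the target format is a simple structural mapping. The following is only a rough sketch, not part of the steps above, and it rests on several assumptions: that the normalized SGML has first been converted to well-formed XML with the element names unchanged (uppercase, as in the sample), that each DIVn is immediately preceded by its Hn heading as it is in the sample, and that the target element names (article, section, title, para) are the ones from the original post. Only DIV1 to DIV3 and H1 to H3 are matched because those are the levels shown in the sample; extend the patterns as needed.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Assumption: the root element is PRE-HTML, as in the normalized sample. -->
  <xsl:template match="/">
    <article>
      <xsl:apply-templates select="PRE-HTML/BODY/*"/>
    </article>
  </xsl:template>

  <!-- Each DIVn becomes a section; its title comes from the heading
       element assumed to sit immediately before it. -->
  <xsl:template match="DIV1 | DIV2 | DIV3">
    <xsl:variable name="heading" select="preceding-sibling::*[1]"/>
    <section depth="{substring(name(), 4)}"
             id="{translate($heading, ' ', '')}">
      <title><xsl:value-of select="normalize-space($heading)"/></title>
      <xsl:apply-templates select="*"/>
    </section>
  </xsl:template>

  <!-- Headings are consumed by the DIVn template, so they emit nothing. -->
  <xsl:template match="H1 | H2 | H3"/>

  <xsl:template match="P">
    <para><xsl:value-of select="."/></para>
  </xsl:template>

</xsl:stylesheet>
```

Because the nesting already exists in the input, no template has to look forward along the sibling axis; each section simply recurses into its own children.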

///Peter
May 16 '07 #2
You could try adapting something from the XSLT FAQ. Likely candidates
would be
http://www.dpawson.co.uk/xsl/sect2/N4486.html#d5891e424
or
http://www.dpawson.co.uk/xsl/sect2/N...tml#d5891e1051

Some of the other examples on that page may also be adaptable to this
question.
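
For what it's worth, here is a minimal XSLT 1.0 sketch of one way to do the nesting in the stylesheet itself. It is not taken from the FAQ pages and is not a corrected copy of the original poster's code; it is just one possible approach, under these assumptions: the headers are direct children of body, every header carries class="header" as in the sample input, and paragraphs map to para. The idea is that a header owns the non-header siblings whose nearest preceding header is that header, and its subsections are the later headers whose nearest preceding header of a lower number is that header.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <xsl:apply-templates select="html/body"/>
  </xsl:template>

  <xsl:template match="body">
    <article>
      <!-- Content that appears before the first header. -->
      <xsl:apply-templates
          select="*[not(@class='header')]
                   [not(preceding-sibling::*[@class='header'])]"/>
      <!-- Top-level sections: headers with no shallower header before them. -->
      <xsl:for-each select="*[@class='header']">
        <xsl:if test="not(preceding-sibling::*[@class='header']
                      [substring(name(), 2) &lt; substring(name(current()), 2)])">
          <xsl:apply-templates select="."/>
        </xsl:if>
      </xsl:for-each>
    </article>
  </xsl:template>

  <xsl:template match="h1 | h2 | h3 | h4 | h5">
    <xsl:variable name="thisHeader" select="generate-id(.)"/>
    <section depth="{substring(name(), 2)}" id="{translate(., ' ', '')}">
      <title><xsl:value-of select="normalize-space(.)"/></title>
      <!-- Own content: non-headers whose nearest preceding header is this one. -->
      <xsl:apply-templates
          select="following-sibling::*[not(@class='header')]
                   [generate-id(preceding-sibling::*[@class='header'][1])
                    = $thisHeader]"/>
      <!-- Subsections: later headers whose nearest shallower preceding
           header is this one. current() is the candidate header here. -->
      <xsl:for-each select="following-sibling::*[@class='header']">
        <xsl:if test="generate-id(preceding-sibling::*[@class='header']
                      [substring(name(), 2) &lt; substring(name(current()), 2)][1])
                      = $thisHeader">
          <xsl:apply-templates select="."/>
        </xsl:if>
      </xsl:for-each>
    </section>
  </xsl:template>

  <xsl:template match="p">
    <para><xsl:value-of select="."/></para>
  </xsl:template>

</xsl:stylesheet>
```

With the sample input, the h2 "D" has the h1 "A" as its nearest shallower preceding header, so it should come out as a second subsection of A, after B. The sibling scans make this quadratic in the number of body children, which is unlikely to matter for ordinary documents.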

(It's always worth checking Dave's page; he has done an excellent job of
collecting useful answers from XSL-List, which is unofficial but has
been in existence since before XSL was a Recommendation and has had
participation by a lot of XSL's architects and implementers. I still try
to keep half an eye on that list, though I must admit I don't watch it
as closely as I should.)

--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry
May 17 '07 #3
On May 17, 12:37 am, Joe Kesselman <keshlam-nos...@comcast.net> wrote:
> You could try adapting something from the XSLT FAQ. Likely candidates
> would be http://www.dpawson.co.uk/xsl/sect2/N4486.html#d5891e424
> or http://www.dpawson.co.uk/xsl/sect2/N4486.html#d5891e1051
>
> Some of the other examples on that page may also be adaptable to this
> question.
>
> (It's always worth checking Dave's page; he has done an excellent job of
> collecting useful answers from XSL-List, which is unofficial but has
> been in existence since before XSL was a Recommendation and has had
> participation by a lot of XSL's architects and implementers. I still try
> to keep half an eye on that list, though I must admit I don't watch it
> as closely as I should.)
>
> --
> () ASCII Ribbon Campaign | Joe Kesselman
> /\ Stamp out HTML e-mail! | System architexture and kinetic poetry

Thanks for the help!

May 22 '07 #4
