printing XML file with XSLT code

Jürgen Kahrs wrote:

> Stu wrote:
>
>> Assume the following XML file:
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <abc>
>> <def>
>> <mno>2008-06-11-13:15:59</mno>
>> <pqr stu="World">Hello</pqr>
>> </def>
>> <ghi>
>> <jkl vwx="12345678"></jkl>
>> </ghi>
>> </abc>
>>
>> Below is my desired output. As you can see, for each element I print
>> "element=value" and for each attribute within an element I print
>> "element_attribute=value"
>>
>> mno=2008-06-11-13:15:59
>> pqr=Hello
>> pqr_stu=World
>> jkl_vwx=12345678
>
> The following script in XMLgawk does it:
>
> @load xml
> XMLCHARDATA { data = $0 }
> XMLSTARTELEM { for (i in XMLATTR) print XMLSTARTELEM "_" i "=" XMLATTR[i] }
> XMLENDELEM && data ~ /[:alnum:]/ { print XMLENDELEM "=" data }
>
> It produces the output you wanted (except for a change in sequence).

[:alnum:] delivers correct results with the given test data, but I guess you meant:

XMLENDELEM && data ~ /[[:alnum:]]/ ...

Hermann

Jun 27 '08 #5

Jürgen Kahrs

Hermann Peifer wrote:

>> XMLENDELEM && data ~ /[:alnum:]/ { print XMLENDELEM "=" data }
>
> [:alnum:] delivers correct results with the given test data, but I guess
> you meant:
> XMLENDELEM && data ~ /[[:alnum:]]/ ...

No, I thought [:alnum:] was sufficient.
Does it really make a difference in this example?

Jun 27 '08 #6

Jürgen Kahrs wrote:

> Hermann Peifer wrote:
>
>> [:alnum:] delivers correct results with the given test data, but I
>> guess you meant:
>> XMLENDELEM && data ~ /[[:alnum:]]/ ...
>
> No, I thought [:alnum:] was sufficient.
> Does it really make a difference in this example?

[:alnum:] is treated as a list of characters: 'a', 'l', 'm', 'n', 'u', and ':', whereas [[:alnum:]] is treated as a character class (any letter or digit). The former will match 'Hello', but not 'HELLO', whereas the latter will match both. However, this doesn't make any difference with the given test data.
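To see the difference outside XMLgawk, both tests can be run in plain awk. This is a minimal sketch, assuming a POSIX awk such as gawk; the helper name `alnum_demo` is made up for this illustration. It also shows why the original test happened to work on the sample data: 'Hello' contains an 'l', and the mno timestamp matches only because of its ':' characters, while a digits-only value such as 12345678 does not match [:alnum:] at all.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): for each argument, report
# whether it matches the bracket expression [:alnum:] and the POSIX
# character class [[:alnum:]]. Output: <string> <set-match> <class-match>
alnum_demo() {
  printf '%s\n' "$@" | awk '{
    # [:alnum:] is only the set of characters : a l m n u
    # (gawk may print a warning on stderr suggesting [[:alnum:]])
    set = ($0 ~ /[:alnum:]/) ? "yes" : "no"
    # [[:alnum:]] matches any letter or digit
    class = ($0 ~ /[[:alnum:]]/) ? "yes" : "no"
    print $0, set, class
  }'
}

alnum_demo Hello HELLO 12345678 '2008-06-11-13:15:59'
# Expected output:
# Hello yes yes
# HELLO no yes
# 12345678 no yes
# 2008-06-11-13:15:59 yes yes
```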

To make your script a bit more generic and robust (in case of empty elements), I would go for:

$ cat hermann.awk
@load xml
XMLCHARDATA { data = $0 }
XMLSTARTELEM { data = ""; for (i in XMLATTR) print XMLSTARTELEM "_" i "=" XMLATTR[i] }
XMLENDELEM && data !~ /^[[:space:]]*$/ { print XMLENDELEM "=" data }

See below the different results for this sample data:

$ cat file1
<?xml version="1.0" encoding="UTF-8"?>
<abc>
<def>
<mno>2008-06-11-13:15:59</mno>
<pqr stu="World">Hello</pqr>
<a1>.,-?(){}[]</a1>
<a2>ABC</a2><a3/>
</def>
<ghi>
<jkl vwx="12345678"></jkl>
</ghi>
</abc>

$ xgawk -f hermann.awk file1
mno=2008-06-11-13:15:59
pqr_stu=World
pqr=Hello
a1=.,-?(){}[]
a2=ABC
jkl_vwx=12345678

$ xgawk -f juergen.awk file1
mno=2008-06-11-13:15:59
pqr_stu=World
pqr=Hello
a2=ABC
a3=ABC
jkl_vwx=12345678
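The two outputs differ in two lines: juergen.awk drops a1 (its text contains no letter or digit) and prints a stale a3=ABC, because the empty a3 element inherits the text of a2 when data is never reset. Since a2=ABC appears at all, juergen.awk was presumably run with the corrected [[:alnum:]] test ('ABC' contains none of the characters in [:alnum:]). The sketch below replays both end-tag rules in plain awk so it runs without the xml extension. The S/C/E event lines are a hypothetical stand-in for XMLSTARTELEM, XMLCHARDATA and XMLENDELEM, not XMLgawk's real input, and the function names are made up.

```shell
#!/bin/sh
# Hypothetical event stream for the a1/a2/a3 part of file1, one
# "<type><TAB><value>" per line: S = start tag (like XMLSTARTELEM),
# C = character data (like XMLCHARDATA), E = end tag (like XMLENDELEM).
# This is NOT XMLgawk's own input format.
events() {
  printf '%s\t%s\n' S a1 C '.,-?(){}[]' E a1 S a2 C ABC E a2 S a3 E a3
}

# juergen.awk logic, assuming the [[:alnum:]] test: data is never reset.
run_old() {
  events | awk 'BEGIN { FS = "\t" }
    $1 == "C" { data = $2 }
    $1 == "E" && data ~ /[[:alnum:]]/ { print $2 "=" data }'
}

# hermann.awk logic: reset data at each start tag, skip blank text.
run_new() {
  events | awk 'BEGIN { FS = "\t" }
    $1 == "S" { data = "" }
    $1 == "C" { data = $2 }
    $1 == "E" && data !~ /^[[:space:]]*$/ { print $2 "=" data }'
}

echo "juergen.awk logic:"; run_old   # a2=ABC, a3=ABC
echo "hermann.awk logic:"; run_new   # a1=.,-?(){}[], a2=ABC
```

Resetting data in the start-tag rule is what keeps an empty element from reusing the previous element's text; the whitespace test then only has to filter out indentation and newlines.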

Jun 27 '08 #7

Jürgen Kahrs

Hermann Peifer wrote:

> Jürgen Kahrs wrote:
>
>> No, I thought [:alnum:] was sufficient.
>> Does it really make a difference in this example?
>
> [:alnum:] is treated as a list of characters: 'a', 'l', 'm', 'n', 'u', and
> ':', whereas [[:alnum:]] is treated as a character class. The former
> will match 'Hello', but not 'HELLO', whereas the latter will match both.

Thanks for the reminder.

You know both languages equally well (XSL and XMLgawk).
Would you prefer the XSL solution that was posted here?

Jun 27 '08 #8

On Jun 12, 10:53 pm, Jürgen Kahrs <Juergen.KahrsDELETET...@vr-web.de>
wrote:

> You know both languages equally well (XSL and XMLgawk).
> Would you prefer the XSL solution that was posted here?

My rule of thumb is:

Big files (say: 100+ MB), with a flat, regular structure -> XMLgawk
Small files with many optional and/or empty elements -> XSL

Hermann

Jun 27 '08 #9

Joseph J. Kesselman

Hermann Peifer wrote:

> Big files (say: 100+ MB), with a flat, regular structure -> XMLgawk
> Small files with many optional and/or empty elements -> XSL

Depends in part on the XSLT processor, of course. Some handle large
documents better than others.

Jun 27 '08 #10

On Jun 13, 3:55 pm, "Joseph J. Kesselman" <keshlam-nos...@comcast.net>
wrote:

> Hermann Peifer wrote:
>> Big files (say: 100+ MB), with a flat, regular structure -> XMLgawk
>> Small files with many optional and/or empty elements -> XSL
>
> Depends in part on the XSLT processor, of course. Some handle large
> documents better than others.

Of course. Reality is not as black and white as my rule of thumb
suggests. Would you have any pointer to some helpful XSLT processor
comparison/benchmarking?

BTW, another rule of thumb is:

Transformation: XML to text, with regex string processing -> XMLgawk
Transformation: XML to XML (in my context usually: XML to KML) -> XSL

Hermann

Jun 27 '08 #11

Joseph J. Kesselman

Hermann Peifer wrote:

> Of course. Reality is not as black and white as my rule of thumb
> suggests. Would you have any pointer to some helpful XSLT processor
> comparison/benchmarking?

Most of what I've been doing has been using the W3C/NIST XPath and XSLT
conformance suites (pointed to from http://www.w3.org/QA/TheMatrix),
test sets such as the DataPower (now IBM) XSLTMark kernels (described at
http://www.xml.com/pub/a/2001/03/28/xsltmark/), or customer datasets
(which for obvious reasons I can't share).

I do know that the XSLT processor in the DataPower product can recognize
at least some cases where a document can be processed in a streaming
manner rather than reading it all into memory at once. That depends on
the nature of the stylesheet, of course; I'm not sure exactly where the
current limits are. But when this optimization works, it permits
handling huge documents and reduces latency, both of which are good
things. A web search for "DataPower streaming" turns up some discussion of this.

I don't think Apache Xalan has any true streaming capability yet, though
we've wanted it for many years. However, Xalan's internal data model
(DTM) is considerably more space-efficient than a standard Java DOM,
which improves its ability to handle large documents. (We had a version
of DTM which reduced overhead to only 16 bytes per XML node -- but
compressing things that far cost us some performance and imposed some
limitations we didn't like, so we had to let it grow a bit.)

I haven't used XMLgawk. But part of the point of XML is precisely that
adopting a shared (and relatively simple) syntax eases the task of
writing useful and reusable tools, and there's certainly a large amount
of "let a thousand flowers bloom" built into that assumption. I prefer
to stick to the W3C's standardized tools as much as possible, both to
push those to improve and for best portability of my work, but if
another tool does something XSLT really can't, or does it far better
than the copy of XSLT you have available to you, I'm not going to tell
you not to use it.

Jun 27 '08 #12


Joseph J. Kesselman wrote:

> I prefer to stick to the W3C's standardized tools as much as possible,
> both to push those to improve and for best portability of my work, but if
> another tool does something XSLT really can't, or does it far better
> than the copy of XSLT you have available to you, I'm not going to tell
> you not to use it.

Thanks for the information.

I can't remember ever coming across something that XSLT really can't do, but string processing is obviously not a strength of XSLT 1.0. I read that this improved with version 2.0, but I don't have any first-hand experience with it. For transforming large XML documents into text, which in my context often includes some regex-based string processing, XMLgawk continues to be my favourite tool.

Hermann

Jun 27 '08 #13
