Bytes | Software Development & Data Engineering Community

XSL and entities

I have a problem with an XSL transformation.
My XML input:

--- input.xml ---

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE xc:content [
<!ENTITY % xhtml PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
%xhtml;
]>
<xc:xcontent xmlns:xc="http://www.wolterinkwebdesign.com/xml/xcontent" xmlns="http://www.w3.org/1999/xhtml" module="news">
<xc:text type="html">
leuk he jazeker ãôé<br/>
</xc:text>
</xc:xcontent>

----

And an XSL file:

--- style.xsl ---

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
xmlns="http://www.w3.org/1999/xhtml"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:page="http://www.wolterinkwebdesign.com/xml/page"
xmlns:xc="http://www.wolterinkwebdesign.com/xml/xcontent">

<xsl:output method="xml" indent="yes"/>

<!--
! All html should remain html
!-->
<xsl:template match="*[namespace-uri(.)='http://www.w3.org/1999/xhtml']">
<xsl:copy>
<xsl:for-each select="@*">
<xsl:copy/>
</xsl:for-each>
<xsl:apply-templates select="./node()"/>
</xsl:copy>
</xsl:template>

<xsl:template match="/xc:xcontent">
<page:page type="module">
<p>
<xsl:apply-templates select="xc:text"/>
</p>
</page:page>
</xsl:template>

</xsl:stylesheet>

---

The output here is:

---
<page:page type="module">
<p>
leuk he jazeker<br/>
</p>
</page:page>

---
But I expect this as output:
---
<page:page type="module">
<p>
leuk he jazeker ãôé<br/>
</p>
</page:page>
---
How can that be? Why are the characters ãôé gone?
Is there something wrong with my encoding?
Note: I don't know whether the files are really encoded in ISO-8859-1, but it did work for me.
My editor says the encoding is ISO-8859-1, so I think that is fine. Or did the editor just get that
information from the XML prolog?
Jul 20 '05 #1
> cut

Well, my subject line is not really a good choice: there are no entities involved.
Jul 20 '05 #2

Are they really gone, or are you just looking at the file in some program
that doesn't understand the encoding? They appear to be gone in your
posted output, but that doesn't match what XSLT should have done.
That output is also missing a namespace declaration for XHTML; is it
really the output you got from XSLT?

If you want iso-8859-1 output add
<xsl:output encoding="iso-8859-1"/>
to your stylesheet.
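A minimal sketch of where that attribute goes, assuming the rest of style.xsl stays exactly as posted: it simply joins the existing xsl:output element. Without an encoding attribute an XSLT 1.0 processor should write UTF-8 or UTF-16, so a viewer that assumes ISO-8859-1 may display the accented characters incorrectly even though the transformation kept them.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
  xmlns="http://www.w3.org/1999/xhtml"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:page="http://www.wolterinkwebdesign.com/xml/page"
  xmlns:xc="http://www.wolterinkwebdesign.com/xml/xcontent">

  <!-- method and indent as before; encoding tells the serializer to write
       the result as ISO-8859-1 instead of the default (UTF-8 or UTF-16) -->
  <xsl:output method="xml" indent="yes" encoding="ISO-8859-1"/>

  <!-- templates unchanged -->

</xsl:stylesheet>
```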

Incidentally, despite the fact that you have referred to entities in the
subject line, there are no entity references in your input (except the
parameter entity reference %xhtml;). If you enter all your characters
directly as character data, there's no need to reference the XHTML DTD
(which might have a very noticeable effect on parsing speed, especially
if you really are fetching the DTD off the W3C site each time).
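For illustration, this is input.xml without the DOCTYPE, as suggested above. The accented characters can either be typed directly (the file must then really be saved as ISO-8859-1, as its XML declaration claims) or written as numeric character references, which any XML parser resolves without a DTD; &#227;, &#244; and &#233; are ã, ô and é.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- No DOCTYPE, so the parser never has to fetch the XHTML DTD.
     &#227; = ã, &#244; = ô and &#233; = é are numeric character
     references, which need no entity declarations. -->
<xc:xcontent xmlns:xc="http://www.wolterinkwebdesign.com/xml/xcontent"
             xmlns="http://www.w3.org/1999/xhtml" module="news">
  <xc:text type="html">
    leuk he jazeker &#227;&#244;&#233;<br/>
  </xc:text>
</xc:xcontent>
```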

David
Jul 20 '05 #3


Tjerk Wolterink wrote:
> [cut]
> <?xml version="1.0" encoding="ISO-8859-1"?>
> <!DOCTYPE xc:content [
            ^^^^^^^^^^
> <!ENTITY % xhtml PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
> %xhtml;
> ]>
> <xc:xcontent xmlns:xc="http://www.wolterinkwebdesign.com/xml/xcontent"

If the DOCTYPE declaration says the root element is xc:content, then you
should have that, but you have xc:xcontent, so one of them needs to be changed.

> xmlns="http://www.w3.org/1999/xhtml" module="news">
> <xc:text type="html">
> leuk he jazeker ãôé<br/>
> </xc:text>
> </xc:xcontent>

> [cut]
> <xsl:output method="xml" indent="yes"/>

What output encoding do you want?

> <!--
> ! All html should remain html
> !-->
> <xsl:template match="*[namespace-uri(.)='http://www.w3.org/1999/xhtml']">
> <xsl:copy>
> <xsl:for-each select="@*">
> <xsl:copy/>
> </xsl:for-each>
> <xsl:apply-templates select="./node()"/>
> </xsl:copy>
> </xsl:template>

This could be done more simply and more efficiently:

<xsl:template match="xhtml:*">
<xsl:copy>
<xsl:copy-of select="@*"/>
<xsl:apply-templates select="node()"/>
</xsl:copy>
</xsl:template>

where the prefix xhtml is bound to the namespace URI for XHTML
(http://www.w3.org/1999/xhtml) earlier in the stylesheet.
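A sketch of the stylesheet from the original post with Martin's template dropped in. The only addition is the xmlns:xhtml declaration, which binds the xhtml prefix to the same URI as the default namespace; xc:text has no template of its own, so XSLT's built-in rule simply applies templates to its children.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
  xmlns="http://www.w3.org/1999/xhtml"
  xmlns:xhtml="http://www.w3.org/1999/xhtml"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:page="http://www.wolterinkwebdesign.com/xml/page"
  xmlns:xc="http://www.wolterinkwebdesign.com/xml/xcontent">

  <xsl:output method="xml" indent="yes"/>

  <!-- Every XHTML element is copied with its attributes;
       its children are then processed by the templates as usual -->
  <xsl:template match="xhtml:*">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- xc:text has no template, so the built-in rule processes its children -->
  <xsl:template match="/xc:xcontent">
    <page:page type="module">
      <p>
        <xsl:apply-templates select="xc:text"/>
      </p>
    </page:page>
  </xsl:template>

</xsl:stylesheet>
```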

> [cut]


What XSLT processor are you using, and how exactly do you run the
transformation?

--

Martin Honnen
http://JavaScript.FAQTs.com/
Jul 20 '05 #4
Martin Honnen wrote:


> Tjerk Wolterink wrote:
>> [cut]
>> <!DOCTYPE xc:content [
>            ^^^^^^^^^^
>> [cut]
>
> If the DOCTYPE declaration says the root element is xc:content then you
> should have that but you have xc:xcontent so one needs to be changed.


You're right, that was a typing error. The XML reader does not complain, so I did not notice it.

>> [cut]

> Could be done easier and more efficient:
>
> <xsl:template match="xhtml:*">
> <xsl:copy>
> <xsl:copy-of select="@* " />
> <xsl:apply-templates select="node()"/>
> </xsl:copy>
> </xsl:template>
>
> where the prefix xhtml is bound to the namespace URI for XHTML earlier
> in the document.


That is a solution, but both versions work.

>> [cut]

> What XSLT processor are you using, how exactly do you run the
> transformation?


I'm using Sablotron for XSL transformations.
But I think the problem is more complex than I thought.
Jul 20 '05 #5
> [cut]

Well,

the example I gave you was a bad one.
The problem I have does not occur in that example.

Here is an example where the problem does occur:

I have an XML document:
---
<?xml version="1.0" encoding="ISO-8859-1"?>

<!DOCTYPE xc:xcontent [
<!ENTITY % xhtml PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
%xhtml;
]>
<xc:xcontent xmlns:xc="http://www.wolterinkwebdesign.com/xml/xcontent" xmlns="http://www.w3.org/1999/xhtml" module="geschiedenis">
<xc:text type="html">
<p>Caf&eacute; de Kletskop is gevestigd in een oud lichtenvoords pander,
</p> </xc:text>
</xc:xcontent>
---
And when I put this together with this XSL document:
--- style.xsl ---

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
xmlns="http://www.w3.org/1999/xhtml"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:page="http://www.wolterinkwebdesign.com/xml/page"
xmlns:xc="http://www.wolterinkwebdesign.com/xml/xcontent">

<xsl:output method="xml" indent="yes"/>

<!--
! All html should remain html
!-->
<xsl:template match="*[namespace-uri(.)='http://www.w3.org/1999/xhtml']">
<xsl:copy>
<xsl:for-each select="@*">
<xsl:copy/>
</xsl:for-each>
<xsl:apply-templates select="./node()"/>
</xsl:copy>
</xsl:template>

<xsl:template match="/xc:xcontent">
<page:page type="module">
<xsl:apply-templates select="xc:text"/>
</page:page>
</xsl:template>

</xsl:stylesheet>
---
Then the output will be:

--
<page:page type="module">
<p>
Caf de Kletskop is gevestigd in een oud lichtenvoords pander,
</p>
</page:page>
--
The &eacute; in my XML is gone from the transformation output.

Sorry for the bad example; now the problem should be clear.

How can I solve this problem?
Jul 20 '05 #6

A non-validating parser is allowed by the XML Recommendation to _not_
fetch external DTD files and to just report entity references as undefined.

However, the XPath model does not support undefined entities, so in this
case I would expect you to get a parsing error on input saying that the
entity reference cannot be resolved. Your system seems to be silently
dropping the entities, which looks like a bug to me.

I can't suggest anything you can do other than raise it with the maintainers.

David
Jul 20 '05 #7
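One possible workaround, sketched here as an assumption rather than a tested fix for Sablotron: declare the few entities you use in the document's internal subset. The XML Recommendation requires even non-validating parsers to process the internal subset, but they must not process declarations that follow a reference to a parameter entity they did not read, so local declarations have to come before any %xhtml; reference (or replace it entirely, as below). Only eacute and nbsp are declared here; add whatever else your content uses.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE xc:xcontent [
  <!-- Local declarations: nothing has to be fetched from w3.org.
       If you keep %xhtml;, put these before it. -->
  <!ENTITY eacute "&#233;">
  <!ENTITY nbsp   "&#160;">
]>
<xc:xcontent xmlns:xc="http://www.wolterinkwebdesign.com/xml/xcontent"
             xmlns="http://www.w3.org/1999/xhtml" module="geschiedenis">
  <xc:text type="html">
    <p>Caf&eacute; de Kletskop is gevestigd in een oud lichtenvoords pander,
    </p>
  </xc:text>
</xc:xcontent>
```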
David Carlisle wrote:
> A non-validating parser is allowed by the XML Recommendation to _not_
> fetch external DTD files and to just report entity references as undefined.
>
> However, the XPath model does not support undefined entities, so in this
> case I would expect you to get a parsing error on input saying that the
> entity reference cannot be resolved. Your system seems to be silently
> dropping the entities, which looks like a bug to me.

Is there no way to match entities in XSL?
What is the default behavior of XSL systems when it comes to entities?

> I can't suggest anything you can do other than raise it with the maintainers.

Raise it with the maintainers?
You mean report it as a bug?

Jul 20 '05 #8
Tjerk Wolterink <tj***@wolterinkwebdesign.com> writes:
> David Carlisle wrote:
>> A non-validating parser is allowed by the XML Recommendation to _not_
>> fetch external DTD files and to just report entity references as undefined.
>>
>> However, the XPath model does not support undefined entities, so in this
>> case I would expect you to get a parsing error on input saying that the
>> entity reference cannot be resolved. Your system seems to be silently
>> dropping the entities, which looks like a bug to me.
>
> Is there no way to match entities in XSL?

No, they are expanded by the XML parser before XSLT starts, so the
input tree has all entities expanded.

> What is the default behavior of XSL systems when it comes to entities?

If the parser expands them, then they are not there as far as XSLT is
concerned; if it doesn't, it's a fatal error and nothing is transformed.

>> I can't suggest anything you can do other than raise it with the maintainers.

Try a different XSLT engine?

> Raise it with the maintainers?
> You mean report it as a bug?

Yes.

David
Jul 20 '05 #9
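To make David's answer concrete, a hypothetical stylesheet (not something from the thread): by the time it runs, &eacute; has already become the character é (U+00E9), so XSLT can only test for the character, written here as &#233;. It copies its input unchanged and marks every text node containing é with a comment; the text()[...] pattern has a higher default priority than node(), so it wins for those nodes.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Identity copy of everything else -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- The parser has already replaced &eacute; with U+00E9,
       so this pattern tests for the character itself -->
  <xsl:template match="text()[contains(., '&#233;')]">
    <xsl:comment> this text node contains U+00E9 </xsl:comment>
    <xsl:value-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```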
David Carlisle wrote:
> Tjerk Wolterink <tj***@wolterinkwebdesign.com> writes:

> [cut]

I could set the following option of the XSLT processor:

XSLT_SABOPT_PARSE_PUBLIC_ENTITIES = on
Tell the processor to parse public entities. By default this has been turned off.

But now, when I do the following XSL transformation:

xml:
--

<?xml version="1.0" encoding="ISO-8859-1"?>
<page:page xmlns:page="http://www.wolterinkwebdesign.com/xml/page">
<page:content>

<page:module
module="agenda"
stylesheet="agenda.xsl">

<page:multiple-settings multiple="agendapunt" max="30" order-by="datum" direction="desc"/>
</page:module>
</page:content>
</page:page>

--
xsl:
--
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE xsl:stylesheet [
<!ENTITY % xhtml PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
%xhtml;
]>

<xsl:stylesheet version="1.0"
xmlns="http://www.w3.org/1999/xhtml"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:page="http://www.wolterinkwebdesign.com/xml/page"
xmlns:menu="http://www.wolterinkwebdesign.com/xml/menu"
xmlns:r="http://www.wolterinkwebdesign.com/xml/roles">
[rest does not matter]

</xsl:stylesheet>
--
Now I get the following error:

["msgtype"]=> string(5) "error"
["code"]=> string(1) "2"
["module"]=> string(9) "Sablotron"
["URI"]=> string(49) "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
["line"]=> string(1) "1"
["msg"]=> string(51) "XML parser error 4: not well-formed (invalid token)"
So the DTD on w3c.org is not valid??
How can I solve this?

What I want is for XHTML entities like &nbsp; to be turned into numeric character references like &#209;
(I don't know if 209 is nbsp, but you know what I mean.)

What should I do?
Jul 20 '05 #10
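If the goal is for the output to contain numeric character references, one option (an assumption to check against Sablotron, not something tested in this thread) is to request an output encoding that cannot represent those characters. XSLT 1.0 says the processor should then write such characters as character references; note that only UTF-8 and UTF-16 are guaranteed to be supported as output encodings. For reference, nbsp is U+00A0, so its reference is &#160;, while &#209; is Ñ. The line below would replace the existing xsl:output element.

```xml
<!-- Characters outside US-ASCII, such as U+00A0 (no-break space) and
     U+00E9 (é), cannot be encoded, so the serializer should write them as
     character references, e.g. &#160; and &#233; (decimal or hexadecimal
     form is up to the processor) -->
<xsl:output method="xml" indent="yes" encoding="US-ASCII"/>
```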

> So the DTD on w3c.org is not valid??

I just tested the file you posted with rxp, and it reported it as being
well-formed.

> How can I solve this?

Report it as a bug to the parser maintainers?

You don't need to load the whole XHTML DTD, just the entity definitions;
e.g. the DTD you quoted uses
<!ENTITY % HTMLlat1 PUBLIC
"-//W3C//ENTITIES Latin 1 for XHTML//EN"
"xhtml-lat1.ent">
%HTMLlat1;

<!ENTITY % HTMLsymbol PUBLIC
"-//W3C//ENTITIES Symbols for XHTML//EN"
"xhtml-symbol.ent">
%HTMLsymbol;

<!ENTITY % HTMLspecial PUBLIC
"-//W3C//ENTITIES Special for XHTML//EN"
"xhtml-special.ent">
%HTMLspecial;
so the Latin-1 entities, for example, are in
http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent
You might like to try loading just those (or the versions of the entity
files you will find at http://www.w3.org/2003/entities, which I personally
prefer, being biased :-) instead of loading the XHTML DTD.
But as I said at the beginning, not using a <!DOCTYPE and not using
entity references in your stylesheet really will make your life simpler.

At the very least you ought to make local copies of the files and
reference those. Referencing the W3C site to download the XHTML DTD
every time you do a transformation is going to slow your transformation
down dramatically.
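Combining both suggestions, a sketch of an internal subset that loads only the three XHTML entity sets from local copies. The file locations are assumptions: save xhtml-lat1.ent, xhtml-symbol.ent and xhtml-special.ent from http://www.w3.org/TR/xhtml1/DTD/ next to the XML file, since relative system identifiers resolve against the document that contains the declaration. The parser must still be willing to read external parameter entities, which with Sablotron may mean keeping the XSLT_SABOPT_PARSE_PUBLIC_ENTITIES option mentioned earlier in the thread.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE xc:xcontent [
  <!-- Only the entity sets, read from local copies (assumed to sit in the
       same directory as this file), instead of the whole XHTML DTD -->
  <!ENTITY % HTMLlat1 PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN"
    "xhtml-lat1.ent">
  %HTMLlat1;
  <!ENTITY % HTMLsymbol PUBLIC "-//W3C//ENTITIES Symbols for XHTML//EN"
    "xhtml-symbol.ent">
  %HTMLsymbol;
  <!ENTITY % HTMLspecial PUBLIC "-//W3C//ENTITIES Special for XHTML//EN"
    "xhtml-special.ent">
  %HTMLspecial;
]>
<xc:xcontent xmlns:xc="http://www.wolterinkwebdesign.com/xml/xcontent"
             xmlns="http://www.w3.org/1999/xhtml" module="geschiedenis">
  <xc:text type="html">
    <p>Caf&eacute; de Kletskop is gevestigd in een oud lichtenvoords pander,
    </p>
  </xc:text>
</xc:xcontent>
```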

David
Jul 20 '05 #11
David Carlisle wrote:
> [cut]

I think I solved the problem: the XSL engine was not able to load DTDs from other servers.

David,
thanks for your help!
Jul 20 '05 #12


