numeric entities in XSL

More silly questions, I'm afraid.

Consider the following stylesheet:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xhtml="http://www.w3.org/1999/xhtml"
exclude-result-prefixes="xhtml">

<xsl:output indent="yes" method="text"/>

<xsl:template match="/">
&#160;&#163;
</xsl:template>

<xsl:template match="*">
<!-- nothing -->
</xsl:template>
</xsl:stylesheet>

When processed by Xalan 2.7.0 or by xsltproc 1.1.19 both output:

Â Â£

(that is not seven-bit clean - if it is not correctly transmitted by NNTP,
it is uppercase A circumflex, space, uppercase A circumflex, pound-sign).
If the output method is changed to 'xml', the output is the same. If the
output method is changed to 'html', however, xsltproc outputs exactly the
same, but Xalan2 outputs:

&nbsp;&pound;

If we now change the stylesheet to:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE stylesheet [
<!ENTITY nobreak "&#160;">
<!ENTITY poundsign "&#163;">
]>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xhtml="http://www.w3.org/1999/xhtml"
exclude-result-prefixes="xhtml">

<xsl:output indent="yes" method="html"/>

<xsl:template match="/">
&nobreak;&poundsign;
</xsl:template>

<xsl:template match="*">
<!-- nothing -->
</xsl:template>
</xsl:stylesheet>

then the behaviour is exactly the same as before.

So, questions:

(1) Where does the uppercase A circumflex come from? What do I have to do
to avoid it?
(2) Where does Xalan2 magically get the HTML entity names from, and is it
in accord with the standard in printing them?

--
si***@jasmine.org.uk (Simon Brooke) http://www.jasmine.org.uk/~simon/

;; When all else fails, read the distractions.

Mar 14 '07 #1
* Simon Brooke wrote in comp.text.xml:
> <xsl:template match="/">
> &#160;&#163;
> </xsl:template>
>
> When processed by Xalan 2.7.0 or by xsltproc 1.1.19 both output:
>
> Â Â£
>
> (that is not seven-bit clean - if it is not correctly transmitted by NNTP,
> it is uppercase A circumflex, space, uppercase A circumflex, pound-sign).
> If the output method is changed to 'xml', the output is the same. If the
> output method is changed to 'html', however, xsltproc outputs exactly the
> same, but Xalan2 outputs:
>
> (1) Where does the uppercase A circumflex come from? What do I have to do
> to avoid it?

You are seeing UTF-8 interpreted as some other encoding, such as ISO-8859-1.
The problem is that you are using the wrong tool to inspect the result,
or have not configured the tool correctly. Use a tool with UTF-8 support,
tell the tool that the content is UTF-8 encoded, or pick a different
encoding using xsl:output encoding='...'.
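
The effect is easy to reproduce outside any XSLT processor. The following is a minimal Python sketch, not part of the original thread; the helper name view_as is made up for illustration. It encodes the two characters as UTF-8 and then decodes the bytes the way a viewer assuming ISO-8859-1 would:

```python
def view_as(text, written_as, read_as):
    """Encode text as the processor would, then decode it as the viewer would."""
    return text.encode(written_as).decode(read_as)


# NO-BREAK SPACE (U+00A0) and POUND SIGN (U+00A3), the two characters
# the stylesheet emits.
text = "\u00a0\u00a3"

# Each character becomes two bytes in UTF-8, both starting with 0xC2.
print(text.encode("utf-8"))  # b'\xc2\xa0\xc2\xa3'

# Read as ISO-8859-1, every 0xC2 byte shows up as an A circumflex.
print(repr(view_as(text, "utf-8", "iso-8859-1")))

# Read as UTF-8, the original two characters come back unchanged.
print(view_as(text, "utf-8", "utf-8") == text)
```

The stray A circumflex is simply the byte 0xC2, which UTF-8 uses as the lead byte for both characters.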

> (2) Where does Xalan2 magically get the HTML entity names from, and is it
> in accord with the standard in printing them?

It presumably has an internal character<->entity table where it looks
it up, and yes, that's in accord with the HTML output method, see the
XSLT 1.0 spec, <http://www.w3.org/TR/xslt#section-HTML-Output-Method>.
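
As an illustration of what such a table lookup could look like, here is a small Python sketch using the standard library's html.entities.codepoint2name mapping. This is an analogy only: it is not Xalan's actual code or table, and the function name to_named_entities is made up for this example.

```python
from html.entities import codepoint2name


def to_named_entities(text):
    """Replace non-ASCII characters that have an HTML 4 name with &name;."""
    out = []
    for ch in text:
        name = codepoint2name.get(ord(ch))
        if ord(ch) > 127 and name is not None:
            out.append("&" + name + ";")
        else:
            # ASCII characters, and characters without a name, pass through.
            out.append(ch)
    return "".join(out)


print(to_named_entities("\u00a0\u00a3"))  # &nbsp;&pound;
```

A serializer that works this way produces pure ASCII output for these characters, which is why the Xalan HTML output looks right regardless of the encoding the viewer assumes.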
--
Björn Höhrmann · mailto:bj****@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
Mar 14 '07 #2
Simon Brooke wrote:
> When processed by Xalan 2.7.0 or by xsltproc 1.1.19 both output:
>
> Â Â£

You get that result if the output is UTF-8 encoded but you view it
with a tool or editor that decodes it as ISO-8859-1.
You might want to use e.g.
<xsl:output encoding="ISO-8859-1"/>
in your stylesheet if you want that encoding, or if your editor
assumes it.
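
To see what this change means at the byte level, here is a short Python sketch (an illustration only; the helper is_valid_utf8 is a made-up name). In ISO-8859-1 each of the two characters is a single byte, so a Latin-1 viewer shows them correctly, but the same bytes are no longer valid UTF-8:

```python
def is_valid_utf8(data):
    """Return True if the byte string can be decoded as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


text = "\u00a0\u00a3"  # NO-BREAK SPACE, POUND SIGN

latin1_bytes = text.encode("iso-8859-1")
print(latin1_bytes)  # b'\xa0\xa3'

# A viewer that assumes ISO-8859-1 recovers the original characters.
print(latin1_bytes.decode("iso-8859-1") == text)  # True

# A viewer that assumes UTF-8 would now be the one that gets it wrong.
print(is_valid_utf8(latin1_bytes))  # False
```

So the output encoding and the viewer's assumption have to agree; changing the stylesheet only moves the mismatch if the viewer is later switched to UTF-8.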
--

Martin Honnen
http://JavaScript.FAQTs.com/
Mar 14 '07 #3
In article <45***********************@newsspool3.arcor-online.net>,
Martin Honnen <Ma***********@gmx.de> wrote:
> You might want to use e.g.
> <xsl:output encoding="ISO-8859-1"/>

Or even encoding="ascii".
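
With an ASCII output encoding, the xml and html output methods have to write characters outside ASCII as character references (or, for html, entity references). Python's "xmlcharrefreplace" error handler does something comparable, so it serves as a rough sketch of the result; it is an analogy and not how any particular XSLT processor is implemented:

```python
text = "\u00a0\u00a3"  # NO-BREAK SPACE, POUND SIGN

# Characters that do not fit in ASCII are written as decimal
# character references instead of raw bytes.
ascii_bytes = text.encode("ascii", "xmlcharrefreplace")
print(ascii_bytes)  # b'&#160;&#163;'

# The output is seven-bit clean, so UTF-8, ISO-8859-1 and ASCII viewers
# all show the same characters.
print(all(b < 128 for b in ascii_bytes))  # True
```

Note that this does not help with method="text", where there is no reference syntax to fall back on.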

-- Richard
--
"Consideration shall be given to the need for as many as 32 characters
in some alphabets" - X3.4, 1963.
Mar 14 '07 #4
in message <et***********@pc-news.cogsci.ed.ac.uk>, Richard Tobin
('r******@cogsci.ed.ac.uk') wrote:
> In article <45***********************@newsspool3.arcor-online.net>,
> Martin Honnen <Ma***********@gmx.de> wrote:
>> You might want to use e.g.
>> <xsl:output encoding="ISO-8859-1"/>
>
> Or even encoding="ascii".

Ah! Thank you. Or even

<meta http-equiv="Content-Type" content="text/html;charset=utf-8">

in the generated HTML; or even (better) fix it in the servlet config so
that it sends that in the real HTTP header.
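
As a rough sketch of why the declared charset matters, the Python snippet below reads the charset parameter from a Content-Type value and decodes the body with it. The function name decode_body is made up, and the ISO-8859-1 fallback is an assumption for illustration (real clients differ in what they do when no charset is declared):

```python
from email.message import Message


def decode_body(content_type, body):
    """Decode body bytes using the charset named in a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type
    # Assumed fallback when no charset parameter is present.
    charset = msg.get_content_charset() or "iso-8859-1"
    return body.decode(charset)


body = "\u00a0\u00a3".encode("utf-8")

# With the charset declared, the two characters come back intact.
print(repr(decode_body("text/html;charset=utf-8", body)))

# Without it, this sketch falls back to ISO-8859-1 and the
# A circumflex characters reappear.
print(repr(decode_body("text/html", body)))
```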

Many thanks indeed.

--
si***@jasmine.org.uk (Simon Brooke) http://www.jasmine.org.uk/~simon/

;; Conservatives are not necessarily stupid,
;; but most stupid people are conservatives -- J S Mill
Mar 15 '07 #5
