Jean-François Michaud wrote:
Hello,
I was wondering if anybody could point me in the right direction
regarding this.
I have Unicode characters in an XML file written as hexadecimal character
references, and I need to convert them to ISO entities. Are there facilities
available to do this easily, or do I have to parse all the text and convert
everything manually? If that's what I have to do, is there any code already
available that would point me in the right direction?
This is my XML snippet.
XML:
<?xml version = "1.0" encoding = "UTF-8"?>
<root>
<para>Å Å å Ã β ε ϰ
λ μ</para>
</root>
I basically need something like this:
SGML:
<root>
<para>Å Å å Ã &b.beta; &b.epsi; &b.kappav;
&b.lambda; &b.mu;</para>
</root>
Thanks
Regards
Jeff
One way is to use XSLT 2.0 character maps. If I save your file as ent.xml,
Saxon 8 gives the following output when run with the stylesheet at the end.
It's not quite the result you asked for, but I think the bold Greek should
map to the mathematical bold characters in plane 1, so the plain (grk3)
entity names are used here rather than the bold (grk4) ones. (It would be
easy for you to take a local copy of the map and change that, though.)
David
$ saxon8 ent.xml ent.xsl
<?xml version="1.0" encoding="UTF-8"?><root>
<para>Å Å å Ã β ϵ ϰ
λ μ</para>
</root>
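<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Import the W3C ISO 9573-2003 character map -->
  <xsl:import
      href="http://www.w3.org/2003/entities/iso9573-2003/iso9573-2003map.xsl"/>
  <!-- Serialize mapped characters as entity references -->
  <xsl:output use-character-maps="iso9573-2003"/>
  <!-- Identity copy: all the conversion happens at serialization time -->
  <xsl:template match="/">
    <xsl:copy-of select="/"/>
  </xsl:template>
</xsl:stylesheet>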
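If an XSLT 2.0 processor isn't available, the "parse and convert manually" route from the original question can be sketched in Python. This is not part of the reply above; it is a minimal sketch assuming you maintain your own code-point-to-entity table. The `ENTITY_NAMES` table, the `sgml_entities` error-handler name, and the `to_entities`/`convert_xml` helpers are all hypothetical. The Greek names are copied from the desired output in the question; the Latin names are the standard ISOlat1 ones, assumed to be what is wanted. Parsing first resolves the `&#x...;` references to real characters, and a custom codec error handler then rewrites every non-ASCII character on output.

```python
import codecs
import xml.etree.ElementTree as ET

# Hypothetical table: only the characters from the question are listed.
# Greek names come from the desired SGML output; Latin names are assumed ISOlat1.
ENTITY_NAMES = {
    "\u00C5": "Aring",
    "\u00E5": "aring",
    "\u00C3": "Atilde",
    "\u03B2": "b.beta",
    "\u03B5": "b.epsi",
    "\u03F0": "b.kappav",
    "\u03BB": "b.lambda",
    "\u03BC": "b.mu",
}


def _entity_replace(exc):
    """Codec error handler: replace unencodable characters with entity refs."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    parts = []
    for ch in exc.object[exc.start:exc.end]:
        name = ENTITY_NAMES.get(ch)
        # Fall back to a hex character reference for anything not in the table.
        parts.append(f"&{name};" if name else f"&#x{ord(ch):X};")
    return "".join(parts), exc.end


codecs.register_error("sgml_entities", _entity_replace)


def to_entities(text: str) -> str:
    """Keep ASCII as-is; turn every other character into an entity reference."""
    return text.encode("ascii", errors="sgml_entities").decode("ascii")


def convert_xml(xml_text: str) -> str:
    """Parse (resolving &#x..; references), re-serialize, then apply entity names."""
    root = ET.fromstring(xml_text)
    return to_entities(ET.tostring(root, encoding="unicode"))
```

For example, `convert_xml("<root><para>&#x3B2; &#x3BC;</para></root>")` should give `<root><para>&b.beta; &b.mu;</para></root>`. Markup is plain ASCII, so only text and attribute characters are affected; the XML declaration is dropped by this serialization.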