Frantic wrote:
I'm working on a list of Japanese entities that contains the entity,
the Unicode hexadecimal code and the XML/SGML entity used for that
entity. A Unicode document is read into the program, then the program
sorts out every doublet and the hexadecimal Unicode code is extracted,
but I don't know a way to find the XML or SGML entity equivalent to the
Unicode code. Could anyone give me a pointer?
If you have the Unicode code number then in XML as well as SGML you can
use a numeric character reference, either decimal
&#number;
or hexadecimal
&#xhexnumber;
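
A minimal sketch of building those two forms from a code point in C#. The class name CharRef and its methods are hypothetical helpers made up for this illustration, not part of any library:

```csharp
using System;
using System.Globalization;

// Hypothetical helper (assumed for illustration): formats a Unicode
// code point as a numeric character reference.
public static class CharRef
{
    // Decimal form, &#number;
    public static string Decimal(int codePoint)
    {
        return "&#" + codePoint.ToString(CultureInfo.InvariantCulture) + ";";
    }

    // Hexadecimal form, &#xhexnumber;
    public static string Hex(int codePoint)
    {
        return "&#x" + codePoint.ToString("X", CultureInfo.InvariantCulture) + ";";
    }

    // Code point of the character starting at index; a surrogate pair
    // is combined into a single code point.
    public static int CodePointAt(string text, int index)
    {
        return char.ConvertToUtf32(text, index);
    }
}
```

For example, CharRef.Hex(CharRef.CodePointAt("あ", 0)) should give &#x3042; and CharRef.Decimal(0x3042) should give &#12354;.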
In XML only five entities are predefined (quot, apos, lt, gt, amp), so
you would need to keep those in a dictionary or similar.
SGML itself does not predefine more entities, I think. If you are
thinking about HTML rather than SGML, HTML 4 defines some entities, which
you can find here:
<http://www.w3.org/TR/html4/sgml/entities.html>
So you need to put those into a dictionary.
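
A sketch of such a dictionary in C#, falling back to a numeric reference when no name is known. The class EntityLookup is hypothetical and the table is only a small hand-picked sample of the HTML 4 list, not the full set. As far as I know HTML 4 defines no named entities for kana or kanji, so Japanese characters would take the fallback path:

```csharp
using System.Collections.Generic;
using System.Globalization;

// Hypothetical lookup (assumed for illustration): code point -> entity name.
public static class EntityLookup
{
    // Sample entries only; fill in the rest from the HTML 4 entity list.
    // apos is left out because it is predefined in XML but not in HTML 4.
    private static readonly Dictionary<int, string> Names =
        new Dictionary<int, string>
        {
            { 0x22, "quot" }, { 0x26, "amp" }, { 0x3C, "lt" }, { 0x3E, "gt" },
            { 0xA0, "nbsp" }, { 0xA9, "copy" }, { 0xE4, "auml" }, { 0xE9, "eacute" }
        };

    // Returns &name; when the code point is in the table,
    // otherwise a hexadecimal numeric character reference.
    public static string ReferenceFor(int codePoint)
    {
        string name;
        if (Names.TryGetValue(codePoint, out name))
            return "&" + name + ";";
        return "&#x" + codePoint.ToString("X", CultureInfo.InvariantCulture) + ";";
    }
}
```

With this table, EntityLookup.ReferenceFor(0xE4) should return &auml; and EntityLookup.ReferenceFor(0x3042) should return &#x3042;.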
If you have an SGML or XML DTD defining character entities and you want
to read them out programmatically, you can do that with an SGML or XML
parser, respectively. .NET has a built-in XML parser that can parse
DTDs, but there is not much of an API to get at the information in the
DTD while the parsing is done.
The DOM API however, after the parsing, gives you some information about
entities:
C# example:

using System;
using System.Xml;

// Load a document whose internal DTD subset declares one character entity.
XmlDocument xmlDocument = new XmlDocument();
xmlDocument.LoadXml(@"<!DOCTYPE example [
<!ENTITY auml ""ä"">
]>
<example>Kibology</example>");

// DocumentType.Entities holds the entities declared in the DTD.
foreach (XmlEntity entity in xmlDocument.DocumentType.Entities) {
  Console.WriteLine("Entity name: {0}, replacement: {1}.",
    entity.Name, entity.InnerText);
}

prints out

Entity name: auml, replacement: ä.
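
To go from a code point back to an entity name, the same DOM information can be turned into a reverse lookup. This is only a sketch: the class DtdEntities and its method ReverseMap are hypothetical names, and it assumes you only care about entities whose replacement text is a single character:

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper (assumed for illustration): builds a
// code point -> entity name map from the DTD of an XML document.
public static class DtdEntities
{
    public static Dictionary<int, string> ReverseMap(string xmlWithDtd)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xmlWithDtd);

        Dictionary<int, string> map = new Dictionary<int, string>();
        if (doc.DocumentType == null)
            return map; // no DTD, so no declared entities

        foreach (XmlEntity entity in doc.DocumentType.Entities)
        {
            string text = entity.InnerText;
            // Keep only replacements that are exactly one character,
            // counting a surrogate pair as one character.
            bool single = text.Length == 1 ||
                (text.Length == 2 && char.IsSurrogatePair(text, 0));
            if (single)
                map[char.ConvertToUtf32(text, 0)] = entity.Name;
        }
        return map;
    }
}
```

For the document in the example above, the map should contain one entry, 0xE4 mapped to auml.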
--
Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/