sgml vs unicode notation

S.
if in my website i am using the sgml &#123; notation, is it accurate
to say to my users that the site uses unicode or that it requires
unicode?

is there a mathematical formula to calculate a unicode value given its
utf8 value?

Rgds,
Sam
Jul 20 '05 #1
us********@yahoo.com (S.) wrote:
if in my website i am using the sgml &#123; notation, is it
accurate to say to my users that the site uses unicode or that it
requires unicode?
No. Please tell what the real problem is. Why would you say anything
like that?
is there a mathematical formula to calculate a unicode value given
its utf8 value?


That's actually off-topic here, but the short answer is that UTF-8 _is_
a Unicode encoding - and algorithmically defined.

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Jul 20 '05 #2
On Mon, 13 Oct 2003, S. wrote:
if in my website i am using the sgml &#123; notation, is it accurate
to say to my users that the site uses unicode or that it requires
unicode?
Yes, no, maybe. HTML4 and later have "Unicode" (well, to be pedantic
iso-10646) as their character set, so - potentially - any HTML4 client
agent has to at least _understand_ the use of that character set. The
specification doesn't actually require every client agent to be able
to *render* the entire character set - that would hardly be practical.

Anyhow, { is an ascii character ;-)
is there a mathematical formula to calculate a unicode value given its
utf8 value?


Sure: but if you needed to ask, I doubt that you'd want to program it
yourself. Why don't you ask the question about what you _really_ want
to achieve, rather than this detail which probably isn't really going
to help?

You do understand, don't you, that utf-8 is one of the recommended
encodings of Unicode/iso-10646? Maybe a bit of browsing around
www.unicode.org (obvious as it might seem) would help you to put the
details into context.

Perl (at least 5.8.0 or later) understands this stuff internally, so
if you talk to it nicely, it'll do anything you need. Sure, there are
plenty of other ways too. But your question is in one sense too vague
(no proper context) and in another sense too specific (you asked a
question to which the answer can only be "yes", but we don't know how
that can help you to achieve your real aims).

good luck
Jul 20 '05 #3
S.
oops i meant is there a formula to map unicode to sgml notation

&#123; was an example. i have finished my site using sgml notation for
the special chars. out of curiosity what i wanted to know was that if
sgml is related to unicode. i.e. is a sgml value &#1234; decimal for
some unicode char or did sgml define its own table of char to decimal
values.

thanks

us********@yahoo.com (S.) wrote in message news:<f3**************************@posting.google. com>...
if in my website i am using the sgml &#123; notation, is it accurate
to say to my users that the site uses unicode or that it requires
unicode?

is there a mathematical formula to calculate a unicode value given its
utf8 value?

Rgds,
Sam

Jul 20 '05 #4
On Tue, 13 Oct 2003, S. suddenly blurted out:
oops i meant is there a formula to map unicode to sgml notation
If you're aiming to participate in big-8 newsgroups... (well, see
footnote [1])
&#123; was an example.
Sure, no problem with that.
i have finished my site using sgml notation for the special
chars. out of curiosity what i wanted to know was that if sgml is
related to unicode.
The SGML notation &#number; (technically this is a "numeric character
reference") refers to the decimal value of the character position in
the chosen "Document Character Set" (a technical term from SGML).

In versions of HTML starting with RFC2070 and continuing through
HTML4.01 and into XHTML, there is only one "Document Character Set" in
HTML, and that is Unicode.
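
To make that relationship concrete, here is a minimal sketch in Python (chosen only for illustration; the helper names to_decimal_ref and from_ref are hypothetical, not part of any standard). It assumes the HTML case described above, where the number in &#number; is simply the character's Unicode code point in decimal.

```python
import html


def to_decimal_ref(ch: str) -> str:
    """Build the decimal numeric character reference for one character."""
    # ord() gives the Unicode code point, which is the number HTML expects.
    return f"&#{ord(ch)};"


def from_ref(ref: str) -> str:
    """Resolve a numeric character reference the way an HTML parser would."""
    return html.unescape(ref)


print(to_decimal_ref("{"))              # &#123;
print(from_ref("&#1234;") == "\u04d2")  # True
```

So there is no separate SGML table to consult for HTML: the "formula" is just the code point written in decimal.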

In SGML itself, you could define any character set to be a "Document
Character Set". (This would be done in the "SGML Declaration", as I
understand it). But for HTML there is a non-negotiable "SGML
Declaration" which sets the document character set (for the particular
version of HTML under discussion): and that, for good reason, was
chosen to be Unicode.

Note that the Document Character Set has *no* relationship to the
external document character coding: by using the &#number; notation
it's possible, if you wish, to represent the entire Document Character
Set (i.e. Unicode) using nothing more challenging than US-ASCII
character coding. But that's only one possible option - there are
many possible choices [2].
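
As a rough illustration of the US-ASCII option, Python's built-in "xmlcharrefreplace" error handler replaces every character that does not fit in the target coding with a decimal &#number; reference. The sample string is made up for the example.

```python
import html

# Hypothetical sample text: one Latin-1 letter and one Cyrillic letter.
text = "Caf\u00e9 \u04d2"

# Encode as pure US-ASCII; anything outside ASCII becomes &#number;.
ascii_bytes = text.encode("ascii", errors="xmlcharrefreplace")
print(ascii_bytes)  # b'Caf&#233; &#1234;'

# Resolving the references again gives back the original characters.
round_trip = html.unescape(ascii_bytes.decode("ascii"))
print(round_trip == text)  # True
```

The bytes on the wire are plain ASCII, yet the document still addresses characters anywhere in Unicode.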

The external document coding is specified in MIME notation by the
so-called "charset" attribute, which is very confusing in this
context, since it has nothing whatever to do with the Document
Character Set. In current versions of HTML, many different "charset"
values are used, according to the locale and writing system in use and
other considerations; but there is only one Document Character Set,
namely Unicode.
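
A small sketch of that distinction, again in Python and with a hypothetical helper name: the same character has different byte representations under different "charset" values, but once decoded it is the same Unicode character, so its &#number; reference never changes.

```python
def decimal_ref_from_bytes(data: bytes, charset: str) -> str:
    """Decode one character from bytes in the given charset, then build its reference."""
    ch = data.decode(charset)
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    return f"&#{ord(ch)};"


e_acute = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
latin1_bytes = e_acute.encode("iso-8859-1")  # one byte
utf8_bytes = e_acute.encode("utf-8")         # two bytes

print(latin1_bytes != utf8_bytes)                          # True
print(decimal_ref_from_bytes(latin1_bytes, "iso-8859-1"))  # &#233;
print(decimal_ref_from_bytes(utf8_bytes, "utf-8"))         # &#233;
```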
i.e. is a sgml value &#1234; decimal for some unicode char
In HTML, this is certainly so.
or did sgml define its own table of char to decimal values.


No, SGML doesn't have a specific "table" of such values: it can use
whatever Document Character Set the SGML user cares to put in their
declaration, I believe. Of course, there are good practical reasons
for choosing Unicode.
is there a mathematical formula to calculate a unicode value
Unicode publications represent their characters using a hexadecimal
notation, e.g. U+04D2 for "CYRILLIC CAPITAL LETTER A WITH DIAERESIS",
which you would represent as your example &#1234;

Later, SGML adopted a syntax for hexadecimal numeric character
references, e.g &#x04D2; if you don't want to do the conversion - but
as far as its use in HTML for the WWW, you get slightly better support
across browsers if you use the decimal syntax instead.
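
The conversion between the two forms is only a change of number base. A minimal sketch, with a hypothetical function name and assuming input in the U+XXXX style used by Unicode publications:

```python
def uplus_to_refs(notation: str) -> tuple:
    """Turn 'U+04D2' into its (decimal, hexadecimal) character references."""
    if not notation.upper().startswith("U+"):
        raise ValueError("expected notation like U+04D2")
    code_point = int(notation[2:], 16)  # parse the hexadecimal digits
    return f"&#{code_point};", f"&#x{code_point:04X};"


print(uplus_to_refs("U+04D2"))  # ('&#1234;', '&#x04D2;')
```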
given its utf8 value?


As I said before, utf-8 is an encoding of Unicode. The details are
published, but you'd be better advised to use some available library
or module which supports this encoding, and the other encodings of
Unicode, for you.
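
For the curious, the arithmetic looks roughly like the sketch below. It is Python, the function name is hypothetical, and it is deliberately incomplete: it decodes a single UTF-8 sequence and does not reject overlong forms or surrogate values, which a real decoder must do. In practice the built-in bytes.decode("utf-8") is the thing to use.

```python
def decode_utf8_sequence(data: bytes) -> int:
    """Return the Unicode code point encoded by one UTF-8 byte sequence."""
    first = data[0]
    # The lead byte says how long the sequence is and carries the top bits.
    if first < 0x80:
        length, code_point = 1, first
    elif first >> 5 == 0b110:
        length, code_point = 2, first & 0x1F
    elif first >> 4 == 0b1110:
        length, code_point = 3, first & 0x0F
    elif first >> 3 == 0b11110:
        length, code_point = 4, first & 0x07
    else:
        raise ValueError("invalid lead byte")
    if len(data) != length:
        raise ValueError("wrong number of bytes for this lead byte")
    # Each continuation byte looks like 10xxxxxx and adds six more bits.
    for byte in data[1:]:
        if byte >> 6 != 0b10:
            raise ValueError("invalid continuation byte")
        code_point = (code_point << 6) | (byte & 0x3F)
    return code_point


print(decode_utf8_sequence(b"\xd3\x92"))       # 1234, i.e. U+04D2
print(ord(b"\xd3\x92".decode("utf-8")))        # 1234, via the library
```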

good luck

[1] If you're aiming to participate in big-8 newsgroups, you'd be
strongly advised to catch up with the netiquette conventions. In
particular, about not posting the same question separately to
different groups (yes, some of us read more than one group, and we
spot these things), and following the accepted rules of quotation: one
quotes, with attribution, the specific part of the previous thread
which sets the context for your followup - one puts one's comment
below the context-setting quote - and one snips all extraneous matter,
signatures etc. from what one is quoting.

[2] I have an overview aimed at optimising the choice, at
http://ppewww.ph.gla.ac.uk/~flavell/charset/checklist

(this now cross-posted to c.t.sgml, and followups suggested back at
c.i.w.a.html).

Jul 20 '05 #5
S.
"Alan J. Flavell" <fl*****@ph.gla.ac.uk> patiently wrote
<snip>
[1] If you're aiming to participate in big-8 newsgroups, you'd be
strongly advised to catch up with the netiquette conventions. In

<snip>

thanks for your detailed answer to my question. i have posted many
times before but this is the first time someone took the time to
provide me a critique of my posting style (or lack of it). This one
should look better though.

rgds,
Sam
Jul 20 '05 #6
In article <f3**************************@posting.google.com > in
comp.infosystems.www.authoring.html, S. <us********@yahoo.com>
wrote:
"Alan J. Flavell" <fl*****@ph.gla.ac.uk> patiently wrote
<snip>
[1] If you're aiming to participate in big-8 newsgroups, you'd be
strongly advised to catch up with the netiquette conventions. In

<snip>

thanks for your detailed answer to my question. i have posted many
times before but this is the first time someone took the time to
provide me a critique of my posting style (or lack of it). This one
should look better though.


Indeed it does. Thank you!

--
Stan Brown, Oak Road Systems, Cortland County, New York, USA
http://OakRoadSystems.com/
HTML 4.01 spec: http://www.w3.org/TR/html401/
validator: http://validator.w3.org/
CSS 2 spec: http://www.w3.org/TR/REC-CSS2/
2.1 changes: http://www.w3.org/TR/CSS21/changes.html
validator: http://jigsaw.w3.org/css-validator/
Jul 20 '05 #7
