sgml vs unicode notation

S.
If my website uses the SGML &#123; notation, is it accurate
to tell my users that the site uses Unicode, or that it requires
Unicode?

Is there a mathematical formula to calculate a Unicode value given its
UTF-8 value?

Rgds,
Sam
Jul 20 '05 #1
us********@yahoo.com (S.) wrote:
If my website uses the SGML &#123; notation, is it
accurate to tell my users that the site uses Unicode, or that it
requires Unicode?
No. Please tell us what the real problem is. Why would you say anything
like that?
Is there a mathematical formula to calculate a Unicode value given
its UTF-8 value?


That's actually off-topic here, but the short answer is that UTF-8 _is_
a Unicode encoding, and an algorithmically defined one.
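To illustrate what "algorithmically defined" means: UTF-8 stores a code point's bits in fixed patterns (0xxxxxxx, 110xxxxx 10xxxxxx, and so on, as laid out in RFC 3629), so recovering the Unicode value is just masking and shifting. The following is a minimal Python sketch, not a production decoder; the function name `utf8_to_code_points` is hypothetical, and it deliberately skips some checks a real decoder performs (overlong forms, surrogates, values above U+10FFFF).

```python
from typing import List


def utf8_to_code_points(data: bytes) -> List[int]:
    """Decode UTF-8 bytes into Unicode code points by hand.

    Sketch only: rejects bad lead/continuation bytes and truncated
    sequences, but does not reject overlong encodings or surrogates.
    """
    code_points = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:                # 0xxxxxxx: 1 byte (ASCII)
            cp, extra = lead, 0
        elif lead >> 5 == 0b110:       # 110xxxxx: start of a 2-byte sequence
            cp, extra = lead & 0x1F, 1
        elif lead >> 4 == 0b1110:      # 1110xxxx: start of a 3-byte sequence
            cp, extra = lead & 0x0F, 2
        elif lead >> 3 == 0b11110:     # 11110xxx: start of a 4-byte sequence
            cp, extra = lead & 0x07, 3
        else:
            raise ValueError(f"invalid lead byte 0x{lead:02X} at offset {i}")
        if i + extra >= len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for j in range(i + 1, i + 1 + extra):
            cont = data[j]
            if cont >> 6 != 0b10:      # continuation bytes look like 10xxxxxx
                raise ValueError(f"invalid continuation byte at offset {j}")
            cp = (cp << 6) | (cont & 0x3F)  # append 6 payload bits
        code_points.append(cp)
        i += 1 + extra
    return code_points


# U+04D2 (CYRILLIC CAPITAL LETTER A WITH DIAERESIS) followed by "{"
print([hex(cp) for cp in utf8_to_code_points(b"\xd3\x92{")])  # ['0x4d2', '0x7b']
```

In real code, `data.decode("utf-8")` followed by `ord()` on each character does the same job and also enforces the checks this sketch omits.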

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Jul 20 '05 #2
On Mon, 13 Oct 2003, S. wrote:
If my website uses the SGML &#123; notation, is it accurate
to tell my users that the site uses Unicode, or that it requires
Unicode?
Yes, no, maybe. HTML4 and later have "Unicode" (well, to be pedantic,
ISO 10646) as their character set, so, potentially, any HTML4 client
agent has to at least _understand_ the use of that character set. The
specification doesn't actually require every client agent to be able
to *render* the entire character set; that would hardly be practical.

Anyhow, &#123; is an ASCII character ;-)
Is there a mathematical formula to calculate a Unicode value given its
UTF-8 value?


Sure: but if you needed to ask, I doubt that you'd want to program it
yourself. Why don't you ask the question about what you _really_ want
to achieve, rather than this detail which probably isn't really going
to help?

You do understand, don't you, that UTF-8 is one of the recommended
encodings of Unicode/ISO 10646? Maybe a bit of browsing around
www.unicode.org (obvious as it might seem) would help you to put the
details into context.

Perl (5.8.0 or later) understands this stuff internally, so
if you talk to it nicely, it'll do anything you need. Sure, there are
plenty of other ways too. But your question is in one sense too vague
(no proper context) and in another sense too specific (you asked a
question to which the answer can only be "yes", but we don't know how
that can help you to achieve your real aims).

good luck
Jul 20 '05 #3
S.
Oops, I meant: is there a formula to map Unicode to SGML notation?

&#123; was an example. I have finished my site using SGML notation for
the special characters. Out of curiosity, what I wanted to know was
whether SGML is related to Unicode. That is, is an SGML value such as
&#1234; the decimal value of some Unicode character, or did SGML define
its own table of character-to-decimal values?

thanks

us********@yahoo.com (S.) wrote in message news:<f3**************************@posting.google.com>...
If my website uses the SGML &#123; notation, is it accurate
to tell my users that the site uses Unicode, or that it requires
Unicode?

Is there a mathematical formula to calculate a Unicode value given its
UTF-8 value?

Rgds,
Sam

Jul 20 '05 #4
On Tue, 13 Oct 2003, S. suddenly blurted out:
Oops, I meant: is there a formula to map Unicode to SGML notation?
If you're aiming to participate in big-8 newsgroups... (well, see
footnote [1])
&#123; was an example.
Sure, no problem with that.
I have finished my site using SGML notation for the special
characters. Out of curiosity, what I wanted to know was whether SGML
is related to Unicode.
The SGML notation &#number; (technically this is a "numeric character
reference") refers to the decimal value of the character position in
the chosen "Document Character Set" (a technical term from SGML).

In versions of HTML starting with RFC 2070 and continuing through
HTML 4.01 and into XHTML, there is only one "Document Character Set" in
HTML, and that is Unicode.

In SGML itself, you could define any character set to be a "Document
Character Set". (This would be done in the "SGML Declaration", as I
understand it.) But for HTML there is a non-negotiable "SGML
Declaration" which sets the document character set (for the particular
version of HTML under discussion): and that, for good reason, was
chosen to be Unicode.
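Since HTML fixes the Document Character Set to Unicode, the number in a numeric character reference is simply a Unicode code point. Below is a minimal Python sketch of resolving such references; `resolve_ncrs` is a hypothetical helper, and the standard library's `html.unescape` already does this job (plus named entities such as `&eacute;`).

```python
import html
import re

# Decimal (&#1234;) or hexadecimal (&#x04D2;) numeric character reference.
NCR_PATTERN = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def resolve_ncrs(text: str) -> str:
    """Replace each numeric character reference with the character whose
    Unicode code point it names. Sketch: named entities are left alone,
    and chr() raises ValueError for numbers above 0x10FFFF."""
    def to_char(match):
        hex_digits, dec_digits = match.group(1), match.group(2)
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        return chr(code_point)

    return NCR_PATTERN.sub(to_char, text)


sample = "&#123; &#1234; &#x04D2;"
print(ascii(resolve_ncrs(sample)))                    # '{ \u04d2 \u04d2'
print(resolve_ncrs(sample) == html.unescape(sample))  # True
```

One caveat on the library route: `html.unescape` follows HTML5's rules, which remap a few legacy numbers (for instance `&#128;`) to Windows-1252 characters, so for such inputs its output differs from the strict code-point reading described above.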

Note that the Document Character Set has *no* relationship to the
external document character coding: by using the &#number; notation
it's possible, if you wish, to represent the entire Document Character
Set (i.e. Unicode) using nothing more challenging than US-ASCII
character coding. But that's only one possible option; there are
many possible choices.[2]
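Python's codecs can produce exactly this "all of Unicode over US-ASCII" form: the standard `xmlcharrefreplace` error handler replaces every character the target encoding cannot represent with a decimal `&#number;` reference. A minimal sketch; `to_ascii_html` is a hypothetical name, and it assumes the text has already been HTML-escaped (it does not touch `&` or `<`).

```python
import html


def to_ascii_html(text: str) -> bytes:
    """Encode text as pure US-ASCII; every non-ASCII character becomes a
    decimal numeric character reference. Assumes markup-significant
    characters (&, <) were already escaped by the caller."""
    return text.encode("ascii", errors="xmlcharrefreplace")


original = "caf\u00e9 \u04d2 {"
encoded = to_ascii_html(original)
print(encoded)                                           # b'caf&#233; &#1234; {'
print(html.unescape(encoded.decode("ascii")) == original)  # True
```

The round trip shows that nothing is lost: the ASCII bytes still denote the same Unicode characters.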

The external document coding is specified in MIME notation by the
so-called "charset" attribute, which is very confusing in this
context, since it has nothing whatever to do with the Document
Character Set. In current versions of HTML, many different "charset"
values are used, according to the locale and writing system in use and
other considerations; but there is only one Document Character Set,
namely Unicode.
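A short Python illustration of that distinction, taking UTF-8 and ISO-8859-1 as two example "charset" values: the bytes on the wire differ, but once each is decoded with its declared charset they are the same Unicode characters, and the numeric character reference depends only on the code point.

```python
text = "caf\u00e9"  # "cafe" with e-acute; the accented letter is U+00E9

utf8_bytes = text.encode("utf-8")         # external coding, charset=utf-8
latin1_bytes = text.encode("iso-8859-1")  # external coding, charset=iso-8859-1

print(utf8_bytes, latin1_bytes)  # b'caf\xc3\xa9' b'caf\xe9'

# Decoding each with its own charset yields identical Unicode text...
assert utf8_bytes.decode("utf-8") == latin1_bytes.decode("iso-8859-1") == text

# ...and &#number; is fixed by the Document Character Set (Unicode), not the charset.
print(f"&#{ord(text[-1])};")  # &#233;
```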
That is, is an SGML value such as &#1234; the decimal value of some Unicode character
In HTML, this is certainly so.
or did SGML define its own table of character-to-decimal values?


No, SGML doesn't have a specific "table" of such values: it can use
whatever Document Character Set the SGML user cares to put in their
declaration, I believe. Of course, there are good practical reasons
for choosing Unicode.
Is there a mathematical formula to calculate a Unicode value
Unicode publications represent their characters using a hexadecimal
notation, e.g. U+04D2 for "CYRILLIC CAPITAL LETTER A WITH DIAERESIS",
which you would represent as your example &#1234; (hexadecimal 04D2
being decimal 1234).

Later, SGML adopted a syntax for hexadecimal numeric character
references, e.g. &#x04D2;, if you don't want to do the conversion; but
as far as its use in HTML on the WWW is concerned, you get slightly
better support across browsers if you use the decimal syntax instead.
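The hex-to-decimal conversion mentioned above is a one-liner in most languages. A Python sketch (the helper name `reference_forms` is hypothetical) that gives the U+ notation and both reference syntaxes for a single character:

```python
import unicodedata


def reference_forms(ch: str) -> tuple:
    """Return (U+ notation, decimal reference, hex reference) for one character."""
    code_point = ord(ch)
    return (f"U+{code_point:04X}", f"&#{code_point};", f"&#x{code_point:04X};")


ch = "\u04d2"
print(unicodedata.name(ch))  # CYRILLIC CAPITAL LETTER A WITH DIAERESIS
print(reference_forms(ch))   # ('U+04D2', '&#1234;', '&#x04D2;')
print(int("04D2", 16))       # 1234, going the other way from U+ notation
```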
given its UTF-8 value?


As I said before, UTF-8 is an encoding of Unicode. The details are
published, but you'd be better advised to use some available library
or module which supports this encoding, and the other encodings of
Unicode, for you.
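Following that advice, here is a minimal Python sketch that relies on the built-in UTF-8 codec rather than hand-written bit manipulation; `utf8_to_unicode` is a hypothetical helper name. With the default strict error handling, malformed input raises `UnicodeDecodeError` instead of silently producing wrong values.

```python
from typing import List


def utf8_to_unicode(data: bytes) -> List[str]:
    """Return the U+XXXX value of each character in UTF-8 encoded data."""
    return [f"U+{ord(ch):04X}" for ch in data.decode("utf-8")]


print(utf8_to_unicode(b"\xd3\x92"))  # ['U+04D2']

try:
    utf8_to_unicode(b"\xd3")  # truncated two-byte sequence
except UnicodeDecodeError as err:
    print("malformed input:", err.reason)
```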

good luck

[1] If you're aiming to participate in big-8 newsgroups, you'd be
strongly advised to catch up with the netiquette conventions. In
particular, about not posting the same question separately to
different groups (yes, some of us read more than one group, and we
spot these things), and following the accepted rules of quotation: one
quotes, with attribution, the specific part of the previous thread
which sets the context for your followup - one puts one's comment
below the context-setting quote - and one snips all extraneous matter,
signatures etc. from what one is quoting.

[2] I have an overview aimed at optimising the choice, at
http://ppewww.ph.gla.ac.uk/~flavell/charset/checklist

(this now cross-posted to c.t.sgml, and followups suggested back at
c.i.w.a.html).

Jul 20 '05 #5
S.
"Alan J. Flavell" <fl*****@ph.gla.ac.uk> patiently wrote
<snip>
[1] If you're aiming to participate in big-8 newsgroups, you'd be
strongly advised to catch up with the netiquette conventions. In

<snip>

Thanks for your detailed answer to my question. I have posted many
times before, but this is the first time someone has taken the time to
give me a critique of my posting style (or lack of it). This one
should look better, though.

rgds,
Sam
Jul 20 '05 #6
In article <f3**************************@posting.google.com> in
comp.infosystems.www.authoring.html, S. <us********@yahoo.com>
wrote:
"Alan J. Flavell" <fl*****@ph.gla.ac.uk> patiently wrote
<snip>
[1] If you're aiming to participate in big-8 newsgroups, you'd be
strongly advised to catch up with the netiquette conventions. In

<snip>

Thanks for your detailed answer to my question. I have posted many
times before, but this is the first time someone has taken the time to
give me a critique of my posting style (or lack of it). This one
should look better, though.


Indeed it does. Thank you!

--
Stan Brown, Oak Road Systems, Cortland County, New York, USA
http://OakRoadSystems.com/
HTML 4.01 spec: http://www.w3.org/TR/html401/
validator: http://validator.w3.org/
CSS 2 spec: http://www.w3.org/TR/REC-CSS2/
2.1 changes: http://www.w3.org/TR/CSS21/changes.html
validator: http://jigsaw.w3.org/css-validator/
Jul 20 '05 #7
