sgml vs unicode notation

S.
if in my website i am using the sgml &#123; notation, is it accurate
to say to my users that the site uses unicode or that it requires
unicode?

is there a mathematical formula to calculate a unicode value given its
utf8 value?

Rgds,
Sam
Jul 20 '05 #1
us********@yahoo.com (S.) wrote:
if in my website i am using the sgml &#123; notation, is it
accurate to say to my users that the site uses unicode or that it
requires unicode?
No. Please tell what the real problem is. Why would you say anything
like that?
is there a mathematical formula to calculate a unicode value given
its utf8 value?


That's actually off-topic here, but the short answer is that UTF-8 _is_
a Unicode encoding - and algorithmically defined.

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html

Jul 20 '05 #2
On Mon, 13 Oct 2003, S. wrote:
if in my website i am using the sgml &#123; notation, is it accurate
to say to my users that the site uses unicode or that it requires
unicode?
Yes, no, maybe. HTML4 and later have "Unicode" (well, to be pedantic
iso-10646) as their character set, so - potentially - any HTML4 client
agent has to at least _understand_ the use of that character set. The
specification doesn't actually require every client agent to be able
to *render* the entire character set - that would hardly be practical.

Anyhow, &#123; is an ascii character ;-)
is there a mathematical formula to calculate a unicode value given its
utf8 value?


Sure: but if you needed to ask, I doubt that you'd want to program it
yourself. Why don't you ask the question about what you _really_ want
to achieve, rather than this detail which probably isn't really going
to help?

You do understand, don't you, that utf-8 is one of the recommended
encodings of Unicode/iso-10646? Maybe a bit of browsing around
www.unicode.org (obvious as it might seem) would help you to put the
details into context.

Perl (at least 5.8.0 or later) understands this stuff internally, so
if you talk to it nicely, it'll do anything you need. Sure, there are
plenty of other ways too. But your question is in one sense too vague
(no proper context) and in another sense too specific (you asked a
question to which the answer can only be "yes", but we don't know how
that can help you to achieve your real aims).

good luck
Jul 20 '05 #3
S.
oops i meant is there a formula to map unicode to sgml notation

&#123; was an example. i have finished my site using sgml notation for
the special chars. out of curiosity what i wanted to know was whether
sgml is related to unicode. i.e. is a sgml value &#1234; decimal for
some unicode char or did sgml define its own table of char to decimal
values.

thanks

us********@yahoo.com (S.) wrote in message news:<f3******* *************** ****@posting.google.com>...
if in my website i am using the sgml &#123; notation, is it accurate
to say to my users that the site uses unicode or that it requires
unicode?

is there a mathematical formula to calculate a unicode value given its
utf8 value?

Rgds,
Sam

Jul 20 '05 #4
On Tue, 13 Oct 2003, S. suddenly blurted out:
oops i meant is there a formula to map unicode to sgml notation
If you're aiming to participate in big-8 newsgroups... (well, see
footnote [1])
&#123; was an example.
Sure, no problem with that.
i have finished my site using sgml notation for the special
chars. out of curiosity what i wanted to know was whether sgml is
related to unicode.
The SGML notation &#number; (technically this is a "numeric character
reference") refers to the decimal value of the character position in
the chosen "Document Character Set" (a technical term from SGML).

In versions of HTML starting with RFC2070 and continuing through
HTML4.01 and into XHTML, there is only one "Document Character Set" in
HTML, and that is Unicode.
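
A minimal sketch of that relationship, in Python (the language and both function names are my own choices for illustration, not anything from the HTML or SGML specs): since the Document Character Set of HTML is Unicode, the number inside a decimal reference is simply the character's Unicode code point.

```python
import html


def to_decimal_ncr(ch: str) -> str:
    """Build the decimal numeric character reference for one character."""
    return f"&#{ord(ch)};"


def from_decimal_ncr(ref: str) -> str:
    """Resolve a reference roughly the way an HTML parser would."""
    return html.unescape(ref)


print(to_decimal_ncr("{"))       # &#123;
print(to_decimal_ncr("\u04d2"))  # &#1234;
print(from_decimal_ncr("&#1234;") == "\u04d2")  # True
```

So there is no separate SGML table to consult in the HTML case; `ord()` and `chr()` already give the mapping in both directions.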

In SGML itself, you could define any character set to be a "Document
Character Set". (This would be done in the "SGML Declaration", as I
understand it). But for HTML there is a non-negotiable "SGML
Declaration" which sets the document character set (for the particular
version of HTML under discussion): and that, for good reason, was
chosen to be Unicode.

Note that the Document Character Set has *no* relationship to the
external document character coding: by using the &#number; notation
it's possible, if you wish, to represent the entire Document Character
Set (i.e. Unicode) using nothing more challenging than US-ASCII
character coding. But that's only one possible option - there are
many possible choices [2].
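
To illustrate the point, here is a rough Python sketch (the sample string is invented; `xmlcharrefreplace` is a standard error handler of `str.encode`). The same text is written out once as pure US-ASCII with &#number; references and once as UTF-8, and both decode back to the same Unicode characters.

```python
import html

# Sample text containing characters outside US-ASCII (made up for this example).
text = "na\u00efve \u04d2 \u20ac"

# External coding 1: US-ASCII only; everything else becomes a decimal reference.
ascii_bytes = text.encode("ascii", errors="xmlcharrefreplace")
print(ascii_bytes)  # b'na&#239;ve &#1234; &#8364;'

# External coding 2: UTF-8, where the characters are stored directly.
utf8_bytes = text.encode("utf-8")

# After decoding (and resolving references) both give the same characters.
from_ascii = html.unescape(ascii_bytes.decode("ascii"))
from_utf8 = utf8_bytes.decode("utf-8")
print(from_ascii == from_utf8 == text)  # True
```

The bytes on the wire differ, but the characters they stand for do not: that is the difference between the external coding and the Document Character Set.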

The external document coding is specified in MIME notation by the
so-called "charset" attribute, which is very confusing in this
context, since it has nothing whatever to do with the Document
Character Set. In current versions of HTML, many different "charset"
values are used, according to the locale and writing system in use and
other considerations; but there is only one Document Character Set,
namely Unicode.
i.e. is a sgml value &#1234; decimal for some unicode char
In HTML, this is certainly so.
or did sgml define its own table of char to decimal values.


No, SGML doesn't have a specific "table" of such values: it can use
whatever Document Character Set the SGML user cares to put in their
declaration, I believe. Of course, there are good practical reasons
for choosing Unicode.
is there a mathematical formula to calculate a unicode value
Unicode publications represent their characters using a hexadecimal
notation, e.g. U+04D2 for "CYRILLIC CAPITAL LETTER A WITH DIAERESIS",
which you would represent as your example &#1234;

Later, SGML adopted a syntax for hexadecimal numeric character
references, e.g. &#x04D2; if you don't want to do the conversion - but
as far as its use in HTML for the WWW, you get slightly better support
across browsers if you use the decimal syntax instead.
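
The conversion itself is only a change of number base. A small Python sketch (variable names are mine, purely illustrative):

```python
import html
import unicodedata

# The hex digits of U+04D2, read as a base-16 number.
code_point = int("04D2", 16)

decimal_ref = f"&#{code_point};"       # decimal syntax
hex_ref = f"&#x{code_point:04X};"      # hexadecimal syntax

print(code_point, decimal_ref, hex_ref)  # 1234 &#1234; &#x04D2;

# Both references resolve to the same character.
print(html.unescape(decimal_ref) == html.unescape(hex_ref) == chr(code_point))  # True
print(unicodedata.name(chr(code_point)))  # CYRILLIC CAPITAL LETTER A WITH DIAERESIS
```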
given its utf8 value?


As I said before, utf-8 is an encoding of Unicode. The details are
published, but you'd be better advised to use some available library
or module which supports this encoding, and the other encodings of
Unicode, for you.
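
For the curious, the arithmetic looks roughly like this in Python. This is a bare sketch of the bit layout only: `decode_utf8_char` is a hypothetical helper of mine, it handles a single character, and it does not reject overlong forms or surrogates the way a real decoder must, which is one more reason to prefer the library call shown at the end.

```python
def decode_utf8_char(data: bytes) -> int:
    """Return the code point of the first UTF-8 sequence in data (sketch only)."""
    first = data[0]
    if first < 0x80:                 # 0xxxxxxx: one byte, same as US-ASCII
        return first
    if first >> 5 == 0b110:          # 110xxxxx: two-byte sequence
        length, value = 2, first & 0x1F
    elif first >> 4 == 0b1110:       # 1110xxxx: three-byte sequence
        length, value = 3, first & 0x0F
    elif first >> 3 == 0b11110:      # 11110xxx: four-byte sequence
        length, value = 4, first & 0x07
    else:
        raise ValueError("not a valid UTF-8 lead byte")
    if len(data) < length:
        raise ValueError("truncated UTF-8 sequence")
    for byte in data[1:length]:
        if byte >> 6 != 0b10:        # continuation bytes look like 10xxxxxx
            raise ValueError("expected a continuation byte")
        value = (value << 6) | (byte & 0x3F)   # append six payload bits
    return value


encoded = "\u04d2".encode("utf-8")
print(encoded)                            # b'\xd3\x92'
print(decode_utf8_char(encoded))          # 1234
print(ord(encoded.decode("utf-8")))       # 1234, via the built-in codec
```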

good luck

[1] If you're aiming to participate in big-8 newsgroups, you'd be
strongly advised to catch up with the netiquette conventions. In
particular, about not posting the same question separately to
different groups (yes, some of us read more than one group, and we
spot these things), and following the accepted rules of quotation: one
quotes, with attribution, the specific part of the previous thread
which sets the context for your followup - one puts one's comment
below the context-setting quote - and one snips all extraneous matter,
signatures etc. from what one is quoting.

[2] I have an overview aimed at optimising the choice, at
http://ppewww.ph.gla.ac.uk/~flavell/charset/checklist

(this now cross-posted to c.t.sgml, and followups suggested back at
c.i.w.a.html).

Jul 20 '05 #5
S.
"Alan J. Flavell" <fl*****@ph.gla.ac.uk> patiently wrote
<snip>
[1] If you're aiming to participate in big-8 newsgroups, you'd be
strongly advised to catch up with the netiquette conventions. In

<snip>

thanks for your detailed answer to my question. i have posted many
times before but this is the first time someone took the time to
provide me a critique of my posting style (or lack of it). This one
should look better though.

rgds,
Sam
Jul 20 '05 #6
In article <f3************ **************@posting.google.com> in
comp.infosystems.www.authoring.html, S. <us********@yahoo.com>
wrote:
"Alan J. Flavell" <fl*****@ph.gla.ac.uk> patiently wrote
<snip>
[1] If you're aiming to participate in big-8 newsgroups, you'd be
strongly advised to catch up with the netiquette conventions. In

<snip>

thanks for your detailed answer to my question. i have posted many
times before but this is the first time someone took the time to
provide me a critique of my posting style (or lack of it). This one
should look better though.


Indeed it does. Thank you!

--
Stan Brown, Oak Road Systems, Cortland County, New York, USA
http://OakRoadSystems.com/
HTML 4.01 spec: http://www.w3.org/TR/html401/
validator: http://validator.w3.org/
CSS 2 spec: http://www.w3.org/TR/REC-CSS2/
2.1 changes: http://www.w3.org/TR/CSS21/changes.html
validator: http://jigsaw.w3.org/css-validator/
Jul 20 '05 #7
