SGML Charset explorer..

I was recently loading an HTML editor
so I could find the charcode of that
particularly obscure character using the
editor's 'insert special character' dialog.

It occurred to me there had to be a
better way. There are probably dozens,
but here is my solution..
http://www.physci.org/codes/charset.jsp

This page is my 'charset explorer'. It displays
character codes in a table, 256 at a time.

It also has links to a page giving larger
representations of each character. Vis.
http://www.physci.org/codes/char.jsp?char=65
http://www.physci.org/codes/char.jsp?char=84
http://www.physci.org/codes/char.jsp?char=1944

I hope it brings a..
http://www.physci.org/codes/char.jsp?char=9786
...to your mug.
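
For a one-off lookup, a few lines of script also do the job. This is a minimal sketch in Python using only the standard library; `describe` is a hypothetical helper name and is not part of the charset explorer itself. It reports the decimal code, the hex code point, the Unicode name and the numeric character reference for a character.

```python
import unicodedata

def describe(ch):
    """Return decimal code, hex code point, Unicode name and HTML reference for one character."""
    code = ord(ch)
    return {
        "decimal": code,
        "hex": f"U+{code:04X}",
        # Control characters and unassigned code points have no name.
        "name": unicodedata.name(ch, "<unnamed>"),
        "html": f"&#{code};",
    }

# The same characters as the example links above: 65, 84 and 9786.
for ch in ("A", "T", chr(9786)):
    print(describe(ch))
```

Going the other way, `chr(9786)` turns a known code back into the character.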

--
Andrew Thompson
* http://www.PhySci.org/ Open-source software suite
* http://www.PhySci.org/codes/ Web & IT Help
* http://www.1point1C.org/ Science & Technology
Jul 20 '05 #1
"Andrew Thompson" <Se********@www .invalid> wrote:
I was recently loading an HTML editor
so I could find the charcode of that
particularly obscure character using the
editor's 'insert special character' dialog.

It occurred to me there had to be a
better way. There are probably dozens,
http://www.eki.ee/letter/ is my reference of choice.
but here is my solution..
http://www.physci.org/codes/charset.jsp


http://www.physci.org/codes/charset....8859-1&frame=1
Characters 127 to 159 in ISO-8859-1 (and all other ISO-8859 encodings)
are control characters. You seem to have some Windows-1252 characters
in there instead.

http://www.physci.org/codes/charset....8859-1&frame=2
There are only 256 characters in ISO-8859-1, so where did these come
from?

http://www.physci.org/codes/charset....8859-5&frame=1
Doesn't actually display any Cyrillic characters. Mainly because
you've coded them as &#XXX; and numeric character references in HTML
always refer to Unicode.
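
The distinction can be checked outside a browser. Below is a rough sketch in Python (standard library only; the variable names are illustrative). It shows that the meaning of a byte depends on the encoding, while a numeric character reference names a Unicode code point whatever charset the page is sent with.

```python
import html

# Byte 153 (0x99) decodes differently depending on the encoding.
latin1_char = bytes([153]).decode("iso-8859-1")    # U+0099, a C1 control character
cp1252_char = bytes([153]).decode("windows-1252")  # U+2122, TRADE MARK SIGN
print(f"ISO-8859-1 byte 153:   U+{ord(latin1_char):04X}")
print(f"Windows-1252 byte 153: U+{ord(cp1252_char):04X}")

# Byte 0xB0 (176) is a Cyrillic letter in ISO-8859-5 ...
cyrillic_a = bytes([0xB0]).decode("iso-8859-5")    # U+0410, CYRILLIC CAPITAL LETTER A
# ... but the reference &#176; is U+00B0 (degree sign), regardless of charset.
reference = html.unescape("&#176;")
print(f"ISO-8859-5 byte 0xB0:  U+{ord(cyrillic_a):04X}")
print(f"&#176;:                U+{ord(reference):04X}")
```

So a table built from references would need the Unicode values to show Cyrillic, for example &#1040; for U+0410.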

Steve

--
"My theories appal you, my heresies outrage you,
I never answer letters and you don't like my tie." - The Doctor

Steve Pugh <st***@pugh.net > <http://steve.pugh.net/>
Jul 20 '05 #2
On Wed, 18 Feb 2004, Steve Pugh wrote:
http://www.physci.org/codes/charset....8859-1&frame=1
Characters 127 to 159 in ISO-8859-1 (and all other ISO-8859 encodings)
are control characters. You seem to have some Windows-1252 characters
in there instead.
Blame your own browser!
http://www.physci.org/codes/charset....8859-1&frame=2
There are only 256 characters in ISO-8859-1, so where did these come
from?
The site is a bit confusing. Only "frame=..." is important for the
displayed characters. One and the same document is then sent with
different charset parameters. That should have no effect - but
actually browsers will take a different typeface for each charset
parameter.
http://www.physci.org/codes/charset....8859-5&frame=1
Doesn't actually display any cyrillic characters.


<http://www.physci.org/codes/charset.jsp?cs=iso-8859-5&frame=5>

Jul 20 '05 #3
Andreas Prilop <nh******@rrz n-user.uni-hannover.de> wrote:
On Wed, 18 Feb 2004, Steve Pugh wrote:
http://www.physci.org/codes/charset....8859-1&frame=1
Characters 127 to 159 in ISO-8859-1 (and all other ISO-8859 encodings)
are control characters. You seem to have some Windows-1252 characters
in there instead.


Blame your own browser!


Blame all my own browsers! Every browser I have incorrectly displays,
for example, &#153; as a trademark sign. That's NN4, NN6, NN7, IE5,
IE5.5, IE6, Op5, Op6, Op7, Moz 1.6, Firefox 0.8 and even Lynx.

But the site is claiming that -
"SGML character 153. This is the character "?".
In HTML you would write it:
<p>This is the character "&#153;".</p>"
<http://www.physci.org/codes/char.jsp?char=153>
which is just plain wrong.
http://www.physci.org/codes/charset....8859-1&frame=2
There are only 256 characters in ISO-8859-1, so where did these come
from?


The site is a bit confusing. Only "frame=..." is important for the
displayed characters. One and the same document is then sent with
different charset parameters. That should have no effect - but
actually browsers will take a different typeface for each charset
parameter.


It really is deeply misleading.
http://www.physci.org/codes/charset....8859-5&frame=1
Doesn't actually display any cyrillic characters.


<http://www.physci.org/codes/charset.jsp?cs=iso-8859-5&frame=5>


That is displaying Unicode characters 0401-0500 (rather than the more
useful 0400-04FF).

Steve

--
"My theories appal you, my heresies outrage you,
I never answer letters and you don't like my tie." - The Doctor

Steve Pugh <st***@pugh.net > <http://steve.pugh.net/>
Jul 20 '05 #4
Steve Pugh wrote:
Andreas Prilop <nh******@rrz n-user.uni-hannover.de> wrote:
On Wed, 18 Feb 2004, Steve Pugh wrote:
http://www.physci.org/codes/charset....8859-1&frame=1
Characters 127 to 159 in ISO-8859-1 (and all other ISO-8859
encodings) are control characters. You seem to have some
Windows-1252 characters in there instead.
Blame your own browser!


Blame all my own browsers! Every browser I have incorrectly displays,
for example, &#153; as a trademark sign. That's NN4, NN6, NN7, IE5,
IE5.5, IE6, Op5, Op6, Op7, Moz 1.6, Firefox 0.8 and even Lynx.

But the site is claiming that -
"SGML character 153. This is the character "?".


Not here. My UAs (IE6/Moz 1.3 on XP) show
http://localhost:8080/codes/char.jsp?char=153
as a 'tm' character.
In HTML you would write it:
<p>This is the character "&#153;".</p>"
<http://www.physci.org/codes/char.jsp?char=153>
which is just plain wrong.
http://www.physci.org/codes/charset....8859-1&frame=2
There are only 256 characters in ISO-8859-1, so where did these come
from?

I got the impression that charset affected
the characters. That was not borne out by
my investigations, but I thought I would
leave it in there for the moment.
The site is a bit confusing. Only "frame=..." is important for the
displayed characters. One and the same document is then sent with
different charset parameters. That should have no effect - but
actually browsers will take a different typeface for each charset
parameter.

So I should remove all reference to charset?
It really is deeply misleading.
http://www.physci.org/codes/charset....8859-5&frame=1
Doesn't actually display any cyrillic characters.


<http://www.physci.org/codes/charset.jsp?cs=iso-8859-5&frame=5>


That is displaying unicode characters 0401-0500 (rather than the more
useful 0400-04FF).


....errr. That table is 16x16, ..or were you talking
hex there? I want to add the hex denomination
when I get a moment.

But on another note, I had a few questions
when I first posted, but the comments have
made me realise ..I have a lot of questions.

I'll cogitate the comments for a while before
I formulate my questions..

--
Andrew Thompson
* http://www.PhySci.org/ Open-source software suite
* http://www.PhySci.org/codes/ Web & IT Help
* http://www.1point1C.org/ Science & Technology
Jul 20 '05 #5
"Andrew Thompson" <Se********@www .invalid> wrote:
http://www.physci.org/codes/charset.jsp

This page is my 'charset explorer', it displays
character codes in a table 256 at a time.


Yeah, anyway. How do I save the applet with the periodic table? That's
my reason for loathing Flash and Java: when they are something
worthwhile it's always such a hassle to save.

Pack it all into 100+ nested tables, I say. Then use:

td table {
  display: none;
}
td:hover table {
  display: table;
  position: absolute;
  z-index: 2;
}

- or something. Would be cool for one of those "this only works in
Opera" demo pages.
Jul 20 '05 #6
"Andrew Thompson" <Se********@www .invalid> wrote:
Steve Pugh wrote:
Andreas Prilop <nh******@rrz n-user.uni-hannover.de> wrote:
On Wed, 18 Feb 2004, Steve Pugh wrote:

http://www.physci.org/codes/charset....8859-1&frame=1
Characters 127 to 159 in ISO-8859-1 (and all other ISO-8859
encodings) are control characters. You seem to have some
Windows-1252 characters in there instead.

Blame your own browser!


Blame all my own browsers! Every browser I have incorrectly displays,
for example, &#153; as a trademark sign. That's NN4, NN6, NN7, IE5,
IE5.5, IE6, Op5, Op6, Op7, Moz 1.6, Firefox 0.8 and even Lynx.

But the site is claiming that -
"SGML character 153. This is the character "?".


Not here. My UA's (IE6/Moz 1.3 on XP) shows
http://localhost:8080/codes/char.jsp?char=153
as a 'tm' character.


It's just that TM characters aren't in ASCII and hence couldn't be
transmitted in the news message. I should have edited the cut and
paste before sending.

That's what my browser shows as well, which is what's wrong.
&#153; is undefined in HTML and should not be used. It's a widespread
browser bug/feature to translate this into a Windows-1252 character.
The trademark character is actually &#8482;
http://www.physci.org/codes/charset....8859-1&frame=2
There are only 256 characters in ISO-8859-1, so where did these come
from?
I got the impression that charset affected the characters,


It does. But you didn't use any characters in your pages. You used
numeric character references, which in HTML are always references to
Unicode.
The site is a bit confusing. Only "frame=..." is important for the
displayed characters. One and the same document is then sent with
different charset parameters. That should have no effect - but
actually browsers will take a different typeface for each charset
parameter.
So I should remove all reference to charset?


Probably best. Users already have the ability to change the character
set used to display the page via their browsers. As your tables don't
use anything that's character set dependent there's really no point in
including them.
<http://www.physci.org/codes/charset.jsp?cs=iso-8859-5&frame=5>


That is displaying unicode characters 0401-0500 (rather than the more
useful 0400-04FF).


...errr. That table is 16x16, ..or were you talking
hex there?


Yes, wasn't the FF a clue? ;-)

The point I was making is that character sets are zero based, so by
counting 1 to 256 rather than 0 to 255, and so on for your higher
frames, you're doing it differently to every other reference.
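
To make the off-by-one concrete, here is a small sketch in Python. It assumes the `frame` parameter starts at 1 and that each frame holds 256 codes (the 16x16 table); both function names are hypothetical and the one-based formula is only inferred from the ranges seen on the page.

```python
FRAME_SIZE = 256  # one 16x16 table per frame

def one_based_range(frame):
    """Range as the page appears to count now (assumed): 1-256, 257-512, ..."""
    start = (frame - 1) * FRAME_SIZE + 1
    return start, start + FRAME_SIZE - 1

def zero_based_range(frame):
    """Range aligned with the usual code charts: 0-255, 256-511, ..."""
    start = (frame - 1) * FRAME_SIZE
    return start, start + FRAME_SIZE - 1

for label, frame_range in (("one-based", one_based_range),
                           ("zero-based", zero_based_range)):
    first, last = frame_range(5)
    print(f"frame 5, {label}: {first:04X}-{last:04X}")
```

With zero-based counting, frame 5 lines up with the Cyrillic block boundary at 0400.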

Steve

--
"My theories appal you, my heresies outrage you,
I never answer letters and you don't like my tie." - The Doctor

Steve Pugh <st***@pugh.net > <http://steve.pugh.net/>
Jul 20 '05 #7
Karl Smith wrote:
Yeah, anyway. How do I save the applet with the periodic table?
??? Bit of a change of subject!

Anyway, I had never bothered to set-up
the periodic table _applet_ as an easy install,
but you can get the application here..
http://www.physci.org/install/download.jsp

It's around 1 Meg, and includes 5
other programs (the page lies and
says 6 - but the browser was so
dodgy I removed it)

Over the next year I plan to break the
software suite up into individual
programs (on the basis that not many
people need a software suite with both
a text editor and ..periodic table)
...That's
my reason for loathing Flash and Java: when they are something
worthwhile it's always such a hassle to save.
Java now offers Java Web-Start.
It offers a painless install for the user
(except for the 'you might die' security
warning that comes up when installing).

On the upside, program updates are
automatic, on the downside, even
Java developers have trouble finding
where the .jar files are actually installed.
Pack it all into 100+ nested tables, I say. Then use:

td table {
  display: none;
}
td:hover table {
  display: table;
  position: absolute;
  z-index: 2;
}

- or something. Would be cool for one of those "this only works in
Opera" demo pages.


;-)

--
Andrew Thompson
* http://www.PhySci.org/ Open-source software suite
* http://www.PhySci.org/codes/ Web & IT Help
* http://www.1point1C.org/ Science & Technology
Jul 20 '05 #8
Steve Pugh wrote:
"Andrew Thompson" wrote:
Steve Pugh wrote:
Andreas Prilop <nh******@rrz n-user.uni-hannover.de> wrote:
....
That is displaying unicode characters 0401-0500 (rather than the
more useful 0400-04FF).
...errr. That table is 16x16, ..or were you talking
hex there?


Yes, wasn't the FF a clue? ;-)


Well shucks, I need *big* clues.
...about '4x2' should do.
The point I was making is that character sets are zero based, so by
counting 1 to 256 rather than 0 to 255, and so on for your higher
frames, you're doing it differently to every other reference.


I am starting a trend.
...OK, no, not really.
I'll adjust it in the next couple of days.
Jul 20 '05 #9
"Andrew Thompson" <Se********@www .invalid> wrote:
Karl Smith wrote:
Yeah, anyway. How do I save the applet with the periodic table?


??? Bit of a change of subject!

Anyway, I had never bothered to set-up
the periodic table _applet_ as an easy install,
but you can get the application here..
http://www.physci.org/install/download.jsp


Not today. Java all crappy today. Can't get the page with the applet
to display, either.

So I've spent the last hour or so reviewing some available HTML
periodic tables and what a crappy bunch they are! The most appealing
(to look at) I've found so far is this:

http://www.dayah.com/periodic/

but it's a mess of JavaScript, font tags and rubbish underneath. And
I'm not certain the data is laid out correctly.

Shame 'bout the browser sniffing JavaScript that adds this admonition:
"Because of the complexity of this page, certain browsers may not
display it correctly. Your browser, Opera version 7.23, is
insufficient for viewing this page."
In IE6 it looks the same except for:
"Because of the complexity of this page, certain browsers may not
display it correctly. Your browser, Microsoft Internet Explorer
version 4.0, is sufficient for viewing this page."

Bugger that! I'm gonna do my own pure CSS periodic table that only works
in Opera.
Could you tell me what all those numbers mean and what the proper
layout for each element's data should be?
Jul 20 '05 #10
