
How do I display character 151 (long hyphen) in XHTML (utf-8) ?

Is there another character that will substitute? The W3C validation parser,
http://validator.w3.org, tells me that this character and the ones around it are illegal
- then, after resubmission it flags no errors.

So, are there any illegal characters between 0 and 255 in the UTF-8 character set or is it
just my imagination that the W3C validation parser thinks there are - say between 129-151,
or thereabouts; then later it changes its mind?

Jul 20 '05 #1
Zenobia <5.**********@s pamgourmet.com> wrote:
> How do I display character 151 (long hyphen) in XHTML (utf-8) ?
Position 151 is a control character in Unicode (and therefore in UTF-8);
it is not a long hyphen. There is no character called long hyphen in
Unicode; maybe you're thinking of the em dash, which is decimal 8212 in
Unicode and 151 in Windows-1252?
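To make the distinction concrete, here is a small Python sketch (Python is our choice for illustration; nothing in the thread uses it) that reads the single byte 151 under two different encodings, using only the standard codecs and the `unicodedata` module.

```python
import unicodedata

raw = bytes([151])  # one byte with value 151 (0x97)

# Read as Windows-1252, byte 151 is the em dash, U+2014 (decimal 8212).
as_cp1252 = raw.decode("cp1252")

# Read as ISO 8859-1, byte 151 maps straight to code point U+0097,
# which Unicode classifies as a C1 control character ("Cc"), not a dash.
as_latin1 = raw.decode("latin-1")

print(hex(ord(as_cp1252)), unicodedata.name(as_cp1252))      # 0x2014 EM DASH
print(hex(ord(as_latin1)), unicodedata.category(as_latin1))  # 0x97 Cc
```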

How are you currently trying to display the character? Are you
entering it directly or are you using &#151;? The former can be okay
under some circumstances (i.e. if you're advertising your character
encoding as being Windows-1252 or similar and if all your audience can
cope with that encoding). The latter is just dangerous and will only
'work' because some browsers break (or at least severely bend) the
specification.
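The "bending" described above can be observed with Python's standard library (a sketch; Python is our choice). `html.unescape` follows the much later HTML5 parsing rules, which explicitly remap `&#151;` to U+2014, whereas an XML parser such as the one behind `xml.etree.ElementTree` (roughly how XHTML served as XML is read) takes the reference literally and yields the control character U+0097.

```python
import html
import xml.etree.ElementTree as ET

# HTML-style decoding: HTML5 keeps a table that remaps the Windows-1252
# range, so &#151; quietly becomes an em dash.
html_result = html.unescape("&#151;")

# The correct references all give U+2014 EM DASH.
correct = {html.unescape(ref) for ref in ("&#8212;", "&#x2014;", "&mdash;")}

# XML-style decoding: the reference means exactly code point 151 (U+0097).
xml_result = ET.fromstring("<p>&#151;</p>").text

print(repr(html_result), correct, repr(xml_result))
```

So `&#151;` only "works" where the parser carries a Windows-specific fix-up table; `&#8212;` or `&mdash;` mean the em dash everywhere.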
> Is there another character that will substitute? The W3C validation parser,
> http://validator.w3.org, tells me that this character and the ones around it are illegal
> - then, after resubmission it flags no errors.
What the validator does, especially with XHTML documents, is often
slightly confusing; and validation is not an all-encompassing test for
every error.
> So, are there any illegal characters between 0 and 255 in the UTF-8 character set or is it
> just my imagination that the W3C validation parser thinks there are - say between 129-151,
> or thereabouts; then later it changes its mind?


Under HTML, there are no illegal characters, but there are undefined
characters that should not be used: those in the range 128 to 159.

Although rather old, Jukka's articles
http://www.cs.tut.fi/~jkorpela/www/windows-chars.html and
http://www.cs.tut.fi/~jkorpela/chars.html#win explain more than would
fit into a sensible post.

Steve

--
"My theories appal you, my heresies outrage you,
I never answer letters and you don't like my tie." - The Doctor

Steve Pugh <st***@pugh.net > <http://steve.pugh.net/>
Jul 20 '05 #2
Zenobia <5.**********@s pamgourmet.com> wrote:
> How do I display character 151 (long hyphen) in XHTML (utf-8) ?
>
> Is there another character that will substitute? The W3C validation parser,
> http://validator.w3.org, tells me that this character and the ones around it are illegal
> - then, after resubmission it flags no errors.
>
> So, are there any illegal characters between 0 and 255 in the UTF-8 character set or is it
> just my imagination that the W3C validation parser thinks there are - say between 129-151,
> or thereabouts; then later it changes its mind?


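The characters between 128 and 159 are not valid in HTML--the printable
characters Windows puts at those positions are Windows-1252 extensions
that do not occupy those positions in ISO 8859-1 or Unicode. The "long
hyphen" (em dash) should be coded as &#8212; or &mdash;.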

See Jukka Korpela's page at

http://www.cs.tut.fi/~jkorpela/www/windows-chars.html

for information on the proper code to use for most of these
characters. For character 128, the Windows euro symbol, see

http://www.cs.tut.fi/~jkorpela/html/euro.html

Windows doesn't have characters for 129, 141, 143, 144, or 157. Jukka
left out the lower- and upper-case z-hacek at positions 158 and 142--I
don't know why!
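Harlan's list of unassigned positions can be checked against Python's built-in `cp1252` codec, which follows Microsoft's published mapping. A minimal sketch (the helper name `cp1252_gaps_and_names` is ours, for illustration):

```python
import unicodedata

def cp1252_gaps_and_names(start: int = 128, end: int = 159):
    """Decode each byte value in [start, end] as Windows-1252.

    Returns (undefined, named): the byte values the codec rejects, and a
    dict mapping each defined byte to the Unicode name of its character.
    """
    undefined = []
    named = {}
    for value in range(start, end + 1):
        try:
            ch = bytes([value]).decode("cp1252")
        except UnicodeDecodeError:
            undefined.append(value)  # no character at this position
        else:
            named[value] = unicodedata.name(ch)
    return undefined, named

undefined, named = cp1252_gaps_and_names()
print(undefined)                     # [129, 141, 143, 144, 157]
print(named[142], "/", named[158])   # the two z-hacek (Z WITH CARON) letters
```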

--
Harlan Messinger
Remove the first dot from my e-mail address.
Jul 20 '05 #4
On Sat, 10 Apr 2004, Zenobia wrote:
> How do I display character 151 (long hyphen) in XHTML (utf-8) ?
The characters between 128 and 159 decimal in the XHTML Document
Character Set (Unicode) are control characters (see e.g.
http://www.unicode.org/charts/PDF/U0080.pdf) and are excluded from
use in XHTML.

Don't confuse them with the displayable characters in some other 8-bit
character encodings.
> Is there another character that will substitute?
You might find
http://www.unicode.org/Public/MAPPIN...OWS/CP1252.TXT
to be useful, but basically you need to understand the (X)HTML
character representation model first. http://www.w3.org/TR/charmod/
> The W3C validation parser, http://validator.w3.org, tells me that
> this character and the ones around it are illegal - then, after
> resubmission it flags no errors.
There's something of significance that you're not telling us.
> So, are there any illegal characters between 0 and 255 in the UTF-8
> character set


There is no "UTF-8 character set". UTF-8 is an encoding scheme of the
Unicode "character set".
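The character-set versus encoding distinction shows up as soon as you encode. In this short Python sketch (our illustration), the em dash is one code point, U+2014, which UTF-8 writes as three bytes and Windows-1252 as the single byte 151; code point 151 itself becomes two bytes in UTF-8. A lone byte 151 in a UTF-8 file is only a continuation byte and stands for no character on its own.

```python
em_dash = "\u2014"      # EM DASH, code point 8212
c1_control = "\u0097"   # code point 151, a C1 control character

print(em_dash.encode("utf-8").hex(" "))     # e2 80 94
print(em_dash.encode("cp1252").hex(" "))    # 97
print(c1_control.encode("utf-8").hex(" "))  # c2 97

# A lone byte 0x97 is a UTF-8 continuation byte, so decoding it fails.
try:
    bytes([151]).decode("utf-8")
    lone_byte_ok = True
except UnicodeDecodeError:
    lone_byte_ok = False
```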

Certainly the control characters x80-x9F (128-159 decimal), as well as
most of the control characters x00-x1F (0-31 decimal), of the
Document Character Set (Unicode) are excluded from use in XHTML.

In the case of other encodings, you need to refer to the cross-mapping
tables (below http://www.unicode.org/Public/MAPPINGS/ ) to find the
equivalences.
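One practical use of the cross-mapping tables is repairing text that was really Windows-1252 but was decoded as ISO 8859-1, which leaves C1 control characters where the dashes and curly quotes should be. A minimal Python sketch, assuming that mislabelling is the cause (the function name `repair_c1` is hypothetical, chosen for illustration):

```python
def repair_c1(text: str) -> str:
    """Replace U+0080..U+009F with the Windows-1252 character at that byte.

    Assumes the text was Windows-1252 decoded as ISO 8859-1. Positions that
    Windows-1252 leaves undefined (129, 141, 143, 144, 157) are kept as-is.
    """
    out = []
    for ch in text:
        if 0x80 <= ord(ch) <= 0x9F:
            try:
                ch = bytes([ord(ch)]).decode("cp1252")
            except UnicodeDecodeError:
                pass  # no Windows-1252 character here either; keep it
        out.append(ch)
    return "".join(out)

print(repair_c1("word\x97word \x93quoted\x94"))
```

The control characters are mapped back through the same CP1252 table that Alan points to, so the em dash and the curly quotes reappear as their proper Unicode code points.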
Jul 20 '05 #6
On Sat, 10 Apr 2004 10:43:17 +0100, Steve Pugh <st***@pugh.net > wrote:
> Zenobia <5.**********@s pamgourmet.com> wrote:
>> How do I display character 151 (long hyphen) in XHTML (utf-8) ?
>
> Position 151 is a control character in UTF-8, it is not a long hyphen.
> There is no character called long hyphen in Unicode, maybe you're
> thinking of the em dash, which is decimal 8212 in Unicode and 151 in
> Windows-1252?
>
> How are you currently trying to display the character? Are you
> entering it directly or are you using &#151;? The former can be okay
> under some circumstances (i.e. if you're advertising your character
> encoding as being Windows-1252 or similar and if all your audience can
> cope with that encoding). The latter is just dangerous and will only
> 'work' because some browsers break (or at least severely bend) the
> specification.


Thanks, Steve. I'm using &#151;, but the encoding has been changed to UTF-8 to make it
XHTML compliant. I see I'll have to change that. It seems to me that XHTML (UTF-8)
compliant code is too restrictive for my needs. I've just found out that the same
character is called &mdash; as well.
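One way to sidestep the encoding question entirely is to keep the source pure ASCII and write every other character as a numeric reference to its Unicode code point (never its Windows-1252 byte value). A minimal Python sketch using the standard `xmlcharrefreplace` error handler; note that it does not escape `&`, `<` or `>`, which is a separate markup step:

```python
def to_ascii_with_refs(text: str) -> str:
    # Characters outside ASCII become &#NNNN; using their Unicode code point,
    # so the em dash becomes &#8212;, not the Windows-1252 value &#151;.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(to_ascii_with_refs("1914\u20141918, \u03b2-decay, caf\u00e9"))
```

Output of this kind is valid whether the document is declared as UTF-8, ISO 8859-1 or any other ASCII-compatible encoding.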

I prefer to use the named entity convention. I can't stand the idea of having to write
a-acute as a number rather than &aacute;. Numbers are meaningless, especially when your
editor's WYSIWYG feature doesn't display characters correctly. It becomes impossible to
understand what you've written. I expect the browsers I'm writing for to understand things
like &aacute; and &beta;, just as I can. I've been using the Mathematical, Greek and Symbolic
characters for HTML shown here:
www.intuitive.com/coolweb/entities.html and here
http://www.htmlhelp.com/reference/ht...s/symbols.html

Which particular encoding is that? Which spec did it come from? Are all these named
entities specified in ISO 10646?

These Math, Greek and Symbolic characters have been around for years. I find it
astonishing that some modern browsers still can't support them. But I have to admit I
don't care. There's no way I'm going to memorize a bunch of numbers just so that I can
read my source code. However I would like to specify the correct encoding in my documents
in future.

Are the named entities ISO 10646 or ISO 8859-1?
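The entity names (&aacute;, &beta;, &mdash; and so on) come from the HTML 4 entity sets, and each one stands for an ISO 10646 (Unicode) code point whatever encoding the file uses; the Latin-1 subset simply covers the ISO 8859-1 repertoire. Python ships the HTML 4 table as `html.entities.name2codepoint`. A minimal sketch that rewrites named references as numeric ones, which an XML parser can resolve even if it never reads the XHTML DTD (the helper name `named_to_numeric` is ours):

```python
import re
from html.entities import name2codepoint  # HTML 4.01 entity name -> code point

ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

def named_to_numeric(text: str) -> str:
    """Rewrite &name; as &#NNNN; when name is a known HTML 4 entity."""
    def replace(match: re.Match) -> str:
        codepoint = name2codepoint.get(match.group(1))
        if codepoint is None:
            return match.group(0)  # unknown name: leave it untouched
        return f"&#{codepoint};"
    return ENTITY.sub(replace, text)

print(named_to_numeric("&aacute; &beta; &mdash; &notanentity;"))
```

This lets you keep writing readable names in your own source and convert them in one pass before publishing.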

Jul 20 '05 #8
On Sat, 10 Apr 2004 10:43:56 +0100, "Alan J. Flavell" <fl*****@ph.gla .ac.uk> wrote:
> On Sat, 10 Apr 2004, Zenobia wrote:
>> The W3C validation parser, http://validator.w3.org, tells me that
>> this character and the ones around it are illegal - then, after
>> resubmission it flags no errors.
>
> There's something of significance that you're not telling us.


Thanks for your answer too - I shall be looking at all the links you've given me here.
I've snipped your reply so that my answer stands out.

Yes, I think the validator flagged them as illegal when I had 5 errors in my document. I
removed one of these errors, <br /> before the </body> tag, and suddenly there were no
fatal errors (just warnings). That, at least, is how I remember it.

I've given up on XHTML compliance for my own pages now, because I have no intention of
using UTF-8, as I want to use the named entity convention for funny characters. [I'm
writing scientific articles]

Jul 20 '05 #10
