represent any Unicode character by means of a markup string coded in us-ascii

>Alan J. Flavell Oct 7 2004, 1:44 pm

On Thu, 7 Oct 2004, Shmuel (Seymour J.) Metz wrote:
> at 08:24 PM, "Alan J. Flavell" <flav...@ph.gla.ac.uk> said:
> >I think you mean "multiple character encoding schemes".
>
> Yes, although a different character set would imply a different
> encoding scheme.

Absolutely not. That's the whole point!

In (X)HTML you can (if you so choose) represent any Unicode character
by means of a markup string coded in us-ascii, even. The use of other
encoding schemes is merely a convenience when the desired character
repertoire fits a particular pattern, but whichever encoding scheme
you choose, you still - in principle - have access to any other
Unicode character you need, by means of &-notation.
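
As a rough illustration of that point (a minimal sketch in Python, which the thread itself does not use; the helper name `to_ascii_markup` is hypothetical), the standard `xmlcharrefreplace` error handler rewrites every character outside us-ascii as a decimal &#...; reference:

```python
def to_ascii_markup(text: str) -> str:
    """Return text as pure us-ascii, replacing each non-ASCII character
    with a decimal numeric character reference (&#NNNN;)."""
    # Assumption: text is already a Unicode string, i.e. it has been decoded.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    print(to_ascii_markup("Ohm: \u03a9, euro: \u20ac"))
```

This only handles the non-ASCII characters; markup-significant ASCII characters such as & and < would still need escaping separately.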

I could change any Unicode character to its HTML notation, if only I
had a way to find out the Unicode value of the characters in the string
I'm given. But given a random set of string inputs, possibly copied and
pasted from WordPerfect or Microsoft Word or BBEdit on a Mac, I don't
know how to find the Unicode value of those characters.

Jul 24 '05 #1

Alan J. Flavell

On Sat, 27 May 2005, lk******@geocities.com wrote:

> I could change any Unicode character to its HTML notation, if only I
> had a way to find out the Unicode value of the characters in the
> string I'm given.

What's the context here? In order to know what "characters" you have
been given, you need to know what encoding they are represented in. If
they're not an encoding of Unicode itself, then you can normally refer
to the appropriate cross-mapping table at the Unicode site to
determine the corresponding hexadecimal Unicode value. That's the
value that you'd need (converted to decimal if you so choose) in the
&#...; representation in HTML.
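
A minimal sketch of that lookup, assuming Python and assuming you already know (or have guessed) the encoding of the bytes you were handed; the function name `describe_characters` is hypothetical. Python's bundled codecs stand in for the cross-mapping tables here:

```python
def describe_characters(raw: bytes, encoding: str) -> list[tuple[str, str, str]]:
    """Decode raw bytes with a known encoding and report, per character:
    the character, its hexadecimal Unicode value, and its &#...; form."""
    text = raw.decode(encoding)
    return [(ch, f"U+{ord(ch):04X}", f"&#{ord(ch)};") for ch in text]


if __name__ == "__main__":
    # The same byte stands for different characters in different encodings,
    # which is why the encoding has to be known first.
    for enc in ("cp1252", "mac_roman"):
        print(enc, describe_characters(b"\x93", enc))
```

The byte 0x93 comes out as U+201C under Windows-1252 and as U+00EC under MacRoman, so a wrong guess at the encoding silently yields the wrong Unicode value.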

> But given a random set of string inputs, possibly copied and pasted
> from WordPerfect or Microsoft Word or BBEdit on a Mac, I don't know
> how to find the Unicode value of those characters.

If you're talking about forms submission, then the usual arrangement
is that the characters are submitted using the same character encoding
as the page which contains the form which they're submitted from.
For working with modern browsers, I'd normally recommend that you use
utf-8 for that. (No good with NN4.*).

http://ppewww.ph.gla.ac.uk/~flavell/...form-i18n.html

(But if you've been sent utf-8 and you're willing to store files in
utf-8 then you don't really *have* to use &#...; representation
anyway. It's your choice, really.)
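
For what it's worth, here is a small sketch of the receiving side, assuming Python on the server and a form page served as utf-8; the function name `read_form` and the field name `comment` are made up for illustration:

```python
from urllib.parse import parse_qs


def read_form(body: str, page_encoding: str = "utf-8") -> dict[str, list[str]]:
    """Decode an application/x-www-form-urlencoded body, assuming the
    browser submitted it in the encoding of the page that held the form."""
    # errors="strict" raises UnicodeDecodeError if the assumption is wrong.
    return parse_qs(body, encoding=page_encoding, errors="strict")


if __name__ == "__main__":
    fields = read_form("comment=5+k%CE%A9")
    print(fields["comment"][0])
```

Once decoded this way the values are ordinary Unicode strings, so they can be stored as utf-8 directly or converted to &#...; form, whichever you prefer.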

You're then reliant on what the client platform actually does when
copy/pasting from another application window into the form.

That can have some unexpected glitches, since Word (especially older
versions) has a nasty habit of changing to a non-standard font, e.g.
Symbol, and inserting a Latin letter (e.g. W) to get a symbol (e.g. Omega
or Ohm sign). This doesn't really work in HTML - MS of course will
fool its users by repeating the error in MSIE, but a properly
conforming www-compatible browser will display the W that the markup
asked for - not the symbol that was intended.
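
If you do know which runs of text were set in the Symbol font, a repair along these lines is possible. This is only a sketch in Python: the table is deliberately partial and illustrative, and the names `SYMBOL_FONT_TO_UNICODE` and `repair_symbol_run` are hypothetical.

```python
# Partial, illustrative table: Latin letter typed in the Symbol font
# -> the Unicode character the author actually meant.
SYMBOL_FONT_TO_UNICODE = {
    "W": "\u03a9",  # GREEK CAPITAL LETTER OMEGA
    "a": "\u03b1",  # GREEK SMALL LETTER ALPHA
    "b": "\u03b2",  # GREEK SMALL LETTER BETA
    "p": "\u03c0",  # GREEK SMALL LETTER PI
}


def repair_symbol_run(run: str) -> str:
    """Translate a run of text known to have been set in the Symbol font.
    Characters missing from the table are left unchanged."""
    return "".join(SYMBOL_FONT_TO_UNICODE.get(ch, ch) for ch in run)


if __name__ == "__main__":
    print(repair_symbol_run("W"))
```

Note that a plain-text paste into a form loses the font information entirely, so nothing on the receiving end can tell an intended Omega from a genuine W; the mapping only helps while the source document still records the font.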

Jul 24 '05 #2
