represent any Unicode character by means of a markup string coded in us-ascii

> Alan J. Flavell, Oct 7 2004, 1:44 pm:
> On Thu, 7 Oct 2004, Shmuel (Seymour J.) Metz wrote:
> > at 08:24 PM, "Alan J. Flavell" <flav...@ph.gla.ac.uk> said:
> > > I think you mean "multiple character encoding schemes".
> >
> > Yes, although a different character set would imply a different
> > encoding scheme.
>
> Absolutely not. That's the whole point!
>
> In (X)HTML you can (if you so choose) represent any Unicode character
> by means of a markup string coded in us-ascii, even. The use of other
> encoding schemes is merely a convenience when the desired character
> repertoire fits a particular pattern, but whichever encoding scheme
> you choose, you still - in principle - have access to any other
> Unicode character you need, by means of &-notation.


I could change any Unicode character to its html notation, if only I
had a way to find out the Unicode value of the characters in the string
I'm given. But given a random set of string inputs, possibly copy and
pasted from WordPerfect or Microsoft Word or BBedit on a Mac, I don't
know how to find the Unicode value of those characters.
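
For illustration only, here is a minimal Python sketch of the idea in the quoted text: once the input is held as a Unicode string, each character's Unicode value can be read off directly, and the whole string can be written as us-ascii markup using &-notation. The function names `to_ascii_markup` and `code_points` are made up for this example; the sample string is an assumption too.

```python
def to_ascii_markup(text: str) -> str:
    """Return text as pure us-ascii, with other characters as &#N; references."""
    # 'xmlcharrefreplace' is a standard Python codec error handler that
    # substitutes a decimal numeric character reference for each character
    # the target encoding cannot represent.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def code_points(text: str) -> list:
    """List each character's Unicode value in the usual U+XXXX form."""
    return [f"U+{ord(ch):04X}" for ch in text]


sample = "caf\u00e9 \u03a9"      # e with acute accent, then Greek capital omega
print(code_points(sample))
print(to_ascii_markup(sample))   # caf&#233; &#937;
```

Note that this does not escape `&`, `<` or `>` in the input; that would be a separate step before the text goes into HTML.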

Jul 24 '05 #1
On Sat, 27 May 2005, lk******@geocities.com wrote:

> I could change any Unicode character to its html notation, if only I
> had a way to find out the Unicode value of the characters in the
> string I'm given.

What's the context here? In order to know what "characters" you have
been given, you need to know what encoding they are represented in. If
they're not an encoding of Unicode itself, then you can normally refer
to the appropriate cross-mapping table at the Unicode site to
determine the corresponding hexadecimal Unicode value. That's the
value that you'd need (converted to decimal if you so choose) in the
&#...; representation in HTML.
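
As a rough sketch of the lookup described above, assuming the bytes are known to be in some legacy encoding: Python's bundled codecs play the role of the cross-mapping tables, so decoding the bytes gives the Unicode values, which can then be written in hexadecimal or decimal &#...; form. The helper name `numeric_refs` and the sample bytes are invented for this example; `cp1252` and `mac_roman` are standard Python codec names.

```python
def numeric_refs(raw: bytes, encoding: str, hexadecimal: bool = True) -> str:
    """Decode raw bytes from a known encoding and write every non-ASCII
    character as an HTML numeric character reference."""
    text = raw.decode(encoding)          # the codec acts as the mapping table
    out = []
    for ch in text:
        cp = ord(ch)                     # the Unicode value of the character
        if cp < 128:
            out.append(ch)
        elif hexadecimal:
            out.append(f"&#x{cp:X};")
        else:
            out.append(f"&#{cp};")
    return "".join(out)


# Curly quotes as typically pasted from Word on Windows (windows-1252 bytes).
word_bytes = b"\x93Hi\x94"
print(numeric_refs(word_bytes, "cp1252"))                      # &#x201C;Hi&#x201D;
print(numeric_refs(word_bytes, "cp1252", hexadecimal=False))   # &#8220;Hi&#8221;

# The same text as stored by a classic Mac application would use other bytes.
print(numeric_refs(b"\xd2Hi\xd3", "mac_roman"))
```

The catch, as said above, is that the encoding has to be known first; the bytes alone don't tell you.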

> But given a random set of string inputs, possibly copy and pasted
> from WordPerfect or Microsoft Word or BBedit on a Mac, I don't know
> how to find the Unicode value of those characters.


If you're talking about forms submission, then the usual arrangement
is that the characters are submitted using the same character encoding
as the page which contains the form which they're submitted from.
For working with modern browsers, I'd normally recommend that you use
utf-8 for that. (No good with NN4.*).

http://ppewww.ph.gla.ac.uk/~flavell/...form-i18n.html

(But if you've been sent utf-8 and you're willing to store files in
utf-8 then you don't really *have* to use &#...; representation
anyway. It's your choice, really.)
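
A minimal sketch of the receiving side, assuming the form page was served as utf-8 and the browser submitted the usual URL-encoded body in that same encoding. `read_form_field` and the field name `comment` are hypothetical; `urllib.parse.parse_qs` is the standard-library parser.

```python
from urllib.parse import parse_qs


def read_form_field(body: str, field: str, page_encoding: str = "utf-8") -> str:
    """Decode one field of a URL-encoded form body, assuming the browser
    used the same encoding as the page that contained the form."""
    # errors="strict" raises UnicodeDecodeError when the bytes are not valid
    # in the assumed encoding, instead of silently substituting characters.
    fields = parse_qs(body, encoding=page_encoding, errors="strict")
    return fields[field][0]


body = "comment=%CE%A9+ohm"          # utf-8 bytes CE A9 encode U+03A9 (Omega)
text = read_form_field(body, "comment")
print([f"U+{ord(ch):04X}" for ch in text])
```

Once decoded, the text can be stored as utf-8 as it stands, or converted to &#...; notation if you prefer.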

You're then reliant on what the client platform actually does when
copy/pasting from another application window into the form.

That can have some unexpected glitches, since Word (especially older
versions) has a nasty habit of changing to a non-standard font, e.g.
Symbol, and inserting a Latin letter (e.g. W) to get a symbol (e.g. Omega
or Ohm sign). This doesn't really work in HTML - MS of course will
fool its users by repeating the error in MSIE, but a properly
conforming www-compatible browser will display the W that the markup
asked for - not the symbol that was intended.
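
One possible repair, sketched under assumptions: if the pasted markup arrives with explicit `<font face="Symbol">` tags, the Latin letters inside them can be swapped for references to the characters that were intended. The mapping below is only a small illustrative subset of the Symbol font's layout, and `fix_symbol_font` is a hypothetical helper, not part of any library. A regular expression is used for brevity; real-world markup would want a proper HTML parser.

```python
import re

# Partial, illustrative mapping from Symbol-font letters to Unicode.
SYMBOL_TO_UNICODE = {
    "W": "\u03a9",  # Greek capital omega
    "D": "\u0394",  # Greek capital delta
    "a": "\u03b1",  # alpha
    "b": "\u03b2",  # beta
    "m": "\u03bc",  # mu
    "p": "\u03c0",  # pi
}

# Assumes the simple form <font face="Symbol">...</font> with no other attributes.
FONT_SYMBOL = re.compile(
    r'<font\s+face="?Symbol"?\s*>(.*?)</font>', re.IGNORECASE | re.DOTALL
)


def fix_symbol_font(html: str) -> str:
    """Replace Symbol-font runs with numeric references to the intended
    characters; letters missing from the mapping are left unchanged."""
    def repl(match):
        return "".join(
            f"&#{ord(SYMBOL_TO_UNICODE[c])};" if c in SYMBOL_TO_UNICODE else c
            for c in match.group(1)
        )
    return FONT_SYMBOL.sub(repl, html)


print(fix_symbol_font('10 k<font face="Symbol">W</font>'))   # 10 k&#937;
```

If the font information has already been lost in the copy/paste, there is nothing left to distinguish such a W from an ordinary W, and no repair of this kind is possible.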
Jul 24 '05 #2
