represent any Unicode character by means of a markup string coded in us-ascii

>Alan J. Flavell Oct 7 2004, 1:44 pm
On Thu, 7 Oct 2004, Shmuel (Seymour J.) Metz wrote:
at 08:24 PM, "Alan J. Flavell" <flav...@ph.gla.ac.uk> said:
>I think you mean "multiple character encoding schemes".

Yes, although a different character set would imply a different
encoding scheme.


Absolutely not. That's the whole point!

In (X)HTML you can (if you so choose) represent any Unicode character
by means of a markup string coded in us-ascii, even. The use of other
encoding schemes is merely a convenience when the desired character
repertoire fits a particular pattern, but whichever encoding scheme
you choose, you still - in principle - have access to any other
Unicode character you need, by means of &-notation.
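
As a rough illustration of the point above (not from the original post): in Python 3 the built-in `xmlcharrefreplace` error handler rewrites every character outside us-ascii as a decimal &#...; reference. The function name `to_ascii_markup` is made up for this sketch, and the `html.escape` step assumes the input is plain text rather than existing markup.

```python
import html


def to_ascii_markup(text: str) -> str:
    """Return text as pure us-ascii, using &#...; for everything else."""
    # Assumption: the input is plain text, so &, < and > are escaped first.
    escaped = html.escape(text, quote=False)
    # Any character that us-ascii cannot hold becomes a decimal reference.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_ascii_markup("Ω & café"))  # &#937; &amp; caf&#233;
```

The result can be stored or served under any ASCII-compatible encoding, which is why the choice of encoding scheme does not limit the character repertoire.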


I could change any Unicode character to its HTML notation, if only I
had a way to find out the Unicode value of the characters in the string
I'm given. But given a random set of string inputs, possibly copied and
pasted from WordPerfect or Microsoft Word or BBEdit on a Mac, I don't
know how to find the Unicode value of those characters.

Jul 24 '05 #1
On Sat, 27 May 2005, lk******@geocities.com wrote:
>I could change any Unicode character to its HTML notation, if only I
>had a way to find out the Unicode value of the characters in the
>string I'm given.

What's the context here? In order to know what "characters" you have
been given, you need to know what encoding they are represented in. If
they're not an encoding of Unicode itself, then you can normally refer
to the appropriate cross-mapping table at the Unicode site to
determine the corresponding hexadecimal Unicode value. That's the
value that you'd need (converted to decimal if you so choose) in the
&#...; representation in HTML.
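
A minimal sketch of that lookup, assuming Python 3 and assuming you already know (or can guess) the source encoding: the bundled codecs play the role of the cross-mapping tables. The helper name `code_points` is hypothetical, and `cp1252` and `mac_roman` are only examples of encodings that pasted text might arrive in.

```python
def code_points(raw: bytes, encoding: str):
    """Decode raw bytes and list each character with its Unicode value."""
    text = raw.decode(encoding)  # raises UnicodeDecodeError on a bad guess
    return [(ch, f"U+{ord(ch):04X}", f"&#{ord(ch)};") for ch in text]


# The same left double quotation mark is a different byte in each encoding,
# but both decode to the same Unicode value.
print(code_points(b"\x93", "cp1252"))
print(code_points(b"\xd2", "mac_roman"))
```

The hexadecimal value is what the Unicode charts use; the decimal form is what goes into &#...;.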

>But given a random set of string inputs, possibly copied and pasted
>from WordPerfect or Microsoft Word or BBEdit on a Mac, I don't know
>how to find the Unicode value of those characters.


If you're talking about forms submission, then the usual arrangement
is that the characters are submitted using the same character encoding
as the page which contains the form which they're submitted from.
For working with modern browsers, I'd normally recommend that you use
utf-8 for that. (No good with NN4.*, i.e. Netscape Navigator 4.x.)

http://ppewww.ph.gla.ac.uk/~flavell/...form-i18n.html

(But if you've been sent utf-8 and you're willing to store files in
utf-8 then you don't really *have* to use &#...; representation
anyway. It's your choice, really.)
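
On the receiving side, this might look like the sketch below. It assumes Python 3, a form page served as utf-8, and a URL-encoded submission; the helper name `decode_form` and the field names are invented for the example.

```python
from urllib.parse import parse_qs


def decode_form(query: str, encoding: str = "utf-8") -> dict:
    """Decode a URL-encoded form submission using the page's encoding."""
    # errors="strict" makes a wrong encoding assumption fail loudly
    # instead of silently substituting replacement characters.
    return parse_qs(query, encoding=encoding, errors="strict")


fields = decode_form("symbol=%CE%A9&word=caf%C3%A9")
print(fields)  # {'symbol': ['Ω'], 'word': ['café']}
```

Once the values are decoded they are ordinary Unicode strings, so they can be written out as utf-8 or converted to &#...; references as preferred.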

You're then reliant on what the client platform actually does when
copy/pasting from another application window into the form.

That can have some unexpected glitches, since Word (especially older
versions) has a nasty habit of changing to a non-standard font, e.g.
Symbol, and inserting a Latin letter (e.g. W) to get a symbol (e.g. Omega
or Ohm sign). This doesn't really work in HTML - MS of course will
fool its users by repeating the error in MSIE, but a properly
conforming www-compatible browser will display the W that the markup
asked for - not the symbol that was intended.
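
To make the glitch concrete, here is a small Python 3 sketch. The table `SYMBOL_TO_UNICODE` and the function `symbol_to_unicode` are hypothetical and cover only a handful of Symbol-font letters. It can only help if you still know which runs of text were set in Symbol; once the text has been pasted as plain characters, that font information is gone.

```python
import unicodedata

# Partial, illustrative mapping from Latin letters as typed in the
# Symbol font to the Unicode characters the author probably intended.
SYMBOL_TO_UNICODE = {
    "W": "\u03a9",  # Omega
    "D": "\u0394",  # Delta
    "a": "\u03b1",  # alpha
    "b": "\u03b2",  # beta
    "m": "\u03bc",  # mu
    "p": "\u03c0",  # pi
}


def symbol_to_unicode(text: str) -> str:
    """Translate text known to be in the Symbol font; leave the rest alone."""
    return "".join(SYMBOL_TO_UNICODE.get(ch, ch) for ch in text)


for ch in ("W", symbol_to_unicode("W")):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The first line printed is the character the markup actually contains; the second is the one that was meant.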
Jul 24 '05 #2
