Telling Unicode and real & characters apart.

Hi there. I've written a simple program that makes a simple GET form
with a text input box and displays $_GET["foo"] when submitted.

Using Windows Character Map, I pasted in the Cyrillic capital "Ya" (the
backward R) and it came out as "&#1071;". So far so good.

Then I sent in "[R] &#1071;" (The [R] is the Cyrillic character again.)

That came out as "&#1071; &#1071;". How can I please tell the
difference between the Cyrillic and the character sequence '&', '#',
etc...?

It seems to me that the '&' character should be transformed into
"&amp;" just like the Cyrillic characters. Perhaps I have misunderstood
something along the way.

LGK.

Sep 9 '05 #1
Louise GK wrote:
Then I sent in "[R] &#1071;" (The [R] is the Cyrillic character again.)

That came out as "&#1071; &#1071;". How can I please tell the
difference between the Cyrillic and the character sequence '&', '#',
etc...?


http://ppewww.ph.gla.ac.uk/~flavell/...form-i18n.html

The recommendation seems to be to UTF-8-ise.
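
As a rough sketch of why UTF-8 helps: assuming the form page is served as UTF-8, the browser can send the Ya as real UTF-8 bytes rather than falling back to a numeric reference, and escaping on output then keeps the two cases apart. The variable names are made up for illustration, and the "\u{...}" escape assumes PHP 7.0 or later.

```php
<?php
// Assumption: the page was served as UTF-8, so the browser submitted the Ya
// as UTF-8 bytes and the typed text "&#1071;" as plain ASCII characters.
$received = "\u{042F} &#1071;";   // stand-in for $_GET['foo']

// Escaping for HTML output turns the typed '&' into '&amp;' and leaves the
// real Cyrillic letter alone, so the two are distinct again.
$html = htmlspecialchars($received, ENT_QUOTES, 'UTF-8');

echo $html, "\n";   // Я &amp;#1071;
```

Viewed in a browser, that output shows the Cyrillic letter followed by the literal characters "&#1071;", which is what was typed.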

--
Jock
Sep 9 '05 #2
On 9 Sep 2005 14:59:21 -0700, "Louise GK" <lo******@gmail.com> wrote:
Hi there. I've written a simple program that makes a simple GET form
with a text input box and displays $_GET["foo"] when submitted.

Using Windows Character Map, I pasted in the Cyrillic capital "Ya" (the
backward R) and it came out as "&#1071;". So far so good.

Then I sent in "[R] &#1071;" (The [R] is the Cyrillic character again.)

That came out as "&#1071; &#1071;". How can I please tell the
difference between the Cyrillic and the character sequence '&', '#',
etc...?

It seems to me that the '&' character should be transformed into
"&amp;" just like the Cyrillic characters. Perhaps I have misunderstood
something along the way.


What encoding is the page with the form in?

Some browsers will, if the page is in an encoding that does not contain the
character being pasted in, convert the character to an HTML character entity.
The result is then indistinguishable from pasting the character entity itself in.

Try the code below (filename: form_encoding.php), pasting a Ya followed by the
literal text "&#1071;" into the input box.

Note what happens when you switch page encodings and resubmit the text:
iso-8859-15 doesn't contain a Ya, so the browser tries to make the best of an
impossible situation and sends the HTML character entity representation
instead.

The other two encodings, utf-8 and iso-8859-5, do contain Ya, so you get the
correct behaviour, i.e. a Ya followed by the text of the HTML entity.

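<?php
// Only accept the encodings this demo offers; anything else falls back to
// the default, so raw user input never reaches the Content-Type header.
$allowed  = array('iso-8859-15', 'utf-8', 'iso-8859-5');
$encoding = (isset($_GET['encoding']) && in_array($_GET['encoding'], $allowed, true))
    ? $_GET['encoding']
    : 'iso-8859-15';
header("Content-Type: text/html; charset=$encoding");
?>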
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>form encoding</title>
</head>
<body>
<form method="get" action="form_encoding.php">
<input type="radio" name="encoding" value="iso-8859-15"
id="encoding-iso-8859-15">
<label for="encoding-iso-8859-15">iso-8859-15 (Western European)</label><br>

<input type="radio" name="encoding" value="utf-8" id="encoding-utf-8">
<label for="encoding-utf-8">utf-8 (Unicode)</label><br>

<input type="radio" name="encoding" value="iso-8859-5"
id="encoding-iso-8859-5">
<label for="encoding-iso-8859-5">iso-8859-5 (Cyrillic)</label><br>

<input type="submit" value="Set Encoding">
</form>

<p>Encoding: <?php print htmlspecialchars($encoding); ?></p>

<form method="get" action="form_encoding.php">
<input type="hidden" name="encoding" value="<?php print
htmlspecialchars($encoding);?>"><br>
<input type="text" name="input">
<input type="submit">
</form>
<?php
if (isset($_GET['input']))
{
print htmlspecialchars($_GET['input'], ENT_QUOTES, $encoding);
}
?>
</body>
</html>
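
The fallback described above can also be approximated without a browser. This is only a sketch: simulate_browser_submit() is a hypothetical helper, not something browsers or PHP provide, and it assumes the mbstring extension and PHP 7.4 or later. It encodes each character into the page charset and substitutes a decimal character reference when the charset has no slot for it.

```php
<?php
// Hypothetical helper: mimic how a browser might encode typed form text for
// a page in $pageCharset. Characters the charset cannot represent become
// decimal numeric character references such as "&#1071;".
function simulate_browser_submit(string $utf8Text, string $pageCharset): string
{
    $out = '';
    foreach (mb_str_split($utf8Text, 1, 'UTF-8') as $char) {
        $encoded   = mb_convert_encoding($char, $pageCharset, 'UTF-8');
        $roundTrip = mb_convert_encoding($encoded, 'UTF-8', $pageCharset);
        if ($roundTrip === $char) {
            $out .= $encoded;                              // representable: raw bytes
        } else {
            $out .= '&#' . mb_ord($char, 'UTF-8') . ';';   // not representable
        }
    }
    return $out;
}

// A real Ya (U+042F) followed by the typed text "&#1071;".
$typed = "\u{042F} &#1071;";

$latin9   = simulate_browser_submit($typed, 'ISO-8859-15'); // "&#1071; &#1071;"
$cyrillic = simulate_browser_submit($typed, 'ISO-8859-5');  // "\xCF &#1071;"
$utf8     = simulate_browser_submit($typed, 'UTF-8');       // unchanged

echo $latin9, "\n";
```

In the iso-8859-15 case the two inputs collapse into the same bytes, so nothing on the server side can tell them apart afterwards. In the other two charsets the Ya arrives as its own byte sequence and the typed reference stays plain text.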

--
Andy Hassall :: an**@andyh.co.uk :: http://www.andyh.co.uk
http://www.andyhsoftware.co.uk/space :: disk and FTP usage analysis tool
Sep 9 '05 #3
