
Telling Unicode and real & characters apart.

Hi there. I've written a simple program that makes a simple GET form
with a text input box and displays $_GET["foo"] when submitted.

Using Windows Character Map, I pasted in the Cyrillic capital "Ya" (the
backward R) and it came out as "&#1071;". So far so good.

Then I sent in "[R] &#1071;" (The [R] is the Cyrillic character again.)

That came out as "&#1071; &#1071;". How can I please tell the
difference between the Cyrillic and the character sequence '&', '#',
etc...?

It seems to me that the '&' character should be transformed into
"&amp;" just like the Cyrillic characters. Perhaps I have misunderstood
something along the way.

LGK.

Sep 9 '05 #1
Louise GK wrote:
Then I sent in "[R] &#1071;" (The [R] is the Cyrillic character again.)

That came out as "&#1071; &#1071;". How can I please tell the
difference between the Cyrillic and the character sequence '&', '#',
etc...?


http://ppewww.ph.gla.ac.uk/~flavell/...form-i18n.html

The recommendation seems to be to UTF-8-ise.
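
A minimal sketch of what "UTF-8-ising" could look like on the PHP side, assuming the page is sent with a utf-8 charset header. The helper name render_submitted is hypothetical, not something from the linked article.

```php
<?php
// Hypothetical helper: escape a submitted value for output on a UTF-8 page.
function render_submitted(string $value): string
{
    // On a UTF-8 page the browser can send Ya as its own bytes (D0 AF), so a
    // literal "&#1071;" typed by the user stays distinct and is escaped here.
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Declare the encoding before any output so the form is submitted as UTF-8.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// Ya followed by the typed text "&#1071;": only the typed ampersand is escaped.
echo render_submitted("\xD0\xAF &#1071;"), "\n";
```

With this in place the Cyrillic letter passes through untouched while the typed '&' becomes "&amp;", which is the distinction the original question asks for.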

--
Jock
Sep 9 '05 #2
On 9 Sep 2005 14:59:21 -0700, "Louise GK" <lo******@gmail.com> wrote:
Hi there. I've written a simple program that makes a simple GET form
with a text input box and displays $_GET["foo"] when submitted.

Using Windows Character Map, I pasted in the Cyrillic capital "Ya" (the
backward R) and it came out as "&#1071;". So far so good.

Then I sent in "[R] &#1071;" (The [R] is the Cyrillic character again.)

That came out as "&#1071; &#1071;". How can I please tell the
difference between the Cyrillic and the character sequence '&', '#',
etc...?

It seems to me that the '&' character should be transformed into
"&amp;" just like the Cyrillic characters. Perhaps I have misunderstood
something along the way.


What encoding is the page with the form in?

Some browsers will, if the page is in an encoding that does not contain the
character being pasted in, convert the character to an HTML numeric character
reference. Once submitted, this is indistinguishable from the user having typed
the text of that character reference by hand.
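
As a rough illustration of why the two cases collapse into one, here is a sketch that decodes the query strings a browser might send for the input "Ya, space, &#1071;". The two query strings are assumptions written out by hand, not captured traffic.

```php
<?php
// Assumed query string from an ISO-8859-15 page: Ya has no byte in that
// encoding, so the browser substitutes the text "&#1071;" and URL-encodes it.
$fromLatin9Page = 'input=%26%231071%3B+%26%231071%3B';

// Assumed query string from a UTF-8 page: Ya travels as its own bytes (D0 AF),
// and the typed text "&#1071;" is sent unchanged.
$fromUtf8Page = 'input=%D0%AF+%26%231071%3B';

parse_str($fromLatin9Page, $latin9);
parse_str($fromUtf8Page, $utf8);

// Both halves are identical: the server cannot tell which one was the letter.
var_dump($latin9['input'] === '&#1071; &#1071;');

// The letter and the typed text remain different byte sequences.
var_dump($utf8['input'] === "\xD0\xAF &#1071;");
```

Both comparisons should hold, which matches the "&#1071; &#1071;" result described in the question.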

Try the code below (filename: form_encoding.php), pasting a Ya followed by the
literal text "&#1071;" into the input box.

Note what happens when you switch page encodings and resubmit the text;
iso-8859-15 doesn't contain a Ya, so the browser tries to make the best of an
impossible situation and sends the HTML character entity representation
instead.

The other two encodings, utf-8 and iso-8859-5, do contain Ya, so you get the
correct behaviour, i.e. a Ya followed by the literal text of the HTML entity.

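<?php
// Only accept the three encodings offered by the form; anything else falls
// back to the default, so user input never reaches the header unchecked.
$allowed = array('iso-8859-15', 'utf-8', 'iso-8859-5');
$encoding = (isset($_GET['encoding']) && in_array($_GET['encoding'], $allowed, true))
    ? $_GET['encoding'] : 'iso-8859-15';
header("Content-Type: text/html; charset=$encoding");
?>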
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>form encoding</title>
</head>
<body>
<form method="get" action="form_encoding.php">
<input type="radio" name="encoding" value="iso-8859-15"
id="encoding-iso-8859-15">
<label for="encoding-iso-8859-15">iso-8859-15 (Western European)</label><br>

<input type="radio" name="encoding" value="utf-8" id="encoding-utf-8">
<label for="encoding-utf-8">utf-8 (Unicode)</label><br>

<input type="radio" name="encoding" value="iso-8859-5"
id="encoding-iso-8859-5">
<label for="encoding-iso-8859-5">iso-8859-5 (Cyrillic)</label><br>

<input type="submit" value="Set Encoding">
</form>

<p>Encoding: <?php print htmlspecialchars($encoding); ?></p>

<form method="get" action="form_encoding.php">
<input type="hidden" name="encoding" value="<?php print
htmlspecialchars($encoding);?>"><br>
<input type="text" name="input">
<input type="submit">
</form>
<?php
if (isset($_GET['input']))
{
print htmlspecialchars($_GET['input'], ENT_QUOTES, $encoding);
}
?>
</body>
</html>
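
To check which of the three encodings can actually represent Ya, a small command-line sketch along these lines may help. It assumes the iconv extension is available; bytes_in is a hypothetical helper name.

```php
<?php
$ya = "\xD0\xAF"; // U+042F CYRILLIC CAPITAL LETTER YA, written as UTF-8 bytes

// Hypothetical helper: hex bytes of $text in $charset, or null if iconv
// reports that the text cannot be converted.
function bytes_in(string $text, string $charset): ?string
{
    $converted = @iconv('UTF-8', $charset, $text);
    return $converted === false ? null : bin2hex($converted);
}

foreach (['UTF-8', 'ISO-8859-5', 'ISO-8859-15'] as $charset) {
    $hex = bytes_in($ya, $charset);
    echo $charset, ': ', $hex === null ? 'not representable' : $hex, "\n";
}
```

Ya should come out as d0af in UTF-8 and cf in ISO-8859-5. For ISO-8859-15 the result depends on the iconv implementation: some report a failure, others may substitute a placeholder character, but in neither case is there a real byte for Ya.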

--
Andy Hassall :: an**@andyh.co.uk :: http://www.andyh.co.uk
http://www.andyhsoftware.co.uk/space :: disk and FTP usage analysis tool
Sep 9 '05 #3
