
Posting Unicode Form Values

A really weird thing (to me, anyway) I've encountered is in a UTF-8 test
script. Here, the input, a single two-byte Cyrillic character (as reported
by JavaScript in the originating form), is posted to the receiving script,
where IIS or IE has expanded it to a 4-byte field, while the display of
that character is still correct.

Can someone pls explain that? What encoding is the latter?

AS
Jul 19 '05 #1


"Arnold Shore" <do**@bother.me> wrote in message news:<Oz*************@TK2MSFTNGP10.phx.gbl>...
> A really weird thing (to me, anyway) I've encountered is in a UTF-8 test
> script. Here, the input, a single two-byte Cyrillic character (as reported
> by JavaScript in the originating form), is posted to the receiving script,
> where IIS or IE has expanded it to a 4-byte field, while the display of
> that character is still correct.
>
> Can someone pls explain that? What encoding is the latter?

Are you sure it's 4 bytes? It's usually 6 bytes. For example,
Russian small 'd' in UTF-8 is the two-byte sequence 0xD0 0xB4.
What the browser sends from a form is the URL-encoding
( http://www.blooberry.com/indexdot/ht...rlencoding.htm )
of the above:
%D0%B4 - each byte as 3 ASCII symbols, 6 altogether.
You can see it yourself on my test pages:
a) URL-encoded data on a single-byte Cyrillic windows-1251 page:
http://ourworld.compuserve.com/homep...or/inp1251.htm
b) URL-encoded data (same Russian letters, for example) on a UTF-8 page:
http://ourworld.compuserve.com/homep...r/utf8euro.htm
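
The byte counts above can be checked with a short script. This is a minimal sketch in Python rather than ASP, using only the standard library's urllib.parse.quote; the helper name form_encode is hypothetical and not from the thread. It percent-encodes the Russian small 'd' roughly as a form on a UTF-8 page and on a windows-1251 page would.

```python
from urllib.parse import quote

def form_encode(text: str, page_encoding: str) -> str:
    """Percent-encode text using the bytes of the given page encoding.

    Hypothetical helper for illustration only.
    """
    return quote(text, safe="", encoding=page_encoding)

small_d = "\u0434"  # Cyrillic small letter DE, the Russian small 'd'

utf8_form = form_encode(small_d, "utf-8")      # two bytes, D0 B4
cp1251_form = form_encode(small_d, "cp1251")   # one byte, E4

print(utf8_form, len(utf8_form))       # %D0%B4 6
print(cp1251_form, len(cp1251_form))   # %E4 3
```

Each encoded byte takes three ASCII characters, so the same letter costs 6 characters from the UTF-8 page and 3 from the windows-1251 page.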
--
Regards,
Paul Gorodyansky
"Cyrillic (Russian): instructions for Windows and Internet":
http://ourworld.compuserve.com/homepages/PaulGor/

Jul 19 '05 #2

Paul, thanks heaps; that's a big help. To answer: LenB(string) reports the
value as length 4. BUT:

I've looked at both the Cyrillic small YU and the small YA, and within the 4
bytes that I see, the hex values are identical, while they should differ by 1.
So I expect you're right about the length.

1. So how do I get at the length? (in ASP/VBScript)
2. And how do I decode the bytes into a value I can use further?
3. I don't see a "Russian small 'd'", and 0xB4 is 180 - which appears
where?
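
On questions 2 and 3, a worked example may help. The sketch below is in Python rather than VBScript, and the function name decode_two_byte_utf8 is made up for illustration. It shows where 0xB4 comes in: it is the second (continuation) byte of the UTF-8 pair 0xD0 0xB4, and the two bytes together carry the code point U+0434, the Russian small 'd'.

```python
def decode_two_byte_utf8(lead: int, cont: int) -> int:
    """Return the code point stored in a two-byte UTF-8 sequence."""
    # A two-byte lead byte looks like 110xxxxx, a continuation byte like 10xxxxxx.
    if lead & 0xE0 != 0xC0 or cont & 0xC0 != 0x80:
        raise ValueError("not a two-byte UTF-8 sequence")
    # Keep the 5 payload bits of the lead byte and the 6 of the continuation byte.
    return ((lead & 0x1F) << 6) | (cont & 0x3F)

code_point = decode_two_byte_utf8(0xD0, 0xB4)
print(hex(code_point), chr(code_point))  # 0x434 д

# Small YU and small YA differ by 1 in the last byte, as expected.
print("\u044e".encode("utf-8").hex(), "\u044f".encode("utf-8").hex())  # d18e d18f
```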

I've been to your pages, but I get a 404 when I submit a Russian character. Some
temporary problem, I hope?

Thanks again, Paul. It's really appreciated.

AS
Jul 19 '05 #3

Pls disregard my prior posting; it's plain wrong. I'll get a night's sleep
and post something that's coherent. Sorry, all.

AS
Jul 19 '05 #4

Hello!

"Arnold Shore" <do**@bother.me> wrote in message news:<ey**************@TK2MSFTNGP11.phx.gbl>...
> Paul, thanks heaps; That's a big help. To answer, LenB(string) reports the
> value as length 4.

Strange... Maybe ASP does automatic URL-decoding then. But then
why is it 4?
You should probably look at the famous "ASP Internationalization"
article by M. Kaplan, which describes how to use non-Western encoding
data there:
http://msdn.microsoft.com/msdnmag/issues/0700/localize/
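
One possible reading of the 4, offered as a guess rather than a diagnosis: VBScript strings hold two bytes per character and LenB counts bytes, so a LenB of 4 corresponds to a two-character string. That is what results if the two UTF-8 bytes 0xD0 0xB4 are each turned into a separate character by a single-byte code page. The Python sketch below imitates this; Latin-1 is an assumed stand-in for whatever code page the server actually used, and lenb is a hypothetical imitation of LenB, not the real function.

```python
def lenb(text: str) -> int:
    """Imitate VBScript's LenB: the byte length of the string as UTF-16."""
    return len(text.encode("utf-16-le"))

raw = "\u0434".encode("utf-8")      # the two bytes D0 B4 sent by the form

as_utf8 = raw.decode("utf-8")       # decoded correctly: one character
as_latin1 = raw.decode("latin-1")   # assumed mis-decoding: two characters

print(len(as_utf8), lenb(as_utf8))       # 1 2
print(len(as_latin1), lenb(as_latin1))   # 2 4

# Writing the two characters back out byte for byte restores the original
# UTF-8 bytes, which a browser reading the page as UTF-8 would show correctly.
print(as_latin1.encode("latin-1") == raw)  # True
```

If this is what happened, it would also fit the observation that the character still displays correctly.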

> BUT:
>
> I've looked at both the Cyrillic small YU and the small YA, and within the 4
> bytes that I see the hex values are identical - while they should differ by
> 1. So I expect you're right re the length.

No, now I don't think I am right - you don't have 6.
Also, if you see the same hex values, then it may be corruption:
each Russian letter got replaced by some symbol when something
went wrong. I should've given you the link to M. Kaplan's article
the first time...

> 1. So how do I get at the length? (in ASP/VBScript)

I don't know - I've never worked with ASP/VBScript. But I do know
how a browser performs form submission, and that's what I wrote
the first time.

> 2. And how do I decode the bytes into a value I can use further?
> 3. I don't see a "Russian small 'd' ", and 0xB4 is 180 - which appears
> where?
>
> I've been to your pages, but I 404 when I submit a Russian character. Some
> temporary problem, I hope?


No, it's 'by design' - I cannot have any server-side code with my
ISP, so, as I wrote there, I just let the data submitted from a form
be visible in the _address line_ as URL-encoded strings, such as
%D0%B4 on a UTF-8 page for small Russian 'd'. Don't pay attention to the
404 (which just means that there is no receiving software on my
server - which is true!); just look at the Address Bar - the results are
there instead of being sent to server-side code.
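
To read such an address-bar value by hand, the query string can be split and percent-decoded. A rough Python sketch follows; the URL and the field name text are invented for the example and are not the real test-page addresses, and submitted_fields is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlsplit

def submitted_fields(url: str, page_encoding: str) -> dict:
    """Decode the form fields carried in a URL's query string."""
    query = urlsplit(url).query
    return parse_qs(query, encoding=page_encoding)

# Hypothetical address-bar contents after submitting a form with GET.
utf8_url = "http://example.invalid/receiver?text=%D0%B4"
cp1251_url = "http://example.invalid/receiver?text=%E4"

print(submitted_fields(utf8_url, "utf-8"))     # {'text': ['д']}
print(submitted_fields(cp1251_url, "cp1251"))  # {'text': ['д']}
```

The decoding only gives the right letter when the encoding passed in matches the encoding of the page that held the form.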

--
Regards,
Paul Gorodyansky
"Cyrillic (Russian): instructions for Windows and Internet":
http://ourworld.compuserve.com/homepages/PaulGor/
Jul 19 '05 #5
