
Mysterious functions of text encoding....

How the encoding and decoding functions work is a bit of a mystery to me.
What happens underneath when they are called?

Encoding1.GetBytes(string1); in particular ASCII.GetBytes(string1)

Encoding1.GetChars(string1);

Encoding1.GetChars(arrayofbytes1);

string1=Encoding1.GetString(arrayofbytes1);

As far as I know, a char is 2 bytes (16 bits), and all strings in
C# (.NET) are sequences of chars.

P.S. Please explain in terms of working with bytes (I come from the C world).

I would appreciate it.


Jul 19 '05 #1


They read each Unicode char and then determine the actual byte sequence
for it in whatever encoding you choose. For instance, if you use ASCII or
UTF-8 and pass "hi", it'll make a byte[] {0x68, 0x69}. If using
Encoding.Unicode (UTF-16, little-endian), it'd be 0x68, 0x00, 0x69, 0x00.

The same applies when you get chars or a string from bytes. It reads
the bytes and then determines what actual Unicode characters they are.
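
To make the "hi" example concrete, here is a minimal sketch of a console program. The class name EncodingDemo and the helper Hex are made up for illustration; the Encoding calls are the standard System.Text ones.

```csharp
using System;
using System.Text;

static class EncodingDemo
{
    // Hypothetical helper: formats a byte array as space-separated hex.
    public static string Hex(byte[] bytes)
    {
        return BitConverter.ToString(bytes).Replace("-", " ");
    }

    static void Main()
    {
        string text = "hi";

        byte[] ascii = Encoding.ASCII.GetBytes(text);    // one byte per char here
        byte[] utf8 = Encoding.UTF8.GetBytes(text);      // same bytes for ASCII-range text
        byte[] utf16 = Encoding.Unicode.GetBytes(text);  // UTF-16 little-endian, two bytes per char

        Console.WriteLine(Hex(ascii));  // 68 69
        Console.WriteLine(Hex(utf8));   // 68 69
        Console.WriteLine(Hex(utf16));  // 68 00 69 00

        // Decoding reverses the mapping, provided the same encoding is used.
        Console.WriteLine(Encoding.Unicode.GetString(utf16));  // hi
    }
}
```

Decoding the UTF-16 bytes with Encoding.ASCII instead would not give "hi" back, which is why the encoding has to be known on both sides.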

Also, there is no Encoding.GetChars(string) method, since it would only
return the chars of the string, which are always in the .NET internal
representation (Unicode).
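
A small sketch of the difference, assuming a console program (the class name GetCharsDemo is made up): going from bytes to chars needs an encoding, while going from a string to chars is just a copy via string.ToCharArray().

```csharp
using System;
using System.Text;

static class GetCharsDemo
{
    static void Main()
    {
        byte[] bytes = { 0x68, 0x69 };

        // Bytes -> chars: an encoding is needed to interpret the bytes.
        char[] decoded = Encoding.ASCII.GetChars(bytes);

        // String -> chars: no encoding involved, just a copy of the string's chars.
        char[] copied = "hi".ToCharArray();

        Console.WriteLine(new string(decoded));              // hi
        Console.WriteLine(decoded.Length == copied.Length);  // True
        Console.WriteLine((int)copied[0]);                   // 104 (0x68, the code unit for 'h')
    }
}
```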

-mike
MVP

Jul 19 '05 #2

Viorel <vm*********@moldova.cc> wrote:
> So, if I understand correctly, Encoding1.GetBytes(string1); takes the
> content of string1, represented in Unicode, converts it (internally) to
> Encoding1, and then returns the resulting bytes to me as an array. That
> means a conversion from Unicode to Encoding1 always happens internally.
>
> And string1=Encoding1.GetString(arrayofbytes1); creates (internally) a
> string in Encoding1 and then converts it to Unicode to be assigned to
> string1.
>
> Thus the rule (of the language) of keeping all strings in Unicode is
> never broken.

No.

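Strings are *always* in Unicode. Encoding.GetBytes takes the sequence
of Unicode characters and converts them into a sequence of bytes which
represents (in the specified encoding) that sequence of characters.
Encoding.GetString does the reverse: it takes a sequence of bytes and
produces the Unicode characters those bytes represent in that encoding.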
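
For someone coming from C, a rough sketch of what an ASCII encoder has to do at the byte level might help. ToAsciiBytes is a hypothetical helper written for illustration, not the framework's implementation, and substituting '?' for characters outside the ASCII range is an assumption made for this sketch.

```csharp
using System;
using System.Text;

static class ManualAscii
{
    // Hypothetical helper: maps each 16-bit char to one ASCII byte.
    public static byte[] ToAsciiBytes(string s)
    {
        byte[] result = new byte[s.Length];
        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];                   // a 16-bit UTF-16 code unit
            result[i] = c <= 0x7F
                ? (byte)c                    // ASCII range: the value fits in 7 bits
                : (byte)'?';                 // assumption: replace anything else with '?'
        }
        return result;
    }

    static void Main()
    {
        byte[] mine = ToAsciiBytes("hi");
        byte[] builtIn = Encoding.ASCII.GetBytes("hi");
        Console.WriteLine(BitConverter.ToString(mine));     // 68-69
        Console.WriteLine(BitConverter.ToString(builtIn));  // 68-69
    }
}
```

The point is that the input is already a sequence of Unicode chars; the encoder only decides which bytes stand for each one.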

For an example of how this might be done, have a look at my EBCDIC
encoding:
http://www.pobox.com/~skeet/csharp/ebcdic/

You might also find this article useful:
http://www.pobox.com/~skeet/csharp/unicode.html

> My note: At first I thought that all strings are kept in their own
> encoding.

Nope. Strings don't have any encoding associated with them.
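
A quick way to see this, as a sketch with a made-up class name: decode the same text from two different encodings and compare the results.

```csharp
using System;
using System.Text;

static class NoEncodingOnStrings
{
    static void Main()
    {
        byte[] utf8Bytes = { 0x68, 0x69 };               // "hi" in UTF-8
        byte[] utf16Bytes = { 0x68, 0x00, 0x69, 0x00 };  // "hi" in UTF-16 little-endian

        string fromUtf8 = Encoding.UTF8.GetString(utf8Bytes);
        string fromUtf16 = Encoding.Unicode.GetString(utf16Bytes);

        // Both are ordinary strings; nothing records which encoding produced them.
        Console.WriteLine(fromUtf8 == fromUtf16);  // True

        string s = fromUtf8;
        s = fromUtf16;         // plain assignment, no conversion and no exception
        Console.WriteLine(s);  // hi
    }
}
```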

> I thought string1 is in Encoding1, and if
> string2=Encoding2.GetString(arrayofbytes1); and Encoding1 != Encoding2,
> then trying to assign string2 to string1 (string1=string2) would throw
> an exception. Then there would be no need for a hidden internal
> conversion from Encoding1 to Unicode, and it would be more explicit. Am I
> right?

Nope.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet/
If replying to the group, please do not mail me too
Jul 19 '05 #3

Strings are *always* in Unicode. Encoding.GetBytes takes (&from where?&)
the sequence of Unicode characters and converts them into a sequence of
bytes, which represents (in the specified encoding) that sequence of
characters (&if it can, else it truncates&).

If I am not wrong, GetString:

1) takes a sequence of bytes

2) creates from that sequence a string ('lying' on those bytes) based
on the encoder object (on which GetString was called)

3) converts that string to Unicode format (finally to be used by
me)

If these steps are not how it is implemented in C#, could they achieve the
same result?

If not, please explain in more detail the way from bytes to string, in the
same manner as above.

Thank you very much.

Jul 19 '05 #4

[If you could make it clearer which bit you're quoting, it would make
your posts easier to read... I've reformatted it here.]

Viorel <vm*********@moldova.cc> wrote:
> Jon Skeet wrote:
> > Strings are *always* in Unicode. Encoding.GetBytes takes
>
> from where?

From the parameter you pass it.

> > the sequence of Unicode characters and converts them into a sequence of
> > bytes, which represents (in the specified encoding) that sequence of
> > characters
>
> if it can, else it truncates

What do you mean by "truncates" here? It doesn't just truncate the
string or byte array. If the encoding is passed a sequence of bytes it
doesn't fully understand (e.g. including some bytes with the top bit set
where the encoding is ASCII) I don't believe there's any particular
specified behaviour - I prefer to end up with '?' in the returned
string, myself.
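
As a sketch of the two possible outcomes: on later framework versions (2.0 onwards, as far as I know) the default Encoding.ASCII substitutes '?' for a byte it cannot decode, and an encoding object can be asked to throw instead. The class name and byte values are made up for illustration.

```csharp
using System;
using System.Text;

static class InvalidBytesDemo
{
    static void Main()
    {
        byte[] bytes = { 0x68, 0xE9, 0x69 };  // 0xE9 has the top bit set: not ASCII

        // Default Encoding.ASCII: the undecodable byte is replaced, nothing is truncated.
        Console.WriteLine(Encoding.ASCII.GetString(bytes));  // h?i

        // A stricter encoding object that throws instead of substituting.
        Encoding strict = Encoding.GetEncoding(
            "us-ascii",
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        try
        {
            strict.GetString(bytes);
        }
        catch (DecoderFallbackException)
        {
            Console.WriteLine("byte 0xE9 rejected");
        }
    }
}
```

Either way the result has the same number of characters as there were bytes, or no result at all; the input is never silently cut short.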

> If I am not wrong, GetString:
>
> 1) takes a sequence of bytes

Yes.

> 2) creates from that sequence a string ('lying' on those bytes) based
> on the encoder object (on which GetString was called)

No. I don't know where you get this idea from.

> 3) converts that string to Unicode format (finally to be used by
> me)

You're wrong. There's no need for some strange middle string.

> If these steps are not how it is implemented in C#, could they achieve
> the same result?

No, because a string is *always* in Unicode.

> If not, please explain in more detail the way from bytes to string, in
> the same manner as above.

It depends on how the encoding implementation wants to do it - as I
said before, if you want an example implementation, look at my EBCDIC
encoding.
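
As one possible illustration of the bytes-to-string direction, here is a minimal sketch of a hand-written decoder for ISO-8859-1 (Latin-1), chosen because each byte value there equals the Unicode code point of its character. FromLatin1 is a hypothetical helper, not how any framework encoding is actually implemented.

```csharp
using System;

static class ManualDecode
{
    // Hypothetical helper: decodes Latin-1 bytes straight into Unicode chars.
    public static string FromLatin1(byte[] bytes)
    {
        char[] chars = new char[bytes.Length];
        for (int i = 0; i < bytes.Length; i++)
        {
            chars[i] = (char)bytes[i];  // widen 8 bits to a 16-bit char
        }
        return new string(chars);       // the only string ever created
    }

    static void Main()
    {
        byte[] bytes = { 0x68, 0x69, 0xE9 };
        Console.WriteLine(FromLatin1(bytes) == "hi\u00E9");  // True
    }
}
```

The bytes go directly to chars and then to one string; there is no intermediate string "in Latin-1" at any point. Other encodings differ only in how a byte (or group of bytes) maps to a char.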

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet/
If replying to the group, please do not mail me too
Jul 19 '05 #5
