Mysterious functions of text encoding....

How the encoding and decoding functions work is a bit of a mystery to me.
What happens underneath when they are called?

Encoding1.GetBytes(string1); in particular ASCII.GetBytes(string1)

Encoding1.GetChars(string1);

Encoding1.GetChars(arrayofbytes1);

string1=Encoding1.GetString(arrayofbytes1);

As far as I know, a char is 2 bytes (16 bits),

and all strings in C# (.NET) are sequences of chars.

P.S. Please explain in terms of working with bytes (I come from the C world).

I would appreciate it.


Jul 19 '05 #1
They read each Unicode char and determine the actual byte sequence for it
in whatever encoding you choose. For instance, if you use ASCII or
UTF-8 and pass "hi", you get a byte[] {0x68, 0x69}. If using
Encoding.Unicode (UTF-16, little-endian), it'd be {0x68, 0x00, 0x69, 0x00}.
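
To see those byte sequences for yourself, here is a minimal sketch. It assumes a current .NET SDK (C# top-level statements); the variable names are only for illustration.

```csharp
using System;
using System.Text;

string text = "hi";

// The same two chars, encoded three different ways.
byte[] ascii = Encoding.ASCII.GetBytes(text);
byte[] utf8 = Encoding.UTF8.GetBytes(text);
byte[] utf16 = Encoding.Unicode.GetBytes(text);   // UTF-16, little-endian

Console.WriteLine(BitConverter.ToString(ascii));  // 68-69
Console.WriteLine(BitConverter.ToString(utf8));   // 68-69
Console.WriteLine(BitConverter.ToString(utf16));  // 68-00-69-00
```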

The same applies when you get chars or a string from bytes. It reads
the bytes and then determines what actual Unicode characters they are.
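
Going the other way, a small sketch (same assumptions: current .NET, illustrative names) shows that the number of bytes and the number of chars need not match. Here three UTF-8 bytes decode to two chars.

```csharp
using System;
using System.Text;

// 0x68 is 'h'; 0xC3 0xA9 is the two-byte UTF-8 form of U+00E9 (e with acute).
byte[] utf8Bytes = { 0x68, 0xC3, 0xA9 };

string decoded = Encoding.UTF8.GetString(utf8Bytes);
char[] chars = Encoding.UTF8.GetChars(utf8Bytes);

Console.WriteLine(decoded.Length);                // 2
Console.WriteLine(chars.Length);                  // 2
Console.WriteLine(((int)chars[1]).ToString("X4")); // 00E9
```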

Also, there is no Encoding.GetChars(string) method (since it'd only
return chars of a string, which would always be the .NET internal
representation (Unicode)).
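
If you only want the chars of a string, no Encoding is involved at all. A minimal sketch, assuming current .NET and using string.ToCharArray():

```csharp
using System;

string s = "h\u00E9";               // 'h' followed by U+00E9
char[] units = s.ToCharArray();     // copies the string's 16-bit chars

Console.WriteLine(sizeof(char));    // 2 (bytes per char)
foreach (char c in units)
{
    // Print each char's numeric value: 0068, then 00E9.
    Console.WriteLine(((int)c).ToString("X4"));
}
```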

-mike
MVP
Jul 19 '05 #2
Viorel <vm*********@moldova.cc> wrote:
> So, if I understand correctly, Encoding1.GetBytes(string1); takes the
> content of string1 represented in Unicode, converts (internally) the
> content to Encoding1, and then takes the bytes and returns them to me as
> an array of bytes. It means that internally there is always a conversion
> from Unicode to Encoding1.
>
> And string1=Encoding1.GetString(arrayofbytes1); creates (internally) a
> string in Encoding1 and then converts it to Unicode to be assigned to
> string1.
>
> Thus the rule (of the language) of keeping all strings in Unicode is
> never broken.

No.

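Strings are *always* in Unicode. Encoding.GetBytes takes the sequence
of Unicode characters and converts them into a sequence of bytes which
represents (in the specified encoding) that sequence of characters.
Encoding.GetString does the reverse: it takes such a sequence of bytes
and produces the Unicode characters it represents.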
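
For someone coming from C, it may help to see the conversion written as a plain loop. This is only a sketch of the idea for ASCII, not the framework's actual code; EncodeAsciiSketch is a hypothetical helper, and the choice of '?' for unencodable chars is an assumption that happens to match what Encoding.ASCII does for this input in current .NET.

```csharp
using System;
using System.Text;

// Hypothetical helper, not the framework's implementation:
// one output byte per char, '?' (0x3F) for anything outside 7-bit ASCII.
static byte[] EncodeAsciiSketch(string s)
{
    byte[] result = new byte[s.Length];
    for (int i = 0; i < s.Length; i++)
    {
        char c = s[i];                               // a 16-bit value
        result[i] = c < 0x80 ? (byte)c : (byte)'?';
    }
    return result;
}

byte[] manual = EncodeAsciiSketch("h\u00E9");
byte[] builtIn = Encoding.ASCII.GetBytes("h\u00E9");

Console.WriteLine(BitConverter.ToString(manual));    // 68-3F
Console.WriteLine(BitConverter.ToString(builtIn));   // 68-3F
```

The loop ignores surrogate pairs and other details a real encoding has to handle.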

For an example of how this might be done, have a look at my EBCDIC
encoding:
http://www.pobox.com/~skeet/csharp/ebcdic/

You might also find this article useful:
http://www.pobox.com/~skeet/csharp/unicode.html
> My notice: At first I thought that all strings were kept in their own
> encoding.

Nope. Strings don't have any encoding associated with them.

> I thought string1 is in Encoding1, and if
> string2=Encoding2.GetString(arrayofbytes1); and Encoding1 != Encoding2,
> then trying to assign string2 to string1 (string1=string2) would raise an
> exception. Thus there would be no need for an internal (hidden from me)
> conversion from Encoding1 to Unicode, and it would be more explicit. Am I
> right?

Nope.
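
A short sketch of that point, assuming current .NET (the byte arrays and variable names are illustrative): two strings decoded with different encodings are ordinary strings, compare equal, and can be assigned to each other freely.

```csharp
using System;
using System.Text;

// The same text, 'h' + U+00E9, stored as bytes in two different encodings.
byte[] utf8Bytes = { 0x68, 0xC3, 0xA9 };
byte[] utf16Bytes = { 0x68, 0x00, 0xE9, 0x00 };

string string1 = Encoding.UTF8.GetString(utf8Bytes);
string string2 = Encoding.Unicode.GetString(utf16Bytes);

// Neither string remembers which Encoding produced it.
bool same = string1 == string2;
Console.WriteLine(same);            // True

string1 = string2;                  // plain assignment, no exception
```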

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet/
If replying to the group, please do not mail me too
Jul 19 '05 #3
> Strings are *always* in Unicode. Encoding.GetBytes takes (&from where?&)
> the sequence of Unicode characters and converts them into a sequence of
> bytes, which represents (in the specified encoding) that sequence of
> characters (&if it can, else it truncates&).

If I am not wrong, GetString:

1) takes a sequence of bytes

2) creates from that sequence a string ('lying' on those bytes) based
on the encoding object on which GetString was called

3) converts that string to Unicode format (finally to be used by me)

If these steps are not how it is implemented in C#, could they achieve the
same result?

If not, please explain in more detail the path from bytes to string, in the
same manner as above.

Thank you very much.

Jul 19 '05 #4
[If you could make it clearer which bit you're quoting, it would make
your posts easier to read... I've reformatted it here.]

Viorel <vm*********@moldova.cc> wrote:
> Jon Skeet wrote:
> > Strings are *always* in Unicode. Encoding.GetBytes takes
>
> from where?

From the parameter you pass it.

> > the sequence of Unicode characters and converts them into a sequence of
> > bytes, which represents (in the specified encoding) that sequence of
> > characters
>
> if it can, else it truncates

What do you mean by "truncates" here? It doesn't just truncate the
string or byte array. If the encoding is passed a sequence of bytes it
doesn't fully understand (e.g. including some bytes with the top bit set
where the encoding is ASCII), I don't believe there's any particular
specified behaviour. I prefer to end up with '?' in the returned
string, myself.
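
For what it's worth, here is a sketch of how this looks in current .NET versions, which may differ from the framework version discussed in this thread. As far as I know, Encoding.ASCII there replaces each undecodable byte with '?', and an encoding obtained with exception fallbacks throws instead. The variable names are illustrative.

```csharp
using System;
using System.Text;

byte[] input = { 0x68, 0xE9 };      // 0xE9 has the top bit set: not ASCII

// Default ASCII decoder: the bad byte becomes '?'.
string lenient = Encoding.ASCII.GetString(input);
Console.WriteLine(lenient);         // h?

// Same encoding, but asked to throw on anything it cannot convert.
Encoding strictAscii = Encoding.GetEncoding(
    "us-ascii",
    EncoderFallback.ExceptionFallback,
    DecoderFallback.ExceptionFallback);

bool threw = false;
try
{
    strictAscii.GetString(input);
}
catch (DecoderFallbackException)
{
    threw = true;                   // the invalid byte was reported, not replaced
}
Console.WriteLine(threw);           // True
```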
> If I am not wrong, GetString:
>
> 1) takes a sequence of bytes

Yes.

> 2) creates from that sequence a string ('lying' on those bytes) based
> on the encoding object on which GetString was called

No. I don't know where you get this idea from.

> 3) converts that string to Unicode format (finally to be used by me)

You're wrong. There's no need for some strange middle string.

> If these steps are not how it is implemented in C#, could they achieve
> the same result?

No, because a string is *always* in Unicode.

> If not, please explain in more detail the path from bytes to string, in
> the same manner as above.

It depends on how the encoding implementation wants to do it - as I
said before, if you want an example implementation, look at my EBCDIC
encoding.
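
As a rough idea of what a single-byte encoding might do on the way from bytes to a string, here is a table-driven sketch. It is not taken from the EBCDIC code mentioned above; DecodeWithTable and the table are hypothetical, and the table is filled for ISO-8859-1 (where the byte value equals the Unicode code point) purely to keep it short. An EBCDIC decoder would use the same loop with a different table.

```csharp
using System;

// Lookup table: byte value -> Unicode char. Filled here for ISO-8859-1,
// an assumption chosen because the mapping is the identity.
char[] table = new char[256];
for (int b = 0; b < 256; b++)
{
    table[b] = (char)b;
}

// Hypothetical helper: each input byte is looked up directly.
// There is no intermediate string in some other encoding.
static string DecodeWithTable(byte[] bytes, char[] lookup)
{
    char[] chars = new char[bytes.Length];
    for (int i = 0; i < bytes.Length; i++)
    {
        chars[i] = lookup[bytes[i]];
    }
    return new string(chars);       // the result is an ordinary Unicode string
}

string decoded = DecodeWithTable(new byte[] { 0x68, 0xE9 }, table);
Console.WriteLine(decoded.Length);                    // 2
Console.WriteLine(((int)decoded[1]).ToString("X4"));  // 00E9
```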

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet/
If replying to the group, please do not mail me too
Jul 19 '05 #5
