Mysterious functions of text encoding....

Viorel

The encoding and decoding functions are a bit mysterious to me. How do they
work, and what happens underneath when I call them?

Encoding1.GetBytes(string1); in particular, Encoding.ASCII.GetBytes(string1)

Encoding1.GetChars(string1);

Encoding1.GetChars(arrayofbytes1);

string1=Encoding1.GetString(arrayofbytes1);

As far as I know, a char is 2 bytes (16 bits), and every string in C# (.NET)
is a sequence of chars.

P.S. Please explain in terms of working with bytes (I come from the C world).

I would appreciate it.

Jul 19 '05 #1

Michael Giagnocavo [MVP]

They read each Unicode char and then determine the actual byte sequence
for it in whatever encoding you choose. For instance, if you use ASCII or
UTF-8 and pass "hi", it'll make a byte[] {0x68, 0x69}. If using
Encoding.Unicode (UTF-16, little-endian), it'd be 0x68, 0x00, 0x69, 0x00.
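
To see this for yourself, here is a minimal sketch (assuming a recent .NET SDK, where a program can be written as top-level statements; the variable names are only for illustration) that prints the bytes each encoding produces for "hi":

```csharp
using System;
using System.Text;

string text = "hi";

// The same two chars, turned into bytes by three different encodings.
byte[] ascii = Encoding.ASCII.GetBytes(text);    // one byte per char
byte[] utf8 = Encoding.UTF8.GetBytes(text);      // identical to ASCII for ASCII-range text
byte[] utf16 = Encoding.Unicode.GetBytes(text);  // UTF-16 little-endian, two bytes per char here

Console.WriteLine(BitConverter.ToString(ascii));  // 68-69
Console.WriteLine(BitConverter.ToString(utf8));   // 68-69
Console.WriteLine(BitConverter.ToString(utf16));  // 68-00-69-00
```

Coming from C, you can think of the string as a wchar_t-like array of 16-bit units and of GetBytes as a function that writes a differently laid out byte buffer from it.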

The same applies when you get chars or a string from bytes. It reads
the bytes and then determines what actual Unicode characters they are.

Also, there is no Encoding.GetChars(string) method (since it'd only
return chars of a string, which would always be the .NET internal
representation (Unicode)).
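
The other direction can be sketched the same way (again assuming top-level statements; the byte arrays are just the "hi" example from above). Note that getting the chars of an existing string needs no Encoding at all, which is what string.ToCharArray is for:

```csharp
using System;
using System.Text;

byte[] asciiBytes = { 0x68, 0x69 };
byte[] utf16Bytes = { 0x68, 0x00, 0x69, 0x00 };

// Decoding: bytes in some encoding -> Unicode chars.
string fromAscii = Encoding.ASCII.GetString(asciiBytes);
string fromUtf16 = Encoding.Unicode.GetString(utf16Bytes);
char[] chars = Encoding.ASCII.GetChars(asciiBytes);

// Chars of an existing string: no encoding involved.
char[] direct = fromAscii.ToCharArray();

Console.WriteLine(fromAscii);           // hi
Console.WriteLine(fromUtf16);           // hi
Console.WriteLine(chars.Length);        // 2
Console.WriteLine(new string(direct));  // hi
```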

-mike
MVP

Jul 19 '05 #2

Jon Skeet

Viorel <vm*********@moldova.cc> wrote:

> So, if I understand correctly, Encoding1.GetBytes(string1) takes the content
> of string1, represented in Unicode, converts it (internally) to Encoding1,
> and then returns the resulting bytes to me as an array. That means a
> conversion from Unicode to Encoding1 always happens internally.
>
> And string1 = Encoding1.GetString(arrayofbytes1) creates (internally) a
> string in Encoding1 and then converts it to Unicode to be assigned to string1.
>
> Thus the rule (of the language) of keeping all strings in Unicode is never
> broken.

No.

Strings are *always* in Unicode. Encoding.GetBytes takes the sequence
of Unicode characters and converts them into a sequence of bytes which
represents (in the specified encoding) that sequence of characters.

For an example of how this might be done, have a look at my EBCDIC
encoding:
http://www.pobox.com/~skeet/csharp/ebcdic/

You might also find this article useful:
http://www.pobox.com/~skeet/csharp/unicode.html

> My notice: at first I thought that all strings were kept in their own
> encoding.

Nope. Strings don't have any encoding associated with them.

> I thought string1 was in Encoding1, and that if
> string2 = Encoding2.GetString(arrayofbytes1) and Encoding1 != Encoding2,
> then trying to assign string2 to string1 (string1 = string2) would raise an
> exception. Then there would be no need for the internal (hidden from me)
> conversion from Encoding1 to Unicode, and it would be more explicit. Am I
> right?

Nope.
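
A small sketch may make this concrete (assuming a recent .NET SDK with top-level statements; string1 and string2 are the names from the question). Two strings decoded through different encodings are indistinguishable afterwards:

```csharp
using System;
using System.Text;

byte[] asciiBytes = Encoding.ASCII.GetBytes("hi");
byte[] utf16Bytes = Encoding.Unicode.GetBytes("hi");

string string1 = Encoding.ASCII.GetString(asciiBytes);
string string2 = Encoding.Unicode.GetString(utf16Bytes);

// Both are ordinary .NET strings; nothing records which encoding the bytes were in.
Console.WriteLine(string1 == string2);  // True

// Plain reference assignment: no conversion happens and no exception is thrown.
string1 = string2;

// A char is always a 16-bit UTF-16 code unit.
Console.WriteLine(sizeof(char));        // 2
```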

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet/
If replying to the group, please do not mail me too

Jul 19 '05 #3

Viorel

Strings are *always* in Unicode. Encoding.GetBytes takes (&from where?&)
the sequence
of Unicode characters and converts them into a sequence of bytes, which
represents (in the specified encoding) that sequence of characters (&if it
can, else it truncates&).

If I am not wrong, GetString:

1) takes a sequence of bytes

2) creates from that sequence a string ('lying' on those bytes) based
on the encoding object (on which GetString was called)

3) that string is then converted to Unicode format (finally to be used by
me)

If these steps are not how it is implemented in C#, could they achieve the
same result?

If not, please explain in more detail the way from bytes to string, in the
same manner as above.

Thank you very much.

Jul 19 '05 #4

Jon Skeet

[If you could make it clearer which bit you're quoting, it would make
your posts easier to read... I've reformatted it here.]

Viorel <vm*********@moldova.cc> wrote:

> Jon Skeet wrote:
>> Strings are *always* in Unicode. Encoding.GetBytes takes
>
> from where?

From the parameter you pass it.

>> the sequence of Unicode characters and converts them into a sequence of
>> bytes, which represents (in the specified encoding) that sequence of
>> characters
>
> if it can else it truncates

What do you mean by "truncates" here? It doesn't just truncate the
string or byte array. If the encoding is passed a sequence of bytes it
doesn't fully understand (e.g. including some bytes with the top bit set
where the encoding is ASCII) I don't believe there's any particular
specified behaviour - I prefer to end up with '?' in the returned
string, myself.
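
As a quick way to check what a given runtime does, here is a sketch (assuming a current .NET version, where the documented default for the ASCII encoding is to substitute '?'; older releases may behave differently):

```csharp
using System;
using System.Text;

// 0xE9 has the top bit set, so it is not a valid ASCII byte.
byte[] input = { 0x68, 0xE9 };
string decoded = Encoding.ASCII.GetString(input);
Console.WriteLine(decoded);                         // h?

// The same substitution happens when encoding a char ASCII cannot represent.
byte[] encoded = Encoding.ASCII.GetBytes("h\u00E9");
Console.WriteLine(BitConverter.ToString(encoded));  // 68-3F
```

Either way the result has the same length as the input; nothing is cut off.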

> If I am not wrong GetString:
>
> 1) takes a sequence of bytes

Yes.

> 2) creates from that sequence a string ('lying' on those bytes) based
> on the encoding object (on which GetString was called)

No. I don't know where you get this idea from.

> 3) that string is then converted to Unicode format (finally to be used by
> me)

You're wrong. There's no need for some strange middle string.

> If these steps are not how it is implemented in C#, could they achieve the
> same result?

No, because a string is *always* in Unicode.

> If not, please explain in more detail the way from bytes to string, in the
> same manner as above

It depends on how the encoding implementation wants to do it - as I
said before, if you want an example implementation, look at my EBCDIC
encoding.
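
To give a rough idea in byte terms, here is a toy sketch of a single-byte decoder. It is not the EBCDIC code and not how the framework does it; the three-entry table and the Decode helper are made up for illustration, and top-level statements from a recent .NET SDK are assumed. The point is that each byte is looked up and written straight into a char buffer, with no intermediate string in some other encoding:

```csharp
using System;

// Hypothetical toy encoding: 0x01 -> 'a', 0x02 -> 'b', 0x03 -> 'c', anything else -> '?'.
char[] table = new char[256];
for (int i = 0; i < table.Length; i++)
{
    table[i] = '?';
}
table[0x01] = 'a';
table[0x02] = 'b';
table[0x03] = 'c';

// Decode by table lookup: one byte in, one Unicode char out.
string Decode(byte[] bytes)
{
    char[] chars = new char[bytes.Length];
    for (int i = 0; i < bytes.Length; i++)
    {
        chars[i] = table[bytes[i]];
    }
    return new string(chars);
}

Console.WriteLine(Decode(new byte[] { 0x03, 0x01, 0x02, 0xFF }));  // cab?
```

A multi-byte encoding such as UTF-8 needs more logic per character, but the shape is the same: read bytes, emit chars.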

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet/
If replying to the group, please do not mail me too

Jul 19 '05 #5
