GetByte adding an extra byte?

string USMesg;

USMesg = "¬Credits¬Remaining¬";

byte[] bArray = Encoding.UTF8.GetBytes(USMesg);

After I execute the code, the first two bytes are:

[0] = 0xC2
[1] = 0xAC //This is the character I was expecting

Why has an extra byte been inserted after each ¬?

I tried using Encoding.ASCII.GetBytes(), but that translates my ¬ to
0x3F (a question mark, ?).

Does anyone have any idea what's happened and how I can get round this problem?

Regards,

Steven
*** Sent via Developersdex http://www.developersdex.com ***
Aug 22 '07 #1
Hi,

Unicode characters use two bytes.

Aug 22 '07 #2

On Aug 22, 3:27 pm, Steven Blair <steven.bl...@btinternet.com> wrote:
> USMesg = "¬Credits¬Remaining¬";
> byte[] bArray = Encoding.UTF8.GetBytes(USMesg);
> [1] = 0xAC //This is the character I was expecting
> Why has an extra byte been inserted after each ¬?

This is how UTF-8 works. If you review the UTF-8 specification, you
will find that the character "¬" (U+00AC) is represented as the two
bytes 0xC2 0xAC. Note that the extra byte (0xC2) comes before the 0xAC,
not after it.

> I tried using Encoding.ASCII.GetBytes()

ASCII and UTF-8 are not interchangeable. ASCII only covers code points
up to 0x7F, so "¬" cannot be encoded and is replaced with "?".

> Anyone any idea whats happened and how I can get round this problem?

What is happening is that characters are being converted to bytes
using the character encoding you specify. Which bytes you get for a
given character depends entirely on that encoding.
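
To make the difference concrete, here is a minimal sketch that encodes the single character "¬" (U+00AC) with three encodings. The class name NotSignBytes and the Hex helper are made up for illustration; the Encoding calls are the standard System.Text ones.

```csharp
using System;
using System.Text;

static class NotSignBytes
{
    // Hypothetical helper: formats a byte array as space-separated hex.
    public static string Hex(byte[] bytes)
    {
        return BitConverter.ToString(bytes).Replace("-", " ");
    }

    public static void Main()
    {
        string notSign = "\u00AC"; // the character "¬"

        // UTF-8 needs two bytes for any code point from U+0080 to U+07FF.
        Console.WriteLine(Hex(Encoding.UTF8.GetBytes(notSign)));   // C2 AC

        // ASCII cannot represent U+00AC, so the encoder substitutes "?".
        Console.WriteLine(Hex(Encoding.ASCII.GetBytes(notSign)));  // 3F

        // ISO-8859-1 (Latin-1) maps U+00AC to the single byte 0xAC.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        Console.WriteLine(Hex(latin1.GetBytes(notSign)));          // AC
    }
}
```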

Perhaps what you want is Encoding.Default.GetBytes()? This will use
the system default ANSI code page (probably Windows-1252 in your case,
which is close to, but not identical to, ISO-8859-1, aka "Latin-1" or
"Western European"). Windows-1252 encodes "¬" as the single byte 0xAC;
a machine with a different default code page might not.

However, if you want to write predictable code, you must agree with
whoever will read the bytes back upon which encoding to use.
Otherwise, when reading 0xC2AC back into a string, the reader might
get a tiny picture of a tiny goat instead of the "¬".
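
One way to make the agreement explicit is to name the encoding in code instead of relying on Encoding.Default, which varies by machine (and, as far as I know, always returns UTF-8 on .NET Core and later). The sketch below assumes both sides have agreed on ISO-8859-1; the class AgreedEncoding and its Encode/Decode helpers are hypothetical names, not part of any library.

```csharp
using System;
using System.Text;

static class AgreedEncoding
{
    // Assumption: writer and reader have both agreed on ISO-8859-1 (Latin-1).
    static readonly Encoding Wire = Encoding.GetEncoding("iso-8859-1");

    public static byte[] Encode(string message) { return Wire.GetBytes(message); }
    public static string Decode(byte[] payload) { return Wire.GetString(payload); }

    public static void Main()
    {
        string message = "\u00ACCredits\u00ACRemaining\u00AC"; // "¬Credits¬Remaining¬"

        // Same encoding on both sides: one byte per character, lossless round trip.
        byte[] payload = Encode(message);
        Console.WriteLine(payload.Length);             // 19
        Console.WriteLine(Decode(payload) == message); // True

        // Mismatch: UTF-8 bytes read back as Latin-1. Each "¬" becomes two
        // characters (U+00C2 followed by U+00AC), so the text is corrupted.
        string garbled = Decode(Encoding.UTF8.GetBytes(message));
        Console.WriteLine(garbled.Length);             // 22
        Console.WriteLine(garbled == message);         // False
    }
}
```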

Aug 22 '07 #3
On Aug 22, 2:27 pm, Steven Blair <steven.bl...@btinternet.com> wrote:
> string USMesg;
>
> USMesg = "¬Credits¬Remaining¬";
>
> byte[] bArray = Encoding.UTF8.GetBytes(USMesg);
>
> After I execute the code, the first two bytes are:
>
> [0] = 0xC2
> [1] = 0xAC //This is the character I was expecting
>
> Why has an extra byte been inserted after each ¬?
>
> I tried using Encoding.ASCII.GetBytes() but that translates my ¬ to a
> 0x3F (Question mark ?)
>
> Anyone any idea whats happened and how I can get round this problem?

It sounds like you should read up on the UTF-8 format. Using
Encoding.Default may well give you what you want, but UTF-8 is
generally a better format these days.

See http://pobox.com/~skeet/csharp/unicode.html and the referenced
links there.

Jon

Aug 22 '07 #4

On Aug 22, 4:04 pm, "Ignacio Machin ( .NET/ C# MVP )" <machin TA
laceupsolutions.com> wrote:
> Unicode characters use two bytes.

O RLY?

"Unicode" can be anything from UTF-7 to UTF-32. In this case, it was
UTF-8. Each uses a different number of bits to represent a character.
Also, UTF-8 uses anywhere between 1 and 4 (IIRC) bytes to represent a
character.
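
As a rough illustration of those widths, this sketch prints the encoded size of four sample characters in UTF-8, UTF-16 and UTF-32. The class name Utf8Widths and the choice of sample characters are mine; GetByteCount and char.ConvertToUtf32 are standard .NET calls.

```csharp
using System;
using System.Text;

static class Utf8Widths
{
    public static void Main()
    {
        // "A", "¬", the euro sign, and an emoji outside the Basic Multilingual Plane.
        string[] samples = { "A", "\u00AC", "\u20AC", "\U0001F600" };

        foreach (string s in samples)
        {
            Console.WriteLine("U+{0:X4}: UTF-8 {1}, UTF-16 {2}, UTF-32 {3}",
                char.ConvertToUtf32(s, 0),         // code point of the sample
                Encoding.UTF8.GetByteCount(s),     // 1, 2, 3, 4
                Encoding.Unicode.GetByteCount(s),  // 2, 2, 2, 4 (surrogate pair)
                Encoding.UTF32.GetByteCount(s));   // always 4
        }
    }
}
```

The last sample is a single character stored as two .NET chars (a surrogate pair), which is why "two bytes per character" does not hold even for UTF-16.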

Perhaps you were thinking of "wide characters" from old-school Win32
programming?

Aug 22 '07 #5
On Aug 22, 3:04 pm, "Ignacio Machin ( .NET/ C# MVP )" <machin TA
laceupsolutions.com> wrote:
> Unicode characters use two bytes.

True, but irrelevant in this case - the important thing is that the
UTF-8 encoded version of the relevant character takes two bytes.
(Other characters can take 1 or 3.)

Jon

Aug 22 '07 #6
Encoding.Default.GetBytes() does the job.

Thanks for the help.
Aug 22 '07 #7
Oops, too early in the morning and not enough coffee :)

"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:11**********************@q4g2000prc.googlegroups.com...
> On Aug 22, 3:04 pm, "Ignacio Machin ( .NET/ C# MVP )" <machin TA
> laceupsolutions.com> wrote:
>> Unicode characters use two bytes.
>
> True, but irrelevant in this case - the important thing is that the
> UTF-8 encoded version of the relevant character takes two bytes.
> (Other characters can take 1 or 3.)
>
> Jon

Aug 22 '07 #8