GetBytes adding an extra byte?

Steven Blair

string USMesg;

USMesg = "¬Credits¬Remaining¬";

byte[] bArray = Encoding.UTF8.GetBytes(USMesg);

After I execute the code, the first two bytes are:

[0] = 0xC2
[1] = 0xAC //This is the character I was expecting

Why has an extra byte been inserted after each ¬?

I tried using Encoding.ASCII.GetBytes() but that translates my ¬ to a
0x3F (Question mark ?)

Anyone any idea what's happened and how I can get round this problem?

Regards,

Steven
*** Sent via Developersdex http://www.developersdex.com ***

Aug 22 '07 #1


Ignacio Machin $ .NET/ C# MVP $

Hi,

Unicode characters use two bytes.

Aug 22 '07 #2

UL-Tomten

On Aug 22, 3:27 pm, Steven Blair <steven.bl...@btinternet.com> wrote:

> USMesg = "¬Credits¬Remaining¬";
> byte[] bArray = Encoding.UTF8.GetBytes(USMesg);
> [1] = 0xAC //This is the character I was expecting
> Why has an extra byte been inserted after each ¬?

This is how UTF-8 works. If you review the UTF-8 specification, you
will find that the character "¬" (U+00AC) is encoded as the two bytes
0xC2 0xAC. Nothing has been inserted; both bytes together are the "¬".

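> I tried using Encoding.ASCII.GetBytes()

ASCII and UTF-8 are not interchangeable. ASCII only defines the values
0x00 to 0x7F, so "¬" has no ASCII representation and the encoder
substitutes "?" (0x3F).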

> Anyone any idea what's happened and how I can get round this problem?

What is happening is that characters are being converted to bytes,
using the character encoding you specify. The process of going
between actual characters and bits is very complex.

Perhaps what you want is Encoding.Default.GetBytes()? This will use
the system default ANSI code page (in your case probably Windows-1252,
which matches ISO-8859-1, aka "Latin-1" or "Western European", except
in the 0x80-0x9F range). Windows-1252 encodes "¬" as 0xAC, but a
machine with a different default code page might not.

However, if you want to write predictable code, you must agree with
whoever will read the bytes back upon which encoding to use.
Otherwise, when reading 0xC2AC back into a string, the reader might
get a tiny picture of a tiny goat instead of the "¬".
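
To make the points above concrete, here is a minimal sketch that encodes "¬" with three encodings and then decodes the UTF-8 bytes with the wrong one. The class and method names (EncodingComparison, Hex) are made up for illustration. It names ISO-8859-1 explicitly instead of relying on Encoding.Default, because the default depends on the machine (and on .NET Core and later it is UTF-8); that substitution is an assumption of this sketch, not what the original poster used.

```csharp
using System;
using System.Text;

public static class EncodingComparison
{
    // Hypothetical helper: encode text and show the bytes as hex, e.g. "C2-AC".
    public static string Hex(Encoding encoding, string text)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    public static void Main()
    {
        // "\u00AC" is the "¬" character; the escape avoids any dependence
        // on how the source file itself is saved.
        string notSign = "\u00AC";

        // Assumption: ISO-8859-1 stands in for the poster's ANSI code page.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        Console.WriteLine("UTF-8:   " + Hex(Encoding.UTF8, notSign));
        Console.WriteLine("ASCII:   " + Hex(Encoding.ASCII, notSign));
        Console.WriteLine("Latin-1: " + Hex(latin1, notSign));

        // Reading UTF-8 bytes back with a different encoding does not
        // recover the original text: each byte becomes its own character.
        string misread = latin1.GetString(Encoding.UTF8.GetBytes(notSign));
        Console.WriteLine("UTF-8 bytes read as Latin-1 give " + misread.Length + " characters");
    }
}
```

Whichever encoding is chosen, the writer and the reader have to use the same one; the last line shows what happens when they do not.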

Aug 22 '07 #3

Jon Skeet [C# MVP]

It sounds like you should read up on the UTF-8 format. Using
Encoding.Default may well give you what you want, but UTF-8 is
generally a better format these days.

See http://pobox.com/~skeet/csharp/unicode.html and the referenced
links there.

Jon

Aug 22 '07 #4

UL-Tomten

On Aug 22, 4:04 pm, "Ignacio Machin $ .NET/ C# MVP $" <machin TA
laceupsolutions.com> wrote:

> Unicode characters use two bytes.

O RLY?

"Unicode" can be anything from UTF-7 to UTF-32. In this case, it was
UTF-8. Each uses a different number of bits to represent characters.
Also, UTF-8 uses anywhere between 1 and 4 bytes to represent a
character.

Perhaps you were thinking of "wide characters" from old-school Win32
programming?
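
The byte counts mentioned above can be checked with a small sketch. The class and method names (ByteCounts, Utf8Bytes, Utf16Bytes) and the four sample characters are chosen here purely for illustration; Encoding.Unicode is .NET's name for little-endian UTF-16.

```csharp
using System;
using System.Text;

public static class ByteCounts
{
    // Hypothetical helpers: number of bytes needed to encode the text.
    public static int Utf8Bytes(string text)
    {
        return Encoding.UTF8.GetByteCount(text);
    }

    public static int Utf16Bytes(string text)
    {
        return Encoding.Unicode.GetByteCount(text);
    }

    public static void Main()
    {
        // One sample per UTF-8 length: U+0041, U+00AC, U+20AC, U+1F600.
        string[] samples = { "A", "\u00AC", "\u20AC", "\U0001F600" };

        foreach (string sample in samples)
        {
            int codePoint = char.ConvertToUtf32(sample, 0);
            Console.WriteLine(
                "U+" + codePoint.ToString("X4") +
                ": UTF-8 " + Utf8Bytes(sample) + " byte(s)" +
                ", UTF-16 " + Utf16Bytes(sample) + " byte(s)");
        }
    }
}
```

The last sample lies outside the Basic Multilingual Plane, so it needs four bytes in UTF-8 and two 16-bit code units in UTF-16, which is why "two bytes per character" does not hold in general.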

Aug 22 '07 #5

Jon Skeet [C# MVP]

On Aug 22, 3:04 pm, "Ignacio Machin $ .NET/ C# MVP $" <machin TA
laceupsolutions.com> wrote:

> Unicode characters use two bytes.

True, but irrelevant in this case - the important thing is that the
UTF-8 encoded version of the relevant character takes two bytes.
(Other characters can take 1, 3 or 4.)

Jon

Aug 22 '07 #6

Steven Blair

Encoding.Default.GetBytes() does the job.

Thanks for the help.

Aug 22 '07 #7

Ignacio Machin $ .NET/ C# MVP $

Oops, too early in the morning and not enough coffee :)

"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:11**********************@q4g2000prc.googlegroups.com...

>> Unicode characters use two bytes.

> True, but irrelevant in this case - the important thing is that the
> UTF-8 encoded version of the relevant character takes two bytes.
Aug 22 '07 #8
