Before we start, let's get rid of the trees so that we can see the wood:
Dim plainText As String = "t" & ChrW(226) & ChrW(8226) & ChrW(144) & "e"
MessageBox.Show("before " & plainText)
Dim plainTextBytes As Byte() = Encoding.ASCII.GetBytes(plainText)
Dim decodeString As String = Encoding.ASCII.GetString(plainTextBytes)
MessageBox.Show("after " & decodeString)
The 1st MessageBox.Show displays "before tâ•e" and the 2nd displays
"after t???e", which is exactly correct and is also what is expected.
The reason that the 'apparent' space does not show between the • and the e
in the 1st MessageBox.Show is that the third character (144 decimal,
U+0090) is a control character with no visible glyph, so most fonts draw
it with no width or make it very narrow. It is not an em space: that is a
different character (U+2003), and an em is a width equal to the current
font size, not 1 point (1/72 of an inch).
The two preceding characters have the codes 226 decimal and 8226 decimal
respectively. The first of these is in the ANSI range, but the second has a
code above 255, so it cannot be held in a single byte.
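If you want to check the codes for yourself, a minimal sketch along these
lines should do it. The module name is made up for illustration, and the
string is built with ChrW on the assumption that the three characters are
the ones with the codes quoted above.

```vb
Imports System

Module CodePointSketch
    Sub Main()
        ' Assumed reconstruction of the string: t, 226, 8226, 144, e.
        Dim plainText As String = "t" & ChrW(226) & ChrW(8226) & ChrW(144) & "e"

        ' AscW returns the UTF-16 code of each character as an Integer.
        For Each c As Char In plainText
            Console.WriteLine(AscW(c))
        Next
        ' Prints 116, 226, 8226, 144 and 101, one per line.
    End Sub
End Module
```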
The documentation for the Encoding.ASCII.GetBytes method states
categorically that it 'encodes all the characters in the specified string
into a sequence of bytes'. Nowhere does it give the impression that it
'strips' characters out. When it comes across a non-ASCII character (any
code above 127) it substitutes the byte &H3F (63 decimal) which, of course,
is the ? character. ASCII control characters (codes 0 to 31 and 127) are
valid ASCII and pass through unchanged. Therefore, the "after t???e"
displayed by the 2nd MessageBox.Show is correct.
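To see the substitution at the byte level, something like the following
sketch can be run as a console program. The module name is hypothetical
and the string is again assumed to hold the characters 226, 8226 and 144.
It also prints the byte produced for a tab character, for comparison.

```vb
Imports System
Imports System.Text

Module AsciiBytesSketch
    Sub Main()
        ' Assumed reconstruction of the string: t, 226, 8226, 144, e.
        Dim plainText As String = "t" & ChrW(226) & ChrW(8226) & ChrW(144) & "e"

        ' Each character above 127 is encoded as 63, the code for "?".
        Dim plainTextBytes As Byte() = Encoding.ASCII.GetBytes(plainText)
        For Each b As Byte In plainTextBytes
            Console.Write(b.ToString() & " ")
        Next
        Console.WriteLine()
        ' Prints: 116 63 63 63 101

        ' A tab (code 9) is a control character but still ASCII.
        Dim tabBytes As Byte() = Encoding.ASCII.GetBytes(ChrW(9))
        Console.WriteLine(tabBytes(0).ToString())
        ' Prints: 9
    End Sub
End Module
```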
If you really want the non-ASCII and ASCII 'unprintable' characters stripped
out then you can use any number of algorithms that REMOVE the 'offending'
characters from the string. One such algorithm is demonstrated:
Dim plainText As String = "t" & ChrW(226) & ChrW(8226) & ChrW(144) & "e"
MessageBox.Show("before " & plainText)
Dim decodeString As String = String.Empty
For _i As Integer = 0 To plainText.Length - 1
    ' Keep only printable ASCII: codes 32 (space) to 126 (~)
    If AscW(plainText(_i)) >= 32 AndAlso AscW(plainText(_i)) < 127 Then
        decodeString &= plainText(_i)
    End If
Next
MessageBox.Show("after " & decodeString)
Now, the 1st MessageBox.Show displays "before tâ•e" and the 2nd displays
"after te", which is what you appear to want.
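As one more of the 'any number of algorithms', here is a sketch that does
the same job with a regular expression. It is only one possible approach,
the function name StripNonPrintableAscii is my own invention, and the test
string is assumed to hold the characters 226, 8226 and 144.

```vb
Imports System
Imports System.Text.RegularExpressions

Module StripSketch
    ' Hypothetical helper: removes every character outside codes 32 to 126.
    Function StripNonPrintableAscii(ByVal input As String) As String
        ' VB does not process backslash escapes in string literals,
        ' so the regex engine itself interprets \x20 and \x7E.
        Return Regex.Replace(input, "[^\x20-\x7E]", String.Empty)
    End Function

    Sub Main()
        ' Assumed reconstruction of the string: t, 226, 8226, 144, e.
        Dim plainText As String = "t" & ChrW(226) & ChrW(8226) & ChrW(144) & "e"
        Console.WriteLine("after " & StripNonPrintableAscii(plainText))
        ' Prints: after te
    End Sub
End Module
```

For long strings this avoids building the result one character at a time
with &=, which allocates a new string on every pass.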
"Dan" <Da*@discussions.microsoft.com> wrote in message
news:4F**********************************@microsoft.com...
>I have the following code section that I thought would strip out all the
non-ascii characters from a string after decoding it. Unfortunately the
non-ascii characters are still in the string.
What am I doing wrong?
Dim plainText As String
plainText = "t═e"
Dim plainTextBytes() As Byte
Dim enc As Encoding = Encoding.ASCII
plainTextBytes = enc.GetBytes(plainText)
Dim str As String
str = enc.GetString(plainTextBytes).ToString
MessageBox.Show("before " & str)
Dim decodeString As String = enc.GetString(plainTextBytes)
MessageBox.Show("after " & decodeString)
Any help would be greatly appreciated.
Dan