StreamReader omits 0x93 and 0x94 when reading text file

Hello,

I'm using the following code to read a text file in VB.NET.

Dim sr As StreamReader = File.OpenText(strFilePath)
Dim input As String = sr.ReadLine()

While Not input Is Nothing
    strReturn += input + vbCrLf
    input = sr.Read
End While

sr.Close()

For most cases this works fine. However, we have found that the opening
(0x93 - “) and closing (0x94 - ”) quotation marks are being dropped without
warning or error.

E.g.
Original Text: This is some “quoted” text.
Text read in: This is some quoted text.

Does anyone have any clues as to what is going on here? Any advice is
appreciated.

Sincerely,
Drew Berkemeyer
Nov 20 '05 #1
* "Drew Berkemeyer" <sp**@menow.com> scripsit:
> I'm using the following code to read a text file in VB.NET.
>
> Dim sr As StreamReader = File.OpenText(strFilePath)
> Dim input As String = sr.ReadLine()
>
> While Not input Is Nothing
> strReturn += input + vbCrLf
> input = sr.Read

'Read' or 'ReadLine'?

> End While
>
> sr.Close()
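
For reference, here is a minimal sketch of why the distinction matters. `Read` returns the next character as an Integer (or -1 at the end of the input), while `ReadLine` returns a String (or Nothing at the end), so only `ReadLine` fits the `While Not input Is Nothing` loop. The sketch uses a StringReader with made-up sample text rather than the file from the original post, and the module name ReadVsReadLine is hypothetical.

```vbnet
Imports System
Imports System.IO

Module ReadVsReadLine
    Sub Main()
        ' Sample text is made up for illustration: two lines, "ab" and "cd".
        Dim reader As New StringReader("ab" & Environment.NewLine & "cd")

        ' Read() consumes one character and returns its code as an Integer.
        Dim code As Integer = reader.Read()
        Console.WriteLine(code)                           ' 97, the code of "a"

        ' ReadLine() returns the rest of the current line as a String.
        Console.WriteLine(reader.ReadLine())              ' b
        Console.WriteLine(reader.ReadLine())              ' cd

        ' At the end of the input ReadLine() returns Nothing, which is what
        ' ends the loop in the original post; Read() returns -1 instead.
        Console.WriteLine(reader.ReadLine() Is Nothing)   ' True
        Console.WriteLine(reader.Read())                  ' -1

        reader.Close()
    End Sub
End Module
```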


--
Herfried K. Wagner [MVP]
<URL:http://dotnet.mvps.org/>
Nov 20 '05 #2
Hi Drew,

In addition to Herfried's suggestion, it seems that the " character is
represented by 0x22 in ASCII.
You may try the code below.
Sub Main()
    For i As Integer = 0 To 255
        Console.WriteLine("0x" + Hex(i) + ": " + Chr(i))
    Next
End Sub

We get the result below.
0x22: "

and
0x93:
0x94:
These two are control characters; they are not printable characters.
If I have misunderstood anything, please feel free to let me know.

Best regards,

Peter Huang
Microsoft Online Partner Support

Get Secure! - www.microsoft.com/security
This posting is provided "AS IS" with no warranties, and confers no rights.

Nov 20 '05 #3
Thank you for the reply, but...

ASCII 0x93 and 0x94 are *not* control characters. They are the open and
close quotes.

If I open Notepad and type Alt+0147 (0x93) I get “. If I type Alt+0148
(0x94) I get ”.

So, to answer my own question posted earlier... Here's the solution. I was
not using the proper Encoding. I had created a StreamReader like this:

Dim sr As StreamReader = New StreamReader(strFilePath)

The correct code for reading in a plain text file (which does not eat “
and ” chars) is:

Dim sr As StreamReader = New StreamReader(strFilePath, System.Text.Encoding.Default)
Dim input As String = sr.ReadLine()

While Not input Is Nothing
    strReturn += input + vbCrLf
    input = sr.ReadLine()
End While

sr.Close()

Thanks again for your help. I appreciate the effort.

- Drew
Nov 20 '05 #4
* "Drew Berkemeyer" <sp**@menow.com> scripsit:
> Thank you for the reply, but...
>
> ASCII 0x93 and 0x94 are *not* control characters. They are the open and
> close quotes.

They are not quotes by definition. ASCII is a 7-bit encoding that
doesn't include more than 128 characters.

<URL:http://www.asciitable.com/>

> If I open notepad and type Alt+0147 (0x93) I get “. If I type Alt+0148
> (0x94) I get ”.

Right, but that's not ASCII.

--
Herfried K. Wagner [MVP]
<URL:http://dotnet.mvps.org/>
Nov 20 '05 #5
Drew,
With the high bit stripped, 0x93 becomes 0x13 and 0x94 becomes 0x14 in ASCII.
As Herfried stated, ASCII is a 7-bit character set; the high bit is ignored
at best and causes an exception at worst (these bytes are simply not valid
ASCII).

When you open Notepad you are in ANSI, with a specific code page (the code
page is defined in the Windows Control Panel). ANSI code pages use full 8-bit
characters. 0x93 & 0x94 are typographic quote characters in the US ANSI code
page; I believe they are typographic quote characters in most European ANSI
code pages also.

Based on your original post, it appears you are using
System.IO.File.OpenText, which opens the file as UTF-8. 0x93 & 0x94 are NOT
typographic quote characters in UTF-8! UTF-8 is a variable-length encoding of
Unicode built from 8-bit units: code points 128 and above are encoded as
multi-byte sequences, so a lone 0x93 or 0x94 byte is not a valid character.
The typographic quote characters are U+201C and U+201D in Unicode.
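
As a rough illustration of the point above, the following sketch decodes the same three bytes once as code page 1252 and once as UTF-8. The byte values and the names DecodeDemo and CodePoints are made up for the example. It assumes the .NET Framework, where code page 1252 is available out of the box. What the UTF-8 decoder does with the two invalid bytes depends on the runtime version (as far as I know they are either dropped or replaced with U+FFFD), so no output is claimed for that line.

```vbnet
Imports System
Imports System.Text

Module DecodeDemo
    ' Formats each character of a string as its Unicode code point.
    Function CodePoints(ByVal s As String) As String
        Dim sb As New StringBuilder()
        For Each c As Char In s
            sb.AppendFormat("U+{0:X4} ", Convert.ToInt32(c))
        Next
        Return sb.ToString().Trim()
    End Function

    Sub Main()
        ' Bytes as they would sit in an ANSI text file: 0x93, "A", 0x94.
        Dim raw As Byte() = New Byte() {&H93, &H41, &H94}

        ' Assumption: .NET Framework. On .NET Core / .NET 5+ code page 1252
        ' must first be registered via CodePagesEncodingProvider.
        Dim cp1252 As Encoding = Encoding.GetEncoding(1252)

        ' Decoded as code page 1252 the bytes become typographic quotes.
        Console.WriteLine(CodePoints(cp1252.GetString(raw)))   ' U+201C U+0041 U+201D

        ' Decoded as UTF-8 the two lone bytes are invalid, so no quote
        ' characters come out; the exact result depends on the runtime.
        Console.WriteLine(CodePoints(Encoding.UTF8.GetString(raw)))
    End Sub
End Module
```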

To see a full explanation of Unicode and encodings, see:

http://www.yoda.arachsys.com/csharp/unicode.html
To see the Unicode code point for a character in Character Map, look in the
lower left corner of the window. It gives the Unicode code point, while the
lower right gives the ANSI keyboard shortcut. Note that the "character set"
combo box in Character Map corresponds to the Encoding in .NET.

Common Unicode typographic quote chars include, but are not limited to:

' what most people think of as quote chars
Const Apostrophe As Char = ChrW(&H27) ' single quotes
Const Quote As Char = ChrW(&H22) ' double quotes

' various typographic quote characters
Const LeftSingleQuote As Char = ChrW(&H2018)
Const RightSingleQuote As Char = ChrW(&H2019)
Const LeftDoubleQuote As Char = ChrW(&H201C)
Const RightDoubleQuote As Char = ChrW(&H201D)

' other typographic quote characters (international)
' Note: HP48 uses these for delimiters
Const LeftPointingDoubleAngleQuote As Char = ChrW(&HAB)
Const RightPointingDoubleAngleQuote As Char = ChrW(&HBB)

' other typographic quote characters (international)
Const SingleLow9Quote As Char = ChrW(&H201A)
Const SingleHighReversed9Quote As Char = ChrW(&H201B)
Const DoubleLow9Quote As Char = ChrW(&H201E)

The above are Unicode code points, valid for Unicode encodings (such as UTF-8).
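
Going the other way, a small sketch of how the two double-quote characters end up as bytes in each encoding. The module name EncodeDemo is hypothetical, and the sketch again assumes the .NET Framework, where code page 1252 needs no extra registration.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module EncodeDemo
    Sub Main()
        ' Left and right typographic double quotes (U+201C, U+201D).
        Dim quotes As String = ChrW(&H201C) & ChrW(&H201D)

        ' Assumption: .NET Framework. On .NET Core / .NET 5+ code page 1252
        ' must first be registered via CodePagesEncodingProvider.
        Dim cp1252 As Encoding = Encoding.GetEncoding(1252)

        ' In code page 1252 each quote is a single byte.
        Console.WriteLine(BitConverter.ToString(cp1252.GetBytes(quotes)))          ' 93-94

        ' In UTF-8 each quote is a three-byte sequence.
        Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetBytes(quotes)))   ' E2-80-9C-E2-80-9D
    End Sub
End Module
```

This is why a file containing the single bytes 0x93 and 0x94 cannot be read correctly as UTF-8: the UTF-8 form of the same characters is a different byte sequence.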

Hope this helps
Jay
Nov 20 '05 #6
Drew,
I should add: to read a text file (Notepad file) in your default Windows
encoding, you can use the following:

Imports System.IO
Imports System.Text

Dim sr As New StreamReader(strFilePath, Encoding.Default)

Hope this helps
Jay
Nov 20 '05 #7
Thank you to both Herfried and Jay.

I should have come back here sooner! I've spent the last week going round
and round with this only to discover exactly what you posted! <sigh> Thank
you so much.

Just as you explained, my problem was that I was using the incorrect Encoding
object. After much research into both the available options (which was not
straightforward) and the content of the file I am opening (in this case
created with Notepad on Windows), I realized I should be using the following:

Imports System.IO
Imports System.Text

Dim sr As New StreamReader(strFilePath, Encoding.GetEncoding("windows-1252"))

I chose "windows-1252" because it will produce the same results on all
systems and is not dependent on (as pointed out by Jay) the default code
page settings of the computer running the code.
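
Put together, the whole read loop might look like the sketch below. The names ReadAnsiFile and ReadAllText1252 are made up for illustration, it uses a StringBuilder and a Using block (VB 2005 or later) instead of the string concatenation from my original post, and it assumes the .NET Framework, where "windows-1252" is available without extra setup.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module ReadAnsiFile
    ' Reads a whole text file as windows-1252 and returns its lines, each
    ' followed by the platform line break (CRLF on Windows).
    Function ReadAllText1252(ByVal filePath As String) As String
        ' Assumption: .NET Framework. On .NET Core / .NET 5+ this encoding
        ' must first be registered via CodePagesEncodingProvider.
        Dim enc As Encoding = Encoding.GetEncoding("windows-1252")
        Dim result As New StringBuilder()

        ' Using closes the reader even if an exception is thrown.
        Using sr As New StreamReader(filePath, enc)
            Dim line As String = sr.ReadLine()
            While line IsNot Nothing
                result.AppendLine(line)
                line = sr.ReadLine()
            End While
        End Using

        Return result.ToString()
    End Function

    Sub Main(ByVal args As String())
        ' Pass the path of the text file as the first argument.
        If args.Length > 0 Then
            Console.WriteLine(ReadAllText1252(args(0)))
        End If
    End Sub
End Module
```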

Thank you all for your assistance. I'm glad to get this one behind me and
learn something new.

Sincerely,
Drew Berkemeyer
Nov 20 '05 #8
