StreamReaders and encoding

I have the following function I use in my application quite a bit (I
missed the VFP one and decided to make my own):

Public Shared Function File2String(ByVal strFile)
    'Open a file for reading
    Dim strFilename As String = strFile
    'Get a StreamReader class that can be used to read the file
    Dim objStreamReader As System.IO.StreamReader

    Try
        objStreamReader = System.IO.File.OpenText(strFilename)
    Catch ex As Exception
        Return Nothing
    End Try

    Dim str As String = objStreamReader.ReadToEnd
    objStreamReader.Close()
    Return str
End Function

It's been working well, but I just found out it doesn't correctly read a
text file with a different encoding. I have a text file with some French
accents in it, like "acheté". My function would return "achet", dropping
the é completely. I'm not sure how to address this and it's very
important to make it continue to work as it has with the plain English
files I usually use it with. Anyone know how to address this? Thanks!

Matt
Nov 19 '05 #1
Same answer as always: Use the correct encoding. File.OpenText() always
uses a UTF-8 StreamReader implicitly. Create your own StreamReader
instance instead and specify the desired encoding.

// Requires: using System.IO; using System.Text;
using (StreamReader reader =
    new StreamReader(@"C:\Foo\Bar.txt", Encoding.Default))
{
    // ...
}

Note that Encoding.Default is your OS default encoding (most likely
Windows-1252) and represents the best guess if UTF-8 doesn't apply.
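
To make that concrete, here is a minimal C# sketch of the original File2String with the encoding passed in by the caller. The class name and the two-argument signature are made up for illustration; they are not from the original post. The demo uses ISO-8859-1 only because that encoding is available on every .NET runtime (é is byte 0xE9 in both ISO-8859-1 and Windows-1252). Also keep in mind that Encoding.Default means the ANSI code page only on .NET Framework; on .NET Core and later it is UTF-8. Plain ASCII files decode identically under all of these, so the English files should keep working.

```csharp
using System;
using System.IO;
using System.Text;

static class File2StringDemo
{
    // Hypothetical variant of File2String that takes the encoding explicitly.
    // Returns null when the file cannot be opened or read, mirroring the
    // original's "Return Nothing" (but catching I/O errors only).
    public static string File2String(string path, Encoding encoding)
    {
        try
        {
            // This constructor still honours a byte order mark if the file has one.
            using (StreamReader reader = new StreamReader(path, encoding))
            {
                return reader.ReadToEnd();
            }
        }
        catch (IOException)
        {
            return null;
        }
    }

    static void Main()
    {
        // "acheté" stored as single-byte text: the é is the lone byte 0xE9.
        string path = Path.GetTempFileName();
        File.WriteAllBytes(path, new byte[] { 0x61, 0x63, 0x68, 0x65, 0x74, 0xE9 });

        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        Console.WriteLine(File2String(path, latin1) == "achet\u00E9");        // True
        Console.WriteLine(File2String(path, Encoding.UTF8) == "achet\u00E9"); // False

        File.Delete(path);
    }
}
```

The second line prints False because a lone 0xE9 is not valid UTF-8, so the decoder cannot turn it into é. That matches the symptom described in the question.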

Cheers,
--
http://www.joergjooss.de
mailto:ne********@joergjooss.de
Nov 19 '05 #2
Thanks for the reply!

Do you know if I can detect the encoding of the text file somehow, so
this app will work correctly with differently encoded text files?

Got any links or examples?

Thanks again!

Matt
Nov 19 '05 #3
OK, so I tried creating the StreamReader as you said, and I tried every
encoding I could and nothing could read my text file with French
characters correctly. For example, the word "acheté" comes across as
"achet".
It's entirely possible (even likely) I'm taking the wrong approach.
Can anyone with US English Windows put the word "acheté" in a text file
and have the last character come through?

Maybe I'll try reading it as binary next...

Any suggestions appreciated!

Matt
Nov 19 '05 #4
There's no such thing as binary text. There are only bytes, which may
become meaningful text once they are decoded to characters.

The only way to solve this problem is to understand which character
encoding is being used. Can you load the file in a hex editor and try
to find out what bytes are used to represent the 'é'?
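
If no hex editor is at hand, a few lines of C# can print the raw bytes instead. This is only a sketch; the class and method names are invented for illustration. As a rough guide when reading the output: a lone E9 points to a single-byte Western encoding (Windows-1252 or ISO-8859-1), C3 A9 is UTF-8, E9 00 is UTF-16 little-endian, and 82 suggests an OEM code page such as 437 or 850.

```csharp
using System;
using System.IO;

static class ByteDump
{
    // Formats up to maxBytes bytes as hyphen-separated hex pairs, e.g. "61-63-68".
    public static string Hex(byte[] data, int maxBytes)
    {
        int count = Math.Min(data.Length, maxBytes);
        return BitConverter.ToString(data, 0, count);
    }

    static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.WriteLine("usage: ByteDump <file>");
            return;
        }

        // Read the file as raw bytes; no decoding happens here.
        byte[] data = File.ReadAllBytes(args[0]);
        Console.WriteLine(Hex(data, 256));
    }
}
```

Find the bytes that follow 61-63-68-65-74 ("achet") and compare them with the list above.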

Cheers,
--
http://www.joergjooss.de
mailto:ne********@joergjooss.de
Nov 19 '05 #5
MattB wrote:
Thanks for the reply!

Do you know if I can detect the encoding of the text file somehow, so
this app will work correctly with differently encoded text files?


I answered this yesterday -- see http://tinyurl.com/cn7z8.

Got any links or examples?


See Jon Skeet's page on Unicode and .NET:
http://www.yoda.arachsys.com/csharp/unicode.html
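
As a rough illustration of what detection can look like, here is a C# sketch of one common heuristic. It is an assumption-laden example, not the method from the linked pages: check for a byte order mark first, then try strict UTF-8, and only then fall back to an encoding the caller chooses. The class and method names are invented. Without a BOM the result is a guess, since single-byte encodings cannot be told apart from the bytes alone.

```csharp
using System;
using System.Text;

static class EncodingGuess
{
    // Hypothetical helper: decodes raw file bytes using a BOM check, then a
    // strict UTF-8 attempt, then the caller's fallback encoding.
    // Limitation: a UTF-32 LE BOM (FF FE 00 00) would be mistaken for UTF-16 LE.
    public static string Decode(byte[] data, Encoding fallback)
    {
        if (data.Length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
            return Encoding.UTF8.GetString(data, 3, data.Length - 3);
        if (data.Length >= 2 && data[0] == 0xFF && data[1] == 0xFE)
            return Encoding.Unicode.GetString(data, 2, data.Length - 2);
        if (data.Length >= 2 && data[0] == 0xFE && data[1] == 0xFF)
            return Encoding.BigEndianUnicode.GetString(data, 2, data.Length - 2);

        try
        {
            // throwOnInvalidBytes: true makes malformed UTF-8 throw instead of
            // being silently replaced, so a failure means "not UTF-8".
            return new UTF8Encoding(false, true).GetString(data);
        }
        catch (DecoderFallbackException)
        {
            return fallback.GetString(data);
        }
    }

    static void Main()
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        byte[] ansi = { 0x61, 0x63, 0x68, 0x65, 0x74, 0xE9 };       // single-byte é
        byte[] utf8 = { 0x61, 0x63, 0x68, 0x65, 0x74, 0xC3, 0xA9 }; // UTF-8 é

        Console.WriteLine(Decode(ansi, latin1) == "achet\u00E9"); // True
        Console.WriteLine(Decode(utf8, latin1) == "achet\u00E9"); // True
    }
}
```

Plain ASCII files pass the strict UTF-8 step unchanged, so files that read correctly before should still read the same way.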

Cheers,
--
http://www.joergjooss.de
mailto:ne********@joergjooss.de
Nov 19 '05 #6
