StreamReaders and encoding

I have the following function I use in my application quite a bit (I
missed the VFP one and decided to make my own):

Public Shared Function File2String(ByVal strFile)
    'Open a file for reading
    Dim strFilename As String = strFile
    'Get a StreamReader class that can be used to read the file
    Dim objStreamReader As System.IO.StreamReader

    Try
        objStreamReader = System.IO.File.OpenText(strFilename)
    Catch ex As Exception
        Return Nothing
    End Try

    Dim str As String = objStreamReader.ReadToEnd
    objStreamReader.Close()
    Return str
End Function

It's been working well, but I just found out it doesn't correctly read a
text file with a different encoding. I have a text file with some French
accents in it, like "acheté". My function would return "achet", dropping
the é completely. I'm not sure how to address this and it's very
important to make it continue to work as it has with the plain English
files I usually use it with. Anyone know how to address this? Thanks!

Matt
Nov 19 '05 #1
Same answer as always: Use the correct encoding. File.OpenText() always
uses a UTF-8 StreamReader implicitly. Create your own StreamReader
instance instead and specify the desired encoding.

using (StreamReader reader =
    new StreamReader(@"C:\Foo\Bar.txt", Encoding.Default))
{
    // ...
}

Note that Encoding.Default is your OS default encoding (most likely
Windows-1252) and represents the best guess if UTF-8 doesn't apply.

Cheers,
--
http://www.joergjooss.de
mailto:ne********@joergjooss.de
Nov 19 '05 #2
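
To make the advice in the reply above concrete, here is a minimal C# sketch of the original helper with the encoding passed in explicitly. The class name FileText and the two-argument shape are assumptions for illustration and do not come from the thread; StreamReader(string, Encoding) and Encoding.Default are the same calls the reply uses. One caveat: on .NET Core and later, Encoding.Default is UTF-8 rather than the OS ANSI code page, so the sketch matches the .NET Framework behavior the reply describes.

```csharp
using System;
using System.IO;
using System.Text;

public static class FileText
{
    // Hypothetical variant of File2String: the caller chooses the encoding.
    public static string File2String(string path, Encoding encoding)
    {
        try
        {
            // The reader decodes the file's bytes with the given encoding
            // (a byte order mark, if present, still takes precedence).
            using (StreamReader reader = new StreamReader(path, encoding))
            {
                return reader.ReadToEnd();
            }
        }
        catch (IOException)
        {
            // Like the original helper, signal failure with null.
            // Only I/O errors (missing file, bad path, ...) are caught here.
            return null;
        }
    }

    public static void Main(string[] args)
    {
        // The path comes from the command line; no file name is assumed.
        if (args.Length == 0)
        {
            Console.WriteLine("usage: FileText <path>");
            return;
        }

        string text = File2String(args[0], Encoding.Default);
        Console.WriteLine(text ?? "(could not read file)");
    }
}
```

Plain ASCII files decode to the same text under UTF-8 and Windows-1252, so the English-only files from the question should read the same either way.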
Thanks for the reply!

Do you know if I can detect the encoding of the text file somehow, so
this app will work correctly with differently encoded text files?

Got any links or examples?

Thanks again!

Matt
Nov 19 '05 #3
OK, so I tried creating the StreamReader as you said, and I tried every
encoding I could and nothing could read my text file with French
characters correctly. For example, the word "acheté" comes across as
"achet".
It's entirely possible (even likely) I'm taking the wrong approach.
Can anyone with US English Windows put the word "acheté" in a text file
and have the last character come through?

Maybe I'll try reading it as binary next...

Any suggestions appreciated!

Matt
Nov 19 '05 #4
There's no such thing as binary text. There are only bytes, which after
decoding them to characters, may become meaningful text.

The only way to solve this problem is to understand which character
encoding is being used. Can you load the file in a hex editor and try
to find out what bytes are used to represent the 'é'?

Cheers,
--
http://www.joergjooss.de
mailto:ne********@joergjooss.de
Nov 19 '05 #5
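
If no hex editor is at hand, a few lines of C# can do the same inspection. The sketch below is only an illustration (the class name ByteDump and the 64-byte limit are arbitrary choices, not from the thread). It prints the bytes that "acheté" produces under UTF-8 and under ISO-8859-1, which agrees with Windows-1252 for 'é', and then dumps the start of a file given on the command line so the two can be compared.

```csharp
using System;
using System.IO;
using System.Text;

public static class ByteDump
{
    public static void Main(string[] args)
    {
        // The sample word from the thread; \u00E9 is 'é'.
        string word = "achet\u00E9";

        // UTF-8 encodes 'é' as two bytes.
        Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetBytes(word)));
        // prints 61-63-68-65-74-C3-A9

        // ISO-8859-1 encodes 'é' as the single byte E9.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        Console.WriteLine(BitConverter.ToString(latin1.GetBytes(word)));
        // prints 61-63-68-65-74-E9

        // Optionally dump the first bytes of a real file for comparison.
        if (args.Length > 0)
        {
            byte[] raw = File.ReadAllBytes(args[0]);
            int count = Math.Min(raw.Length, 64);
            Console.WriteLine(BitConverter.ToString(raw, 0, count));
        }
    }
}
```

A lone E9 in the file would suggest a single-byte encoding such as Windows-1252, while C3 A9 would suggest UTF-8. Other byte patterns point to some other encoding.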
MattB wrote:
Thanks for the reply!

Do you know if I can detect the encoding of the text file somehow, so
this app will work correctly with differently encoded text files?


I answered this yesterday -- see http://tinyurl.com/cn7z8.

MattB wrote:
Got any links or examples?


See Jon Skeet's page on Unicode and .NET:
http://www.yoda.arachsys.com/csharp/unicode.html

Cheers,
--
http://www.joergjooss.de
mailto:ne********@joergjooss.de
Nov 19 '05 #6
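
On the detection question, no method is fully reliable without outside knowledge of the file, but one common heuristic is to try strict UTF-8 first and fall back to a single-byte encoding when the bytes are not valid UTF-8. The sketch below assumes that approach. The names EncodingGuess and Decode are hypothetical, and this is not necessarily what the linked pages recommend. It handles only a UTF-8 byte order mark and ignores UTF-16 and UTF-32.

```csharp
using System;
using System.Text;

public static class EncodingGuess
{
    // Hypothetical helper: decode bytes as UTF-8 when they are valid UTF-8,
    // otherwise decode them with the caller's fallback encoding.
    public static string Decode(byte[] raw, Encoding fallback)
    {
        // Skip a UTF-8 byte order mark (EF BB BF) if one is present.
        int offset = 0;
        if (raw.Length >= 3 && raw[0] == 0xEF && raw[1] == 0xBB && raw[2] == 0xBF)
        {
            offset = 3;
        }

        try
        {
            // The second argument (throwOnInvalidBytes) makes malformed
            // input raise an exception instead of being silently altered.
            Encoding strictUtf8 = new UTF8Encoding(false, true);
            return strictUtf8.GetString(raw, offset, raw.Length - offset);
        }
        catch (DecoderFallbackException)
        {
            // Not valid UTF-8: treat the bytes as the fallback encoding.
            return fallback.GetString(raw);
        }
    }

    public static void Main()
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        string expected = "achet\u00E9";

        byte[] asLatin1 = latin1.GetBytes(expected);       // ends in E9
        byte[] asUtf8 = Encoding.UTF8.GetBytes(expected);  // ends in C3 A9

        // Both inputs should come back as the same string.
        Console.WriteLine(Decode(asLatin1, latin1) == expected);
        Console.WriteLine(Decode(asUtf8, latin1) == expected);
    }
}
```

To apply this to a file, the bytes could come from File.ReadAllBytes(path), with Encoding.Default or another code page as the fallback. The heuristic can guess wrong in the rare case where text in a single-byte encoding also happens to be valid UTF-8, and pure ASCII files decode identically either way.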
