ASCII vs Unicode

Hi -

I'm setting up a StreamReader in a VB.NET app to read a text file and
display its contents in a multiline textbox.

If I set it up with System.Text.Encoding.Unicode, it reads a Unicode file
just fine. If I set it up as ASCII, it reads a non-Unicode text file. But
I don't know the file format in advance.

How can my app determine whether to use Unicode encoding before I read the
file?
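One common approach is to peek at the first few bytes for a byte order mark (BOM) before choosing a decoder. This is a sketch, not the only option. The module and function names (`BomSniffer`, `EncodingFromBom`) are hypothetical, and the function returns `Nothing` when there is no BOM, in which case the encoding still has to be guessed or assumed.

```vbnet
Imports System.Text

Module BomSniffer
    ' Hypothetical helper: maps a leading byte order mark to an Encoding.
    ' Returns Nothing when the bytes do not start with a recognized BOM.
    ' UTF-32 BOMs are not handled here (FF FE 00 00 would be reported as UTF-16LE).
    Public Function EncodingFromBom(ByVal head As Byte()) As Encoding
        If head Is Nothing Then Return Nothing

        ' UTF-8 BOM: EF BB BF
        If head.Length >= 3 AndAlso head(0) = &HEF AndAlso head(1) = &HBB AndAlso head(2) = &HBF Then
            Return Encoding.UTF8
        End If

        If head.Length >= 2 Then
            ' UTF-16 little-endian BOM: FF FE
            If head(0) = &HFF AndAlso head(1) = &HFE Then Return Encoding.Unicode
            ' UTF-16 big-endian BOM: FE FF
            If head(0) = &HFE AndAlso head(1) = &HFF Then Return Encoding.BigEndianUnicode
        End If

        Return Nothing
    End Function
End Module
```

To apply it to a file, read its first bytes (for a small file, `File.ReadAllBytes` is enough) and pass them in.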

- Jeff
Nov 21 '05 #1
Hi Jeff,

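Based on my test, we do not need to specify the encoding when reading a
file into a string; the .NET Framework handles it. StreamReader (which
File.OpenText returns) checks the start of the file for a byte order mark
and picks the matching decoding, falling back to UTF-8 when there is none.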
Imports System.IO

Module Module1
    Private Const FILE_NAME As String = "c:\unicode.txt"
    Private Const FILE_NAME1 As String = "c:\ascii.txt"

    Public Sub Main()
        If Not File.Exists(FILE_NAME) Then
            Console.WriteLine("{0} does not exist.", FILE_NAME)
            Return
        End If

        ' File.OpenText reads as UTF-8 but switches encoding if it finds a BOM.
        Dim sr As StreamReader = File.OpenText(FILE_NAME)
        Dim input As String = sr.ReadToEnd()
        Console.WriteLine(input)
        sr.Close()

        sr = File.OpenText(FILE_NAME1)
        input = sr.ReadToEnd()
        Console.WriteLine(input)
        sr.Close()
    End Sub
End Module
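A small sketch of what the framework does here: `StreamReader` (and therefore `File.OpenText`) defaults to UTF-8 but, because byte order mark detection is on by default, switches to UTF-16 when the data begins with FF FE. `CurrentEncoding` reports the result after the first read. A `MemoryStream` stands in for the file; the module and function names are illustrative.

```vbnet
Imports System.IO
Imports System.Text

Module DetectDemo
    ' Illustrative helper: decodes bytes with StreamReader's default settings
    ' (UTF-8 fallback, BOM detection on) and reports the encoding it settled on.
    Public Function ReadWithDetection(ByVal data As Byte(), ByRef detected As Encoding) As String
        Using sr As New StreamReader(New MemoryStream(data))
            Dim text As String = sr.ReadToEnd()
            ' CurrentEncoding is only reliable after the first read.
            detected = sr.CurrentEncoding
            Return text
        End Using
    End Function

    Sub Main()
        Dim enc As Encoding = Nothing
        ' "=" stored as UTF-16LE with a BOM: FF FE 3D 00
        Dim s As String = ReadWithDetection(New Byte() {&HFF, &HFE, &H3D, &H0}, enc)
        Console.WriteLine("{0} via {1}", s, enc.WebName) ' = via utf-16
    End Sub
End Module
```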

Best regards,

Peter Huang
Microsoft Online Partner Support

Get Secure! - www.microsoft.com/security
This posting is provided "AS IS" with no warranties, and confers no rights.

Nov 21 '05 #2
Thanks for responding, Peter -

I wish my results were the same as yours. I've attached a couple of files,
ASCII.txt and Unicode.txt. My code is below, FYI.

I've got a form with a textbox on it and a couple of radio buttons (encode
or not). To keep it simple, I also included two buttons, one for each
file. I display the result of the function below in the textbox.

With encoding, the Unicode file displays fine, and the ASCII file is a
string of unreadable characters. With no encoding, the ASCII file displays
fine, and the Unicode file only displays the first character (=).

I've tried replacing the If-Else-End If block of the code with "rdr =
File.OpenText(rFile)", but my results are the same as the no encoding
results just described.
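The first symptom can be reproduced without any files. Forcing UTF-16 onto single-byte ASCII text pairs up adjacent bytes into one 16-bit code unit, which is why the ASCII file turns into unreadable characters. This is a sketch; the module and function names are illustrative.

```vbnet
Imports System.Text

Module SymptomDemo
    ' Illustrative: decode plain ASCII bytes as if they were UTF-16LE.
    Public Function AsciiReadAsUtf16(ByVal text As String) As String
        Dim bytes As Byte() = Encoding.ASCII.GetBytes(text)
        ' Each pair of bytes becomes one UTF-16 code unit (low byte first).
        Return Encoding.Unicode.GetString(bytes)
    End Function

    Sub Main()
        Dim garbled As String = AsciiReadAsUtf16("Hi")
        ' "H" = &H48 and "i" = &H69, so the pair decodes to U+6948.
        Console.WriteLine(AscW(garbled(0)).ToString("X4")) ' 6948
    End Sub
End Module
```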

Note that the Unicode file that I'm trying to read is a log file created by
Microsoft's SQL Server Desktop Engine Setup.exe.

I'd appreciate additional help to solve this problem.

- Jeff
My code:

Imports System.IO
Imports System.Windows.Forms

' (member of the form or module that can see frmMain)
Public Function ReadFile(ByVal rFile As String) As String
    ReadFile = ""

    Try
        If Not File.Exists(rFile) Then
            MessageBox.Show("File Not Found." & ControlChars.CrLf & rFile)
            Exit Function
        End If

        Dim rdr As StreamReader
        If frmMain.optUnicode.Checked Then
            ' UTF-16LE is used unless the file starts with a different BOM.
            rdr = New StreamReader(rFile, System.Text.Encoding.Unicode)
        Else
            ' UTF-8 is used unless the file starts with a BOM.
            rdr = New StreamReader(rFile)
        End If

        ' Using closes the reader even if ReadToEnd throws.
        Using rdr
            ReadFile = rdr.ReadToEnd()
        End Using

    Catch ex As Exception
        MessageBox.Show(ex.ToString)
    End Try
End Function

Nov 21 '05 #3
Hi Jeff,

FF FE is the Unicode byte order mark (BOM) as it appears in UTF-16
little-endian files; it tells the reader that the bytes that follow are
little-endian UTF-16. If you write a string to a file with Unicode encoding,
as below, you will find that the file begins with the bytes FF FE.
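' input is the string read in the earlier example.
' append:=False so the file starts fresh and begins with the BOM.
Using wr As New StreamWriter("C:\testuni.txt", False, System.Text.Encoding.Unicode)
    wr.Write(input)
End Using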

If you save text in Notepad with Unicode chosen as the encoding in the Save As
dialog, you will find the same FF FE at the start of the file.
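To confirm the BOM without a hex editor, one option is to write to a `MemoryStream` instead of a file and look at the first bytes. This sketch assumes the writer is created with `Encoding.Unicode`, whose preamble is FF FE; the module and function names are illustrative.

```vbnet
Imports System.IO
Imports System.Text

Module BomWriteDemo
    ' Illustrative: returns the bytes StreamWriter produces for a string
    ' when the writer is created with Encoding.Unicode (UTF-16LE).
    Public Function WriteUtf16(ByVal text As String) As Byte()
        Dim ms As New MemoryStream()
        Using wr As New StreamWriter(ms, Encoding.Unicode)
            wr.Write(text)
        End Using ' disposing the writer flushes the BOM and the text
        ' ToArray still works after the MemoryStream has been closed.
        Return ms.ToArray()
    End Function

    Sub Main()
        Console.WriteLine(BitConverter.ToString(WriteUtf16("="))) ' FF-FE-3D-00
    End Sub
End Module
```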

The difference between my Unicode file and yours is that the unicode.txt
you provided does not begin with FF FE, and that is what causes the
problem: StreamReader needs that mark to detect the encoding on its own
and decode the byte stream into a string correctly.

Actually, if you use StreamReader to read your unicode.txt into a string
and then print the string with Console.WriteLine, you will find that the
text is displayed correctly but with a space between every two
characters.
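The "space between every two characters" effect can be reproduced in memory. Without a BOM, the default `StreamReader` falls back to UTF-8, so every 00 byte of the UTF-16 data becomes a separate NUL character (U+0000), which a console typically renders as a blank. This is probably also why Jeff's textbox shows only "=": the underlying Windows edit control likely treats the first NUL as the end of the text. The bytes below are "Hi" encoded as UTF-16LE without a BOM; the module and function names are illustrative.

```vbnet
Imports System.IO
Imports System.Text

Module NoBomDemo
    ' Illustrative: read BOM-less UTF-16LE bytes with StreamReader defaults.
    Public Function ReadDefault(ByVal data As Byte()) As String
        Using sr As New StreamReader(New MemoryStream(data))
            Return sr.ReadToEnd()
        End Using
    End Function

    Sub Main()
        ' Encoding.Unicode.GetBytes does not add a BOM: 48 00 69 00
        Dim raw As Byte() = Encoding.Unicode.GetBytes("Hi")
        Dim s As String = ReadDefault(raw)
        Console.WriteLine(s.Length) ' 4: "H", NUL, "i", NUL
        ' Make the NULs visible by replacing them with spaces.
        Console.WriteLine(s.Replace(ChrW(0), " "c))
    End Sub
End Module
```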

For now, I suggest adding FF FE at the very beginning of the Unicode file
when you generate it.
You can inspect unicode.txt with a hex editor; UltraEdit, for example, is a
good one.
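If you control the file, adding the mark amounts to prepending two bytes. A minimal sketch, assuming the content is already UTF-16LE; `AddUtf16LeBom` is a hypothetical name, and it leaves data that already starts with FF FE unchanged.

```vbnet
Imports System.Text

Module BomFixer
    ' Hypothetical helper: returns data with a UTF-16LE BOM (FF FE) in front,
    ' unless it already starts with one. Assumes data is UTF-16LE text.
    Public Function AddUtf16LeBom(ByVal data As Byte()) As Byte()
        Dim bom As Byte() = Encoding.Unicode.GetPreamble() ' FF FE
        If data.Length >= 2 AndAlso data(0) = bom(0) AndAlso data(1) = bom(1) Then
            Return data
        End If

        ' VB array bounds are upper bounds, so this allocates bom + data bytes.
        Dim result(bom.Length + data.Length - 1) As Byte
        bom.CopyTo(result, 0)
        data.CopyTo(result, bom.Length)
        Return result
    End Function
End Module
```

For a file on disk, something like `File.WriteAllBytes(path, AddUtf16LeBom(File.ReadAllBytes(path)))` would apply it, where `path` is whatever file you generated.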
Best regards,

Peter Huang
Microsoft Online Partner Support

Nov 21 '05 #4
Thanks, Peter -

But, as I mentioned in my post, I am not creating the Unicode file.
(Microsoft creates it: it's the log file written by setup.exe for Microsoft
SQL Server Desktop Engine.) And although the Unicode file that MS creates
doesn't have FF FE at its start, it is quite readable with
StreamReader(rFile, System.Text.Encoding.Unicode).

I'm not looking for a way to create a Unicode file. I'm looking for the
best way to display the contents of a text file in a multiline textbox when
I don't know in advance whether it's ASCII or Unicode.
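When the file may lack a BOM, one workable but heuristic approach (not guaranteed, and not something the framework provides) is: trust a BOM if there is one; otherwise treat the text as UTF-16LE if at least half of the odd-position bytes are 00, which is typical of mostly-ASCII text stored as UTF-16LE; otherwise fall back to the default code page. The function names and the "half" threshold are assumptions for illustration.

```vbnet
Imports System.IO
Imports System.Text

Module TextFileLoader
    ' Hypothetical helper: decode bytes whose encoding is unknown.
    Public Function DecodeUnknown(ByVal data As Byte()) As String
        ' 1. A BOM, if present, decides; StreamReader detects it when asked to.
        If HasBom(data) Then
            Using sr As New StreamReader(New MemoryStream(data), True)
                Return sr.ReadToEnd()
            End Using
        End If

        ' 2. No BOM: count zero bytes at odd offsets (the high byte of each
        '    UTF-16LE code unit is 00 for ASCII-range characters).
        Dim pairs As Integer = data.Length \ 2
        Dim zeros As Integer = 0
        For i As Integer = 1 To data.Length - 1 Step 2
            If data(i) = 0 Then zeros += 1
        Next
        If pairs > 0 AndAlso zeros * 2 >= pairs Then ' assumed threshold: half
            Return Encoding.Unicode.GetString(data)
        End If

        ' 3. Otherwise use Encoding.Default: the ANSI code page on
        '    .NET Framework (UTF-8 on .NET Core and later).
        Return Encoding.Default.GetString(data)
    End Function

    ' True if data starts with a UTF-8, UTF-16LE or UTF-16BE BOM.
    Private Function HasBom(ByVal d As Byte()) As Boolean
        If d.Length >= 3 AndAlso d(0) = &HEF AndAlso d(1) = &HBB AndAlso d(2) = &HBF Then Return True
        If d.Length >= 2 AndAlso d(0) = &HFF AndAlso d(1) = &HFE Then Return True
        Return d.Length >= 2 AndAlso d(0) = &HFE AndAlso d(1) = &HFF
    End Function
End Module
```

In the ReadFile function above, something like `ReadFile = DecodeUnknown(File.ReadAllBytes(rFile))` could replace the radio-button branch, so the user no longer has to choose.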

Please help.

- Jeff

Nov 21 '05 #6
Hi Jeff,

In my test, the MSDE setup.exe tool generates the log file with the FF FE
mark. I ran this command line:
setup /l*v C:\msde.log

This produces msde.log, and when I open it in a hex editor, it starts with
FF FE.

Without the mark, we cannot reliably identify the file's encoding.
For example, the string
=
is stored in UTF-16LE with a byte order mark as the bytes
FF FE 3D 00

From the leading FF FE, StreamReader knows the file is Unicode (UTF-16LE),
and it decodes 3D 00 accordingly.

But if the string is stored without the mark, as just
3D 00
then it can be decoded in two ways: as ASCII or as Unicode.
Decoded as Unicode, 3D 00 is the single character "=".
Decoded as ASCII, 3D 00 is two characters: "=" (3D) followed by the
NUL character (00).
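The two readings of the bytes 3D 00 described above can be checked directly with the `Encoding` classes. A minimal sketch; the module and function names are illustrative.

```vbnet
Imports System.Text

Module TwoReadings
    ' Illustrative: decode the same bytes as UTF-16LE and as ASCII.
    ' Element 0 is the Unicode reading, element 1 the ASCII reading.
    Public Function Readings(ByVal raw As Byte()) As String()
        Return New String() {Encoding.Unicode.GetString(raw), Encoding.ASCII.GetString(raw)}
    End Function

    Sub Main()
        Dim r As String() = Readings(New Byte() {&H3D, &H0})
        Console.WriteLine(r(0).Length) ' 1: one code unit, U+003D "="
        Console.WriteLine(r(1).Length) ' 2: "=" followed by NUL (U+0000)
    End Sub
End Module
```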

There may be a problem with the SQL MSDE setup program. For that issue, I
think the SQL Server newsgroups are a better place to ask:
microsoft.public.sqlserver.msde
or
microsoft.public.sqlserver.setup

Best regards,

Peter Huang
Microsoft Online Partner Support

Nov 21 '05 #7
