Bytes | Software Development & Data Engineering Community

ASCII vs Unicode

Hi -

I'm setting up a streamreader in a VB.NET app to read a text file and
display its contents in a multiline textbox.

If I set it up with System.Text.Encoding.Unicode, it reads a unicode file
just fine. If I set it up as ASCII, it reads a non-unicode text file. But
I don't know the file format in advance.

How can my app determine whether to use Unicode encoding before I read the
file?

- Jeff
Nov 21 '05 #1
Hi Jeff,

Based on my test, we do not need to specify the encoding when we read a
file into a string; the .NET Framework will handle the issue.
Imports System.IO

Module Module1
    Private Const FILE_NAME As String = "c:\unicode.txt"
    Private Const FILE_NAME1 As String = "c:\ascii.txt"

    Public Sub Main()
        If Not File.Exists(FILE_NAME) Then
            Console.WriteLine("{0} does not exist.", FILE_NAME)
            Return
        End If
        ' File.OpenText decodes as UTF-8 unless the file starts with a byte order mark.
        Dim sr As StreamReader = File.OpenText(FILE_NAME)
        Dim input As String
        input = sr.ReadToEnd()
        Console.WriteLine(input)
        sr.Close()

        sr = File.OpenText(FILE_NAME1)
        input = sr.ReadToEnd()
        Console.WriteLine(input)
        sr.Close()
    End Sub
End Module
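For reference, File.OpenText creates a StreamReader that decodes UTF-8 by default and switches to another encoding only if the file starts with a byte order mark (BOM). A minimal sketch that makes this explicit uses the StreamReader(String, Encoding, Boolean) constructor, whose last argument turns BOM detection on, and reports which encoding was actually used. The module name, the function name and the fallback encoding are illustrative assumptions, and the Using statement needs Visual Basic 2005 or later. Note that without a BOM the reader simply uses the fallback, so this alone cannot tell ASCII from BOM-less Unicode.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module BomAwareReader
    ' Reads a whole file. If it starts with a byte order mark, StreamReader
    ' switches to the encoding the BOM indicates; otherwise it decodes with
    ' the fallback encoding passed in. The encoding actually used is
    ' returned through the detected argument.
    Function ReadAllDetect(ByVal fileName As String, ByVal fallback As Encoding, ByRef detected As Encoding) As String
        Using sr As New StreamReader(fileName, fallback, True)
            Dim text As String = sr.ReadToEnd()
            ' CurrentEncoding only reflects BOM detection after a read.
            detected = sr.CurrentEncoding
            Return text
        End Using
    End Function
End Module
```

For example, calling ReadAllDetect(FILE_NAME, System.Text.Encoding.Default, enc) on a file that begins with FF FE returns the decoded text and leaves enc set to a UTF-16 encoding (code page 1200); on a file with no BOM, enc is the fallback that was passed in.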

Best regards,

Peter Huang
Microsoft Online Partner Support

Get Secure! - www.microsoft.com/security
This posting is provided "AS IS" with no warranties, and confers no rights.

Nov 21 '05 #2
Thanks for responding, Peter -

I wish my results were the same as yours. I've attached a couple of files,
ASCII.txt and Unicode.txt. My code is below, FYI.

I've got a form with a textbox on it, and a couple of radio buttons (encode
or not). To make it simple, I also included a couple of buttons, one for
each file. I display the results of the function below in the textbox.

With encoding, the Unicode file displays fine, and the ASCII file is a
string of unreadable characters. With no encoding, the ASCII file displays
fine, and the Unicode file only displays the first character (=).

I've tried replacing the If-Else-End If block of the code with "rdr =
File.OpenText(rFile)", but my results are the same as the no encoding
results just described.

Note that the Unicode file that I'm trying to read is a log file created by
Microsoft's SQL Server Desktop Engine Setup.exe.

I'd appreciate additional help to solve this problem.

- Jeff
My code:

Public Function ReadFile(ByVal rFile As String) As String
    Dim fi As FileInfo
    Dim rdr As StreamReader

    Try

        ReadFile = ""

        fi = New FileInfo(rFile)
        If Not fi.Exists Then
            MessageBox.Show("File Not Found." & ControlChars.CrLf & rFile)
            Exit Function
        End If

        fi = Nothing
        If frmMain.optUnicode.Checked Then
            rdr = New StreamReader(rFile, System.Text.Encoding.Unicode)
        Else
            rdr = New StreamReader(rFile)
        End If

        ReadFile = rdr.ReadToEnd

        rdr.Close()
        rdr = Nothing

    Catch ex As Exception
        MessageBox.Show(ex.ToString)

    End Try

End Function
Nov 21 '05 #3
Hi Jeff,

FFFE is the byte order mark (BOM) for Unicode (UTF-16) text; the bytes FF FE
signify that the data is stored little-endian.
If you write a string into a file with Unicode encoding using something like
the code below, you will find the FFFE leading bytes at the start of the file.
Dim wr As New StreamWriter("C:\testuni.txt", True, System.Text.Encoding.Unicode)
wr.Write(input)
wr.Close()
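The claim can be checked with a short sketch that writes a string with Encoding.Unicode to a new file and returns the file's bytes as hex. Unlike the snippet above, it passes False for the append argument so the file is written from scratch; the module and function names are hypothetical.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module BomWriteCheck
    ' Writes text as UTF-16 little-endian (Encoding.Unicode) to a new file,
    ' then returns every byte of the file as hex, e.g. "FF-FE-3D-00".
    Function WriteUnicodeAndDump(ByVal fileName As String, ByVal text As String) As String
        ' append = False: create or overwrite the file, so writing starts at
        ' offset 0 and StreamWriter emits the encoding's BOM first.
        Using wr As New StreamWriter(fileName, False, Encoding.Unicode)
            wr.Write(text)
        End Using
        Return BitConverter.ToString(File.ReadAllBytes(fileName))
    End Function
End Module
```

Writing "=" this way gives "FF-FE-3D-00": the BOM followed by the UTF-16 little-endian encoding of "=".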

If you use notepad.exe to save text as a Unicode file in the Save As
dialog, you will find the same FFFE occurrence.

Now the difference between my Unicode file and yours is that the
unicode.txt you provided does not have the FFFE leading bytes, which causes
the problem. StreamReader needs them to decode the byte stream into a
string correctly.

Actually, if you use StreamReader to read your unicode.txt into a string
and then print the string with Console.WriteLine, you will find that
the text is displayed, but with a space between every two
characters.

So for now, I think you could try adding FFFE at the very beginning of the
Unicode file when you generate the file.
You can use a hex editor to inspect unicode.txt; for example,
UltraEdit is a good hex editor.
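If a hex editor is not at hand, the first few bytes of a file can also be inspected from code. A minimal sketch, with hypothetical names, that returns them in the hyphenated hex form produced by BitConverter.ToString:

```vbnet
Imports System
Imports System.IO

Module HexPeek
    ' Returns up to count bytes from the start of a file as hex,
    ' e.g. "FF-FE-3D-00". Fewer bytes are returned for a shorter file.
    Function FirstBytesHex(ByVal fileName As String, ByVal count As Integer) As String
        Using fs As New FileStream(fileName, FileMode.Open, FileAccess.Read)
            Dim buffer(count - 1) As Byte
            Dim n As Integer = fs.Read(buffer, 0, count)
            Return BitConverter.ToString(buffer, 0, n)
        End Using
    End Function
End Module
```

For instance, FirstBytesHex("c:\unicode.txt", 2) returns "FF-FE" for a file that starts with a little-endian UTF-16 BOM.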
Best regards,

Peter Huang
Microsoft Online Partner Support

Nov 21 '05 #4
Thanks, Peter -

But, as I mentioned in my post, I am not creating the unicode file.
(Microsoft creates it as a log file written by their setup.exe for Microsoft
SQL Server Desktop Engine.) And the unicode file that MS creates, while it
doesn't have the FFFE at its start, is quite readable with
StreamReader(rFile, System.Text.Encoding.Unicode).

I'm not looking for a way to create a unicode file. I'm looking for the
best way to display the contents of a text file in a multiline textbox, when
I don't know in advance whether it's ASCII or unicode.
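One way to make that choice automatically, instead of through the optUnicode radio button in ReadFile, is to look at the file's bytes before decoding them. The sketch below is a heuristic under stated assumptions, not a guarantee: it trusts a BOM when there is one; otherwise it treats the file as BOM-less little-endian UTF-16 when at least half of the bytes at odd offsets are zero (typical of mostly-ASCII text stored that way), and falls back to Encoding.Default for anything else. The 50% threshold and the names EncodingSniffer, GuessEncoding and ReadTextFile are illustrative.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module EncodingSniffer
    ' Heuristic guess of a text file's encoding from its raw bytes.
    Function GuessEncoding(ByVal data As Byte()) As Encoding
        ' 1. A byte order mark, if present, decides the encoding.
        If data.Length >= 3 AndAlso data(0) = &HEF AndAlso data(1) = &HBB AndAlso data(2) = &HBF Then
            Return Encoding.UTF8
        End If
        If data.Length >= 2 AndAlso data(0) = &HFF AndAlso data(1) = &HFE Then
            Return Encoding.Unicode            ' UTF-16 little-endian
        End If
        If data.Length >= 2 AndAlso data(0) = &HFE AndAlso data(1) = &HFF Then
            Return Encoding.BigEndianUnicode   ' UTF-16 big-endian
        End If

        ' 2. No BOM: mostly-ASCII text stored as UTF-16 little-endian has a
        '    zero byte at most odd offsets. The 50% threshold is arbitrary.
        Dim oddBytes As Integer = 0
        Dim oddZeros As Integer = 0
        For i As Integer = 1 To data.Length - 1 Step 2
            oddBytes += 1
            If data(i) = 0 Then oddZeros += 1
        Next
        If oddBytes > 0 AndAlso oddZeros * 2 >= oddBytes Then
            Return Encoding.Unicode
        End If

        ' 3. Otherwise fall back to Encoding.Default
        '    (the ANSI code page on the .NET Framework).
        Return Encoding.Default
    End Function

    ' Reads a whole file and decodes it with the guessed encoding.
    Function ReadTextFile(ByVal fileName As String) As String
        Dim data As Byte() = File.ReadAllBytes(fileName)
        Dim enc As Encoding = GuessEncoding(data)

        ' Skip a leading BOM so it is not decoded as a U+FEFF character.
        Dim bom As Byte() = enc.GetPreamble()
        Dim skip As Integer = 0
        If bom.Length > 0 AndAlso data.Length >= bom.Length Then
            skip = bom.Length
            For i As Integer = 0 To bom.Length - 1
                If data(i) <> bom(i) Then
                    skip = 0
                    Exit For
                End If
            Next
        End If
        Return enc.GetString(data, skip, data.Length - skip)
    End Function
End Module
```

With this in place, the If/Else block in ReadFile could become ReadFile = EncodingSniffer.ReadTextFile(rFile). The heuristic can misjudge BOM-less UTF-16 text made mostly of non-Latin characters, whose high bytes are seldom zero, and File.ReadAllBytes needs .NET 2.0 or later.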

Please help.

- Jeff

Nov 21 '05 #6
Hi Jeff,

Based on my test, the MSDE setup.exe tool generates the log file with
the FFFE tag.
I ran the command line below:
setup /l*v C:\msde.log

After that, I get the msde.log file, and if I open it in a hex editor I
find the FFFE flag.

If we do not have the flag, we cannot identify the file's encoding.
For example, the string below
=
is stored as
FFFE3d00

From the FFFE, StreamReader knows that it is Unicode, and it will
decode 3d00 as a single Unicode character.

But if we just encode it as
3d00
then we can decode it in two ways, the ASCII way or the Unicode way.
In the Unicode way, 3d00 is one character, "=".
But in the ASCII way, 3d00 is two characters, 3d and 00, i.e. "=" and
the character represented by ASCII code 00 (NUL).
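The two decodings can be reproduced directly with Encoding.GetString: the same bytes 3D 00 give one character as UTF-16 and two characters as ASCII. The module and function names are hypothetical.

```vbnet
Imports System
Imports System.Text

Module TwoWayDecode
    ' The bytes of "=" as UTF-16 little-endian, without a BOM.
    Private ReadOnly Sample As Byte() = New Byte() {&H3D, &H0}

    ' The Unicode way: the two bytes form one UTF-16 code unit, "=".
    Function AsUnicode() As String
        Return Encoding.Unicode.GetString(Sample)
    End Function

    ' The ASCII way: each byte is one character, "=" followed by NUL.
    Function AsAscii() As String
        Return Encoding.ASCII.GetString(Sample)
    End Function
End Module
```

AsUnicode() returns a one-character string, while AsAscii() returns two characters, the second being U+0000. A Windows TextBox probably stops displaying text at the first embedded NUL, which would explain why the BOM-less Unicode file showed only its first character (=) in the non-encoding test.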

Maybe there is some problem with the SQL MSDE setup program. For that
issue, I think the SQL newsgroups will be better:
microsoft.public.sqlserver.msde
or
microsoft.public.sqlserver.setup

Best regards,

Peter Huang
Microsoft Online Partner Support

Nov 21 '05 #7


Similar topics

by: Marian Aldenhövel
Hi, I am very new to Python and have run into the following problem. If I do something like dir = os.listdir(somepath) for d in dir: print d The program fails for filenames that contain non-ascii characters.
by: webdev
lo all, some of the questions i'll ask below have most certainly been discussed already, i just hope someone's kind enough to answer them again to help me out.. so i started a python 2.3 script that grabs some web pages from the web, regex parse the data and stores it localy to xml file for further use.. at first i had no problem using python minidom and everything concerning
by: Martín Marconcini
Hello there, I'm writting (or trying to) a Console Application in C#. I has to be console. I remember back in the old days of Cobol (Unisys), Clipper and even Basic, I used to use a program (its name i cannot recall now...) where I designed the "screen" using this "program" and then saved it into an ASCII file. (thus, using 'extended' ASCII's like Lines, Corners, etc. and making screens look nicer and more professional). Then reading a...
by: Ger
I have not been able to find a simple, straight forward Unicode to ASCII string conversion function in VB.Net. Is that because such a function does not exists or do I overlook it? I found Encoding.Convert, but that needs byte arrays. Thanks, /Ger
by: ChaosKCW
Hi I am reading from an oracle database using cx_Oracle. I am writing to a SQLite database using apsw. The oracle database is returning utf-8 characters for euopean item names, ie special charcaters from an ASCII perspective. I get the following error: > SQLiteCur.execute(sql, row)
by: many_years_after
Hi,everyone: Have you any ideas? Say whatever you know about this. thanks.
by: joakim.hove
Hello, I am having great problems writing norwegian characters æøå to file from a python application. My (simplified) scenario is as follows: 1. I have a web form where the user can enter his name. 2. I use the cgi module module to get to the input from the user: .... name = form.value
by: Thomas W
I'm getting really annoyed with python in regards to unicode/ascii-encoding problems. The string below is the encoding of the norwegian word "fødselsdag". I stored the string as "fødselsdag" but somewhere in my code it got translated into the mess above and I cannot get the original string back. It cannot be printed in the console or written a plain text-file. I've tried to convert it using
by: Oleg Parashchenko
Hello, I'm working on an unicode-aware application. I like to use "print" to debug programs, but in this case it was nightmare. The most popular result of "print" was: UnicodeDecodeError: 'ascii' codec can't decode byte 0xXX in position 0: ordinal not in range(128) I spent two hours fixing it, and I hope it's done. The solution is one
by: Martin v. Löwis
PEP 1 specifies that PEP authors need to collect feedback from the community. As the author of PEP 3131, I'd like to encourage comments to the PEP included below, either here (comp.lang.python), or to python-3000@python.org In summary, this PEP proposes to allow non-ASCII letters as identifiers in Python. If the PEP is accepted, the following identifiers would also become valid as class, function, or variable names: Löffelstiel,...