Bytes | Software Development & Data Engineering Community

Unicode Character Issue

Hi

I am trying to read text files that are saved in ANSI format with Unicode
characters such as the French é, the German big S (ß), etc., and as I read the
files these characters appear as squares.

I know that if the file were saved as Unicode this wouldn't be a
problem.

The question is whether there is an option, when I create the StreamReader,
that makes the application recognize them as Unicode characters.
Thank you,
Samuel
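One plausible way to reproduce the squares, assuming the StreamReader was created without an Encoding: it then decodes as UTF-8, and a lone ANSI byte such as &HE9 (é) followed by a space is not valid UTF-8, so it comes back as the replacement character U+FFFD, which is often drawn as a box or a question mark. ISO-8859-1 stands in for the ANSI code page here only because it is available on every .NET runtime; the module and function names are hypothetical.

```vbnet
Option Strict On

Imports System
Imports System.IO
Imports System.Text

Module ReproduceSquares

    ' Decodes the bytes the way a StreamReader does when no Encoding is passed
    ' (UTF-8, with invalid byte sequences replaced by U+FFFD).
    Function ReadWithDefaultReader(bytes() As Byte) As String
        Using reader As New StreamReader(New MemoryStream(bytes))
            Return reader.ReadToEnd()
        End Using
    End Function

    Sub Main()
        ' "café au lait" in a single-byte Western code page: é is the byte &HE9.
        Dim latin1 As Encoding = Encoding.GetEncoding("iso-8859-1")
        Dim ansiBytes() As Byte = latin1.GetBytes("caf" & ChrW(&HE9) & " au lait")

        Dim text As String = ReadWithDefaultReader(ansiBytes)
        ' Prints True: the é byte was replaced by U+FFFD.
        Console.WriteLine(text.IndexOf(ChrW(&HFFFD)) >= 0)
    End Sub

End Module
```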
Jun 27 '08 #1
Hello -

There is an overloaded StreamReader constructor that takes an Encoding (e.g.
System.Text.UnicodeEncoding). Visual Basic treats all strings as
Unicode, so if you ReadLine() into a String it should work.

Joe
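A minimal sketch of the constructor overload Joe mentions, assuming the file uses a single-byte Western code page. ISO-8859-1 is passed here only because it exists on every .NET runtime; the file name and helper are hypothetical. Note that UnicodeEncoding is UTF-16, so it suits files that Notepad saved as "Unicode" rather than ANSI ones.

```vbnet
Option Strict On

Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text

Module ReadWithEncoding

    ' Reads a text file line by line, decoding its bytes with the given Encoding.
    Function ReadLinesWith(filePath As String, enc As Encoding) As List(Of String)
        Dim lines As New List(Of String)
        Using reader As New StreamReader(filePath, enc)
            Dim line As String = reader.ReadLine()
            Do While line IsNot Nothing
                lines.Add(line)
                line = reader.ReadLine()
            Loop
        End Using
        Return lines
    End Function

    Sub Main()
        ' Hypothetical sample file holding "Straße" as single-byte Latin-1 data.
        Dim latin1 As Encoding = Encoding.GetEncoding("iso-8859-1")
        Dim filePath As String = Path.Combine(Path.GetTempPath(), "ansi-demo.txt")
        File.WriteAllBytes(filePath, latin1.GetBytes("Stra" & ChrW(&HDF) & "e"))

        For Each line As String In ReadLinesWith(filePath, latin1)
            Console.WriteLine(line)
        Next
    End Sub

End Module
```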

On Jun 4, 12:04 pm, "Samuel" <samuel.shul...@ntlworld.com> wrote:
Jun 27 '08 #2

Here is a small, working example. Disregard
the ReadBinary stuff. That was just something
from another project and could be written
smaller/cleaner/etc.

Regards,

Joergen Bech

---snip---

Option Explicit On
Option Strict On

Imports System
Imports System.IO
Imports System.Text

Public Class Form1

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        ' Read the raw bytes, then decode them as ISO-8859-1 (Latin-1),
        ' which maps every byte 0-255 to the same Unicode code point.
        Dim ansi() As Byte = ReadBinary("d:\ansi.txt")
        Dim s As String = Encoding.GetEncoding("iso-8859-1").GetString(ansi)
        TextBox1.Text = s

        ' Dim sb As New StringBuilder
        ' For Each b As Byte In ansi
        '     sb.Append(Chr(b))
        ' Next
        ' TextBox1.Text = sb.ToString
    End Sub

    ' Reads a whole file; throws an IOException if it cannot be read.
    Public Shared Function ReadBinary(ByVal file As String) As Byte()
        Dim errorInformation As String = String.Empty
        Dim result As Byte() = ReadBinary(file, errorInformation)
        If String.IsNullOrEmpty(errorInformation) Then
            Return result
        Else
            Throw New IOException(errorInformation)
        End If
    End Function

    ' Reads a whole file; on failure returns Nothing and sets errorInformation.
    Public Shared Function ReadBinary(ByVal file As String, _
                                      ByRef errorInformation As String) As Byte()
        Try
            Dim fInfo As New FileInfo(file)
            Dim numBytes As Long = fInfo.Length
            ' Using blocks close the streams even if ReadBytes throws.
            Using fStream As New FileStream(file, FileMode.Open, FileAccess.Read)
                Using br As New BinaryReader(fStream)
                    Return br.ReadBytes(CInt(numBytes))
                End Using
            End Using
        Catch ex As Exception
            errorInformation = ex.Message
            Return Nothing
        End Try
    End Function

End Class
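As Joergen notes, the ReadBinary helpers could be written smaller. A sketch of a shorter equivalent, assuming the framework's File.ReadAllBytes is acceptable in place of the FileStream/BinaryReader pair; it keeps his approach of reading raw bytes and decoding them as ISO-8859-1. The helper name and sample path are hypothetical.

```vbnet
Option Strict On

Imports System
Imports System.IO
Imports System.Text

Module ShorterRead

    ' Same idea as Form1_Load: read the raw bytes, then decode them as
    ' ISO-8859-1. File.ReadAllBytes opens, reads, and closes the file, and
    ' throws an exception (for example FileNotFoundException) if it cannot.
    Function ReadLatin1Text(filePath As String) As String
        Dim ansi() As Byte = File.ReadAllBytes(filePath)
        Return Encoding.GetEncoding("iso-8859-1").GetString(ansi)
    End Function

    Sub Main()
        ' Hypothetical sample file, written as single-byte Latin-1 data ("café").
        Dim filePath As String = Path.Combine(Path.GetTempPath(), "ansi-demo.txt")
        File.WriteAllBytes(filePath, New Byte() {&H63, &H61, &H66, &HE9})
        Console.WriteLine(ReadLatin1Text(filePath))
    End Sub

End Module
```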

On Wed, 4 Jun 2008 17:04:48 +0100, "Samuel"
<sa************@ntlworld.com> wrote:
Jun 27 '08 #4
UnicodeEncoding didn't work, but Encoding.Default did.

Thank you,
Samuel
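A hedged illustration of why UnicodeEncoding did not work: it is UTF-16, so it reads single-byte ANSI data two bytes at a time and produces unrelated characters, while a matching single-byte encoding round-trips. Encoding.Default (presumably the "Default" meant here) is the system ANSI code page on the .NET Framework, but on .NET Core and .NET 5+ it is UTF-8, so this sketch uses ISO-8859-1 as the single-byte encoding instead. The module and function names are hypothetical.

```vbnet
Option Strict On

Imports System
Imports System.Text

Module WhyUnicodeFailed

    ' Decodes the same single-byte data with UTF-16 and with ISO-8859-1.
    Function DecodeBothWays(ansiBytes() As Byte) As String()
        ' Encoding.Unicode is UTF-16 little-endian: every two bytes become one
        ' Char, so both the length and the characters come out wrong here.
        Dim asUtf16 As String = Encoding.Unicode.GetString(ansiBytes)

        ' A single-byte encoding maps each byte to exactly one Char.
        Dim asLatin1 As String = Encoding.GetEncoding("iso-8859-1").GetString(ansiBytes)

        Return New String() {asUtf16, asLatin1}
    End Function

    Sub Main()
        Dim ansiBytes() As Byte = New Byte() {&H63, &H61, &H66, &HE9}   ' "café" as ANSI bytes
        Dim results() As String = DecodeBothWays(ansiBytes)
        Console.WriteLine(results(0).Length)   ' 2: two UTF-16 code units
        Console.WriteLine(results(1).Length)   ' 4: c, a, f, é
    End Sub

End Module
```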
"Joe Duchtel" <du*****@gmail.com> wrote in message
news:ae**********************************@79g2000hsk.googlegroups.com...

Jun 27 '08 #5
Samuel,

Are you sure that the program showing you the characters is able to
read and display those characters?

You would not be the first to draw the wrong conclusion because of this.

For instance, Windows Mail (Outlook Express) will not show these characters
when the message is not sent in HTML format.

Cor
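One way to follow Cor's advice, sketched under the assumption that the decoded text is already in a String: print its UTF-16 code units as U+XXXX values, which shows whether the decoding was right regardless of whether a font or control can draw the characters. The helper name is hypothetical.

```vbnet
Option Strict On

Imports System
Imports System.Text

Module ShowCodePoints

    ' Lists each UTF-16 code unit of s as U+XXXX, separated by spaces.
    Function CodePoints(s As String) As String
        Dim sb As New StringBuilder()
        For Each c As Char In s
            If sb.Length > 0 Then sb.Append(" "c)
            sb.Append("U+").Append(AscW(c).ToString("X4"))
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        Console.WriteLine(CodePoints("caf" & ChrW(&HE9)))   ' U+0063 U+0061 U+0066 U+00E9
    End Sub

End Module
```

A U+FFFD in the output means the bytes were already lost while decoding; a correct value such as U+00E9 that still shows as a square points at the display instead.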

"Samuel" <sa************@ntlworld.com> wrote in message
news:Og**************@TK2MSFTNGP02.phx.gbl...
Jun 27 '08 #6
Certainly, because when I save the same document in Unicode format I can
read it without any problem.

Thank you,
Samuel
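A likely reason the Unicode-saved copy reads fine: Notepad's "Unicode" option writes UTF-16 with a byte order mark (FF FE), and StreamReader's default constructors detect a BOM and switch to the matching encoding. A minimal sketch, with an in-memory stream standing in for the file; the names are hypothetical.

```vbnet
Option Strict On

Imports System
Imports System.IO
Imports System.Text

Module BomDetection

    ' Builds UTF-16 LE bytes with a BOM, as Notepad's "Unicode" option does,
    ' and reads them back with a StreamReader that is given no Encoding.
    Function RoundTripWithBom(text As String, ByRef detected As Encoding) As String
        Dim bom() As Byte = Encoding.Unicode.GetPreamble()      ' FF FE
        Dim body() As Byte = Encoding.Unicode.GetBytes(text)
        Dim data(bom.Length + body.Length - 1) As Byte
        bom.CopyTo(data, 0)
        body.CopyTo(data, bom.Length)

        Using reader As New StreamReader(New MemoryStream(data))
            Dim result As String = reader.ReadToEnd()
            ' After reading, CurrentEncoding reflects the detected BOM.
            detected = reader.CurrentEncoding
            Return result
        End Using
    End Function

    Sub Main()
        Dim detected As Encoding = Nothing
        Dim s As String = RoundTripWithBom("Stra" & ChrW(&HDF) & "e", detected)
        Console.WriteLine(s.Length)            ' 6
        Console.WriteLine(detected.CodePage)   ' 1200 (UTF-16 LE)
    End Sub

End Module
```

An ANSI file has no such marker, so the reader cannot tell which code page was used and the Encoding has to be passed explicitly.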
"Cor Ligthert[MVP]" <no************@planet.nl> wrote in message
news:C3**********************************@microsoft.com...

Jun 27 '08 #7
Samuel,

By the way, what do you mean by "ANSI format"?

Wikipedia does not know the term.

Cor

"Samuel" <sa************@ntlworld.com> wrote in message
news:ej**************@TK2MSFTNGP06.phx.gbl...

Jun 27 '08 #8
Open Notepad and look at the Encoding list in its Save As dialog. Should I
have said "encoding" instead?

Regards,
Samuel
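For reference, a sketch mapping the four choices in the Encoding list of Notepad's Save As dialog (ANSI, Unicode, Unicode big endian, UTF-8) to .NET Encoding objects. The mapping function is hypothetical; "ANSI" means the system's active code page, which Encoding.Default returns only on the .NET Framework.

```vbnet
Option Strict On

Imports System
Imports System.Text

Module NotepadEncodings

    ' Maps Notepad's Save As "Encoding" choices to .NET encodings.
    Function FromNotepadName(name As String) As Encoding
        Select Case name
            Case "ANSI"
                ' System ANSI code page on the .NET Framework only; on
                ' .NET Core / .NET 5+ Encoding.Default is UTF-8 instead.
                Return Encoding.Default
            Case "Unicode"
                Return Encoding.Unicode            ' UTF-16 little-endian
            Case "Unicode big endian"
                Return Encoding.BigEndianUnicode   ' UTF-16 big-endian
            Case "UTF-8"
                Return Encoding.UTF8
            Case Else
                Throw New ArgumentException("Unknown Notepad encoding: " & name)
        End Select
    End Function

    Sub Main()
        For Each name As String In New String() {"Unicode", "Unicode big endian", "UTF-8"}
            Console.WriteLine(name & " -> code page " & FromNotepadName(name).CodePage.ToString())
        Next
    End Sub

End Module
```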
"Cor Ligthert[MVP]" <no************@planet.nl> wrote in message
news:5C**********************************@microsoft.com...

Jun 27 '08 #9

In the context of the original post, Wikipedia does indeed have a few
articles describing the code pages in question:
http://en.wikipedia.org/wiki/Windows_1252
http://en.wikipedia.org/wiki/ISO-8859-1

Regards,

Joergen Bech
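A small check of one property described in the ISO-8859-1 article, and the reason the iso-8859-1 decoding in Joergen's example keeps é and ß intact: every byte 0-255 decodes to the Unicode character with the same number. Windows-1252 agrees with ISO-8859-1 except in the range &H80-&H9F, where 1252 places printable characters such as € (&H80) and ISO-8859-1 has control codes; that range is left out of this sketch because code page 1252 needs an extra encoding provider on .NET Core. The function name is hypothetical.

```vbnet
Option Strict On

Imports System
Imports System.Text

Module Latin1Identity

    ' Returns True if ISO-8859-1 decodes every byte value b to ChrW(b).
    Function Latin1IsIdentity() As Boolean
        Dim latin1 As Encoding = Encoding.GetEncoding("iso-8859-1")
        For b As Integer = 0 To 255
            Dim decoded As String = latin1.GetString(New Byte() {CByte(b)})
            If decoded <> ChrW(b).ToString() Then Return False
        Next
        Return True
    End Function

    Sub Main()
        Console.WriteLine(Latin1IsIdentity())   ' True
    End Sub

End Module
```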

On Thu, 5 Jun 2008 19:16:23 +0200, "Cor Ligthert[MVP]"
<no************@planet.nl> wrote:

Jun 27 '08 #10
Samuel,

People often speak of "ANSI" when they mean a character code table, but
saying "ANSI" gives about as much information as describing a problem with
your car by saying it is a European brand.

Naming the bit format, or the code table you are using, gives much more
information. Even "ASCII" would, although most people who say that actually
mean extended ASCII, which for English and Dutch is code table 437. True
ASCII has only 7-bit characters (with just 26 letters, in upper and lower
case).

When you convert between these encodings, you always get the characters as
they are defined in the target code table.

Cor
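Cor's last point, that converting into a code table can only produce characters that table contains, is easy to see with true 7-bit ASCII. A sketch, assuming .NET's built-in Encoding.ASCII with its default replacement fallback, which substitutes "?" for anything above &H7F; the helper name is hypothetical.

```vbnet
Option Strict On

Imports System
Imports System.Text

Module AsciiLoss

    ' Encodes text as 7-bit ASCII and decodes it back. Characters that ASCII
    ' does not contain (such as é or ß) are replaced with "?" on encoding.
    Function ThroughAscii(text As String) As String
        Dim bytes() As Byte = Encoding.ASCII.GetBytes(text)
        Return Encoding.ASCII.GetString(bytes)
    End Function

    Sub Main()
        Console.WriteLine(ThroughAscii("caf" & ChrW(&HE9)))   ' caf?
    End Sub

End Module
```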

"Samuel" <sa************@ntlworld.com> wrote in message
news:OH**************@TK2MSFTNGP03.phx.gbl...

Jun 27 '08 #11
