Reading a text file with spanish accents

I am at an absolute loss on what is going on here. I have a text file
with some Spanish writing. Some of the characters have accents. I have
not found any way to read this text file and echo the output to the
console showing the accents.

I have tried using UTF-8 but it does not like the accent characters.

It basically converts
Añoro esta situación

to
A?oro esta situaci?n

What am I missing?
Amy
Oct 12 '07 #1
5 Replies


Amy,

Well, it's possible that you are reading the file correctly from UTF-8,
but the font for the console doesn't support those characters. What is the
font that you are using and does it support those characters?
--
- Nicholas Paldino [.NET/C# MVP]
- mv*@spam.guard.caspershouse.com
Oct 12 '07 #2

Amy,

The Spanish characters are in the Windows-1252 character set. In my view it
is worth checking that in the regional (country) settings. How this is
handled seems to differ in almost every version of Windows, so I cannot tell
you exactly how. I have had plenty of problems with this myself: characters
were not shown correctly in every application, although that was when code
pages 1250 and 1252 were used together.

http://msdn2.microsoft.com/en-us/library/aa912040.aspx

Cor

Oct 12 '07 #3

Nicholas Paldino [.NET/C# MVP] wrote:
> Amy,
>
> Well, it's possible that you are reading the file correctly from
> UTF-8, but the font for the console doesn't support those characters.
> What is the font that you are using and does it support those characters?

In testing I decided to print each char to the screen along with its
numeric value. The code is merely a (int)c where c is a char.

When using StreamReader with Encoding.UTF8 the ñ gets displayed as a ?
with a code of 65535

When using StreamReader with Encoding.Default the ñ gets displayed as a
ñ with a code of 241

When using FileStream with no encoding (don't believe you can set it)
and then printing the bytes as characters, ñ gets displayed as a ñ
with a code of 241.

When attempting to convert the byte array returned from the FileStream
to a String in UTF8 via the code below, the string does not convert
properly (I get the ? for the accented characters).

UTF8Encoding temp = new UTF8Encoding( true );
Console.WriteLine( temp.GetString( b ) );

However, if I do
Console.WriteLine( System.Text.Encoding.Default.GetString( b ) );

It prints correctly.
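
The results above are what one would expect if the file holds single-byte (ANSI or Latin-1 style) text rather than UTF-8. The following is only a minimal sketch under that assumption: it uses the byte value reported above (241 for ñ), the class and method names are hypothetical, and ISO-8859-1 stands in for Encoding.Default because it agrees with code page 1252 for ñ.

```csharp
using System;
using System.Text;

public static class DecodeDemo
{
    // Decode raw bytes as UTF-8; invalid sequences are replaced, not rejected.
    public static string AsUtf8(byte[] bytes)
    {
        return new UTF8Encoding(false).GetString(bytes);
    }

    // Decode raw bytes as ISO-8859-1, where every byte maps to one character.
    // Assumption: this stands in for the machine's ANSI code page.
    public static string AsLatin1(byte[] bytes)
    {
        return Encoding.GetEncoding("iso-8859-1").GetString(bytes);
    }

    public static void Main()
    {
        // "Añoro" as a single-byte encoding stores it: ñ is the lone byte 0xF1 (241).
        byte[] b = { 0x41, 0xF1, 0x6F, 0x72, 0x6F };

        // Print numeric values only, so the console font and code page do not matter.
        foreach (char c in AsUtf8(b))
            Console.WriteLine("UTF-8   : U+{0:X4} ({1})", (int)c, (int)c);
        foreach (char c in AsLatin1(b))
            Console.WriteLine("Latin-1 : U+{0:X4} ({1})", (int)c, (int)c);
    }
}
```

A lone byte 0xF1 is not a valid UTF-8 sequence, so current .NET decoders substitute the replacement character U+FFFD, which a console typically shows as ?. The same byte decoded as a single-byte encoding gives U+00F1 (241), which is ñ.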

I have read that using "Encoding.Default" is not good - however it seems
to be the only thing that works. I know the characters are for the most
part being read in correctly especially with FileStream. It just seems
like I am lost on what to do about the encoding of them.

Thoughts?
Darrell

Oct 12 '07 #4

Amy L. <am**@paxemail.com> wrote:

<snip>
> However, if I do
> Console.WriteLine( System.Text.Encoding.Default.GetString( b ) );
>
> It prints correctly.
>
> I have read that using "Encoding.Default" is not good - however it seems
> to be the only thing that works. I know the characters are for the most
> part being read in correctly especially with FileStream. It just seems
> like I am lost on what to do about the encoding of them.

*Characters* are not read at all by a FileStream. Bytes are read by a
FileStream. An Encoding is the way of converting between bytes and
characters.

If your file is effectively encoded using Encoding.Default, that's what
you should use. It would be generally better if you were able to start
with a UTF-8 file, but if you can't control whatever produces the file,
then you need to follow its lead.
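
As a rough illustration of following the file's lead, the sketch below reads a file with the encoding it was written in and then re-saves it as UTF-8. The file names, class name and method names are hypothetical, and ISO-8859-1 is an assumption standing in for the file's real encoding. On .NET Framework, Encoding.Default is the system ANSI code page (1252 on Western European Windows); on .NET Core and later it is UTF-8, and code page 1252 is only available after registering CodePagesEncodingProvider.

```csharp
using System;
using System.IO;
using System.Text;

public static class ReadLegacyText
{
    // Reads a text file using the encoding it was actually written in.
    public static string Read(string path, Encoding sourceEncoding)
    {
        using (var reader = new StreamReader(path, sourceEncoding))
        {
            return reader.ReadToEnd();
        }
    }

    // Re-saves the text as UTF-8 (no byte order mark) for later readers.
    public static void SaveAsUtf8(string path, string text)
    {
        File.WriteAllText(path, text, new UTF8Encoding(false));
    }

    public static void Main()
    {
        // Hypothetical file names, for illustration only.
        string source = Path.Combine(Path.GetTempPath(), "spanish-legacy.txt");
        string target = Path.Combine(Path.GetTempPath(), "spanish-utf8.txt");

        // Stand-in for the original file: "Añoro esta situación" in a
        // single-byte encoding. Escapes keep this source file ASCII-only.
        Encoding legacy = Encoding.GetEncoding("iso-8859-1");
        File.WriteAllBytes(source, legacy.GetBytes("A\u00F1oro esta situaci\u00F3n"));

        string text = Read(source, legacy);
        SaveAsUtf8(target, text);

        // The UTF-8 copy decodes to the same string as the legacy original.
        Console.WriteLine(text == Read(target, Encoding.UTF8));
    }
}
```

The only decision that matters is the Encoding passed to the reader: it has to match what produced the bytes. Once the text is a string, it can be written back out in whatever encoding is preferred.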

Picking an encoding is a bit like picking an image format - you might
prefer PNG to BMP, but if someone gives you a BMP file and you try to
read it as if it were a PNG, you won't get the right picture.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Oct 12 '07 #5

Hi Amy,

Just a quick bit of info off the top of my head (I haven't read the
details of your problem in the discussion above), but what first
comes to mind is why you are trying to use UTF-8 and NOT UTF-16.

The 8 stands for 8 bits, which can hold 0-255 decimal values (à la
ASCII character set). UTF-16 was introduced to handle international
character sets, as it is 16-bit, hence a capacity to hold 65536
different characters, from 0 to 65535 (64k).
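
One detail worth checking against the runtime: the 8 and 16 give the size of one code unit, not the size of the character set. UTF-8 is variable-width and can encode ñ too, using two bytes, so it was not the cause of the ? characters. A small sketch, with a hypothetical class name and hypothetical helper methods, that prints how each encoding represents ñ:

```csharp
using System;
using System.Text;

public static class WidthDemo
{
    // Number of bytes the string occupies when encoded as UTF-8.
    public static int Utf8Length(string s)
    {
        return Encoding.UTF8.GetByteCount(s);
    }

    // Number of bytes the string occupies when encoded as UTF-16 (little endian).
    public static int Utf16Length(string s)
    {
        return Encoding.Unicode.GetByteCount(s);
    }

    public static void Main()
    {
        string n = "\u00F1"; // ñ

        Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetBytes(n)));    // C3-B1
        Console.WriteLine(BitConverter.ToString(Encoding.Unicode.GetBytes(n))); // F1-00
        Console.WriteLine("{0} vs {1} bytes", Utf8Length(n), Utf16Length(n));
    }
}
```

Neither byte sequence is the single byte 0xF1 (241) reported earlier in the thread, which is consistent with the file being in a single-byte code page rather than in either Unicode encoding.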

Hope this helps.

Cheers,
Ben

Oct 12 '07 #6
