An input UTF-8 encoded file is output as an ANSI encoded file. Why?

I am reading a UTF-8 encoded file into memory using the following code...

    try
    {
        StreamReader readFile = new StreamReader(pathNames[0]);
        while (line != null)
        {
            line = readFile.ReadLine();
            compareData.Add(line);
        }
    }
    catch (Exception f)
    {
        Console.WriteLine(f.Message);
        Console.ReadLine();
    }

The input data includes some special characters such as "ü" and "ß". When I check the input "line" in debug mode, and the subsequent output file, those characters look like this: "�". Why?

Joe
Apr 28 '12 #1
RhysW
Because the software you are storing it in doesn't know what those characters are. Those characters aren't part of the font that's being used, so there just isn't an equivalent graphical representation of the code value of those characters.
Apr 30 '12 #2
@RhysW
The software is C# 2010 version, and it does recognize it, because I can key in Alt+129 in a string and the character ü appears in the software. No, this has something to do with the way that the file is being read; I just don't know what! I have just seen that if I open the txt file using Excel the same thing happens, but if I open the file with Notepad the characters are OK. But hey! Thank you for at least making a sensible suggestion, I appreciate it. Joe
Apr 30 '12 #3
RhysW
No, I mean the file, not Visual Studio or its equivalent. I mean the literal file that it's being stored in: if you're reading a file created in Notepad, I think that if you opened it in Notepad it would show that question mark, not the character. If you open some files in Notepad and it doesn't know the symbol, it displays that question mark in its place. This might be the problem, though I haven't checked.

Edit: though, checking in Notepad, it does support those characters, so I'm not sure. What software is the file actually stored as?
Apr 30 '12 #4
Thanks for your input, and sorry I did not reply sooner, but I managed to get round the problem, I think. I opened the txt file with Notepad, copied and pasted the whole file into a new Notepad window, and saved it as a UTF-8 file. The C# program seems happy with this, but Excel still doesn't like it. I have seen some funny things in my life in the IT world, but this has got to be one of the strange ones. I bet the answer is really simple, but I don't have time to investigate. Once again, thank you for your input. Joe
May 3 '12 #5
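
One plausible reading of why re-saving the file as UTF-8 helped, assuming the original file was actually stored in a single-byte ANSI/Latin-1 code page (the thread never confirms this): in such a file "ü" and "ß" are the single bytes 0xFC and 0xDF, which are not valid UTF-8 sequences on their own, so a UTF-8 decoder substitutes the replacement character U+FFFD ("�"). The class and method names below are hypothetical and only illustrate the mismatch.

```csharp
using System;
using System.Text;

static class EncodingMismatchDemo
{
    // Assumption: these are the bytes a single-byte ANSI/Latin-1 editor
    // would store for the two characters "ü" and "ß".
    static readonly byte[] AnsiBytes = { 0xFC, 0xDF };

    // Decoding the bytes as UTF-8 cannot succeed, so the decoder
    // substitutes U+FFFD for the invalid input.
    public static string DecodeAsUtf8()
    {
        return Encoding.UTF8.GetString(AnsiBytes);
    }

    // Decoding with the encoding that was (by assumption) used to save
    // the file recovers the original characters.
    public static string DecodeAsLatin1()
    {
        return Encoding.GetEncoding("iso-8859-1").GetString(AnsiBytes);
    }

    static void Main()
    {
        // Print booleans rather than the strings themselves so the result
        // does not depend on the console's own output encoding.
        Console.WriteLine(DecodeAsUtf8().Contains("\uFFFD"));
        Console.WriteLine(DecodeAsLatin1() == "\u00FC\u00DF");
    }
}
```

If this is what happened, saving the file as UTF-8 in Notepad simply made the bytes on disk match what the reader expected.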
Plater
The default encoding is ASCII in the en-US locale.
Did you set the encoding type of the output stream? For instance, StreamWriter takes an encoding parameter (which could be UTF-8).
May 11 '12 #6
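
A minimal sketch of the suggestion above, passing the encoding explicitly on both the reading and the writing side instead of relying on a default. The helper class, method names, and temporary file are hypothetical. Two things worth checking against the .NET documentation: as far as I know, `StreamReader(string)` assumes UTF-8 with byte-order-mark detection rather than ASCII when no encoding is given, and `new UTF8Encoding(true)` makes the writer emit a BOM, which is often what lets other tools recognise the file as UTF-8.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class TextFileEncoding
{
    // Reads every line of the file using the encoding the caller names.
    // Looping until ReadLine returns null avoids adding a trailing null.
    public static List<string> ReadLines(string path, Encoding encoding)
    {
        var lines = new List<string>();
        using (var reader = new StreamReader(path, encoding))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }

    // Writes the lines with an explicit encoding, overwriting the file.
    public static void WriteLines(string path, IEnumerable<string> lines, Encoding encoding)
    {
        using (var writer = new StreamWriter(path, false, encoding))
        {
            foreach (string line in lines)
            {
                writer.WriteLine(line);
            }
        }
    }

    static void Main()
    {
        // Hypothetical round trip through a temporary file.
        string path = Path.GetTempFileName();
        WriteLines(path, new[] { "\u00FC", "\u00DF" }, new UTF8Encoding(true));
        List<string> readBack = ReadLines(path, Encoding.UTF8);
        Console.WriteLine(readBack.Count);
        File.Delete(path);
    }
}
```

If the input really is in an ANSI code page, the same `ReadLines` call can be given that encoding instead, and the output can still be written as UTF-8.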