Characters missing when reading from file.

I'm trying to read a text file that contains international
(specifically Polish) characters line by line. I'm using the following
C# code:

FileStream lStream = new FileStream(pFileName, FileMode.Open);
using (StreamReader lReader = new StreamReader(lStream))
{
    string lLine;
    while ((lLine = lReader.ReadLine()) != null)
        ProcessLine(/* blah..blah */);
}

The problem is that all Polish characters are missing. It doesn't even
show them incorrectly. It just completely drops the Polish chars and
the string is shorter than expected as a result. Does anyone know how
to fix this?

Aug 28 '06 #1
Bart,

Just making a guess on this one. Do you know what encoding the Polish file
is in? Check out the StreamReader(Stream, Encoding) constructor. By default
the stream is read as UTF-8 (UTF8Encoding). Changing to the other constructor
allows you to specify ASCII, Unicode, UTF7 or UTF8.

Michael

Aug 28 '06 #2
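
For anyone following along, here is a minimal sketch of the two-argument StreamReader constructor Michael points to. The names `EncodingDemo` and `ReadLines` are made up for illustration, and a MemoryStream holding UTF-16 bytes stands in for a real file so the example is self-contained. It only shows that the reader decodes correctly when the encoding passed in matches the one used to write the bytes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class EncodingDemo
{
    // Reads every line from the stream, decoding bytes with the given encoding.
    public static List<string> ReadLines(Stream stream, Encoding encoding)
    {
        var lines = new List<string>();
        using (var reader = new StreamReader(stream, encoding))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                lines.Add(line);
        }
        return lines;
    }

    public static void Main()
    {
        // Hypothetical sample data: two lines of Polish text stored as UTF-16
        // (what .NET calls Encoding.Unicode), written without a byte order mark.
        byte[] bytes = Encoding.Unicode.GetBytes("Wałęsa\nŁódź\n");

        // The encoding passed here must match the one used to write the bytes.
        foreach (string line in ReadLines(new MemoryStream(bytes), Encoding.Unicode))
            Console.WriteLine(line);
    }
}
```

With a real file, the MemoryStream would be replaced by the FileStream from the original post.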
Michael wrote:
> Bart,
>
> Just making a guess on this one. Do you know what encoding the Polish file
> is in? Check out the StreamReader(Stream, Encoding) constructor. By default
> the stream is read in UTF8Encoding. Changing to the other constructor allows
> you to specify ASCII, Unicode, UTF7 or UTF8.

Thanks. Do you know where I can get more information about the
character encoding?

Regards,
Bart.

Aug 29 '06 #3
That's the real question, isn't it! :) Unfortunately, that really depends on
the source of the file. If you are unable to ask the person who created the
file, try Unicode and keep your fingers crossed!

Michael

Aug 29 '06 #4

Michael wrote:
> That's the real question isn't it! :) Unfortunately, that really depends on
> the source of the file. If you are unable to ask the person that created the
> file, try Unicode and keep your fingers crossed!

I found out that the file is in ASCII using the Eastern European code
page, and that's why it doesn't work. My question was where can I get
more information about using character encodings and conversions in
.NET, so that I can make it work. I found the MSDN documentation to be
rather short.

Thanks,
Bart.

Aug 29 '06 #5

You mean ANSI then, right? Take a look at
System.Text.Encoding.GetEncoding().

Resources to help you. Good question. I've been fortunate: the last time I
had to deal with this was many years ago, as we have been able to ensure that
files that we needed to parse used UTF8. Try:

Links -
overview - http://www.yoda.arachsys.com/csharp/unicode.html
MS's Global Dev Portal - http://www.microsoft.com/globaldev/default.mspx

Books (I haven't looked at any of these, so I don't know how good they are) -
.NET Internationalization: The Developer's Guide to Building Global
Windows and Web Applications - http://www.bookpool.com/sm/0321341384
Internationalization and Localization Using Microsoft .NET -
http://www.bookpool.com/sm/1590590023

Michael

Aug 30 '06 #6
Bart,

Maybe this helps you find the right code page to convert from.

http://www.vb-tips.com/dbPages.aspx?...f-76c81839e6c9

Just as the v is not used in Polish, the rest of the world, as far as I know,
does not use the l with a hyphen through it (ł), and therefore most people
outside Poland say Walensa.

You should look up what "wauwelen" means in Dutch if you are not a fan of him.

:-)

Cor

Aug 30 '06 #7
You probably need to find out what encoding (or codepage) was used to
write the file, and pass that in, e.g.

new StreamReader(lStream, Encoding.UTF8)

or - if the file has byte order marks at the start, you /may/ be able
to auto-detect:

new StreamReader(lStream, true)

Marc

Aug 31 '06 #8
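
A rough sketch of the auto-detection Marc describes, for illustration only. `BomDemo`, `ReadWithBomDetection` and `Utf16Bytes` are hypothetical names, and the in-memory UTF-16 bytes are made-up sample data. It shows that `new StreamReader(stream, true)` switches encoding only when a byte order mark is present and otherwise falls back to UTF-8, which is why this route would not help with a BOM-less code-page file like Bart's.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomDemo
{
    // Reads the whole stream, letting StreamReader pick the encoding from a
    // byte order mark if one is present (it falls back to UTF-8 otherwise).
    public static string ReadWithBomDetection(Stream stream, out Encoding detected)
    {
        using (var reader = new StreamReader(stream, true))
        {
            string text = reader.ReadToEnd();
            // CurrentEncoding only reflects the detected encoding after a read.
            detected = reader.CurrentEncoding;
            return text;
        }
    }

    // Builds UTF-16 bytes for the text, optionally prefixed with the BOM.
    public static byte[] Utf16Bytes(string text, bool withBom)
    {
        using (var buffer = new MemoryStream())
        {
            if (withBom)
            {
                byte[] bom = Encoding.Unicode.GetPreamble();
                buffer.Write(bom, 0, bom.Length);
            }
            byte[] body = Encoding.Unicode.GetBytes(text);
            buffer.Write(body, 0, body.Length);
            return buffer.ToArray();
        }
    }

    public static void Main()
    {
        Encoding detected;

        string withBom = ReadWithBomDetection(
            new MemoryStream(Utf16Bytes("Wałęsa", true)), out detected);
        Console.WriteLine("{0} (read as {1})", withBom, detected.WebName);

        // Same text without a BOM: the reader assumes UTF-8 and the text is mangled.
        string noBom = ReadWithBomDetection(
            new MemoryStream(Utf16Bytes("Wałęsa", false)), out detected);
        Console.WriteLine("Matches without BOM: {0}", noBom == "Wałęsa");
    }
}
```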
Michael wrote:
> You mean ANSI then, right? Take a look at
> System.Text.Encoding.GetEncoding().
> <snip>

Thanks. It works with GetEncoding(1250). The link you provided contains
some useful information too.

Regards,
Bart.

Aug 31 '06 #9
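
For reference, a sketch of what the working fix might look like. This is not Bart's actual code: `CodePageDemo`, `GetWindows1250` and `ReadFirstLine` are hypothetical names, and the bytes are generated in memory rather than read from his file. One assumption to flag: on .NET Core and .NET 5 or later, legacy code pages such as 1250 have to be registered through `CodePagesEncodingProvider` before `GetEncoding(1250)` will succeed. On the .NET Framework of the time that step was not needed.

```csharp
using System;
using System.IO;
using System.Text;

public static class CodePageDemo
{
    // Returns the Windows-1250 (Central/Eastern European) encoding.
    public static Encoding GetWindows1250()
    {
        // Assumption: running on .NET Core / .NET 5+, where legacy code pages
        // must be registered first. On .NET Framework this call can be dropped.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1250);
    }

    // Reads the first line of the stream using the supplied encoding.
    public static string ReadFirstLine(Stream stream, Encoding encoding)
    {
        using (var reader = new StreamReader(stream, encoding))
            return reader.ReadLine();
    }

    public static void Main()
    {
        Encoding cp1250 = GetWindows1250();

        // Made-up sample: Polish text as single-byte Windows-1250 data.
        byte[] bytes = cp1250.GetBytes("Wałęsa");

        // Decoding with the matching code page recovers the Polish letters.
        Console.WriteLine(ReadFirstLine(new MemoryStream(bytes), cp1250));

        // Decoding the same bytes as UTF-8 (the StreamReader default) does not.
        string asUtf8 = ReadFirstLine(new MemoryStream(bytes), Encoding.UTF8);
        Console.WriteLine("UTF-8 round trip matches: {0}", asUtf8 == "Wałęsa");
    }
}
```

In the original snippet, the equivalent change would be passing `Encoding.GetEncoding(1250)` as the second argument when constructing the StreamReader.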
