I'm trying to read a text file that contains international
(specifically Polish) characters line by line. I'm using the following
C# code:
FileStream lStream = new FileStream(pFileName, FileMode.Open);
using (StreamReader lReader = new StreamReader(lStream))
{
    string lLine;
    while ((lLine = lReader.ReadLine()) != null)
        ProcessLine(/* blah..blah */);
}
The problem is that all Polish characters are missing. It doesn't even
show them incorrectly. It just completely drops the Polish chars and
the string is shorter than expected as a result. Does anyone know how
to fix this?
Bart,
Just making a guess on this one. Do you know what encoding the Polish file
is in? Check out the StreamReader(Stream, Encoding) constructor. By default
the stream is read as UTF-8 (UTF8Encoding). Changing to the other constructor allows
you to specify ASCII, Unicode, UTF7 or UTF8.
Michael
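The StreamReader(Stream, Encoding) constructor mentioned above can be sketched as follows. This is only a minimal illustration: the EncodingAwareReader class and its ReadAllLines method are hypothetical names introduced here, and the right encoding to pass depends entirely on how the file was written.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class EncodingAwareReader
{
    // Hypothetical helper: reads every line of a text file using the encoding
    // supplied by the caller instead of StreamReader's UTF-8 default.
    public static List<string> ReadAllLines(string fileName, Encoding encoding)
    {
        var lines = new List<string>();

        // Both the FileStream and the StreamReader are disposed when the
        // using blocks exit, even if an exception is thrown.
        using (var stream = new FileStream(fileName, FileMode.Open, FileAccess.Read))
        using (var reader = new StreamReader(stream, encoding))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }

        return lines;
    }
}
```

A call such as EncodingAwareReader.ReadAllLines(pFileName, Encoding.Unicode) would then read the file as UTF-16; any other Encoding instance can be passed the same way.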
Thanks. Do you know where I can get more information about the
character encoding?
Regards,
Bart.
That's the real question isn't it! :) Unfortunately, that really depends on
the source of the file. If you are unable to ask the person that created the
file, try Unicode and keep your fingers crossed!
Michael
I found out that the file is in ASCII using the Eastern European code
page, and that's why it doesn't work. My question was where can I get
more information about using character encodings and conversions in
.NET, so that I can make it work. I found the MSDN documentation to be
rather short.
Thanks,
Bart.
You mean ANSI then, right? Take a look at
System.Text.Encoding.GetEncoding().
As for resources to help you: good question. I've been fortunate; the last time I
had to deal with this was many years ago, as we have been able to ensure that
the files we needed to parse used UTF8. Try:
Links -
overview - http://www.yoda.arachsys.com/csharp/unicode.html
MS's Global Dev Portal - http://www.microsoft.com/globaldev/default.mspx
Books (I haven't looked at any of these, so I don't know how good they are) -
.NET Internationalization: The Developer's Guide to Building Global
Windows and Web Applications - http://www.bookpool.com/sm/0321341384
Internationalization and Localization Using Microsoft .NET - http://www.bookpool.com/sm/1590590023
Michael
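As a rough sketch of the GetEncoding() suggestion, the snippet below decodes raw bytes with a Windows code page. It assumes the file uses code page 1250 (the Central/Eastern European ANSI code page), which should be checked against the actual file. The CodePageDemo class and Decode method are hypothetical names used only for illustration.

```csharp
using System.Text;

public static class CodePageDemo
{
    // Hypothetical helper: decodes raw bytes using a Windows code page such as 1250.
    public static string Decode(byte[] bytes, int codePage)
    {
        // Assumption: this targets modern .NET (.NET Core / .NET 5+), where legacy
        // code pages must be registered before GetEncoding can find them.
        // On .NET Framework the code pages are built in and this line can be dropped.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        return Encoding.GetEncoding(codePage).GetString(bytes);
    }
}

// Usage: the bytes 57 61 B3 EA 73 61 are "Wałęsa" in code page 1250.
// string name = CodePageDemo.Decode(new byte[] { 0x57, 0x61, 0xB3, 0xEA, 0x73, 0x61 }, 1250);
```

Applied to the code in the question, the same idea would be new StreamReader(lStream, Encoding.GetEncoding(1250)).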
Bart,
Maybe this helps you find the right code page to convert from: http://www.vb-tips.com/dbPages.aspx?...f-76c81839e6c9
Just as the v is not used in Polish, the rest of the world, as far as I know,
does not use the l with a hyphen in it (ł), and therefore almost everybody outside Poland
says Walensa.
If you are not a fan of him, you should see what "wauwelen" means in Dutch
:-)
Cor
You probably need to find out what encoding (or codepage) was used to
write the file, and pass that in, e.g.
new StreamReader(lStream, Encoding.UTF8)
or - if the file has a byte order mark at the start, you /may/ be able
to auto-detect:
new StreamReader(lStream, true)
Marc
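The auto-detect option can be sketched like this. It is a minimal example under stated assumptions: BomDetection and ReadWithBomDetection are hypothetical names, and detection only works when the file actually starts with a byte order mark. A file saved in a legacy code page such as 1250 carries no BOM, so the reader would stay on its UTF-8 default.

```csharp
using System.IO;
using System.Text;

public static class BomDetection
{
    // Hypothetical helper: reads the whole file, letting StreamReader switch
    // encodings if a byte order mark is present. Without a BOM the reader
    // keeps its UTF-8 default.
    public static string ReadWithBomDetection(string fileName, out Encoding detected)
    {
        using (var stream = new FileStream(fileName, FileMode.Open, FileAccess.Read))
        using (var reader = new StreamReader(stream, true))
        {
            string text = reader.ReadToEnd();

            // CurrentEncoding reflects the detected encoding only after a read.
            detected = reader.CurrentEncoding;
            return text;
        }
    }
}
```

Inspecting the detected encoding after the read is a cheap way to confirm whether a BOM was found or the default was used.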
Thanks. It works with GetEncoding(1250). The link you provided contains
some useful information too.
Regards,
Bart.