StreamReader / StreamWriter Encoding

Jaroslav Jakes

Hi,

please help.

Sounds so simple. We receive text files (customer orders) as e-mail
attachments. These text files contain a simple order structure, like:
custno, itemno, qty, text

Since these text files are created on different systems, the field "text" causes
some trouble.

Characters like ä, ö, ü are not always converted correctly.

The source code looks like:

- open the text file (StreamReader, encoding = Default, detect encoding = true)
- convert to a new structure
- write the new text file (StreamWriter)

What would you suggest? How could we "detect" the encoding of the file in
order to convert the text field correctly?

Thanks and regards - Jari

Nov 16 '05 #1
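
A minimal sketch of the read/convert/write steps described in the post, assuming the third StreamReader argument is detectEncodingFromByteOrderMarks. That flag only inspects the first bytes of the file for a byte order mark; if there is none, the reader silently uses the fallback encoding it was given, which would explain why umlauts from other systems come out wrong. The class and method names (BomDetectionDemo, ReadAndReport, ConvertToUtf8) are made up for illustration and are not part of the original code.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomDetectionDemo
{
    // Hypothetical helper: reads a file with the settings named in the post
    // and reports which encoding StreamReader ended up using.
    public static string ReadAndReport(string path, Encoding fallback, out Encoding used)
    {
        // Third argument is detectEncodingFromByteOrderMarks.
        using (var reader = new StreamReader(path, fallback, true))
        {
            string text = reader.ReadToEnd();
            // CurrentEncoding is only meaningful after the first read.
            used = reader.CurrentEncoding;
            return text;
        }
    }

    // Hypothetical helper: rewrites a file as UTF-8 (with BOM) once the
    // source encoding is known, so later readers can detect it reliably.
    public static void ConvertToUtf8(string sourcePath, Encoding sourceEncoding, string targetPath)
    {
        using (var reader = new StreamReader(sourcePath, sourceEncoding, true))
        using (var writer = new StreamWriter(targetPath, false, new UTF8Encoding(true)))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                writer.WriteLine(line);
            }
        }
    }
}
```

Comparing the reported encoding with the fallback shows whether a byte order mark was found. If it was not, the source encoding has to be guessed from the bytes themselves or agreed with the sender.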


Marcin Grzębski

Hi Jaroslav,

I recommend building a byte (character) histogram to determine how
often each character code from 128 to 255 occurs.
If you compare those counts with the "special" codes of German, Polish
(or any other) encodings, you can guess the source encoding.

A more sophisticated (e.g. dictionary-based) algorithm can be used
to eliminate errors.

HTH
Marcin

Nov 16 '05 #2
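
The histogram Marcin describes could be sketched roughly as follows. This is only an illustration: the class name HighByteHistogram and the method Count are invented here, and the sketch does nothing beyond counting how often each byte value from 128 to 255 occurs.

```csharp
using System;

public static class HighByteHistogram
{
    // Returns a 256-element table; entry i holds how often byte value i
    // occurs in the data. Values below 128 are plain ASCII in the usual
    // single-byte code pages and are left at zero, since they carry no
    // information about which code page was used.
    public static int[] Count(byte[] data)
    {
        var counts = new int[256];
        foreach (byte b in data)
        {
            if (b >= 128)
            {
                counts[b]++;
            }
        }
        return counts;
    }
}
```

The resulting table can then be compared against the byte values that each candidate encoding uses for its special characters.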

Jaroslav Jakes

Hi Marcin,

do you have a link to samples or a further description? Sorry, I don't know
how to do that...

Thanks and regards - Jari

Nov 16 '05 #3

Marcin Grzębski

hmmm...
I don't know of any links or samples, but I'm sure your problem
has come up in this group before.

I can show you the concept of this algorithm:

int germanEncodingCounter = 0;
int polishEncodingCounter = 0;
// bytes of the text file ("orders.txt" is a placeholder name)
byte[] bytesOfText = System.IO.File.ReadAllBytes("orders.txt");

// I don't know the German char codes, so I used random numbers
for (int i = 0; i < bytesOfText.Length; i++) {
    switch (bytesOfText[i]) {
        case 170:
            germanEncodingCounter++;
            break;
        case 163:
            germanEncodingCounter++;
            polishEncodingCounter++; // Ł
            break;
        case 175:
            polishEncodingCounter++; // Ż
            break;
    }
}

if (polishEncodingCounter > 0
    || germanEncodingCounter > 0) {
    if (germanEncodingCounter > polishEncodingCounter) {
        // it looks like a German encoding
    }
    else if (polishEncodingCounter > germanEncodingCounter) {
        // it looks like a Polish encoding
    }
    else {
        // I'm confused??
    }
}
else {
    // encoding not found!
}

HTH
Marcin

Nov 16 '05 #4
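
Marcin's concept uses placeholder byte values. As a hedged variant for the German case in the question, the sketch below plugs in the byte values of ä ö ü Ä Ö Ü ß in two code pages that German text files commonly arrive in, Windows-1252 and the DOS code page 850, and adds a strict UTF-8 check up front. The choice of candidates and the names GermanEncodingGuess and Guess are assumptions made for illustration. With short or unusual text the guess can be wrong, so treat the result as a hint only.

```csharp
using System;
using System.Text;

public static class GermanEncodingGuess
{
    // Byte values of ä ö ü Ä Ö Ü ß in each candidate code page.
    static readonly byte[] Windows1252 = { 0xE4, 0xF6, 0xFC, 0xC4, 0xD6, 0xDC, 0xDF };
    static readonly byte[] Ibm850 = { 0x84, 0x94, 0x81, 0x8E, 0x99, 0x9A, 0xE1 };

    // Returns an encoding name, or null when the evidence is tied or absent.
    public static string Guess(byte[] data)
    {
        if (!HasHighBytes(data)) return "us-ascii";
        if (IsValidUtf8(data)) return "utf-8";

        int win = 0, dos = 0;
        foreach (byte b in data)
        {
            if (Array.IndexOf(Windows1252, b) >= 0) win++;
            if (Array.IndexOf(Ibm850, b) >= 0) dos++;
        }
        if (win > dos) return "windows-1252";
        if (dos > win) return "ibm850";
        return null;
    }

    static bool HasHighBytes(byte[] data)
    {
        foreach (byte b in data)
        {
            if (b >= 128) return true;
        }
        return false;
    }

    // A strict decoder throws on byte sequences that are not well-formed UTF-8.
    static bool IsValidUtf8(byte[] data)
    {
        try
        {
            new UTF8Encoding(false, true).GetString(data);
            return true;
        }
        catch (ArgumentException)
        {
            return false;
        }
    }
}
```

The returned name can be passed to Encoding.GetEncoding to build the StreamReader. On .NET Core and later, the legacy code pages are only available after registering CodePagesEncodingProvider.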

Jaroslav Jakes

Hi Marcin,

thanks! I understand what I need to do now...

Regards - Jari

Nov 16 '05 #5
