UTF-8 preamble -> Possible bug in StreamWriter (or at least strange behaviour..)

Hi,

I generate a text file and temporarily save it to disk. Later I upload this file
to Microsoft MapPoint (not so important).
The file needs to be in UTF-8 encoding and I explicitly use
"Encoding.UTF8" in the constructor like this:

StreamWriter writer = new StreamWriter(file, Encoding.UTF8);

When I do this the StreamWriter inserts a UTF-8 preamble "ï»¿" at the
beginning of the file.
// http://www.chilkatsoft.com/faq/Utf8Preamble.html

MapPoint throws an Exception for this UTF-8 preamble and aborts the parsing
of the file.

The annoying thing is that if I don't explicitly state the Encoding in the
constructor, the documentation for the StreamWriter.Encoding property says:
"The Encoding specified in the constructor for the current instance, or
UTF8Encoding if an encoding was not specified."

But! If I don't specify the encoding I end up with text that is not UTF-8
(without the preamble..).

Without the Encoding in the constructor: "FÃ¤ltÃ¶verstens Teleshop"
With the Encoding in the constructor: "Fältöverstens Teleshop"

So my question is: how can I get rid of this preamble? Because if I get rid
of that, everything should work...

Regards
/Oscar

Nov 17 '05 #1
> But! If I don't specify the encoding I end up with text that is not UTF-8
> (without the preamble..).

Are you sure about that? Perhaps it's just the application you use to view
the output (Notepad?) that fails to recognize it as UTF-8 if the preamble is
missing.
Mattias

Nov 17 '05 #2

I can't explain it otherwise...
Characters like åäö end up like this in the file:
"FÃ¤ltÃ¶verstens Teleshop"

If I specify UTF8:
"Fältöverstens Teleshop"

The problem is the IO write operation. If I change the behaviour and write
the data directly to the HTTP output stream and save the file, it looks OK!
//
Response.Clear();
Response.Charset = "iso-8859-1";
Response.ContentEncoding = System.Text.Encoding.GetEncoding("iso-8859-1");
Response.ContentType = "text/plain";
Response.AddHeader("content-disposition", "attachment ; filename=\"" +
fileName + "\"");
Response.Write(fileData);
Response.End();

The following code writes "fileData" (a String) to disk. In this case the
file would be messed up with: "FÃ¤ltÃ¶verstens Teleshop"
//
file = new FileStream(filePath, fileMode, fileAccess);
StreamWriter writer = new StreamWriter(file);
writer.Write(fileData);

Not messed up but with the preamble...
//
file = new FileStream(filePath, fileMode, fileAccess);
StreamWriter writer = new StreamWriter(file, Encoding.UTF8);
writer.Write(fileData);

Maybe I should use the GetEncoding() method for the IO version instead of
directly going for UTF8!?

/Oscar

Nov 17 '05 #3
Another thing: my fix for this is to read the file into a Byte[] buffer and
get rid of the first three bytes, i.e. the preamble...
It feels awkward (and very 1990) though, and .NET ought to have a better
approach for this..
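
A minimal sketch of the byte-stripping workaround described above, assuming the whole file fits in memory as a byte array. The class and method names (PreambleStripper, StripUtf8Preamble) are hypothetical and only for illustration; Encoding.UTF8.GetPreamble() is used to obtain the three preamble bytes instead of hard-coding them.

```csharp
using System;
using System.Text;

public static class PreambleStripper
{
    // Hypothetical helper: returns a copy of data without a leading
    // UTF-8 preamble (EF BB BF). If no preamble is present, the input
    // array is returned unchanged.
    public static byte[] StripUtf8Preamble(byte[] data)
    {
        byte[] preamble = Encoding.UTF8.GetPreamble();
        if (data.Length < preamble.Length)
        {
            return data;
        }
        for (int i = 0; i < preamble.Length; i++)
        {
            if (data[i] != preamble[i])
            {
                return data;
            }
        }
        byte[] result = new byte[data.Length - preamble.Length];
        Array.Copy(data, preamble.Length, result, 0, result.Length);
        return result;
    }
}
```

The input is checked before anything is removed, so a file written without a preamble passes through untouched rather than losing its first three bytes.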

/Oscar
Nov 17 '05 #4
"Oscar Thornell" <no****@internet.com> wrote in message
news:%2***************@TK2MSFTNGP12.phx.gbl...

> I can't explain it otherwise...
> Characters like åäö end up like this in the file:
> "FÃ¤ltÃ¶verstens Teleshop"

This looks like your text below encoded in UTF-8 and then interpreted as
iso-8859-1 or similar.

> If I specify UTF8:
> "Fältöverstens Teleshop"
>
> The problem is the IO write operation. If I change the behaviour and write
> the data directly to the HTTP output stream and save the file, it looks
> OK!
> //
> Response.Clear();
> Response.Charset = "iso-8859-1";

This is not UTF-8!

> Response.ContentEncoding = System.Text.Encoding.GetEncoding("iso-8859-1");
> Response.ContentType = "text/plain";
> Response.AddHeader("content-disposition", "attachment ; filename=\"" +
> fileName + "\"");
> Response.Write(fileData);

Here, I suppose, Response.Write encodes in iso-8859-1, not in UTF-8.

> Response.End();
>
> The following code writes "fileData" (a String) to disk. In this case the
> file would be messed up with: "FÃ¤ltÃ¶verstens Teleshop"
> //

That's actually good plain UTF-8; it's only being read with another encoding.

> file = new FileStream(filePath, fileMode, fileAccess);
> StreamWriter writer = new StreamWriter(file);
> writer.Write(fileData);
>
> Not messed up but with the preamble...
> //

How did you read this?
If the reader correctly interprets UTF-8, the preamble should be invisible.
That really puzzles me.

> file = new FileStream(filePath, fileMode, fileAccess);
> StreamWriter writer = new StreamWriter(file, Encoding.UTF8);
> writer.Write(fileData);
>
> Maybe I should use the GetEncoding() method for the IO version instead of
> directly going for UTF8!?
>
> /Oscar

Nov 17 '05 #5
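
The diagnosis in the reply above (UTF-8 bytes read back as iso-8859-1) can be reproduced in a few lines. This is only a sketch; the class and method names (MojibakeDemo, ReadUtf8AsLatin1) are made up for illustration and appear nowhere in the thread.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // Encodes the text as UTF-8 (no preamble involved, GetBytes never
    // emits one) and then decodes those bytes as iso-8859-1, which is
    // what a reader assuming the wrong encoding would do.
    public static string ReadUtf8AsLatin1(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.GetEncoding("iso-8859-1").GetString(utf8Bytes);
    }
}
```

Calling MojibakeDemo.ReadUtf8AsLatin1("Fältöverstens Teleshop") returns "FÃ¤ltÃ¶verstens Teleshop": each of ä and ö is two bytes in UTF-8 (C3 A4 and C3 B6), and each byte is shown as its own iso-8859-1 character.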
Oscar Thornell <no****@internet.com> wrote:
> But! If I don't specify the encoding I end up with text that is not UTF-8
> (without the preamble..).

That sounds very unlikely. As others have suggested, it sounds like
whatever you're using to read the file is assuming the wrong encoding.

Could you post a short but complete program which demonstrates the
problem?

See http://www.pobox.com/~skeet/csharp/complete.html for details of
what I mean by that.

You should be able to provide an example where writing without
specifying an encoding and writing where you specify Encoding.UTF8 make
a difference to the binary output, other than in terms of the existence
of the preamble.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Nov 17 '05 #6
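
One way to check the claim above is to write the same string through differently constructed StreamWriters into memory and compare the raw bytes. This is a sketch rather than the complete program asked for; the names PreambleComparison and WriteToBytes are hypothetical, and passing null is just this sketch's way of selecting the StreamWriter(Stream) constructor.

```csharp
using System;
using System.IO;
using System.Text;

public static class PreambleComparison
{
    // Writes text through a StreamWriter into a MemoryStream and returns
    // the bytes that were produced. A null encoding selects the
    // StreamWriter(Stream) constructor, i.e. no explicit encoding.
    public static byte[] WriteToBytes(string text, Encoding encoding)
    {
        using (MemoryStream stream = new MemoryStream())
        using (StreamWriter writer = encoding == null
            ? new StreamWriter(stream)
            : new StreamWriter(stream, encoding))
        {
            writer.Write(text);
            writer.Flush(); // push buffered characters into the stream
            return stream.ToArray();
        }
    }
}
```

For "Fältöverstens Teleshop", WriteToBytes(text, null) and WriteToBytes(text, new UTF8Encoding(false)) give identical arrays, while WriteToBytes(text, Encoding.UTF8) gives the same bytes preceded by EF BB BF.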
Hi again! I have worked some more with this and..

First, the unlikely thing that is part of my problem is Microsoft's MapPoint
Web Service.
Hosted at: https://mappoint-*****.partners.extr...soft.com/*****

If I create a file with the following code..
FileStream file = new FileStream(filePath, fileMode, fileAccess);
StreamWriter writer = new StreamWriter(file);
or...
StreamWriter writer = new StreamWriter(file, new UTF8Encoding(false)); //Does not insert the preamble
writer.Write(fileData);

MapPoint serves my client with this: "FÃ¤ltÃ¶verstens Teleshop" instead of
this: "Fältöverstens Teleshop".

If I create a file with this instantiation of StreamWriter..
StreamWriter writer = new StreamWriter(file, Encoding.UTF8);

MapPoint throws an Exception telling me that it does not recognize "ï»¿".
"The UTF-8 preamble!"

If I take that very file, open it with a BinaryReader, drop the
first three bytes (the ï»¿ preamble)
and then upload it to MapPoint, everything works nicely!
No errors and no messed up text!

If I instantiate StreamWriter with:
StreamWriter writer = new StreamWriter(file, Encoding.Default);
Everything works directly!
But I do not want to use that method since it is dependent upon the current
code page of the system.

What I really can't understand here is why MapPoint messes up the text with
this code:
StreamWriter writer = new StreamWriter(file, new UTF8Encoding(false));

and works with this (if I drop the first three bytes..):
StreamWriter writer = new StreamWriter(file, Encoding.UTF8);

//The following code can be used to read the preamble from a file.
//In this case it recognizes UTF-8 and UTF-16.
FileStream stream = new FileStream("The_File.txt", FileMode.Open);
BinaryReader reader = new BinaryReader(stream);

//Read at most three bytes; the longest preamble checked here is 3 bytes.
//ReadBytes returns fewer bytes if the file is shorter than that.
byte[] buffer = reader.ReadBytes(3);
reader.Close();

if (buffer.Length >= 2 && buffer[0] == 0xff && buffer[1] == 0xfe)
{
    //UTF-16 (little-endian)
    Console.WriteLine("UTF-16");
}
else if (buffer.Length >= 3 && buffer[0] == 0xef && buffer[1] == 0xbb && buffer[2] == 0xbf)
{
    //UTF-8
    Console.WriteLine("UTF-8");
}

/Oscar
Nov 17 '05 #7
<"Oscar Thornell" <oscar.thornell [ xx] gmail.com>> wrote:
> If I create a file with the following code..
> FileStream file = new FileStream(filePath, fileMode, fileAccess);
> StreamWriter writer = new StreamWriter(file);
> or...
> StreamWriter writer = new StreamWriter(file, new UTF8Encoding(false)); //Does not insert the preamble
> writer.Write(fileData);
>
> MapPoint serves my client with this: "FÃ¤ltÃ¶verstens Teleshop" instead of
> this: "Fältöverstens Teleshop".

According to what - MapPoint? What's reading the file at that point?
That's the important bit - I bet you'll find the file is actually
exactly the same, just missing the UTF-8 preamble.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Nov 17 '05 #8
First, the only application that reads the file is MapPoint. After that
process, MapPoint creates a geocoded datasource based on the file.
The behaviour is consistent across a number of different ways of reading data
from the MapPoint datasource at that point.

1) A client utilizing the Web Service Find() method that queries the
MapPoint datasource and retrieves textual descriptions...
   a) The clients are in this case both dev test clients written in .NET/C#
      running on Win2003
   b) J2EE production clients running on Solaris

2) MapPoint supports exports of datasources in several formats: CSV, XML and so
on...
   a) Exporting a datasource in Access 2003 XML format and reading it into
      a new Access db also gives the presentation problems with
      encoding/text (as described in this thread..)

My only conclusion is that MapPoint does not support UTF-8, and I am doing
tests to solely use "iso-8859-1".

/Oscar
Nov 17 '05 #9
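
For the iso-8859-1 test mentioned above, the encoding can be named explicitly so that the result does not depend on the system code page the way Encoding.Default does. A minimal sketch, with hypothetical names (Latin1Writer, WriteLatin1) and the assumption that overwriting any existing file at the given path is acceptable:

```csharp
using System.IO;
using System.Text;

public static class Latin1Writer
{
    // Writes text to the given path as iso-8859-1, one byte per character.
    // FileMode.Create overwrites an existing file (an assumption of this sketch).
    public static void WriteLatin1(string path, string text)
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        using (FileStream file = new FileStream(path, FileMode.Create, FileAccess.Write))
        using (StreamWriter writer = new StreamWriter(file, latin1))
        {
            writer.Write(text);
        }
    }
}
```

iso-8859-1 has no preamble, so nothing is written ahead of the text. Characters that do not exist in iso-8859-1 are replaced with "?" by the default encoder fallback, so this only suits data limited to that character set.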
Oscar Thornell <no****@internet.com> wrote:
> My only conclusion is that MapPoint does not support UTF-8, and I am doing
> tests to solely use "iso-8859-1".

Does the MapPoint documentation not give any indication about which
encodings are supported, or any way of specifying the encoding?

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Nov 17 '05 #10
