
UTF-8 preamble -> Possible bug in StreamWriter(or at least strange behaviour..)

Hi,

I generate a text file and temporarily save it to disk. Later I upload this file
to Microsoft MapPoint (not so important).
The file needs to be in UTF-8 encoding, so I explicitly pass
"Encoding.UTF8" to the constructor like this:

StreamWriter writer = new StreamWriter(file, Encoding.UTF8);

When I do this the StreamWriter inserts a UTF-8 preamble "ï»¿" at the
beginning of the file.
// http://www.chilkatsoft.com/faq/Utf8Preamble.html

MapPoint throws an Exception for this UTF-8 preamble and aborts the parsing
of the file.

The annoying thing is that, for the case where I don't explicitly state the
Encoding in the constructor, the documentation for the StreamWriter.Encoding
property says:
"The Encoding specified in the constructor for the current instance, or
UTF8Encoding if an encoding was not specified."

But! If I don't specify the encoding I end up with text that is not UTF-8
(without the preamble..).

Without the Encoding in the constructor: "Fältöverstens Teleshop"
With the Encoding in the constructor: "Fältöverstens Teleshop"

So my question is: how can I get rid of this preamble? Because if I get rid
of that, everything should work...

Regards
/Oscar

Nov 17 '05 #1
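
One way to get UTF-8 output without the preamble is to give StreamWriter a UTF8Encoding constructed with encoderShouldEmitUTF8Identifier set to false, rather than the shared Encoding.UTF8 instance. The following is only a sketch under some assumptions: the class and method names (NoBomWriterSketch, WriteUtf8) are invented for illustration, and it writes to a MemoryStream instead of the FileStream from the question so that the resulting bytes can be inspected.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the original post.
static class NoBomWriterSketch
{
    // Writes text through a StreamWriter and returns the bytes it produced.
    // emitBom = false asks UTF8Encoding not to supply a preamble.
    public static byte[] WriteUtf8(string text, bool emitBom)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            using (StreamWriter writer = new StreamWriter(stream, new UTF8Encoding(emitBom)))
            {
                writer.Write(text);
            } // disposing the writer flushes it into the stream

            return stream.ToArray(); // ToArray still works on a closed MemoryStream
        }
    }

    static void Main()
    {
        // Escapes are used so the example does not depend on the source file's encoding.
        string text = "F\u00e4lt\u00f6verstens Teleshop";

        Console.WriteLine(BitConverter.ToString(WriteUtf8(text, false), 0, 3)); // 46-C3-A4
        Console.WriteLine(BitConverter.ToString(WriteUtf8(text, true), 0, 3));  // EF-BB-BF
    }
}
```

With a real file, the same encoding object can presumably be passed in place of Encoding.UTF8, i.e. new StreamWriter(file, new UTF8Encoding(false)).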
10 Replies


But! If I don´t specify the encoding I end up with text that is not UTF-8
(without the preamble..).


Are you sure about that? Perhaps it's just the application you use to view
the output (Notepad?) that fails to recognize it as UTF-8 if the preamble is
missing.
Mattias

Nov 17 '05 #2
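
The effect described in this reply can be reproduced in isolation: correct UTF-8 bytes look "messed up" as soon as they are decoded with a single-byte encoding. A minimal sketch, assuming the viewer falls back to ISO 8859-1 (the class and method names are invented for illustration):

```csharp
using System;
using System.Text;

// Hypothetical demo, not part of the original post.
static class MojibakeSketch
{
    // Encodes text as UTF-8 (no preamble), then decodes the bytes with another encoding.
    public static string Misread(string text, Encoding wrongEncoding)
    {
        byte[] utf8Bytes = new UTF8Encoding(false).GetBytes(text);
        return wrongEncoding.GetString(utf8Bytes);
    }

    static void Main()
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        string shown = Misread("F\u00e4lt\u00f6verstens Teleshop", latin1);

        // Each of the two-byte sequences C3 A4 and C3 B6 is shown as two Latin-1 characters.
        Console.WriteLine(shown == "F\u00c3\u00a4lt\u00c3\u00b6verstens Teleshop"); // True
    }
}
```

The resulting string is the "FÃ¤ltÃ¶verstens Teleshop" quoted in the question, which suggests the bytes on disk may well be valid UTF-8.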


I can't explain it otherwise...
Characters like åäö end up like this in the file:
"Fältöverstens Teleshop"

If I specify UTF8:
"Fältöverstens Teleshop"

The problem is the IO write operation. If I change the behaviour, write
the data directly to the HTTP output stream and save the file, it looks ok!
//
Response.Clear();
Response.Charset = "iso-8859-1";
Response.ContentEncoding = System.Text.Encoding.GetEncoding("iso-8859-1");
Response.ContentType = "text/plain";
Response.AddHeader("content-disposition", "attachment; filename=\"" +
fileName + "\"");
Response.Write(fileData);
Response.End();

The following code writes "fileData" ( a String) to disk. In this case the
file would be messed up with: "Fältöverstens Teleshop"
//
file = new FileStream(filePath, fileMode, fileAccess);
StreamWriter writer = new StreamWriter(file);
writer.Write(fileData);

Not messed up but with the preamble...
//
file = new FileStream(filePath, fileMode, fileAccess);
StreamWriter writer = new StreamWriter(file, Encoding.UTF8);
writer.Write(fileData);
Maybe I should use the GetEncoding() method for the IO version instead of
directly going for UTF8!?

/Oscar

Nov 17 '05 #3

Another thing: my fix for this is to read the file into a byte[] buffer and
get rid of the first three bytes, i.e. the preamble...
It feels awkward (and very 1990) though, and .NET surely has a better
approach for this..

/Oscar


Nov 17 '05 #4
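
The byte-level workaround described in this post could look roughly like the sketch below. It compares the start of the buffer against Encoding.UTF8.GetPreamble() instead of dropping three bytes unconditionally, so data without a preamble passes through unchanged. The class and method names are invented for illustration.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the original post.
static class StripPreambleSketch
{
    // Returns data without a leading UTF-8 preamble (EF BB BF),
    // or the same array unchanged if no preamble is present.
    public static byte[] StripUtf8Preamble(byte[] data)
    {
        byte[] preamble = Encoding.UTF8.GetPreamble();
        if (data.Length < preamble.Length)
        {
            return data;
        }
        for (int i = 0; i < preamble.Length; i++)
        {
            if (data[i] != preamble[i])
            {
                return data; // does not start with the preamble
            }
        }

        byte[] result = new byte[data.Length - preamble.Length];
        Array.Copy(data, preamble.Length, result, 0, result.Length);
        return result;
    }

    static void Main()
    {
        byte[] withPreamble = { 0xEF, 0xBB, 0xBF, 0x46, 0xC3, 0xA4 };
        Console.WriteLine(BitConverter.ToString(StripUtf8Preamble(withPreamble))); // 46-C3-A4
    }
}
```

For a file on disk this could be combined with File.ReadAllBytes and File.WriteAllBytes, although avoiding the preamble at write time is probably the simpler route.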

"Oscar Thornell" <no****@internet.com> wrote in message
news:%2***************@TK2MSFTNGP12.phx.gbl...

I can´t explain it otherwise...
Signs like åäö ends up like this in the file..
"Fältöverstens Teleshop"

This looks like your text below encoded in UTF-8 and then interpreted as
iso-8859-1 or similar.

If I specify UTF8:
"Fältöverstens Teleshop"

The problem is the IO write operation. If I change the behaviour and write
the data directly to the HTTP output stream and saves the file it looks
ok!
//
Response.Clear();
Response.Charset = "iso-8859-1";

This is not UTF-8!

Response.ContentEncoding = System.Text.Encoding.GetEncoding("iso-8859-1");
Response.ContentType = "text/plain";
Response.AddHeader("content-disposition", "attachment; filename=\"" +
fileName + "\"");
Response.Write(fileData);

Here, I suppose, Response.Write encodes in iso-8859-1, not in UTF-8.

Response.End();

The following code writes "fileData" ( a String) to disk. In this case the
file would be messed up with: "Fältöverstens Teleshop"
//

That's actually good plain UTF-8; it's only being read with another encoding.

file = new FileStream(filePath, fileMode, fileAccess);
StreamWriter writer = new StreamWriter(file);
writer.Write(fileData);

Not messed up but with the preamble...
//

How did you read this?
If the reader correctly interprets UTF-8, the preamble should be invisible.
That really puzzles me.

file = new FileStream(filePath, fileMode, fileAccess);
StreamWriter writer = new StreamWriter(file, Encoding.UTF8);
writer.Write(fileData);


Nov 17 '05 #5
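
The remark that the preamble should be invisible to a reader that understands UTF-8 can be checked with StreamReader. This is a minimal sketch with invented names (ReaderSketch, ReadBack, Concat); it builds the file content in memory instead of reading the poster's actual file.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical demo, not part of the original post.
static class ReaderSketch
{
    // Decodes the given bytes the way a UTF-8 aware reader would.
    public static string ReadBack(byte[] fileBytes)
    {
        using (StreamReader reader = new StreamReader(new MemoryStream(fileBytes), Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    // Joins two byte arrays into a new one.
    public static byte[] Concat(byte[] first, byte[] second)
    {
        byte[] result = new byte[first.Length + second.Length];
        first.CopyTo(result, 0);
        second.CopyTo(result, first.Length);
        return result;
    }

    static void Main()
    {
        string text = "F\u00e4lt\u00f6verstens Teleshop";
        byte[] body = new UTF8Encoding(false).GetBytes(text);
        byte[] withPreamble = Concat(Encoding.UTF8.GetPreamble(), body);

        // The reader consumes the preamble, so both variants decode to the same string.
        Console.WriteLine(ReadBack(withPreamble) == text); // True
        Console.WriteLine(ReadBack(body) == text);         // True
    }
}
```

A consumer that shows "ï»¿" at the start of the text is therefore most likely not decoding the file as UTF-8 at all.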

Oscar Thornell <no****@internet.com> wrote:
But! If I don´t specify the encoding I end up with text that is not UTF-8
(without the preamble..).


That sounds very unlikely. As others have suggested, it sounds like
whatever you're using to read the file is assuming the wrong thing.

Could you post a short but complete program which demonstrates the
problem?

See http://www.pobox.com/~skeet/csharp/complete.html for details of
what I mean by that.

You should be able to provide an example where writing without
specifying an encoding and writing where you specify Encoding.UTF8 make
a difference to the binary output, other than in terms of the existence
of the preamble.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Nov 17 '05 #6
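
A short but complete program of the kind requested here might look like the following sketch. It captures the bytes produced by both constructors in a MemoryStream and checks whether they differ in anything other than the preamble. The class and method names are invented for illustration, and the sample string is taken from the question.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical test program, not part of the original post.
static class CompareWritersSketch
{
    // Writes text through a StreamWriter and returns the raw bytes produced.
    public static byte[] Capture(string text, bool passEncodingUtf8)
    {
        MemoryStream stream = new MemoryStream();
        using (StreamWriter writer = passEncodingUtf8
            ? new StreamWriter(stream, Encoding.UTF8)   // encoding stated explicitly
            : new StreamWriter(stream))                 // encoding left to the default
        {
            writer.Write(text);
        }
        return stream.ToArray(); // still allowed after the stream is closed
    }

    // True if withPreamble is exactly the UTF-8 preamble followed by plain.
    public static bool DiffersOnlyByPreamble(byte[] plain, byte[] withPreamble)
    {
        byte[] preamble = Encoding.UTF8.GetPreamble();
        if (withPreamble.Length != preamble.Length + plain.Length) return false;
        for (int i = 0; i < preamble.Length; i++)
        {
            if (withPreamble[i] != preamble[i]) return false;
        }
        for (int i = 0; i < plain.Length; i++)
        {
            if (withPreamble[preamble.Length + i] != plain[i]) return false;
        }
        return true;
    }

    static void Main()
    {
        string text = "F\u00e4lt\u00f6verstens Teleshop";
        byte[] plain = Capture(text, false);
        byte[] withPreamble = Capture(text, true);

        Console.WriteLine("default:       " + BitConverter.ToString(plain));
        Console.WriteLine("Encoding.UTF8: " + BitConverter.ToString(withPreamble));
        Console.WriteLine(DiffersOnlyByPreamble(plain, withPreamble)); // True
    }
}
```

If the two dumps match apart from the leading EF-BB-BF, the difference the poster sees has to come from whatever reads the file.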

Hi again! I have worked some more with this and..

First, the unlikely thing that is part of my problem is Microsoft's MapPoint
Web Service.
Hosted at: https://mappoint-*****.partners.extr...soft.com/*****

If I create a file with the following code..
FileStream file = new FileStream(filePath, fileMode, fileAccess);
StreamWriter writer = new StreamWriter(file);
or...
StreamWriter writer = new StreamWriter(file, new UTF8Encoding(false));
//Does not insert the preamble
writer.Write(fileData);

MapPoint serves my client with this: "Fältöverstens Teleshop" instead of
this: "Fältöverstens Teleshop".

If I create a file with this instantiation of StreamWriter..
StreamWriter writer = new StreamWriter(file, Encoding.UTF8);

MapPoint throws an Exception telling me that it does not recognize "".
"The UTF-8 preamble!"

If I take that very file, open it with a BinaryReader, drop the
first three bytes (the ï»¿ preamble)
and then upload it to MapPoint, everything works nicely!
No errors and no messed up text!

If I instantiate StreamWriter with:
StreamWriter writer = new StreamWriter(file, Encoding.Default);
Everything works directly!
But I do not want to use that method since it depends on the current
code page of the system.

What I really can't understand here is why MapPoint messes up the text with
this code:
StreamWriter writer = new StreamWriter(file, new UTF8Encoding(false));

and works with this(if I drop the three first bytes..):
StreamWriter writer = new StreamWriter(file, Encoding.UTF8);
//The following code can be used to read the preamble from a file.
//In this case it recognizes UTF-8 and UTF-16 (little-endian).
FileStream stream = new FileStream("The_File.txt", FileMode.Open);
BinaryReader reader = new BinaryReader(stream);

//Three bytes are enough for the longest preamble checked here.
byte[] buffer = reader.ReadBytes(3);
reader.Close();

if (buffer.Length >= 2 && buffer[0] == 0xff && buffer[1] == 0xfe)
{
    //UTF-16
    Console.WriteLine("UTF-16");
}
else if (buffer.Length >= 3 && buffer[0] == 0xef && buffer[1] == 0xbb && buffer[2] == 0xbf)
{
    //UTF-8
    Console.WriteLine("UTF-8");
}

/Oscar
Nov 17 '05 #7
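
If the consumer really expects a single-byte Western European encoding, the dependence on the system code page mentioned above can be avoided by naming the encoding explicitly instead of using Encoding.Default. A minimal sketch, assuming ISO 8859-1 is what the consumer wants (the thread does not confirm this for MapPoint); the class and method names are invented, and a MemoryStream stands in for the FileStream.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the original post.
static class Latin1WriterSketch
{
    // Writes text as ISO 8859-1, independent of the machine's current code page.
    public static byte[] WriteLatin1(string text)
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        using (MemoryStream stream = new MemoryStream())
        {
            using (StreamWriter writer = new StreamWriter(stream, latin1))
            {
                writer.Write(text);
            }
            return stream.ToArray();
        }
    }

    static void Main()
    {
        byte[] bytes = WriteLatin1("F\u00e4lt\u00f6verstens Teleshop");

        // One byte per character, and no preamble in front of the 'F'.
        Console.WriteLine(BitConverter.ToString(bytes, 0, 5)); // 46-E4-6C-74-F6
    }
}
```

The trade-off is that characters outside ISO 8859-1 cannot be represented and are replaced during encoding, so this only fits data known to stay within that range.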

<"Oscar Thornell" <oscar.thornell [ xx] gmail.com>> wrote:
MapPoint serves my client with this: "Fältöverstens Teleshop" instead of
this: "Fältöverstens Teleshop".


According to what - MapPoint? What's reading the file at that point?
That's the important bit - I bet you'll find the file is actually
exactly the same, just missing the UTF-8 preamble.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Nov 17 '05 #8

First, the only application that reads the file is MapPoint. After that
process MapPoint creates a geocoded datasource based on the file.
The behaviour is consistent across a number of different ways of reading data
from the MapPoint datasource at that point.

1) A client utilizing the Web Service Find() method that queries the
MapPoint datasource and retrieves textual descriptions...
a) The clients are in this case both dev test clients written in .NET/C#
running on Win2003
b) J2EE production clients running on Solaris

2) MapPoint supports exports of datasources in several formats: CSV, XML and so
on...
a) Exporting a datasource in Access 2003 XML format and reading it into
a new Access db also gives the presentation problems with
encoding/text (as described in this thread..)

My only conclusion is that MapPoint does not support UTF-8, and I am running
tests to solely use "iso-8859-1".

/Oscar
Nov 17 '05 #9

Oscar Thornell <no****@internet.com> wrote:
My only conclusion is that MapPoint does not support UTF-8 and I am doing
tests to soly use "iso-8859-1".


Does the MapPoint documentation not give any indication about which
encodings are supported, or any way of specifying the encoding?

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Nov 17 '05 #10

No way of specifying...
I haven't found any specs for upload, only which formats a datasource can be
transformed to during an export.

Among those are: "TabDelimitedTextUTF8"...ISO 10646-1:2000 Annex D

So one could assume that UTF-8 is supported for "uploads" as well... :-(

Anyway, "ISO 8859-1" seems ok for now so I'll stick with that...

Regards
/Oscar

Nov 17 '05 #11
