UTF-8 preamble -> Possible bug in StreamWriter (or at least strange behaviour..)

Hi,

I generate a text file and temporarily save it to disk. Later I upload this
file to Microsoft MapPoint (not so important).
The file needs to be in UTF-8 encoding, so I explicitly pass
"Encoding.UTF8" to the constructor like this:

StreamWriter writer = new StreamWriter(file, Encoding.UTF8);

When I do this, the StreamWriter inserts a UTF-8 preamble "ï»¿" at the
beginning of the file.
// http://www.chilkatsoft.com/faq/Utf8Preamble.html

MapPoint throws an exception on this UTF-8 preamble and aborts the parsing
of the file.

The annoying thing is that if I don't explicitly state the encoding in the
constructor, the documentation for the StreamWriter.Encoding property says:
"The Encoding specified in the constructor for the current instance, or
UTF8Encoding if an encoding was not specified."

But! If I don't specify the encoding I end up with text that is not UTF-8
(without the preamble..).

Without the Encoding in the constructor: "FÃ¤ltÃ¶verstens Teleshop"
With the Encoding in the constructor: "Fältöverstens Teleshop"

So my question is: how can I get rid of this preamble? Because if I get rid
of that, everything should work...

Regards
/Oscar

Nov 17 '05 #1
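
The preamble question in the post above can be checked without touching a file. This is only a minimal sketch: the class and method names (PreambleDemo, PreambleHex) are made up for illustration, while Encoding.GetPreamble() and the UTF8Encoding(bool) constructor are standard .NET calls.

```csharp
using System;
using System.Text;

public static class PreambleDemo
{
    // Returns the preamble (byte order mark) an encoding would emit, as hex.
    // An encoding without a preamble yields an empty string.
    public static string PreambleHex(Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetPreamble());
    }

    public static void Main()
    {
        // Encoding.UTF8 is configured to emit the three-byte preamble.
        Console.WriteLine("Encoding.UTF8:           [" + PreambleHex(Encoding.UTF8) + "]");

        // A UTF8Encoding constructed with 'false' emits no preamble.
        Console.WriteLine("new UTF8Encoding(false): [" + PreambleHex(new UTF8Encoding(false)) + "]");
    }
}
```

Passing new UTF8Encoding(false) to the StreamWriter constructor should therefore give UTF-8 output without the leading EF BB BF bytes.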
> But! If I don't specify the encoding I end up with text that is not UTF-8
> (without the preamble..).

Are you sure about that? Perhaps it's just the application you use to view
the output (Notepad?) that fails to recognize it as UTF-8 if the preamble is
missing.

Mattias

Nov 17 '05 #2

I can't explain it otherwise...
Characters like åäö end up like this in the file:
"FÃ¤ltÃ¶verstens Teleshop"

If I specify UTF8:
"Fältöverstens Teleshop"

The problem is the IO write operation. If I change the behaviour and write
the data directly to the HTTP output stream and save the file, it looks OK!
//
Response.Clear();
Response.Charset = "iso-8859-1";
Response.ContentEncoding = System.Text.Encoding.GetEncoding("iso-8859-1");
Response.ContentType = "text/plain";
Response.AddHeader("content-disposition", "attachment; filename=\"" +
fileName + "\"");
Response.Write(fileData);
Response.End();

The following code writes "fileData" (a String) to disk. In this case the
file would be messed up with: "FÃ¤ltÃ¶verstens Teleshop"
//
file = new FileStream(filePath, fileMode, fileAccess);
StreamWriter writer = new StreamWriter(file);
writer.Write(fileData);

Not messed up, but with the preamble...
//
file = new FileStream(filePath, fileMode, fileAccess);
StreamWriter writer = new StreamWriter(file, Encoding.UTF8);
writer.Write(fileData);

Maybe I should use the GetEncoding() method for the IO version instead of
going directly for UTF8!?

/Oscar

Nov 17 '05 #3
Another thing: my current fix for this is to read the file into a byte[]
buffer and get rid of the first three bytes, i.e. the preamble...
It feels awkward (and very 1990), though, and .NET surely has a better
approach for this..

/Oscar

Nov 17 '05 #4
"Oscar Thornell" <no****@internet.com> wrote in message
news:%2***************@TK2MSFTNGP12.phx.gbl...

> I can't explain it otherwise...
> Characters like åäö end up like this in the file:
> "FÃ¤ltÃ¶verstens Teleshop"

This looks like your text below encoded in UTF-8 and then interpreted as
iso-8859-1 or similar.

> If I specify UTF8:
> "Fältöverstens Teleshop"
>
> The problem is the IO write operation. If I change the behaviour and write
> the data directly to the HTTP output stream and save the file, it looks
> OK!
> //
> Response.Clear();
> Response.Charset = "iso-8859-1";

This is not UTF-8!

> Response.ContentEncoding = System.Text.Encoding.GetEncoding("iso-8859-1");
> Response.ContentType = "text/plain";
> Response.AddHeader("content-disposition", "attachment; filename=\"" +
> fileName + "\"");
> Response.Write(fileData);

Here, I suppose, Response.Write encodes in iso-8859-1, not in UTF-8.

> Response.End();
>
> The following code writes "fileData" (a String) to disk. In this case the
> file would be messed up with: "FÃ¤ltÃ¶verstens Teleshop"
> //

That's actually perfectly good plain UTF-8; it's only being read with
another encoding.

> file = new FileStream(filePath, fileMode, fileAccess);
> StreamWriter writer = new StreamWriter(file);
> writer.Write(fileData);
>
> Not messed up, but with the preamble...
> //

How did you read this?
If the reader correctly interprets UTF-8, the preamble should be invisible.
That really puzzles me.

> file = new FileStream(filePath, fileMode, fileAccess);
> StreamWriter writer = new StreamWriter(file, Encoding.UTF8);
> writer.Write(fileData);
>
> Maybe I should use the GetEncoding() method for the IO version instead of
> going directly for UTF8!?
>
> /Oscar

Nov 17 '05 #5
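
The explanation in the reply above (UTF-8 bytes being read as iso-8859-1) can be reproduced in a few lines. This is a sketch only; MojibakeDemo and Utf8ReadAsLatin1 are hypothetical names, and the sample string is written with \u escapes so the result does not depend on how the source file itself is saved.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // Encodes text as UTF-8 (no preamble), then deliberately decodes those
    // bytes with the wrong encoding, ISO-8859-1.
    public static string Utf8ReadAsLatin1(string text)
    {
        byte[] utf8Bytes = new UTF8Encoding(false).GetBytes(text);
        return Encoding.GetEncoding("iso-8859-1").GetString(utf8Bytes);
    }

    public static void Main()
    {
        // "Fältöverstens Teleshop", with the non-ASCII letters escaped.
        string original = "F\u00E4lt\u00F6verstens Teleshop";
        Console.WriteLine(Utf8ReadAsLatin1(original));
    }
}
```

Each of ä and ö becomes two bytes in UTF-8 (C3 A4 and C3 B6), and a reader that assumes iso-8859-1 shows every byte as its own character, which gives the "FÃ¤ltÃ¶verstens Teleshop" seen in the thread. The bytes on disk are still valid UTF-8.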
Oscar Thornell <no****@internet.com> wrote:
> I generate a text file and temporarily save it to disk. Later I upload this
> file to Microsoft MapPoint (not so important).
> The file needs to be in UTF-8 encoding, so I explicitly pass
> "Encoding.UTF8" to the constructor like this:
>
> StreamWriter writer = new StreamWriter(file, Encoding.UTF8);
>
> When I do this, the StreamWriter inserts a UTF-8 preamble "ï»¿" at the
> beginning of the file.
> // http://www.chilkatsoft.com/faq/Utf8Preamble.html
>
> MapPoint throws an exception on this UTF-8 preamble and aborts the parsing
> of the file.
>
> The annoying thing is that if I don't explicitly state the encoding in the
> constructor, the documentation for the StreamWriter.Encoding property says:
> "The Encoding specified in the constructor for the current instance, or
> UTF8Encoding if an encoding was not specified."
>
> But! If I don't specify the encoding I end up with text that is not UTF-8
> (without the preamble..).

That sounds very unlikely. As others have suggested, it sounds like
whatever you're using to read the file is assuming the wrong thing.

Could you post a short but complete program which demonstrates the
problem?

See http://www.pobox.com/~skeet/csharp/complete.html for details of
what I mean by that.

You should be able to provide an example where writing without
specifying an encoding and writing where you specify Encoding.UTF8 make
a difference to the binary output, other than in terms of the existence
of the preamble.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Nov 17 '05 #6
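
A short but complete program of the kind requested above might look like the following. It is a sketch, not the poster's actual code: WriterComparison and its Write helper are made-up names, and a MemoryStream stands in for the file so the raw bytes can be compared directly.

```csharp
using System;
using System.IO;
using System.Text;

public static class WriterComparison
{
    // Writes text through a StreamWriter and returns the raw bytes produced.
    // Passing null uses the StreamWriter(Stream) constructor, i.e. no
    // explicit encoding.
    public static byte[] Write(string text, Encoding encoding)
    {
        MemoryStream stream = new MemoryStream();
        StreamWriter writer = encoding == null
            ? new StreamWriter(stream)
            : new StreamWriter(stream, encoding);
        writer.Write(text);
        writer.Flush();
        return stream.ToArray();
    }

    public static void Main()
    {
        // "Fältöverstens Teleshop", with the non-ASCII letters escaped.
        string text = "F\u00E4lt\u00F6verstens Teleshop";
        Console.WriteLine("no encoding:   " + BitConverter.ToString(Write(text, null)));
        Console.WriteLine("Encoding.UTF8: " + BitConverter.ToString(Write(text, Encoding.UTF8)));
    }
}
```

The two hex dumps should be identical except that the Encoding.UTF8 line starts with the extra bytes EF-BB-BF, which is the point being made in the reply: the text itself is encoded the same way in both cases.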
Hi again! I have worked some more on this and..

First, the unlikely thing that is part of my problem is Microsoft's MapPoint
Web Service.
Hosted at: https://mappoint-*****.partners.extr...soft.com/*****

If I create a file with the following code..
FileStream file = new FileStream(filePath, fileMode, fileAccess);
StreamWriter writer = new StreamWriter(file);
or...
StreamWriter writer = new StreamWriter(file, new UTF8Encoding(false));
//Does not insert the preamble
writer.Write(fileData);

MapPoint serves my client with this: "FÃ¤ltÃ¶verstens Teleshop" instead of
this: "Fältöverstens Teleshop".

If I create a file with this instantiation of StreamWriter..
StreamWriter writer = new StreamWriter(file, Encoding.UTF8);

MapPoint throws an exception telling me that it does not recognize "ï»¿"
(the UTF-8 preamble!).

If I take that very file, open it with a BinaryReader, drop the
first three bytes (the ï»¿ preamble)
and then upload it to MapPoint, everything works nicely!
No errors and no messed up text!

If I instantiate StreamWriter with:
StreamWriter writer = new StreamWriter(file, Encoding.Default);
everything works directly!
But I do not want to use that method since it depends on the current
code page of the system.

What I really can't understand here is why MapPoint messes up the text with
this code:
StreamWriter writer = new StreamWriter(file, new UTF8Encoding(false));

and works with this (if I drop the first three bytes..):
StreamWriter writer = new StreamWriter(file, Encoding.UTF8);

//The following code can be used to read the preamble from a file.
//In this case it recognizes UTF-8 and little-endian UTF-16.
FileStream stream = new FileStream("The_File.txt", FileMode.Open);
BinaryReader reader = new BinaryReader(stream);

//Three bytes cover both preambles; a shorter file simply returns fewer bytes.
byte[] buffer = reader.ReadBytes(3);
reader.Close();

if ( buffer.Length >= 2 && buffer[0] == 0xff && buffer[1] == 0xfe )
{
//UTF-16 (little-endian)
Console.WriteLine("UTF-16");
}
else if( buffer.Length >= 3 && buffer[0] == 0xef && buffer[1] == 0xbb && buffer[2] == 0xbf)
{
//UTF-8
Console.WriteLine("UTF-8");
}

/Oscar
Nov 17 '05 #7
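
The workaround described in the post above (dropping the first three bytes) could be sketched roughly as follows. PreambleStripper and StripUtf8Preamble are hypothetical names; the helper compares against Encoding.UTF8.GetPreamble() so that data without a preamble is returned untouched rather than losing its first three bytes.

```csharp
using System;
using System.IO;
using System.Text;

public static class PreambleStripper
{
    // Returns a copy of data without a leading UTF-8 preamble (EF BB BF).
    // Data that does not start with the preamble is returned unchanged.
    public static byte[] StripUtf8Preamble(byte[] data)
    {
        byte[] preamble = Encoding.UTF8.GetPreamble();
        if (data.Length < preamble.Length)
        {
            return data;
        }
        for (int i = 0; i < preamble.Length; i++)
        {
            if (data[i] != preamble[i])
            {
                return data;
            }
        }
        byte[] stripped = new byte[data.Length - preamble.Length];
        Array.Copy(data, preamble.Length, stripped, 0, stripped.Length);
        return stripped;
    }

    public static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.WriteLine("usage: PreambleStripper <file>");
            return;
        }
        // Rewrites the given file in place without its preamble, if it has one.
        byte[] original = File.ReadAllBytes(args[0]);
        File.WriteAllBytes(args[0], StripUtf8Preamble(original));
    }
}
```

This only removes the preamble after the fact; creating the writer with new UTF8Encoding(false), as tried in the post, avoids writing it in the first place.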
<"Oscar Thornell" <oscar.thornell [ xx] gmail.com>> wrote:
> Hi again! I have worked some more on this and..
>
> First, the unlikely thing that is part of my problem is Microsoft's MapPoint
> Web Service.
> Hosted at: https://mappoint-*****.partners.extr...soft.com/*****
>
> If I create a file with the following code..
> FileStream file = new FileStream(filePath, fileMode, fileAccess);
> StreamWriter writer = new StreamWriter(file);
> or...
> StreamWriter writer = new StreamWriter(file, new UTF8Encoding(false));
> //Does not insert the preamble
> writer.Write(fileData);
>
> MapPoint serves my client with this: "FÃ¤ltÃ¶verstens Teleshop" instead of
> this: "Fältöverstens Teleshop".


According to what - MapPoint? What's reading the file at that point?
That's the important bit - I bet you'll find the file is actually
exactly the same, just missing the UTF-8 preamble.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Nov 17 '05 #8
First, the only application that reads the file is MapPoint. After that
process, MapPoint creates a geocoded datasource based on the file.
The behaviour is consistent across a number of different ways of reading data
from the MapPoint datasource at that point.

1) A client utilizing the Web Service Find() method that queries the
MapPoint datasource and retrieves textual descriptions...
a) The clients are in this case both dev test clients written in .NET/C#
running on Win2003
b) J2EE production clients running on Solaris

2) MapPoint supports exports of datasources in several formats: CSV, XML and
so on...
a) Exporting a datasource in Access 2003 XML format and reading it into
a new Access db also gives the presentation problems with
encoding/text (as described in this thread..)

My only conclusion is that MapPoint does not support UTF-8, and I am running
tests to solely use "iso-8859-1".

/Oscar
Nov 17 '05 #9
Oscar Thornell <no****@internet.com> wrote:
> First, the only application that reads the file is MapPoint. After that
> process, MapPoint creates a geocoded datasource based on the file.
> The behaviour is consistent across a number of different ways of reading data
> from the MapPoint datasource at that point.
>
> 1) A client utilizing the Web Service Find() method that queries the
> MapPoint datasource and retrieves textual descriptions...
> a) The clients are in this case both dev test clients written in .NET/C#
> running on Win2003
> b) J2EE production clients running on Solaris
>
> 2) MapPoint supports exports of datasources in several formats: CSV, XML and
> so on...
> a) Exporting a datasource in Access 2003 XML format and reading it into
> a new Access db also gives the presentation problems with
> encoding/text (as described in this thread..)
>
> My only conclusion is that MapPoint does not support UTF-8, and I am running
> tests to solely use "iso-8859-1".


Does the MapPoint documentation not give any indication about which
encodings are supported, or any way of specifying the encoding?

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Nov 17 '05 #10
No way of specifying...
I haven't found any specs for uploads, only which formats a datasource can be
transformed to during an export.

Among those is: "TabDelimitedTextUTF8"...ISO 10646-1:2000 Annex D

So one could assume that UTF-8 is supported for "uploads" as well... :-(

Anyway, "ISO 8859-1" seems OK for now, so I'll stick with that...

Regards
/Oscar

Nov 17 '05 #11
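
The approach settled on above (writing the file as ISO 8859-1 explicitly instead of relying on Encoding.Default) might be sketched like this. Latin1Writer and WriteLatin1 are hypothetical names, and the temporary file only stands in for the real upload file.

```csharp
using System;
using System.IO;
using System.Text;

public static class Latin1Writer
{
    // Writes text to path as ISO-8859-1, independent of the system code page.
    // ISO-8859-1 has no preamble, so nothing is written before the text.
    public static void WriteLatin1(string path, string text)
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        using (StreamWriter writer = new StreamWriter(path, false, latin1))
        {
            writer.Write(text);
        }
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        // "Fältöverstens Teleshop", with the non-ASCII letters escaped.
        WriteLatin1(path, "F\u00E4lt\u00F6verstens Teleshop");
        Console.WriteLine(BitConverter.ToString(File.ReadAllBytes(path)));
        File.Delete(path);
    }
}
```

Naming the encoding keeps the output the same on every machine, which Encoding.Default does not guarantee. The trade-off is that characters outside ISO 8859-1 cannot be represented and are replaced (with "?" by default) when written.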
