UTF-8 preamble -> Possible bug in StreamWriter (or at least strange behaviour..)

Hi,

I generate a text file and temporarily save it to disk. Later I upload this file
to Microsoft MapPoint (not so important).
The file needs to be in UTF-8 encoding, so I explicitly pass
"Encoding.UTF8" in the constructor like this:

StreamWriter writer = new StreamWriter(file, Encoding.UTF8);

When I do this, the StreamWriter inserts a UTF-8 preamble "ï»¿" (the bytes
EF BB BF) at the beginning of the file.
// http://www.chilkatsoft.com/faq/Utf8Preamble.html

MapPoint throws an exception on this UTF-8 preamble and aborts the parsing
of the file.

The annoying thing is what the documentation for the StreamWriter.Encoding
property says about not stating the Encoding in the constructor:
"The Encoding specified in the constructor for the current instance, or
UTF8Encoding if an encoding was not specified."

But! If I don't specify the encoding I end up with text that is not UTF-8
(without the preamble..).

Without the Encoding in the constructor: "FÃ¤ltÃ¶verstens Teleshop"
With the Encoding in the constructor: "Fältöverstens Teleshop"

So my question is: how can I get rid of this preamble? Because if I get rid
of that, everything should work...

Regards
/Oscar

Nov 17 '05 #1
But! If I don't specify the encoding I end up with text that is not UTF-8
(without the preamble..).


Are you sure about that? Perhaps it's just the application you use to view
the output (Notepad?) that fails to recognize it as UTF-8 if the preamble is
missing.
Mattias

Nov 17 '05 #2

I can't explain it otherwise...
Characters like åäö end up like this in the file:
"FÃ¤ltÃ¶verstens Teleshop"

If I specify UTF8:
"Fältöverstens Teleshop"

The problem is the IO write operation. If I change the behaviour and write
the data directly to the HTTP output stream and save the file, it looks ok!
//
Response.Clear();
Response.Charset = "iso-8859-1";
Response.ContentEncoding = System.Text.Encoding.GetEncoding("iso-8859-1");
Response.ContentType = "text/plain";
Response.AddHeader("content-disposition", "attachment; filename=\"" +
fileName + "\"");
Response.Write(fileData);
Response.End();

The following code writes "fileData" (a String) to disk. In this case the
file would be messed up with: "FÃ¤ltÃ¶verstens Teleshop"
//
file = new FileStream(filePath, fileMode, fileAccess);
StreamWriter writer = new StreamWriter(file);
writer.Write(fileData);

Not messed up but with the preamble...
//
file = new FileStream(filePath, fileMode, fileAccess);
StreamWriter writer = new StreamWriter(file, Encoding.UTF8);
writer.Write(fileData);

Maybe I should use the GetEncoding() method for the IO version instead of
going directly for UTF8!?

/Oscar

Nov 17 '05 #3
Another thing: my current fix for this is to read the file into a byte[] buffer and
get rid of the first three bytes, i.e. the preamble...
It feels awkward (and very 1990) though, and .NET is bound to have a better
approach for this..
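
The byte-stripping workaround described above could look roughly like the sketch below. It is only an illustration and not code from the thread: the class name PreambleStripper, the method StripUtf8Preamble and the command-line path are hypothetical. It relies on Encoding.UTF8.GetPreamble() so the three bytes do not have to be hard-coded.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, named here for illustration only.
static class PreambleStripper
{
    // Returns the data without a leading UTF-8 preamble (EF BB BF), if present.
    public static byte[] StripUtf8Preamble(byte[] data)
    {
        byte[] preamble = Encoding.UTF8.GetPreamble();
        if (data.Length < preamble.Length)
            return data;

        for (int i = 0; i < preamble.Length; i++)
        {
            if (data[i] != preamble[i])
                return data; // no preamble, leave the data untouched
        }

        byte[] result = new byte[data.Length - preamble.Length];
        Array.Copy(data, preamble.Length, result, 0, result.Length);
        return result;
    }

    static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.WriteLine("usage: PreambleStripper <file>");
            return;
        }
        // Rewrites the file in place without the preamble.
        File.WriteAllBytes(args[0], StripUtf8Preamble(File.ReadAllBytes(args[0])));
    }
}
```

As comes up later in the thread, constructing the writer with new UTF8Encoding(false) avoids writing the preamble in the first place, which would make this post-processing step unnecessary.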

/Oscar


Nov 17 '05 #4
"Oscar Thornell" <no****@internet.com> wrote in message
news:%2***************@TK2MSFTNGP12.phx.gbl...

I can't explain it otherwise...
Characters like åäö end up like this in the file:
"FÃ¤ltÃ¶verstens Teleshop"


This looks like your text below encoded in UTF-8 and then interpreted as
iso-8859-1 or similar.

If I specify UTF8:
"Fältöverstens Teleshop"

The problem is the IO write operation. If I change the behaviour and write
the data directly to the HTTP output stream and save the file, it looks
ok!
//
Response.Clear();
Response.Charset = "iso-8859-1";


This is not UTF-8!

Response.ContentEncoding = System.Text.Encoding.GetEncoding("iso-8859-1");
Response.ContentType = "text/plain";
Response.AddHeader("content-disposition", "attachment; filename=\"" +
fileName + "\"");
Response.Write(fileData);


Here, I suppose, Response.Write encodes in iso-8859-1, not in UTF-8.

Response.End();

The following code writes "fileData" (a String) to disk. In this case the
file would be messed up with: "FÃ¤ltÃ¶verstens Teleshop"
//


That's actually good plain UTF-8; it's only being read with another encoding.

file = new FileStream(filePath, fileMode, fileAccess);
StreamWriter writer = new StreamWriter(file);
writer.Write(fileData);

Not messed up but with the preamble...
//


How did you read this?
If the reader correctly interprets UTF-8, the preamble should be invisible.
That really puzzles me.
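
The two observations in this reply (UTF-8 bytes read as iso-8859-1 look garbled, and a reader that understands UTF-8 does not show the preamble) can be reproduced with a small sketch. This is an illustration under assumptions, not code from the thread: the class MojibakeDemo and its methods are hypothetical names, and the sample text is written with \u escapes so the result does not depend on how the source file itself is saved.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical demo class, named here for illustration only.
static class MojibakeDemo
{
    // Encodes text as UTF-8, optionally prefixed with the preamble (EF BB BF).
    public static byte[] Utf8Bytes(string text, bool withPreamble)
    {
        UTF8Encoding utf8 = new UTF8Encoding(withPreamble);
        using (MemoryStream stream = new MemoryStream())
        {
            byte[] preamble = utf8.GetPreamble(); // empty when withPreamble is false
            stream.Write(preamble, 0, preamble.Length);
            byte[] body = utf8.GetBytes(text);
            stream.Write(body, 0, body.Length);
            return stream.ToArray();
        }
    }

    // What a reader that wrongly assumes iso-8859-1 makes of the bytes.
    public static string ReadAsLatin1(byte[] bytes)
    {
        return Encoding.GetEncoding("iso-8859-1").GetString(bytes);
    }

    // What a UTF-8 aware reader makes of the bytes; StreamReader skips the preamble.
    public static string ReadAsUtf8(byte[] bytes)
    {
        using (StreamReader reader = new StreamReader(new MemoryStream(bytes), Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        string text = "F\u00e4lt\u00f6verstens Teleshop";
        byte[] plain = Utf8Bytes(text, false);
        byte[] withPreamble = Utf8Bytes(text, true);

        // How these strings display depends on the console's own encoding.
        Console.WriteLine(ReadAsLatin1(plain));
        Console.WriteLine(ReadAsLatin1(withPreamble));
        Console.WriteLine(ReadAsUtf8(withPreamble) == text);
    }
}
```

Decoded as iso-8859-1, each two-byte UTF-8 sequence for ä and ö turns into two characters, and the preamble turns into three visible characters at the start. The bytes themselves are valid UTF-8 in both cases.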


Nov 17 '05 #5
Oscar Thornell <no****@internet.com> wrote:
But! If I don't specify the encoding I end up with text that is not UTF-8
(without the preamble..).


That sounds very unlikely. As others have suggested, it sounds like
whatever you're using to read the file is assuming the wrong thing.

Could you post a short but complete program which demonstrates the
problem?

See http://www.pobox.com/~skeet/csharp/complete.html for details of
what I mean by that.

You should be able to provide an example where writing without
specifying an encoding and writing where you specify Encoding.UTF8 make
a difference to the binary output, other than in terms of the existence
of the preamble.
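
One possible shape for such a short but complete program is sketched below. It is an illustration only, not something posted in the thread: the class name WriterComparison and the helper Write are made up, and a MemoryStream stands in for the FileStream so the raw bytes can be compared directly.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical comparison program, named here for illustration only.
static class WriterComparison
{
    // Writes text through a StreamWriter and returns the raw bytes produced.
    // Passing null uses the StreamWriter(Stream) constructor, i.e. no encoding specified.
    public static byte[] Write(string text, Encoding encoding)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            StreamWriter writer = encoding == null
                ? new StreamWriter(stream)
                : new StreamWriter(stream, encoding);
            writer.Write(text);
            writer.Flush(); // push buffered characters (and any preamble) to the stream
            return stream.ToArray();
        }
    }

    static void Main()
    {
        // \u escapes keep the sample independent of the source file's own encoding.
        string text = "F\u00e4lt\u00f6verstens Teleshop";

        Console.WriteLine("no encoding:         " + BitConverter.ToString(Write(text, null)));
        Console.WriteLine("UTF8Encoding(false): " + BitConverter.ToString(Write(text, new UTF8Encoding(false))));
        Console.WriteLine("Encoding.UTF8:       " + BitConverter.ToString(Write(text, Encoding.UTF8)));
    }
}
```

If the documentation quoted in the original post holds, the first two lines should show identical bytes and the third should differ only by a leading EF-BB-BF.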

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Nov 17 '05 #6
Hi again! I have worked some more with this and..

First, the unlikely thing that is part of my problem is Microsoft's MapPoint
Web Service.
Hosted at: https://mappoint-*****.partners.extr...soft.com/*****

If I create a file with the following code..
FileStream file = new FileStream(filePath, fileMode, fileAccess);
StreamWriter writer = new StreamWriter(file);
// or...
StreamWriter writer = new StreamWriter(file, new UTF8Encoding(false)); // does not insert the preamble
writer.Write(fileData);

MapPoint serves my client with this: "FÃ¤ltÃ¶verstens Teleshop" instead of
this: "Fältöverstens Teleshop".

If I create a file with this instantiation of StreamWriter..
StreamWriter writer = new StreamWriter(file, Encoding.UTF8);

MapPoint throws an exception telling me that it does not recognize "ï»¿".
"The UTF-8 preamble!"

If I take that very file, open it with a BinaryReader and drop the
first three bytes (the ï»¿ preamble),
then upload it to MapPoint, everything works nicely!
No errors and no messed up text!
No errors and no messed up text!

If I instantiate StreamWriter with:
StreamWriter writer = new StreamWriter(file, Encoding.Default);
Everything works directly!
But I do not want to use that method since it depends on the current
code page of the system.

What I really can't understand here is why MapPoint messes up the text with
this code:
StreamWriter writer = new StreamWriter(file, new UTF8Encoding(false));

and works with this(if I drop the three first bytes..):
StreamWriter writer = new StreamWriter(file, Encoding.UTF8);

//The following code can be used to read the preamble from a file.
//In this case it recognizes UTF-8 and UTF-16 (little-endian).
byte[] buffer;
using (FileStream stream = new FileStream("The_File.txt", FileMode.Open, FileAccess.Read))
using (BinaryReader reader = new BinaryReader(stream))
{
    //The longest preamble checked here (UTF-8) is three bytes.
    buffer = reader.ReadBytes(3);
}

if (buffer.Length >= 2 && buffer[0] == 0xff && buffer[1] == 0xfe)
{
    //UTF-16 (little-endian)
    Console.WriteLine("UTF-16");
}
else if (buffer.Length >= 3 && buffer[0] == 0xef && buffer[1] == 0xbb && buffer[2] == 0xbf)
{
    //UTF-8
    Console.WriteLine("UTF-8");
}

/Oscar
Nov 17 '05 #7
<"Oscar Thornell" <oscar.thornell [ xx] gmail.com>> wrote:
If I create a file with the following code..
FileStream file = new FileStream(filePath, fileMode, fileAccess);
StreamWriter writer = new StreamWriter(file);
// or...
StreamWriter writer = new StreamWriter(file, new UTF8Encoding(false)); // does not insert the preamble
writer.Write(fileData);

MapPoint serves my client with this: "FÃ¤ltÃ¶verstens Teleshop" instead of
this: "Fältöverstens Teleshop".


According to what - MapPoint? What's reading the file at that point?
That's the important bit - I bet you'll find the file is actually
exactly the same, just missing the UTF-8 preamble.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Nov 17 '05 #8
First, the only application that reads the file is MapPoint. After that
process, MapPoint creates a geocoded datasource based on the file.
The behaviour is consistent across a number of different ways of reading data from
the MapPoint datasource at that point.

1) Clients using the Web Service Find() method, which queries the
MapPoint datasource and retrieves textual descriptions...
a) Dev test clients written in .NET/C#
running on Win2003
b) J2EE production clients running on Solaris

2) MapPoint supports exports of datasources in several formats: CSV, XML and so
on...
a) Exporting a datasource in Access 2003 XML format and reading it into
a new Access db also gives the presentation problems with
encoding/text (as described in this thread..)

My only conclusion is that MapPoint does not support UTF-8, and I am running
tests to solely use "iso-8859-1".

/Oscar
Nov 17 '05 #9
Oscar Thornell <no****@internet.com> wrote:
My only conclusion is that MapPoint does not support UTF-8, and I am running
tests to solely use "iso-8859-1".


Does the MapPoint documentation not give any indication about which
encodings are supported, or any way of specifying the encoding?

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Nov 17 '05 #10
No way of specifying...
I haven't found any specs for upload, only the formats a datasource can be
transformed to during an export.

Among those are: "TabDelimitedTextUTF8"...ISO 10646-1:2000 Annex D

So one could assume that UTF8 is supported for "uploads" as well... :-(

Anyway, "ISO 8859-1" seems ok for now, so I'll stick with that...
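
Writing the file as ISO 8859-1 without depending on the system code page (the concern raised earlier about Encoding.Default) might look like the sketch below. It is an assumption-laden illustration rather than the code actually used: the class Latin1Writer, the method WriteLatin1 and the temp-file name are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, named here for illustration only.
static class Latin1Writer
{
    // Writes text to path as ISO 8859-1, one byte per character and no preamble.
    public static void WriteLatin1(string path, string text)
    {
        // Requested by name, so the result does not depend on the system code page.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        using (StreamWriter writer = new StreamWriter(path, false, latin1))
        {
            writer.Write(text);
        }
    }

    static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "latin1-sample.txt");
        WriteLatin1(path, "F\u00e4lt\u00f6verstens Teleshop");

        // Dump the bytes to check that each character occupies a single byte.
        Console.WriteLine(BitConverter.ToString(File.ReadAllBytes(path)));
    }
}
```

One trade-off of this choice: characters that do not exist in ISO 8859-1 cannot be represented and are replaced with "?" by the default encoder, so it only suits data that stays within that character set.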

Regards
/Oscar

Nov 17 '05 #11
