System.Text.Encoding oddities

Sorry about the last... Anyway, here's the question:

I've been working on some C# routines to convert strings to and from various encodings. The hope is that I can just let the user type in the encoding they want and I'll do a pretty good job of converting. Basically, I take a string as input, write it to a MemoryStream through a StreamWriter, and then get the converted bytes out.

The oddity is that when I pass a System.Text.UTF8Encoding instance to my StreamWriter, I don't get the byte order mark (BOM) in the output, but when I pass System.Text.Encoding.GetEncoding("utf-8"), I do. Shouldn't these be the same, or am I missing something basic? Can someone explain why?

Thanks
-mark
Example code:
using System.IO;
using System.Text;

public byte[] Convert(string input, Encoding enc, out int length)
{
    // Allow for expansion when switching encodings
    MemoryStream out_stream = new MemoryStream(input.Length * 3);
    StreamWriter writer = new StreamWriter(out_stream, enc);
    writer.Write(input);
    writer.Flush(); // Flush but don't close, so we can get the MemoryStream used count

    // GetBuffer returns the whole underlying buffer, so the used count is reported separately
    byte[] output = out_stream.GetBuffer();
    length = (int)out_stream.Length;
    return output;
}

int length;
byte[] test = Convert("test", new UTF8Encoding(), out length); // no byte order mark
test = Convert("test", Encoding.GetEncoding("utf-8"), out length); // byte order mark
Nov 22 '05 #1
Mark <ms********@lycos-inc.com> wrote:
> Sorry about the last... Anyway, here's the question:
>
> I've been working on some C# routines to process strings in and out
> of various encodings. The hope is that I can just let the user type
> in the encoding they want and I'll do a pretty good job of
> converting. Basically, I take a string as input, write it to a byte
> array MemoryStream and then get the bytes of the conversion out.
>
> The oddity in my question is that when I use System.Text.UTF8Encoding
> as an argument to my StreamWriter, I don't get the byteorder mark in
> the output, but when I use System.Text.Encoding.GetEncoding
> ("utf-8"); I do. Shouldn't these be the same, or am I missing
> something basic? Seems odd. Can someone explain why?


Well, it's basically not specified whether Encoding.UTF8 gives an
encoding with a byte order mark or not, or whether Encoding.GetEncoding
gives one with a BOM or not either.

If you want to make absolutely sure, you need to construct the
UTF8Encoding yourself, specifying whether or not you want a BOM as a
parameter.
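
As a minimal sketch of that suggestion: the UTF8Encoding constructor takes a bool that controls whether the encoding reports a BOM as its preamble, and StreamWriter writes that preamble at the start of a fresh stream. The class and method names below (BomDemo, EncodeViaWriter) are made up for illustration and are not from the original post.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class BomDemo
{
    // Encodes text through a StreamWriter, as in the original question,
    // and returns only the bytes that were actually written.
    public static byte[] EncodeViaWriter(string text, Encoding enc)
    {
        using (MemoryStream stream = new MemoryStream())
        using (StreamWriter writer = new StreamWriter(stream, enc))
        {
            writer.Write(text);
            writer.Flush();           // push buffered characters into the stream
            return stream.ToArray();  // copies only the used part of the buffer
        }
    }
}
```

Calling BomDemo.EncodeViaWriter("test", new UTF8Encoding(true)) should give the three BOM bytes EF BB BF followed by the four bytes of "test", while new UTF8Encoding(false) should give just the four text bytes. Using ToArray() instead of GetBuffer() also avoids having to return the used length separately.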

However, there's a much easier way of doing conversion than creating a
StreamWriter - just call Encoding.GetBytes(string). That will never
contain a BOM.
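
A rough sketch of the direct approach, assuming the encoding name comes from the user as described in the question. The names DirectConversion, ToBytes and FromBytes are hypothetical. Encoding.GetEncoding throws an ArgumentException for a name it does not recognise, so real code would probably want to catch that.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class DirectConversion
{
    // Converts a string to bytes in the encoding named by the user.
    // GetBytes encodes only the characters; it does not prepend a preamble.
    public static byte[] ToBytes(string text, string encodingName)
    {
        Encoding enc = Encoding.GetEncoding(encodingName);
        return enc.GetBytes(text);
    }

    // Decodes bytes back into a string using the named encoding.
    public static string FromBytes(byte[] data, string encodingName)
    {
        Encoding enc = Encoding.GetEncoding(encodingName);
        return enc.GetString(data);
    }
}
```

For example, DirectConversion.ToBytes("test", "utf-8") should return exactly four bytes, and passing them to DirectConversion.FromBytes with the same name should give back "test". If a BOM is wanted in the output, it can be obtained separately from the encoding's GetPreamble() method.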

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 22 '05 #2
Thanks..

Encoding.GetBytes() and Encoding.GetString() worked much better than my clunky approach, and the BOM (or lack thereof) is consistent.

That's a great help

-mark

Nov 22 '05 #4
