How to UTF-8 encode a string?


Hello,
I'm doing UTF-8 encoding the following way:
string message;

UTF8Encoding utf8 = new UTF8Encoding();

Byte[] encodedBytes = utf8.GetBytes(message);

message = encodedBytes.ToString();

can someone correct me?

many thanks

JJ
Apr 24 '06 #1
jens Jensen wrote:
hello,
i'm doing utf-8 encoding the following way.
string message;

UTF8Encoding utf8 = new UTF8Encoding();
Note that you can use Encoding.UTF8Encoding to avoid creating a new
instance each time.
Byte[] encodedBytes = utf8.GetBytes(message);
message = encodedBytes.ToString();

can someone correct me?


Well, what are you expecting message to be? All strings in .NET are
UTF-16 encoded. There's no way round that (and it's not really a
problem).

Could you give us more context? Normally encoding is required when you
want to convert from a text representation to a binary representation of
that text - e.g. to write some text to a stream. What are you trying to
do here?

Jon

Apr 24 '06 #2
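
To make the point above concrete, here is a minimal sketch of what the original snippet actually produces. The sample string is an assumption chosen only because it contains one non-ASCII character; it is written as a modern C# top-level program for brevity, which the 2006-era code in the thread would not have used.

```csharp
using System;
using System.Text;

// Hypothetical sample text; any string with a non-ASCII character would do.
string message = "caf\u00e9";

// Encoding turns the (UTF-16) string into a UTF-8 byte sequence.
byte[] encodedBytes = Encoding.UTF8.GetBytes(message);

// ToString() on an array is not a decoder; it just returns the type name.
string typeName = encodedBytes.ToString();

// Decoding the bytes is the actual inverse of GetBytes.
string roundTripped = Encoding.UTF8.GetString(encodedBytes);

Console.WriteLine(typeName);                 // System.Byte[]
Console.WriteLine(encodedBytes.Length);      // 5 (three ASCII bytes plus two for the accented e)
Console.WriteLine(roundTripped == message);  // True
```

So the byte array is the "UTF-8 encoded" form; assigning it back to a string via ToString() discards it.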
> Could you give us more context? Normally encoding is required when you
want to convert from a text representation to a binary representation of
that text - e.g. to write some text to a stream. What are you trying to
do here?

Jon


Hi Jon,
I need to post data to a web service via HTTP POST.

The web service requires UTF-8 encoded data. My "POST" is currently failing
due to this.

Many thanks
JJ
Apr 24 '06 #3
jens Jensen wrote:
Could you give us more context? Normally encoding is required when you
want to convert from a text representation to a binary representation of
that text - e.g. to write some text to a stream. What are you trying to
do here?


Hi Jon,
I need to post data to a web service via HTTP POST.

The web service requires UTF-8 encoded data. My "POST" is currently failing
due to this.


Right. So, you need to get the bytes as you did, and then write those
to the request stream.

Jon

Apr 24 '06 #4
>
Right. So, you need to get the bytes as you did, and then write those
to the request stream.

Jon


How could I actually extract the data to a text file? The remote end is a
Java platform, and they want to be sure I'm posting UTF-8 encoded data.

I need to send them a file showing I'm correctly UTF-8 encoding the data.
Apr 24 '06 #5
Hello, jens!

jJ> I need to post data to a web service via HTTP POST.

jJ> The webservice requires utf-8 encoded data. My "POST" is currently
jJ> failing due to this.

Do you specify an appropriate Content-Type when doing your POST?

--
Regards, Vadym Stetsyak
www: http://vadmyst.blogspot.com
Apr 24 '06 #6

"jens Jensen" <je**@jensen.dk> skrev i en meddelelse
news:O4**************@TK2MSFTNGP05.phx.gbl...

Right. So, you need to get the bytes as you did, and then write those
to the request stream.

Jon


How could i actually extracted the data in a text file? The remote and is
a java platform and the want to be sur i'm posted utf-8 encoded data.

I need to send them a file showing i'm correctly utf-8 encoding the data.

How could I actually extract the data to a text file? The remote end is a
Java platform, and they want to be sure I'm posting UTF-8 encoded data.

I need to send them a file showing I'm correctly UTF-8 encoding the data.


Apr 24 '06 #7
Hello, jens!

jJ> How could I actually extract the data to a text file? The remote end
jJ> is a Java platform and they want to be sure I'm posting UTF-8 encoded
jJ> data.

How do you perform the POST? Can you post the code?

--
Regards, Vadym Stetsyak
www: http://vadmyst.blogspot.com
Apr 24 '06 #8
>
Do you specify an appropriate Content-Type when doing your POST?

--
Regards, Vadym Stetsyak
www: http://vadmyst.blogspot.com


below the actual code:

// i found my client cert.
req.ClientCertificates.Add(Certificate);
Log.Write("Our certificate: " + Certificate.ToString());

req.ContentType = "text/xml";
// req.KeepAlive = false;
req.Method = "POST";

UTF8Encoding encoding = new UTF8Encoding();
byte[] postBytes = encoding.GetBytes(message);
req.ContentLength = postBytes.Length;

System.IO.Stream reqStream = req.GetRequestStream();
reqStream.Write(postBytes, 0, postBytes.Length);
reqStream.Close();

Log.Write("sending content: " + message);

System.Net.WebResponse resp = (HttpWebResponse)req.GetResponse();
System.IO.StreamReader sr =
    new System.IO.StreamReader(resp.GetResponseStream());

Many Thanks

JJ

Apr 24 '06 #9
I am presuming that the problem is that the bytes are correct but the remote
service doesn't recognize them as being UTF-8. Try adding

req.ContentEncoding = encoding;
Apr 24 '06 #10

"Mike Schilling" <ap@newsgroup.nospam> skrev i en meddelelse
news:uH**************@TK2MSFTNGP02.phx.gbl...
I am presuming that the problem is that the bytes are correct but the
remote service doesn't recognize them as being UTF-8. Try adding

req.ContentEncoding = encoding;


My request object "req" does not seem to have a "ContentEncoding"
property.
Am I missing something?



Apr 24 '06 #11
Hello, jens!

try
req.ContentType = "text/xml; charset=utf-8";

--
Regards, Vadym Stetsyak
www: http://vadmyst.blogspot.com
Apr 24 '06 #12
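
As a rough sketch of the suggestion above, the request setup could be gathered into one helper that declares the charset in the Content-Type header and sizes the body from the encoded bytes. The helper name BuildUtf8XmlPost, the URL and the payload are all hypothetical; nothing is sent, so this only shows how the headers end up.

```csharp
using System;
using System.Net;
using System.Text;

// WebRequest.Create is flagged obsolete on newer .NET versions; the thread predates that.
#pragma warning disable SYSLIB0014

// Placeholder URL and payload, not the real service from the thread.
HttpWebRequest req = BuildUtf8XmlPost(
    "https://example.invalid/endpoint", "<msg>caf\u00e9</msg>", out byte[] postBytes);

Console.WriteLine(req.ContentType);
Console.WriteLine(req.ContentLength == postBytes.Length);

// Builds a POST request without sending it. Nothing touches the network
// until GetRequestStream() or GetResponse() is called on the result.
static HttpWebRequest BuildUtf8XmlPost(string url, string message, out byte[] body)
{
    body = Encoding.UTF8.GetBytes(message);
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    request.Method = "POST";
    request.ContentType = "text/xml; charset=utf-8";
    request.ContentLength = body.Length;   // length in bytes, not in characters
    return request;
}
```

The caller would then write postBytes to req.GetRequestStream() exactly as in the code posted earlier in the thread.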
> try
req.ContentType = "text/xml; charset=utf-8";

--
Regards, Vadym Stetsyak
www: http://vadmyst.blogspot.com

Hi Vadym,

I applied this; no change in the response from the server.

How can I actually dump the UTF-8 encoded content to a file so I can see that
it is actually formatted as UTF-8?

Many thanks

JJ
Apr 24 '06 #13
Jon Skeet wrote:
[...snip...]
Note that you can use Encoding.UTF8Encoding to avoid creating a new
instance each time.

[...snip...]

Would that code really create a new instance? I'd expect it to be a
Singleton or something like that, since it does not store any state
information.
Apr 24 '06 #14
Hello, jens!

jJ> How can I actually dump the UTF-8 encoded content to a file so I can
jJ> see that it is actually formatted as UTF-8?

UTF8Encoding encoding = new UTF8Encoding();
byte[] postBytes = encoding.GetBytes(message);
req.ContentLength = postBytes.Length;
System.IO.Stream reqStream = req.GetRequestStream();
reqStream.Write(postBytes, 0, postBytes.Length);
reqStream.Close();

System.IO.File.WriteAllBytes(pathHere, postBytes);


--
Regards, Vadym Stetsyak
www: http://vadmyst.blogspot.com
Apr 24 '06 #15
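
One way to turn the dumped file into evidence for the remote side, sketched under a few assumptions: the payload and the temp-file name are made up, and a hex listing plus a strict decode are just one possible check. File.WriteAllBytes writes the array as-is, so no byte order mark is added.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical payload with two non-ASCII letters (a-ring and ae ligature).
string message = "<note>bl\u00e5b\u00e6r</note>";
byte[] postBytes = Encoding.UTF8.GetBytes(message);

// Assumed location; any writable path works.
string path = Path.Combine(Path.GetTempPath(), "post-body.xml");
File.WriteAllBytes(path, postBytes);

// Read the file back and list the bytes as hex pairs separated by dashes.
byte[] fromDisk = File.ReadAllBytes(path);
string hex = BitConverter.ToString(fromDisk);
Console.WriteLine(hex);

// A strict decoder throws on malformed input, so a successful decode that
// matches the original text shows the file holds well-formed UTF-8.
string decoded = new UTF8Encoding(false, true).GetString(fromDisk);
Console.WriteLine(decoded == message);
```

In the hex listing, each of the two non-ASCII letters should show up as a two-byte sequence starting with C3, which is what a UTF-8 reader on the Java side would expect.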
Hello, jens!

jJ> I applied this, no change in the response from the server.

What is the server response? What error code? Any content returned?

Another question: if the remote end is a web service, why can't you make a web reference to it and then communicate via a stub?
--
Regards, Vadym Stetsyak
www: http://vadmyst.blogspot.com
Apr 24 '06 #16
Michael Voss wrote:
Jon Skeet wrote:
[...snip...]
Note that you can use Encoding.UTF8Encoding to avoid creating a new
instance each time.

[...snip...]

Would that code really create a new instance? I'd expect it to be a
Singleton or something like that, since it does not store any state
information.


Using Encoding.UTF8Encoding would reuse a single instance (I believe)
which is why I was suggesting using that instead of new UTF8Encoding().

Jon

Apr 24 '06 #17

"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:11*********************@v46g2000cwv.googlegroups.com...
Michael Voss wrote:
Jon Skeet wrote:
[...snip...]
> Note that you can use Encoding.UTF8Encoding to avoid creating a new
> instance each time.

[...snip...]

Would that code really create a new instance? I'd expect it to be a
Singleton or something like that, since it does not store any state
information.


Using Encoding.UTF8Encoding would reuse a single instance (I believe)
which is why I was suggesting using that instead of new UTF8Encoding().


Firstly there is no way to make "new X()" return an existing object because
unlike C++ you cannot override new so you can't do secret singletons.

Secondly the encoding does store 'state' information in the form of the
settings for BOM and exception throwing so in 1.1 there are 4 possible UTF8
encoding objects.

Worse, in 2.0 there is a settable EncoderFallback property which makes all
instances potentially unshareable. The work around for the static properties
is that they are all made readonly (IsReadOnly) and throw
InvalidOperationException if you try to set them. [It doesn't actually say
that in the documentation but it's the only sensible thing to do]

Also the static property is actually called Encoding.UTF8
Apr 24 '06 #18
There is a serious documentation issue with the static encoding properties:

BOM and exception handling are undefined, and therefore these properties are
not usable without firstly investigating their actual behaviour and secondly
taking it on faith that MS won't change it.

1) The documentation should state what these are set to.
2) There should be properties to show these settings.
3) The documentation should state that they are read-only (in fact all
documentation should state the default value for all properties, but it
rarely does).

In the absence of documentation, the only easily maintainable way to go is to
always create your own using UTF8Encoding(bool,bool). The cost should be
trivial.

Apr 24 '06 #19
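
The settings discussed above can be inspected directly rather than taken on faith. This is a small sketch, not a statement about every framework version: it compares the shared Encoding.UTF8 instance with two explicitly constructed ones, using a single invalid byte as made-up test input.

```csharp
using System;
using System.Text;

Encoding shared = Encoding.UTF8;                  // the static property
Encoding plain = new UTF8Encoding();              // parameterless constructor
Encoding strict = new UTF8Encoding(false, true);  // no BOM, throw on invalid bytes

// The static property hands back the same object on every access.
Console.WriteLine(object.ReferenceEquals(shared, Encoding.UTF8));  // True

// The BOM setting is visible through the preamble length.
Console.WriteLine(shared.GetPreamble().Length);  // 3
Console.WriteLine(plain.GetPreamble().Length);   // 0

// Error handling shows up when decoding a byte that is never valid in UTF-8.
byte[] invalid = { 0xFF };
string replaced = shared.GetString(invalid);     // replacement character U+FFFD
bool threw = false;
try { strict.GetString(invalid); }
catch (DecoderFallbackException) { threw = true; }

Console.WriteLine(replaced == "\uFFFD");  // True
Console.WriteLine(threw);                 // True
```

Note that the preamble only matters to writers that ask for it (for example StreamWriter); GetBytes itself never emits a BOM, so the POST body is the same with any of the three instances.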
Nick Hounsome wrote:
> > Using Encoding.UTF8Encoding would reuse a single instance (I believe)
> > which is why I was suggesting using that instead of new UTF8Encoding().
>
> Firstly there is no way to make "new X()" return an existing object because
> unlike C++ you cannot override new so you can't do secret singletons.

Unless of course you're the CLR :)

(As discussed a short while ago, the System.String constructor returns
String.Empty in certain circumstances.)

> Secondly the encoding does store 'state' information in the form of the
> settings for BOM and exception throwing so in 1.1 there are 4 possible UTF8
> encoding objects.

True - although it's not mutable state. It's not the kind of state
which prevents something being reused, let's say. (I want a word for
this kind of state - I've been using it a lot recently.)

> Worse, in 2.0 there is a settable EncoderFallback property which makes all
> instances potentially unshareable.

Aargh. I take back the above :)

> The work around for the static properties
> is that they are all made readonly (IsReadOnly) and throw
> InvalidOperationException if you try to set them. [It doesn't actually say
> that in the documentation but it's the only sensible thing to do]

Not sure what you mean - the Encoding.XXX properties? Or the properties
on the objects returned by the Encoding.XXX properties?

> Also the static property is actually called Encoding.UTF8

Oops, yes.

Jon

Apr 24 '06 #20
> Another question if remote end is a webservice, why can't you make web
reference to it and then communicate via stub?
--
Regards, Vadym Stetsyak
www: http://vadmyst.blogspot.com

Hi Vadym,

This is a B2B scenario and I may not publish the complete System.Net.Trace,
as it contains very confidential info.

But here is the last bit. I doubt you can get any useful hint from it.

Many thanks anyway

System.Net Information: 0 : [8036] ConnectStream#50223079::ConnectStream(Buffered 24 bytes.)
System.Net Information: 0 : [8036] Associating HttpWebRequest#4878312 with ConnectStream#50223079
System.Net Information: 0 : [8036] Associating HttpWebRequest#4878312 with HttpWebResponse#45011781
System.Net Verbose: 0 : [8036] ConnectStream#50223079::Read()
System.Net.Sockets Verbose: 0 : [8036] Socket#13575069::Dispose()
System.Net Verbose: 0 : [8036] Data from ConnectStream#50223079::Read
System.Net Verbose: 0 : [8036] 00000000 : 3C 68 34 3E 41 63 63 65-73 73 20 44 65 6E 69 65 : <h4>Access Denie
System.Net Verbose: 0 : [8036] 00000010 : 64 3C 2F 68 34 3E 0D 0A- : d</h4>..
System.Net Verbose: 0 : [8036] Exiting ConnectStream#50223079::Read() -> 24#24
System.Net Verbose: 0 : [8036] ConnectStream#50223079::Read()
System.Net Verbose: 0 : [8036] Exiting ConnectStream#50223079::Read() -> 0#0
System.Net Error: 0 : [8036] Exception in the HttpWebRequest#4878312::EndGetResponse - The remote server returned an error: (401) Unauthorized.
Apr 24 '06 #21
jens Jensen wrote:
Another question if remote end is a webservice, why can't you make web
reference to it and then communicate via stub?
This is a B2B scenario and I may not publish the complete System.Net.Trace,
as it contains very confidential info.

But here is the last bit. I doubt you can get any useful hint from it.


Well, I don't know:
System.Net Error: 0 : [8036] Exception in the
HttpWebRequest#4878312::EndGetResponse - The remote server returned an
error: (401) Unauthorized.


That seems to indicate your problem is to do with authorization more
than anything else.

Jon

Apr 24 '06 #22
>
Another question if remote end is a webservice, why can't you make web
reference to it and then communicate via stub?
--
Regards, Vadym Stetsyak
www: http://vadmyst.blogspot.com


This is not a proper web service. It just does XML over HTTPS.


Apr 24 '06 #23
Hello, jens!

jJ> This is not a proper web service. It just does XML over HTTPS.

As Jon pointed out, the problem you have is connected with authorization.

Look at the docs for
HttpWebRequest.Credentials property.
--
Regards, Vadym Stetsyak
www: http://vadmyst.blogspot.com
Apr 24 '06 #24
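
A hedged sketch of the two things the replies above point at: attaching credentials to the request, and reading the status out of a failed call. Whether this particular service wants username/password credentials or only the client certificate is not known from the thread, so the user name, password and URL are placeholders, and Describe is a hypothetical helper. The failure is simulated so that nothing touches the network.

```csharp
using System;
using System.Net;

// WebRequest.Create is flagged obsolete on newer .NET versions; the thread predates that.
#pragma warning disable SYSLIB0014

// Placeholder endpoint and credentials, purely for illustration.
HttpWebRequest req = (HttpWebRequest)WebRequest.Create("https://example.invalid/endpoint");
req.Credentials = new NetworkCredential("demo-user", "demo-password");

// Simulated failure without an HTTP response attached, to exercise the helper.
string summary = Describe(new WebException("demo", WebExceptionStatus.ConnectFailure));
Console.WriteLine(summary);  // ConnectFailure

// Summarises a WebException: HTTP status when the server answered,
// otherwise the transport-level status.
static string Describe(WebException ex)
{
    if (ex.Response is HttpWebResponse http)
        return (int)http.StatusCode + " " + http.StatusDescription;
    return ex.Status.ToString();
}
```

In real use, Describe would be called from a catch (WebException) block around req.GetResponse(); for the trace shown earlier it would report the 401 together with the server's status text.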
Thus wrote Mike,
I am presuming that the problem is that the bytes are correct but the
remote service doesn't recognize them as being UTF-8. Try adding

req.ContentEncoding = encoding;


That's HttpRequest as of ASP.NET fame and doesn't apply here ;-)
--
Joerg Jooss
ne********@joergjooss.de
Apr 24 '06 #25

"Joerg Jooss" <ne********@joergjooss.de> wrote in message
news:94**************************@msnews.microsoft.com...
Thus wrote Mike,
I am presuming that the problem is that the bytes are correct but the
remote service doesn't recognize them as being UTF-8. Try adding

req.ContentEncoding = encoding;


That's HttpRequest as of ASP.NET fame and doesn't apply here ;-)


What type is "req", then?
Apr 25 '06 #26
Mike Schilling wrote:
That's HttpRequest as of ASP.NET fame and doesn't apply here ;-)


What type is "req", then?


HttpWebRequest, I suspect.

Jon

Apr 25 '06 #27
Thus wrote Mike,
"Joerg Jooss" <ne********@joergjooss.de> wrote in message
news:94**************************@msnews.microsoft.com...
Thus wrote Mike,
I am presuming that the problem is that the bytes are correct but
the remote service doesn't recognize them as being UTF-8. Try
adding

req.ContentEncoding = encoding;

That's HttpRequest as of ASP.NET fame and doesn't apply here ;-)

What type is "req", then?


You came up with that line of code, you tell me ;-)

Must have been System.Net.HttpWebRequest -- don't have the OP at hand anymore.

Cheers,
--
Joerg Jooss
ne********@joergjooss.de
Apr 26 '06 #28

"Joerg Jooss" <ne********@joergjooss.de> wrote in message
news:94**************************@msnews.microsoft.com...
Thus wrote Mike,
"Joerg Jooss" <ne********@joergjooss.de> wrote in message
news:94**************************@msnews.microsoft.com...
Thus wrote Mike,

I am presuming that the problem is that the bytes are correct but
the remote service doesn't recognize them as being UTF-8. Try
adding

req.ContentEncoding = encoding;

That's HttpRequest as of ASP.NET fame and doesn't apply here ;-)

What type is "req", then?


You came up with that line of code, you tell me ;-)
Must have been System.Net.HttpWebRequest -- don't have the OP at hand
anymore.


The OP didn't show the declaration, which is why I asked.

But I'm sure you (and Jon) are right.
Apr 26 '06 #29
