Bytes | Software Development & Data Engineering Community

xml utf-8 String to XPathDocument

Hi;

I have a string that contains an XML file. It starts with <?xml
encoding='utf-8'... and each UTF-8 2-byte sequence appears in it as 2 chars.
How do I load it into an XPathDocument so that each 2-char sequence is
treated as the single character it encodes?

--
thanks - dave
Nov 12 '05 #1
Hi dave,

You don't need to worry about the encoding; just create an XPathDocument
object with the file name as the constructor's parameter. Alternatively, you
can load the file into a stream and open the XPathDocument from the stream.

Kevin Yu
=======
"This posting is provided "AS IS" with no warranties, and confers no
rights."

Nov 12 '05 #2
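A minimal sketch of Kevin's stream-based suggestion, written as a C# 9 top-level program. The helper `LoadFromBytes` and the sample document are illustrative assumptions, and the byte array stands in for a file on disk. What it shows: when XPathDocument is given bytes rather than chars, the parser reads the byte order mark or the encoding declaration and does the decoding itself.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.XPath;

// Hypothetical helper: parse XML that arrives as raw bytes (file contents,
// a network stream, ...). The parser inspects the BOM or the
// <?xml ... encoding=...?> declaration and decodes the bytes itself.
static XPathDocument LoadFromBytes(byte[] xmlBytes)
{
    using var stream = new MemoryStream(xmlBytes);
    return new XPathDocument(stream); // the whole document is read here
}

// Stand-in for a UTF-8 file on disk (assumption); "\u00FC" is 'ü'.
byte[] fileBytes = Encoding.UTF8.GetBytes(
    "<?xml version='1.0' encoding='utf-8'?>" +
    "<order><FLD>Angebot \u00FCber</FLD></order>");

XPathDocument doc = LoadFromBytes(fileBytes);
string fld = doc.CreateNavigator().SelectSingleNode("/order/FLD").Value;
Console.WriteLine(fld.Length); // 12: 'ü' is a single char after parsing
```

The file-name constructor Kevin mentions reads the file as bytes in the same way, so the same encoding detection applies.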
It's not a file; the XML is in a String. I tried a StringReader, but it
didn't handle the encoding correctly.

--
thanks - dave

Nov 12 '05 #3
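Why the StringReader attempt keeps the two chars, sketched as a C# 9 top-level program. The helper `ReadFld` is hypothetical and the sample XML is a trimmed version of Dave's order snippet. A StringReader feeds the parser chars, not bytes, so the encoding='utf-8' declaration is not used for decoding: a correctly decoded string parses fine, but a string whose chars are really UTF-8 bytes keeps those bytes as separate chars.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

// Hypothetical helper: parse XML already held in a .NET string. A StringReader
// supplies UTF-16 chars, so encoding='utf-8' in the declaration is not used
// to decode anything; every char is taken as-is.
static string ReadFld(string xml)
{
    var doc = new XPathDocument(new StringReader(xml));
    return doc.CreateNavigator().SelectSingleNode("/order/FLD").Value;
}

const string decl = "<?xml version='1.0' encoding='utf-8'?>";

// A correctly decoded string: 'ü' is the single char U+00FC.
string good = ReadFld(decl + "<order><FLD>Angebot \u00FCber</FLD></order>");

// The string as received: the UTF-8 bytes C3 BC became the two chars
// U+00C3 ('Ã') and U+00BC ('¼').
string bad = ReadFld(decl + "<order><FLD>Angebot \u00C3\u00BCber</FLD></order>");

Console.WriteLine($"{good.Length} vs {bad.Length}"); // 12 vs 13
```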
I don't quite understand. The XML document is UTF-8 encoded, but once it is
in a string variable it is stored in memory as Unicode (UTF-16), so you
shouldn't need to worry about the encoding. Can you post a short code sample
that reproduces the error?

Kevin Yu

Nov 12 '05 #4
Hi;

No problem - here it is:
String data =
"<?xml version='1.0' encoding='utf-8'?>" +
"<order>" +
" <customer>" +
" <FLD>Angebot Ã¼ber eine neue Schmieranlage</FLD>" +
" </customer>" +
"</order>";

Please note that the Ã¼ is the 2-byte UTF-8 encoding of ü, stored as two
chars. So I need those two char values to become two byte values when fed
to XmlDocument (via new MemoryStream()).

The best I have come up with is to create a byte[] and assign the String's
chars to it one by one. But there has to be a faster way (I hope).

--
thanks - dave

Nov 12 '05 #5
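Dave's char-by-char loop next to a one-call library equivalent, sketched as a C# 9 top-level program (both helper names are hypothetical). The equivalence assumes every char in the string is at most U+00FF, that is, each char stands for one original byte; ISO-8859-1 (code page 28591) maps exactly that range to the bytes 0x00 to 0xFF one-to-one. On .NET 5 and later, `Encoding.Latin1` returns the same encoding.

```csharp
using System;
using System.Text;

// Dave's approach: copy each char's code into a byte. Assumes every char
// is <= U+00FF, i.e. each char stands for exactly one original byte.
static byte[] CharsToBytesLoop(string s)
{
    var bytes = new byte[s.Length];
    for (int i = 0; i < s.Length; i++)
        bytes[i] = (byte)s[i]; // silently truncates anything above U+00FF
    return bytes;
}

// Library equivalent: ISO-8859-1 maps U+0000..U+00FF to 0x00..0xFF, so
// GetBytes performs the same copy in a single call.
static byte[] CharsToBytesLatin1(string s) =>
    Encoding.GetEncoding("iso-8859-1").GetBytes(s);

string received = "Angebot \u00C3\u00BCber"; // "Ã¼" stands for the bytes C3 BC
byte[] viaLoop = CharsToBytesLoop(received);
byte[] viaEncoding = CharsToBytesLatin1(received);

// The recovered bytes are the original UTF-8, which decodes to "Angebot über".
string repaired = Encoding.UTF8.GetString(viaEncoding);
Console.WriteLine(repaired == "Angebot \u00FCber"); // True
```

Neither version can represent a char above U+00FF: the cast truncates it, while the encoding substitutes a fallback character.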
You cannot do UTF-8 inside CLR strings. A CLR "char" is always treated as
UTF-16. If you want UTF-8, you need to work at the byte level, not the
"char" level. See
http://msdn.microsoft.com/library/en...lencodings.asp for
details.


Nov 12 '05 #6
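Chris's point that UTF-8 exists only at the byte level, illustrated as a short C# 9 top-level program (the sample word is an assumption). `Encoding.UTF8` converts between chars and UTF-8 bytes; decoding the same bytes one byte per char with ISO-8859-1 instead reproduces the kind of two-char sequence Dave describes.

```csharp
using System;
using System.Text;

// In .NET, 'ü' is one UTF-16 char (U+00FC). UTF-8 exists only as bytes,
// produced by Encoding.UTF8.GetBytes: here 'ü' becomes 0xC3 0xBC.
string word = "\u00FCber";
byte[] utf8 = Encoding.UTF8.GetBytes(word);

// Decoding those bytes as UTF-8 restores the single char...
string roundTrip = Encoding.UTF8.GetString(utf8);

// ...while decoding them one byte per char (ISO-8859-1) yields the
// two chars U+00C3 U+00BC ("Ã¼") followed by "ber".
string mojibake = Encoding.GetEncoding("iso-8859-1").GetString(utf8);

Console.WriteLine($"{word.Length} chars -> {utf8.Length} UTF-8 bytes"); // 4 chars -> 5 UTF-8 bytes
Console.WriteLine(mojibake.Length); // 5
```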
Yes - but unfortunately I don't control how it is passed to me. So I have to
convert. I guess the for loop is my best solution.

--
thanks - dave


Nov 12 '05 #7
I would say the string you've been given is terribly messed up if it
contains UTF-8; I would push back on the source of this string and fix it
there.



Nov 12 '05 #8
Hi;

Apparently the XML file is being read into a String as plain text, so
whoever reads it doesn't know the encoding. And when I get the string, I
don't know the encoding either unless I parse it to find the encoding=.

So each byte in the original file becomes a char in the string, and I then
convert back with each char becoming a byte. It is messy, but I'm not sure
there is a better solution unless both ends parse the text to find the
encoding= and then reset the stream to read it.

--
thanks - dave


Nov 12 '05 #9
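A minimal sketch of the conversion Dave describes, assuming (as he says) that each byte of the original file became one char in the range U+0000 to U+00FF. It recovers the bytes with ISO-8859-1 and then parses them from a MemoryStream, so the XML parser reads the encoding= declaration itself and neither end has to parse it by hand. The helper name `LoadByteAsCharXml` is hypothetical, and the program uses C# 9 top-level statements.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.XPath;

// Hypothetical helper. Assumes the string was produced by turning each byte
// of the original file into one char (U+0000..U+00FF). ISO-8859-1 reverses
// that mapping exactly, recovering the original bytes.
static XPathDocument LoadByteAsCharXml(string byteAsChar)
{
    byte[] originalBytes = Encoding.GetEncoding("iso-8859-1").GetBytes(byteAsChar);

    // Parse from a byte stream, not a StringReader, so the parser itself
    // reads the <?xml encoding=...?> declaration (or BOM) and decodes.
    using var stream = new MemoryStream(originalBytes);
    return new XPathDocument(stream);
}

string received =
    "<?xml version='1.0' encoding='utf-8'?>" +
    "<order><customer><FLD>Angebot \u00C3\u00BCber eine neue Schmieranlage</FLD></customer></order>";

XPathDocument doc = LoadByteAsCharXml(received);
string fld = doc.CreateNavigator().SelectSingleNode("/order/customer/FLD").Value;
Console.WriteLine(fld == "Angebot \u00FCber eine neue Schmieranlage"); // True
```

If the producer instead decoded the file with a Windows code page such as 1252, bytes 0x80 to 0x9F turn into other chars (0x80 becomes '€', for example) and this reversal no longer holds; fixing the source, as Chris suggests, avoids that guesswork.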
