
xml utf-8 String to XPathDocument


Question posted by: David Thielen (Guest) on November 12th, 2005 05:09 AM
Hi;

I have a string that contains an XML file. It starts with <?xml
encoding='utf-8'... and each UTF-8 2-byte sequence appears in it as 2 chars.
How do I get that into an XPathDocument so that each 2-char sequence is
treated as a single character rather than as 2 characters?

--
thanks - dave
8 Answers Posted
Kevin Yu [MSFT]
#2: Re: xml utf-8 String to XPathDocument

Hi dave,

You don't need to worry about the encoding. Just create an XPathDocument
object, passing the filename to the constructor. Or you can load
the file into a stream and open the XPathDocument from the stream.

Kevin Yu
=======
"This posting is provided "AS IS" with no warranties, and confers no
rights."

David Thielen
#3: Re: xml utf-8 String to XPathDocument

It's not a file; the XML is in a String. I tried a StringReader, but it
didn't handle it correctly.

--
thanks - dave

Kevin Yu [MSFT]
#4: Re: xml utf-8 String to XPathDocument

I don't quite understand. The XML document is UTF-8 encoded. Once it is in a
string variable, it is stored as Unicode in memory, so you needn't worry
about the encoding. Can you post a simple code sample that reproduces the
error?

Kevin Yu
=======
"This posting is provided "AS IS" with no warranties, and confers no
rights."
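For illustration, here is a minimal sketch of the case described above, assuming the string already holds correctly decoded characters (a real ü, U+00FC, rather than a 2-char sequence). The class and method names are made up for this example. As far as I know, when the parser is fed from a TextReader it receives chars rather than bytes, so the encoding='utf-8' declaration is not used to decode anything.

```csharp
using System.IO;
using System.Xml.XPath;

public static class StringXmlLoader
{
    // Hypothetical helper: loads XML from a string whose characters are
    // already correct (for example a real "ü", U+00FC).
    public static XPathDocument FromString(string xml)
    {
        // The parser receives chars, not bytes, so there is nothing left
        // to decode at this point.
        using (StringReader reader = new StringReader(xml))
        {
            // XPathDocument reads the whole input inside its constructor,
            // so disposing the reader afterwards is safe.
            return new XPathDocument(reader);
        }
    }
}
```

This route only helps when the string is well formed as text. If the string was built by turning each file byte into one char, the parser will faithfully report the 2-char sequence.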

David Thielen
#5: Re: xml utf-8 String to XPathDocument

Hi;

No problem - here it is:
String data =
"<?xml version='1.0' encoding='utf-8'?>" +
"<order>" +
" <customer>" +
" <FLD>Angebot über eine neue Schmieranlage</FLD>" +
" </customer>" +
"</order>";

Please note that the ü is the 2 byte value for a utf-8 encoding that is
actually a ü. So I need those to char values to become 2 byte values when fed
to XmlDocument (new MemoryStream())

The best I have come up with is to create a byte[] and char by char assign
the String values to the byte. But there has to be a faster way (I hope).

--
thanks - dave
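One possible shortcut for the char-by-char loop, sketched under a stated assumption: every char in the string holds exactly one original byte (a value from 0 to 255). Under that assumption, the ISO-8859-1 encoding maps each char back to the same byte value in a single call, and the XML parser can then decode the bytes itself using the encoding='utf-8' declaration. The class and method names are hypothetical.

```csharp
using System.IO;
using System.Text;
using System.Xml.XPath;

public static class MisdecodedXml
{
    // Hypothetical helper. Assumption: each char in "data" is one original
    // file byte (0-255), i.e. the bytes were widened to chars one for one.
    public static XPathDocument Load(string data)
    {
        // ISO-8859-1 maps U+0000..U+00FF to the byte of the same value,
        // which is what the manual loop over the chars would do.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        byte[] raw = latin1.GetBytes(data);

        // Reading from a stream lets the parser pick the encoding from
        // the XML declaration and decode the 2-byte sequences.
        return new XPathDocument(new MemoryStream(raw));
    }
}
```

If the string was instead produced by a different single-byte code page (Windows-1252, for example), some chars fall outside 0 to 255 and this encoder would replace them with '?', so the assumption is worth checking against real data first.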

Chris Lovett
#6: Re: xml utf-8 String to XPathDocument

You cannot do utf-8 inside CLR strings. A CLR "char" is considered to
always be UTF-16. If you want to do UTF-8 you need to do it at the byte
level, not the "char" level. See
http://msdn.microsoft.com/library/e...mlencodings.asp for
details.
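A small sketch of the char versus byte distinction, using only the standard System.Text.Encoding classes. The helper names are invented for this example. It shows that ü is one UTF-16 char in a CLR string but two bytes in UTF-8, and that decoding those two bytes one byte per char produces the 2-char string seen in this thread.

```csharp
using System;
using System.Text;

public static class CharVersusByte
{
    // Hex dump of the UTF-8 bytes for a CLR (UTF-16) string.
    public static string Utf8Hex(string s)
    {
        return BitConverter.ToString(Encoding.UTF8.GetBytes(s));
    }

    // Decodes UTF-8 bytes one byte per char, which reproduces the kind of
    // string described in the thread.
    public static string MisdecodeAsLatin1(byte[] utf8Bytes)
    {
        return Encoding.GetEncoding("iso-8859-1").GetString(utf8Bytes);
    }
}
```

For the one-char string "\u00FC" (ü), Utf8Hex returns "C3-BC", and MisdecodeAsLatin1 on those two bytes returns the two chars U+00C3 and U+00BC, which display as Ã¼.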


David Thielen
#7: Re: xml utf-8 String to XPathDocument

Yes - but unfortunately I don't control how it is passed to me. So I have to
convert. I guess the for loop is my best solution.

--
thanks - dave

Chris Lovett
#8: Re: xml utf-8 String to XPathDocument

I would say the string you've been given is terribly messed up if it
contains UTF-8 - I would push back on the source of this string and fix it
there.


David Thielen
#9: Re: xml utf-8 String to XPathDocument

Hi;

Apparently what is happening is that the XML file is being read into a
String. Since they are just reading the text, they don't know the encoding.
And when I get the string, I also don't know the encoding unless I parse it
to find the encoding=.

So it is read with each byte in the original file becoming a char in the
string, and I then convert back with each char becoming a byte. It is messy,
but I'm not sure there is a better solution unless both ends parse the text
to find the encoding= and then reset the stream to read it.

--
thanks - dave
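A sketch of one alternative on the producing side, assuming the sender can be changed and that a string must still be handed over. Instead of copying bytes into chars, the sender could let the XML parser read the raw stream, since the parser already handles the encoding= declaration, and then serialize the result to a string. The class and method names are hypothetical.

```csharp
using System.IO;
using System.Xml;

public static class XmlStringProducer
{
    // Hypothetical helper for the sending side: turns raw XML bytes into
    // a string that holds real characters rather than one char per byte.
    public static string ReadXmlAsString(Stream input)
    {
        XmlDocument doc = new XmlDocument();

        // Loading from a Stream lets the parser detect the encoding from
        // the XML declaration, so no manual parsing of encoding= is needed.
        doc.Load(input);

        // OuterXml is an ordinary CLR string; ü is a single char in it.
        return doc.OuterXml;
    }
}
```

The result is not byte-for-byte identical to the file (insignificant whitespace is dropped unless PreserveWhitespace is set), but the receiver can then load it through a StringReader without any byte juggling.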

 