Hi;
I have a string that holds the contents of an XML file. It starts with <?xml
encoding='utf-8'... and each UTF-8 2-byte sequence appears in it as 2 chars. How do
I get that into an XPathDocument so that the 2-char sequences are not treated
as 2 characters?
--
thanks - dave
Hi dave,
You don't need to care about the encoding. Just create an XPathDocument
object with the filename as the constructor's parameter. Or you can load
the file into a stream and open the XPathDocument from the stream.
Kevin Yu
=======
"This posting is provided "AS IS" with no warranties, and confers no
rights."
It's not a file, the xml is in a String. I tried StringReader but it didn't
handle it correctly.
--
thanks - dave
I don't quite understand. The XML document is UTF-8 encoded. When it is held in a
string variable, it is stored as Unicode in memory, so you needn't worry
about the encoding. Can you post a short code sample that reproduces the error?
Kevin Yu
Hi;
No problem - here it is:
String data =
"<?xml version='1.0' encoding='utf-8'?>" +
"<order>" +
" <customer>" +
" <FLD>Angebot über eine neue Schmieranlage</FLD>" +
" </customer>" +
"</order>";
Please note that the ü is the 2-byte UTF-8 encoding of what is
actually a ü. So I need those two char values to become two byte values when fed
to XmlDocument (new MemoryStream()).
The best I have come up with is to create a byte[] and, char by char, assign
the String values to the bytes. But there has to be a faster way (I hope).
--
thanks - dave
You cannot do UTF-8 inside CLR strings. A CLR "char" is always treated as a
UTF-16 code unit. If you want to do UTF-8 you need to do it at the byte
level, not the "char" level. See http://msdn.microsoft.com/library/en...lencodings.asp for
details.
Chris Lovett
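To make the byte-level point concrete, here is a minimal C# sketch. The class and method names are hypothetical, not from the thread. It assumes the ISO-8859-1 encoding, which maps each byte to the char with the same value, to show how the two UTF-8 bytes of a ü can end up as the two chars ü.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class Utf8ByteLevelDemo
{
    // Encodes a CLR (UTF-16) string as UTF-8 bytes.
    public static byte[] ToUtf8Bytes(string text)
    {
        return Encoding.UTF8.GetBytes(text);
    }

    // Decodes bytes one char per byte (ISO-8859-1). Applied to UTF-8
    // bytes, this produces the two-char sequences seen in the thread.
    public static string DecodeOneCharPerByte(byte[] bytes)
    {
        return Encoding.GetEncoding("iso-8859-1").GetString(bytes);
    }
}
```

For example, ToUtf8Bytes("ü") yields the two bytes 0xC3 0xBC, and DecodeOneCharPerByte on those two bytes yields the two-char string "ü".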
Yes - but unfortunately I don't control how it is passed to me. So I have to
convert. I guess the for loop is my best solution.
--
thanks - dave
I would say the string you've been given is terribly messed up if it
contains UTF-8. I would push back on the source of this string and get it fixed
there.
Chris Lovett
Hi;
Apparently what is happening is that the XML file is being read into a String.
Since they are just reading the text, they don't know the encoding. And when
I get the string, I also don't know the encoding unless I parse it to find
the encoding=.
So it is read with each byte in the original file becoming a char in the
string. And I then convert back with each char becoming a byte. It is messy,
but I'm not sure there is a better solution unless both ends parse the text
to find the encoding= and then reset the stream to re-read it with that encoding.
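The char-to-byte conversion described above does not need a hand-written loop. A minimal sketch, assuming every char in the string is in the range U+0000 to U+00FF (one char per original byte) and that the original bytes really were UTF-8: encode the string with ISO-8859-1 to recover the bytes, then let the XML parser decode them from a stream. The class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.XPath;

// Hypothetical helper class, for illustration only.
public static class MisdecodedXmlLoader
{
    // Turns each char back into the byte it came from. ISO-8859-1 maps
    // U+0000..U+00FF to the same byte value; chars outside that range
    // cannot be represented and will not round-trip.
    public static byte[] RecoverOriginalBytes(string misdecoded)
    {
        return Encoding.GetEncoding("iso-8859-1").GetBytes(misdecoded);
    }

    // Loads the recovered bytes through a stream, so the parser honors
    // the encoding named in the XML declaration (utf-8 here).
    public static XPathDocument Load(string misdecoded)
    {
        using (var stream = new MemoryStream(RecoverOriginalBytes(misdecoded)))
        {
            return new XPathDocument(stream);
        }
    }
}
```

Calling MisdecodedXmlLoader.Load(data) on the sample string from earlier in the thread should give a document whose FLD element reads with a single ü. Note that this only undoes a one-byte-per-char read; if the producer decoded the file with some other encoding, a different inverse would be needed.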
--
thanks - dave