Reading UTF-8 Data from XML file

We have an XML file that contains text in various languages, e.g. English,
French, German and Chinese.
We currently have a StringWriter object that reads this in and transforms
against an XslTransform object.
The problem arises when we encounter Chinese characters; these characters
just come out as garbage in the Internet Explorer browser.

Setting the charset on the .aspx page, in the web.config and in the
.xsl file to be transformed against has no effect.

Using a simple transform in classic ASP,
we can correctly display the text as it's meant to be seen; however, getting
the same output in C# seems a lot more tricky.

After trying various 'fixes' posted on several developer sites, nothing has
worked and the problem is still there.
We subclassed StringWriter so that its Encoding could be changed to force
UTF-8, but to no avail.

When the transform is complete, we return the result of the StringWriter
object's .ToString() method. This is where the error seems to lie:
checking .Encoding.EncodingName just prior to returning shows
'Unicode (UTF-8)', yet when the result is output
to screen via a Literal control, all we see is garbage.
Some of the characters are replaced with ???????. We know our browser is
functioning correctly because we can see these types of text on
http://www.yahoo.com.hk

Nov 19 '05 #1
Matt Hollingworth wrote:
> We have an XML file that contains text in various languages, e.g.
> English, French, German and Chinese.
> We currently have a StringWriter object that reads this in and
> transforms against an XslTransform object.

I really don't believe that you use a String*Writer* to *read* input
;-)

Characters and strings in .NET are always Unicode and use UTF-16 as
their internal representation. This means
a) a UTF-8 StringWriter is an oxymoron
b) truly character-based operations aren't susceptible to encoding
problems
c) encodings are only relevant when you need to transport strings using
a byte representation, e.g. when rendering a string on a web page. Make
sure that your web application uses UTF-8 (or any other UTF that suits
your needs) as response encoding.
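
To illustrate points a) to c), here is a minimal sketch. It is not code from the thread: `Utf8StringWriter` is a hypothetical subclass of the kind described in the question, and the sample text is made up. It shows that overriding `Encoding` only changes what the writer reports, while `ToString()` returns the same UTF-16 string either way; bytes appear only when an encoding is applied explicitly.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical subclass, similar in spirit to the one described in the question.
public sealed class Utf8StringWriter : StringWriter
{
    // Changes only the encoding the writer *reports*;
    // the underlying buffer still holds UTF-16 chars.
    public override Encoding Encoding => Encoding.UTF8;
}

public static class StringWriterEncodingDemo
{
    public static void Main()
    {
        // "Hello " followed by two Chinese characters (assumed sample text).
        const string text = "Hello \u4E2D\u6587";

        var plain = new StringWriter();
        var labelled = new Utf8StringWriter();
        plain.Write(text);
        labelled.Write(text);

        Console.WriteLine(plain.Encoding.WebName);     // utf-16
        Console.WriteLine(labelled.Encoding.WebName);  // utf-8
        Console.WriteLine(plain.ToString() == labelled.ToString()); // True

        // Bytes exist only once an encoding is applied explicitly.
        byte[] utf8 = Encoding.UTF8.GetBytes(labelled.ToString());
        byte[] utf16 = Encoding.Unicode.GetBytes(labelled.ToString());
        Console.WriteLine(utf8.Length);   // 12 (6 ASCII bytes + 2 x 3 bytes)
        Console.WriteLine(utf16.Length);  // 16 (8 chars x 2 bytes)
    }
}
```

As far as I know, the reported `Encoding` is mainly consulted by XML writers when they choose the encoding name for an XML declaration, so it affects a label in the output rather than the characters themselves.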

Cheers,
--
http://www.joergjooss.de
mailto:ne********@joergjooss.de
Nov 19 '05 #2
Joerg,

Thanks - A developer wrote this question...

> We currently have a StringWriter object that reads this in and
> transforms against an XslTransform object.

Sorry, to clarify: this means that the result of transforming an XmlDocument
object is written to a StringWriter.
My Webform does use UTF-8 response and request encoding, and I have tried
several other encodings to get it to work.

I can get Chinese characters to display but some of the content is still
broken. Could the fact that my transformation results in a mixture of HTML
code + English text + Chinese text be part of the problem?

It seems I get something like "藛鈥犆b偓鈥?/P>"; notice the question mark and half
a </p> tag. I have disabled output escaping in my XSLT, but still to no avail.

Your help appreciated,
Thanks
Matt

Nov 19 '05 #3
Matt Hollingworth wrote:
> I can get Chinese characters to display but some of the content is
> still broken. Could the fact that my transformation results in a
> mixture of HTML code + English text + Chinese text be part of the
> problem?

Only if you were not using Unicode. But since you use UTF-8 as response
encoding, and assuming you don't mistreat any string objects in your
code, that should not be a problem.
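
As a rough illustration of what "mistreating" a string can look like, the sketch below pushes a string through encoders and decoders that do not match. The class and method names are hypothetical, and ISO-8859-1 merely stands in for whatever wrong charset a reader might assume. A lossy encoder yields literal '?' characters, and a mismatched decoder yields garbage, which resembles both symptoms reported in this thread, although the sketch does not claim to reproduce the actual cause.

```csharp
using System;
using System.Text;

public static class EncodingMishaps
{
    // Encode with a charset that cannot represent the text, then decode again.
    // Characters outside ASCII are replaced by '?' and cannot be recovered.
    public static string ThroughAscii(string s) =>
        Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(s));

    // Encode as UTF-8 but decode the bytes as ISO-8859-1, which is what a
    // reader does when it assumes the wrong charset. Each byte becomes one char.
    public static string Utf8ReadAsLatin1(string s) =>
        Encoding.GetEncoding("iso-8859-1").GetString(Encoding.UTF8.GetBytes(s));

    // Matching encoder and decoder: the string survives unchanged.
    public static string ThroughUtf8(string s) =>
        Encoding.UTF8.GetString(Encoding.UTF8.GetBytes(s));

    public static void Main()
    {
        // A paragraph holding two Chinese characters (assumed sample text).
        const string text = "<p>\u4E2D\u6587</p>";

        Console.WriteLine(ThroughAscii(text));             // <p>??</p>
        Console.WriteLine(Utf8ReadAsLatin1(text).Length);  // 13 (one char per UTF-8 byte)
        Console.WriteLine(ThroughUtf8(text) == text);      // True
    }
}
```

If '?' characters show up in the page source itself, a lossy conversion on the server is a plausible suspect; if the source looks fine under one browser encoding but not another, the declared and actual encodings probably disagree.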

> It seems I get something like "藛鈥犆b偓鈥?/P>"; notice the
> question mark and half a </p> tag.


What characters are missing in this string? Is it only the opening '<'?

Cheers,
--
http://www.joergjooss.de
mailto:ne********@joergjooss.de
Nov 19 '05 #4
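Yes - although if I disable output escaping in my XSL I can see that ?lt; is
in the code, as if the & has been replaced with a ?
Here is the code for your reference:

// Requires: using System.IO; using System.Xml; using System.Xml.Xsl;
XmlDocument oDoc = new XmlDocument();
XslTransform oXsl = new XslTransform();

// Load the source XML and the stylesheet from the web application's folder.
oDoc.Load(Server.MapPath(""));
oXsl.Load(Server.MapPath("xsl/x_language_test.xsl"));

// Collect the transform output as a .NET (UTF-16) string.
StringWriter oSw = new StringWriter();
oXsl.Transform(oDoc, null, oSw);

// Hand the result to the Literal control on the page.
litTestText.Text = oSw.ToString();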
Thanks
Matt
Nov 19 '05 #5
Having investigated further, I forgot to say that I only see the output above
after changing the encoding to Simplified Chinese in the browser. If I choose
UTF-8, it all still appears encoded, just as it does in Notepad when you
click View Source.

I did the same page in ASP and it all displays correctly without issue.


Nov 19 '05 #6
Matt Hollingworth wrote:
> Yes - although if I disable output escaping in my XSL I can see that
> ?lt; is in the code, as if the & has been replaced with a ?
> Here is the code for your ref:
> XmlDocument oDoc = new XmlDocument();
> XslTransform oXsl = new XslTransform();
>
> oDoc.Load(Server.MapPath(""));
> oXsl.Load(Server.MapPath("xsl/x_language_test.xsl"));
>
> StringWriter oSw = new StringWriter();
>
> oXsl.Transform(oDoc,null,oSw);
>
> litTestText.Text = oSw.ToString();

Save for the weird Server.MapPath(""), there seems to be nothing wrong
here. I can only imagine that there's something wrong with the XSL
itself -- maybe somebody over in the XML group can help out.
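
One way to narrow this down is to run a tiny transform entirely in memory and check that the Chinese text survives. The sketch below makes several assumptions: the inline XML and XSL are made-up stand-ins for the files in the thread, and it uses `XslCompiledTransform` (the newer replacement for `XslTransform`) rather than the class in the original code. If a minimal case like this works, the source file's actual encoding or the stylesheet would be the next things to inspect.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class TransformSketch
{
    // Hypothetical inline documents standing in for the XML and XSL files in the thread.
    private const string Xml = "<doc><p>Hello \u4E2D\u6587</p></doc>";
    private const string Xsl =
        "<xsl:stylesheet version=\"1.0\" " +
        "xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">" +
        "<xsl:output method=\"html\"/>" +
        "<xsl:template match=\"/doc\">" +
        "<div><xsl:value-of select=\"p\"/></div>" +
        "</xsl:template>" +
        "</xsl:stylesheet>";

    public static string Run()
    {
        var doc = new XmlDocument();
        doc.LoadXml(Xml);

        var xslt = new XslCompiledTransform();
        using (var xslReader = XmlReader.Create(new StringReader(Xsl)))
        {
            xslt.Load(xslReader);
        }

        // The StringWriter collects chars, so no byte encoding is involved here.
        using (var sw = new StringWriter())
        {
            xslt.Transform(doc, null, sw);
            return sw.ToString();
        }
    }

    public static void Main()
    {
        string html = Run();
        // Check the characters directly instead of printing them,
        // because console output depends on the console's own encoding.
        Console.WriteLine(html.Contains("\u4E2D\u6587")); // True
    }
}
```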

Cheers,
--
http://www.joergjooss.de
mailto:ne********@joergjooss.de
Nov 19 '05 #7
