UTF-8 encoding in AJAX web application.

I have an Ajax web application where I have problems with UTF-8 encoding of
Chinese characters.

My Ajax web application runs in an HTML page that is UTF-8 encoded.
I copy and paste some Chinese characters from another HTML page viewed in IE7
that is also UTF-8 encoded (search for "china" on google.com). I paste the
Chinese characters into a content-editable div.
My Ajax web application compiles an XML document in which the data from the
content-editable div is placed in a CDATA section, and sends it to a web
service on the server.
I read the content-editable div using .innerHTML. I call the web service
using XMLHttpRequest in the following way:
-----
req.open("POST", strUrl, true);
req.setRequestHeader("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8");
var strSend = "";
for (var i = 0; i < aParameters.length; i += 2)
{
    if (strSend.length != 0) strSend += "&";
    strSend += aParameters[i] + "=" + encodeURIComponent(aParameters[i + 1]);
}
req.send(strSend);
-----
where req is the XMLHttpRequest object and aParameters is an array that
contains: parameterName, parameterValue, parameterName, parameterValue, ...
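
As a cross-check on the client side, here is a minimal sketch of the same body-building loop wrapped in a function. `buildFormBody` is a hypothetical name, not from the original code, and unlike the loop above it also encodes the parameter names. It shows that `encodeURIComponent` percent-encodes each character as its UTF-8 bytes, so the request body itself is plain ASCII:

```javascript
// Hypothetical helper (not part of the original post): builds an
// application/x-www-form-urlencoded body from a flat
// [name, value, name, value, ...] array.
function buildFormBody(aParameters) {
  const pairs = [];
  // A trailing name without a value is ignored.
  for (let i = 0; i + 1 < aParameters.length; i += 2) {
    // encodeURIComponent emits the UTF-8 bytes of each non-ASCII
    // character as %XX escapes, independent of the page encoding.
    pairs.push(
      encodeURIComponent(aParameters[i]) + "=" + encodeURIComponent(aParameters[i + 1])
    );
  }
  return pairs.join("&");
}

console.log(buildFormBody(["lang", "da", "text", "中文"]));
// lang=da&text=%E4%B8%AD%E6%96%87
```

If the server logs show these escapes arriving intact, the client side is probably not where the characters are lost.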

Before I send the XML I write it to the screen, and there the Chinese
characters are displayed correctly.

On the server I use .NET 2.0. The XML is transformed to SQL in a CDATA
section, which is read and executed against an MSSQL 2000 database, where the
string with the Chinese characters is stored in a text column.

When I load the data again in my Ajax web application, a web service is called
that returns the string in XML in a CDATA section; the data is read from
the database using a DataReader.

When the loaded text is displayed, the Chinese characters have turned into
question marks.

I've tried changing the column in the database to an image column and using a
byte array to fetch the data from the database. That didn't work, so I
changed it back.
I've added the following to my web.config:
-----
<globalization
    requestEncoding="utf-8"
    responseEncoding="utf-8"
    fileEncoding="utf-8"
/>
-----
I've changed the ToXml method on my object to the following:
-----
// Define the desired encoding of the output
System.Text.Encoding encodingOfXmlOutput = System.Text.Encoding.UTF8;
// Create MemoryStream to receive our bytes
using (System.IO.MemoryStream memoryStream = new System.IO.MemoryStream())
{
    // Create XmlTextWriter using our created memoryStream and encodingOfXmlOutput
    using (System.Xml.XmlTextWriter xmlWriter = new System.Xml.XmlTextWriter(memoryStream, encodingOfXmlOutput))
    {
        // Set formatting options for XmlTextWriter
        xmlWriter.Formatting = System.Xml.Formatting.None; // Output should not be indented
        // Write XML
        xmlWriter.WriteStartElement("Question");
        xmlWriter.WriteStartElement("QuestionText");
        xmlWriter.WriteCData(this.Text);
        xmlWriter.WriteEndElement(); // QuestionText
        xmlWriter.WriteEndElement(); // Question
        // Force all bytes into memoryStream
        xmlWriter.Flush();
        // Create buffer to receive bytes from memoryStream.
        // Some encodings, like UTF-8, have a preamble (bytes that identify the encoding).
        // Having this preamble in our output would invalidate it, so we skip it.
        byte[] buffer = new byte[memoryStream.Length - encodingOfXmlOutput.GetPreamble().Length];
        // Position the cursor in memoryStream just after the preamble
        memoryStream.Position = encodingOfXmlOutput.GetPreamble().Length;
        // Fill data from the current position of memoryStream into buffer
        memoryStream.Read(buffer, 0, buffer.Length);
        // Return the created XML as a string
        return encodingOfXmlOutput.GetString(buffer);
    }
}
-----
Still the same problem.

When I transform the XML to SQL I use the following function:
-----
public static string Transform(XslCompiledTransform compiledTransform, IXPathNavigable document)
{
    if (compiledTransform == null) throw new ArgumentNullException("compiledTransform");
    using (StringWriter writer = new StringWriter())
    {
        string strResult = string.Empty;
        compiledTransform.Transform(document, null, writer);
        strResult = writer.ToString();
        return strResult;
    }
}
-----

The XSLT has the following encoding
-----
<?xml version="1.0" encoding="UTF-8"?>
-----

So my question is the following: Where does my encoding screw up? How come I
can't save and load Chinese characters correctly?

Any pointers would be greatly appreciated.

I don't know which other UTF-8 characters don't work correctly, but the Danish
characters I initially had problems with (æøå) work correctly. I would like my
solution to work with any UTF-8 characters.

Kind Regards,
Allan Ebdrup
Mar 16 '07 #1
On Mar 16, 9:01 am, "Allan Ebdrup" <ebd...@noemail.noemail> wrote:
<snip>

It sounds like the problem isn't with your application, but with your
database definition. Your web page is UTF-8, but is your database
table?

Assuming that your database *IS* set up to store UTF-8, is the query
tool you are using set up for it too? It may be translating the extra
characters into ? between the database and your application.

It may be that your code is fine, and you should redirect your bug
search to the database level.

I know that's not a definitive answer, but I hope that helps.

--Sim

Mar 16 '07 #2
Allan Ebdrup <eb****@noemail.noemail> wrote:
<snip>

See http://pobox.com/~skeet/csharp/debuggingunicode.html

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Mar 16 '07 #3
Hi Allan,

Regarding this Unicode transfer issue, I think it is likely due to the
text conversion in the SQL Server database. I performed the following test
on my local test machine:

** use an ASP.NET aspx page to render a <textarea> and use
client script (with the xmlhttp component) to send the input in the
<textarea> to the server; the charset is utf-8, as in your case

** on the server side, I save the data posted via xmlhttp into a file (also
utf-8 encoded).

Based on my test, the Chinese characters are saved correctly. Therefore,
you can try checking the posted data on the server side: use the debugger to
break into the code and inspect the variable, or write it to a file for
checking. If the problem is caused by SQL Server database storage, we need to
do some further research on the database table.

Please feel free to post here if you have any other findings or questions.

Sincerely,

Steven Cheng

Microsoft MSDN Online Support Lead

==================================================

Get notification to my posts through email? Please refer to
http://msdn.microsoft.com/subscripti...ult.aspx#notifications.

Note: The MSDN Managed Newsgroup support offering is for non-urgent issues
where an initial response from the community or a Microsoft Support
Engineer within 1 business day is acceptable. Please note that each follow
up response may take approximately 2 business days as the support
professional working with you may need further investigation to reach the
most efficient resolution. The offering is not appropriate for situations
that require urgent, real-time or phone-based interactions or complex
project analysis and dump analysis issues. Issues of this nature are best
handled working with a dedicated Microsoft Support Engineer by contacting
Microsoft Customer Support Services (CSS) at
http://msdn.microsoft.com/subscripti...t/default.aspx.

==================================================

This posting is provided "AS IS" with no warranties, and confers no rights.
Mar 19 '07 #4

"SimeonArgus" <si*********@gmail.com> wrote in message
news:11*********************@e65g2000hsc.googlegroups.com...
<snip>
Thanks for your feedback.
You might be right, but as I wrote, I tried to save the data as a byte array
(image) in the database, and that didn't eliminate the problem.

When I execute my SQL command text, what encoding does SQL Server expect in
the command text? Is there any way to specify what encoding to use in the
SQL command text?

Kind Regards,
Allan Ebdrup.
Mar 19 '07 #5
On Mar 19, 7:39 am, "Allan Ebdrup" <ebd...@noemail.noemail> wrote:
When I execute my SQL command text, what encoding does SQL server expect in
the command text? Is there any way to specify what encoding to use in the
SQL command text?
That should all be handled for you in the driver. Just use strings,
which are already Unicode.

Jon

Mar 19 '07 #6

"Steven Cheng[MSFT]" <st*****@online.microsoft.com> wrote in message
news:aV*************@TK2MSFTNGHUB02.phx.gbl...
<snip>
OK, now I've tried logging the text in different places and found the
following.
The SQL executed against the database has the characters encoded correctly.
When I extract the data from the database it has been converted to question
marks.

I've tried changing the column that holds the data from text to image, but
the bytes retrieved from SQL Server are not the same as what I passed
in.

Is this because I write the value inline in the SQL command text? Is the SQL
command parsed in some way using some kind of encoding, even though the
column is specified to be an image column?
Would passing the value to SQL Server as a parameter perhaps solve the
problem?

Kind Regards,
Allan Ebdrup.
Mar 19 '07 #7


#Why do some SQL strings have an 'N' prefix?

Mar 20 '07 #8
Hello Allan,

Thanks for your reply.

Of course, for a .NET application, using the ADO.NET SqlCommand and
SqlParameter objects to supply any dynamic parameters should be the
preferred approach. Is the column of your SQL Server 2000 table that
will store the posted Chinese characters defined as a Unicode
character type (such as nchar, nvarchar or ntext)? If the column
data type is Unicode, it should be able to store the Chinese characters
correctly. Otherwise, you need to make sure the collation of the
table/column or database is set to a Chinese collation so that Chinese
characters can be stored in a non-Unicode encoded format.
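
To illustrate the non-Unicode case, here is a rough simulation of what a lossy conversion to a single-byte code page does to text. This is not SQL Server's actual conversion logic: `toSingleByteCodePage` is a hypothetical function, and the Latin-1 range (U+0000 to U+00FF) stands in for a Western European code page. Under that assumption, it would explain why Danish characters survive while Chinese characters come back as question marks:

```javascript
// Hypothetical simulation (an assumption, not SQL Server's real algorithm):
// every character outside the Latin-1 range is replaced by "?", which is
// what a lossy conversion to a single-byte code page typically looks like.
function toSingleByteCodePage(text) {
  let out = "";
  for (const ch of text) {
    // for...of iterates by code point, so one character gives one "?"
    out += ch.codePointAt(0) <= 0xff ? ch : "?";
  }
  return out;
}

console.log(toSingleByteCodePage("æøå 中文"));
// æøå ??
```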

In addition, SQL Server (7.0 or 2000) stores Unicode data types in the
UCS-2 character set (no matter whether the data was originally encoded in
UTF-8, UTF-16 or something else). Here is a good MSDN reference introducing
the international features in SQL Server 2000:

http://msdn2.microsoft.com/en-us/lib...4(SQL.80).aspx
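
A small sketch of what "stored in UCS-2 regardless of the original encoding" means in terms of bytes. The helper names `utf8Hex` and `utf16Hex` are made up for this example, and it assumes a runtime that provides `TextEncoder` (modern browsers and Node.js). For characters in the Basic Multilingual Plane, the UTF-16 code unit shown is the same value UCS-2 would use:

```javascript
// Hypothetical helpers for illustration only.

// UTF-8 bytes of the text, as space-separated hex pairs.
function utf8Hex(text) {
  return Array.from(new TextEncoder().encode(text), (b) =>
    b.toString(16).padStart(2, "0")
  ).join(" ");
}

// UTF-16 code units of the text, as space-separated 4-digit hex values.
function utf16Hex(text) {
  const units = [];
  for (let i = 0; i < text.length; i++) {
    units.push(text.charCodeAt(i).toString(16).padStart(4, "0"));
  }
  return units.join(" ");
}

console.log(utf8Hex("中"));  // e4 b8 ad
console.log(utf16Hex("中")); // 4e2d
```

The same character is three bytes in UTF-8 and one 16-bit unit in UCS-2, so the bytes in the database are not expected to match the bytes on the wire.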

For your current code that uses inline SQL command text directly to execute
the insert query, I think you can try adding an 'N' prefix to each string
parameter. This prefix explicitly marks the parameter value as
Unicode characters.

#Why do some SQL strings have an 'N' prefix?
http://databases.aspfaq.com/general/...ve-an-n-prefix.html
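
For the inline-SQL route, a minimal sketch of producing an N-prefixed literal. `toUnicodeSqlLiteral` and the table name `Question` are hypothetical; the sketch only doubles single quotes, which is the T-SQL escape inside string literals. Parameters, as recommended above, remain the safer choice:

```javascript
// Hypothetical helper for illustration: wraps a value in an N'...' literal
// so the server treats it as Unicode text rather than code page text.
function toUnicodeSqlLiteral(value) {
  // Single quotes inside a T-SQL string literal are escaped by doubling.
  return "N'" + String(value).replace(/'/g, "''") + "'";
}

// Table and column names here are assumptions, not from the thread.
const sql =
  "INSERT INTO Question (QuestionText) VALUES (" +
  toUnicodeSqlLiteral("it's 中文") +
  ")";
console.log(sql);
// INSERT INTO Question (QuestionText) VALUES (N'it''s 中文')
```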

If you have any further specific questions, please feel free to let me know.

Sincerely,

Steven Cheng

Microsoft MSDN Online Support Lead
This posting is provided "AS IS" with no warranties, and confers no rights.

Mar 20 '07 #9
"Steven Cheng[MSFT]" <st*****@online.microsoft.com> wrote in message
news:mm**************@TK2MSFTNGHUB02.phx.gbl...
<snip>
I changed my code to use parameters, but I still had the same problem.
Then I also changed the database to use an image column (byte array), and
when I store the bytes in the database I use:

System.Text.Encoding.UTF8.GetBytes(value)

When I retrieve the data from the database I use:

System.Text.Encoding.UTF8.GetString(value)

It works! Now I can store UTF-8 strings in the database. Thank you for your
help.
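
The same round trip, sketched in JavaScript for comparison: `TextEncoder` and `TextDecoder` play the role of `Encoding.UTF8.GetBytes` and `Encoding.UTF8.GetString` here. `toUtf8Bytes` and `fromUtf8Bytes` are hypothetical helper names, and the sketch assumes a runtime where both classes are available globally. Since the image column only holds opaque bytes, the column's code page should no longer come into play:

```javascript
// Hypothetical helpers mirroring the GetBytes / GetString pair above.
function toUtf8Bytes(text) {
  return new TextEncoder().encode(text); // Uint8Array of UTF-8 bytes
}

function fromUtf8Bytes(bytes) {
  return new TextDecoder("utf-8").decode(bytes);
}

const stored = toUtf8Bytes("æøå 中文");
console.log(stored.length);                        // 13
console.log(fromUtf8Bytes(stored) === "æøå 中文"); // true
```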

Kind Regards,
Allan Ebdrup
Mar 20 '07 #10
