utf8-encoding

Hello all,

I am building a CMS that supports two languages: English and Traditional Chinese.
My problem is that all the data is displayed as "?????????", even though all
page encodings are set to UTF-8.

Do I need to encode the text as UTF-8 before inserting it into the DB?
Or do I need to do anything when the content is displayed?

Thanks in advance.
Feb 2 '06 #1
You'll need to give more information about your situation, I'm afraid.
Describe the different layers, whether it's a webapp or a Windows Forms
app etc.

You shouldn't need to do any work before putting a string into a
database though, so long as the database (and the column in question)
supports Unicode.

See http://www.pobox.com/~skeet/csharp/d...ngunicode.html for more
information on how to diagnose problems.

Jon

Feb 2 '06 #2
In addition to what Jon listed, information about how you display those
????? values would be useful. All too often I see long discussions about
MS SQL Server not handling Unicode, and then it finally turns out that they are
using Enterprise Manager to look at the data, and EM doesn't handle non-English
character sets very well. The data might very well be OK, though.

--
Lasse Vågsæther Karlsen
http://usinglvkblog.blogspot.com/
mailto:la***@vkarlsen.no
PGP KeyID: 0x2A42A1C2
Feb 2 '06 #3
Thanks for all the advice.

My project is an ASP.NET C# project.
I use a simple form to enter the content into the database
and a simple query to read it back, and then "?????" is displayed on my
webpage.
My page encoding is set to UTF-8 by default.

Is that enough information? Thanks in advance.

Feb 2 '06 #4
That's enough to start with. Now apply the techniques talked about in
the article I linked to before, and find out where the problem is. It
sounds like it shouldn't be too hard to find. Just log the contents of
the string (as Unicode numbers) when you store it in the database and
when you retrieve it. If they're the same, the problem is in the code
communicating between the server and the web browser. If they're different,
the problem is in the code communicating with the database.

Jon
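
As an illustration of the store-and-retrieve comparison described above, here is a minimal sketch. The class name UnicodeDump and the methods ToHex and SameCodeUnits are made up for this example and do not come from the linked article. It assumes you can get hold of the string just before it is stored and again just after it is read back.

```csharp
using System;
using System.Text;

public static class UnicodeDump
{
    // Returns the UTF-16 code units of value as space-separated,
    // four-digit lowercase hex numbers (hypothetical helper).
    public static string ToHex(string value)
    {
        StringBuilder sb = new StringBuilder();
        foreach (char c in value)
        {
            if (sb.Length > 0)
            {
                sb.Append(' ');
            }
            sb.AppendFormat("{0:x4}", (int)c);
        }
        return sb.ToString();
    }

    // True when both strings contain exactly the same code units.
    // Ordinal comparison avoids any culture-specific equivalence.
    public static bool SameCodeUnits(string stored, string retrieved)
    {
        return string.Equals(stored, retrieved, StringComparison.Ordinal);
    }
}
```

Logging ToHex(text) just before the insert and again just after the query gives two lines to compare. If the second line consists of 003f values (the code for "?"), the characters were probably lost somewhere between the application and the database rather than on the way to the browser.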

Feb 2 '06 #5
Are you using nvarchar in the database? If not, the database cannot handle
these characters.
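
One way to see where a run of "?" can come from: when text is converted to an encoding that has no representation for a character, Encoding.ASCII in .NET substitutes "?" by default. The sketch below shows that effect in isolation. It is only an analogy for what may happen with a non-Unicode column; how a real varchar column behaves depends on the database collation, which is not shown in this thread. The class and method names are hypothetical.

```csharp
using System;
using System.Text;

public static class LossyEncodingDemo
{
    // Converts text to bytes with the given encoding and back again,
    // returning whatever survives the round trip (hypothetical helper).
    public static string RoundTrip(string text, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes);
    }
}
```

With the two Chinese characters U+4E2D U+6587 as input, RoundTrip(text, Encoding.ASCII) returns "??", while RoundTrip(text, Encoding.UTF8) returns the original text unchanged.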

Feb 2 '06 #6
Yes, you are right.
I am using nvarchar and ntext for data storage.
Thanks.
Feb 2 '06 #7
Thanks.

Should I use this method to print the Unicode numbers?

static void DumpString(string value)
{
    foreach (char c in value)
    {
        Console.Write("{0:x4} ", (int)c);
    }
    Console.WriteLine();
}

Feb 3 '06 #8
My project is an ASP.NET C# project.
I use a simple form to enter the content into the database
and a simple query to read it back, and then "?????" is displayed on my
webpage.
My page encoding is set to UTF-8 by default.

Thanks in advance.

Feb 3 '06 #9
beachboy <st*****@javacatz.com> wrote:
Should I use this method to print the Unicode numbers?

static void DumpString(string value)
{
    foreach (char c in value)
    {
        Console.Write("{0:x4} ", (int)c);
    }
    Console.WriteLine();
}


Yes.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Feb 3 '06 #10
Sorry, I am a beginner at CMS development...
==========
private void Page_Load(object sender, System.EventArgs e)
{
    // Put user code to initialize the page here
    DumpString("stan");
}

public void DumpString(string src)
{
    StringBuilder sb = new StringBuilder();
    foreach (char c in src)
    {
        Response.Write("{0:x4}", (int)c);
        Response.Write("[]");
    }
}

==========
but it gives an error on "Response.Write("{0:x4}", (int)c);"

Thank you!

Feb 3 '06 #11
Well, you can use Response.Write (string.Format("{0:x4}", (int)c));

However, another alternative is to write the string content to a log
file instead, so that you can still see what appears on the web page,
but separately find out what that consists of.

I'd look at the database side first though, as you can do that with a
console app very easily.

Jon
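
A rough sketch of the log-file alternative, assuming the web application has write access to whatever path is passed in. UnicodeLog, FormatLine and Append are hypothetical names chosen for this example, and the choice of UTF-8 for the log file is also an assumption. Because the hex text is built with string.Format before it is written, the same string could equally be passed to Response.Write.

```csharp
using System;
using System.IO;
using System.Text;

public static class UnicodeLog
{
    // Builds one log line: the label, a colon, then each UTF-16 code unit
    // of value as a four-digit lowercase hex number.
    public static string FormatLine(string label, string value)
    {
        StringBuilder sb = new StringBuilder();
        sb.Append(label);
        sb.Append(':');
        foreach (char c in value)
        {
            sb.Append(' ');
            sb.Append(string.Format("{0:x4}", (int)c));
        }
        return sb.ToString();
    }

    // Appends the formatted line to the file at path, creating the file
    // if it does not exist yet.
    public static void Append(string path, string label, string value)
    {
        File.AppendAllText(path, FormatLine(label, value) + Environment.NewLine, Encoding.UTF8);
    }
}
```

For example, FormatLine("stored", "stan") produces "stored: 0073 0074 0061 006e". Calling Append once before the insert and once after the query leaves both dumps in the same file, so the page itself still shows exactly what the user would see.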

Feb 3 '06 #12
