Hi,
I receive a UTF-8 character from a database, like &#30000; (a Japanese
character, in the form &#XXXXX).
How can I display the Japanese character in my application? I have found
the class System.Text.Encoding, but the input looks like \uXXXX. I don't
know what to do.
Thank you,
Bart
Bart <ba**@bart.it> wrote:
> I receive a UTF-8 character from a database, like &#30000; (a Japanese
> character, in the form &#XXXXX). How can I display the Japanese
> character in my application? I have found the class System.Text.Encoding,
> but the input looks like \uXXXX. I don't know what to do.
I'm not entirely sure what you mean by "the input looks like \uXXXX".
Do you mean it's stored in the database as a string with "\uXXXX" in?
Are you *sure* about that, or is that just what the debugger is
showing? (Try writing it out to the console.)
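One way to make that check concrete is to print the code of every char in the string, which shows whether it holds a single character or the six literal characters of an escape sequence. This is only a sketch: the class name StringDump and the method Describe are made up for illustration, and U+7530 is used as a sample value.

```csharp
using System;
using System.Linq;

static class StringDump
{
    // Lists every UTF-16 code unit of the string as U+XXXX.
    public static string Describe(string text)
    {
        return string.Join(" ", text.Select(c => "U+" + ((int)c).ToString("X4")));
    }

    static void Main()
    {
        string realCharacter = "\u7530";   // one character, written with an escape
        string literalEscape = @"\u7530";  // six characters: backslash, 'u', four digits

        Console.WriteLine(Describe(realCharacter)); // U+7530
        Console.WriteLine(Describe(literalEscape)); // U+005C U+0075 U+0037 U+0035 U+0033 U+0030
    }
}
```

Printing the codes rather than the characters avoids depending on whether the console font can show Japanese text.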
--
Jon Skeet - <sk***@pobox.com> http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.com...
> I'm not entirely sure what you mean by "the input looks like \uXXXX".
> Do you mean it's stored in the database as a string with "\uXXXX" in?
> Are you *sure* about that, or is that just what the debugger is
> showing? (Try writing it out to the console.)
I looked at this example in the MSDN Library:
UTF8Encoding utf8 = new UTF8Encoding();
UTF8Encoding utf8ThrowException = new UTF8Encoding(false, true);
// This array contains two high surrogates in a row (\uD801, \uD802).
// A high surrogate should be followed by a low surrogate.
Char[] chars = new Char[] {'a', 'b', 'c', '\uD801', '\uD802', 'd'};
It suggests that I have to write the strings as \uXXXX, but in my database the
data are stored (utf8) as &#XXXXX. I don't understand why a UTF-8 character
has that format in the example and a different one in my database, even
though both are UTF-8 encoded.
Bart <ba**@bart.it> wrote:
> It suggests that I have to write the strings as \uXXXX, but in my
> database the data are stored (utf8) as &#XXXXX. I don't understand why a
> UTF-8 character has that format in the example and a different one in my
> database, even though both are UTF-8 encoded.
I'm not sure what you mean by "stored (utf8) as &#XXXXX". Do you mean
that the actual characters '&' '#' etc are in the database - or is it
just that that's how you see non-ASCII characters when displaying them
in (say) a SQL query execution environment?
The reason Unicode characters above 0xffff have to be stored in
surrogate form in .NET is that .NET uses UTF-16 internally, effectively
- each character is 16 bits, which isn't enough to cover the whole of
Unicode.
When you write a string in a C# program, however, you *can* use
\UXXXXXXXX instead (note the capital U). Only values up to 0x10ffff are
supported, so the first two Xs will always be 0.
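A small sketch of the two spellings may help. The code point U+10400 is just a sample value above 0xffff, and the class name SurrogateDemo is made up for illustration.

```csharp
using System;

static class SurrogateDemo
{
    // Builds a string from a Unicode code point; values above 0xFFFF
    // come back as a surrogate pair (two chars).
    public static string FromCodePoint(int codePoint)
    {
        return char.ConvertFromUtf32(codePoint);
    }

    static void Main()
    {
        string viaCapitalU = "\U00010400";      // eight hex digits, capital U
        string viaSurrogates = "\uD801\uDC00";  // the same character as a surrogate pair

        Console.WriteLine(viaCapitalU.Length);                                // 2
        Console.WriteLine(viaCapitalU == viaSurrogates);                      // True
        Console.WriteLine(char.IsHighSurrogate(viaCapitalU[0]));              // True
        Console.WriteLine(char.ConvertToUtf32(viaCapitalU, 0).ToString("X")); // 10400
        Console.WriteLine(FromCodePoint(0x7530).Length);                      // 1
    }
}
```

A character at or below 0xffff, such as U+7530, fits in a single char and needs no surrogates.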
--
Jon Skeet - <sk***@pobox.com> http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
First, I would like to thank you for your answers.
"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.com...
> I'm not sure what you mean by "stored (utf8) as &#XXXXX". Do you mean
> that the actual characters '&' '#' etc are in the database - or is it
> just that that's how you see non-ASCII characters when displaying them
> in (say) a SQL query execution environment?
I mean that the '&' and '#' characters are in the database, and if I make a
query I receive, in C#, a character as &#XXXXX.
Here is an example:
I have a form that makes a query on a database written in Japanese with utf8
encoding. The result of the query on my form is:
&#24046;&#12424;
and I don't know which method I have to call to make the conversion.
Bart <ba**@bart.it> wrote:
> I mean that the '&' and '#' characters are in the database, and if I make
> a query I receive, in C#, a character as &#XXXXX.
Right - I think.
> Here is an example: I have a form that makes a query on a database
> written in Japanese with utf8 encoding. The result of the query on my
> form is:
> &#24046;&#12424;
> and I don't know which method I have to call to make the conversion.
When you say "on your form" - do you mean as output on a web page? If
so, that's introducing another level of encoding - XML encoding. What
happens if you write a simple console application to search the
database and write the results to a console? Do you still get &#24046;?
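If the string coming back really does contain the literal characters '&', '#', digits and ';', then undoing that extra level of encoding is one possible conversion. The following is a minimal sketch, assuming a runtime where System.Net.WebUtility is available; the class name ReferenceDecoding and the method Decode are made up for illustration.

```csharp
using System;
using System.Net;

static class ReferenceDecoding
{
    // Turns character references such as &#24046; back into the characters
    // they stand for. Note that named entities like &amp; are decoded as well.
    public static string Decode(string fromDatabase)
    {
        return WebUtility.HtmlDecode(fromDatabase);
    }

    static void Main()
    {
        string stored = "&#24046;&#12424;";
        string decoded = Decode(stored);

        Console.WriteLine(stored.Length);              // 16
        Console.WriteLine(decoded.Length);             // 2
        Console.WriteLine(decoded == "\u5DEE\u3088");  // True
    }
}
```

The decoded string could then be assigned to the label's Text property, provided the label's font contains Japanese glyphs.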
--
Jon Skeet - <sk***@pobox.com> http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.com...
> When you say "on your form" - do you mean as output on a web page? If
> so, that's introducing another level of encoding - XML encoding. What
> happens if you write a simple console application to search the
> database and write the results to a console? Do you still get &#24046;?
It is a simple Windows form. The result of the query appears on a label.
I have not tried a console application yet.
Bart <ba**@bart.it> wrote:
> It is a simple Windows form. The result of the query appears on a label.
> I have not tried a console application yet.
Right, if it's a windows form, that's fine. I'm intrigued as to why the
database returns it that way. How did the data get in there in the
first place? It should be returning the data properly in UTF-8 rather
than using an XML-type encoding.
How sure are you that the problem isn't just that the program which
*submitted* the data to the database was XML-encoding it?
--
Jon Skeet - <sk***@pobox.com> http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.com...
> Right, if it's a windows form, that's fine. I'm intrigued as to why the
> database returns it that way. How did the data get in there in the
> first place? It should be returning the data properly in UTF-8 rather
> than using an XML-type encoding.
>
> How sure are you that the problem isn't just that the program which
> *submitted* the data to the database was XML-encoding it?
Actually I don't know how the data were stored in the database. Perhaps they
used the phpMyAdmin web interface (judging by their style).
Bart <ba**@bart.it> wrote:
> Actually I don't know how the data were stored in the database. Perhaps
> they used the phpMyAdmin web interface (judging by their style).
Hmm.
What happens if you try to insert data into the database yourself? What
database is it, anyway (SQL Server, Oracle etc)?
My guess is that you'll find that whatever you put into the database,
you get the same thing out - if you put the string "&#30000;" in, you'll
get that out, but if you put in a string actually containing U+7530
(the Unicode character 0x7530, which is what &#30000; denotes) you'll get
that out, in which case the data in the database is effectively corrupt
to some extent. Depending on your project, you may want to write a tool
to "clean" the database, and then write your main project without
worrying about it.
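The core of such a cleaning tool could look roughly like the sketch below. It assumes the stored values use only decimal references of the form &#NNNNN; and leaves everything else alone. The class name ReferenceCleaner and the method Clean are made up for illustration, and reading from and writing back to the database is left out.

```csharp
using System;
using System.Text.RegularExpressions;

static class ReferenceCleaner
{
    // Matches decimal numeric character references such as &#30000;
    static readonly Regex DecimalReference = new Regex(@"&#([0-9]{1,7});");

    // Replaces each decimal reference with the character it denotes.
    // References that are not valid Unicode code points are kept unchanged.
    public static string Clean(string stored)
    {
        return DecimalReference.Replace(stored, match =>
        {
            int codePoint = int.Parse(match.Groups[1].Value);
            bool valid = codePoint <= 0x10FFFF
                         && (codePoint < 0xD800 || codePoint > 0xDFFF);
            return valid ? char.ConvertFromUtf32(codePoint) : match.Value;
        });
    }

    static void Main()
    {
        Console.WriteLine(Clean("&#30000;") == "\u7530");    // True
        Console.WriteLine(Clean("&#30000;").Length);         // 1
        Console.WriteLine(Clean("a &amp; b"));               // a &amp; b
    }
}
```

Restricting the pattern to numeric references means text that legitimately contains '&' is not touched, which matters if it is unclear how '&' itself was stored.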
Out of interest, what happens to '&' signs in the database? Do they
come out as &amp; ?
--
Jon Skeet - <sk***@pobox.com> http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.com...
> What happens if you try to insert data into the database yourself? What
> database is it, anyway (SQL Server, Oracle etc)?
>
> Out of interest, what happens to '&' signs in the database? Do they
> come out as &amp; ?
I tried to access the database from the web using phpMyAdmin. The situation
is quite strange, because if I access it with the Japanese interface (SJIS)
I can see some names written in Japanese and some combinations such as
&#12424;&#12375;&#12384;
I'm getting quite confused! In the end I don't know how I should make the
query, UTF or SJIS. I have to check it more carefully.
Bart <ba**@bart.it> wrote:
> I tried to access the database from the web using phpMyAdmin. The
> situation is quite strange, because if I access it with the Japanese
> interface (SJIS) I can see some names written in Japanese and some
> combinations such as &#12424;&#12375;&#12384;
> I'm getting quite confused! In the end I don't know how I should make
> the query, UTF or SJIS. I have to check it more carefully.
Ignore the web interface for the moment - the first thing you need to
understand is what the database itself is doing. Now, what happens if
you try to insert a new record with Japanese characters in (not using
&#xxxxx at all)?
--
Jon Skeet - <sk***@pobox.com> http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.com...
> Ignore the web interface for the moment - the first thing you need to
> understand is what the database itself is doing. Now, what happens if
> you try to insert a new record with Japanese characters in (not using
> &#xxxxx at all)?
I tried to insert a Japanese name (not with the web interface) into a record,
and I tried to display it on the web and with a Windows form.
For example, I tried to insert ????? as a name from a Windows form that makes
an "insert" in the database.
The Windows form that displays the select query shows me "????".
The web interface shows me the same, "????".
Now, I think there is some problem with the exchange of the data. I got
hold of the code of the web service, in PHP:
function INSERT($mystring, $name, $password)
{
    // make a connection to the DBMS
    $connection = mysql_connect("localhost", $name, $password);
    // select a database
    mysql_select_db('DATABASE', $connection);
    // query to insert
    $doquery = mysql_query("INSERT INTO Name VALUES ( , '$mystring') ")
        or die("");
    // close the connection
    mysql_close($connection);
}
In C# I simply add a web reference to the database, and on a Windows form I
have a textbox for entering the name.
Bart <ba**@bart.it> wrote:
> I tried to insert a Japanese name (not with the web interface) into a
> record, and I tried to display it on the web and with a Windows form.
> For example, I tried to insert ????? as a name from a Windows form that
> makes an "insert" in the database.
> The Windows form that displays the select query shows me "????".
> The web interface shows me the same, "????".
Were those all meant to be question marks, or were some of them meant
to be characters?
> Now, I think there is some problem with the exchange of the data. I got
> hold of the code of the web service, in PHP:
>
> function INSERT($mystring, $name, $password)
> {
>     // make a connection to the DBMS
>     $connection = mysql_connect("localhost", $name, $password);
>     // select a database
>     mysql_select_db('DATABASE', $connection);
>     // query to insert
>     $doquery = mysql_query("INSERT INTO Name VALUES ( , '$mystring') ")
>         or die("");
>     // close the connection
>     mysql_close($connection);
> }
Ah, it's MySQL. That could well make a difference to things. For one
thing, according to http://dev.mysql.com/doc/mysql/en/Charset-Unicode.html
MySQL doesn't cope with UTF-8 values which take more than three bytes -
in other words, Unicode values > 0xffff. However, it also implies that
you don't *need* Unicode values > 0xffff. Unless you know otherwise, I
suggest we assume that surrogates aren't part of the problem for the
moment.
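To see where the three-byte boundary falls, the UTF-8 length of a few code points can be checked directly. This is only a sketch; the class name Utf8Width and the method ByteCount are made up for illustration, and the code points are sample values.

```csharp
using System;
using System.Text;

static class Utf8Width
{
    // Number of bytes UTF-8 needs for a single Unicode code point.
    public static int ByteCount(int codePoint)
    {
        return Encoding.UTF8.GetByteCount(char.ConvertFromUtf32(codePoint));
    }

    static void Main()
    {
        Console.WriteLine(ByteCount(0x41));     // 1
        Console.WriteLine(ByteCount(0xE9));     // 2
        Console.WriteLine(ByteCount(0x7530));   // 3
        Console.WriteLine(ByteCount(0xFFFF));   // 3
        Console.WriteLine(ByteCount(0x10000));  // 4
    }
}
```

Everything up to 0xffff, including U+7530, fits in at most three bytes; only code points from 0x10000 upwards need four.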
Now, how is the database set up? Are you connecting to it with an
appropriate connection string? Either of those could be the problem -
or it could be a bug in the MySQL .NET provider you're using.
I suggest you try to isolate the problem: come up with a simple clean
database (with a single table with a single column), then a short but
complete program which demonstrates the problem.
See http://www.pobox.com/~skeet/csharp/complete.html for details of
what I mean by that.
Once you've got that, we should be able to work out either how to fix
things or who to complain to :)
--
Jon Skeet - <sk***@pobox.com> http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
> See http://www.pobox.com/~skeet/csharp/complete.html for details of what I mean by that.
I think it is the best way :-))))))
> Once you've got that, we should be able to work out either how to fix things or who to complain to :)
Thank you very much, I will continue and let you know asap.
Bart
Bart <ba**@bart.it> wrote:
> > See http://www.pobox.com/~skeet/csharp/complete.html for details of
> > what I mean by that.
> I think it is the best way :-))))))
> > Once you've got that, we should be able to work out either how to fix
> > things or who to complain to :)
> Thank you very much, I will continue and let you know asap.
Righto. If you let me know which version of MySQL you've got, and which
provider you're using, I'll try to get them installed here so I can run
your code too.
--
Jon Skeet - <sk***@pobox.com> http://www.pobox.com/~skeet
If replying to the group, please do not mail me too