Difference between \u and &#

Hi,

I receive a UTF-8 character from a database, like &#30000 (a Japanese
character, in the form &#XXXXX).
How can I display the Japanese character in my application? I have found
the class System.Text.Encoding, but its input looks like \uXXXX. I don't
know what to do.

Thank you,

Bart
Nov 16 '05 #1
I'm not entirely sure what you mean by "the input looks like \uXXXX".
Do you mean it's stored in the database as a string with "\uXXXX" in?
Are you *sure* about that, or is that just what the debugger is
showing? (Try writing it out to the console.)

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #2
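
A minimal C# sketch of the distinction behind that question, assuming nothing about the real database: "\uXXXX" is source-code notation that the compiler turns into a single character, whereas the same six characters stored as text stay six characters. The class and field names are hypothetical.

```csharp
using System;

public static class EscapeVsText
{
    // The compiler replaces the escape with the single character U+7530.
    public const string Escaped = "\u7530";

    // A verbatim string keeps the backslash, so this is six characters of
    // plain text: '\', 'u', '7', '5', '3', '0'.
    public const string Literal = @"\u7530";

    public static void Print()
    {
        Console.WriteLine(Escaped.Length); // one UTF-16 code unit
        Console.WriteLine(Literal.Length); // six UTF-16 code units
    }
}
```

If a value read from the database has the length of the second string, the escape text really is stored; if it has the length of the first, the "\u" form is only how a debugger chose to display it.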

"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
> I'm not entirely sure what you mean by "the input looks like \uXXXX".
> Do you mean it's stored in the database as a string with "\uXXXX" in?
> Are you *sure* about that, or is that just what the debugger is
> showing? (Try writing it out to the console.)


I have looked at this example in the MSDN Library:

UTF8Encoding utf8 = new UTF8Encoding();
UTF8Encoding utf8ThrowException = new UTF8Encoding(false, true);

// This array contains two high surrogates in a row (\uD801, \uD802).
// A high surrogate should be followed by a low surrogate.
Char[] chars = new Char[] {'a', 'b', 'c', '\uD801', '\uD802', 'd'};

It seems to mean that I have to write the strings as \uXXXX, but in my
database the data is stored (UTF-8) as &#XXXXX. I don't understand why a
UTF-8 character has that format in the example and a different one in my
database, even though both are UTF-8 encoded.


Nov 16 '05 #3
I'm not sure what you mean by "stored (utf8) as &#XXXXX". Do you mean
that the actual characters '&' '#' etc are in the database - or is it
just that that's how you see non-ASCII characters when displaying them
in (say) a SQL query execution environment?

The reason Unicode characters above 0xffff have to be stored in
surrogate form in .NET is that .NET uses UTF-16 internally, effectively
- each character is 16 bits, which isn't enough to cover the whole of
Unicode.

When you write a string in a C# program, however, you *can* use
\UXXXXXXXX instead (note the capital U). Only values up to 0x10ffff are
supported, so the first two Xs will always be 0.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #4
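
A small C# sketch of the two escape forms described above, under the assumption that U+7530 and U+10400 are acceptable sample code points (neither comes from the poster's data except that U+7530 is decimal 30000). The class and method names are hypothetical.

```csharp
using System;

public static class EscapeDemo
{
    // U+7530 fits in one UTF-16 code unit, so the four-digit \u form is enough.
    public const string Bmp = "\u7530";

    // U+10400 is above U+FFFF. \U takes eight hex digits and the compiler
    // emits the surrogate pair \uD801 \uDC00 for it.
    public const string Supplementary = "\U00010400";

    public static string Describe(string s)
    {
        // char.ConvertToUtf32 combines a surrogate pair starting at index 0,
        // or returns the single code unit if there is no pair.
        int codePoint = char.ConvertToUtf32(s, 0);
        return string.Format("U+{0:X4}, {1} UTF-16 code unit(s)", codePoint, s.Length);
    }
}
```

Describe(EscapeDemo.Bmp) reports one code unit and Describe(EscapeDemo.Supplementary) reports two, which is the 16-bit limit the post refers to.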
First of all, I would like to thank you for your answers.
"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote:
> I'm not sure what you mean by "stored (utf8) as &#XXXXX". Do you mean
> that the actual characters '&' '#' etc are in the database - or is it
> just that that's how you see non-ASCII characters when displaying them
> in (say) a SQL query execution environment?


I mean that the '&' and '#' characters are in the database, and if I run a
query I receive, in C#, a character as &#XXXXX.
Here is an example:
I have a form that runs a query on a database written in Japanese with UTF-8
encoding; the result of the query on my form is:

&#24046;&#12424;

and I don't know which method I have to call to do the conversion.
Nov 16 '05 #5
Bart <ba**@bart.it> wrote:
> I mean that the '&' and '#' characters are in the database, and if I run a
> query I receive, in C#, a character as &#XXXXX.

Right - I think.

> Here is an example:
> I have a form that runs a query on a database written in Japanese with UTF-8
> encoding; the result of the query on my form is:
>
> &#24046;&#12424;
>
> and I don't know which method I have to call to do the conversion.


When you say "on your form" - do you mean as output on a web page? If
so, that's introducing another level of encoding - XML encoding. What
happens if you write a simple console application to search the
database and write the results to a console? Do you still get &#24046;?

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #6

"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
> When you say "on your form" - do you mean as output on a web page? If
> so, that's introducing another level of encoding - XML encoding. What
> happens if you write a simple console application to search the
> database and write the results to a console? Do you still get &#24046;?


It is a simple Windows form. The result of the query appears on a label.
I have not tried a console application yet.
Nov 16 '05 #7
Right, if it's a windows form, that's fine. I'm intrigued as to why the
database returns it that way. How did the data get in there in the
first place? It should be returning the data properly in UTF-8 rather
than using an XML-type encoding.

How sure are you that the problem isn't just that the program which
*submitted* the data to the database was XML-encoding it?

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #8

"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
> How sure are you that the problem isn't just that the program which
> *submitted* the data to the database was XML-encoding it?


Actually, I don't know how the data was stored in the database. Perhaps they
used the phpMyAdmin web interface (judging by their style).
Nov 16 '05 #9
Hmm.

What happens if you try to insert data into the database yourself? What
database is it, anyway (SQL Server, Oracle etc)?

My guess is that you'll find that whatever you put into the database,
you get the same thing out - if you put the string "&#30000" in, you'll
get that out, but if you put in a string actually containing U+7530
(the Unicode character with decimal value 30000) you'll get that out, in
which case the data in the database is effectively corrupt to some extent.
Depending on your project, you may want to write a tool to "clean" the
database, and then write your main project without worrying about it.

Out of interest, what happens to '&' signs in the database? Do they
come out as &amp; ?

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #10
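
The "clean" step suggested above could look roughly like the following C# sketch. It assumes the stored text contains decimal (&#30000;) or hexadecimal (&#x7530;) numeric character references and that the trailing semicolon may be missing, as in the examples quoted in this thread. NcrDecoder and Decode are hypothetical names, not part of any library.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class NcrDecoder
{
    // Matches &#NNNNN; and &#xHHHH;. Treating the semicolon as optional is an
    // assumption about the stored data, not something the thread confirms.
    private static readonly Regex Reference =
        new Regex(@"&#(?:[xX](?<hex>[0-9a-fA-F]+)|(?<dec>[0-9]+));?");

    public static string Decode(string input)
    {
        return Reference.Replace(input, match =>
        {
            int codePoint;
            bool parsed = match.Groups["hex"].Success
                ? int.TryParse(match.Groups["hex"].Value, NumberStyles.AllowHexSpecifier,
                               CultureInfo.InvariantCulture, out codePoint)
                : int.TryParse(match.Groups["dec"].Value, NumberStyles.None,
                               CultureInfo.InvariantCulture, out codePoint);

            // Leave the text untouched if the number is not a valid code point.
            bool isSurrogate = codePoint >= 0xD800 && codePoint <= 0xDFFF;
            if (!parsed || codePoint <= 0 || codePoint > 0x10FFFF || isSurrogate)
            {
                return match.Value;
            }

            // ConvertFromUtf32 yields one char, or a surrogate pair above U+FFFF.
            return char.ConvertFromUtf32(codePoint);
        });
    }
}
```

With the values from this thread, NcrDecoder.Decode("&#24046;&#12424;") gives the two-character string 差よ, and "&#30000" gives the single character U+7530. Named references such as &amp; are deliberately left alone. If the references are always well formed and end in a semicolon, the framework's System.Net.WebUtility.HtmlDecode is probably the simpler choice; note also that without the semicolon a reference followed directly by a digit is ambiguous.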

"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
> What happens if you try to insert data into the database yourself? What
> database is it, anyway (SQL Server, Oracle etc)?
>
> Out of interest, what happens to '&' signs in the database? Do they
> come out as &amp; ?


I tried to access the database from the web using phpMyAdmin. The situation
is quite strange, because if I access it with the Japanese interface (SJIS) I
can see some names written in Japanese and some combinations such as
&#12424;&#12375;&#12384;
I'm getting quite confused! In the end I don't know how I should make the
query, UTF-8 or SJIS. I have to check it more carefully.
Nov 16 '05 #11
Ignore the web interface for the moment - the first thing you need to
understand is what the database itself is doing. Now, what happens if
you try to insert a new record with Japanese characters in (not using
&#xxxxx at all)?

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #12

"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
> Ignore the web interface for the moment - the first thing you need to
> understand is what the database itself is doing. Now, what happens if
> you try to insert a new record with Japanese characters in (not using
> &#xxxxx at all)?


I tried to insert a Japanese name (not with the web interface) into a
record, and I tried to display it on the web and in a Windows form.
For example, I tried to insert ????? as the name from a Windows form that
performs an "insert" on the database.

The Windows form that displays the result of the select query shows me "????".

The web interface shows me the same, "????".

Now, I think there is some problem with the exchange of the data. Here is
the code of the web service, in PHP:

function INSERT($mystring, $name, $password)
{
    // make a connection to the DBMS
    $connection = mysql_connect("localhost", $name, $password);

    // select a database
    mysql_select_db('DATABASE', $connection);

    // query to insert
    $doquery = mysql_query("INSERT INTO Name VALUES ( , '$mystring') ")
        or die("");

    // close connection
    mysql_close($connection);
}

In C# I simply add a web reference to the database, and on a Windows form I
have a text box for entering the name.

Nov 16 '05 #13
Bart <ba**@bart.it> wrote:
> I tried to insert a Japanese name (not with the web interface) into a
> record, and I tried to display it on the web and in a Windows form.
> For example, I tried to insert ????? as the name from a Windows form that
> performs an "insert" on the database.
>
> The Windows form that displays the result of the select query shows me "????".
>
> The web interface shows me the same, "????".

Were those all meant to be question marks, or were some of them meant
to be characters?

> Now, I think there is some problem with the exchange of the data. Here is
> the code of the web service, in PHP:
>
> function INSERT($mystring, $name, $password)
> {
>     // make a connection to the DBMS
>     $connection = mysql_connect("localhost", $name, $password);
>
>     // select a database
>     mysql_select_db('DATABASE', $connection);
>
>     // query to insert
>     $doquery = mysql_query("INSERT INTO Name VALUES ( , '$mystring') ")
>         or die("");
>
>     // close connection
>     mysql_close($connection);
> }


Ah, it's MySQL. That could well make a difference to things. For one
thing, according to
http://dev.mysql.com/doc/mysql/en/Charset-Unicode.html
MySQL doesn't cope with UTF-8 values which take more than three bytes -
in other words, Unicode values > 0xffff. However, it also implies that
you don't *need* Unicode values > 0xffff. Unless you know otherwise, I
suggest we assume that surrogates aren't part of the problem for the
moment.

Now, how is the database set up? Are you connecting to it with an
appropriate connection string? Either of those could be the problem -
or it could be a bug in the MySQL .NET provider you're using.

I suggest you try to isolate the problem: come up with a simple clean
database (with a single table with a single column), then a short but
complete program which demonstrates the problem.

See http://www.pobox.com/~skeet/csharp/complete.html for details of
what I mean by that.

Once you've got that, we should be able to work out either how to fix
things or who to complain to :)

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #14
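
To make the three-byte limit mentioned above concrete, here is a minimal C# sketch that counts how many bytes UTF-8 needs for a given code point. It says nothing about what the MySQL server or the .NET provider actually does; Utf8Width and BytesForCodePoint are hypothetical names.

```csharp
using System;
using System.Text;

public static class Utf8Width
{
    // Returns the number of bytes UTF-8 uses to encode one code point.
    // The argument must be a valid, non-surrogate Unicode code point.
    public static int BytesForCodePoint(int codePoint)
    {
        string text = char.ConvertFromUtf32(codePoint);
        return Encoding.UTF8.GetByteCount(text);
    }
}
```

For example, BytesForCodePoint(0x7530) is 3, so the Japanese characters in this thread fit within the limit, while BytesForCodePoint(0x10400) is 4, which is the case that needs surrogates in .NET. Later MySQL versions added the utf8mb4 character set for four-byte values.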
> See http://www.pobox.com/~skeet/csharp/complete.html for details of
> what I mean by that.

I think it is the best way :-))))))

> Once you've got that, we should be able to work out either how to fix
> things or who to complain to :)


Thank you very much, I will continue and let you know asap.

Bart
Nov 16 '05 #15

Righto. If you let me know which version of MySQL you've got, and which
provider you're using, I'll try to get them installed here so I can run
your code too.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #16
