Bytes | Software Development & Data Engineering Community

Difference between \u and &#

Hi,

I receive a UTF-8 character from a database in the form &#30000 (a Japanese
character, in the style &#XXXXX).
How can I display the Japanese character in my application? I have found
the System.Text.Encoding class, but there the input looks like \uXXXX. I don't
know how to proceed.

Thank you,

Bart
Nov 16 '05 #1
Bart <ba**@bart.it> wrote:
> I receive a UTF-8 character from a database in the form &#30000 (a Japanese
> character, in the style &#XXXXX).
> How can I display the Japanese character in my application? I have found
> the System.Text.Encoding class, but there the input looks like \uXXXX. I don't
> know how to proceed.


I'm not entirely sure what you mean by "the input looks like \uXXXX".
Do you mean it's stored in the database as a string with "\uXXXX" in?
Are you *sure* about that, or is that just what the debugger is
showing? (Try writing it out to the console.)
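One way to follow the "write it out to the console" suggestion is to print code points instead of the characters themselves, so there is no doubt about what the string really holds, whatever the font or debugger shows. A minimal sketch in current C# (top-level statements); DumpCodeUnits is a hypothetical helper name, not a framework API. U+7530 is the character that &#30000 refers to (30000 decimal is 0x7530).

```csharp
using System;
using System.Text;

// Hypothetical helper: lists each UTF-16 code unit of a string as U+XXXX,
// so the output does not depend on console fonts or debugger display.
static string DumpCodeUnits(string text)
{
    var sb = new StringBuilder();
    foreach (char c in text)
    {
        if (sb.Length > 0) sb.Append(' ');
        sb.Append("U+").Append(((int)c).ToString("X4"));
    }
    return sb.ToString();
}

// The character itself: a single code unit.
Console.WriteLine(DumpCodeUnits("\u7530"));      // U+7530
// The literal reference text: eight separate code units.
Console.WriteLine(DumpCodeUnits("&#30000;"));    // U+0026 U+0023 U+0033 U+0030 U+0030 U+0030 U+0030 U+003B
```

If the query result prints as the second kind of line, the '&' and '#' really are in the string.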

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #2

"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.c om...
> I'm not entirely sure what you mean by "the input looks like \uXXXX".
> Do you mean it's stored in the database as a string with "\uXXXX" in?
> Are you *sure* about that, or is that just what the debugger is
> showing? (Try writing it out to the console.)


I looked at this example in the MSDN Library:

UTF8Encoding utf8 = new UTF8Encoding();
UTF8Encoding utf8ThrowException = new UTF8Encoding(false, true);

// This array contains two high surrogates in a row (\uD801, \uD802).
// A high surrogate should be followed by a low surrogate.
Char[] chars = new Char[] {'a', 'b', 'c', '\uD801', '\uD802', 'd'};

It seems to mean that I have to write the strings as \uXXXX, but in my database the
data is stored (utf8) as &#XXXXX. I don't understand why a UTF-8 character has that
format in the example and a different one in my database, even though both are
UTF-8 encoded.
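For reference, the MSDN snippet is about something else: '\uD801' is C# source syntax for one UTF-16 code unit, and that example deliberately builds invalid UTF-16 (a high surrogate not followed by a low one) to show how the two UTF8Encoding instances react. Such escapes never appear in stored data. A short sketch, assuming a current .NET runtime; Throws is a hypothetical helper name.

```csharp
using System;
using System.Text;

// Returns true if encoding the characters makes the encoder throw.
static bool Throws(Encoding encoding, char[] input)
{
    try
    {
        encoding.GetBytes(input);
        return false;
    }
    catch (EncoderFallbackException)
    {
        return true;
    }
}

// Same data as the MSDN snippet: two high surrogates in a row, which is
// invalid UTF-16 (each high surrogate must be followed by a low surrogate).
char[] chars = { 'a', 'b', 'c', '\uD801', '\uD802', 'd' };

var utf8 = new UTF8Encoding();                           // replaces invalid data
var utf8ThrowException = new UTF8Encoding(false, true);  // rejects invalid data

byte[] lenient = utf8.GetBytes(chars);
Console.WriteLine(BitConverter.ToString(lenient));
Console.WriteLine(Throws(utf8ThrowException, chars));                    // True
Console.WriteLine(Throws(utf8ThrowException, new[] { 'a', '\u7530' }));  // False
```

The lenient encoder still produces bytes for 'a', 'b', 'c' and 'd'; only the broken surrogates are substituted.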



Nov 16 '05 #3
Bart <ba**@bart.it> wrote:
> It seems to mean that I have to write the strings as \uXXXX, but in my database the
> data is stored (utf8) as &#XXXXX. I don't understand why a UTF-8 character has that
> format in the example and a different one in my database, even though both are
> UTF-8 encoded.


I'm not sure what you mean by "stored (utf8) as &#XXXXX". Do you mean
that the actual characters '&' '#' etc are in the database - or is it
just that that's how you see non-ASCII characters when displaying them
in (say) a SQL query execution environment?

The reason Unicode characters above 0xffff have to be stored in
surrogate form in .NET is that .NET uses UTF-16 internally, effectively
- each character is 16 bits, which isn't enough to cover the whole of
Unicode.

When you write a string in a C# program, however, you *can* use
\UXXXXXXXX instead (note the capital U). Only values up to 0x10ffff are
supported, so the first two Xs will always be 0.
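To make the distinction concrete: \uXXXX and \UXXXXXXXX are C# source-code escapes that the compiler turns into UTF-16 code units, whereas &#30000 is ordinary text (a decimal reference to U+7530) produced by some other layer. A short sketch; U+1F600 is just an arbitrary example of a character above U+FFFF.

```csharp
using System;

// \u takes exactly four hex digits and produces one UTF-16 code unit.
string ta = "\u7530";            // U+7530
// \U takes eight hex digits; a value above U+FFFF becomes a surrogate pair.
string smile = "\U0001F600";     // U+1F600

Console.WriteLine(ta.Length);                                    // 1
Console.WriteLine(smile.Length);                                 // 2
Console.WriteLine(char.IsHighSurrogate(smile[0]));               // True
Console.WriteLine(char.ConvertToUtf32(smile, 0).ToString("X"));  // 1F600

// "&#30000" is a decimal reference: 30000 == 0x7530, so it names the same
// character as "\u7530", which fits in a single UTF-16 code unit.
Console.WriteLine(char.ConvertFromUtf32(30000) == ta);           // True
```

So a character like the one in the original question needs no surrogates at all.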

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #4
First of all, I would like to thank you for your answers.
"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.c om...
> I'm not sure what you mean by "stored (utf8) as &#XXXXX". Do you mean
> that the actual characters '&' '#' etc are in the database - or is it
> just that that's how you see non-ASCII characters when displaying them
> in (say) a SQL query execution environment?


I mean that the '&' and '#' characters are in the database, and if I make a query I
receive, in C#, a character as &#XXXXX.
Here is an example:
I have a form that queries a database containing Japanese text in UTF-8
encoding; the result of the query on my form is:

&#24046;&#12424;

and I don't know which method I have to call to do the conversion.
Nov 16 '05 #5
Bart <ba**@bart.it> wrote:
> I mean that the '&' and '#' characters are in the database, and if I make a query I
> receive, in C#, a character as &#XXXXX.


Right - I think.
> Here is an example:
> I have a form that queries a database containing Japanese text in UTF-8
> encoding; the result of the query on my form is:
>
> &#24046;&#12424;
>
> and I don't know which method I have to call to do the conversion.


When you say "on your form" - do you mean as output on a web page? If
so, that's introducing another level of encoding - XML encoding. What
happens if you write a simple console application to search the
database and write the results to a console? Do you still get &#24046;?
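If the strings really do contain the literal reference text (worth confirming first), the references can be turned back into characters. For reference, the two in the example above are U+5DEE (24046) and U+3088 (12424). A minimal sketch; DecodeNumericReferences is a hypothetical name. System.Net.WebUtility.HtmlDecode also decodes numeric references (and named ones such as &amp;) and may be the simpler choice, although as far as I know it expects the terminating semicolon, which the data quoted here does not always have.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper: replaces decimal (&#30000;) and hexadecimal (&#x7530;)
// numeric character references with the characters they name. The trailing
// ';' is optional, since the data in this thread is quoted as "&#30000".
// Anything that is not a valid Unicode scalar value is left untouched.
static string DecodeNumericReferences(string input) =>
    Regex.Replace(input, "&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));?", match =>
    {
        int codePoint;
        bool parsed = match.Groups[1].Success
            ? int.TryParse(match.Groups[1].Value, NumberStyles.AllowHexSpecifier,
                           CultureInfo.InvariantCulture, out codePoint)
            : int.TryParse(match.Groups[2].Value, NumberStyles.None,
                           CultureInfo.InvariantCulture, out codePoint);

        bool isScalarValue = parsed
            && codePoint >= 0 && codePoint <= 0x10FFFF
            && (codePoint < 0xD800 || codePoint > 0xDFFF);

        return isScalarValue ? char.ConvertFromUtf32(codePoint) : match.Value;
    });

string decoded = DecodeNumericReferences("&#24046;&#12424;");
Console.WriteLine(decoded == "\u5DEE\u3088");   // True
```

Named entities such as &amp; pass through unchanged here; whether they also need decoding depends on how the data was written.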

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #6

"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.c om...
> When you say "on your form" - do you mean as output on a web page? If
> so, that's introducing another level of encoding - XML encoding. What
> happens if you write a simple console application to search the
> database and write the results to a console? Do you still get &#24046;?


It is a simple Windows form; the result of the query appears in a label.
I have not tried a console application yet.
Nov 16 '05 #7
Bart <ba**@bart.it> wrote:
> It is a simple Windows form; the result of the query appears in a label.
> I have not tried a console application yet.


Right, if it's a Windows form, that's fine. I'm intrigued as to why the
database returns it that way. How did the data get in there in the
first place? It should be returning the data properly in UTF-8 rather
than using an XML-type encoding.

How sure are you that the problem isn't just that the program which
*submitted* the data to the database was XML-encoding it?
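To illustrate the kind of encoding suspected here: one plausible source is a web form, since browsers typically submit characters that the page's character set cannot represent as &#NNNNN; references. Whatever the real cause, the transformation looks roughly like this hypothetical sketch (ToNumericReferences is an invented name; this is not the code of phpMyAdmin or of any particular program).

```csharp
using System;
using System.Text;

// Hypothetical sketch of an encoding step that would produce the stored text:
// '&', '<' and '>' are escaped, ASCII passes through, and every other code
// point becomes a decimal numeric character reference. Surrogate pairs are
// combined first, so each reference names one whole code point. Lone
// surrogates are not treated specially.
static string ToNumericReferences(string text)
{
    var sb = new StringBuilder();
    for (int i = 0; i < text.Length; i++)
    {
        char c = text[i];
        if (c == '&') sb.Append("&amp;");
        else if (c == '<') sb.Append("&lt;");
        else if (c == '>') sb.Append("&gt;");
        else if (c < 0x80) sb.Append(c);
        else if (char.IsHighSurrogate(c) && i + 1 < text.Length && char.IsLowSurrogate(text[i + 1]))
        {
            sb.Append("&#").Append(char.ConvertToUtf32(c, text[i + 1])).Append(';');
            i++; // the low surrogate has been consumed as well
        }
        else
        {
            sb.Append("&#").Append((int)c).Append(';');
        }
    }
    return sb.ToString();
}

Console.WriteLine(ToNumericReferences("\u5DEE\u3088"));   // &#24046;&#12424;
```

If something like this ran before the INSERT, the database would faithfully store and return the reference text, which matches what is being seen.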

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #8

"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.c om...
> Right, if it's a Windows form, that's fine. I'm intrigued as to why the
> database returns it that way. How did the data get in there in the
> first place? It should be returning the data properly in UTF-8 rather
> than using an XML-type encoding.
>
> How sure are you that the problem isn't just that the program which
> *submitted* the data to the database was XML-encoding it?


Actually, I don't know how the data was stored in the database. Perhaps they
used the phpMyAdmin web interface (judging by the format).
Nov 16 '05 #9
Bart <ba**@bart.it> wrote:
> Actually, I don't know how the data was stored in the database. Perhaps they
> used the phpMyAdmin web interface (judging by the format).


Hmm.

What happens if you try to insert data into the database yourself? What
database is it, anyway (SQL Server, Oracle etc)?

My guess is that you'll find that whatever you put into the database,
you get the same thing out - if you put the string "&#30000" in, you'll
get that out, but if you put in a string actually containing U+7530 (the
character itself: &#30000 is a decimal reference, and 30000 is 0x7530)
you'll get that out, in which case the
data in the database is effectively corrupt to some extent. Depending
on your project, you may want to write a tool to "clean" the database,
and then write your main project without worrying about it.

Out of interest, what happens to '&' signs in the database? Do they
come out as &amp; ?
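If it comes to a cleanup tool, the &amp; question matters: if the data was fully HTML-encoded, ampersands need decoding too. A hypothetical sketch of the per-row logic only (reading and updating the MySQL table is left out, and RowsToFix is an invented name):

```csharp
using System;
using System.Collections.Generic;
using System.Net;

// Hypothetical cleanup pass: given (id, value) pairs read from the table,
// return only the rows whose value changes when HTML-decoded, so just those
// rows need an UPDATE. WebUtility.HtmlDecode handles numeric references
// (&#24046;) as well as named ones such as &amp;. Run it only if the column
// really was encoded: decoding clean text that legitimately contains
// "&amp;" would alter it.
static Dictionary<int, string> RowsToFix(IEnumerable<KeyValuePair<int, string>> rows)
{
    var fixes = new Dictionary<int, string>();
    foreach (var row in rows)
    {
        string decoded = WebUtility.HtmlDecode(row.Value);
        if (decoded != row.Value)
            fixes[row.Key] = decoded;
    }
    return fixes;
}

var sample = new Dictionary<int, string>
{
    [1] = "&#24046;&#12424;",
    [2] = "plain text",
    [3] = "Tom &amp; Jerry",
};

foreach (var fix in RowsToFix(sample))
    Console.WriteLine($"{fix.Key}: {fix.Value}");
```

Rows 1 and 3 come back decoded; row 2 is left alone because decoding does not change it.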

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #10

"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.c om...
> Hmm.
>
> What happens if you try to insert data into the database yourself? What
> database is it, anyway (SQL Server, Oracle etc)?
>
> My guess is that you'll find that whatever you put into the database,
> you get the same thing out - if you put the string "&#30000" in, you'll
> get that out, but if you put in a string actually containing U+7530 (the
> character itself: &#30000 is a decimal reference, and 30000 is 0x7530)
> you'll get that out, in which case the
> data in the database is effectively corrupt to some extent. Depending
> on your project, you may want to write a tool to "clean" the database,
> and then write your main project without worrying about it.
>
> Out of interest, what happens to '&' signs in the database? Do they
> come out as &amp; ?


I tried to access the database from the web using phpMyAdmin. The situation is
quite strange, because if I use the Japanese interface (SJIS) I can see
some names written in Japanese and some combinations such as
&#12424;&#12375;&#12384;
I'm getting quite confused! In the end I don't know how I should make the query,
in UTF-8 or in SJIS. I have to check it more carefully.
Nov 16 '05 #11
Bart <ba**@bart.it> wrote:
> I tried to access the database from the web using phpMyAdmin. The situation is
> quite strange, because if I use the Japanese interface (SJIS) I can see
> some names written in Japanese and some combinations such as
> &#12424;&#12375;&#12384;
> I'm getting quite confused! In the end I don't know how I should make the query,
> in UTF-8 or in SJIS. I have to check it more carefully.


Ignore the web interface for the moment - the first thing you need to
understand is what the database itself is doing. Now, what happens if
you try to insert a new record with Japanese characters in (not using
&#xxxxx at all)?

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #12

"Jon Skeet [C# MVP]" <sk***@pobox.com> wrote in message
news:MP************************@msnews.microsoft.c om...
> Ignore the web interface for the moment - the first thing you need to
> understand is what the database itself is doing. Now, what happens if
> you try to insert a new record with Japanese characters in (not using
> &#xxxxx at all)?


I tried to insert a Japanese name (not with the web interface) into a record,
and then tried to display it on the web and in the Windows form.
For example, I tried to insert ????? as a name from a Windows form that
performs an "insert" in the database.

The Windows form that displays the select query shows me "????".

The web interface shows me the same, "????".

Now, I think there are some problems with the exchange of the data. Here is
the code of the web service, in PHP:

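function INSERT($mystring, $name, $password)
{
    // make a connection to the DBMS
    $connection = mysql_connect("localhost", $name, $password);

    // select a database
    mysql_select_db('DATABASE', $connection);

    // escape the value so quotes in the name cannot break the SQL
    $value = mysql_real_escape_string($mystring, $connection);

    // query to insert; NULL assumes the first column is an AUTO_INCREMENT id
    $doquery = mysql_query("INSERT INTO Name VALUES (NULL, '$value')", $connection)
        or die(mysql_error());

    // close the connection
    mysql_close($connection);
}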

In C# I simply added a web reference to the web service, and on a Windows form I
have a text box for typing the name.

Nov 16 '05 #13
Bart <ba**@bart.it> wrote:
> I tried to insert a Japanese name (not with the web interface) into a record,
> and then tried to display it on the web and in the Windows form.
> For example, I tried to insert ????? as a name from a Windows form that
> performs an "insert" in the database.
>
> The Windows form that displays the select query shows me "????".
>
> The web interface shows me the same, "????".


Were those all meant to be question marks, or were some of them meant
to be characters?
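If they really are question marks, one common way for that to happen is a conversion through a character set that cannot represent Japanese at some hop (the web service, the connection, or the column). Which hop is responsible here is unknown; this sketch uses ASCII purely as a stand-in for such a character set.

```csharp
using System;
using System.Text;

// Encoding.ASCII cannot represent Japanese characters; its default fallback
// writes '?' for each one. Once that happens the original text is gone.
string original = "\u3088\u3057\u3060";   // three hiragana characters
byte[] asciiBytes = Encoding.ASCII.GetBytes(original);
string viaAscii = Encoding.ASCII.GetString(asciiBytes);
Console.WriteLine(viaAscii);                 // ???

// Through UTF-8 the same text survives unchanged.
string viaUtf8 = Encoding.UTF8.GetString(Encoding.UTF8.GetBytes(original));
Console.WriteLine(viaUtf8 == original);      // True
```

Because the '?' bytes are what get stored, no amount of decoding on the reading side can bring the name back.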
> Now, I think there are some problems with the exchange of the data. Here is
> the code of the web service, in PHP:

Ah, it's MySQL. That could well make a difference to things. For one
thing, according to
http://dev.mysql.com/doc/mysql/en/Charset-Unicode.html
MySQL doesn't cope with UTF-8 values which take more than three bytes -
in other words, Unicode values > 0xffff. However, it also implies that
you don't *need* Unicode values > 0xffff. Unless you know otherwise, I
suggest we assume that surrogates aren't part of the problem for the
moment.
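The three-byte limit can be checked from the .NET side before inserting anything. A minimal sketch, assuming the limit is exactly as described on the MySQL page linked above; FitsInThreeByteUtf8 is a hypothetical name.

```csharp
using System;
using System.Text;

// Hypothetical check: can this string be stored in a character set limited
// to three UTF-8 bytes per character (code points up to U+FFFF)? Anything
// above U+FFFF appears in a .NET string as a surrogate pair, so it is enough
// to look for surrogate code units.
static bool FitsInThreeByteUtf8(string text)
{
    foreach (char c in text)
    {
        if (char.IsSurrogate(c))
            return false;
    }
    return true;
}

Console.WriteLine(FitsInThreeByteUtf8("\u7530"));              // True
Console.WriteLine(FitsInThreeByteUtf8("\U0001F600"));          // False
Console.WriteLine(Encoding.UTF8.GetByteCount("\u7530"));       // 3
Console.WriteLine(Encoding.UTF8.GetByteCount("\U0001F600"));   // 4
```

Ordinary Japanese text lies within U+FFFF, so it passes this check.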

Now, how is the database set up? Are you connecting to it with an
appropriate connection string? Either of those could be the problem -
or it could be a bug in the MySQL .NET provider you're using.

I suggest you try to isolate the problem: come up with a simple clean
database (with a single table with a single column), then a short but
complete program which demonstrates the problem.

See http://www.pobox.com/~skeet/csharp/complete.html for details of
what I mean by that.
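The shape of such a program might be as follows. This is only a skeleton: RoundTrip is a hypothetical placeholder where the INSERT and SELECT against the one-column test table would go (with whichever MySQL provider is actually in use), and as written it simply returns its input.

```csharp
using System;

// Hypothetical stand-in for the real test: INSERT the value into a
// single-column table, SELECT it back and return what came out.
static string RoundTrip(string value) => value;

// Returns -1 if the strings are identical, otherwise the first index at which
// they differ (or the length of the shorter one if one is a prefix).
static int FirstMismatch(string a, string b)
{
    int n = Math.Min(a.Length, b.Length);
    for (int i = 0; i < n; i++)
    {
        if (a[i] != b[i])
            return i;
    }
    return a.Length == b.Length ? -1 : n;
}

// Built from \u escapes so the source file's own encoding cannot interfere.
string sent = "\u7530\u4E2D";
string received = RoundTrip(sent);

int at = FirstMismatch(sent, received);
if (at < 0)
    Console.WriteLine("OK: the text survived the round trip");
else
    Console.WriteLine($"Mismatch at index {at} (sent {sent.Length} chars, received {received.Length})");
```

Building the test string from escapes keeps the editor and compiler out of the picture, so any mismatch points at the database path.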

Once you've got that, we should be able to work out either how to fix
things or who to complain to :)

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #14
> See http://www.pobox.com/~skeet/csharp/complete.html for details of
> what I mean by that.

I think it is the best way :-)

> Once you've got that, we should be able to work out either how to fix
> things or who to complain to :)


Thank you very much, I will continue and let you know asap.

Bart
Nov 16 '05 #15
Bart <ba**@bart.it> wrote:
> I think it is the best way :-)
> Thank you very much, I will continue and let you know asap.


Righto. If you let me know which version of MySQL you've got, and which
provider you're using, I'll try to get them installed here so I can run
your code too.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 16 '05 #16
