Unicode/UTF-8 decoding

Below is some text I extracted from a MySQL database. How can I decode it
so that I can read it as Unicode?
Thanks

Bill

------------

Virginia Hamilton Adair / Lâm Thị Mỹ Dạ
Lấp lánh hồn thơ Việt trên sân ga Tokyo chiều cuối năm

Jun 5 '07 #1

"Bill Nguyen" <bi**********@jaco.com> wrote in message
news:AF**********************************@microsoft.com...
> Below is some text I extracted from a MySQL database. How can I decode
> it so that I can read it as Unicode?

If you have Visual Studio .NET 2003 or Visual Studio 2005, go to
Help > Index > Visual Basic and enter "Unicode UTF-8" in the search box.
It will give you a whole section, with program examples, on UTF-8,
UTF-16 and other encoding/decoding.
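
In .NET these conversions go through `System.Text.Encoding`, for example `Encoding.UTF8` and `Encoding.Unicode` (the latter is little-endian UTF-16). As a language-neutral illustration only, here is a minimal Python sketch of encoding the same text both ways and decoding it again. The sample word comes from the thread's Vietnamese text. Choosing little-endian UTF-16 without a byte-order mark is an assumption made to keep the bytes easy to read.

```python
# Encode the same text as UTF-8 and as UTF-16 (little-endian, no BOM),
# then decode each byte sequence with the encoding that produced it.
text = "thơ"  # "ơ" is U+01A1

utf8_bytes = text.encode("utf-8")       # b'th\xc6\xa1'  ("ơ" takes 2 bytes)
utf16_bytes = text.encode("utf-16-le")  # b't\x00h\x00\xa1\x01'  (2 bytes per char)

# Decoding with the matching encoding always recovers the original string.
assert utf8_bytes.decode("utf-8") == text
assert utf16_bytes.decode("utf-16-le") == text

print(len(utf8_bytes), len(utf16_bytes))  # prints: 4 6
```

The point relevant to this thread is that a byte sequence only means something together with the encoding used to produce it.
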
Jun 5 '07 #2
Bill Nguyen wrote:
> Below is some text I extracted from a MySQL database. How can I decode
> it so that I can read it as Unicode?
This text looks as if it has been decoded with a different encoding than
the one used to encode it. It might be possible to recreate the data if
you know which encodings were used to encode and decode it: you could
then encode it back to its previous state and use the proper encoding to
decode it. There is a great risk that some data has been lost, though,
and that you can't recreate the original data at this stage.
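
Here is a rough illustration of that reversal. The garbled text in the question contains characters such as "‹" and "ƒ". These sit in the 0x80-0x9F range of Windows-1252 but not of ISO-8859-1, which suggests (an assumption, not something the thread confirms) that UTF-8 bytes were decoded as Windows-1252. Under that assumption, this minimal Python sketch re-encodes the garbled string with cp1252 to recover the original bytes and then decodes those bytes as UTF-8. The function names are hypothetical.

```python
WRONG = "cp1252"  # assumed wrong code page (Windows-1252)

def garble(text: str) -> str:
    """Reproduce the bug: UTF-8 bytes decoded with the wrong code page."""
    return text.encode("utf-8").decode(WRONG)

def repair(garbled: str) -> str:
    """Reverse it: encode with the same wrong code page, decode as UTF-8."""
    return garbled.encode(WRONG).decode("utf-8")

name = "Lâm Thị Mỹ Dạ"
bad = garble(name)
print(bad)          # LÃ¢m Thá»‹ Má»¹ Dáº¡
print(repair(bad))  # Lâm Thị Mỹ Dạ
```

Both steps use Python's default strict error handling. If a byte was dropped or replaced along the way, `repair` raises `UnicodeEncodeError` or `UnicodeDecodeError` instead of quietly returning garbage.
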

If you want to store Unicode strings in the MySQL database, it has to be
set up to use Unicode as its character set.

--
Göran Andersson
_____
http://www.guffa.com
Jun 5 '07 #3
I set UTF-8 as the default encoding in MySQL.
I don't really know how this works, but the IE and Firefox browsers can
decode it easily.
This is the test:
I put the lines below in an HTML document and viewed it in IE, and it
worked (make sure to set the encoding to UTF-8 under View).
I've attached test.htm for your testing. (The text is in Vietnamese.)
So I think what I need is a utility with the same function; one might
already be available out there. Any help is greatly appreciated.

Bill

----------------
<html>

<head></head>

<body>
Virginia Hamilton Adair / Lâm Thị Mỹ Dạ
Lấp lánh há»"n thÆ¡ Việt trên sân ga Tokyo chiều cuối năm

</body>

</html>
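
One plausible explanation of why this test works assumes the editor saved test.htm in Windows-1252 (the Western Windows "ANSI" code page; the thread doesn't say which editor or code page was used). Saving writes each garbled character back out as its single cp1252 byte, which reproduces the original UTF-8 bytes. Selecting UTF-8 under View then decodes those bytes correctly. This Python sketch simulates that with a temporary file. The file name and the cp1252 choice are assumptions.

```python
import os
import tempfile

# The garbled text as it appears in the page source (first line only).
garbled = "LÃ¢m Thá»‹ Má»¹ Dáº¡"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.htm")  # hypothetical file name

    # Step 1: the editor saves the page using Windows-1252 (assumed).
    with open(path, "w", encoding="cp1252") as f:
        f.write(f"<html><body>{garbled}</body></html>")

    # Step 2: the browser is told to read the same bytes as UTF-8.
    with open(path, encoding="utf-8") as f:
        page = f.read()

print(page)  # <html><body>Lâm Thị Mỹ Dạ</body></html>
```
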

Jun 5 '07 #4
Bill Nguyen wrote:
> I set UTF-8 as the default encoding in MySQL.
> I don't really know how this works, but the IE and Firefox browsers can
> decode it easily.
You are doing exactly what I was talking about. If you read the data
using the wrong encoding, then save it using the same encoding, you can
then open it using the correct encoding, provided that the process
hasn't removed any data.

If you have set up your MySQL database to use Unicode and still get the
string out like that, the error happens before the string is saved in
the database in the first place. What you have done is basically:

unicode -> bytes -> wrong encoding -> MySQL -> wrong encoding -> html ->
bytes -> browser -> unicode

While this gives the correct result for some strings, some of the byte
values used in UTF-8 don't represent any character on their own in the
wrong encoding, so if you continue to store mis-decoded strings as
Unicode, you will sooner or later end up with corrupted strings.
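
To see that failure mode, here is a sketch that runs the pipeline under the same Windows-1252 assumption, with a Python dict as a hypothetical stand-in for the MySQL table. "ạ" (UTF-8 E1 BA A1) survives the round trip. "ề" in "chiều" is E1 BB 81, and 0x81 has no character in Python's cp1252 table, so the wrong decoding has to substitute U+FFFD and the original byte is lost.

```python
WRONG = "cp1252"  # assumed wrong code page (Windows-1252)

def store_wrongly(text: str) -> str:
    # unicode -> UTF-8 bytes -> decoded with the wrong code page.
    # errors="replace" models a lossy conversion: bytes that have no
    # character in Python's cp1252 table (0x81, 0x8D, 0x8F, 0x90, 0x9D)
    # become U+FFFD, and the original byte value is gone.
    return text.encode("utf-8").decode(WRONG, errors="replace")

def read_back(stored: str) -> str:
    # wrong code page -> bytes -> decoded as UTF-8 (what the browser did).
    return stored.encode(WRONG).decode("utf-8")

# Hypothetical stand-in for the MySQL table: row id -> stored value.
table = {1: store_wrongly("Dạ"), 2: store_wrongly("chiều")}

print(read_back(table[1]))  # Dạ

try:
    read_back(table[2])  # U+FFFD has no cp1252 byte to encode back to
except UnicodeEncodeError as exc:
    print("cannot recover row 2:", exc.reason)
```

A real database driver may drop such bytes rather than replace them, but either way the information needed to rebuild the UTF-8 sequence is gone.
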

--
Göran Andersson
_____
http://www.guffa.com
Jun 5 '07 #5
Göran,

I think you are correct. However, there's not much I can do, since I
cannot change the host server parameters.
I am using SQLyog to access MySQL remotely. What I need is to be able to
read the data in its correct format/encoding scheme. Is it possible with
.NET?

Thanks

Bill


Jun 5 '07 #6
Bill Nguyen wrote:
> I am using SQLyog to access MySQL remotely. What I need is to be able
> to read the data in its correct format/encoding scheme. Is it possible
> with .NET?

Thanks

Bill
Yes, it's possible in .NET.

Strictly speaking, you can't read it using the correct encoding, as it's
not stored using the correct encoding. You can only read it the same way
it's stored; then you have to reverse the process by encoding it with
the same wrong encoding and decoding it with the correct encoding.

As I said earlier, this will not work for all strings, so if you want a
system that works correctly, you have to change how the data is stored
in the database.
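
For a one-off clean-up of the existing data, here is a hedged sketch of a "repair what can be repaired" helper. It applies the reversal described above to each value and leaves a value untouched, with a flag, when the reversal fails. The helper name, the sample values and the Windows-1252 assumption are all illustrative. In .NET the two steps would roughly correspond to `Encoding.GetEncoding(1252).GetBytes(s)` followed by `Encoding.UTF8.GetString(bytes)`, but .NET's default handling of unmappable characters may differ from Python's strict mode.

```python
WRONG = "cp1252"  # assumed wrong code page (Windows-1252)

def try_repair(value: str) -> tuple[str, bool]:
    """Hypothetical helper: return (fixed_text, True) if the reversal
    succeeded, or (value, False) unchanged if it could not be reversed."""
    try:
        # Same wrong encoding back to bytes, then the correct decoding.
        return value.encode(WRONG).decode("utf-8"), True
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either a byte was lost when the text was mis-decoded, or the
        # value was never mis-decoded in the first place.
        return value, False

samples = [
    "Dáº¡",          # mis-decoded "Dạ": reversible
    "chiá»\ufffdu",  # mis-decoded "chiều" with a lost byte: not reversible
    "Dạ",            # already correct: "ạ" has no cp1252 byte, left alone
    "Tokyo",         # plain ASCII: comes back unchanged
]

for raw in samples:
    text, ok = try_repair(raw)
    print(ok, text)
```

A correctly stored string can occasionally look like valid mis-decoded UTF-8 (a literal "Ã©" would be turned into "é"), so the output is worth spot-checking. As noted above, fixing how the data is stored is the only complete solution.
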

--
Göran Andersson
_____
http://www.guffa.com
Jun 6 '07 #7
