Convert Encoding from Shift-JIS to UTF-8

DbNetLink

I am trying to convert some Japanese text encoded as Shift-JIS/ISO-2022-JP
to UTF-8 so I can store all data in my database with a common encoding.

My problem is that the encoding conversion code works for Japanese characters
encoded as "iso-2022-jp" but not for those encoded as "shift-jis".

What looked straightforward is proving less so. My test code looks like
this:

<%@ Page Language="C#"%>

<script language="C#" runat="server">
////////////////////////////////////////////////////////////////////////////
public void Page_Load()
////////////////////////////////////////////////////////////////////////////
{
    string S = Request.Form["text"];

    Encoding SourceEncoding = Encoding.GetEncoding( "shift-jis" );
    Encoding TargetEncoding = Encoding.UTF8;

    Response.Write( SourceEncoding.GetString( TargetEncoding.GetBytes( S ) ) );
}
</script>

Thanks in advance

Nov 16 '05 #1


Jon Skeet [C# MVP]

DbNetLink <robin@____dbnetlink.co.uk> wrote:

> I am trying to convert some Japanese text encoded as Shift-JIS/ISO-2022-JP
> to UTF-8 so I can store all data in my database with a common encoding.

There's something wrong here. The request value is a Unicode string -
all strings are Unicode in .NET. Any encoding has already been taken
into account. You should be able to just write the string to the
database without any change.
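
As a minimal sketch of the point above (not code from the thread): once the bytes of a request have been decoded, the resulting string is the same sequence of UTF-16 code units no matter which encoding was used on the wire. In ASP.NET the framework does this decoding before `Request.Form` returns anything; the hypothetical `WireEncodingDemo.Decode` helper below imitates that step by hand. It assumes .NET Core 3.0 or later, where Shift-JIS has to be enabled through `CodePagesEncodingProvider`.

```csharp
using System;
using System.Text;

public static class WireEncodingDemo
{
    // Decodes raw bytes with the named encoding, roughly what the framework
    // does for a request body before the page ever sees a string.
    public static string Decode(string encodingName, byte[] wireBytes)
    {
        // Assumption: .NET Core 3.0+ / .NET 5+, where legacy code pages such
        // as Shift-JIS need this provider. On .NET Framework they are built
        // in and this line can be dropped.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(encodingName).GetString(wireBytes);
    }

    public static void Main()
    {
        // The same three Japanese characters (U+65E5 U+672C U+8A9E) as they
        // would travel in a Shift-JIS request and in a UTF-8 request.
        byte[] shiftJisBytes = { 0x93, 0xFA, 0x96, 0x7B, 0x8C, 0xEA };
        byte[] utf8Bytes = { 0xE6, 0x97, 0xA5, 0xE6, 0x9C, 0xAC, 0xE8, 0xAA, 0x9E };

        string fromShiftJis = Decode("shift_jis", shiftJisBytes);
        string fromUtf8 = Decode("utf-8", utf8Bytes);

        // After decoding, the wire encoding no longer matters.
        Console.WriteLine(fromShiftJis == fromUtf8);
        Console.WriteLine(fromShiftJis.Length);
    }
}
```

The encoding only matters at the boundary where bytes become a string or a string becomes bytes. In between there is nothing left to "convert".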

See http://www.pobox.com/~skeet/csharp/unicode.html

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too

Nov 16 '05 #2

DbNetLink

Is that true even if the web page transmitting the form has had its
encoding set to "shift-jis"?

When you say Unicode, I am assuming this means UTF-16?

Assuming that were true, I would expect to be able to convert
the page like this:

////////////////////////////////////////////////////////////////////////////
public void Page_Load()
////////////////////////////////////////////////////////////////////////////
{
    string S = Request.Form["text"];

    Encoding SourceEncoding = Encoding.Unicode;
    Encoding TargetEncoding = Encoding.UTF8;

    Response.Write( SourceEncoding.GetString( TargetEncoding.GetBytes( S ) ) );
}

But this does not appear to work as I would expect either.

Nov 16 '05 #3

Jon Skeet [C# MVP]

DbNetLink <robin@____dbnetlink.co.uk> wrote:

> Is that true even if the web page transmitting the form has had its
> encoding set to "shift-jis"?
>
> When you say Unicode, I am assuming this means UTF-16?

Yes.

> Assuming that were true, I would expect to be able to convert
> the page like this:
>
> ////////////////////////////////////////////////////////////////////////////
> public void Page_Load()
> ////////////////////////////////////////////////////////////////////////////
> {
>     string S = Request.Form["text"];
>
>     Encoding SourceEncoding = Encoding.Unicode;
>     Encoding TargetEncoding = Encoding.UTF8;
>
>     Response.Write( SourceEncoding.GetString( TargetEncoding.GetBytes( S ) ) );
> }
>
> But this does not appear to work as I would expect either.

No, that shouldn't work. That's trying to use the UTF-8 encoding of a
string as if it were a UTF-16 (Encoding.Unicode) encoding of a string.

If you want the UTF-8 encoded bytes, just use Encoding.UTF8.GetBytes(S)
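
A minimal sketch of that single call, for illustration only. The class name `Utf8BytesDemo` and the sample text are assumptions, and the literal stands in for whatever `Request.Form["text"]` returns. No source encoding appears anywhere because the string is already Unicode.

```csharp
using System;
using System.Text;

public static class Utf8BytesDemo
{
    // Encodes a .NET string as UTF-8 bytes. Only the target encoding is
    // needed; the string itself carries no "source encoding".
    public static byte[] ToUtf8(string text)
    {
        return Encoding.UTF8.GetBytes(text);
    }

    public static void Main()
    {
        string s = "\u65E5\u672C\u8A9E"; // stands in for Request.Form["text"]
        byte[] utf8 = ToUtf8(s);

        // Show the bytes as hex so the result does not depend on console fonts.
        Console.WriteLine(BitConverter.ToString(utf8));

        // Decoding with the SAME encoding gives back the original string.
        Console.WriteLine(Encoding.UTF8.GetString(utf8) == s);
    }
}
```

The byte array is what you would hand to something that needs raw UTF-8. Passing it back through a different encoding's `GetString` is the step that goes wrong in the page code.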

Did you read the page I linked to?


Nov 16 '05 #4

DbNetLink

>> If you want the UTF-8 encoded bytes, just use Encoding.UTF8.GetBytes(S)

Is that not what I am doing in the line:

Response.Write( SourceEncoding.GetString( TargetEncoding.GetBytes( S ) ) );

Given the earlier line:

Encoding TargetEncoding = Encoding.UTF8;

I did read the link but was unable to relate it directly to my problem of
converting one encoding to another using .NET.

If it is simply down to an error in my code, perhaps you could point it out,
as I have already spent two days trying to understand what I am doing wrong
and would love to be put out of my misery :(
Thanks for your help BTW

Nov 16 '05 #5

Jon Skeet [C# MVP]

DbNetLink <robin@____dbnetlink.co.uk> wrote:

> > If you want the UTF-8 encoded bytes, just use Encoding.UTF8.GetBytes(S)
>
> Is that not what I am doing in the line:
>
> Response.Write( SourceEncoding.GetString( TargetEncoding.GetBytes( S ) ) );

No. You're converting the string into UTF-8, but then using the result
as if it were a valid shift-jis-encoded byte array.
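
To make the mismatch concrete, here is a small sketch (hypothetical names, not the poster's code) that encodes a string with one encoding and decodes the bytes with another. It assumes .NET Core 3.0 or later for the `CodePagesEncodingProvider` call; on .NET Framework that line is unnecessary.

```csharp
using System;
using System.Text;

public static class MismatchDemo
{
    // Encodes text with one encoding and decodes the bytes with another.
    // The result equals the input only when both encodings agree on those bytes.
    public static string EncodeThenDecode(string text, Encoding encodeWith, Encoding decodeWith)
    {
        return decodeWith.GetString(encodeWith.GetBytes(text));
    }

    public static void Main()
    {
        // Assumption: needed on .NET Core 3.0+ / .NET 5+ to make Shift-JIS
        // available; not needed on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding shiftJis = Encoding.GetEncoding("shift_jis");

        string s = "\u65E5\u672C\u8A9E";

        // Matching pair: the text survives.
        Console.WriteLine(EncodeThenDecode(s, Encoding.UTF8, Encoding.UTF8) == s);

        // The pattern from the page: UTF-8 bytes read as Shift-JIS.
        Console.WriteLine(EncodeThenDecode(s, Encoding.UTF8, shiftJis) == s);

        // The second attempt in the thread: UTF-8 bytes read as UTF-16.
        Console.WriteLine(EncodeThenDecode(s, Encoding.UTF8, Encoding.Unicode) == s);
    }
}
```

Only the first comparison holds. In the other two the UTF-8 bytes are reinterpreted under rules they were never written for, so the decoded text is garbage rather than a converted copy.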

> Given the earlier line:
>
> Encoding TargetEncoding = Encoding.UTF8;
>
> I did read the link but was unable to relate it directly to my problem of
> converting one encoding to another using .NET.

It gives the fundamentals, which should explain why the line of code at
the top is a really bad idea.

> If it is simply down to an error in my code, perhaps you could point it out,
> as I have already spent two days trying to understand what I am doing wrong
> and would love to be put out of my misery :(

You should just be able to use the string, without venturing into
encodings at all.

If that's not working, you need to work through it step by step - see
http://www.pobox.com/~skeet/csharp/d...ngunicode.html
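
One possible way to work through it step by step, offered as a sketch rather than as what the linked page prescribes: print the numeric value of every character at each stage (after reading the form value, before writing to the database, after reading it back) and see where the values first change. `StringDump.Describe` is a hypothetical helper name.

```csharp
using System;
using System.Text;

public static class StringDump
{
    // Formats each UTF-16 code unit as U+XXXX so a string can be inspected
    // without relying on fonts, consoles, or browser encodings.
    public static string Describe(string text)
    {
        var sb = new StringBuilder();
        foreach (char c in text)
        {
            if (sb.Length > 0)
            {
                sb.Append(' ');
            }
            sb.Append("U+").Append(((int)c).ToString("X4"));
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // In the page this would be Describe(Request.Form["text"]).
        Console.WriteLine(Describe("\u65E5\u672C\u8A9E"));
    }
}
```

If the values are already wrong right after `Request.Form`, the request is being decoded with the wrong encoding. If they are right there but wrong after a database round trip, the problem is in storage, not in the page.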


Nov 16 '05 #6
