
Convert Big5 to Unicode

GM
Dear all,

Could you give me some guidance on how to convert my Big5 string to
Unicode using Python? I know that I could use cjkcodecs or
Python 2.4, but I still have no idea what exactly I should do.
Please give me some sample code if you can. Thanks a lot.

Regards,

Gary

Sep 7 '06 #1
Install the codecs. In Debian, you can run:
apt-get install python-cjkcodecs

Then decoding is easy (I use 'gb2312'):

s = '我们'
u = unicode(s, 'gb2312')

The conversion is done, and you can get the string as UTF-8:
str_utf8 = u.encode("utf-8")

You can also get the original GB2312 string back:
str_gb = u.encode("gb2312")
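
The same steps can be sketched for Big5 input, which is what the question asks about. This is a minimal sketch that assumes Python 3, where text strings are already Unicode and the built-in `big5` codec needs no extra package; the variable names are illustrative only. The sample bytes are the two-byte Big5 sequence used elsewhere in this thread.

```python
# Assumption: Python 3, where bytes.decode() replaces unicode(s, encoding).
big5_bytes = b'\xb1i'                    # one character encoded in Big5

text = big5_bytes.decode('big5')         # Big5 bytes -> Unicode text
utf8_bytes = text.encode('utf-8')        # Unicode text -> UTF-8 bytes
round_trip = text.encode('big5')         # Unicode text -> Big5 bytes again

print(repr(text))        # '張' (U+5F35)
print(utf8_bytes)        # b'\xe5\xbc\xb5'
print(round_trip == big5_bytes)
```

On Python 2.4 the equivalent of the decode step is `unicode(big5_bytes, 'big5')`, as in the GB2312 example above.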
Sep 7 '06 #2
xiejw top-posted:
> Install the codecs. In Debian, you can do:
> apt-get install python-cjkcodecs

With Windows & 2.4, no extra installation step is required.

| Python 2.4.3 (#69, Mar 29 2006, 17:35:34) [MSC v.1310 32 bit (Intel)] on win32
| >>> bc = '\xb1i'
| >>> unicode(bc, 'big5')
| u'\u5f35'
| >>>

HTH,
John
Sep 7 '06 #3
On 7 Sep 2006 01:27:55 -0700, "GM" <ga*******@gmail.com> wrote:
> Could you all give me some guide on how to convert my big5 string to
> unicode using python? I already knew that I might use cjkcodecs or
> python 2.4 but I still don't have idea on what exactly I should do.
> Please give me some sample code if you could. Thanks a lot

Gary, I used this Java program quite a few years ago to convert
various Big5 files to UTF-16. (Sorry it's Java not Python, but I'm a
very recent convert to the latter.) My newsgroup reader has messed the
formatting up somewhat. If this causes a problem, email me and I'll
send you the source directly.

-Richard Schulman

/* This program converts an input file in one encoding to an output
 * file in another. It will be mainly used to convert Big5 text files
 * to Unicode text files.
 */

import java.io.*;

public class ConvertEncoding
{
    public static void main(String[] args)
    {
        try
        {
            // Arguments to convert(): "input file", "output file",
            // "input encoding", "output encoding".
            // Among possible choices for input or output encoding:
            // "GB2312", "BIG5", "UTF8", "UTF-16LE".
            // The last named is MS UCS-2 format.
            // One of numerous variations:
            //     convert(args[0], args[1], "GB2312", "UTF8");
            convert(args[0], args[1], "BIG5", "UTF-16LE");
        }
        catch (Exception e)
        {
            System.out.print(e.getMessage());
            System.exit(1);
        }
    }

    public static void convert(String infile, String outfile,
                               String from, String to)
        throws IOException, UnsupportedEncodingException
    {
        // Set up byte streams; fall back to stdin/stdout when no file is given
        InputStream in;
        if (infile != null)
            in = new FileInputStream(infile);
        else
            in = System.in;

        OutputStream out;
        if (outfile != null)
            out = new FileOutputStream(outfile);
        else
            out = System.out;

        // Set up character streams that decode and encode on the fly
        Reader r = new BufferedReader(new InputStreamReader(in, from));
        Writer w = new BufferedWriter(new OutputStreamWriter(out, to));

        // Byte order mark: this character signals Unicode in the NT environment
        w.write("\ufeff");

        // Copy the text across in 4096-character chunks
        char[] buffer = new char[4096];
        int len;
        while ((len = r.read(buffer)) != -1)
            w.write(buffer, 0, len);
        r.close();
        w.flush();
        w.close();
    }
}
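
Since the question asked for Python, the same file conversion could be sketched roughly as follows. This is not Richard's code but an approximate equivalent under stated assumptions: Python 3 (or 2.6+ via `io.open`), a hypothetical function name `convert`, and the same defaults as the Java program (Big5 in, UTF-16LE out, a leading byte order mark, 4096-character chunks).

```python
import io


def convert(infile, outfile, from_enc="big5", to_enc="utf-16-le",
            chunk_size=4096):
    """Re-encode a text file from from_enc to to_enc (hypothetical helper)."""
    # newline="" disables newline translation so line endings pass through as-is.
    with io.open(infile, "r", encoding=from_enc, newline="") as src, \
         io.open(outfile, "w", encoding=to_enc, newline="") as dst:
        # Write the byte order mark first, as the Java program does.
        dst.write(u"\ufeff")
        while True:
            chunk = src.read(chunk_size)   # read decoded characters
            if not chunk:                  # empty string means end of file
                break
            dst.write(chunk)               # written out in the target encoding
```

A call such as `convert("input_big5.txt", "output_utf16.txt")` (file names are placeholders) would then mirror `convert(args[0], args[1], "BIG5", "UTF-16LE")` in the Java version.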
Sep 7 '06 #4
