encoding latin1 to utf-8

Hello,
I wrote a function to convert a file from latin1 to utf-8, but I think it
does not work properly.
Please guide me.

It does not give the correct result: for example, I got "Belgi�" using this
method (Belgium):

import codecs
import sys
# Encoding / decoding functions
def encode(filename):
    file = codecs.open(filename, encoding="latin-1")
    data = file.read()
    file = codecs.open(filename,"wb", encoding="utf-8")
    file.write(data)

file_name=sys.argv[1]
encode(file_name)

Sep 10 '07 #1
On Mon, Sep 10, 2007 at 12:25:46PM -0000, Harshad Modi wrote regarding encoding latin1 to utf-8:

> Hello,
> I wrote a function to convert a file from latin1 to utf-8, but I think it
> does not work properly.
> Please guide me.
>
> It does not give the correct result: for example, I got "Belgi???" using this
> method (Belgium):
>
> import codecs
> import sys
> # Encoding / decoding functions
> def encode(filename):
>     file = codecs.open(filename, encoding="latin-1")
>     data = file.read()
>     file = codecs.open(filename,"wb", encoding="utf-8")
>     file.write(data)
>
> file_name=sys.argv[1]
> encode(file_name)

Some tips to help you out.

1. Close your filehandles when you're done with them.
2. Don't shadow builtin names. Python already uses the name file for a builtin type, and rebinding it to your own object can have ugly side effects that manifest down the road.

So perhaps try the following:

import codecs

def encode(filename):
    read_handle = codecs.open(filename, encoding='latin-1')
    data = read_handle.read()
    read_handle.close()
    write_handle = codecs.open(filename, 'wb', encoding='utf-8')
    write_handle.write(data)
    write_handle.close()
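
The same idea can be sketched for Python 3, where the builtin open() accepts an encoding argument and a with block closes each handle automatically. This is only a sketch under assumptions: the function name convert_latin1_to_utf8 and the separate dst_path parameter are made up for illustration (writing to a second file keeps the original intact if something goes wrong).

```python
def convert_latin1_to_utf8(src_path, dst_path):
    """Read src_path as Latin-1 and write the same text to dst_path as UTF-8.

    Hypothetical helper: the name and the separate output path are
    assumptions for illustration, not part of the original code.
    """
    # 'with' closes each handle even if an exception is raised.
    # newline="" turns off newline translation, so line endings pass through unchanged.
    with open(src_path, "r", encoding="latin-1", newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

Passing the same path for both arguments reproduces the in-place behaviour of the code above, because the input is fully read and closed before the output is opened.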

For what it's worth, though, I couldn't reproduce your problem with either your code or mine. This is not too surprising, as all ASCII characters are encoded identically in UTF-8 and latin-1, so your program should output exactly the same file as it reads if the file contains just "Belgium".
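
To illustrate the point about ASCII, here is a small Python 3 sketch. The accented string "België" is only an assumed example of what the garbled word might have been; the thread does not say what the file actually contained.

```python
ascii_text = "Belgium"
accented_text = "Belgi\u00eb"  # "België": an assumed example, not taken from the original file

# Pure ASCII text produces identical bytes in both encodings.
same_for_ascii = ascii_text.encode("latin-1") == ascii_text.encode("utf-8")

# A non-ASCII character is one byte in Latin-1 but two bytes in UTF-8.
latin1_bytes = accented_text.encode("latin-1")
utf8_bytes = accented_text.encode("utf-8")

print(same_for_ascii)   # True
print(latin1_bytes)     # b'Belgi\xeb'
print(utf8_bytes)       # b'Belgi\xc3\xab'
```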

Cheers,
Cliff
Sep 10 '07 #2
Thanks for the reply,
but I need some basic knowledge. How does the encoding work? Which algorithm
is used for it? Because my data has some special characters, I am not
confident that this function gives the correct result. I want to write my own
function / script for the encoding.

Sep 10 '07 #3
On Mon, 2007-09-10 at 13:11 +0000, Harshad Modi wrote:
> Thanks for the reply,
> but I need some basic knowledge. How does the encoding work? Which algorithm
> is used for it? Because my data has some special characters, I am not
> confident that this function gives the correct result. I want to write my own
> function / script for the encoding.

For basic knowledge about Unicode and character encodings, I highly
recommend amk's excellent Unicode How-To here:
http://www.amk.ca/python/howto/unicode
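
As for which algorithm is involved: Latin-1 maps each byte value directly to the Unicode code point with the same number, and UTF-8 stores code points below 0x80 as a single byte and code points from 0x80 to 0x7FF as two bytes. A hand-written conversion might therefore look roughly like the Python 3 sketch below. The function name is hypothetical, and in practice the built-in codecs do this for you.

```python
def latin1_bytes_to_utf8(data):
    """Convert Latin-1 encoded bytes to UTF-8 encoded bytes by hand.

    Hypothetical helper for illustration; equivalent to
    data.decode("latin-1").encode("utf-8").
    """
    out = bytearray()
    for byte in data:
        if byte < 0x80:
            # ASCII range: the byte is the same in both encodings.
            out.append(byte)
        else:
            # Code points 0x80..0xFF need two UTF-8 bytes:
            # 110xxxxx carries the top 2 bits, 10xxxxxx the low 6 bits.
            out.append(0xC0 | (byte >> 6))
            out.append(0x80 | (byte & 0x3F))
    return bytes(out)


print(latin1_bytes_to_utf8(b"Belgi\xeb"))  # b'Belgi\xc3\xab'
```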

Once you've read and understood the How-To, I suggest you examine the
following:

1) Are you *sure* that the special characters in the original file are
latin-1 encoded? (If you're not sure, look at the file in a hex
editor to see which byte values it uses for the special characters.)
2) Are you sure that what you were using to look at the result file
understands and uses UTF-8 encoding? How are you telling it to use UTF-8
encoding?
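
If no hex editor is at hand, the first check can be approximated in Python 3. The sketch below (the function name describe_bytes is made up for illustration) lists the non-ASCII bytes in some data and reports whether the data already decodes as UTF-8. Note that almost any byte sequence decodes as Latin-1 without error, so a successful Latin-1 decode proves nothing; a successful UTF-8 decode of data containing non-ASCII bytes is a strong hint that the file is already UTF-8.

```python
def describe_bytes(data):
    """Return (is_valid_utf8, non_ascii) for a bytes object.

    non_ascii is a list of (offset, hex value) pairs for every byte >= 0x80.
    Hypothetical helper for illustration only.
    """
    non_ascii = [(offset, "0x%02X" % byte)
                 for offset, byte in enumerate(data)
                 if byte >= 0x80]
    try:
        data.decode("utf-8")
        is_valid_utf8 = True
    except UnicodeDecodeError:
        is_valid_utf8 = False
    return is_valid_utf8, non_ascii


print(describe_bytes(b"Belgi\xeb"))      # (False, [(5, '0xEB')])
print(describe_bytes(b"Belgi\xc3\xab"))  # (True, [(5, '0xC3'), (6, '0xAB')])
```

To apply it to a real file, read the file in binary mode, for example with open(path, "rb"), and pass the result to the function.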

Hope this helps,

--
Carsten Haese
http://informixdb.sourceforge.net
Sep 10 '07 #4
>>>>> Harshad Modi <mo******@gmail.com> (HM) wrote:

>HM> Hello,
>HM> I wrote a function to convert a file from latin1 to utf-8, but I think it
>HM> does not work properly.
>HM> Please guide me.

>HM> It does not give the correct result: for example, I got "Belgi�" using this
>HM> method (Belgium):

>HM> import codecs
>HM> import sys
>HM> # Encoding / decoding functions
>HM> def encode(filename):
>HM>     file = codecs.open(filename, encoding="latin-1")
>HM>     data = file.read()
>HM>     file = codecs.open(filename,"wb", encoding="utf-8")
>HM>     file.write(data)

>HM> file_name=sys.argv[1]
>HM> encode(file_name)

I tried this program and for me it works correctly. So you probably used the
wrong input file or you misinterpreted the output. To be sure, make hex
dumps of your input and output.
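
A hex dump can also be produced from Python 3 itself. The sketch below uses a hypothetical hex_dump helper and shows one way a wrong input file could go bad: if the input is already UTF-8, decoding it as Latin-1 and re-encoding it doubles up the non-ASCII bytes. Whether that is what happened here is only a guess.

```python
def hex_dump(data):
    """Return the bytes as space-separated two-digit hex values (hypothetical helper)."""
    return " ".join("%02x" % byte for byte in data)


latin1_input = b"Belgi\xeb"        # a genuine Latin-1 input (assumed example)
utf8_input = b"Belgi\xc3\xab"      # an input that is already UTF-8 (assumed example)

# Correct case: Latin-1 in, UTF-8 out.
good_output = latin1_input.decode("latin-1").encode("utf-8")
# Wrong-input case: UTF-8 bytes misread as Latin-1, then encoded again.
double_encoded = utf8_input.decode("latin-1").encode("utf-8")

print(hex_dump(latin1_input))    # 42 65 6c 67 69 eb
print(hex_dump(good_output))     # 42 65 6c 67 69 c3 ab
print(hex_dump(double_encoded))  # 42 65 6c 67 69 c3 83 c2 ab
```
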
--
Piet van Oostrum <pi**@cs.uu.nl>
URL: http://www.cs.uu.nl/~piet [PGP 8DAE142BE17999C4]
Private email: pi**@vanoostrum.org
Sep 10 '07 #5
On Sep 10, 5:25 am, Harshad Modi <modii...@gmail.com> wrote:
> Hello,
> I wrote a function to convert a file from latin1 to utf-8, but I think it
> does not work properly.
> Please guide me.

Hi, what you want is here, including complete code:

Converting a File's "Character Set" / Encoding
http://xahlee.org/perl-python/charset_encoding.html

Xah
xa*@xahlee.org
http://xahlee.org/

Sep 10 '07 #6
Thanks for the responses.
I think my file has the wrong encoding.
Thanks for the guidance and advice.

Sep 12 '07 #7
