On Mon, Sep 10, 2007 at 12:25:46PM -0000, Harshad Modi wrote regarding encoding latin1 to utf-8:
Hello,
I wrote a function for converting a file from latin-1 to UTF-8, but I think it
does not work properly.
Please guide me.
It does not give the proper result. For example, I got "Belgi???" using this
method (Belgium):
import codecs
import sys

# Encoding / decoding functions
def encode(filename):
    file = codecs.open(filename, encoding="latin-1")
    data = file.read()
    file = codecs.open(filename,"wb", encoding="utf-8")
    file.write(data)

file_name=sys.argv[1]
encode(file_name)
Some tips to help you out.
1. Close your filehandles when you're done with them.
2. Don't shadow builtin names. Python already uses the name file for a builtin type, and rebinding it to your own file object can have ugly side effects that manifest down the road.
So perhaps try the following:
import codecs

def encode(filename):
    read_handle = codecs.open(filename, encoding='latin-1')
    data = read_handle.read()
    read_handle.close()
    write_handle = codecs.open(filename, 'wb', encoding='utf-8')
    write_handle.write(data)
    write_handle.close()
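If your Python has the with statement, the same idea can be sketched so that the handles get closed even when an exception is raised. This is only a sketch under some assumptions: it uses io.open (available in Python 2.6+ and 3.x) instead of codecs.open, the name reencode_latin1_to_utf8 is just illustrative, and newline="" is my addition to keep the file's line endings untouched.

```python
import io


def reencode_latin1_to_utf8(filename):
    """Rewrite `filename` in place, converting it from latin-1 to UTF-8."""
    # Read everything as latin-1; the handle is closed when the block exits.
    # newline="" turns off newline translation so line endings are preserved.
    with io.open(filename, "r", encoding="latin-1", newline="") as read_handle:
        data = read_handle.read()

    # Reopen for writing only after the read handle has been closed.
    with io.open(filename, "w", encoding="utf-8", newline="") as write_handle:
        write_handle.write(data)
```

Call it the same way as before, e.g. reencode_latin1_to_utf8(sys.argv[1]) after importing sys. Note that it overwrites the file, so running it twice on the same file will garble any non-ASCII characters.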
For what it's worth, though, I couldn't reproduce your problem with either your code or mine. This is not too surprising, as all the ASCII characters are encoded identically in UTF-8 and latin-1. So your program should output exactly the same file as it reads if the contents of the file are just "Belgium".
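To make the point about ASCII concrete, here is a small check you can run at the interpreter. The accented word "België" is only my guess at the kind of non-ASCII text that might be in your file; nothing in your message confirms it.

```python
ascii_text = u"Belgium"
accented_text = u"Belgi\u00eb"  # "België", a hypothetical non-ASCII sample

# Pure ASCII text produces identical bytes in both encodings.
ascii_same = ascii_text.encode("latin-1") == ascii_text.encode("utf-8")

# A non-ASCII character is one byte in latin-1 but two bytes in UTF-8.
latin1_bytes = accented_text.encode("latin-1")  # b'Belgi\xeb'
utf8_bytes = accented_text.encode("utf-8")      # b'Belgi\xc3\xab'
```

So a file containing only "Belgium" comes out byte-for-byte unchanged, while a file with an accented character should come out one byte longer per such character.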
Cheers,
Cliff