encoding latin1 to utf-8

Harshad Modi

Hello,
I wrote a function to convert a file from latin1 to utf-8, but I think
it is not working properly.
Please guide me.

It does not give the proper result: I got "Belgiï¿½" using this
method (Belgium):

import codecs
import sys
# Encoding / decoding functions
def encode(filename):
    file = codecs.open(filename, encoding="latin-1")
    data = file.read()
    file = codecs.open(filename,"wb", encoding="utf-8")
    file.write(data)

file_name=sys.argv[1]
encode(file_name)

Sep 10 '07 #1


J. Clifford Dyer

On Mon, Sep 10, 2007 at 12:25:46PM -0000, Harshad Modi wrote regarding encoding latin1 to utf-8:

> Hello,
> I wrote a function to convert a file from latin1 to utf-8, but I think
> it is not working properly.
> Please guide me.
>
> It does not give the proper result: I got "Belgi???" using this
> method (Belgium):
>
> import codecs
> import sys
> # Encoding / decoding functions
> def encode(filename):
>     file = codecs.open(filename, encoding="latin-1")
>     data = file.read()
>     file = codecs.open(filename,"wb", encoding="utf-8")
>     file.write(data)
>
> file_name=sys.argv[1]
> encode(file_name)

Some tips to help you out.

1. Close your filehandles when you're done with them.
2. Don't shadow builtin names. Python already uses the name file for a builtin, and rebinding it to your own object can have ugly side effects that manifest down the road.

So perhaps try the following:

import codecs

def encode(filename):
    read_handle = codecs.open(filename, encoding='latin-1')
    data = read_handle.read()
    read_handle.close()
    write_handle = codecs.open(filename, 'wb', encoding='utf-8')
    write_handle.write(data)
    write_handle.close()

For what it's worth though, I couldn't reproduce your problem with either your code or mine. This is not too surprising, as all the ASCII characters are encoded identically in UTF-8 and Latin-1. So your program should output exactly the same file as it reads, if the contents of the file just read "Belgium".
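
The point about ASCII can be checked directly. The following is a minimal sketch, assuming Python 3 (the thread itself dates from Python 2); the sample string "België" is an illustrative choice and is not taken from the poster's file.

```python
# Illustrative only: compare how two strings are encoded in Latin-1 and UTF-8.
ascii_text = "Belgium"
accented_text = "Belgi\u00eb"  # "België", a made-up sample with one non-ASCII letter

# Pure ASCII text produces identical bytes in both encodings.
assert ascii_text.encode("latin-1") == ascii_text.encode("utf-8")

latin1_bytes = accented_text.encode("latin-1")
utf8_bytes = accented_text.encode("utf-8")

print(latin1_bytes)  # the accented letter takes one byte
print(utf8_bytes)    # the accented letter takes two bytes
```

Only characters outside the ASCII range change when a file is converted, so those are the bytes worth inspecting.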

Cheers,
Cliff

Sep 10 '07 #2

Harshad Modi

Thanks for the reply,
but I need some basic knowledge. How does encoding work? Which algorithm
is used for it? Because my data has some special characters, I am not
confident that this function gives the proper result. I want to write my
own function / script for encoding.

Sep 10 '07 #3

Carsten Haese

On Mon, 2007-09-10 at 13:11 +0000, Harshad Modi wrote:

> Thanks for the reply,
> but I need some basic knowledge. How does encoding work? Which algorithm
> is used for it? Because my data has some special characters, I am not
> confident that this function gives the proper result. I want to write my
> own function / script for encoding.

For basic knowledge about Unicode and character encodings, I highly
recommend amk's excellent Unicode How-To here:
http://www.amk.ca/python/howto/unicode

Once you've read and understood the How-To, I suggest you examine the
following:

1) Are you *sure* that the special characters in the original file are
latin-1 encoded? (If you're not sure, try to look at the file in a hex
editor to see what character codes it uses for the special characters).
2) Are you sure that what you were using to look at the result file
understands and uses UTF-8 encoding? How are you telling it to use UTF-8
encoding?
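
Question 1 can also be approached from Python instead of a hex editor. This is a rough sketch, assuming Python 3; the helper names looks_like_utf8 and non_ascii_hex are hypothetical, and the sample bytes stand in for whatever open(path, "rb").read() would return for the real file.

```python
def looks_like_utf8(raw):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def non_ascii_hex(raw):
    """List (offset, hex value) for every byte outside the ASCII range."""
    return [(i, format(b, "02x")) for i, b in enumerate(raw) if b >= 0x80]


# Made-up sample data standing in for the contents of a real file.
latin1_sample = "Belgi\u00eb".encode("latin-1")
utf8_sample = "Belgi\u00eb".encode("utf-8")

print(looks_like_utf8(latin1_sample), non_ascii_hex(latin1_sample))
print(looks_like_utf8(utf8_sample), non_ascii_hex(utf8_sample))
```

This check is only a heuristic: a pure-ASCII file passes as UTF-8 too, and every byte sequence decodes as Latin-1 without error, so Latin-1 cannot be confirmed this way. As a side observation, the three characters ï¿½ are how the bytes ef bf bd appear when displayed as Latin-1, and those bytes are the UTF-8 encoding of U+FFFD, the replacement character. That may be a hint about the input file or the viewer, but nothing in this thread confirms it.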

Hope this helps,

--
Carsten Haese
http://informixdb.sourceforge.net

Sep 10 '07 #4

Piet van Oostrum

>>>>> Harshad Modi <mo******@gmail.com> (HM) wrote:

>HM> Hello,
>HM> I wrote a function to convert a file from latin1 to utf-8, but I think
>HM> it is not working properly.
>HM> Please guide me.

>HM> It does not give the proper result: I got "Belgiï¿½" using this
>HM> method (Belgium):

>HM> import codecs
>HM> import sys
>HM> # Encoding / decoding functions
>HM> def encode(filename):
>HM>     file = codecs.open(filename, encoding="latin-1")
>HM>     data = file.read()
>HM>     file = codecs.open(filename,"wb", encoding="utf-8")
>HM>     file.write(data)

>HM> file_name=sys.argv[1]
>HM> encode(file_name)

I tried this program and for me it works correctly. So you probably used the
wrong input file or you misinterpreted the output. To be sure, make hex
dumps of your input and output.
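
A hex dump before and after the conversion might look like the sketch below. It assumes Python 3 and uses the built-in open() with an encoding argument in place of codecs.open(); the function names, the temporary file, and its contents are all made up for the illustration.

```python
import binascii
import os
import tempfile


def convert_latin1_to_utf8(filename):
    # Same approach as the function discussed in this thread, with handles closed.
    # newline="" keeps line endings untouched on every platform.
    with open(filename, encoding="latin-1", newline="") as src:
        data = src.read()
    with open(filename, "w", encoding="utf-8", newline="") as dst:
        dst.write(data)


def hex_dump(filename):
    # Return the raw bytes of the file as a string of hex digits.
    with open(filename, "rb") as handle:
        return binascii.hexlify(handle.read()).decode("ascii")


# Build a throwaway Latin-1 file containing "België" (made-up sample data).
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as handle:
    handle.write(b"Belgi\xeb")

before = hex_dump(path)
convert_latin1_to_utf8(path)
after = hex_dump(path)
os.remove(path)

print(before)  # input bytes
print(after)   # output bytes
```

If the input dump already shows multi-byte sequences such as c3 ab where a single accented letter is expected, the file was not Latin-1 to begin with, and running the conversion would encode it a second time.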
--
Piet van Oostrum <pi**@cs.uu.nl>
URL: http://www.cs.uu.nl/~piet [PGP 8DAE142BE17999C4]
Private email: pi**@vanoostrum.org

Sep 10 '07 #5

Xah Lee

On Sep 10, 5:25 am, Harshad Modi <modii...@gmail.com> wrote:

> Hello,
> I wrote a function to convert a file from latin1 to utf-8, but I think
> it is not working properly.
> Please guide me.

Hi, what you want is here, including complete code:

Converting a File's "Character Set" / Encoding
http://xahlee.org/perl-python/charset_encoding.html

Xah
xa*@xahlee.org
∑ http://xahlee.org/

Sep 10 '07 #6

Harshad Modi

Thanks for the responses.
I think my file has the wrong encoding.
Thanks for the guidance and advice.

Sep 12 '07 #7
