encoding misunderstanding

Hi, I'm beginning to understand the encode/decode string methods, but
I'd like confirmation that I'm still thinking in the right direction:

I have a file of latin1 encoded text. Let's say I put one line of that
into a string variable 'tocline', as follows:
tocline = 'Ficha Datos de p\xe9rdida AND acci\xf3n'

import codecs
tocFile = codecs.open('mytoc.htm','wb',encoding='utf8',errors='replace')
tocline = tocline.decode('latin1','replace')

What I think is that tocFile is wrapped to ensure that anything
written to it is in utf8.
I decode the latin1 string into Python's internal unicode representation, and
that gets written out as utf8.
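
To make the decode-then-write idea above concrete, here is a minimal sketch. It is written for Python 3, so the latin1 data is a bytes literal rather than a plain string, and the output path in a temporary directory is an assumption made only so the example is self-contained.

```python
import codecs
import os
import tempfile

# The line from the post, as raw latin1 bytes (Python 3 bytes literal).
raw = b'Ficha Datos de p\xe9rdida AND acci\xf3n'

# Decode the latin1 bytes into a unicode string.
text = raw.decode('latin1', 'replace')

# Hypothetical output location; the post writes to 'mytoc.htm'.
out_path = os.path.join(tempfile.mkdtemp(), 'mytoc.htm')

# codecs.open wraps the file so unicode written to it is encoded as utf8.
with codecs.open(out_path, 'w', encoding='utf8', errors='replace') as toc_file:
    toc_file.write(text)

# Read the raw bytes back to see what actually landed on disk.
with open(out_path, 'rb') as f:
    written = f.read()
```

In this sketch the single latin1 byte \xe9 ends up on disk as the two utf8 bytes \xc3\xa9, which is the re-encoding the wrapped file is expected to perform.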

What exactly is tocline when it's read in with that \xe9 and \xf3
in the string? A latin1 encoded string?
Is my method the right way to write such a line out to a file with

If I read in the latin1 file using codecs.open with encoding='latin1' and write out the utf8 file
opening with encoding='utf8', would I no longer have a
problem -- I could just read in latin1 and write out utf8 with no
more worries about
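
The read-as-latin1, write-as-utf8 approach asked about here could look roughly like the sketch below (Python 3). The helper name transcode_latin1_to_utf8 and the file names are hypothetical, chosen only for illustration, and the sample input file is created on the fly so the example runs on its own.

```python
import codecs
import os
import tempfile


def transcode_latin1_to_utf8(src_path, dst_path):
    """Copy a text file, decoding it as latin1 and re-encoding it as utf8."""
    with codecs.open(src_path, 'r', encoding='latin1') as src, \
         codecs.open(dst_path, 'w', encoding='utf8') as dst:
        for line in src:
            # 'line' is already unicode here; the writer encodes it as utf8.
            dst.write(line)


# Hypothetical sample files in a temporary directory.
work_dir = tempfile.mkdtemp()
src_path = os.path.join(work_dir, 'toc_latin1.txt')
dst_path = os.path.join(work_dir, 'toc_utf8.txt')

# Create a latin1 input file containing the line from the post.
with open(src_path, 'wb') as f:
    f.write(b'Ficha Datos de p\xe9rdida AND acci\xf3n\n')

transcode_latin1_to_utf8(src_path, dst_path)

# Inspect the raw bytes of the converted file.
with open(dst_path, 'rb') as f:
    result = f.read()
```

With both files opened through an encoding-aware wrapper, no explicit decode or encode calls are needed in the copy loop; the text only exists as unicode in between.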

p.s. sorry if you see this twice--my newsreader is flaky right now.

Jul 27 '07 #1