Hi, I'm beginning to understand the encode/decode string methods, but I'd
like confirmation that I'm still thinking in the right direction:
I have a file of latin1 encoded text. Let's say I put one line of that file
into a string variable 'tocline', as follows:
tocline = 'Ficha Datos de p\xe9rdida AND acci\xf3n'
import codecs
tocFile = codecs.open('mytoc.htm','wb',encoding='utf8',errors='replace')
tocline = tocline.decode('latin1','replace')
tocFile.write(tocline)
tocFile.close()
What I think is that tocFile is wrapped to ensure that anything written to
it is in utf8.
I decode the latin1 string into Python's internal unicode representation, and
that gets written out as utf8.
Questions:
What exactly is the tocline when it's read in with that \xe9 and \xf3 in the
string? A latin1 encoded string?
Is my method the right way to write such a line out to a file with utf8
encoding?
If I read in the latin1 file using
codecs.open(filename,encoding='latin1') and write out the utf8 file by
opening with
codecs.open(othername,encoding='utf8'), would I no longer have a problem --
I could just read in latin1 and write out utf8 with no more worries about
encoding?
thanks,
--Tim
>If I read in the latin1 file using
>codecs.open(filename,encoding='latin1') and write out the utf8 file by
>opening with
>codecs.open(othername,encoding='utf8'), would I no longer have a
>problem -- I could just read in latin1 and write out utf8 with no more
>worries about encoding?
>thanks,
Replying to my own post, I feel so lonely! I guess that silence means I *am*
thinking correctly about the encoding/decoding stuff; I'll keep heading in
this direction unless someone out there sees it differently.....
--Tim
Tim Arnold wrote:
>Questions:
>what exactly is the tocline when it's read in with that \xe9 and \xf3 in the
>string? A latin1 encoded string?
Yes. A simple, pure byte-string that happens to contain bytes which
are "correct" under the latin1 encoding.
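A minimal sketch of what "just bytes" means here. It assumes Python 3 spelling, where the byte string needs a b prefix (in the Python 2 used in this thread, the plain '...' literal is already a byte string); the variable names are only for illustration.

```python
# The raw bytes from the question; nothing in them records an encoding.
raw = b'Ficha Datos de p\xe9rdida AND acci\xf3n'

# Decoding as latin1 maps every byte to the code point with the same number,
# so 0xE9 becomes U+00E9 and 0xF3 becomes U+00F3.
text = raw.decode('latin1')
matches = (text == u'Ficha Datos de p\u00e9rdida AND acci\u00f3n')

# The same bytes are not valid UTF-8, so a strict decode is rejected.
try:
    raw.decode('utf8')
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# With errors='replace' each offending byte turns into U+FFFD instead.
replaced = raw.decode('utf8', 'replace')
```

So the bytes only become "latin1 text" because the reader chooses to decode them that way; the 'replace' handler matters only when the chosen codec cannot handle the input.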
>Is my method the right way to write such a line out to a file with utf8
>encoding?
Yes.
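One way to check this for yourself is to write the line and then read the file back in binary mode. The sketch below is an assumption-laden variant of the posted code, not a replacement for it: it swaps in io.open (available in Python 2.6+ and Python 3) for codecs.open, and writes to a hypothetical temporary path instead of a real mytoc.htm.

```python
import io
import os
import tempfile

raw = b'Ficha Datos de p\xe9rdida AND acci\xf3n'
text = raw.decode('latin1', 'replace')

# Hypothetical scratch location so the sketch does not overwrite anything.
path = os.path.join(tempfile.mkdtemp(), 'mytoc.htm')

# The wrapped file encodes every text string written to it as UTF-8.
with io.open(path, 'w', encoding='utf8', errors='replace') as toc_file:
    toc_file.write(text)

# Reading in binary mode shows the bytes that actually landed on disk.
with io.open(path, 'rb') as f:
    written = f.read()

# Each accented character now takes two bytes: C3 A9 and C3 B3.
is_utf8 = (written == b'Ficha Datos de p\xc3\xa9rdida AND acci\xc3\xb3n')
```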
>If I read in the latin1 file using
>codecs.open(filename,encoding='latin1') and write out the utf8 file by
>opening with
>codecs.open(othername,encoding='utf8'), would I no longer have a problem --
>I could just read in latin1 and write out utf8 with no more worries about
>encoding?
As long as you don't mix in bytestrings and only use unicode objects, you
should be fine, yes.
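A rough sketch of that read-latin1, write-utf8 approach, again using io.open as a stand-in for codecs.open. The transcode helper and the file names are hypothetical, made up for this example.

```python
import io
import os
import tempfile

def transcode(src_path, dst_path, src_encoding='latin1', dst_encoding='utf8'):
    """Copy a text file, decoding on read and encoding on write."""
    # newline='' leaves line endings untouched in both directions.
    with io.open(src_path, 'r', encoding=src_encoding, newline='') as src:
        with io.open(dst_path, 'w', encoding=dst_encoding, newline='') as dst:
            for line in src:
                dst.write(line)  # line is text here, never raw bytes

# Build a small latin1 input file in a scratch directory.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, 'toc_latin1.htm')
dst_path = os.path.join(workdir, 'toc_utf8.htm')
with io.open(src_path, 'wb') as f:
    f.write(b'p\xe9rdida\nacci\xf3n\n')

transcode(src_path, dst_path)

with io.open(dst_path, 'rb') as f:
    result = f.read()

# Mixing text with a non-ASCII byte string is where things go wrong:
# Python 3 raises TypeError, Python 2 raises UnicodeDecodeError.
try:
    u'acci\u00f3n' + b'\xf3'
    mixed_ok = True
except (TypeError, UnicodeDecodeError):
    mixed_ok = False
```

Everything between the two open calls stays text, which is the "only use unicode objects" condition; the failure at the end only shows up once raw bytes are mixed in.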
Diez
wow, I was thinking correctly about encoding! time for a beer!
Diez, thanks very much for confirming my thoughts.
--Tim Arnold