encode/decode misunderstanding

Tim Arnold

Hi, I'm beginning to understand the encode/decode string methods, but I'd
like confirmation that I'm still thinking in the right direction:

I have a file of latin1 encoded text. Let's say I put one line of that file
into a string variable 'tocline', as follows:
tocline = 'Ficha Datos de p\xe9rdida AND acci\xf3n'

import codecs
tocFile = codecs.open('mytoc.htm','wb',encoding='utf8',errors='replace')
tocline = tocline.decode('latin1','replace')
tocFile.write(tocline)
tocFile.close()

What I think is that tocFile is wrapped to ensure that anything written to
it is encoded as utf8.
I decode the latin1 string into Python's internal unicode representation, and
that gets written out as utf8.

Questions:
What exactly is tocline when it's read in with that \xe9 and \xf3 in the
string? A latin1 encoded string?
Is my method the right way to write such a line out to a file with utf8
encoding?

If I read in the latin1 file using
codecs.open(filename,encoding='latin1') and write out the utf8 file by
opening with
codecs.open(othername,encoding='utf8'), would I no longer have a problem --
I could just read in latin1 and write out utf8 with no more worries about
encoding?

thanks,
--Tim

Jul 26 '07 #1

Tim Arnold

> If I read in the latin1 file using
> codecs.open(filename,encoding='latin1') and write out the utf8 file by
> opening with
> codecs.open(othername,encoding='utf8'), would I no longer have a
> problem -- I could just read in latin1 and write out utf8 with no more
> worries about encoding?
>
> thanks,

Replying to my own post, I feel so lonely! I guess that silence means I *am*
thinking correctly about the encoding/decoding stuff; I'll keep heading in
this direction unless someone out there sees it differently.....

--Tim

Jul 27 '07 #2

Diez B. Roggisch

Tim Arnold wrote:

> Hi, I'm beginning to understand the encode/decode string methods, but I'd
> like confirmation that I'm still thinking in the right direction:
>
> I have a file of latin1 encoded text. Let's say I put one line of that file
> into a string variable 'tocline', as follows:
> tocline = 'Ficha Datos de p\xe9rdida AND acci\xf3n'
>
> import codecs
> tocFile = codecs.open('mytoc.htm','wb',encoding='utf8',errors='replace')
> tocline = tocline.decode('latin1','replace')
> tocFile.write(tocline)
> tocFile.close()
>
> What I think is that tocFile is wrapped to ensure that anything written to
> it is encoded as utf8.
> I decode the latin1 string into Python's internal unicode representation, and
> that gets written out as utf8.
>
> Questions:
> What exactly is tocline when it's read in with that \xe9 and \xf3 in the
> string? A latin1 encoded string?

Yes. It is a plain byte string; it just happens to contain bytes that give
the "correct" characters when interpreted as latin1.
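
To illustrate the point about byte strings, here is a minimal sketch, assuming Python 3 rather than the Python 2 used in this thread (so the byte string needs a b'...' prefix). The names raw, text, and utf8_ok are made up for the example.

```python
# Python 3 sketch. In the thread's Python 2 code, 'tocline' is a byte string;
# here the same bytes are spelled as a bytes literal.
raw = b'Ficha Datos de p\xe9rdida AND acci\xf3n'

# The bytes carry no encoding label. Decoding them as latin1 is an
# interpretation that we choose, and it yields a text (unicode) string.
text = raw.decode('latin1')
print(text)

# The very same bytes are not valid UTF-8, so a strict UTF-8 decode fails.
try:
    raw.decode('utf8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print(utf8_ok)
```

Latin1 assigns a character to every possible byte value, so the latin1 decode itself can never fail. That also means it cannot tell you whether latin1 was the right guess for the file.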

> Is my method the right way to write such a line out to a file with utf8
> encoding?

Yes.
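
For anyone reading this with Python 3, the same approach might look like the sketch below. It assumes the built-in open() with encoding= in place of codecs.open(), and it writes into a temporary directory; path, toc_file, and written are illustrative names, not anything from the original post.

```python
import os
import tempfile

# The latin1-encoded line from the original post, as Python 3 bytes.
raw = b'Ficha Datos de p\xe9rdida AND acci\xf3n'

# Step 1: decode the latin1 bytes into a text string.
text = raw.decode('latin1', 'replace')

# Step 2: write the text through a file object that encodes to UTF-8.
path = os.path.join(tempfile.mkdtemp(), 'mytoc.htm')
with open(path, 'w', encoding='utf8', errors='replace') as toc_file:
    toc_file.write(text)

# Read the file back in binary mode to see the bytes that were stored.
with open(path, 'rb') as f:
    written = f.read()
print(written)
```

Each of the two accented characters takes one byte in latin1 and two bytes in UTF-8, so the output file is two bytes longer than the input line.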

> If I read in the latin1 file using
> codecs.open(filename,encoding='latin1') and write out the utf8 file by
> opening with
> codecs.open(othername,encoding='utf8'), would I no longer have a problem --
> I could just read in latin1 and write out utf8 with no more worries about
> encoding?

As long as you don't mix in byte strings and work only with unicode objects,
you should be fine, yes.
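
A rough Python 3 sketch of that read-latin1, write-utf8 round trip follows. The file names and the sample line written to the source file are invented for the example. The last part shows what mixing in a byte string does in Python 3: the text-mode file refuses it with a TypeError.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'toc_latin1.htm')  # hypothetical input file
dst = os.path.join(workdir, 'toc_utf8.htm')    # hypothetical output file

# Create a small latin1-encoded input file to work with.
with open(src, 'wb') as f:
    f.write(b'Ficha Datos de p\xe9rdida AND acci\xf3n\n')

# Reading decodes latin1 into text; writing encodes that text as UTF-8.
# newline='' keeps line endings exactly as they are in the source.
with open(src, 'r', encoding='latin1', newline='') as fin, \
     open(dst, 'w', encoding='utf8', newline='') as fout:
    for line in fin:
        fout.write(line)

with open(dst, 'rb') as f:
    result = f.read()
print(result)

# Mixing in a byte string: a text-mode file only accepts text.
try:
    with open(dst, 'a', encoding='utf8') as fout:
        fout.write(b'raw bytes')
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)
```

Between the two open() calls the program only ever handles text, which is why no explicit encode() or decode() call is needed.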

Diez

Jul 29 '07 #3

Tim Arnold

wow, I was thinking correctly about encoding! time for a beer!
Diez, thanks very much for confirming my thoughts.

--Tim Arnold

Jul 30 '07 #4
