A question on Encoding and Decoding.

kath

Hi all,

Platform: winxp
Version: Python 2.3

I have a task of reading files in a folder and creating one Excel
file with one sheet per file, each sheet named after its file. I am
using the XLRD and XLW packages to read from and write to the files,
but I am having problems handling special characters: I am getting
an encode error.

UnicodeEncodeError: 'ascii' codec can't encode character u'\xdf' in
position 19: ordinal not in range(128)
row: 76

the cell value at rowx = 76, colx = 0 is
'Activest-Aktien-Großbritannien'

I used Latin-1 encoding, but after the file is created I get an error
'Unable to read the file'.

When I get the exception I want to format the string so that I can use
it to write to a file and also query database.

Can anybody guide how to solve this problem. what encoding I should
use, and after wring to file I should the same special character.

Also, Python's IDLE is able to output the same character correctly when
I say print, so why can't I?
Any suggestions would be greatly appreciated.

thanks.
regards,
kath.

Nov 13 '06 #1


John Machin

kath wrote:

Hi all,

Platform: winxp
Version: Python 2.3

I have a task of reading files in a folder and creating one Excel
file with one sheet per file, each sheet named after its file. I am
using the XLRD and XLW packages to read from and write to the files.

The name of the first package is "xlrd", *NOT* "XLRD".

By "XLW", do you mean python2xlw? If so, why don't you upgrade to
Python 2.4 or 2.5 so that you can use pyExcelerator (which is the most
modern and least unmaintained of the pure-Python Excel-writing
gadgets)? If you can't upgrade, consider using pyXLWriter instead.

Note that python2xlw is *NOT* intended to be a general-purpose Excel
file writer. From the top of its XLW.py:
'''Simple XLW file maker.
Can create data sheets and Scatter Charts from the data sheets.
Data sheet is defined as a list of rows.
The first row contains labels, subsequent rows are numeric.
Charts are defined by adding XY series.
Series are defined as zero-based indeces to the dataSheet, XColumn, and
YColumn.
The entire column will be plotted.'''

Are you sure that the description "The first row contains labels,
subsequent rows are numeric." fits your data?

At this stage, sentient beings who are looking for a general-purpose
Excel writer and who are not skeptics would probably turn away. Kinda
makes you believe it won't handle *any* kind of string (either str or
Unicode) at rowx == 76. Game over. "77 Sunset Row" :-)

It turns out that what happens (quite independently of row number) is:
Ints and reals are spat out as Excel NUMBER records.
For *any* other Python type, it attempts (without benefit of try/except
or any other prophylactic) to write str(value)[:254] to an Excel LABEL
record.
Apart from silent truncation of data, this will cause:
* unicode data that contains non-ASCII characters to produce a
UnicodeEncodeError
* a long item e.g. 1234567890L to appear as an Excel text type with
value "1234567890", not an Excel number type with value 1234567890.0
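
The failure described above can be reproduced in isolation. This is only a minimal sketch, not python2xlw's actual code: `write_label` is a hypothetical stand-in for the LABEL-record path, and it is written for Python 3, where the ASCII encoding that Python 2's str() applied implicitly to unicode objects has to be spelled out.

```python
def write_label(value, encoding="ascii"):
    """Hypothetical stand-in for a writer that stores text as 8-bit bytes.

    Python 2's str(unicode_value) encoded implicitly with the default
    'ascii' codec; here the encoding step is explicit.
    """
    text = str(value)[:254]        # same silent truncation as described above
    return text.encode(encoding)   # raises UnicodeEncodeError if a character does not fit


cell = "Activest-Aktien-Gro\xdfbritannien"  # \xdf is the sharp s at position 19

try:
    write_label(cell)
    error = None
except UnicodeEncodeError as exc:
    error = exc                    # the 'ascii' codec cannot represent \xdf

# Naming an encoding that covers the data avoids the exception.
latin1_bytes = write_label(cell, "latin-1")
```

With an explicit encoding the sharp s survives as the single byte 0xDF; with the ASCII default the write fails at position 19, which matches the position in the reported message.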

Oh, and if you forgot to call the XLW.XLW.addChart method, it kindly
remedies your deficiency by supplying one. What if you consciously
don't want any ferschlugginer charts?

Unfortunately, outside the Redmond Pale, knowledge of charts in .XLW
files is scant. Given such a file:
* Gnumeric 1.6.something (Windows version) crashes
* OpenOffice.org's Calc 2.0.something (Windows version) thinks silently
for a second or two, then leaves you with an empty worksheet, and no
charts, and no message.
* xlrd grumbles that it was expecting a worksheet, and gives up -- I'll
fix this; it'll expect a chart as a possibility, and ignore it (as with
XLS files).

But facing problem in handling
special characters. I am getting encode error.

Have you read any of these:
(a) The notes on Unicode in the xlrd documentation?
(b) The Unicode howto (http://www.amk.ca/python/howto/unicode)?

UnicodeEncodeError: 'ascii' codec can't encode character u'\xdf' in
position 19: ordinal not in range(128)
row: 76

the cell value at rowx = 76, colx = 0 is
'Activest-Aktien-Großbritannien'

:-)
Gross Brits? Must be talking about the "Barmy Army" :-)
:-)

I used Latin-1 encoding,

If, as it seems, you think you've found out what the problem is, fixed
it, and continued on, why bother telling us?

but after the file is created I get an error
'Unable to read the file'.

*WHAT* program is producing this error message? Under what
circumstances?

When I get the exception I want to format the string so that I can use
it to write to a file and also query database.

Never mind what you want; what you *need* is to encode your data with
the appropriate encoding, so that you *don't* get the exception. Then
it should be a good old legacy str instance, suitable for writing
anywhere within reason. You can query your database with str and/or
unicode data, depending on how it is stored and whether the database
interface converts on the fly -- read the database-specific docs and/or
ask a specific question.

Can anybody guide how to solve this problem. what encoding I should
use, and after wring to file I should the same special character.

My parser barfed on that last clause:
ParseFailure: After "wring", expected "neck"
:-)

The encoding that you should use is one that encompasses all your data.
If latin1 doesn't hack it, switch to pyExcelerator (as already
recommended); it will write the latest (only 9 years old) version of
Excel files which are recorded in utf_16_le.
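
One way to find "an encoding that encompasses all your data" is simply to try the candidates in order of preference. The sketch below is written for Python 3; `pick_encoding` and its candidate list are illustrative names and choices, not part of xlrd, python2xlw or pyExcelerator.

```python
def pick_encoding(values, candidates=("ascii", "latin-1", "utf-16-le")):
    """Return the first candidate encoding that can represent every value."""
    for name in candidates:
        try:
            for value in values:
                value.encode(name)   # only checking that encoding succeeds
        except UnicodeEncodeError:
            continue                 # this codec is too small, try the next one
        return name
    raise ValueError("no candidate encoding covers the data")


sheet_names = ["Activest-Aktien-Gro\xdfbritannien", "Summary"]
chosen = pick_encoding(sheet_names)
```

For the sheet names above the sharp s rules out ASCII, so Latin-1 is chosen; a value containing a character outside Latin-1, such as the euro sign, would push the choice on to utf-16-le.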

What makes you think that the encoding problem (which you appear to
have fixed (at least temporarily)) has anything at all to do with the
mystery program saying 'Unable to read the file'? Have you tried a test
input file which has *only* ASCII text in it?

Also, Python's IDLE is able to output the same character correctly when
I say print, so why can't I?

I have to go out now, so I'll leave something for someone else to
answer.

Any suggestions would be greatly appreciated.

If it all becomes too hard, try sending me (1) a two-sheet input test
file for your app (2) the corresponding error-message-causing output
file and (3) your code, and I'll have a look at it.

Cheers,
John

Nov 14 '06 #2

Fredrik Lundh

"kath" wrote:

Also, Python's IDLE is able to output the same character correctly when
I say print, so why can't I?

probably because IDLE's interactive window is Unicode-aware, but your terminal
is not.
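
If the terminal's encoding is the limiting factor, one defensive option is to replace whatever it cannot show instead of letting print raise. This is a small Python 3 sketch; `printable` is a hypothetical helper name, and falling back to ASCII when the stream reports no encoding is an assumption made for the example.

```python
import sys


def printable(text, encoding=None):
    """Return text with characters the target encoding lacks replaced by '?'."""
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    return text.encode(encoding, errors="replace").decode(encoding)


name = "Activest-Aktien-Gro\xdfbritannien"
print(printable(name))            # result depends on the terminal's encoding
print(printable(name, "ascii"))   # Activest-Aktien-Gro?britannien
```

The data itself is untouched; only the copy sent to the terminal is degraded, so the original string can still be written to a file or a database with a proper encoding.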

</F>

Nov 14 '06 #3

rbsharp

Hello,
I think this remark is more to the point. In my experience, the general
problem is that Python operates with the default encoding "ascii", as
reported by sys.getdefaultencoding(). It is possible to set the default
encoding in sitecustomize.py, with sys.setdefaultencoding('latin1'). I
have placed sitecustomize.py in Lib/site-packages. It is not possible to
set the encoding once Python has started. Setting the encoding only
works if you can bind yourself to this one encoding and is therefore no
general fix.
The only reasonable way to work is to get your strings into unicode
(and sometimes back out again).
If for instance you type:
s = "äÄöÖüÜß"
and then try
us = unicode(s)
you will get a traceback identical to yours.
However:
us = unicode(s, 'latin1')
will work. If, however, you try:
print us
you will get another traceback.

Try:

>>> print "%r" % us
u'\x84\x8e\x94\x99\x81\x9a\xe1'

Even if it is not pretty, at least the program won't crash.
You can get a better result with:

>>> print us.encode('latin1')
äÄöÖüÜß
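
The escapes in that repr look like DOS code page byte values rather than Latin-1 code points. The Python 3 sketch below assumes the console used code page 437 (the post does not say which one); under that assumption it reproduces the same values and shows why encoding back with the same codec happens to print correctly.

```python
# The seven characters from the example above, written as escapes so the
# snippet does not depend on the source file's own encoding.
text = "\xe4\xc4\xf6\xd6\xfc\xdc\xdf"

console_bytes = text.encode("cp437")           # what a cp437 console hands to the program
mislabeled = console_bytes.decode("latin-1")   # the wrong codec never fails, it just mislabels

# Latin-1 maps every byte to the code point with the same number, so encoding
# back with Latin-1 returns exactly the bytes that came in.
round_trip = mislabeled.encode("latin-1")

correct = console_bytes.decode("cp437")        # the right codec recovers the real characters
```

So the round trip through 'latin1' works on that console only because the bytes are passed through unchanged; the intermediate unicode object does not actually contain the intended characters, which matters as soon as it is written anywhere else.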

The whole topic can get tedious at times, but be assured,
PyExcelerator, for writing, and xlrd for reading Excel files do work,
with unicode, both in the content and in the sheetnames. PyExcelerator
may be able to read Excel Sheets but not nearly as well as xlrd, which
works wonderfully.

regards,
Richard Sharp

Fredrik Lundh wrote:

"kath" wrote:

Also, Python's IDLE is able to output the same character correctly when
I say print, so why can't I?

probably because IDLE's interactive window is Unicode-aware, but your terminal
is not.

</F>

Nov 15 '06 #4

Johan von Boisman

rb*****@gmx.de wrote:

Hello,
I think this remark is more to the point. In my experience, the general
problem is that Python operates with the default encoding "ascii", as
reported by sys.getdefaultencoding(). It is possible to set the default
encoding in sitecustomize.py, with sys.setdefaultencoding('latin1'). I
have placed sitecustomize.py in Lib/site-packages. It is not possible to
set the encoding once Python has started. Setting the encoding only
works if you can bind yourself to this one encoding and is therefore no
general fix.
The only reasonable way to work is to get your strings into unicode
(and sometimes back out again).
If for instance you type:
s = "äÄöÖüÜß"
and then try
us = unicode(s)
you will get a traceback identical to yours.

I missed the beginning of this thread, but why not write

s = u"äÄöÖüÜß"

Is there ever a reason _not_ to exclusively use the unicode stringtype
throughout your Python program? (of course you may need to encode/decode
when interfacing with the world outside your program)
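
The "decode on the way in, encode on the way out" pattern can be sketched as follows. This is Python 3, where str is already the Unicode type; `copy_text`, the file names and the choice of Latin-1 input and UTF-8 output are all assumptions made for the example.

```python
import os
import tempfile


def copy_text(src_path, dst_path, src_encoding, dst_encoding):
    """Decode at the input boundary, work on text, encode at the output boundary."""
    with open(src_path, "rb") as src:
        text = src.read().decode(src_encoding)   # bytes -> text
    text = text.strip()                          # placeholder for real processing
    with open(dst_path, "wb") as dst:
        dst.write(text.encode(dst_encoding))     # text -> bytes
    return text


workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "in.txt")
dst_path = os.path.join(workdir, "out.txt")

with open(src_path, "wb") as f:
    f.write(b"Gro\xdfbritannien\n")              # Latin-1 bytes on disk

result = copy_text(src_path, dst_path, "latin-1", "utf-8")
```

Inside the program there is only text; the two encodings appear exactly once each, at the points where bytes enter and leave.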

/johan

Nov 17 '06 #5

John Machin

On Nov 17, 9:18 pm, Johan von Boisman <do-not-re...@by-mail.com> wrote:

I missed the beginning of this thread, but why not write

s = u"äÄöÖüÜß"

Is there ever a reason _not_ to exclusively use the unicode stringtype
throughout your Python program? (of course you may need to encode/decode
when interfacing with the world outside your program)

Please consider going back and finding the beginning of the thread.

Cheers,
John

Nov 17 '06 #6

Fredrik Lundh

Johan von Boisman wrote:

Is there ever a reason _not_ to exclusively use the unicode stringtype
throughout your Python program?

speed and memory use. in 2.5, the unicode datatype is often almost as
fast as the string type at the algorithm level, but it's still limited
by memory bandwidth for certain operations. it simply takes a bit more
time to copy two or four times as much data.

but requiring four times as much memory can really hurt in some
applications, e.g.

http://effbot.org/zone/celementtree.htm#benchmarks

a good compromise is to use 8-bit strings for ASCII-only strings, and
Unicode strings for everything else. ASCII strings mix well with
Unicode, and you can use (almost) all string methods and other string
operations on both kinds of strings, without problems.
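
The "two or four times as much data" figure can be illustrated by comparing encoded sizes. This Python 3 sketch uses utf-16-le and utf-32-le as stand-ins for the 2-byte and 4-byte internal representations of Python 2's unicode type on narrow and wide builds; it measures encoded bytes, not the interpreter's actual object sizes.

```python
label = "Activest-Aktien"   # ASCII-only text, 15 characters

sizes = {
    "ascii": len(label.encode("ascii")),          # 1 byte per character
    "utf-16-le": len(label.encode("utf-16-le")),  # 2 bytes per character
    "utf-32-le": len(label.encode("utf-32-le")),  # 4 bytes per character
}

for name, size in sizes.items():
    print(name, size)
```

For ASCII-only data the wider representations carry no extra information, which is the case the compromise above is aimed at.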

</F>

Nov 17 '06 #7
