mysterious unicode

Gerry

I'm using pyExcelerator and xlrd to read and write data from and to
two spreadsheets.

I created the "read" spreadsheet by importing a text file - and I had
no unicode aspirations.

When I read a cell, it appears to be unicode: u'Q1', say.

I can try cleaning it, like this:

    try:
        s.encode("ascii", "replace")
    except AttributeError:
        pass

which seems to work. Here's the mysterious part (aside from why
anything was unicode in the first place):

    print >>debug, "c=", col, "r=", row, "v=", value, "qno=", qno
    tuple = (qno, family)
    try:
        data[tuple].append(value)
    except:
        data[tuple] = [value]
    print >>debug, "!!!", col, row, qno, family, tuple, value, data[tuple]

which produces:

c= 1 r= 3 v= 4 qno= Q1
!!! 1 3 Q1 O (u'Q1', 'O') 4 [1, u' ', 4]

where qno seems to be a vanilla Q1, but a tuple using qno is
(u'Q1', ...).

Can somebody help me out?

Mar 20 '07 #1

Gabriel Genellina

On Tue, 20 Mar 2007 19:35:00 -0300, Gerry <ge**********@gmail.com>
wrote:

> which seems to work. Here's the mysterious part (aside from why
> anything was unicode in the first place):
>
>     print >>debug, "c=", col, "r=", row, "v=", value, "qno=", qno
>     tuple = (qno, family)
>     try:
>         data[tuple].append(value)
>     except:
>         data[tuple] = [value]
>     print >>debug, "!!!", col, row, qno, family, tuple, value, data[tuple]
>
> which produces:
>
> c= 1 r= 3 v= 4 qno= Q1
> !!! 1 3 Q1 O (u'Q1', 'O') 4 [1, u' ', 4]
>
> where qno seems to be a vanilla Q1, but a tuple using qno is
> (u'Q1', ...).

I bet qno was unicode from the start. When you print a unicode object, you
get the "unadorned" contents. When you print a tuple, it uses repr() on
each item.

    py> qno = u"Q1"
    py> qno
    u'Q1'
    py> print qno
    Q1
    py> print (qno,2)
    (u'Q1', 2)
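
The same effect can be reproduced in a short script. The sketch below assumes Python 3, whereas the thread uses Python 2; in Python 3 the u prefix no longer appears, so the difference between str() and repr() shows up as quotes instead. The names qno and family come from the original post, and the values are made up for illustration.

```python
# Python 3 sketch: str() vs repr() when a value is printed inside a tuple.
qno = "Q1"       # stands in for the cell value read from the spreadsheet
family = "O"

plain = str(qno)                # what print uses for a bare string
in_tuple = str((qno, family))   # a tuple formats each item with repr()

print(plain)       # Q1
print(in_tuple)    # ('Q1', 'O')
print(repr(qno))   # 'Q1'

# Using %r in debug output shows the real form of each value at a glance.
debug_line = "qno=%r family=%r" % (qno, family)
print(debug_line)  # qno='Q1' family='O'
```

Writing debug lines with %r (or repr()) from the start would probably have exposed the unicode value in the first print as well.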

--
Gabriel Genellina

Mar 20 '07 #2

Gerry

On Mar 20, 7:29 pm, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
wrote:

Thanks! - that helps a lot.

I'm still mystified why:
qno was ever unicode, and why
qno.encode("ascii", "replace") is still unicode.

Gerry

Mar 20 '07 #3

Gabriel Genellina

On Tue, 20 Mar 2007 20:47:22 -0300, Gerry <ge**********@gmail.com>
wrote:

> Thanks! - that helps a lot.
>
> I'm still mystified why:
> qno was ever unicode, and why

I can't tell...

> qno.encode("ascii", "replace") is still unicode.

That *returns* a string, but you are discarding the return value. It
should be qno = qno.encode(...).
It's similar to lower(), for example.
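
To make the point concrete, here is a minimal sketch, assuming Python 3 (the thread itself uses Python 2, where encode() returns a str rather than bytes). It shows that calling encode() or lower() without assigning the result leaves the original object untouched. The variable names other than qno are hypothetical.

```python
# Python 3 sketch: string methods return new objects; nothing changes in place.
qno = "Q1"

qno.encode("ascii", "replace")            # result is thrown away
unchanged = qno                           # still the original text string

encoded = qno.encode("ascii", "replace")  # keep the result this time
lowered = qno.lower()                     # lower() behaves the same way

print(type(unchanged).__name__, repr(unchanged))  # str 'Q1'
print(type(encoded).__name__, repr(encoded))      # bytes b'Q1'
print(repr(lowered))                              # 'q1'
```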

--
Gabriel Genellina

Mar 21 '07 #4

jim-on-linux

On Tuesday 20 March 2007 18:35, Gerry wrote:

> I'm using pyExcelerator and xlrd to read and
> write data from and to two spreadsheets.
>
> I created the "read" spreadsheet by importing a
> text file - and I had no unicode aspirations.
>
> When I read a cell, it appears to be unicode
> u'Q1', say.
>
> I can try cleaning it, like this:
>
>     try:
>         s.encode("ascii", "replace")
>     except AttributeError:
>         pass
>
> which seems to work. Here's the mysterious
> part (aside from why anything was unicode in
> the first place):
>
>     print >>debug, "c=", col, "r=", row, "v=", value, "qno=", qno
>     tuple = (qno, family)
>     try:
>         data[tuple].append(value)
>     except:
>         data[tuple] = [value]
>     print >>debug, "!!!", col, row, qno, family, tuple, value, data[tuple]
>
> which produces:
>
> c= 1 r= 3 v= 4 qno= Q1
> !!! 1 3 Q1 O (u'Q1', 'O') 4 [1, u' ', 4]
>
> where qno seems to be a vanilla Q1, but a tuple
> using qno is (u'Q1', ...).
>
> Can somebody help me out?

I have been getting the same thing using SQLite3
when extracting data from an SQLite3 database. I
take the database info which is in a list and do

    name = str.record[0]

rather than

    name = record[0]

So far, I haven't had any problems.
For some reason the unicode u is removed.
I haven't wanted to spend the time to figure out
why.

jim-on-linux
http://www.inqvista.com

Mar 21 '07 #5

Carsten Haese

On Tue, 2007-03-20 at 16:47 -0700, Gerry wrote:

> I'm still mystified why:
> qno was ever unicode,

Thus quoth http://www.lexicon.net/sjmachin/xlrd.html "This module
presents all text strings as Python unicode objects."

-Carsten

Mar 21 '07 #6

Carsten Haese

On Tue, 2007-03-20 at 20:26 -0400, jim-on-linux wrote:

> I have been getting the same thing using SQLite3
> when extracting data from an SQLite3 database.

Many APIs that exchange data choose to exchange text in Unicode because
that eliminates encoding uncertainty. Whether an API uses Unicode would
probably be noted somewhere in its documentation.

> I take the database info which is in a list and do
>
> name = str.record[0]

You probably mean str(record[0]).

> rather than
> name = record[0]
>
> So far, I haven't had any problems.
> For some reason the unicode u is removed.
> I haven't wanted to spend the time to figure out
> why.

As a software engineer, I'd get worried if I didn't know why the code I
wrote works. Maybe that's just me.

Unicode is not rocket science. I suggest you read
http://www.amk.ca/python/howto/unicode to demystify what Unicode objects
are and do.

With str(), you're asking the Unicode object for its byte string
interpretation, which causes the Unicode object to give you its encoding
in the system default encoding. The default encoding is normally ascii.
That can be tweaked for your particular Python installation, but if you
need an encoding other than ascii it's recommended that you explicitly
encode and decode from and to Unicode, lest you risk writing
non-portable code.
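
As an illustration of explicit encoding, here is a small sketch. It assumes Python 3, where str() on a text string is a no-op, so the sketch calls encode() directly; in Python 2 with the ascii default, str() on the same value would raise UnicodeEncodeError. The sample name is hypothetical.

```python
# Python 3 sketch: encode explicitly rather than relying on a default encoding.
name = "Zo\u00eb"   # hypothetical value containing one non-ASCII character

utf8_bytes = name.encode("utf-8")              # explicit, well-defined encoding
ascii_lossy = name.encode("ascii", "replace")  # unencodable characters become "?"

try:
    name.encode("ascii")                       # strict ASCII cannot represent it
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

round_trip = utf8_bytes.decode("utf-8")        # decode with the same encoding

print(utf8_bytes)          # b'Zo\xc3\xab'
print(ascii_lossy)         # b'Zo?'
print(strict_ok)           # False
print(round_trip == name)  # True
```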

Using str() coercion of Unicode objects will work well enough until you
run into a string that contains characters that can't be represented in
the default encoding. Once that happens, you're better off explicitly
encoding the Unicode object into a well-defined encoding on input, or,
even better, just work with Unicode objects internally and only encode
to byte strings when absolutely necessary, such as when outputting to a
file or to the console.
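
A minimal sketch of that "Unicode inside, bytes at the edges" approach follows. It assumes Python 3 and a temporary file; the rows list and the file name are hypothetical. io.open is also available in Python 2.6 and later, so the same pattern should carry over there, provided the values are unicode objects.

```python
import io
import os
import tempfile

rows = ["Q1", "Zo\u00eb"]   # hypothetical text values kept as Unicode in memory

# Encode only at the boundary: the file object encodes on write.
path = os.path.join(tempfile.mkdtemp(), "out.txt")
with io.open(path, "w", encoding="utf-8") as handle:
    for row in rows:
        handle.write(row + "\n")

# Decode at the other boundary when reading the data back.
with io.open(path, "r", encoding="utf-8") as handle:
    read_back = handle.read().splitlines()

print(read_back == rows)   # True
```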

Hope this helps,

Carsten.

Mar 21 '07 #7

jim-on-linux

On Tuesday 20 March 2007 21:17, Carsten Haese wrote:

> On Tue, 2007-03-20 at 20:26 -0400, jim-on-linux wrote:
>
> > I have been getting the same thing using
> > SQLite3 when extracting data from an SQLite3
> > database.
>
> Many APIs that exchange data choose to exchange
> text in Unicode because that eliminates
> encoding uncertainty. Whether an API uses
> Unicode would probably be noted somewhere in
> its documentation.
>
> > I take the database info which is in a list
> > and do
> >
> > name = str.record[0]
>
> You probably mean str(record[0]).

Yes,

> > rather than
> > name = record[0]
> >
> > So far, I haven't had any problems.
> > For some reason the unicode u is removed.
> > I haven't wanted to spend the time to figure
> > out why.
>
> As a software engineer, I'd get worried if I
> didn't know why the code I wrote works. Maybe
> that's just me.

I don't disagree, but sometimes, depending on the
situation, time to investigate is a luxury.
However: if you don't have the time to do it right
the first time, when will you have the time to fix
it?

> Unicode is not rocket science. I suggest you
> read http://www.amk.ca/python/howto/unicode to
> demystify what Unicode objects are and do.
>
> With str(), you're asking the Unicode object
> for its byte string interpretation, which
> causes the Unicode object to give you its
> encoding in the system default encoding. The
> default encoding is normally ascii. That can be
> tweaked for your particular Python
> installation, but if you need an encoding other
> than ascii it's recommended that you explicitly
> encode and decode from and to Unicode, lest you
> risk writing non-portable code.
>
> Using str() coercion of Unicode objects will
> work well enough until you run into a string
> that contains characters that can't be
> represented in the default encoding.

Right,
even though None or null are not strings they are
common enough to cause a problem.
Try to run a loop through a list with None or
null in it.
Example,
x = str(list[2])
when list[2] = null or None, problems.
Easy to fix but more work.
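
One possible reading of the problem described above is that str(None) does not fail but quietly yields the literal text 'None', which then ends up in the data. The sketch below, in Python 3, shows one way to guard against that; the helper name to_text and the sample record are hypothetical.

```python
def to_text(value, default=""):
    """Return value as text, mapping None to a default instead of 'None'."""
    if value is None:
        return default
    return str(value)

record = ["Q1", 4, None]   # hypothetical row with a missing field

naive = [str(item) for item in record]
converted = [to_text(item) for item in record]

print(naive)      # ['Q1', '4', 'None']
print(converted)  # ['Q1', '4', '']
```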

I'll check the web site out.

Thanks for the update,
Jim-on-linux

Mar 21 '07 #8

John Machin

On Mar 21, 11:37 am, Carsten Haese <cars...@uniqsys.com> wrote:

> On Tue, 2007-03-20 at 16:47 -0700, Gerry wrote:
> > I'm still mystified why:
> > qno was ever unicode,
>
> Thus quoth http://www.lexicon.net/sjmachin/xlrd.html "This module
> presents all text strings as Python unicode objects."

And why would that be? As the next sentence in the referenced docs
says, "From Excel 97 onwards, text in Excel spreadsheets has been
stored as Unicode."

Gerry, your "Q1" string was converted to Unicode when you wrote it
using pyExcelerator's Worksheet.write() method.

HTH,
John

Mar 21 '07 #9

Gerry

On Mar 21, 6:07 am, "John Machin" <sjmac...@lexicon.net> wrote:

> On Mar 21, 11:37 am, Carsten Haese <cars...@uniqsys.com> wrote:
>
> > On Tue, 2007-03-20 at 16:47 -0700, Gerry wrote:
> > > I'm still mystified why:
> > > qno was ever unicode,
> >
> > Thus quoth http://www.lexicon.net/sjmachin/xlrd.html "This module
> > presents all text strings as Python unicode objects."
>
> And why would that be? As the next sentence in the referenced docs
> says, "From Excel 97 onwards, text in Excel spreadsheets has been
> stored as Unicode."
>
> Gerry, your "Q1" string was converted to Unicode when you wrote it
> using pyExcelerator's Worksheet.write() method.
>
> HTH,
> John

John,

That helps a lot. Thanks again!

Gerry

Mar 21 '07 #10
