mysterious unicode

I'm using pyExcelerator and xlrd to read and write data from and to
two spreadsheets.

I created the "read" spreadsheet by importing a text file - and I had
no unicode aspirations.

When I read a cell, it appears to be unicode: u'Q1', say.

I can try cleaning it, like this:

    try:
        s.encode("ascii", "replace")
    except AttributeError:
        pass

which seems to work. Here's the mysterious part (aside from why
anything was unicode in the first place):

    print >>debug, "c=", col, "r=", row, "v=", value, "qno=", qno
    tuple = (qno, family)
    try:
        data[tuple].append(value)
    except:
        data[tuple] = [value]
    print >>debug, "!!!", col, row, qno, family, tuple, value, data[tuple]

which produces:

c= 1 r= 3 v= 4 qno= Q1
!!! 1 3 Q1 O (u'Q1', 'O') 4 [1, u' ', 4]

where qno seems to be a vanilla Q1, but a tuple using qno is
(u'Q1', ...).

Can somebody help me out?

Mar 20 '07 #1
On Tue, 20 Mar 2007 19:35:00 -0300, Gerry <ge**********@gmail.com>
wrote:
> which seems to work. Here's the mysterious part (aside from why
> anything was unicode in the first place):
>
>     print >>debug, "c=", col, "r=", row, "v=", value, "qno=", qno
>     tuple = (qno, family)
>     try:
>         data[tuple].append(value)
>     except:
>         data[tuple] = [value]
>     print >>debug, "!!!", col, row, qno, family, tuple, value, data[tuple]
>
> which produces:
>
> c= 1 r= 3 v= 4 qno= Q1
> !!! 1 3 Q1 O (u'Q1', 'O') 4 [1, u' ', 4]
>
> where qno seems to be a vanilla Q1, but a tuple using qno is
> (u'Q1', ...).

I bet qno was unicode from the start. When you print a unicode object, you
get the "unadorned" contents. When you print a tuple, it uses repr() on
each item.

py> qno = u"Q1"
py> qno
u'Q1'
py> print qno
Q1
py> print (qno,2)
(u'Q1', 2)
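The same str()-versus-repr() distinction can be reproduced on a current interpreter. The following is only a sketch and assumes Python 3, where every text string is already Unicode and repr() shows quotes but no u prefix; under the Python 2 used in this thread the tuple form would additionally show the u. The variable names as_printed and inside_tuple are made up for the illustration.

```python
# Sketch for Python 3 (assumption: the thread itself runs Python 2).
qno = "Q1"

# print() shows str(obj) for a bare object ...
as_printed = str(qno)

# ... but a tuple builds its text from repr() of each item,
# so the quotes (and, in Python 2, the u prefix) become visible.
inside_tuple = str((qno, 2))

print(as_printed)
print(inside_tuple)
```

So the value never changed type between the two print statements; only the way it was displayed did.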

--
Gabriel Genellina

Mar 20 '07 #2
On Mar 20, 7:29 pm, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
wrote:
On Tue, 20 Mar 2007 19:35:00 -0300, Gerry <gerard.bl...@gmail.com>
wrote:
Thanks! - that helps a lot.

I'm still mystified why:
qno was ever unicode, and why
qno.encode("ascii", "replace") is still unicode.

Gerry

Mar 20 '07 #3
On Tue, 20 Mar 2007 20:47:22 -0300, Gerry <ge**********@gmail.com>
wrote:
> Thanks! - that helps a lot.
>
> I'm still mystified why:
> qno was ever unicode, and why

I can't tell...

> qno.encode("ascii", "replace") is still unicode.

That *returns* a string, but you are discarding the return value. It should
be qno = qno.encode(...)
It's similar to lower(), for example.
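The point about the discarded return value can be sketched as below. This assumes Python 3, where text is str and encode() returns a bytes object; in the Python 2 of this thread the same call on a unicode object returns a byte string (str). In both versions strings are immutable, so the method cannot change qno in place. The names encoded, type_before and type_after are illustrative only.

```python
qno = "Q1"
type_before = type(qno)

# The result is computed and then thrown away; qno is untouched.
qno.encode("ascii", "replace")
type_after = type(qno)

# Keeping the return value is what actually gives you the encoded form.
encoded = qno.encode("ascii", "replace")

# With errors="replace", characters outside ASCII become "?".
replaced = "caf\u00e9".encode("ascii", "replace")

print(type_before, type_after, type(encoded), replaced)
```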

--
Gabriel Genellina

Mar 21 '07 #4
On Tuesday 20 March 2007 18:35, Gerry wrote:
> I'm using pyExcelerator and xlrd to read and
> write data from and to two spreadsheets.
>
> I created the "read" spreadsheet by importing a
> text file - and I had no unicode aspirations.
>
> When I read a cell, it appears to be unicode:
> u'Q1', say.
>
> I can try cleaning it, like this:
>
>     try:
>         s.encode("ascii", "replace")
>     except AttributeError:
>         pass
>
> which seems to work. Here's the mysterious
> part (aside from why anything was unicode in
> the first place):
>
>     print >>debug, "c=", col, "r=", row, "v=", value, "qno=", qno
>     tuple = (qno, family)
>     try:
>         data[tuple].append(value)
>     except:
>         data[tuple] = [value]
>     print >>debug, "!!!", col, row, qno, family, tuple, value, data[tuple]
>
> which produces:
>
> c= 1 r= 3 v= 4 qno= Q1
> !!! 1 3 Q1 O (u'Q1', 'O') 4 [1, u' ', 4]
>
> where qno seems to be a vanilla Q1, but a tuple
> using qno is (u'Q1', ...).
>
> Can somebody help me out?

I have been getting the same thing using SQLite3
when extracting data from an SQLite3 database. I
take the database info which is in a list and do

name = str.record[0]
rather than
name = record[0]

So far, I haven't had any problems.
For some reason the unicode u is removed.
I haven't wanted to spend the time to figure out
why.

jim-on-linux
http://www.inqvista.com






Mar 21 '07 #5
On Tue, 2007-03-20 at 16:47 -0700, Gerry wrote:
> I'm still mystified why:
> qno was ever unicode,

Thus quoth http://www.lexicon.net/sjmachin/xlrd.html "This module
presents all text strings as Python unicode objects."

-Carsten
Mar 21 '07 #6
On Tue, 2007-03-20 at 20:26 -0400, jim-on-linux wrote:
> I have been getting the same thing using SQLite3
> when extracting data from an SQLite3 database.

Many APIs that exchange data choose to exchange text in Unicode because
that eliminates encoding uncertainty. Whether an API uses Unicode would
probably be noted somewhere in its documentation.

> I take the database info which is in a list and do
>
> name = str.record[0]

You probably mean str(record[0]).

> rather than
> name = record[0]
>
> So far, I haven't had any problems.
> For some reason the unicode u is removed.
> I haven't wanted to spend the time to figure out
> why.

As a software engineer, I'd get worried if I didn't know why the code I
wrote works. Maybe that's just me.

Unicode is not rocket science. I suggest you read
http://www.amk.ca/python/howto/unicode to demystify what Unicode objects
are and do.

With str(), you're asking the Unicode object for its byte string
interpretation, which causes the Unicode object to give you its encoding
in the system default encoding. The default encoding is normally ascii.
That can be tweaked for your particular Python installation, but if you
need an encoding other than ascii it's recommended that you explicitly
encode and decode from and to Unicode, lest you risk writing
non-portable code.

Using str() coercion of Unicode objects will work well enough until you
run into a string that contains characters that can't be represented in
the default encoding. Once that happens, you're better off explicitly
encoding the Unicode object into a well-defined encoding on input, or,
even better, just work with Unicode objects internally and only encode
to byte strings when absolutely necessary, such as when outputting to a
file or to the console.
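A rough illustration of the failure mode and of the recommended alternative, assuming Python 3: there str() no longer encodes anything, so the explicit encode("ascii") call below stands in for what Python 2's str() did implicitly with the default "ascii" codec. The sample text and the choice of UTF-8 as the "well-defined encoding" are assumptions made for the example.

```python
text = "caf\u00e9"   # one character outside the ASCII range

# Stand-in for Python 2's implicit str(unicode_obj): encoding with the
# ASCII codec fails as soon as a non-ASCII character shows up.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Work with text internally; encode explicitly only at the output boundary.
utf8_bytes = text.encode("utf-8")

# Decoding with the same codec recovers the original text.
round_trip = utf8_bytes.decode("utf-8")

print(ascii_failed, utf8_bytes, round_trip == text)
```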

Hope this helps,

Carsten.
Mar 21 '07 #7
On Tuesday 20 March 2007 21:17, Carsten Haese
wrote:
> On Tue, 2007-03-20 at 20:26 -0400, jim-on-linux
> wrote:
> > I have been getting the same thing using
> > SQLite3 when extracting data from an SQLite3
> > database.
>
> Many APIs that exchange data choose to exchange
> text in Unicode because that eliminates
> encoding uncertainty. Whether an API uses
> Unicode would probably be noted somewhere in
> its documentation.
>
> > I take the database info which is in a list
> > and do
> >
> > name = str.record[0]
>
> You probably mean str(record[0]).

Yes,

> > rather than
> > name = record[0]
> >
> > So far, I haven't had any problems.
> > For some reason the unicode u is removed.
> > I haven't wanted to spend the time to figure
> > out why.
>
> As a software engineer, I'd get worried if I
> didn't know why the code I wrote works. Maybe
> that's just me.

I don't disagree, but sometimes, depending on the
situation, time to investigate is a luxury.
However,
(If you don't have the time to do it right the
first time, when will you have the time to fix
it?)

> Unicode is not rocket science. I suggest you
> read http://www.amk.ca/python/howto/unicode to
> demystify what Unicode objects are and do.
>
> With str(), you're asking the Unicode object
> for its byte string interpretation, which
> causes the Unicode object to give you its
> encoding in the system default encoding. The
> default encoding is normally ascii. That can be
> tweaked for your particular Python
> installation, but if you need an encoding other
> than ascii it's recommended that you explicitly
> encode and decode from and to Unicode, lest you
> risk writing non-portable code.
>
> Using str() coercion of Unicode objects will
> work well enough until you run into a string
> that contains characters that can't be
> represented in the default encoding.

Right,
even though None or null are not strings they are
common enough to cause a problem.
Try to run a loop through a list with None or
null in it.
Example,
x = str(list[2])
when list[2] = null or None, problems.
Easy to fix but more work.
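One plausible reading of the None problem, sketched here as an assumption rather than as what the poster actually ran: str(None) does not raise, it quietly yields the text "None", which then travels on as if it were real data. The list contents and the decision to map None to an empty string are hypothetical.

```python
# Hypothetical row values, with a missing entry in the middle.
records = ["Q1", None, 4]

# Naive coercion succeeds, but the missing value turns into the text "None".
naive = [str(item) for item in records]

# Guarding explicitly keeps missing values distinguishable (here: empty string).
safe = ["" if item is None else str(item) for item in records]

print(naive)
print(safe)
```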

I'll check the web site out.

Thanks for the update,
Jim-on-linux
> Once that
> happens, you're better off explicitly encoding
> the Unicode object into a well-defined encoding
> on input, or, even better, just work with
> Unicode objects internally and only encode to
> byte strings when absolutely necessary, such as
> when outputting to a file or to the console.
>
> Hope this helps,
>
> Carsten.
Mar 21 '07 #8
On Mar 21, 11:37 am, Carsten Haese <cars...@uniqsys.com> wrote:
> On Tue, 2007-03-20 at 16:47 -0700, Gerry wrote:
> > I'm still mystified why:
> > qno was ever unicode,
>
> Thus quoth http://www.lexicon.net/sjmachin/xlrd.html "This module
> presents all text strings as Python unicode objects."

And why would that be? As the next sentence in the referenced docs
says, "From Excel 97 onwards, text in Excel spreadsheets has been
stored as Unicode."

Gerry, your "Q1" string was converted to Unicode when you wrote it
using pyExcelerator's Worksheet.write() method.

HTH,
John

Mar 21 '07 #9
On Mar 21, 6:07 am, "John Machin" <sjmac...@lexicon.net> wrote:
> On Mar 21, 11:37 am, Carsten Haese <cars...@uniqsys.com> wrote:
> > On Tue, 2007-03-20 at 16:47 -0700, Gerry wrote:
> > > I'm still mystified why:
> > > qno was ever unicode,
> > Thus quoth http://www.lexicon.net/sjmachin/xlrd.html "This module
> > presents all text strings as Python unicode objects."
>
> And why would that be? As the next sentence in the referenced docs
> says, "From Excel 97 onwards, text in Excel spreadsheets has been
> stored as Unicode."
>
> Gerry, your "Q1" string was converted to Unicode when you wrote it
> using pyExcelerator's Worksheet.write() method.
>
> HTH,
> John

John,

That helps a lot. Thanks again!

Gerry

Mar 21 '07 #10
