mysterious unicode

I'm using pyExcelerator and xlrd to read and write data from and to
two spreadsheets.

I created the "read" spreadsheet by importing a text file - and I had
no unicode aspirations.

When I read a cell, it appears to be unicode: u'Q1', say.

I can try cleaning it, like this:

    try:
        s.encode("ascii", "replace")
    except AttributeError:
        pass

which seems to work. Here's the mysterious part (aside from why
anything was unicode in the first place):

    print >>debug, "c=", col, "r=", row, "v=", value, "qno=", qno
    tuple = (qno, family)
    try:
        data[tuple].append(value)
    except:
        data[tuple] = [value]
    print >>debug, "!!!", col, row, qno, family, tuple, value, data[tuple]

which produces:

c= 1 r= 3 v= 4 qno= Q1
!!! 1 3 Q1 O (u'Q1', 'O') 4 [1, u' ', 4]

where qno seems to be a vanilla Q1, but a tuple using qno is
(u'Q1', ...).

Can somebody help me out?

Mar 20 '07 #1
On Tue, 20 Mar 2007 19:35:00 -0300, Gerry <ge**********@gmail.com>
wrote:

> which seems to work. Here's the mysterious part (aside from why
> anything was unicode in the first place):
>
>     print >>debug, "c=", col, "r=", row, "v=", value, "qno=", qno
>     tuple = (qno, family)
>     try:
>         data[tuple].append(value)
>     except:
>         data[tuple] = [value]
>     print >>debug, "!!!", col, row, qno, family, tuple, value, data[tuple]
>
> which produces:
>
> c= 1 r= 3 v= 4 qno= Q1
> !!! 1 3 Q1 O (u'Q1', 'O') 4 [1, u' ', 4]
>
> where qno seems to be a vanilla Q1, but a tuple using qno is
> (u'Q1', ...).

I bet qno was unicode from the start. When you print a unicode object, you
get the "unadorned" contents. When you print a tuple, it uses repr() on
each item.

py> qno = u"Q1"
py> qno
u'Q1'
py> print qno
Q1
py> print (qno,2)
(u'Q1', 2)
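
For anyone reading this on Python 3: every str is already Unicode there and the u prefix is no longer shown, but the same str()-versus-repr() difference still appears as quotes. A minimal sketch, assuming Python 3 (the names bare and in_tuple are only for illustration):

```python
# Python 3 sketch: print() shows str() of a bare value, but str() of a
# tuple is built from repr() of every element it contains.
qno = "Q1"  # in Python 2 this would be written u"Q1"

bare = str(qno)           # what print() shows for the value alone
in_tuple = str((qno, 2))  # what print() shows for a tuple holding it

print(bare)      # Q1
print(in_tuple)  # ('Q1', 2)
```

So the value never changed between the two print statements; only the way it was displayed did.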

--
Gabriel Genellina

Mar 20 '07 #2
On Mar 20, 7:29 pm, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
wrote:
> On Tue, 20 Mar 2007 19:35:00 -0300, Gerry <gerard.bl...@gmail.com>
> wrote:

Thanks! - that helps a lot.

I'm still mystified why:
qno was ever unicode, and why
qno.encode("ascii", "replace") is still unicode.

Gerry

>

> py> qno = u"Q1"
> py> qno
> u'Q1'
> py> print qno
> Q1
> py> print (qno,2)
> (u'Q1', 2)
>
> --
> Gabriel Genellina

Mar 20 '07 #3
On Tue, 20 Mar 2007 20:47:22 -0300, Gerry <ge**********@gmail.com>
wrote:

> Thanks! - that helps a lot.
>
> I'm still mystified why:
> qno was ever unicode, and why

I can't tell...

> qno.encode("ascii", "replace") is still unicode.

That *returns* a string, but you are discarding the return value. It should
be qno = qno.encode(...)
It's similar to lower(), for example.
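
A minimal sketch of the discarded-return-value point, written for Python 3 (an assumption; the thread is about Python 2, where encode() turns a unicode object into a byte str). Strings are immutable, so encode() and lower() hand back a new object and leave the original untouched:

```python
qno = "Q1"

# Calling encode() without keeping the result changes nothing.
qno.encode("ascii", "replace")
unchanged = qno                      # still the text "Q1"

# Keeping the return value is what actually gives you the encoded form.
encoded = qno.encode("ascii", "replace")

# lower() behaves the same way: it returns a new string.
shout = "ABC"
shout.lower()                        # result thrown away
lowered = shout.lower()

# With "replace", characters outside ASCII become "?".
replaced = "caf\u00e9".encode("ascii", "replace")

print(type(unchanged).__name__, encoded, shout, lowered, replaced)
```

In the original snippet the fix would be to bind the result, for example s = s.encode("ascii", "replace").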

--
Gabriel Genellina

Mar 21 '07 #4
On Tuesday 20 March 2007 18:35, Gerry wrote:
> I'm using pyExcelerator and xlrd to read and
> write data from and to two spreadsheets.
>
> I created the "read" spreadsheet by importing a
> text file - and I had no unicode aspirations.
>
> When I read a cell, it appears to be unicode
> u'Q1', say.
>
> I can try cleaning it, like this:
>
>     try:
>         s.encode("ascii", "replace")
>     except AttributeError:
>         pass
>
> which seems to work. Here's the mysterious
> part (aside from why anything was unicode in
> the first place):
>
>     print >>debug, "c=", col, "r=", row, "v=", value, "qno=", qno
>     tuple = (qno, family)
>     try:
>         data[tuple].append(value)
>     except:
>         data[tuple] = [value]
>     print >>debug, "!!!", col, row, qno, family, tuple, value, data[tuple]
>
> which produces:
>
> c= 1 r= 3 v= 4 qno= Q1
> !!! 1 3 Q1 O (u'Q1', 'O') 4 [1, u' ', 4]
>
> where qno seems to be a vanilla Q1, but a tuple
> using qno is (u'Q1', ...).
>
> Can somebody help me out?

I have been getting the same thing using SQLite3
when extracting data from an SQLite3 database. I
take the database info, which is in a list, and do

name = str.record[0]
rather than
name = record[0]

So far, I haven't had any problems.
For some reason the unicode u is removed.
I haven't wanted to spend the time to figure out
why.

jim-on-linux
http://www.inqvista.com

Mar 21 '07 #5
On Tue, 2007-03-20 at 16:47 -0700, Gerry wrote:
> I'm still mystified why:
> qno was ever unicode,

Thus quoth http://www.lexicon.net/sjmachin/xlrd.html "This module
presents all text strings as Python unicode objects."

-Carsten
Mar 21 '07 #6
On Tue, 2007-03-20 at 20:26 -0400, jim-on-linux wrote:
> I have been getting the same thing using SQLite3
> when extracting data from an SQLite3 database.

Many APIs that exchange data choose to exchange text in Unicode because
that eliminates encoding uncertainty. Whether an API uses Unicode would
probably be noted somewhere in its documentation.
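
As an illustration of an API handing back Unicode text, here is a small sketch using the standard-library sqlite3 module under Python 3 (an assumption; the thread is about Python 2). The table name people and its single column are made up for the example:

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")  # hypothetical table
conn.execute("INSERT INTO people VALUES (?)", ("Zo\u00eb",))

record = conn.execute("SELECT name FROM people").fetchone()
name = record[0]   # TEXT columns come back as Unicode text
conn.close()

print(type(name).__name__, name)
```

The text arrives as Unicode without any str() call, which is the behaviour the module's documentation describes for TEXT values by default.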

> I take the database info which is in a list and do
>
> name = str.record[0]

You probably mean str(record[0]).

> rather than
> name = record[0]
>
> So far, I haven't had any problems.
> For some reason the unicode u is removed.
> I haven't wanted to spend the time to figure out
> why.

As a software engineer, I'd get worried if I didn't know why the code I
wrote works. Maybe that's just me.

Unicode is not rocket science. I suggest you read
http://www.amk.ca/python/howto/unicode to demystify what Unicode objects
are and do.

With str(), you're asking the Unicode object for its byte string
interpretation, which causes the Unicode object to give you its encoding
in the system default encoding. The default encoding is normally ascii.
That can be tweaked for your particular Python installation, but if you
need an encoding other than ascii it's recommended that you explicitly
encode and decode from and to Unicode, lest you risk writing
non-portable code.
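
A short sketch of the explicit encode/decode round trip recommended above, in Python 3 syntax (an assumption; in Python 2 the text object would be a unicode and the result of encode() a byte str). The sample text is invented for the example:

```python
text = "Q1 caf\u00e9"                    # Unicode text with one non-ASCII character

# Name the encoding yourself instead of relying on a default.
utf8_bytes = text.encode("utf-8")        # text -> bytes
round_trip = utf8_bytes.decode("utf-8")  # bytes -> text

print(utf8_bytes)
print(round_trip == text)
```

Because the encoding is spelled out on both sides, the result does not depend on how a particular Python installation is configured.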

Using str() coercion of Unicode objects will work well enough until you
run into a string that contains characters that can't be represented in
the default encoding. Once that happens, you're better off explicitly
encoding the Unicode object into a well-defined encoding on input, or,
even better, just work with Unicode objects internally and only encode
to byte strings when absolutely necessary, such as when outputting to a
file or to the console.
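
The two points in this paragraph can be sketched as follows, again assuming Python 3. The helper write_report and its arguments are hypothetical, and an in-memory BytesIO stands in for a real output file:

```python
import io

def write_report(rows, stream, encoding="utf-8"):
    """Keep text as Unicode internally; encode only when writing bytes out."""
    for qno, value in rows:
        line = "%s=%s\n" % (qno, value)      # still Unicode text here
        stream.write(line.encode(encoding))  # encode at the boundary

# Encoding to ASCII works until a character has no ASCII representation.
try:
    "caf\u00e9".encode("ascii")
    failed = False
except UnicodeEncodeError:
    failed = True

buffer = io.BytesIO()
write_report([("Q1", 4), ("caf\u00e9", 2)], buffer)
print(failed, buffer.getvalue())
```

Only the last step before the bytes leave the program needs to know which encoding is in use.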

Hope this helps,

Carsten.
Mar 21 '07 #7
On Tuesday 20 March 2007 21:17, Carsten Haese wrote:
> On Tue, 2007-03-20 at 20:26 -0400, jim-on-linux wrote:
> > I have been getting the same thing using
> > SQLite3 when extracting data from an SQLite3
> > database.
>
> Many APIs that exchange data choose to exchange
> text in Unicode because that eliminates
> encoding uncertainty. Whether an API uses
> Unicode would probably be noted somewhere in
> its documentation.
>
> > I take the database info which is in a list
> > and do
> >
> > name = str.record[0]
>
> You probably mean str(record[0]).

Yes,

>
> > rather than
> > name = record[0]
> >
> > So far, I haven't had any problems.
> > For some reason the unicode u is removed.
> > I haven't wanted to spend the time to figure
> > out why.
>
> As a software engineer, I'd get worried if I
> didn't know why the code I wrote works. Maybe
> that's just me.

I don't disagree, but sometimes, depending on the
situation, time to investigate is a luxury.
However: if you don't have the time to do it right
the first time, when will you have the time to fix
it?

> Unicode is not rocket science. I suggest you
> read http://www.amk.ca/python/howto/unicode to
> demystify what Unicode objects are and do.
>
> With str(), you're asking the Unicode object
> for its byte string interpretation, which
> causes the Unicode object to give you its
> encoding in the system default encoding. The
> default encoding is normally ascii. That can be
> tweaked for your particular Python
> installation, but if you need an encoding other
> than ascii it's recommended that you explicitly
> encode and decode from and to Unicode, lest you
> risk writing non-portable code.
>
> Using str() coercion of Unicode objects will
> work well enough until you run into a string
> that contains characters that can't be
> represented in the default encoding.

Right.
Even though None or null are not strings, they are
common enough to cause a problem.
Try to run a loop through a list with None or
null in it.
For example,
x = str(list[2])
when list[2] is null or None: problems.
Easy to fix but more work.
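
One possible reading of the problem described here (an assumption, since the post does not say what actually goes wrong) is that str(None) does not fail but quietly produces the text "None". A small sketch with made-up record contents:

```python
record = ["Smith", "Q1", None]   # hypothetical row with a missing value

# Blanket str() turns the missing value into the four-letter text "None".
naive = [str(item) for item in record]

# Checking for None first keeps missing values distinguishable.
safe = ["" if item is None else str(item) for item in record]

print(naive)
print(safe)
```

Whether an empty string is the right stand-in for a missing value depends on the application; the point is only that the check has to be explicit.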

I'll check the web site out.

Thanks for the update,
Jim-on-linux

> Once that
> happens, you're better off explicitly encoding
> the Unicode object into a well-defined encoding
> on input, or, even better, just work with
> Unicode objects internally and only encode to
> byte strings when absolutely necessary, such as
> when outputting to a file or to the console.
>
> Hope this helps,
>
> Carsten.
Mar 21 '07 #8
On Mar 21, 11:37 am, Carsten Haese <cars...@uniqsys.com> wrote:
> On Tue, 2007-03-20 at 16:47 -0700, Gerry wrote:
> > I'm still mystified why:
> > qno was ever unicode,
>
> Thus quoth http://www.lexicon.net/sjmachin/xlrd.html "This module
> presents all text strings as Python unicode objects."

And why would that be? As the next sentence in the referenced docs
says, "From Excel 97 onwards, text in Excel spreadsheets has been
stored as Unicode."

Gerry, your "Q1" string was converted to Unicode when you wrote it
using pyExcelerator's Worksheet.write() method.

HTH,
John

Mar 21 '07 #9
On Mar 21, 6:07 am, "John Machin" <sjmac...@lexicon.net> wrote:
> On Mar 21, 11:37 am, Carsten Haese <cars...@uniqsys.com> wrote:
> > On Tue, 2007-03-20 at 16:47 -0700, Gerry wrote:
> > > I'm still mystified why:
> > > qno was ever unicode,
> > Thus quoth http://www.lexicon.net/sjmachin/xlrd.html "This module
> > presents all text strings as Python unicode objects."
>
> And why would that be? As the next sentence in the referenced docs
> says, "From Excel 97 onwards, text in Excel spreadsheets has been
> stored as Unicode."
>
> Gerry, your "Q1" string was converted to Unicode when you wrote it
> using pyExcelerator's Worksheet.write() method.
>
> HTH,
> John

John,

That helps a lot. Thanks again!

Gerry

Mar 21 '07 #10
