mysterious unicode

I'm using pyExcelerator and xlrd to read and write data from and to
two spreadsheets.

I created the "read" spreadsheet by importing a text file - and I had
no unicode aspirations.

When I read a cell, it appears to be unicode, u'Q1', say.

I can try cleaning it, like this:

    try:
        s.encode("ascii", "replace")
    except AttributeError:
        pass

which seems to work. Here's the mysterious part (aside from why
anything was unicode in the first place):

    print >>debug, "c=", col, "r=", row, "v=", value, "qno=", qno
    tuple = (qno, family)
    try:
        data[tuple].append(value)
    except:
        data[tuple] = [value]
    print >>debug, "!!!", col, row, qno, family, tuple, value, data[tuple]

which produces:

c= 1 r= 3 v= 4 qno= Q1
!!! 1 3 Q1 O (u'Q1', 'O') 4 [1, u' ', 4]

where qno seems to be a vanilla Q1, but a tuple using qno is
(u'Q1', ...).

Can somebody help me out?

Mar 20 '07 #1
On Tue, 20 Mar 2007 19:35:00 -0300, Gerry <ge**********@gmail.com> wrote:

> which seems to work. Here's the mysterious part (aside from why
> anything was unicode in the first place):
>
>     print >>debug, "c=", col, "r=", row, "v=", value, "qno=", qno
>     tuple = (qno, family)
>     try:
>         data[tuple].append(value)
>     except:
>         data[tuple] = [value]
>     print >>debug, "!!!", col, row, qno, family, tuple, value, data[tuple]
>
> which produces:
>
> c= 1 r= 3 v= 4 qno= Q1
> !!! 1 3 Q1 O (u'Q1', 'O') 4 [1, u' ', 4]
>
> where qno seems to be a vanilla Q1, but a tuple using qno is
> (u'Q1', ...).

I bet qno was unicode from the start. When you print a unicode object, you
get the "unadorned" contents. When you print a tuple, it uses repr() on
each item.

py> qno = u"Q1"
py> qno
u'Q1'
py> print qno
Q1
py> print (qno,2)
(u'Q1', 2)
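
The same distinction can be checked in a script. This is only a minimal sketch, not code from the original post, and the variable names are made up for illustration. The session above is Python 2, where repr() of a unicode object carries the u prefix; under Python 3 the prefix is gone, but a tuple is still rendered from the repr() of each item, so the quotes remain.

```python
# Sketch: str() of a string gives its bare contents, while str() of a
# tuple is built from repr() of each item (quotes, plus the u prefix
# for unicode objects in Python 2).
qno = u"Q1"

bare = str(qno)            # what "print qno" shows
in_tuple = str((qno, 2))   # what "print (qno, 2)" shows

print(bare)
print(in_tuple)
```

So a value that prints as a plain Q1 on its own can still be a unicode object; the tuple display merely makes the type visible.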

--
Gabriel Genellina

Mar 20 '07 #2
On Mar 20, 7:29 pm, "Gabriel Genellina" <gagsl-...@yahoo.com.ar> wrote:
> On Tue, 20 Mar 2007 19:35:00 -0300, Gerry <gerard.bl...@gmail.com> wrote:

Thanks! - that helps a lot.

I'm still mystified why:
qno was ever unicode, and why
qno.encode("ascii", "replace") is still unicode.

Gerry

> py> qno = u"Q1"
> py> qno
> u'Q1'
> py> print qno
> Q1
> py> print (qno,2)
> (u'Q1', 2)
>
> --
> Gabriel Genellina

Mar 20 '07 #3
On Tue, 20 Mar 2007 20:47:22 -0300, Gerry <ge**********@gmail.com> wrote:

> Thanks! - that helps a lot.
>
> I'm still mystified why:
> qno was ever unicode, and why

I can't tell...

> qno.encode("ascii", "replace") is still unicode.

That *returns* a string, but you are discarding the return value. It should
be qno = qno.encode(...)
It's similar to lower(), for example.
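
A minimal sketch of that point, assuming nothing beyond the built-in string methods; the variable names are illustrative only. Strings are immutable, so encode() and lower() hand back a new object and leave the original alone. In Python 2 the encoded result is a plain str; in Python 3 it is a bytes object.

```python
# Sketch: encode() and lower() return new objects; the original string
# is never modified in place.
qno = u"Q1"

qno.encode("ascii", "replace")            # result discarded, qno unchanged
unchanged = (qno == u"Q1") and not isinstance(qno, bytes)

encoded = qno.encode("ascii", "replace")  # keep the return value instead
lowered = qno.lower()                     # same pattern as lower()

print(unchanged, type(encoded), lowered)
```

Rebinding the name, as in qno = qno.encode("ascii", "replace"), is what makes the cleaned value available to the rest of the code.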

--
Gabriel Genellina

Mar 21 '07 #4
On Tuesday 20 March 2007 18:35, Gerry wrote:
> I'm using pyExcelerator and xlrd to read and
> write data from and to two spreadsheets.
>
> I created the "read" spreadsheet by importing a
> text file - and I had no unicode aspirations.
>
> When I read a cell, it appears to be unicode,
> u'Q1', say.
>
> I can try cleaning it, like this:
>
>     try:
>         s.encode("ascii", "replace")
>     except AttributeError:
>         pass
>
> which seems to work. Here's the mysterious
> part (aside from why anything was unicode in
> the first place):
>
>     print >>debug, "c=", col, "r=", row, "v=", value, "qno=", qno
>     tuple = (qno, family)
>     try:
>         data[tuple].append(value)
>     except:
>         data[tuple] = [value]
>     print >>debug, "!!!", col, row, qno, family, tuple, value, data[tuple]
>
> which produces:
>
> c= 1 r= 3 v= 4 qno= Q1
> !!! 1 3 Q1 O (u'Q1', 'O') 4 [1, u' ', 4]
>
> where qno seems to be a vanilla Q1, but a tuple
> using qno is (u'Q1', ...).
>
> Can somebody help me out?

I have been getting the same thing using SQLite3
when extracting data from an SQLite3 database. I
take the database info which is in a list and do

name = str.record[0]
rather than
name = record[0]

So far, I haven't had any problems.
For some reason the unicode u is removed.
I haven't wanted to spend the time to figure out
why.

jim-on-linux
http://www.inqvista.com

Mar 21 '07 #5
On Tue, 2007-03-20 at 16:47 -0700, Gerry wrote:
> I'm still mystified why:
> qno was ever unicode,

Thus quoth http://www.lexicon.net/sjmachin/xlrd.html "This module
presents all text strings as Python unicode objects."

-Carsten
Mar 21 '07 #6
On Tue, 2007-03-20 at 20:26 -0400, jim-on-linux wrote:
> I have been getting the same thing using SQLite3
> when extracting data from an SQLite3 database.

Many APIs that exchange data choose to exchange text in Unicode because
that eliminates encoding uncertainty. Whether an API uses Unicode would
probably be noted somewhere in its documentation.

> I take the database info which is in a list and do
>
> name = str.record[0]

You probably mean str(record[0]).

> rather than
> name = record[0]
>
> So far, I haven't had any problems.
> For some reason the unicode u is removed.
> I haven't wanted to spend the time to figure out
> why.

As a software engineer, I'd get worried if I didn't know why the code I
wrote works. Maybe that's just me.

Unicode is not rocket science. I suggest you read
http://www.amk.ca/python/howto/unicode to demystify what Unicode objects
are and do.

With str(), you're asking the Unicode object for its byte string
interpretation, which causes the Unicode object to give you its encoding
in the system default encoding. The default encoding is normally ascii.
That can be tweaked for your particular Python installation, but if you
need an encoding other than ascii it's recommended that you explicitly
encode and decode from and to Unicode, lest you risk writing
non-portable code.

Using str() coercion of Unicode objects will work well enough until you
run into a string that contains characters that can't be represented in
the default encoding. Once that happens, you're better off explicitly
encoding the Unicode object into a well-defined encoding on input, or,
even better, just work with Unicode objects internally and only encode
to byte strings when absolutely necessary, such as when outputting to a
file or to the console.
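
The advice above can be sketched as follows. This is an illustration under stated assumptions, not code from the thread: the sample value is made up, and the explicit encode("ascii") call stands in for what str() does implicitly in Python 2 when the default encoding is ascii.

```python
# Sketch: keep text as Unicode internally and encode explicitly, with a
# named encoding, only at the output boundary.
name = u"caf\xe9"   # hypothetical value with one non-ASCII character

try:
    name.encode("ascii")          # strict ASCII cannot represent it
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

replaced = name.encode("ascii", "replace")  # lossy: "?" replaces the character
utf8 = name.encode("utf-8")                 # lossless, well-defined encoding

print(ascii_ok, replaced, utf8)
```

Pure-ASCII values such as u"Q1" never hit the error path, which may be why the str() shortcut appears to work until non-ASCII data shows up.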

Hope this helps,

Carsten.
Mar 21 '07 #7
On Tuesday 20 March 2007 21:17, Carsten Haese wrote:
> On Tue, 2007-03-20 at 20:26 -0400, jim-on-linux wrote:
> > I have been getting the same thing using
> > SQLite3 when extracting data from an SQLite3
> > database.
>
> Many APIs that exchange data choose to exchange
> text in Unicode because that eliminates
> encoding uncertainty. Whether an API uses
> Unicode would probably be noted somewhere in
> its documentation.
>
> > I take the database info which is in a list
> > and do
> >
> > name = str.record[0]
>
> You probably mean str(record[0]).

Yes,

> > rather than
> > name = record[0]
> >
> > So far, I haven't had any problems.
> > For some reason the unicode u is removed.
> > I haven't wanted to spend the time to figure
> > out why.
>
> As a software engineer, I'd get worried if I
> didn't know why the code I wrote works. Maybe
> that's just me.

I don't disagree, but sometimes, depending on the
situation, time to investigate is a luxury.
However,
(If you don't have the time to do it right the
first time, when will you have the time to fix
it?)

> Unicode is not rocket science. I suggest you
> read http://www.amk.ca/python/howto/unicode to
> demystify what Unicode objects are and do.
>
> With str(), you're asking the Unicode object
> for its byte string interpretation, which
> causes the Unicode object to give you its
> encoding in the system default encoding. The
> default encoding is normally ascii. That can be
> tweaked for your particular Python
> installation, but if you need an encoding other
> than ascii it's recommended that you explicitly
> encode and decode from and to Unicode, lest you
> risk writing non-portable code.
>
> Using str() coercion of Unicode objects will
> work well enough until you run into a string
> that contains characters that can't be
> represented in the default encoding.

Right,
even though None or null are not strings they are
common enough to cause a problem.
Try to run a loop through a list with None or
null in it.
Example,
x = str(list[2])
when list[2] = null or None, problems.
Easy to fix but more work.
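
One way to handle that case, sketched here with a hypothetical helper (to_text is not part of any library, and the sample row is made up). Note that str(None) does not raise; it quietly yields the text 'None', so an explicit check is what keeps missing cells from turning into that word.

```python
# Sketch: convert a row of cell values to text, treating None explicitly
# instead of letting it become the string "None".
def to_text(value, missing=u""):
    """Return value as text, or `missing` when value is None."""
    if value is None:
        return missing
    return u"%s" % (value,)

row = [u"Q1", 4, None]   # made-up sample data
cleaned = [to_text(v) for v in row]
print(cleaned)
```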

I'll check the web site out.

Thanks for the update,
Jim-on-linux

> Once that
> happens, you're better off explicitly encoding
> the Unicode object into a well-defined encoding
> on input, or, even better, just work with
> Unicode objects internally and only encode to
> byte strings when absolutely necessary, such as
> when outputting to a file or to the console.
>
> Hope this helps,
>
> Carsten.
Mar 21 '07 #8
On Mar 21, 11:37 am, Carsten Haese <cars...@uniqsys.com> wrote:
> On Tue, 2007-03-20 at 16:47 -0700, Gerry wrote:
> > I'm still mystified why:
> > qno was ever unicode,
>
> Thus quoth http://www.lexicon.net/sjmachin/xlrd.html "This module
> presents all text strings as Python unicode objects."

And why would that be? As the next sentence in the referenced docs
says, "From Excel 97 onwards, text in Excel spreadsheets has been
stored as Unicode."

Gerry, your "Q1" string was converted to Unicode when you wrote it
using pyExcelerator's Worksheet.write() method.

HTH,
John

Mar 21 '07 #9
On Mar 21, 6:07 am, "John Machin" <sjmac...@lexicon.net> wrote:
> On Mar 21, 11:37 am, Carsten Haese <cars...@uniqsys.com> wrote:
> > On Tue, 2007-03-20 at 16:47 -0700, Gerry wrote:
> > > I'm still mystified why:
> > > qno was ever unicode,
> >
> > Thus quoth http://www.lexicon.net/sjmachin/xlrd.html "This module
> > presents all text strings as Python unicode objects."
>
> And why would that be? As the next sentence in the referenced docs
> says, "From Excel 97 onwards, text in Excel spreadsheets has been
> stored as Unicode."
>
> Gerry, your "Q1" string was converted to Unicode when you wrote it
> using pyExcelerator's Worksheet.write() method.
>
> HTH,
> John

John,

That helps a lot. Thanks again!

Gerry

Mar 21 '07 #10
