  #1  
Old March 16th, 2007, 05:05 PM
Robin Becker
Guest
 
Posts: n/a
mySQLdb versus platform problem

I am seeing different outcomes from simple requests against a common database
when run from a freebsd machine and a win32 box.


The test script is
#######################
import MySQLdb, sys
print sys.version
print MySQLdb.__version__
db=MySQLdb.connect(host='appx',db='sc_0',user='user',passwd='secret',use_unicode=True)
cur=db.cursor()
cur.execute('select * from sc_accomodation where id=31')
data=cur.fetchall()

# show the repr of every string column in the first row
for i,t in enumerate(data[0]):
    if isinstance(t,(str,unicode)): print i,repr(t)
#######################

The table in question is charset='latin1'; however, the original owners put some
special Windows characters in, e.g. 0x92 (a quote).

in the windows version I see this kind of string in the output

2.4.4 (#71, Oct 18 2006, 08:34:43) [MSC v.1310 32 bit (Intel)]
1.2.1_p2
..........
14 u'Built entirely of mahogany, Acajou seeks to introduce a new concept of
living in the midst of nature on the C\xf4te d\x92Or beach which stretches
along the island\x92s northern coast.\r\n\r\nThe hotel\x92s 24 standard
and 4 superior ro........

the freeBSD machine produces

2.4.3 (#2, Sep 7 2006, 09:34:29)
[GCC 3.4.4 [FreeBSD] 20050518]
............
14 u'Built entirely of mahogany, Acajou seeks to introduce a new concept of
living in the midst of nature on the C\xf4te d\u2019Or beach which stretches
along the island\u2019s northern coast.\r\n\r\nThe hotel\u2019s 24 standard
and 4 superior rooms.......

so the windows version seems to leave the \x92 as is and the freebsd version
converts it to its correct value.
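
The two outputs above match two different ways of decoding the same stored byte. The sketch below is only an illustration, written for Python 3 (an assumption; the thread uses Python 2.4), and it does not touch MySQLdb at all. Which decoding each platform's client library actually applies is not established in this thread; the sample bytes are taken from the output shown.

```python
# The byte 0x92 as it sits in the latin1 column (a Windows apostrophe).
raw = b"d\x92Or"

# ISO-8859-1 maps every byte to the code point of the same value,
# so 0x92 becomes the C1 control character U+0092.
as_iso_8859_1 = raw.decode("latin-1")

# Windows-1252 maps 0x92 to U+2019 RIGHT SINGLE QUOTATION MARK.
as_cp1252 = raw.decode("cp1252")

print(ascii(as_iso_8859_1))  # 'd\x92Or'   (what the win32 box shows)
print(ascii(as_cp1252))      # 'd\u2019Or' (what the FreeBSD box shows)
```

So the win32 result looks like a plain ISO-8859-1 decode and the FreeBSD result looks like a Windows-1252 decode of the same data.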

This is already bad enough as I expected the outcomes to be the same, but given
that the encoding of the database is wrong I expected some problems.

However, if I don't have use_unicode=True in the above script I get back
strings, but this time the difference is larger.

windows
2.4.4 (#71, Oct 18 2006, 08:34:43) [MSC v.1310 32 bit (Intel)]
1.2.1_p2
........
2 "C\xf4te d'Or\r\nPraslin"
........

unix

2.4.3 (#2, Sep 7 2006, 09:34:29)
[GCC 3.4.4 [FreeBSD] 20050518]
1.2.1_p2
.......
2 "C\xc3\xb4te d'Or\r\nPraslin"
........

so here the returned string appears to have been automatically converted to utf8.
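
For comparison, the two byte strings above are what you get when the same text is encoded as latin1 and as utf8. A minimal sketch, again in Python 3 (an assumption) and independent of MySQLdb:

```python
text = "C\xf4te d'Or"  # o with circumflex is U+00F4

latin1_bytes = text.encode("latin-1")  # one byte per character
utf8_bytes = text.encode("utf-8")      # U+00F4 becomes two bytes

print(latin1_bytes)  # b"C\xf4te d'Or"
print(utf8_bytes)    # b"C\xc3\xb4te d'Or"
```

That is consistent with the unix connection handing back utf8 and the windows connection handing back latin1, although it does not say where that choice is made.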


My questions are

1) why the difference in the unicode version?
2) why does the unix version convert to utf8?

The database being common it seems it's either the underlying libraries or the
compiled extension or python that causes these differences, but which?
--
Robin Becker

  #2  
Old March 16th, 2007, 08:35 PM
John Nagle
Guest
 
Posts: n/a
Re: mySQLdb versus platform problem

Try:

db=MySQLdb.connect(host='appx',db='sc_0',user='user',passwd='secret',
                   use_unicode=True, charset = "utf8")

The distinction is that "use_unicode" tells Python to convert to Unicode,
but Python doesn't know the MySQL table type. 'charset="utf8"' tells
MySQL to do the conversion to UTF8, which can be reliably converted
to Unicode.
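
A rough simulation of that round trip, with no database involved. It is a sketch in Python 3 (an assumption), and it assumes the server treats its latin1 as Windows-1252 when transcoding, which is how the MySQL manual describes that character set. The variable names are made up for illustration.

```python
# Bytes as stored in the latin1 column, including the 0x92 apostrophe.
stored = b"the island\x92s northern coast"

# Server side (assumed): interpret the column as cp1252 and send utf8,
# because the connection asked for charset="utf8".
on_the_wire = stored.decode("cp1252").encode("utf-8")

# Client side: utf8 decodes to Unicode without needing to know
# anything about the table's own character set.
received = on_the_wire.decode("utf-8")

print(ascii(received))  # 'the island\u2019s northern coast'
```

The point is that the client only ever has to decode one known encoding, whatever the individual tables use.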

John Nagle

Robin Becker wrote:
Quote:
I am seeing different outcomes from simple requests against a common
database when run from a freebsd machine and a win32 box.
>
>
The test script is
....
Quote:
db=MySQLdb.connect(host='appx',db='sc_0',user='user',passwd='secret',use_unicode=True)
  #3  
Old March 16th, 2007, 11:15 PM
Robin Becker
Guest
 
Posts: n/a
Re: mySQLdb versus platform problem

John Nagle wrote:
Quote:
Try:
>
db=MySQLdb.connect(host='appx',db='sc_0',user='user',passwd='secret',
                   use_unicode=True, charset = "utf8")
>
The distinction is that "use_unicode" tells Python to convert to Unicode,
but Python doesn't know the MySQL table type. 'charset="utf8"' tells
MySQL to do the conversion to UTF8, which can be reliably converted
to Unicode.
>
John Nagle
>
>.......
OK that seems to help. However, my database has tables with different
encodings. Does MySQLdb ignore the table encoding? That would be a bit lame.

Also it still doesn't explain the different behaviours between unix &
win32 (or perhaps different defaults are somehow magically decided upon).
-things were so much easier when bytes were bytes-ly yrs-
Robin Becker
  #4  
Old March 17th, 2007, 06:45 AM
John Nagle
Guest
 
Posts: n/a
Re: mySQLdb versus platform problem

Robin Becker wrote:
Quote:
John Nagle wrote:
>
Quote:
>Try:
>>
> db=MySQLdb.connect(host='appx',db='sc_0',user='user',passwd='secret',
>                    use_unicode=True, charset = "utf8")
>>
>The distinction is that "use_unicode" tells Python to convert to Unicode,
>but Python doesn't know the MySQL table type. 'charset="utf8"' tells
>MySQL to do the conversion to UTF8, which can be reliably converted
>to Unicode.
>>
> John Nagle
>>
>.......
>
>
OK that seems to help. However, my database has tables with different
encodings. Does MySQLdb ignore the table encoding? That would be a bit
lame.

MySQLdb, the client, doesn't know the table encoding. The
server end does.

Quote:
Also it still doesn't explain the different behaviours between unix &
win32 (or perhaps different defaults are somehow magically decided upon).

The default encoding is an environment thing. It comes, somehow, from
the locale your system thinks it is in.

Quote:
-things were so much easier when bytes were bytes-ly yrs-
Robin Becker

So convert the database to Unicode/UTF-8 and have everything
be consistent. MySQL 5 can do that dynamically with an ALTER
TABLE statement.
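
One possible shape for that statement, built from Python. This is a sketch only: convert_table_sql is a hypothetical helper, not part of MySQLdb, and the table name is the one from the test script earlier in the thread. It only builds the SQL text; running it would be an ordinary cur.execute() call on an open connection.

```python
def convert_table_sql(table, charset="utf8"):
    """Build an ALTER TABLE ... CONVERT TO CHARACTER SET statement.

    Hypothetical helper for illustration. Identifiers cannot be sent
    as query parameters, so the table name is backtick-quoted here.
    """
    if not charset.isalnum():
        raise ValueError("unexpected character set name: %r" % (charset,))
    # Double any embedded backtick so the name stays a single identifier.
    quoted = "`" + table.replace("`", "``") + "`"
    return "ALTER TABLE %s CONVERT TO CHARACTER SET %s" % (quoted, charset)


print(convert_table_sql("sc_accomodation"))
# ALTER TABLE `sc_accomodation` CONVERT TO CHARACTER SET utf8
```

CONVERT TO CHARACTER SET rewrites the stored column data as well as the table default, so it is worth trying on a copy of the table first.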

John Nagle
 
