
pyodbc data corruption problem

I have found a data corruption problem with pyodbc.

OS = Windows XP SP2
DB = Microsoft Access XP

PROBLEM:

When selecting columns of type Memo from a table, the value returned
is padded with a run of null characters at the end.

The problem does not seem to occur until the length of the Memo value
exceeds 2048 bytes.

I have attached several scripts to help demonstrate the problem.

To recreate the problem:

1. Create a blank Access database named test.
2. Create an ODBC DSN named test for that database
3. Run the createtable.py script to create the table
and load it with the dummy data
4. Run the broke.py script to show the problem.

The issue appears once the data value exceeds 2048 bytes.

The size in createtable.py is 2046 bytes plus 3 bytes at the end that
contain "JOE", for a total of 2049 bytes.

If you change it from 2046 to 2045 (or less) then the problem does not
occur.
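
For anyone who wants to narrow the boundary down themselves, here is a
minimal probe sketch (hypothetical, not part of the original report; it
reuses the same test DSN and the test_memo table created by the script
below) that inserts values of increasing size and flags the first one
that comes back padded:

#
# probe.py (sketch)
#
import pyodbc

dbs = pyodbc.connect('dsn=test')
c = dbs.cursor()

for size in range(2040, 2056):
    # replace the single row with a value of the given size
    c.execute('delete from test_memo')
    c.execute('insert into test_memo values(1, ?)', ('1' * size,))
    dbs.commit()

    c.execute('select c2 from test_memo where c1 = 1')
    (c2,) = c.fetchone()

    # a clean round trip returns exactly what was inserted;
    # the bug shows up as extra trailing NUL bytes
    status = 'OK'
    if len(c2) != size:
        status = 'PADDED (%d bytes back)' % len(c2)
    print size, status

c.close()
dbs.close()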

#
# createtable.py script
#

import pyodbc

dbs = pyodbc.connect('dsn=test')

c = dbs.cursor()

try:
    sql = 'drop table test_memo'
    c.execute(sql)
    dbs.commit()
except:
    # ignore drop table failure
    pass

sql = 'create table test_memo (c1 int not null, c2 memo not null)'

c.execute(sql)

dbs.commit()

sql = 'insert into test_memo values(1, ?)'

c2_value = '1' * 2046

c2_value = '%sJOE' % (c2_value)

c.execute(sql, (c2_value,))

dbs.commit()

c.close()
dbs.close()

#
# broke.py script
#
import pyodbc

dbs = pyodbc.connect('dsn=test')

c = dbs.cursor()

sql = 'select c2, len(c2) as c2_db_len from test_memo where c1 = 1'

c.execute(sql)

row = c.fetchone()

(
c2,
c2_db_len
) = row

print repr(c2)

print 'c2 length :', len(c2)
print 'c2_db_length :', c2_db_len

print 'before nul length:', len(c2[0:c2.find('\x00')])

c.close()
dbs.close()
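
As a stopgap until the driver is fixed, the padding can be stripped
after the fetch. A minimal sketch that could be added to the end of
broke.py (it assumes the real data never contains embedded NUL bytes,
which holds for the test data above):

#
# workaround sketch: strip the trailing NUL padding after fetch
#
def strip_nul_padding(value):
    # find the first NUL; everything after it is padding (by assumption)
    if value is None:
        return value
    i = value.find('\x00')
    if i == -1:
        return value
    return value[:i]

c2 = strip_nul_padding(c2)
print 'trimmed length :', len(c2)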

May 18 '07 #1
3 Replies


On May 18, 6:46 pm, "Joe Salmeri" <JoeSalm...@hotmail.com> wrote:
> I have found a data corruption problem with pyodbc.
>
> OS = Windows XP SP2
> DB = Microsoft Access XP
>
> PROBLEM:
>
> When selecting columns of type Memo from a table, the value returned
> is padded with a run of null characters at the end.
>
> The problem does not seem to occur until the length of the Memo value
> exceeds 2048 bytes.
Interesting. MS-Access has had a bug involving Memo fields
ever since Version 2.0 (the last version in which they worked). I was
trying to use a Memo field on a form to append short status messages.
This mysteriously stopped working in the version following 2.0
(Access 95?) and has never worked since. There seems to be some
kind of 2048-byte buffer involved, created by some punk
Microsoft handed the lowly job of software revision to while
those more educated did more important things.

My guess is you won't find a Python solution short of
accepting what Access gives you and dealing with it.
[remainder of quoted post snipped]

May 19 '07 #2

Thank you for your response, but this is not an Access problem.

The exact same code works fine using mx.ODBC or the old odbc.py that
comes with the win32 files.

It only fails with pyodbc.
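
For reference, here is a sketch of the same query through the old win32
odbc module (if memory serves, odbc.odbc() takes a 'DSN/user/password'
string; a plain DSN name is shown here). It is only meant to illustrate
the cross-check, not to be a polished script:

#
# crosscheck.py (sketch) - same query via the old win32 odbc module
#
import odbc  # ships with the win32 extensions

dbs = odbc.odbc('test')  # connect by DSN name
c = dbs.cursor()

c.execute('select c2, len(c2) as c2_db_len from test_memo where c1 = 1')
c2, c2_db_len = c.fetchone()

# through this driver layer the two lengths agree and no NUL
# padding shows up
print 'c2 length :', len(c2)
print 'c2_db_length :', c2_db_len

c.close()
dbs.close()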

<me********@aol.com> wrote in message
news:11*********************@n59g2000hsh.googlegroups.com...
[quoted post snipped]


May 19 '07 #3

I have done some additional investigation into this problem and found
the following:

As described in the original post, the problem does not begin to appear
until the return value size reaches 2048 bytes.

Once the value is greater than 2048 bytes, the value returned by
pyodbc is twice the size of the actual data.

The data is padded with exactly as many null characters as there are
bytes of actual data.

In other words, if the actual database value is 4868 bytes, then pyodbc
will return a value that is 9736 bytes long. The first 4868 bytes will
be the real data, followed by 4868 bytes of nulls.

I did a second test where the actual data size was 11,109 bytes. In that
case pyodbc returned a value that was 22,218 bytes long. The first 11,109
bytes are the real data, followed by 11,109 null bytes.

This seems to confirm the bug.
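
Given that pattern, the length the database itself reports can be used
to trim the value. A sketch, assuming the select also fetches len(c2)
as c2_db_len, as in broke.py:

# trim using the length reported by the database rather than
# searching for the first NUL
if len(c2) != c2_db_len:
    c2 = c2[:c2_db_len]

print len(c2) == c2_db_len   # True
print '\x00' in c2           # False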

"Joe Salmeri" <Jo********@hotmail.comwrote in message
news:EP******************************@comcast.com. ..
[quoted post snipped]

May 24 '07 #4
