
Storing files in a BLOB field via SQL

Hello Python fans,

I've been trying and searching for many days for an acceptable solution,
without success. I want to store files in a database using BLOB
fields. The database table has an ID field (INT, auto_increment), an
ORDER field (INT, to keep the chunks in the right order) and a "normal"
BLOB field. The plan is to split large files into 64k parts and sort
these parts by the ORDER field.
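
For reference, the table might look like this (just a sketch with
assumed names; note that `order` has to be backquoted in MySQL because
it is a reserved word):

create_table = """
    CREATE TABLE files (
        id      INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
        `order` INT NOT NULL,
        data    BLOB
    )
"""
# executed once with your DB cursor, e.g. cur.execute(create_table)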

Here's some pseudo code showing how I wanted to implement this in my app:

f = open(myFileName, 'rb')
order = 0
data = f.read(65535)
while data:
    # naive string interpolation -- this is exactly where it breaks
    query = "INSERT INTO table (order, data) VALUES (%i, '%s')" % (order, data)
    mysql_exec(query)   # placeholder for executing the statement
    order = order + 1
    data = f.read(65535)

The main problem is the handling of the binary data. The SQL syntax
breaks if special characters (quotes etc.) appear in the data, or the
statement becomes invalid because of bytes that can't be encoded in
the current character set. And you can't strip or escape those bytes
naively: stripping corrupts the binary data, and escaping can push a
chunk over 64k.

Does any of you have an idea?
Any suggestions would be very helpful.

Additionally, I want to compress the data and store a checksum
somewhere. Any hint (links, sites, ...) is welcome...!

Thanks in advance,
Juergen
Jul 18 '05 #1
6 Replies


Juergen Gerner wrote:
> The plan is to split large files into 64k parts and sort
> these parts by the ORDER field.

Is there a special reason why you can't store the whole file in a
single BLOB? That's what a BLOB is for, after all... L = Large :-)

> Additionally, I want to compress the data and store a checksum
> somewhere. Any hint (links, sites, ...) is welcome...!


Compress: look at the zlib module (gzip-style compression).
Checksum: what kind? For a 32-bit CRC, zlib.crc32; for MD5, md5.md5.
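
A minimal sketch of both, assuming the checksum is taken over the
uncompressed data and stored next to the blob:

import zlib

data = open('myfile.bin', 'rb').read()

compressed = zlib.compress(data)   # gzip-style (DEFLATE) compression
checksum = zlib.crc32(data)        # 32-bit CRC of the original bytes

# after reading the blob back from the database:
restored = zlib.decompress(compressed)
assert zlib.crc32(restored) == checksum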

--Irmen
Jul 18 '05 #2

Hi Irmen,

first of all, thanks for your help with compression & checksums!

> Is there a special reason why you can't store the whole file in a
> single BLOB? That's what a BLOB is for, after all... L = Large :-)


Yes, there's a special reason. After reading a lot of documentation I
think it's better to split large files into small blobs. It doesn't
matter if the SQL server is on the same machine as the application,
but if the two are on different machines, large files have to be
transmitted over the network. During that transfer the application
isn't responding, I guess. So splitting would be much more flexible.
Additionally, I think splitting files makes the database more scalable
and makes better use of the space on the hard drive.

But the splitting isn't my main problem. It's the way I transmit the
binary data to the database via SQL syntax. Today I saw how
phpMyAdmin handles binary data: it encodes the bytes as a hexadecimal
literal ("0x..."). Is there any way to do something like that (or
similar) in Python, or maybe with PyQt/QString/QByteArray?

Thanks in advance!
Juergen
Jul 18 '05 #3

Juergen Gerner wrote:
>> Is there a special reason why you can't store the whole file in a
>> single BLOB? That's what a BLOB is for, after all... L = Large :-)
>
> Yes, there's a special reason. After reading a lot of documentation I
> think it's better to split large files into small blobs. It doesn't
> matter if the SQL server is on the same machine as the application,
> but if the two are on different machines, large files have to be
> transmitted over the network. During that transfer the application
> isn't responding, I guess. So splitting would be much more flexible.


How would splitting the file into chunks improve the responsiveness
of the application? That would only help if your app needs just a
specific chunk of the larger file to work on. If you need to read
the full file, reading 10 chunks will take even longer than reading
one big BLOB.
You may decide to do it 'in the background' using a thread, but
then again, you could just as well load the single big BLOB inside
that separate thread.
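
For instance, a rough sketch of loading one big BLOB in a worker
thread (the table name 'files' and the connection parameters are
made up):

import threading
import MySQLdb

result = {}

def fetch_blob(file_id):
    # give the worker its own connection; MySQLdb connections
    # shouldn't be shared across threads
    conn = MySQLdb.connect(host='localhost', user='user',
                           passwd='secret', db='test')
    cur = conn.cursor()
    cur.execute("SELECT data FROM files WHERE id = %s", (file_id,))
    result['data'] = cur.fetchone()[0]
    conn.close()

worker = threading.Thread(target=fetch_blob, args=(42,))
worker.start()
# ... the application/event loop keeps running here ...
worker.join()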
> Additionally, I think splitting files makes the database more scalable
> and makes better use of the space on the hard drive.

In my humble opinion, this kind of assumption is generally false.
Let the database decide what the most efficient storage method is
for your 100 MB BLOB. I don't want to make this kind of assumption
about the inner workings of my database server, and I certainly don't
want to wire it into my application code... what happens when you
switch platforms/DBMS? Is your code still 'the most efficient' then?
Just my 0.02
> But the splitting isn't my main problem. It's the way I transmit the
> binary data to the database via SQL syntax.

Sorry, can't help you with this. I would expect the database driver
module to do the 'right' escaping.
--Irmen
Jul 18 '05 #4

"Juergen Gerner" <J.******@GernerOnline.de> wrote in message
news:5a**************************@posting.google.com...
> Hello Python fans,
>
> The main problem is the handling of the binary data. The SQL syntax
> breaks if special characters (quotes etc.) appear in the data, or the
> statement becomes invalid because of bytes that can't be encoded in
> the current character set. And you can't strip or escape those bytes
> naively: stripping corrupts the binary data, and escaping can push a
> chunk over 64k.
>
> Does any of you have an idea?
> Any suggestions would be very helpful.
You can either use the MySQL hex literal format (x'AABBCC'...) or use
the Python DB API, which will handle the parameter conversion for you.

In the first case your query becomes something like:

query = "INSERT INTO table (order, data) VALUES (%i, x'%s')" \
        % (order, data.encode('hex'))

In the second, preferable version you use something like this (note
that MySQLdb uses %s placeholders rather than the ? style some other
DB API drivers use):

cur = conn.cursor()
cur.execute("INSERT INTO table (order, data) VALUES (%s, %s)", (order, data))

and the DBAPI/Database driver takes care of the rest.
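
Putting that together with your chunk loop (a rough sketch; the file
name, connection parameters and the table name 'files' are
placeholders, and `order` is backquoted because it's a reserved word
in MySQL):

import MySQLdb

conn = MySQLdb.connect(host='localhost', user='user', passwd='secret',
                       db='test')
cur = conn.cursor()

f = open('myfile.bin', 'rb')
order = 0
data = f.read(65535)
while data:
    # the driver escapes the binary chunk itself
    cur.execute("INSERT INTO files (`order`, data) VALUES (%s, %s)",
                (order, data))
    order += 1
    data = f.read(65535)
f.close()
conn.commit()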

> Additionally, I want to compress the data and store a checksum
> somewhere. Any hint (links, sites, ...) is welcome...!


compressed = data.encode('zip')  # compress the data (the 'zip' codec is zlib)
Mike.
Jul 18 '05 #5

Juergen Gerner wrote:
> But the splitting isn't my main problem. It's the way I transmit the
> binary data to the database via SQL syntax. Today I saw how
> phpMyAdmin handles binary data: it encodes the bytes as a hexadecimal
> literal ("0x..."). Is there any way to do something like that (or
> similar) in Python, or maybe with PyQt/QString/QByteArray?


Don't know if this is of any help to you, but I use the following to
insert binary data into an MS-SQL DB with ADO.
Suppose the variable "rawdata" contains the binary data:

def bcd2str(bcs):
    """ converts a BCD-coded string to an ASCII hex string

    Note: also works for arbitrary byte values, e.g. '\x2d' """
    out = ''
    for c in bcs:
        out = out + hex(ord(c))[2:].zfill(2)
    return out

def str2hex(s):
    """ converts binary byte data (hex 0x00 - 0xff)
    in a Python string into the format needed to
    insert into a binary datatype on SQL Server """
    return '0x' + bcd2str(s)

# sID is assumed to be defined elsewhere
insertstring = "insert into foo (sID, RawData) VALUES (%s, %s)" \
               % (sID, str2hex(rawdata))
adoconn.Execute(insertstring)
regards,
Bruno
Jul 18 '05 #6

[J.******@GernerOnline.de (Juergen Gerner)]
> I want to store files in a database using BLOB [...] Does any of you
> have an idea?


Maybe it's a version issue. I recently grabbed the new version 1.0.0
of MySQLdb. In the readme.html of the Win binary package you will find
this note:

"""
MySQL-Python 1.0.0 for win32 Notes:
June 28 2004
I needed to get mysql-python working for win32, so I compiled it. I
know a lot of people are looking for this, so enjoy... With 0.9.2,
BLOBs weren't working properly for me, ...
"""

Second:
Skimming over the docs I noticed that the Python API converts BLOBs to
arrays. Don't know if this hint is of significance in your case.
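
If BLOBs do come back that way, converting them back to a byte string
is easy (a sketch; it assumes a cursor `cur` that has just run a
SELECT for the blob column):

import array

row = cur.fetchone()
blob = row[0]
# some driver versions hand BLOB columns back as array.array
# objects; convert back to a plain byte string if so
if isinstance(blob, array.array):
    blob = blob.tostring()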

Hope it helps,
Martin
Jul 18 '05 #7
