Literal Escaped Octets

I am trying to convert raw binary data to data with escaped octets in
order to store it in a bytea field on a PostgreSQL server. I could do this
easily in C/C++, but I need to do it in Python. I am not sure how to read
and evaluate the binary value of a byte in a long string when it is a
non-printable ASCII value in Python. I read about some ways to use unpack from
the struct module, but I really couldn't understand where that would help. I
looked at the MIMIEncode module, but I don't know how to convert the object
to a string. Is there a module that will convert the data? It seems to me
that this question must have been answered a million times before, but I
can't find anything.

See http://www.postgresql.org/docs/8.1/i...pe-binary.html
for a description of the problem domain.
Feb 6 '06 #1
Chason Hayes <ch*****@hotmail.com> wrote:
...
easily in c/c++ but I need to do it in python. I am not sure how to read
and evaluate the binary value of a byte in a long string when it is a non
printable ascii value in python.


If you have a bytestring (AKA plain string) s, the binary value of its
k-th byte is ord(s[k]).
Alex
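
A minimal sketch of how that byte-by-byte view could be turned into the escaped-octet text form described on the PostgreSQL page linked in the question. It assumes Python 3, where iterating over a bytes object already yields the integer that ord(s[k]) gives in Python 2. The name bytea_escape is hypothetical, and the result is the bytea escape format itself, not a quoted SQL literal; a literal needs a further layer of quoting, which is best left to the database driver.

```python
def bytea_escape(data: bytes) -> str:
    """Return data in PostgreSQL's bytea 'escape' text format (hypothetical helper)."""
    pieces = []
    for value in data:  # in Python 3 each item is already an int in 0..255
        if value == 0x5C:
            # a backslash is written as two backslashes
            pieces.append("\\\\")
        elif value < 32 or value > 126:
            # non-printable octets become a backslash plus three octal digits
            pieces.append("\\%03o" % value)
        else:
            # printable ASCII passes through unchanged
            pieces.append(chr(value))
    return "".join(pieces)


print(bytea_escape(b"\x00A\\\xff"))
```

Printable bytes, including the single quote, pass through untouched here, which is why the result should still go through a parameterized query rather than being pasted into SQL text.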
Feb 6 '06 #2
Chason Hayes wrote:
I am trying to convert raw binary data to data with escaped octets in
order to store it in a bytea field on postgresql server. I could do this
easily in c/c++ but I need to do it in python. I am not sure how to read
and evaluate the binary value of a byte in a long string when it is a non
printable ascii value in python. I read some ways to use unpack from the
struct module, but i really couldn't understand where that would help. I
looked at the MIMIEncode module but I don't know how to convert the object
to a string. Is there a module that will convert the data? It seems to me
that this question must have been answered a million times before but I
can't find anything.

See http://www.postgresql.org/docs/8.1/i...pe-binary.html
for a description of the problem domain.

The URL you reference discusses how to represent arbitrary values
in string literals. If you already have the data in a Python string, the
best advice is to use a parameterized query - that way your Python DB
API module will do the escaping for you!

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006 www.python.org/pycon/

Feb 6 '06 #3
On Mon, 06 Feb 2006 13:39:17 +0000, Steve Holden wrote:
Chason Hayes wrote:
I am trying to convert raw binary data to data with escaped octets in
order to store it in a bytea field on postgresql server. I could do this
easily in c/c++ but I need to do it in python. I am not sure how to read
and evaluate the binary value of a byte in a long string when it is a non
printable ascii value in python. I read some ways to use unpack from the
struct module, but i really couldn't understand where that would help. I
looked at the MIMIEncode module but I don't know how to convert the object
to a string. Is there a module that will convert the data? It seems to me
that this question must have been answered a million times before but I
can't find anything.

See http://www.postgresql.org/docs/8.1/i...pe-binary.html
for a description of the problem domain.

The URL you reference is discussing how you represent arbitrary values
in string literals. If you already have the data in a Python string the
best advise is to use a parameterized query - that way your Python DB
API module will do the escaping for you!

regards
Steve


Thanks for the input. I tried that with a format string and a
dictionary, but I still received a database error indicating illegal
string values. This error went away completely when I used a test file
consisting only of text, but it reproduced every time with a true binary file.
If you can let me know where I am wrong, or show me a working code snippet with
a SQL insert that contains a variable holding raw binary data,
I would greatly appreciate it.

Chason

Feb 6 '06 #4
On Sun, 05 Feb 2006 21:07:23 -0800, Alex Martelli wrote:
Chason Hayes <ch*****@hotmail.com> wrote:
...
easily in c/c++ but I need to do it in python. I am not sure how to read
and evaluate the binary value of a byte in a long string when it is a non
printable ascii value in python.


If you have a bytestring (AKA plain string) s, the binary value of its
k-th byte is ord(s[k]).
Alex


Thank you very much. That is the function I was looking for to write
a filter.

Chason

Feb 6 '06 #5
Chason Hayes wrote:
On Mon, 06 Feb 2006 13:39:17 +0000, Steve Holden wrote:

[...]

The URL you reference is discussing how you represent arbitrary values
in string literals. If you already have the data in a Python string the
best advise is to use a parameterized query - that way your Python DB
API module will do the escaping for you!

regards
Steve

Thanks for the input. I tried that with a format string and a
dictionary, but I still received a database error indicating illegal
string values. This error went away completely when I used a test file
consisting only of text, but reproduced everytime with a true binary file.
If you can let me know where I am wrong or show me a code snippet with a
sql insert that contains a variable with raw binary data that works,
I would greatly appreciate it.

I tried and my experience was exactly the same, which made me think less
of PostgreSQL.

They don't seem to implement the SQL BLOB type properly, so it looks as
though that rebarbative syntax with all the backslashes is necessary. Sorry.

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006 www.python.org/pycon/

Feb 7 '06 #6
On Mon, 06 Feb 2006 04:40:31 GMT, Chason Hayes <ch*****@hotmail.com> wrote:
I am trying to convert raw binary data to data with escaped octets in
order to store it in a bytea field on postgresql server. I could do this
easily in c/c++ but I need to do it in python. I am not sure how to read
and evaluate the binary value of a byte in a long string when it is a non
printable ascii value in python. I read some ways to use unpack from the
struct module, but i really couldn't understand where that would help. I
looked at the MIMIEncode module but I don't know how to convert the object
to a string. Is there a module that will convert the data? It seems to me
that this question must have been answered a million times before but I
can't find anything.

Have you considered just encoding the data as text in hex or base64, e.g.,
>>> import binascii
>>> s = '\x00\x01\x02\x03ABCD0123'
>>> binascii.hexlify(s)
'000102034142434430313233'
>>> binascii.b2a_base64(s)
'AAECA0FCQ0QwMTIz\n'

which is also reversible later of course:

>>> h = binascii.hexlify(s)
>>> binascii.unhexlify(h)
'\x00\x01\x02\x03ABCD0123'
>>> b64 = binascii.b2a_base64(s)
>>> binascii.a2b_base64(b64)
'\x00\x01\x02\x03ABCD0123'
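
To get a feel for what each text encoding costs in storage, here is a small sketch. It assumes Python 3 (the session above is Python 2), and encoded_sizes is a hypothetical helper name. It compares the length of the raw data with its hex and base64 forms and checks that both encodings round-trip.

```python
import base64
import binascii


def encoded_sizes(data: bytes) -> dict:
    """Return the length of data in raw, hex, and base64 form (hypothetical helper)."""
    hex_text = binascii.hexlify(data)
    b64_text = base64.b64encode(data)
    # both encodings are reversible, so decoding must give back the input
    assert binascii.unhexlify(hex_text) == data
    assert base64.b64decode(b64_text) == data
    return {"raw": len(data), "hex": len(hex_text), "base64": len(b64_text)}


print(encoded_sizes(bytes(range(256)) * 3))
```

Hex doubles the size, while base64 adds roughly a third, which may matter when deciding between a text column and a bytea column.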

Regards,
Bengt Richter
Feb 7 '06 #7
On Tue, 07 Feb 2006 15:06:49 +0000, Bengt Richter wrote:
On Mon, 06 Feb 2006 04:40:31 GMT, Chason Hayes <ch*****@hotmail.com> wrote:
I am trying to convert raw binary data to data with escaped octets in
order to store it in a bytea field on postgresql server. I could do this
easily in c/c++ but I need to do it in python. I am not sure how to read
and evaluate the binary value of a byte in a long string when it is a non
printable ascii value in python. I read some ways to use unpack from the
struct module, but i really couldn't understand where that would help. I
looked at the MIMIEncode module but I don't know how to convert the object
to a string. Is there a module that will convert the data? It seems to me
that this question must have been answered a million times before but I
can't find anything.

Have you considered just encoding the data as text in hex or base64, e.g.,
>>> import binascii
>>> s = '\x00\x01\x02\x03ABCD0123'
>>> binascii.hexlify(s)
'000102034142434430313233'
>>> binascii.b2a_base64(s)
'AAECA0FCQ0QwMTIz\n'

which is also reversible later of course:

>>> h = binascii.hexlify(s)
>>> binascii.unhexlify(h)
'\x00\x01\x02\x03ABCD0123'
>>> b64 = binascii.b2a_base64(s)
>>> binascii.a2b_base64(b64)
'\x00\x01\x02\x03ABCD0123'

Regards,
Bengt Richter


I had just about come to that conclusion last night while I was working on
it. I was going to use
import base64
base64.encodestring(binarydata)
and
base64.decodestring(stringdata)

I then wasn't sure if I should still use the bytea field or just use a
text field.

Do you have a suggestion?

Feb 8 '06 #8
On Tue, 07 Feb 2006 01:58:00 +0000, Steve Holden wrote:
Chason Hayes wrote:
On Mon, 06 Feb 2006 13:39:17 +0000, Steve Holden wrote:

[...]

The URL you reference is discussing how you represent arbitrary values
in string literals. If you already have the data in a Python string the
best advise is to use a parameterized query - that way your Python DB
API module will do the escaping for you!

regards
Steve

Thanks for the input. I tried that with a format string and a
dictionary, but I still received a database error indicating illegal
string values. This error went away completely when I used a test file
consisting only of text, but reproduced everytime with a true binary file.
If you can let me know where I am wrong or show me a code snippet with a
sql insert that contains a variable with raw binary data that works,
I would greatly appreciate it.

I tried and my experience was exactly the same, which made me think less
of PostgreSQL.

They don't seem to implement the SQL BLOB type properly, so it looks as
though that rebarbative syntax with all the backslashes is necessary. Sorry.

regards
Steve


With regard to escaping data parameters, I have found that I have to
specifically add quotes to my strings for them to be understood by
PostgreSQL. For example:

import base64

ifs = open("binarydatafile", "rb")  # binary mode, so the bytes are read unchanged
binarydata = ifs.read()
ifs.close()
stringdata = base64.encodestring(binarydata)

# does not work: the value lands in the SQL text without quotes
cursor.execute("insert into binarytable values(%s)" % stringdata)

# need to do this first
newstringdata = "'" + stringdata + "'"

Then the insert statement works.
Is this expected behavior? Is there a better way of doing this?

thanks for any insight
Chason
Feb 8 '06 #9
Chason Hayes wrote:
On Tue, 07 Feb 2006 01:58:00 +0000, Steve Holden wrote:

Chason Hayes wrote:
On Mon, 06 Feb 2006 13:39:17 +0000, Steve Holden wrote:


[...]
The URL you reference is discussing how you represent arbitrary values
in string literals. If you already have the data in a Python string the
best advise is to use a parameterized query - that way your Python DB
API module will do the escaping for you!

regards
Steve
Thanks for the input. I tried that with a format string and a
dictionary, but I still received a database error indicating illegal
string values. This error went away completely when I used a test file
consisting only of text, but reproduced everytime with a true binary file.
If you can let me know where I am wrong or show me a code snippet with a
sql insert that contains a variable with raw binary data that works,
I would greatly appreciate it.


I tried and my experience was exactly the same, which made me think less
of PostgreSQL.

They don't seem to implement the SQL BLOB type properly, so it looks as
though that rebarbative syntax with all the backslashes is necessary. Sorry.

regards
Steve

with regards to escaping data parameters I have found that I have to
specifically add quotes to my strings for them to be understood by
pstgresql. For example

ifs=open("binarydatafile","r")
binarydata=ifs.read()
stringdata=base64.encodestring(binarydata)

#does not work
cursor.execute("insert into binarytable values(%s)" % stringdata)

#need to do this first
newstringdata = "'" + stringdata + "'"

then the select statment works.
Is this expected behavior? Is there a better way of doing this?

thanks for any insight


Yes, parameterize your queries. I assume you are using psycopg or
something similar to create the database connection (i.e. something
that expects the "%s" parameter style - there are other options, but we
needn't discuss them here).

The magic incantation you seek is:

cursor.execute("insert into binarytable values(%s)", (stringdata, ))

Note that here there are TWO arguments to the .execute() method. The
first is a parameterized SQL statement, and the second is a tuple of
data items, one for each parameter mark in the SQL.

Using this technique all necessary quoting (and even data conversion
with a good database module) is performed inside the database driver,
meaning (among other things) that your program is no longer vulnerable
to the dreaded SQL injection errors.

This is the technique I was hoping would work with the bytea datatype,
but alas it doesn't. ISTM that PostgreSQL needs a bit of work there,
even though it is otherwise a very polished product.
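
As a runnable illustration of the two-argument execute() call, here is a minimal sketch that uses the standard-library sqlite3 module as a stand-in, since it needs no server. This is an assumption made for the sake of a self-contained example, not the PostgreSQL setup discussed in this thread. sqlite3 uses the "?" parameter mark rather than "%s", and store_and_load is a hypothetical helper name.

```python
import sqlite3


def store_and_load(data: bytes) -> bytes:
    """Insert data through a parameterized query and read it back (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute("create table binarytable (payload blob)")
        # two arguments: the SQL with a parameter mark, and a tuple of values
        conn.execute("insert into binarytable values (?)", (data,))
        row = conn.execute("select payload from binarytable").fetchone()
        return bytes(row[0])
    finally:
        conn.close()


payload = bytes(range(256)) + b"'; drop table binarytable; --"
print(store_and_load(payload) == payload)
```

The payload deliberately contains every byte value plus a quote and SQL-looking text; because the driver binds it as data, none of it is interpreted as SQL. With psycopg2 against PostgreSQL, wrapping the value in psycopg2.Binary is the documented way to pass a bytea parameter.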

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006 www.python.org/pycon/

Feb 8 '06 #10
On Wed, 08 Feb 2006 00:57:45 -0500, Steve Holden wrote:
Chason Hayes wrote:
On Tue, 07 Feb 2006 01:58:00 +0000, Steve Holden wrote:

Chason Hayes wrote:

On Mon, 06 Feb 2006 13:39:17 +0000, Steve Holden wrote:

[...]

>The URL you reference is discussing how you represent arbitrary values
>in string literals. If you already have the data in a Python string the
>best advise is to use a parameterized query - that way your Python DB
>API module will do the escaping for you!
>
>regards
> Steve
Thanks for the input. I tried that with a format string and a
dictionary, but I still received a database error indicating illegal
string values. This error went away completely when I used a test file
consisting only of text, but reproduced everytime with a true binary file.
If you can let me know where I am wrong or show me a code snippet with a
sql insert that contains a variable with raw binary data that works,
I would greatly appreciate it.
I tried and my experience was exactly the same, which made me think less
of PostgreSQL.

They don't seem to implement the SQL BLOB type properly, so it looks as
though that rebarbative syntax with all the backslashes is necessary. Sorry.

regards
Steve

with regards to escaping data parameters I have found that I have to
specifically add quotes to my strings for them to be understood by
pstgresql. For example

ifs=open("binarydatafile","r")
binarydata=ifs.read()
stringdata=base64.encodestring(binarydata)

#does not work
cursor.execute("insert into binarytable values(%s)" % stringdata)

#need to do this first
newstringdata = "'" + stringdata + "'"

then the select statment works.
Is this expected behavior? Is there a better way of doing this?

thanks for any insight


Yes, parameterize your queries. I assume you are using psycopg or
something similar to create the database connection (i.e. I something
that expects the "%s" parameter style - there are other options, but we
needn't discuss them here).

The magic incantation you seek is:

cursor.execute("insert into binarytable values(%s)", (stringdata, ))

Note that here there are TWO arguments to the .execute() method. The
first is a parameterized SQL statement, and the second is a tuple of
data items, one for each parameter mark in the SQL.

Using this technique all necessary quoting (and even data conversion
with a good database module) is performed inside the database driver,
meaning (among other things) that your program is no longer vulnerable
to the dreaded SQL injection errors.

This is the technique I was hoping would work with the bytea datatype,
but alas it doesn't. ISTM that PostgreSQL needs a bit of work there,
even though it is otherwise a very polished product.

regards
Steve


That was it. Thanks for your great help.

Chason

Feb 9 '06 #11
