468,292 Members | 1,493 Online
Bytes | Developer Community
New Post

Home Posts Topics Members FAQ

Post your question to a community of 468,292 developers. It's quick & easy.

sql escaping module

Being new to pgdb, I'm finding there are lot of things I don't understand
when I read the PEP and the sparse documentation on pgdb.

I was hoping there would be a module that would properly escape longer text
strings to prevent sql injection -- and other things just make sure the
python string object ends up being a properly type for postgresql. I've
bought 3 books on postgresql and none of th code samples demonstrate this.

web searchs for 'python sql escape string' yeild way too many results.

Any pointers would be greatly appreciated.

--
David Bear
-- let me buy your intellectual property, I want to own your thoughts --
Dec 8 '05 #1
10 6337
David Bear wrote:
Being new to pgdb, I'm finding there are lot of things I don't understand
when I read the PEP and the sparse documentation on pgdb.

I was hoping there would be a module that would properly escape longer text
strings to prevent sql injection -- and other things just make sure the
python string object ends up being a properly type for postgresql. I've
bought 3 books on postgresql and none of th code samples demonstrate this.

web searchs for 'python sql escape string' yeild way too many results.

Any pointers would be greatly appreciated.


for x in range(1000000):
print "USE PARAMETERS TO PASS VALUES TO THE DATABASE"

</F>

Dec 8 '05 #2
Fredrik Lundh wrote:
web searchs for 'python sql escape string' yeild way too many results.

Any pointers would be greatly appreciated.


for x in range(1000000):
print "USE PARAMETERS TO PASS VALUES TO THE DATABASE"


for an example, see "listing 2" in the following article:

http://www.amk.ca/python/writing/DB-API.html

(the database used in that example uses the "?" parameter style. your database
may prefer another style; check the paramstyle variable. see the DB API spec
for a descriptoin)

(a linuxjournal version of that article is linked from the pygresql site)

</F>

Dec 8 '05 #3
David Bear <da********@asu.edu> wrote:
Being new to pgdb, I'm finding there are lot of things I don't understand
when I read the PEP and the sparse documentation on pgdb.

I was hoping there would be a module that would properly escape longer text
strings to prevent sql injection -- and other things just make sure the
python string object ends up being a properly type for postgresql. I've
bought 3 books on postgresql and none of th code samples demonstrate this.


All of the Python database modules will do this protection for you.
Example:

db = psycopg2.connect(database='dbname')
c = db.cursor()
c.execute( "INSERT INTO table1 VALUES (%s,%s,%s);", (var1, var2, var3) )

Note that I have used a "comma", not the Python % operator, and I have not
done any quoting in the query. By doing that, I am instructing the
database module to do whatever protective quoting may be required for the
values I have passed, and substitute the quoted values into the string.

As long as you use that scheme, you should be safe from injection. It's
only when people try to do it themselves that they get in trouble, as in:

c.execute( "INSERT INTO table1 VALUES ('%s','%s','%s');" % (var1, var2,
var3) ) # THIS IS WRONG

--
- Tim Roberts, ti**@probo.com
Providenza & Boekelheide, Inc.
Dec 8 '05 #4
David Bear wrote:
Being new to pgdb, I'm finding there are lot of things I don't understand
when I read the PEP and the sparse documentation on pgdb.

I was hoping there would be a module that would properly escape longer text
strings to prevent sql injection -- and other things just make sure the
python string object ends up being a properly type for postgresql. I've
bought 3 books on postgresql and none of th code samples demonstrate this.

web searchs for 'python sql escape string' yeild way too many results.

Any pointers would be greatly appreciated.


I think I know where David is coming from, as I also battled to
understand this. I think that I have now 'got it', so I would like to
offer my explanation.

I used to think that each DB-API module transformed the 'string +
parameters' into a valid SQL command before passing it to the db.
However, this is not what is happening.

Every modern database provides an API to allow applications to interact
with the database programmatically. Typically these are intended for C
programs, but other languages may be supported. The authors of the
various DB-API modules provide a python wrapper around this to allow
use from a python program.

Each of the API's includes the capability of passing commands in the
form of 'string + parameters' directly into the database. This means
that the data values are never embedded into the SQL command at all,
and therefore there is no possibility of injection attacks.

The various API's use different syntaxes for passing the parameters. It
would have been nice if the DB-API had specified one method, and left
it to the author of each module to transform this into the form
required by the underlying API. Unfortunately the DB-API allows a
choice of 'paramstyles'. There may be technical reasons for this, but
it does make supporting multiple databases awkward.

Frank Millman

Dec 8 '05 #5
Frank Millman wrote:
Each of the API's includes the capability of passing commands in the
form of 'string + parameters' directly into the database. This means
that the data values are never embedded into the SQL command at all,
and therefore there is no possibility of injection attacks.
another advantage with parameters is that if you do multiple operations which
differ only in parameters, the database may skip the SQL compilation and query
optimization passes.
The various API's use different syntaxes for passing the parameters. It
would have been nice if the DB-API had specified one method, and left
it to the author of each module to transform this into the form
required by the underlying API. Unfortunately the DB-API allows a
choice of 'paramstyles'. There may be technical reasons for this, but
it does make supporting multiple databases awkward.


agreed.

on the other hand, it shouldn't be that hard to create a function does this mapping
on the fly, so that drivers can be updated support any paramstyle... time for a DB
API 3.0 specification, perhaps ?

(I'd also like to see a better granularity; the current connection/cursor model is a
bit limited; a connection/cursor/query/result set model would be nicer, but I guess
ODBC gets in the way here...)

</F>

Dec 8 '05 #6
Fredrik Lundh wrote:
Frank Millman wrote:

Each of the API's includes the capability of passing commands in the
form of 'string + parameters' directly into the database. This means
that the data values are never embedded into the SQL command at all,
and therefore there is no possibility of injection attacks.

another advantage with parameters is that if you do multiple operations which
differ only in parameters, the database may skip the SQL compilation and query
optimization passes.

The various API's use different syntaxes for passing the parameters. It
would have been nice if the DB-API had specified one method, and left
it to the author of each module to transform this into the form
required by the underlying API. Unfortunately the DB-API allows a
choice of 'paramstyles'. There may be technical reasons for this, but
it does make supporting multiple databases awkward.

agreed.

indeed. I suspect (not having been involved in the decisions) that the
variations were to minimise the work module implementers had to do to
get their modules working.
on the other hand, it shouldn't be that hard to create a function does this mapping
on the fly, so that drivers can be updated support any paramstyle... time for a DB
API 3.0 specification, perhaps ?
It would be a little tricky to convert name-based ("named" and
"pyformat", requiring a data mapping) parameterizations to positional
ones ("qmark", "numeric" and "format", requiring a data sequence) and
vice versa. It's probably a worthwhile effort, though.
(I'd also like to see a better granularity; the current connection/cursor model is a
bit limited; a connection/cursor/query/result set model would be nicer, but I guess
ODBC gets in the way here...)

Yes, it would at least be nice to include some of the more advanced ways
of presenting query results.

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006 www.python.org/pycon/

Dec 8 '05 #7
Fredrik Lundh wrote:
David Bear wrote:
Being new to pgdb, I'm finding there are lot of things I don't understand
when I read the PEP and the sparse documentation on pgdb.

I was hoping there would be a module that would properly escape longer
text strings to prevent sql injection -- and other things just make sure
the python string object ends up being a properly type for postgresql.
I've bought 3 books on postgresql and none of th code samples demonstrate
this.

web searchs for 'python sql escape string' yeild way too many results.

Any pointers would be greatly appreciated.


for x in range(1000000):
print "USE PARAMETERS TO PASS VALUES TO THE DATABASE"

</F>

Yes. Fredrik and others. Thank you for the advice.

I know have the following code:

...
parmChar = '%s'
sqlInsert = """INSERT INTO %s (%s) VALUES (%s); """ % (tn, ",
".join(fieldnames), ", ".join([parmChar] * len(fieldnames)))
try:
cursor.execute(sqlInsert, datum)
except pgdb.DatabaseError:
logerror("Error on record insert \n %s \n %s" % (sqlInsert,
traceback.print_exc()))

I was not aware that the python db interface would just handle proper
escaping of python data types to proper postgresql data types.

Any other hints on database programming much appreciated.

--
David Bear
-- let me buy your intellectual property, I want to own your thoughts --
Dec 8 '05 #8
Steve Holden wrote: Fredrik Lundh wrote:
Frank Millman wrote:

Each of the API's includes the capability of passing commands in the
form of 'string + parameters' directly into the database. This means
that the data values are never embedded into the SQL command at all,
and therefore there is no possibility of injection attacks.



My news server didn't get Franks initial post to the group, so I'm glad that
Steve included it in his followup.

The statement above can cause relief or pain. Letting the DBAPI handle
proper string escapes, formating, etc., is a big relief. However, I am
still wondering what happens under the covers. If I have a string '1\n'
that I've read from some source and I really intend on inserting it into
the data base as a number 1, if the tape column it goes into is of type int
or num or float, will the DBAPI really know what to do with the newline?

--
David Bear
-- let me buy your intellectual property, I want to own your thoughts --
Dec 8 '05 #9

David Bear wrote:

The statement above can cause relief or pain. Letting the DBAPI handle
proper string escapes, formating, etc., is a big relief. However, I am
still wondering what happens under the covers. If I have a string '1\n'
that I've read from some source and I really intend on inserting it into
the data base as a number 1, if the tape column it goes into is of type int
or num or float, will the DBAPI really know what to do with the newline?


Try it and see. This is what I get -
import psycopg2
db = psycopg2.connect(database='mydb')
c = db.cursor()
c.execute('create table xxx (col1 int)')
c.execute('insert into xxx values (%s)', '1')
c.execute('insert into xxx values (%s)', 'a') psycopg.ProgrammingError: invalid input syntax for integer: "a" c.execute('insert into xxx values (%s)', '1\n')

TypeError: not all arguments converted during string formatting

Different DBAPI modules may handle it differently.

Frank

Dec 9 '05 #10
David Bear wrote:
Steve Holden wrote:


Fredrik Lundh wrote:
Frank Millman wrote:

Each of the API's includes the capability of passing commands in the
form of 'string + parameters' directly into the database. This means
that the data values are never embedded into the SQL command at all,
and therefore there is no possibility of injection attacks.


My news server didn't get Franks initial post to the group, so I'm glad that
Steve included it in his followup.

The statement above can cause relief or pain. Letting the DBAPI handle
proper string escapes, formating, etc., is a big relief. However, I am
still wondering what happens under the covers. If I have a string '1\n'
that I've read from some source and I really intend on inserting it into
the data base as a number 1, if the tape column it goes into is of type int
or num or float, will the DBAPI really know what to do with the newline?

Yes. If you read the DB API documentation
(http://www.python.org/peps/pep-0249.html) you will see that there's a
section on "Type Objects and Constructors". It's those that ensure a
value will be coerced into the required form if possible.

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006 www.python.org/pycon/

Dec 9 '05 #11

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

4 posts views Thread by Dave Moore | last post: by
5 posts views Thread by Aloysio Figueiredo | last post: by
8 posts views Thread by felciano | last post: by
5 posts views Thread by bobbyballgame | last post: by
11 posts views Thread by Geoff Caplan | last post: by
3 posts views Thread by Taras_96 | last post: by
46 posts views Thread by C C++ C++ | last post: by
reply views Thread by Teichintx | last post: by
By using this site, you agree to our Privacy Policy and Terms of Use.