473,856 Members | 1,710 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

[Re] Checking each item in m.group()?

Hello

I need to go through each line of a CSV file, and extract some fields
using a regex. Then, I need to check each retrieved field, and if it
looks like "", turn this into NULL so that it's correct SQL.

I tried a few things, but still can't it working:
========
#Second field might be empty -""
#"Col1",""
#"Col1","Col 2"
p = re.compile('^"( .+?)","(.*?)"$' )

for line in textlines:
m = p.search(line)
if m:
#Check each column : if '', then turn into NULL

"""
for col in line:
if col == "":
col = "NULL"
"""

"""
for col in m.group():
if col == "":
col="NULL"
"""

"""
for col in m.group(0):
if col == "":
col="NULL"
"""

"""
for i in range (0,len(line)):
if line[i] == "":
line[i]="NULL"
"""

"""
for i in range(1,len(m.g roup(0))):
if m.group(i) == "":
m.group(i)="NUL L"
"""

sql = "INSERT INTO mytable (col1, col2) VALUES
('%s','%s')" % (m.group(1),m.g roup(2))
print sql
f.close()
========

Does someone know the correct syntax?

Thank you.
Jun 27 '08 #1
7 1837
On Jun 2, 11:42 am, "nos...@nospam. com" <Gilles@wrote :
Hello

I need to go through each line of a CSV file, and extract some fields
using a regex. Then, I need to check each retrieved field, and if it
looks like "", turn this into NULL so that it's correct SQL.

I tried a few things, but still can't it working:
========
#Second field might be empty -""
#"Col1",""
#"Col1","Col 2"
p = re.compile('^"( .+?)","(.*?)"$' )

for line in textlines:
m = p.search(line)
if m:
#Check each column : if '', then turn into NULL

"""
for col in line:
if col == "":
col = "NULL"
"""

"""
for col in m.group():
if col == "":
col="NULL"
"""

"""
for col in m.group(0):
if col == "":
col="NULL"
"""

"""
for i in range (0,len(line)):
if line[i] == "":
line[i]="NULL"
"""

"""
for i in range(1,len(m.g roup(0))):
if m.group(i) == "":
m.group(i)="NUL L"
"""

sql = "INSERT INTO mytable (col1, col2) VALUES
('%s','%s')" % (m.group(1),m.g roup(2))
print sql
f.close()
========

Does someone know the correct syntax?

Thank you.
I think you want to use 'groups' instead of 'group'.

Here is a brief example. Note that this code is very insecure and
susceptible to a SQL injection attack. Hopefully these csv files are
from a trusted source.

sql = "INSERT INTO mytable (col1, col2) VALUES ('%s','%s')"%tu ple(
(c, "NULL")[c == ''] for c in m.groups()
)

Also, check out the csv module for parsing your csv file.

Matt
Jun 27 '08 #2
On Jun 2, 5:06*pm, Peter Otten <__pete...@web. dewrote:
You are taking the wrong approach here.

Don't build SQL statements as strings; you are enabling the next SQL
injection attack. Pass parameters using the DB API instead.

Don't use regular expressions to parse a CSV file. Python's csv module is
more likely to deal correctly with the quirks of that standard.
I'd like to second both these statements. Regardless of whether these
CSV files are from a trusted source or not, it's a virtual truism of
programming that eventually, any application will be used in ways it
was not intended. Since using a parameterized query is a simple way
to avoid a common security hole, even if such a thing could never be
exploited by the app in its current configuration, you should do
things the Right Way. That way, even if your code is twisted to some
other use in the future, it's less likely to cause problems.
Jun 27 '08 #3
On Mon, 2 Jun 2008 12:06:21 -0700 (PDT), Matimus <mc******@gmail .com>
wrote:
>Here is a brief example. Note that this code is very insecure and
susceptible to a SQL injection attack. Hopefully these csv files are
from a trusted source.
Yes they are, and this script will only run on my PC, so it doesn't
need to be more secure than this.
>sql = "INSERT INTO mytable (col1, col2) VALUES ('%s','%s')"%tu ple(
(c, "NULL")[c == ''] for c in m.groups()
)
I don't understand this syntax :-/
>Also, check out the csv module for parsing your csv file.
Will do. Thank you.
Jun 27 '08 #4
mi***********@g mail.com wrote:
On Jun 2, 5:06 pm, Peter Otten <__pete...@web. dewrote:
>You are taking the wrong approach here.

Don't build SQL statements as strings; you are enabling the next SQL
injection attack. Pass parameters using the DB API instead.

Don't use regular expressions to parse a CSV file. Python's csv module is
more likely to deal correctly with the quirks of that standard.

I'd like to second both these statements. Regardless of whether these
CSV files are from a trusted source or not, it's a virtual truism of
programming that eventually, any application will be used in ways it
was not intended. Since using a parameterized query is a simple way
to avoid a common security hole, even if such a thing could never be
exploited by the app in its current configuration, you should do
things the Right Way. That way, even if your code is twisted to some
other use in the future, it's less likely to cause problems.
I don't have a problem with a response saying "it's
a good idea to use parameterized queries and explaining
why", but I have seen way too many responses like this
which are basically FUD.

I'm not sure what a "virtual" truism is, but if it is
like a truism, it's not true. There are many cases
where one can accurately predict that the code will
not be used in the future in some different app.
I don't know what the OP was doing, but I have done
many data conversion jobs where I have done things
similar to the OP. The jobs were one-time, move data
from system A to system B (with some munging in between)
and I could and did predict the conversion code would
not get reused. My accuracy rate is 100%.
And if you do reuse code where you feed it untrusted
input in a security sensitive context,
and you don't bother to verify the security of said,
code, you already have so many problems, one more
probably won't make much difference.

To the OP:
The advice to use parameterized queries is good
but overstated. There are cases when it is quite
safe to use non-parameterized statements:
* When you control the data going into the query
)e.g., you've generated it yourself).
* When the data come from trusted sources (including
something like sys.input if the only people with
access to the program are trusted).
* When you can reliably check the data yourself,
for example in:
sql = "SELECT * FROM foo WHERE id=%d" % int(some_string )
cursor.execute (sql)
it doesn't really matter what "some_strin g" contains
(if you are prepared for a Python exception). But
note that checking and escaping strings in more general
or complicated cases can be quite tricky.)

In most cases a good reason to use a parameterized query
is that it is no harder than to not use one, so why
not and get the additional safety for free? A parameterized
query can often run faster than a non-parameterized one
since the database can reuse the cached compiled query.
(But sometimes the opposite is true, see below).
A parameterized form of the above is:

sql = "SELECT * FROM foo WHERE id=?" % int(some_string )
cursor.execute (sql, int(some_string ))

so it is just as easy. There are times though when
it is slightly harder. If idnums is an arbitrary list
of ints:

sql = "SELECT * FROM foo WHERE id IN(%s) % ','.join(idnums )
cursor.execute (sql)

Using a parameterized query might look like:

sql = "SELECT * FROM foo WHERE id IN(%s) %
','.join(['?']*len(idnums))
cursor.execute (sql, idnums)

When you have written such code a few times it becomes
a natural idiom, but if you only do so occasionally,
you are in a hurry, and the conditions above apply,
then there is no reason not to go with the first form.

And if you you already have a text string of comma-separated
digits, the ease of using the direct sql form becomes
even greater:

sql = "SELECT * FROM foo WHERE id IN(%s) % idnums_string
cursor.execute (sql)

But of course, if "idnums_strings " came from an untrusted
source, then you need to validate it first, e.g.:

if idnums_string.s trip("012345678 9 ,"): then raise Error

There are also times when using a parameterized query
can dramatically (and I mean two or three *orders of
magnitude*) slow down your query when using prepared
queries. For example:

sql = "SELECT * FROM foo WHERE txt LIKE "%s%%" % some_string
cursor.execute (sql)

can be expected to run quite quickly in most database
systems, since the database knows that the searched for
text starts with a constant string and can thus use an
index. The parameterized form:

sql = "SELECT * FROM foo WHERE txt LIKE ?"
cursor.execute (sql, [some_string + "%"])

will often run very very slowly, because when the query
is prepared the database has no idea if the argument
will start with a constant string of not, and thus can
only assume the worst, and prepare the query so that
it doesn't use an index.

The bottom line is, as in all things, understanding
the issues will lead to much better decisions than
blindly following some dumbed down advice like,
"always use parameterized queries".

(And to the OP. if you already know all this, my
apologies if I sound like I'm talking down to you,
but perhaps other people may benefit. I get tired
of reading simplistic "do X, period." responses
sometimes.)
Jun 27 '08 #5
Gilles Ganault wrote:
On Mon, 2 Jun 2008 12:06:21 -0700 (PDT), Matimus <mc******@gmail .com>
wrote:
>Here is a brief example. Note that this code is very insecure ....
>sql = "INSERT INTO mytable (col1, col2) VALUES ('%s','%s')"%tu ple(
(c, "NULL")[c == ''] for c in m.groups()
)
I don't understand this syntax :-/
(c, "NULL") is a tuple; it is being indexed by the boolean "c == ''"
Since False is 0, and True is 1, the expression picks out "NULL"
exactly when c is the zero-length string.

A more idiomatic Python way of writing this (for the very special case
of '' or 0 or 0.0, or ...) is
sql = "INSERT INTO mytable (col1, col2) VALUES ('%s','%s')" % tuple(
(c or "NULL") for c in m.groups())

You can avoid problems w/ possible 0s or 0.0s or .. by using:
(str(c) or "NULL") for c in groups())

the "or" (or a similar "and" trick) will continue to work. The
different "empty" or "nothing" values of a data type are treated as
false in part to allow such shenanigans. If used sparingly, it make
your code clearer, but over-use can make the reader scratch his head in
wonder.

--Scott David Daniels
Sc***********@A cm.Org
Jun 27 '08 #6
On Mon, 02 Jun 2008 17:49:11 -0700, Scott David Daniels
<Sc***********@ Acm.Orgwrote:
>(c, "NULL") is a tuple; it is being indexed by the boolean "c == ''"
Since False is 0, and True is 1, the expression picks out "NULL"
exactly when c is the zero-length string.
Thanks Scott, and also to Peter above, and sorry for not having
answered earlier.

I'm having two problems: The first code doesn't strip the
double-quotes when the string is empty. IOW, "not empty" or NULL, but
not "NULL"

The second code hits "IndexError : list index out of range" because of
Yield() which I don't seem to be using correctly:

==========
p = re.compile("^(\ d+)\t(.*?)")
for line in textlines:
m = p.search(line)
if m:
sql = sql + 'INSERT INTO mytable (col1,col2) VALUES
("%s","%s"); ' % tuple ((c,"NULL")[c == ''] for c in m.groups())

#cursor.execute (sql)
connection.clos e(True)
print sql
==========
import csv
import sqlite3 as sqlite

def records(infile) :
for row in csv.reader(infi le):
#IndexError: list index out of range
#BAD yield row[0], row[1] or None
#BAD yield row[0] or None, row[1] or None

def main():
db = sqlite.connect( "test.sqlit e")
cursor = db.cursor()

#How to handle empty columns, ie. <TAB><TAB>?
cursor.executem any("insert into mytable (col1,col2) values (?,?);",
records("test.s qlite"))

if __name__ == "__main__":
main()
==========

Thank you.
Jun 27 '08 #7
On Sun, 15 Jun 2008 18:15:38 -0700, Dennis Lee Bieber
<wl*****@ix.net com.comwrote:
> I don't know quite what the reason for the sql = sql + ... is -- if
you are trying to package more than one discrete statement into a single
query you should be advised that not all adapters/DBMS support that
function (I think one uses "executescript( )" to signify multiple
distinct statements.
The script will go through about 3,000 lines of text, so I wanted to
create a transaction with BEGIN/COMMIT. It seems like APSW (the
simpler version of PySQLite) doesn't do transactions silently.
>A decent adapter should convert Python's None object into a proper DBMS
Null. The adapter is also responsible for doing any needed quoting or
escaping of the data supplied, hence no quoting of the placeholders!
Thanks for the tip. However, after looking at the code you gave, I'm
getting an error when running cur.execute(), so it is only ran once
and the program exits:

=========
import sys, re, apsw, os

connection=apsw .Connection("te st.sqlite")
cursor=connecti on.cursor()

textlines = []
textlines.appen d("123\titem1\t item2\titem3\ti tem4\t345\titem 6")
textlines.appen d("123\titem1\t \titem3\titem4\ t345\titem6")

p = re.compile("^(\ d+)\t(.*?)\t(.* ?)\t(.*?)\t(.*? )\t(\d+)\t(.+?) $")
for line in textlines:
m = p.search(line)
if m:
sql = 'INSERT INTO test (col1,col2,col3 ,col4,col5,col6 ,col7)
VALUES (?,?,?,?,?,?,?) ;'

"""
cursor.execute( sql, tuple((c, None)[c == ""] for c in m.groups()))
File "apsw.c", line 3518, in resetcursor
apsw.Constraint Error: ConstraintError : not an error
apsw.Connection NotClosedError: apsw.Connection on "test.sqlit e". The
destructor has been called, but you haven't closed the connection. All
connections must be explicitly closed. The SQLite database object is
being leaked.
"""
cursor.execute( sql, tuple((c, None)[c == ""] for c in m.groups()))

connection.comm it()
connection.clos e(True)

=========

I read the online sample for APSW, but didn't find what it could be.
Any idea?

Thank you.
Jun 27 '08 #8

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

1
3039
by: sarah | last post by:
Hi.. I'm having a problem trying to figure out the best way to process a POST and add items to my database, and hope someone can help me... :) Background: I have a database that I will use to hold Purchases of Items Each Item purchased is stored in it's own record: ....etc...
31
2517
by: AlexeiOst | last post by:
Everywhere in documentation there are recommendations to use threads from thread pooling for relatively short tasks. As I understand, fetching a page or multiple pages (sometimes up to 50 but not tipical) from the Internet and doing some processing on those would be considered to be a big/long task for a thread from a pool. In our app it is possible to break the task into some small ones (thread per fetch and processing thereafter or event...
8
2646
by: Mikey | last post by:
I have an XML document as follows: <Menu> <Group> <Item Text="About Us" AccessRoles="All"> <Group> <Item Text="Option 1" AccessRoles="All" /> <Item Text="Option 2" AccessRoles="All" /> <Item Text="Option 3" AccessRoles="All" /> </Group>
13
5069
by: Jason Huang | last post by:
Hi, Would someone explain the following coding more detail for me? What's the ( ) for? CurrentText = (TextBox)e.Item.Cells.Controls; Thanks. Jason
7
1874
by: Bruno Alexandre | last post by:
Hi guys, Sorry about the off topic, but I can't find the ASP group just the ASP dot NET If I want to block a user to change a form after submiting, I add a function that will disable the submit button, but when I'm getting the form collection (using post or get (form/querystring) I can't retrieve that
4
11504
by: Ron | last post by:
I've got a listbox that holds a list of groups. Users can select a group, hit the remove button and the group should be removed from the listbox. The only problem is that no matter which group you select, the first one in the listbox is always removed.(The listitem with an index of 0. Box is set to single selection mode) I've looked at multiple examples and they all do it this way. What's wrong? (variables are also being set to the values...
4
8967
by: deko | last post by:
Basic question about checking the value of Toggle/Check/Combo/OptionGroups.... Checking like this: If Me!chkCheckBox Then ... End If and like this:
26
4516
by: Tom Becker | last post by:
Is there a way, from Access, to programmatically click the Send and Receive button in Outlook?
5
2990
by: BMeyer | last post by:
I have been losing my mind trying to parse an XML document (with nested child elements, not all of which appear in each parent node) into a DataGrid object. What I want to do is "flatten" the XML document into a text document with a single row for each parent node (that has all of the values from all of the child nodes for that row) The DataView within VS 2005 IDE displays my 15 or so child tables - and knows that some parent rows...
2
2086
by: Ivor Somerset | last post by:
Hi, I've an Access DB table ("Groups") where data are as follow: Id Group Rank Item 1 1 1 7364 2 1 2 283 3 1 3 34888 4 2 1 277 5 2 2 8233
0
9906
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
9758
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
11051
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
10379
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
9528
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
0
7088
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
1
4571
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
2
4171
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
3
3196
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.