
Psycopg and threads problem

Hello,

I'm using psycopg to insert records in a number of threads. After the
threads finish, another thread runs to collect the inserted data. Now,
my problem is that psycopg lets my threads end before the inserts
actually took place, resulting in my collecting thread finding no
records... They are inserted after it checks.

Is there a way to tell psycopg or python to wait until the insert took
place?

Alban.
Jul 18 '05 #1


On Mon, 2004-09-20 at 14:43 +0200, Alban Hertroys wrote:
Hello,

I'm using psycopg to insert records in a number of threads. After the
threads finish, another thread runs to collect the inserted data. Now,
my problem is that psycopg lets my threads end before the inserts
actually took place, resulting in my collecting thread finding no
records... They are inserted after it checks.

Is there a way to tell psycopg or python to wait until the insert took
place?


I believe turning autocommit on in the connection would achieve this (at
the expense of performance and transaction support). You could also add
a "commit" SQL statement to each thread.

Why not have the inserting threads notify the collecting thread that it's
okay to start working? If you have to wait until the data is completely
inserted anyway, have your final thread perform the "commit" and then
set a threading.Event or some such.
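
Something like this, roughly, with a single inserting thread for brevity
(the table, DSN, and row data here are made up; the point is just
commit-then-signal):

import threading
import psycopg

done = threading.Event()   # set once the inserted data has been committed

def inserter(dsn, rows):
    conn = psycopg.connect(dsn)        # dsn is a placeholder, e.g. "dbname=test"
    cur = conn.cursor()
    for r in rows:
        cur.execute("INSERT INTO items (val) VALUES (%s)", (r,))
    conn.commit()                      # make the rows visible to other connections
    conn.close()
    done.set()                         # tell the collecting thread it may start

def collector(dsn):
    done.wait()                        # block until the inserts are committed
    conn = psycopg.connect(dsn)
    cur = conn.cursor()
    cur.execute("SELECT count(*) FROM items")
    print cur.fetchone()[0]
    conn.close()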

Regards,
Cliff

--
Cliff Wells <cl************@comcast.net>

Jul 18 '05 #2

Alban Hertroys <al***@magproductions.nl> writes:
I'm using psycopg to insert records in a number of threads. After the
threads finish, another thread runs to collect the inserted data. Now,
my problem is that psycopg lets my threads end before the inserts
actually took place, resulting in my collecting thread finding no
records... They are inserted after it checks.

I just wrote a small test program and was unable to duplicate your
problem -- the inserting threads ran, then the collecting thread
saw the data.

Without seeing your code we can only guess at what's wrong. Please
post the simplest program that reproduces the problem.

What versions of Python, psycopg, and PostgreSQL are you using?
What operating system and version are you running on?

Is there a way to tell psycopg or python to wait until the insert took
place?


In my test program, the inserting threads called conn.commit() and
then exited. I then joined all of the inserting threads before
starting the collecting thread. You could also use a condition
variable to signal when the inserts have been committed.
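
Condensed, the approach looks something like this (table name, DSN, and
data are just placeholders):

import threading
import psycopg

DSN = "dbname=test"    # placeholder connection string

def insert_some(rows):
    conn = psycopg.connect(DSN)
    cur = conn.cursor()
    for r in rows:
        cur.execute("INSERT INTO t (val) VALUES (%s)", (r,))
    conn.commit()              # commit before the thread exits
    conn.close()

inserters = [threading.Thread(target=insert_some, args=([i],))
             for i in range(5)]
for t in inserters:
    t.start()
for t in inserters:
    t.join()                   # wait until every insert has been committed

def collect():
    conn = psycopg.connect(DSN)
    cur = conn.cursor()
    cur.execute("SELECT count(*) FROM t")
    print cur.fetchone()[0]    # the collecting thread sees all the rows
    conn.close()

threading.Thread(target=collect).start()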

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/
Jul 18 '05 #3

Alban Hertroys wrote:
my problem is that psycopg lets my threads end before the inserts
actually took place, resulting in my collecting thread finding no


Make sure you commit the inserts. Otherwise you might
simply end up selecting against the old view of the data.

There is a commit when you close the db connection, so the data is
there when you check it later. When migrating from databases
without transaction support this can be very confusing.

Istvan.
Jul 18 '05 #4

This has become a bit of a mixed reply, but I think it's better for
understanding the problem I'm trying to solve.

Istvan Albert wrote:
Alban Hertroys wrote:
my problem is that psycopg lets my threads end before the inserts
actually took place, resulting in my collecting thread finding no

Make sure you commit the inserts. Otherwise you might
simply end up selecting on the old view.

There is a commit when you close the db connection so the data is
there when you check it later. When migrating from dbs
without transaction support this can be very confusing.


But PostgreSQL does have transaction support... I rely on that. It is
one of the reasons I chose Python (+psycopg).

The script I'm working on does bulk inserts from multiple related XML
files (parsed using the SAX parser); the threads take turns inserting
small batches of XML records from those files. The collection thread
combines these into one XML record, which is why it's so important that
the inserts are done in time.

I can't commit until all the data has been inserted and combined. The
commit shouldn't happen until the end of the main thread is reached.

If the server goes down, or there's another reason it can't continue
parsing, the whole transaction should roll back. Committing in between
would be 'problematic' (where in the XML files were we interrupted? Hard
to tell).

Also, I don't think I can join the threads (as someone else suggested),
as they are still working in an Application instance (part of the SAX
parser). The threads wait until they're allowed to continue, coordinated
by a linked list of Events (so that I can remove the events of threads
that finished - which shouldn't happen, but it might). Unless I
misunderstand thread joining, of course.

I have to admit that I have very little experience with thread
programming, and this is (part of) my first Python program. It is
definitely a steep learning curve; I hope I don't fall back down too often.

So far, Python has been nice to me :)

Alban.
Jul 18 '05 #5

Michael Fuhr wrote:
Alban Hertroys <al***@magproductions.nl> writes:
I'm using psycopg to insert records in a number of threads. After the
threads finish, another thread runs to collect the inserted data. Now,
my problem is that psycopg lets my threads end before the inserts
actually took place, resulting in my collecting thread finding no
records... They are inserted after it checks.

I just wrote a small test program and was unable to duplicate your
problem -- the inserting threads ran, then the collecting thread
saw the data.

Without seeing your code we can only guess at what's wrong. Please
post the simplest program that reproduces the problem.


That's going to be difficult, as it is a rather complex system of
interwoven threads with XML parsers and such. I tried making a test case
myself, but after an hour of programming I wasn't anywhere near the
actual situation. It probably doesn't help that I don't have any previous
experience with either Python or thread programming, and that I _need_ a
vacation...
What versions of Python, psycopg, and PostgreSQL are you using?
What operating system and version are you running on?


Python 2.3
Psycopg 1.0
PostgreSQL 7.4.2

Keep in mind that these are Debian Linux versions, with Debian's
tendency to "touch" things to break them better... ;)

Is there a way to tell psycopg or python to wait until the insert took
place?


In my test program, the inserting threads called conn.commit() and
then exited. I then joined all of the inserting threads before
starting the collecting thread. You could also use a condition
variable to signal when the inserts have been committed.


At the moment, I still use commit (it has to go eventually; I need to be
able to roll back), but the threads don't end at that point. Instead,
they wait until they can insert the next record.

This may be the cause of my problem. It's all rather complex; I may be
missing something in my own code (the base of which was done by someone
else, who can't be reached for a while, to make matters worse).

Alban.
Jul 18 '05 #6

Alban Hertroys wrote:
I can't commit until all the data has been inserted and combined. The
commit shouldn't happen until the end of the main thread is reached.


If you don't commit the inserts, you cannot combine them (because they
are not visible) from a different database connection. I think you should
have a commit at the end of your insert threads. That way,
when all the inserts are finished, the data will be available
for the combine thread.
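
You can see the visibility issue with two plain connections (table name
and DSN made up):

import psycopg

conn_a = psycopg.connect("dbname=test")   # placeholder DSN
conn_b = psycopg.connect("dbname=test")

cur_a = conn_a.cursor()
cur_a.execute("INSERT INTO t (val) VALUES (1)")
# conn_a has not committed yet

cur_b = conn_b.cursor()
cur_b.execute("SELECT count(*) FROM t WHERE val = 1")
print cur_b.fetchone()[0]    # 0 -- the uncommitted insert is invisible here

conn_a.commit()

cur_b.execute("SELECT count(*) FROM t WHERE val = 1")
print cur_b.fetchone()[0]    # 1 -- visible once committed (read-committed isolation)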

I might be wrong here but I think all this confusion arises because
psycopg pools database connections (which is a good thing, it speeds
up access). But then even when you seemingly open a new connection
you might be just reusing a db connection from the pool that was
started (and kept alive) before the inserts took place.

Istvan.
Jul 18 '05 #7

Also, I don't think I can join the threads (as someone else suggested),
as they are still working in an Application instance (part of the SAX
parser). The threads wait until they're allowed to continue, coordinated
by a linked list of Events (so that I can remove the events of threads
that finished - which shouldn't happen, but it might). Unless I
misunderstand thread joining, of course.


I would probably pass a Queue instance to the worker threads and, in the
main thread, create a timer method/function that periodically checks the
queue.

When a worker thread is done doing its thing, have it do a commit and
queue.put('done').

When the main thread sees 'done' come back from queue.get(), have it do
its thing.

This is the beauty of queues.
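
A rough sketch of that pattern (worker count, table, and DSN are made up;
Queue is the standard library module):

import threading, Queue
import psycopg

NUM_WORKERS = 3
q = Queue.Queue()

def worker(dsn, rows):
    conn = psycopg.connect(dsn)
    cur = conn.cursor()
    for r in rows:
        cur.execute("INSERT INTO t (val) VALUES (%s)", (r,))
    conn.commit()
    conn.close()
    q.put('done')              # report completion to the main thread

for i in range(NUM_WORKERS):
    threading.Thread(target=worker,
                     args=("dbname=test", [i])).start()

finished = 0
while finished < NUM_WORKERS:
    if q.get() == 'done':      # blocks until a worker reports in
        finished = finished + 1

# all workers have committed; it's now safe to start combining the data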
Jul 18 '05 #8

Istvan Albert wrote:
have a commit at the end of your insert threads. That way
when all the inserts are finished the data will be available
for the combine thread.


I forgot this: if committing right after the inserts is unfeasible,
you could store the db connections in a data structure shared across
threads, and if/when all the insert threads complete without errors,
call commit on each of these connections before starting the combine
thread.

Sounds a bit hackish though, just an idea.
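
Very roughly (the lock-guarded list is the shared structure; everything
else is made up for illustration, and it assumes the driver allows a
connection to be committed from a different thread):

import threading
import psycopg

connections = []                  # shared across threads
conn_lock = threading.Lock()

def inserter(dsn, rows):
    conn = psycopg.connect(dsn)
    cur = conn.cursor()
    for r in rows:
        cur.execute("INSERT INTO t (val) VALUES (%s)", (r,))
    conn_lock.acquire()
    try:
        connections.append(conn)  # keep the open transaction around for later
    finally:
        conn_lock.release()

threads = [threading.Thread(target=inserter, args=("dbname=test", [i]))
           for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# every insert thread finished without raising, so commit them all
# before starting the combine thread
for conn in connections:
    conn.commit()
    conn.close()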

Istvan.
Jul 18 '05 #9
