473,405 Members | 2,421 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,405 software developers and data experts.

Psycopg and threads problem

Hello,

I'm using psycopg to insert records in a number of threads. After the
threads finish, another thread runs to collect the inserted data. Now,
my problem is that psycopg let's my threads end before the inserts
actually took place, resulting in my collecting thread finding no
records... They are inserted after it checks.

Is there a way to tell psycopg or python to wait until the insert took
place?

Alban.
Jul 18 '05 #1
8 2107
On Mon, 2004-09-20 at 14:43 +0200, Alban Hertroys wrote:
Hello,

I'm using psycopg to insert records in a number of threads. After the
threads finish, another thread runs to collect the inserted data. Now,
my problem is that psycopg let's my threads end before the inserts
actually took place, resulting in my collecting thread finding no
records... They are inserted after it checks.

Is there a way to tell psycopg or python to wait until the insert took
place?


I believe turning autocommit on in the connection would achieve this (at
the expense of performance and transaction support). You could also add
a "commit" SQL statement to each thread.

Why not have the threads notify the other threads that it's okay to
start working? If you have to wait until the data is completely
inserted anyway, have your final thread perform the "commit" and then
set a threading.Event or somesuch .

Regards,
Cliff

--
Cliff Wells <cl************@comcast.net>

Jul 18 '05 #2
Alban Hertroys <al***@magproductions.nl> writes:
I'm using psycopg to insert records in a number of threads. After the
threads finish, another thread runs to collect the inserted data. Now,
my problem is that psycopg let's my threads end before the inserts
actually took place, resulting in my collecting thread finding no
records... They are inserted after it checks.
I just wrote a small test program and was unable to duplicate your
problem -- the inserting threads ran, then the collecting thread
saw the data.

Without seeing your code we can only guess at what's wrong. Please
post the simplest program that reproduces the problem.

What versions of Python, psycopg, and PostgreSQL are you using?
What operating system and version are you running on?
Is there a way to tell psycopg or python to wait until the insert took
place?


In my test program, the inserting threads called conn.commit() and
then exited. I then joined all of the inserting threads before
starting the collecting thread. You could also use a condition
variable to signal when the inserts have been committed.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/
Jul 18 '05 #3
Alban Hertroys wrote:
my problem is that psycopg let's my threads end before the inserts
actually took place, resulting in my collecting thread finding no


Make sure you commit the inserts. Otherwise you might
simply end up selecting on the old view.

There is a commit when you close the db connection so the data is
there when you check it later. When migrating from dbs
without transaction support this can be very confusing.

Istvan.
Jul 18 '05 #4
This has become a bit of a mixed reply, but I think it's better for
understanding the problem I'm trying to solve.

Istvan Albert wrote:
Alban Hertroys wrote:
my problem is that psycopg let's my threads end before the inserts
actually took place, resulting in my collecting thread finding no

Make sure you commit the inserts. Otherwise you might
simply end up selecting on the old view.

There is a commit when you close the db connection so the data is
there when you check it later. When migrating from dbs
without transaction support this can be very confusing.


But PostgreSQL does have transaction support... I rely on that. It is
one of the reasons I chose for Python (+psycopg).

The script I'm working on does bulk inserts from multiple related XML
files (parsed using the sax parser) and take turns inserting small
batches of xml records from those files. The collection thread combines
these into 1 xml record, which is why it's so important that the inserts
are done in time.

I can't commit until all the data has been inserted and combined. The
commit shouldn't happen until the end of the main thread is reached.

If the server goes down, or there's another reason it can't continue
parsing, the whole transaction should rollback. Committing in between
would be 'problematic' (where in the XML files were we interupted? Hard
to tell).

Also, I don't think I can join the threads (as someone else suggested),
as they are still working in an Application instance (part of the SAX
parser). The threads are waiting until they're allowed to continue; by a
linked list of Events (so that I can remove events for threads that
finished - which shouldn't happen, but it may). Unless I misunderstand
thread joining, of course.

I have to admit that I have very little experience with thread
programming, and this is (part of) my first Python program. It is
definitely a steep learning curve; I hope I don't fall back down too often.

So far, Python has been nice to me :)

Alban.
Jul 18 '05 #5
Michael Fuhr wrote:
Alban Hertroys <al***@magproductions.nl> writes:
I'm using psycopg to insert records in a number of threads. After the
threads finish, another thread runs to collect the inserted data. Now,
my problem is that psycopg let's my threads end before the inserts
actually took place, resulting in my collecting thread finding no
records... They are inserted after it checks.

I just wrote a small test program and was unable to duplicate your
problem -- the inserting threads ran, then the collecting thread
saw the data.

Without seeing your code we can only guess at what's wrong. Please
post the simplest program that reproduces the problem.


That's going to be difficult, as it is a rather complex system of
interwoven threads with XML parsers and such. I tried making a test case
myself, but after an hour of programming I wasn't near the actual situation.
It probably doesn't help that I don't have any previous experience with
either Python or thread programming and that I _need_ a vacation...
What versions of Python, psycopg, and PostgreSQL are you using?
What operating system and version are you running on?


Python 2.3
Psycopg 1.0
PostgreSQL 7.4.2

Keep in mind that these are Debian Linux versions, and with their
tendency to "touch" things to break it better... ;)
Is there a way to tell psycopg or python to wait until the insert took
place?


In my test program, the inserting threads called conn.commit() and
then exited. I then joined all of the inserting threads before
starting the collecting thread. You could also use a condition
variable to signal when the inserts have been committed.


At the moment, I still use commit (it has to go eventually, I need to be
able to rollback), but the threads don't end at that point. Instead,
they wait until they can insert the next record.

This may be the cause of my problem. It's all rather complex; I may be
missing something in my own code (of which the base was done by someone
else, who can't be reached for a while, to make matters worse).

Alban.
Jul 18 '05 #6
Alban Hertroys wrote:
I can't commit until all the data has been inserted and combined. The
commit shouldn't happen until the end of the main thread is reached.


If you don't commit the inserts you cannot combine them (because they are
not visible) in a different database connection. I think you should
have a commit at the end of your insert threads. That way
when all the inserts are finished the data will be available
for the combine thread.

I might be wrong here but I think all this confusion arises because
psycopg pools database connections (which is a good thing, it speeds
up access). But then even when you seemingly open a new connection
you might be just reusing a db connection from the pool that was
started (and kept alive) before the inserts took place.

Istvan.
Jul 18 '05 #7
>
The script I'm working on does bulk inserts from multiple related XML
files (parsed using the sax parser) and take turns inserting small
batches of xml records from those files. The collection thread combines
these into 1 xml record, which is why it's so important that the inserts
are done in time.

I can't commit until all the data has been inserted and combined. The
commit shouldn't happen until the end of the main thread is reached.

If the server goes down, or there's another reason it can't continue
parsing, the whole transaction should rollback. Committing in between
would be 'problematic' (where in the XML files were we interupted? Hard
to tell).

Also, I don't think I can join the threads (as someone else suggested),
as they are still working in an Application instance (part of the SAX
parser). The threads are waiting until they're allowed to continue; by a
linked list of Events (so that I can remove events for threads that
finished - which shouldn't happen, but it may). Unless I misunderstand
thread joining, of course.


I would probally pass a queue instance to the worker threads
and in the main thread create a timer method/function that periodically
checks the queue.

When the worker thread is done doing its thing, do a commit and
queue.put('done')

When the main thread checks the queue.get(0) = 'done'
have it do it's thing.

This is the beauty of queues.
Jul 18 '05 #8
Istvan Albert wrote:
have a commit at the end of your insert threads. That way
when all the inserts are finished the data will be available
for the combine thread.


I forgot this, if the commit after insert is unfeasible
you could store the db connections in a data structure shared across threads
and if/when all the insert threads complete without errors you can call commit
on each of these before the combine thread.

Sounds a bit hackish though, just an idea.

Istvan.
Jul 18 '05 #9

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

1
by: Jim Hefferon | last post by:
Hello, I want to try psycopg, the module for Postgres access. I'm having trouble reaching the site for four or five days now. For example, the link on the Python web site's DB Modules page...
0
by: Gandalf | last post by:
Hi All! Every time I get an error psycopg refuses to execute further commands in the same transaction: psycopg.ProgrammingError:ERROR: current transaction is aborted, commands ignored until...
12
by: Alban Hertroys | last post by:
Good day, I have a number of threads doing inserts in a table, after which I want to do a select. This means that it will occur that the row that I want to select is locked (by the DB). In these...
4
by: Alban Hertroys | last post by:
Another python/psycopg question, for which the solution is probably quite simple; I just don't know where to look. I have a query that inserts data originating from an utf-8 encoded XML file....
11
by: Alban Hertroys | last post by:
Oh no! It's me and transactions again :) I'm not really sure whether this is a limitation of psycopg or postgresql. When I use multiple cursors in a transaction, the records inserted at the...
7
by: jslowery | last post by:
Hello, I'm new to both PostgreSQL and psycopg and I'm trying to connect to my database running on localhost. I have postgres setup to do md5 authentication and this works when using a db admin tool...
4
by: Michele Simionato | last post by:
Look at this example: >>> import psycopg >>> psycopg.__version__ '1.1.19' >>> import datetime >>> today = datetime.datetime.today() >>> co = psycopg.connect('') >>> cu = co.cursor()
2
by: Martin P. Hellwig | last post by:
Hi all, I'm playing a bit with PostgreSQL, in which I've set me the target to create a python script which with user input creates a new user role and a database with that owner (connecting to...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.