Parallel insert to PostgreSQL with thread

Hi..
I use the threading module for fast operation, but I have some
problems.
This is my code sample:
=================
import psycopg2
from threading import Thread

conn = psycopg2.connect(user='postgres', password='postgres', database='postgres')
cursor = conn.cursor()

class paralel(Thread):
    def __init__(self, veriler, sayii):
        Thread.__init__(self)
    def run(self):
        save(a, b, c)

def save(a, b, c):
    cursor.execute("INSERT INTO keywords (keyword) VALUES ('%s')" % a)
    conn.commit()
    cursor.execute("SELECT CURRVAL('keywords_keyword_id_seq')")
    idd = cursor.fetchall()
    return idd[0][0]

def start(hiz):
    datas = [........]
    for a in datas:
        current = paralel(a, sayii)
        current.start()
=================
And it gives me different errors when I try parallel inserts. My queries
work in normal operation but don't work in parallel.
How can I insert data to PostgreSQL at the same moment?
errors:
no results to fetch
cursor already closed

Oct 25 '07 #1
Abandoned wrote:
Hi..
I use the threading module for fast operation, but I have some
problems.
This is my code sample:
=================
import psycopg2
from threading import Thread

conn = psycopg2.connect(user='postgres', password='postgres', database='postgres')
cursor = conn.cursor()

class paralel(Thread):
    def __init__(self, veriler, sayii):
        Thread.__init__(self)
    def run(self):
        save(a, b, c)

def save(a, b, c):
    cursor.execute("INSERT INTO keywords (keyword) VALUES ('%s')" % a)
    conn.commit()
    cursor.execute("SELECT CURRVAL('keywords_keyword_id_seq')")
    idd = cursor.fetchall()
    return idd[0][0]

def start(hiz):
    datas = [........]
    for a in datas:
        current = paralel(a, sayii)
        current.start()
=================
And it gives me different errors when I try parallel inserts. My queries
work in normal operation but don't work in parallel.
How can I insert data to PostgreSQL at the same moment?
errors:
no results to fetch
cursor already closed
DB modules aren't necessarily thread-safe. Most of the time, a connection
(and of course its cursors) can't be shared between threads.

So open a connection for each thread.
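A minimal sketch of that connection-per-thread approach, assuming psycopg2 and the keywords table from the original post (the Inserter class name and the sample keyword list are only placeholders):

=================
import threading
import psycopg2

class Inserter(threading.Thread):
    def __init__(self, keyword):
        threading.Thread.__init__(self)
        self.keyword = keyword

    def run(self):
        # Each thread opens, uses, and closes its own connection and cursor.
        conn = psycopg2.connect(user='postgres', password='postgres',
                                database='postgres')
        try:
            cursor = conn.cursor()
            # Let the driver quote the value instead of using % string formatting.
            cursor.execute("INSERT INTO keywords (keyword) VALUES (%s)",
                           (self.keyword,))
            conn.commit()
        finally:
            conn.close()

threads = [Inserter(k) for k in ('spam', 'eggs', 'ham')]
for t in threads:
    t.start()
for t in threads:
    t.join()
=================

The point is simply that no connection or cursor object is shared between threads.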

Diez
Oct 25 '07 #2
Diez B. Roggisch wrote:
Abandoned wrote:
>Hi..
I use the threading module for fast operation. But ....
[in each thread]
>def save(a,b,c):
    cursor.execute("INSERT INTO ...
    conn.commit()
    cursor.execute(...)
How can I insert data to PostgreSQL at the same moment?...

DB modules aren't necessarily thread-safe. Most of the time, a connection
(and of course its cursors) can't be shared between threads.

So open a connection for each thread.
Note that your DB server will have to "serialize" your inserts, so
unless there is some other reason for the threads, a single thread
through a single connection to the DB is the way to go. Of course
it may be clever enough to behave "as if" they are serialized, but
most of your work parallelizing at your end simply creates new
work at the DB server end.
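For comparison, a sketch of that single-connection route, again assuming psycopg2 and the same keywords table (the datas list is a stand-in for the real data):

=================
import psycopg2

conn = psycopg2.connect(user='postgres', password='postgres',
                        database='postgres')
cursor = conn.cursor()

datas = ['spam', 'eggs', 'ham']  # stand-in for the real keyword list
# One thread, one connection: send all rows, then commit once.
cursor.executemany("INSERT INTO keywords (keyword) VALUES (%s)",
                   [(k,) for k in datas])
conn.commit()
conn.close()
=================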

-Scott David Daniels
Sc***********@Acm.Org
Oct 25 '07 #3

On Oct 25, 2007, at 7:28 AM, Scott David Daniels wrote:
Diez B. Roggisch wrote:
>Abandoned wrote:
>>Hi..
I use the threading module for fast operation. But ....
[in each thread]
>>def save(a,b,c):
    cursor.execute("INSERT INTO ...
    conn.commit()
    cursor.execute(...)
How can I insert data to PostgreSQL at the same moment?...

DB modules aren't necessarily thread-safe. Most of the time, a
connection (and of course its cursors) can't be shared between threads.

So open a connection for each thread.

Note that your DB server will have to "serialize" your inserts, so
unless there is some other reason for the threads, a single thread
through a single connection to the DB is the way to go. Of course
it may be clever enough to behave "as if" they are serialized, but
most of your work parallelizing at your end simply creates new
work at the DB server end.
Fortunately, in his case, that's not necessarily true. If they do
all their work with the same connection then, yes, but there are
other problems with that, as mentioned above regarding thread safety and psycopg2.
If he goes the recommended route with a separate connection for each
thread, then Postgres will not serialize multiple inserts coming from
separate connections unless there is something like an ALTER TABLE
or REINDEX concurrently happening on the table. The whole serialized
inserts thing is strictly something popularized by MySQL and is by no
means necessary or standard (as with a lot of MySQL).

Erik Jones

Software Developer | Emma®
er**@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com
Oct 25 '07 #4
Erik Jones wrote:
>
On Oct 25, 2007, at 7:28 AM, Scott David Daniels wrote:
>Diez B. Roggisch wrote:
>>Abandoned wrote:
Hi..
I use the threading module for fast operation. But ....
[in each thread]
>>>def save(a,b,c):
    cursor.execute("INSERT INTO ...
    conn.commit()
    cursor.execute(...)
How can I insert data to PostgreSQL at the same moment?...

DB modules aren't necessarily thread-safe. Most of the time, a
connection (and ... cursor) can't be shared between threads.
So open a connection for each thread.

Note that your DB server will have to "serialize" your inserts, so
... a single thread through a single connection to the DB is the way
to go. Of course it (the DB server) may be clever enough to behave
"as if" they are serialized, but most of your work parallelizing at
your end simply creates new work at the DB server end.

Fortunately, in his case, that's not necessarily true.... If he
goes the recommended route with a separate connection for each thread,
then Postgres will not serialize multiple inserts coming from separate
connections unless there is something like an ALTER TABLE or REINDEX
concurrently happening on the table.
The whole serialized inserts thing is strictly something popularized
by MySQL and is by no means necessary or standard (as with a lot of
MySQL).
But he commits after every insert, which _does_ force serialization (if
only to provide safe transaction boundaries). I understand you can get
clever at how to do it, _but_ preserving ACID properties is exactly what
I mean by "serialize, " and while I like to bash MySQL as well as the
next person, I most certainly am not under the evil sway of the vile
MySQL cabal.

The server will have to be able to abort each transaction
_independently_ of the others, and so must serialize any index
updates that share a page by, for example, landing in the same node
of a B-Tree.

-Scott David Daniels
Sc***********@Acm.Org
Oct 26 '07 #5
If you're not Scott Daniels, beware that this conversation has gone
horribly off topic and, unless you have an interest in PostgreSQL, you
may not want to bother reading on...

On Oct 25, 2007, at 9:46 PM, Scott David Daniels wrote:
Erik Jones wrote:
>>
On Oct 25, 2007, at 7:28 AM, Scott David Daniels wrote:
>>Diez B. Roggisch wrote:
Abandoned wrote:
Hi..
I use the threading module for fast operation. But ....
[in each thread]
def save(a,b,c):
    cursor.execute("INSERT INTO ...
    conn.commit()
    cursor.execute(...)
How can I insert data to PostgreSQL at the same moment?...

DB modules aren't necessarily thread-safe. Most of the time, a
connection (and ... cursor) can't be shared between threads.
So open a connection for each thread.

Note that your DB server will have to "serialize" your inserts, so
... a single thread through a single connection to the DB is the way
to go. Of course it (the DB server) may be clever enough to behave
"as if" they are serialized, but most of your work parallelizing at
your end simply creates new work at the DB server end.

Fortunately, in his case, that's not necessarily true.... If he
goes the recommended route with a separate connection for each thread,
then Postgres will not serialize multiple inserts coming from separate
connections unless there is something like an ALTER TABLE or REINDEX
concurrently happening on the table.
The whole serialized inserts thing is strictly something popularized
by MySQL and is by no means necessary or standard (as with a lot of
MySQL).

But he commits after every insert, which _does_ force serialization (if
only to provide safe transaction boundaries). I understand you can get
clever at how to do it, _but_ preserving ACID properties is exactly what
I mean by "serialize,"
First, it's a bad idea to work with your own definition of a very
domain-specific and standardized term, especially when Postgres's
Multi-Version Concurrency Control mechanisms are designed specifically
for the purpose of preserving ACID compliance without forcing serialized
transactions on the user.

Second, unless he specifically sets his transaction level to
serializable, he will be working in read-committed mode. What this
specifically means is that two (or more) transactions writing to the
same table will not block any of the others. Let's say the user has
two concurrent inserts to run on the same table that, for whatever
reason, take a while to run (for example, they insert the results of
some horribly complex or inefficient select), if either is run in
serializable mode then which ever one starts a fraction of a second
sooner will run until completion before the second is even allowed to
begin. In (the default) read-committed mode they will both begin
executing as soon as they are called and will write their data
regardless of conflicts. Commit time (which may be some time later
when transactions with multiple statements are used) is when conflicts
are resolved. So, if between the two example transactions there does
turn out to be a conflict between their results, whichever commits
second will roll back and, since the data written by the second
transaction will not be marked as committed, it will never be visible
to any other transactions and the space will remain available for
future transactions.

Here's the relevant portion of the Postgres docs on all of this:
http://www.postgresql.org/docs/8.2/i...tive/mvcc.html
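To make the two modes concrete, here is a rough psycopg2 sketch of choosing between them (connection details and table name as in the original post; the isolation-level constants come from psycopg2.extensions):

=================
import psycopg2
import psycopg2.extensions

conn = psycopg2.connect(user='postgres', password='postgres',
                        database='postgres')

# Default behaviour: READ COMMITTED -- concurrent inserts on the same
# table proceed without blocking each other.
conn.set_isolation_level(
    psycopg2.extensions.ISOLATION_LEVEL_READ_COMMITTED)

# Stricter, opt-in behaviour: SERIALIZABLE -- conflicting transactions
# behave as if they ran one after the other.
# conn.set_isolation_level(
#     psycopg2.extensions.ISOLATION_LEVEL_SERIALIZABLE)

cursor = conn.cursor()
cursor.execute("INSERT INTO keywords (keyword) VALUES (%s)", ('example',))
conn.commit()
conn.close()
=================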
>and while I like to bash MySQL as well as the
next person, I most certainly am not under the evil sway of the vile
MySQL cabal.
Good to hear ;)
>
The server will have to be able to abort each transaction
_independently_ of the others, and so must serialize any index
updates that share a page by, for example, landing in the same node
of a B-Tree.
There is nothing inherent in B-Trees that prevents identical data
from being written in them. If there were, the only thing they'd be good for
would be unique indexes. Even if you do use a unique index, as noted
above, constraints and conflicts are only enforced at commit time.

Erik Jones

Software Developer | Emma®
er**@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com
Oct 26 '07 #6
On Thu, 25 Oct 2007 13:27:40 +0200, Diez B. Roggisch wrote:
DB modules aren't necessarily thread-safe. Most of the time, a
connection (and of course its cursors) can't be shared between threads.

So open a connection for each thread.

Diez
DB modules following DBAPI2 must define the following attribute:

"""
threadsafety

Integer constant stating the level of thread safety the
interface supports. Possible values are:

0 Threads may not share the module.
1 Threads may share the module, but not connections.
2 Threads may share the module and connections.
3 Threads may share the module, connections and
cursors.
"""

http://www.python.org/dev/peps/pep-0249/
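A quick way to check what the installed driver declares (psycopg2, for example, reports level 2: the module and connections may be shared, but not cursors):

=================
import psycopg2

# PEP 249 module-level attributes
print(psycopg2.apilevel)      # '2.0'
print(psycopg2.threadsafety)  # 2 -> share module and connections, not cursors
=================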

--
Laurent POINTAL - la*************@laposte.net
Oct 26 '07 #7
