After a long battle with technology, "Tim Nelson" <ti***************@softhome.net>, an earthling, wrote:
I am a newbie trying to port my applications to Postgres. I have an
application that is bulk loading a table with autocommit off (with
it on, it's way too slow). The logic of the application dictates that
I try the insert, and if it fails because of a duplicate, update the
record instead. The entire bulk load is wrapped in a transaction
for speed purposes. When I hit a duplicate I can detect it, but
when I try the update it fails with:
ERROR: current transaction is aborted, commands ignored until end of
transaction block
I assume this is a result of the duplicate error, but is there a way
I can get around this? In my case a duplicate is fatal to my
transaction...
There is no ready way of blindly doing this.
Several improvements would seem plausible:
1. Try to do groups of 1000 records, committing every 1K, so that if
there's a failure, it is only expected to roll back on the order
of 500 rows.
2. Along with that approach, keep the last 1K records 'buffered' in
memory, so that if you hit a failure, you can quickly go back and
redo the ones that _were_ good, throwing in a COMMIT as soon as
you complete all the records you _know_ to be good.
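   In terms of what goes over the wire, the batching looks something
   like this (batch size, table, and values are illustrative, not from
   the original post):

```sql
begin;
-- rows 1 .. 1000 go in here
insert into mytable (f1,f2,f3,f4) values (...);
commit;   -- the first 1K rows are now safely stored

begin;
-- rows 1001 .. 2000; if one of these fails, only this batch
-- rolls back; the application then replays the buffered rows
-- it knows to be good, and commits those
insert into mytable (f1,f2,f3,f4) values (...);
commit;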
3. Alternatively, before doing each row, do a query that checks to
see if the relevant key is already there. That is regrettably
likely to be rather time-consuming, and you'll be firing in
great gobs of little queries.
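   Concretely, each row would cost an extra round trip along these
   lines (assuming the key column is f1; names are illustrative):

```sql
select 1 from mytable where f1 = 'this';
-- row found:    update mytable set f2 = 'that', f3 = 'other', f4 = 'foo'
--                   where f1 = 'this';
-- no row found: insert into mytable (f1,f2,f3,f4)
--                   values ('this','that','other','foo');
```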
4. You could create a stored procedure that amounts to "insert or
ignore", and pass the data through that.
Thus, instead of
insert into mytable (f1,f2,f3,f4) values ('this', 'that',
'other', 'foo');
You might do...
select add_to_my_table ('this','that','other', 'foo');
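   One plausible shape for such a function, in PL/pgSQL, assuming
   mytable has columns f1..f4 with f1 as the key (names are
   illustrative; this one does update-on-duplicate, which is what you
   want, rather than ignore):

```sql
-- Sketch only: adjust table, columns, and key to your schema
CREATE FUNCTION add_to_my_table (text, text, text, text)
RETURNS integer AS '
BEGIN
    PERFORM 1 FROM mytable WHERE f1 = $1;
    IF FOUND THEN
        UPDATE mytable SET f2 = $2, f3 = $3, f4 = $4
            WHERE f1 = $1;
    ELSE
        INSERT INTO mytable (f1, f2, f3, f4)
            VALUES ($1, $2, $3, $4);
    END IF;
    RETURN 1;
END;
' LANGUAGE 'plpgsql';
```

   Since the function traps the "duplicate" case itself, no error ever
   reaches the transaction, and it never gets aborted.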
5. You can cut down on the number of statements by grouping these
together...
select atmt([row 1 data]), atmt([row 2 data]),
       atmt([row 3 data]), ... atmt([row 20 data]);
That allows you to submit 20 records in one statement, which
is fairly likely to save some query submission overhead.
It oughtn't make parsing too much more expensive.
6. Stage the bulk load.
Load the data into a "staging" table, which has minimal (if any)
integrity checks; this would be most efficiently handled via
COPY, which would load it all in one transaction.
Then take that data and insert the relevant bits into the new
location.
Thus:
-- This'll be way fast
copy stage_table from '/tmp/cruddy_data.txt';
-- Maybe do some validation, fix crud, delete crud, and such...
select * from stage_table;             -- Look for crud
update stage_table set ... where ...;  -- Fix crud
delete from stage_table where ...;     -- Delete crud
-- stage_table now contains gleaming fixed data; load into final table...
insert into ultimate_table
select * from stage_table s where
not exists (select * from ultimate_table u where u.pkey = s.pkey);
-- Alternatively, if you had deleted the cruddy data, it might
-- just be:
insert into ultimate_table
select * from stage_table s;
I rather like approach #6, FYI...
--
output = ("cbbrowne" "@" "ntlug.org")
http://www3.sympatico.ca/cbbrowne/lsf.html
"The wrath of Holloway is nothing compared to the wrath of Moon."
-- Fred Drenckhahn