473,695 Members | 2,082 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Good way to insert/update when you're not sure of duplicates?

I'm in a situation where I need to load an ASCII file into a database. No
sweat so far right? The records are indexed by date and I have a unique index
on date. The ASCII can overlap, meaning it can contain duplicate dates that
have been loaded before.

I started out with this ingenious idea. Just try to insert the record. If I
get an error about duplicate keys, do an update and switch to update mode. If
I'm updating and get zero rows updated, switch back to insert. Works fine,
except this whole thing is in a transaction and any errors abort the
transaction and I can't commit it. So, I decided I'd have to try to update
every single record and, if zero rows were updated, do the insert. This takes
a looooong time. Many hours for just 86000 records or so.

Is there a standard way to do this. I can't imagine I'm the only guy that need
to do this kind of thing.

I'm using Java with the postgresql JDBC driver if that matters.

Thanks for any suggestions.

Curtis Stanford
cu****@stanford computing.com
---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 12 '05 #1
9 2123
On Thursday 25 September 2003 02:23, Curtis Stanford wrote:
I'm in a situation where I need to load an ASCII file into a database. No
sweat so far right? The records are indexed by date and I have a unique
index on date. The ASCII can overlap, meaning it can contain duplicate
dates that have been loaded before.

I started out with this ingenious idea. Just try to insert the record. If I
get an error about duplicate keys, do an update and switch to update mode.
If I'm updating and get zero rows updated, switch back to insert. Works
fine, except this whole thing is in a transaction and any errors abort the
transaction and I can't commit it. So, I decided I'd have to try to update
every single record and, if zero rows were updated, do the insert. This
takes a looooong time. Many hours for just 86000 records or so.

Is there a standard way to do this. I can't imagine I'm the only guy that
need to do this kind of thing.


Try inserting a batch of 1024. If you get an error, drop down to 512 and try
that. Repeat until the batch works or you've reached a size of 1.
If the batch worked, try the next set of records and repeat. If you reached a
batch size of 1 with no success then, switch to updating, and repeat the
cycle increasing your batch-size as you go.

You might find it quickest to halve batch-size while having problems then
doubling while it works. The balance is going to depend on how many insert vs
update rows you have.

--
Richard Huxton
Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 12 '05 #2
On Thursday 25 September 2003 18:32, Curtis Stanford wrote:
On September 25, 2003 11:27 am, Richard Huxton wrote:

[snip]
You might find it quickest to halve batch-size while having problems then
doubling while it works. The balance is going to depend on how many
insert vs update rows you have.


Hey thanks! I actually got the time down to around 1000 rows/sec. I was
passing in one of the int arguments in the update where clause as a double.
I don't know why, but changing it to an int as it should be drastically
reduced the time. Your solution is very interesting. I'll probably try it
and see which is faster.


That'll be the infamous PG typecast issue with indexes. PG is quite flexible
about you defining your own types, operators etc. To balance this it's also
quite careful about typecasting stuff without being told to. You tend to
notice this with function-calls and indexes not being used when they should.

If the index had been on a double and you'd passed an int, it might have
figured it out, but I tend to be very strict about my types in any case.
Actually, that's probably a good thing most of the time.

--
Richard Huxton
Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Nov 12 '05 #3
Richard Huxton wrote:
On Thursday 25 September 2003 02:23, Curtis Stanford wrote:
I'm in a situation where I need to load an ASCII file into a database. No
sweat so far right? The records are indexed by date and I have a unique
index on date. The ASCII can overlap, meaning it can contain duplicate
dates that have been loaded before.

I started out with this ingenious idea. Just try to insert the record. If I
get an error about duplicate keys, do an update and switch to update mode.
If I'm updating and get zero rows updated, switch back to insert. Works
fine, except this whole thing is in a transaction and any errors abort the
transaction and I can't commit it. So, I decided I'd have to try to update
every single record and, if zero rows were updated, do the insert. This
takes a looooong time. Many hours for just 86000 records or so.

Is there a standard way to do this. I can't imagine I'm the only guy that
need to do this kind of thing.

Try inserting a batch of 1024. If you get an error, drop down to 512 and try
that. Repeat until the batch works or you've reached a size of 1.
If the batch worked, try the next set of records and repeat. If you reached a
batch size of 1 with no success then, switch to updating, and repeat the
cycle increasing your batch-size as you go.

You might find it quickest to halve batch-size while having problems then
doubling while it works. The balance is going to depend on how many insert vs
update rows you have.


In addition to that, you can try inserting from multiple backends simaltenously
to speed up the whole process.

And I don't like the modes idea OP gave. I would rather follow
insert->if-error-update mode in a transaction for each record. And fork over say
20/40 parallel backends to achieve good speed.

Shridhar
---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 12 '05 #4
On September 25, 2003 11:27 am, Richard Huxton wrote:
On Thursday 25 September 2003 02:23, Curtis Stanford wrote:
I'm in a situation where I need to load an ASCII file into a database. No
sweat so far right? The records are indexed by date and I have a unique
index on date. The ASCII can overlap, meaning it can contain duplicate
dates that have been loaded before.

I started out with this ingenious idea. Just try to insert the record. If
I get an error about duplicate keys, do an update and switch to update
mode. If I'm updating and get zero rows updated, switch back to insert.
Works fine, except this whole thing is in a transaction and any errors
abort the transaction and I can't commit it. So, I decided I'd have to
try to update every single record and, if zero rows were updated, do the
insert. This takes a looooong time. Many hours for just 86000 records or
so.

Is there a standard way to do this. I can't imagine I'm the only guy that
need to do this kind of thing.


Try inserting a batch of 1024. If you get an error, drop down to 512 and
try that. Repeat until the batch works or you've reached a size of 1.
If the batch worked, try the next set of records and repeat. If you reached
a batch size of 1 with no success then, switch to updating, and repeat the
cycle increasing your batch-size as you go.

You might find it quickest to halve batch-size while having problems then
doubling while it works. The balance is going to depend on how many insert
vs update rows you have.


Hey thanks! I actually got the time down to around 1000 rows/sec. I was
passing in one of the int arguments in the update where clause as a double. I
don't know why, but changing it to an int as it should be drastically reduced
the time. Your solution is very interesting. I'll probably try it and see
which is faster.
---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postg resql.org

Nov 12 '05 #5
Shridhar Daithankar wrote:

In addition to that, you can try inserting from multiple backends
simaltenously to speed up the whole process.

And I don't like the modes idea OP gave. I would rather follow
insert->if-error-update mode in a transaction for each record. And
fork over say 20/40 parallel backends to achieve good speed.

I didn't think mulitple backends worked.
---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 12 '05 #6
On Friday 26 September 2003 20:48, Dennis Gearon wrote:
Shridhar Daithankar wrote:
In addition to that, you can try inserting from multiple backends
simaltenously to speed up the whole process.

And I don't like the modes idea OP gave. I would rather follow
insert->if-error-update mode in a transaction for each record. And
fork over say 20/40 parallel backends to achieve good speed.


I didn't think mulitple backends worked.


Why? Why can't you insert using multiple backends? I mean is there a problem?

Shridhar
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 12 '05 #7
On Friday 26 September 2003 21:08, Dennis Gearon wrote:
corruption of the databases, if I'm correct. Shridhar Daithankar wrote:
On Friday 26 September 2003 20:48, Dennis Gearon wrote:
I didn't think mulitple backends worked.


Why? Why can't you insert using multiple backends? I mean is there a
problem?


How does database gets corrupted? If you have 20 backend inserting their own
chunk of data in separate transactions, where does database corruption comes
in picture?

That's the whole idea of transactions.

Shridhar
---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 12 '05 #8
Well, I may have my merms tixed up!

There is ONE component of Postgres that can only be accessing the
database by itself. Maybe it's not the 'backend', it's probably the
postmaster. Are they the same? I don't know.

Shridhar Daithankar wrote:
On Friday 26 September 2003 21:08, Dennis Gearon wrote:

corruption of the databases, if I'm correct.

Shridhar Daithankar wrote:

On Friday 26 September 2003 20:48, Dennis Gearon wrote:
I didn't think mulitple backends worked.
Why? Why can't you insert using multiple backends? I mean is there a
problem?


How does database gets corrupted? If you have 20 backend inserting their own
chunk of data in separate transactions, where does database corruption comes
in picture?

That's the whole idea of transactions.

Shridhar

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 12 '05 #9
On Fri, 26 Sep 2003, Shridhar Daithankar wrote:
On Friday 26 September 2003 21:08, Dennis Gearon wrote:
corruption of the databases, if I'm correct.

Shridhar Daithankar wrote:
On Friday 26 September 2003 20:48, Dennis Gearon wrote:
>I didn't think mulitple backends worked.

Why? Why can't you insert using multiple backends? I mean is there a
problem?


How does database gets corrupted? If you have 20 backend inserting their own
chunk of data in separate transactions, where does database corruption comes
in picture?

That's the whole idea of transactions.


I think Dennis is confusing multiple backends (legal and safe, each
connection gets its own backend) with multiple postmasters (not legal,
very unsafe, causes corruption, etc...)
---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 12 '05 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

4
5001
by: Oliver Spiesshofer | last post by:
Hi, I want to make a standard-function that handles "Are you sure"-type questions. In my idea it should look something like this: <code> Function deleteitem($item) {
11
2387
by: Kamus of Kadizhar | last post by:
I have the following function which generates MD5 hashes for files on a local and remote server. The remote server has a little applet that runs from inetd and generates an MD5 hash given the file name. The problem is that it takes 2+ minutes to generate the MD5 hash, so this function takes about 5 minutes every time it is called. Since the first MD5 hash is generated on a remote machine, the local machine does nothing but wait for...
10
1427
by: Steven Matthew Bennett | last post by:
I don't know about the rest of the posters, but I came to this NG to learn more about Access, not to have some idiot homophobic dipshit spewing hate with every post. You have nothing to add to this group that is remotely on-topic, so unless you're gonna learn Access do us a favor and get lost. Do you know how many Access MVP's post here? Do you know how much of their valuable time is spent helping people? I've had a number of MVP's...
0
4808
by: Amos Soma | last post by:
Hello. Our company has created a .Net application which the user starts by executing via a URL (i.e., http://someserver/application.exe). On some machines running Windows XP SP2, the user will see a dialog containing the message "The Publisher could not be verified. Are you sure you want to run this software." This does not happen on most machines running SP2, but it does occur on a couple. And yet, the Security settings on the machines...
6
7206
by: Alex Maghen | last post by:
Let's say I have a web forms button which performs a Delete or something. I'd like a click on the button to pop an "Are you Sure" dialog on the client side and then only execute the button's OnClick operation on the server side if the user had clicked Ok (as opposed to Cancel) on the pop-up dialog. I know I can do this all myself with JavaScript, but I was just wondering if any such functionality is already built-in in ASP.NET 2.0 so that...
32
4051
by: Nu | last post by:
I want to protect myself from if someone with a fast connection hammers my site. It's not denial of service attacks, but offline downloaders (of course that don't show they're offline downloaders in the useragent so I can't filter them by that). My main issue is my site is PHP so if they hammer it, it gets all the PHP files executing and overwhelms the CPU. I'd like to be able to after a certain amount of hits on my index.php per second,...
2
1265
by: Mr. SweatyFinger | last post by:
how can i get an "are you sure" message before delete? thanks
6
5217
by: ARC | last post by:
This is strange... I set a break point on a form's code module, and then after a bit, shut down the main program file. Now whenever I click the button for the form, it opens the code module, with a highlighted line (which is where I Had set the breakpoint). No matter what I do now, it breaks out into the code module when I click the button. Even though there is no longer a breakpoint there. I tried compacting / repairing, clearing all...
0
1137
by: miikka | last post by:
Is it possible to have dynamic update (preview) when re-sizing column in DataGridView?
3
1512
by: sheldonlg | last post by:
I have a page that has three submit buttons. Depending upon which one was clicked, I process it differently on the server in php. One of those buttons is a delete action so I would like to add an "Are you sure?" for that button. However, I don't want to do anything with the other buttons. I know how to do it if that were the only button ( onsubmit="return validateAction()" ).
0
8638
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
8998
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
0
8831
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
7668
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
1
6494
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
5841
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
1
3013
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
2
2278
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
3
1979
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.