
speeding up inserts


First of all, we are still running SAP DB at the moment but are in the process of moving to PostgreSQL, so this seemed like a good place to post this type of question.

In our environment we do transaction processing; each transaction accounts for 10-30 inserts and 3-4 selects. We also have users who work through a management interface, running all sorts of queries on the data once it gets into the db. Most of the user queries are selects, with a few updates and fewer inserts.

The basic problem is that transaction times are very critical; one second is a big deal. The data a transaction inserts into the db does not have to get there instantly; it can be delayed (to a point, anyway). Since there is only so much you can do to speed up inserts, I have been testing a different system for getting the data from the application into the database.

Now what our application does is create the queries as it runs; then, instead of inserting them into the database, it writes them all out to a single file at the end of the transaction. This is a huge performance boost. We then use a separate daemon to run the disk queue once every second and do all the inserts. If for some reason the main application can't write to disk, it falls back to inserting them directly.
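
Roughly, the two pieces look like this (a minimal sketch only; the spool path, table names, and connection details are made up for illustration, not our real ones, and it assumes one statement per line in the spool file).

Application side, at the end of a transaction:

    use strict;
    use warnings;
    use DBI;

    # SQL generated during one transaction; statements and table are illustrative.
    my @queries = (
        "INSERT INTO txn_detail (txn_id, item, amount) VALUES (1001, 'widget', 9.95)",
        "INSERT INTO txn_detail (txn_id, item, amount) VALUES (1001, 'gadget', 4.50)",
    );

    # Append to the spool file; fall back to direct inserts if it can't be written.
    my $spool = '/var/spool/app/insert-queue.sql';
    if (open my $fh, '>>', $spool) {
        print {$fh} "$_;\n" for @queries;
        close $fh;
    }
    else {
        my $dbh = DBI->connect('dbi:Pg:dbname=app', 'app', 'secret',
                               { AutoCommit => 0, RaiseError => 1 });
        $dbh->do($_) for @queries;
        $dbh->commit;
    }

And the daemon that drains it, run once a second:

    use strict;
    use warnings;
    use DBI;

    # Claim the spool file atomically by renaming it, then replay everything
    # in it as a single database transaction.
    my $spool = '/var/spool/app/insert-queue.sql';
    my $work  = "$spool.work";
    exit 0 unless -s $spool && rename($spool, $work);

    my $dbh = DBI->connect('dbi:Pg:dbname=app', 'app', 'secret',
                           { AutoCommit => 0, RaiseError => 1 });
    open my $in, '<', $work or die "cannot read $work: $!";
    while (my $sql = <$in>) {
        chomp $sql;
        $sql =~ s/;\s*\z//;
        $dbh->do($sql) if length $sql;
    }
    close $in;
    $dbh->commit;        # every queued insert lands in one commit
    unlink $work;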

Is this a crazy way to handle this? No matter what I have tried, opening a file on disk and writing a single line to it is way faster than any database I have used. I even tried using BerkeleyDB as the queue instead of the flat file, but that wasn't a whole lot faster than using the cached database handles (our application runs under mod_perl).

Chris


Nov 12 '05 #1

On Dec 31, 2003, at 14:16, Chris Ochs wrote:
Now what our application does is create the queries as it runs; then, instead of inserting them into the database, it writes them all out to a single file at the end of the transaction. This is a huge performance boost. We then use a separate daemon to run the disk queue once every second and do all the inserts. If for some reason the main application can't write to disk, it falls back to inserting them directly.

Is this a crazy way to handle this? No matter what I have tried, opening a file on disk and writing a single line to it is way faster than any database I have used. I even tried using BerkeleyDB as the queue instead of the flat file, but that wasn't a whole lot faster than using the cached database handles (our application runs under mod_perl).


In my application, I've built a "TransactionPipeline" class that queues up transactions for asynchronous storage. It made an incredible difference in transaction processing speed in places where the individual transaction isn't critical (the atomicity is important, but the main place this is used is in transactions that record the state of a device, for example).

Conceptually, it's somewhat similar to yours. The thread that runs the queue is triggered by the addition of a new transaction through a normal notification mechanism, so it's basically idle unless something is going on; if there's a lot going on, the queue simply builds up until the incoming rate drops below the rate at which we can actually process it.
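
In Perl terms the idea is roughly the following (just a sketch of the pattern, not our actual code; the DSN, credentials, and SQL are invented):

    use strict;
    use warnings;
    use threads;
    use Thread::Queue;
    use DBI;

    # One shared queue: request handlers enqueue finished SQL, a single
    # background thread drains it into the database.
    my $queue = Thread::Queue->new();

    my $writer = threads->create(sub {
        my $dbh = DBI->connect('dbi:Pg:dbname=app', 'app', 'secret',
                               { AutoCommit => 0, RaiseError => 1 });
        # dequeue() blocks, so the writer sleeps until something is enqueued
        while (defined(my $sql = $queue->dequeue())) {
            $dbh->do($sql);
            $dbh->commit;
        }
        $dbh->disconnect;
    });

    # Producer side: hand the statement off and return immediately.
    $queue->enqueue(
        "INSERT INTO device_state (device_id, state) VALUES (42, 'online')");

    $queue->enqueue(undef);   # undef tells the writer thread to exit
    $writer->join;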

It's been the only thing that's kept our application running (against
MS SQL Server until we can throw that away in favor of postgres).

--
Dustin Sallings <du****@spy.net>

Nov 12 '05 #2
"Chris Ochs" <ch***@paymentonline.com> writes:
Is this a crazy way to handle this?


Depends. Do you care if you lose that data (if the system crashes
before your daemon can insert it into the database)? I think the
majority of the win you are seeing comes from the fact that the data
doesn't actually have to get to disk --- your "write to file" never
gets further than kernel disk buffers in RAM.
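
To make that file write as durable as a committed database transaction, you would have to flush and fsync it yourself, and that is where the time actually goes. A tiny illustration (the path and statement are arbitrary):

    use strict;
    use warnings;
    use IO::Handle;

    open my $fh, '>>', '/var/spool/app/insert-queue.sql' or die $!;
    print {$fh} "INSERT INTO txn_detail (txn_id, item) VALUES (1001, 'widget');\n";
    $fh->flush;   # push perl's buffer into the kernel
    $fh->sync;    # fsync(2): force the kernel buffers onto disk --
                  # the expensive step the plain file write skips
    close $fh;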

I would think that you could get essentially the same win by aggregating
your database transactions into bigger ones. From a reliability point
of view you're doing that anyway --- whatever work the daemon processes
at a time is the real transaction size.
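
For example, executing all of a transaction's inserts under a single commit looks something like this in DBI (a sketch only; the DSN and table are hypothetical):

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=app', 'app', 'secret',
                           { AutoCommit => 0, RaiseError => 1 });

    my $sth = $dbh->prepare(
        'INSERT INTO txn_detail (txn_id, item, amount) VALUES (?, ?, ?)');

    # The 10-30 inserts from one application transaction become a single
    # database transaction: one commit (and one fsync) for the whole batch.
    my @rows = ([1001, 'widget', 9.95], [1001, 'gadget', 4.50]);
    $sth->execute(@$_) for @rows;
    $dbh->commit;
    $dbh->disconnect;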

regards, tom lane


Nov 12 '05 #3
"Chris Ochs" <ch***@paymentonline.com> writes:
Is this a crazy way to handle this?


Depends. Do you care if you lose that data (if the system crashes
before your daemon can insert it into the database)? I think the
majority of the win you are seeing comes from the fact that the data
doesn't actually have to get to disk --- your "write to file" never
gets further than kernel disk buffers in RAM.

I would think that you could get essentially the same win by aggregating
your database transactions into bigger ones. From a reliability point
of view you're doing that anyway --- whatever work the daemon processes
at a time is the real transaction size.

regards, tom lane


The transactions are as big as they can be; all the data is committed at once. I'm guessing that for any database to be as fast as I want, it just needs bigger/better hardware, which isn't an option at the moment.

I was also thinking about data loss with the disk queue. Right now the risk is very small, but as we do more transactions it gets bigger. So right now, yes, it's an acceptable risk given the chance of it happening and what a worst-case scenario would look like, but at some point it wouldn't be.

Chris



Nov 12 '05 #4


