how many record versions

begin;
update t set val=val+1; -- 1000 times
commit;

How many record versions does it create? 1 or 1000? I'm implementing a
banner counter which is incremented at least 2-3 million times a day. I
thought I'd cheat by only committing after every few minutes. Would that
work, or would I still create as many record versions?

--
dave

Nov 23 '05 #1
On Sun, May 23, 2004 at 02:44:31 +0700,
David Garamond <li***@zara.6.isreserved.com> wrote:
> begin;
> update t set val=val+1; -- 1000 times
> commit;
>
> How many record versions does it create? 1 or 1000? I'm implementing a
> banner counter which is incremented at least 2-3 million times a day. I
> thought I'd cheat by only committing after every few minutes. Would that
> work, or would I still create as many record versions?

You might be better off keeping the counter in its own table and vacuuming
that table very often. It is unlikely that holding transactions open
for several minutes is a good idea. Also, if you are doing multiple updates
in a single transaction, you are still going to get multiple row versions.
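
For example, a minimal sketch of such a dedicated counter table (the table
and column names are only illustrative):

  -- Hypothetical single-row counter table, kept separate so that frequent
  -- VACUUMs only have to scan this one tiny table.
  CREATE TABLE banner_counter (
      id  integer PRIMARY KEY,
      val bigint NOT NULL
  );

  INSERT INTO banner_counter (id, val) VALUES (1, 0);

  -- Each hit is its own short transaction:
  UPDATE banner_counter SET val = val + 1 WHERE id = 1;

  -- Run very frequently (e.g. from cron) to reclaim dead row versions:
  VACUUM banner_counter;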


Nov 23 '05 #2
David Garamond <li***@zara.6.isreserved.com> writes:
> begin;
> update t set val=val+1; -- 1000 times
> commit;
>
> How many record versions does it create? 1 or 1000?

1000.

> I'm implementing a
> banner counter which is incremented at least 2-3 million times a day. I
> thought I'd cheat by only committing after every few minutes. Would that
> work, or would I still create as many record versions?

Won't make a difference. You should think seriously about using a
sequence rather than an ordinary table for this.
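
A minimal sketch of the sequence approach, with a hypothetical sequence
name; sequence increments are non-transactional and don't leave dead row
versions behind, which is why this sidesteps the problem:

  -- Hypothetical sequence used as the counter.
  CREATE SEQUENCE banner_hits_seq;

  -- Each impression:
  SELECT nextval('banner_hits_seq');

  -- Read the current value (the sequence is a one-row relation):
  SELECT last_value FROM banner_hits_seq;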

regards, tom lane


Nov 23 '05 #3
Manfred Koizar wrote:
>> begin;
>> update t set val=val+1; -- 1000 times
>> commit;
>>
>> How many record versions does it create? 1 or 1000?
>
> 1000
>
>> I'm implementing a
>> banner counter which is incremented at least 2-3 million times a day.
>
> How many rows? I would VACUUM that table after every few hundred
> updates or whenever 10% to 20% of the rows have been updated, whichever
> is greater.

Actually, each record will be incremented probably only thousands of
times a day. But there are many banners. Each record has (bannerid,
campaignid, websiteid, date, countrycode) "dimensions" and (impression,
click) "measures". The table currently has about 1.5-2 million records
(it's in MyISAM MySQL), so I'm not sure I can use that many sequences, as
Tom suggested. Every impression (banner view) and click will result in an
SQL statement (= a "transaction" in MyISAM, since MyISAM doesn't support
BEGIN + COMMIT).

I'm contemplating moving to Postgres, but am worried about the MVCC
thing. I've previously tried InnoDB in MySQL briefly but had to
revert to MyISAM because the load increased significantly.

--
dave

Nov 23 '05 #4
On Sun, 23 May 2004 23:32:48 +0700, David Garamond
<li***@zara.6.isreserved.com> wrote:
> Actually, each record will be incremented probably only thousands of
> times a day. But there are many banners. Each record has (bannerid,
> campaignid, websiteid, date, countrycode) "dimensions" and (impression,
> click) "measures".

If you need all of bannerid, campaignid, websiteid, date, countrycode to
identify a row, it may be worth the effort to split this up into two
tables:

CREATE TABLE dimensions (
    dimensionid int PRIMARY KEY,
    bannerid ...,
    campaignid ...,
    websiteid ...,
    date ...,
    countrycode ...,
    UNIQUE (bannerid, ..., countrycode)
);

CREATE TABLE measures (
    dimensionid int PRIMARY KEY REFERENCES dimensions,
    impression ...,
    click ...
);

Thus you'd only update measures thousands of times, and the index would
be much more compact, because the PK is only a four-byte integer.
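
A concrete, hedged version of that split, with column types assumed purely
for illustration (the sketch above leaves them open):

  -- Column types here are assumed for illustration only.
  CREATE TABLE dimensions (
      dimensionid serial PRIMARY KEY,
      bannerid    integer NOT NULL,
      campaignid  integer NOT NULL,
      websiteid   integer NOT NULL,
      date        date NOT NULL,
      countrycode char(2) NOT NULL,
      UNIQUE (bannerid, campaignid, websiteid, date, countrycode)
  );

  CREATE TABLE measures (
      dimensionid integer PRIMARY KEY REFERENCES dimensions,
      impression  bigint NOT NULL DEFAULT 0,
      click       bigint NOT NULL DEFAULT 0
  );

  -- A hit then only touches the narrow measures table:
  UPDATE measures SET impression = impression + 1 WHERE dimensionid = 42;
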
> The table currently has about 1.5-2 million records (it's in
> MyISAM MySQL), so I'm not sure I can use that many sequences, as
> Tom suggested. Every impression (banner view) and click will result in an
> SQL statement

Schedule a
    VACUUM ANALYSE measures;
for every 100000 updates or so.

> I'm contemplating moving to Postgres, but am worried about the MVCC
> thing. I've previously tried InnoDB in MySQL briefly but had to
> revert to MyISAM because the load increased significantly.

You mean InnoDB cannot handle the load?

Servus
Manfred


Nov 23 '05 #5
Manfred Koizar <mk*****@aon.at> writes:
> On Sun, 23 May 2004 23:32:48 +0700, David Garamond
> <li***@zara.6.isreserved.com> wrote:
>> I'm contemplating moving to Postgres, but am worried about the MVCC
>> thing. I've previously tried InnoDB in MySQL briefly but had to
>> revert to MyISAM because the load increased significantly.
>
> You mean InnoDB cannot handle the load?

I suspect what he meant is that InnoDB had exactly the same performance
issues with lots-of-dead-rows that Postgres will have. Around here,
however, we are accustomed to that behavior and know how to deal with
it, whereas I'll bet the MySQL community hasn't got that down yet ...

regards, tom lane


Nov 23 '05 #6

David Garamond <li***@zara.6.isreserved.com> writes:
> Actually, each record will be incremented probably only thousands of times a
> day. But there are many banners. Each record has (bannerid, campaignid,
> websiteid, date, countrycode) "dimensions" and (impression, click) "measures".

In the past, when I had a very similar situation, we kept the raw impression
and click event data. I.e., one record per impression in the impression table
and one record per click in the click table.

That makes the tables insert-only, which is efficient and not prone to locking
contention. They would never have to be vacuumed except after purging old data.

Then, to accelerate queries, we had denormalized aggregate tables with a cron
job that did the equivalent of

  insert into agg_clicks (
    select count(*), bannerid
    from clicks
    where date between ? and ?
    group by bannerid
  )

Where the ?s were actually hourly periods. I.e., at 12:15 it ran this query
for the 11-12 period.
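
A concrete, hedged version of that hourly rollup; the column names
(clicked_at, period_start) are assumptions, not from the original schema:

  -- Roll up the previous hour's clicks; run from cron shortly after the
  -- hour, e.g. at 12:15 for the 11:00-12:00 period.
  INSERT INTO agg_clicks (clicks, bannerid, period_start)
  SELECT count(*),
         bannerid,
         date_trunc('hour', now()) - interval '1 hour'
  FROM clicks
  WHERE clicked_at >= date_trunc('hour', now()) - interval '1 hour'
    AND clicked_at <  date_trunc('hour', now())
  GROUP BY bannerid;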

This meant we didn't have immediately up-to-date stats on banners, but we
did have stats on every single impression and click, including time and
information about the users.

This worked out very well for reporting needs. If your system is using the
data to handle serving the ads, though, it's a different kettle of fish. For
that I think you'll want something that avoids having to do a database query
for every single impression.
--
greg

Nov 23 '05 #7
Manfred Koizar wrote:
> You mean InnoDB cannot handle the load?

Perhaps it's more appropriate to say that the disk becomes the bottleneck.

--
dave

Nov 23 '05 #8
Greg Stark wrote:
>> Actually, each record will be incremented probably only thousands of times a
>> day. But there are many banners. Each record has (bannerid, campaignid,
>> websiteid, date, countrycode) "dimensions" and (impression, click) "measures".
>
> In the past, when I had a very similar situation, we kept the raw impression
> and click event data. I.e., one record per impression in the impression table
> and one record per click in the click table.
>
> That makes the tables insert-only, which is efficient and not prone to locking
> contention. They would never have to be vacuumed except after purging old data.

Assuming there are 10 million impressions per day, the impression
table will grow at least 200-400MB per day, is that correct? What do you
do and how often do you purge old data? Do you do a mass DELETE on the
impression table itself, or do you switch to another table? I've found
that deleting tens or hundreds of thousands of rows, at least in InnoDB,
takes a long, long time (plus it sucks CPU and slows other queries).

> Then, to accelerate queries, we had denormalized aggregate tables with a cron
> job that did the equivalent of
>
>   insert into agg_clicks (
>     select count(*), bannerid
>     from clicks
>     where date between ? and ?
>     group by bannerid
>   )


--
dave

Nov 23 '05 #9
On Mon, 24 May 2004, David Garamond wrote:
> Manfred Koizar wrote:
>> You mean InnoDB cannot handle the load?
>
> Perhaps it's more appropriate to say that the disk becomes the bottleneck.

Was this attempting to do each update in a separate transaction?

If so, that is certainly expected with whatever disk-based transactional
database you use. With InnoDB, you could hack around it by configuring
InnoDB not to flush its log to disk at every transaction, obviously at
the risk of losing data if something crashes.

From what I have seen, I would expect InnoDB's multiversioning to work
better for the use you describe than PostgreSQL's, due to how it implements
undo logs for updates. However, there could well be other differences
that could make PostgreSQL work better for your application, depending on
exactly what issues you are seeing.


Nov 23 '05 #10
David Garamond <li***@zara.6.isreserved.com> writes:
> Greg Stark wrote:
>>> Actually, each record will be incremented probably only thousands of times a
>>> day. But there are many banners. Each record has (bannerid, campaignid,
>>> websiteid, date, countrycode) "dimensions" and (impression, click) "measures".
>>
>> In the past, when I had a very similar situation, we kept the raw impression
>> and click event data. I.e., one record per impression in the impression table
>> and one record per click in the click table.
>>
>> That makes the tables insert-only, which is efficient and not prone to locking
>> contention. They would never have to be vacuumed except after purging old data.
>
> Assuming there are 10 million impressions per day, the impression table
> will grow at least 200-400MB per day, is that correct? What do you do and how
> often do you purge old data? Do you do a mass DELETE on the impression table
> itself, or do you switch to another table? I've found that deleting
> tens or hundreds of thousands of rows, at least in InnoDB, takes a long,
> long time (plus it sucks CPU and slows other queries).

Well, this was actually under Oracle, but I can extrapolate to Postgres given
my experience.

The ideal tool for the job is a feature that Postgres has discussed but hasn't
implemented yet, "partitioned tables". Under Oracle with partitioned tables,
we were able to drop entire partitions virtually instantaneously. It also made
copying the data out to near-line backups much more efficient than index
scanning.

Before we implemented partitioned tables we used both techniques you
described. At first we had an ad-hoc procedure of creating a new table and
swapping it out. But that involved a short downtime and was a manual process.
Eventually we set up an automated batch job which used deletes.

Deletes under Postgres should be fairly efficient. The I/O use would be
unavoidable, so doing it during off-peak hours would still be good. But it
shouldn't otherwise interfere with other queries. There should be no locking
contention, no additional work for other queries (like checking rollback
segments or logs), or any of the other problems other databases suffer from
with large updates.
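
A minimal sketch of such a batch purge job, assuming a timestamp column
named logged_at on the impressions table (the names are hypothetical):

  -- Purge impressions older than 90 days; run nightly during off-peak hours.
  DELETE FROM impressions
  WHERE logged_at < now() - interval '90 days';

  -- Reclaim the dead tuples afterwards.
  VACUUM ANALYZE impressions;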

I find the 10 million impressions per day pretty scary, though. That's over
100/s across the entire 24-hour period, probably twice that at peak hours.
That would have to be one pretty beefy server just to handle the transaction
processing itself. (And updates under Postgres are essentially inserts where
vacuum cleans up the old tuple later, so they would be no less taxing.) A
server capable of handling that ought to be able to make quick work of
deleting a few hundred megabytes of records.

Another option is simply logging this data to a text file, or multiple text
files, one per server. Then you can load the text files with batch loads
offline. This avoids slowing down the servers handling the transactions in
the critical path. But it's yet more complex, with more points of failure.
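
A sketch of the offline batch load, assuming the log is written in a
COPY-compatible tab-separated format (path and column names are hypothetical):

  -- Bulk-load one server's impression log in a single pass; COPY is far
  -- cheaper than millions of individual INSERTs. (Use \copy from psql if
  -- the file lives on the client.)
  COPY impressions (bannerid, campaignid, websiteid, countrycode, logged_at)
  FROM '/var/log/adserver/impressions-2004-05-24.tsv';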

Something else you might be interested in is using a tool like this:

http://www.danga.com/memcached/

I could see it being useful for caching the counts you were looking to keep,
so that the ad server doesn't need to consult the database to calculate which
ad to show. A separate job could periodically sync the counts to or from the
database.

--
greg

Nov 23 '05 #11
Greg Stark wrote:
> Another option is simply logging this data to a text file, or multiple text
> files, one per server. Then you can load the text files with batch loads
> offline. This avoids slowing down the servers handling the transactions in
> the critical path. But it's yet more complex, with more points of failure.

Yes, this is what we've been doing recently. We write to a set of text
files, and there's a process that commits to MySQL every 2-3 minutes (and if
the commit fails, we write to another text file to avoid the data being
lost). It works, but I keep thinking how ugly the whole thing is :-)


--
dave

Nov 23 '05 #12
Greg Stark wrote:
> Well, this was actually under Oracle, but I can extrapolate to Postgres given
> my experience.
>
> The ideal tool for the job is a feature that Postgres has discussed but hasn't
> implemented yet, "partitioned tables". Under Oracle with partitioned tables,
> we were able to drop entire partitions virtually instantaneously. It also made
> copying the data out to near-line backups much more efficient than index
> scanning.

I think you can get a similar effect by using inherited tables. Create
one "master" table, and then inherit individual "partition" tables from
that. Then you can easily create or drop a "partition" while still
being able to query the "master" and see all the rows.
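
A minimal sketch of that inheritance-based layout, with hypothetical table
names and a monthly split:

  -- "Master" table that queries go against.
  CREATE TABLE impressions (
      bannerid  integer NOT NULL,
      logged_at timestamp NOT NULL
  );

  -- One "partition" per month, inheriting the master's columns.
  CREATE TABLE impressions_2004_05 () INHERITS (impressions);
  CREATE TABLE impressions_2004_06 () INHERITS (impressions);

  -- Inserts target the right child table; selecting from the master
  -- sees rows from all children.
  INSERT INTO impressions_2004_05 VALUES (42, '2004-05-24 12:15:00');
  SELECT count(*) FROM impressions;

  -- Dropping an old month is near-instant compared to a mass DELETE.
  DROP TABLE impressions_2004_05;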

HTH,

Joe


Nov 23 '05 #14
On Mon, May 24, 2004 at 11:15:07AM -0700, Joe Conway wrote:
> Greg Stark wrote:
>> Well, this was actually under Oracle, but I can extrapolate to Postgres
>> given my experience.
>>
>> The ideal tool for the job is a feature that Postgres has discussed but
>> hasn't implemented yet, "partitioned tables". Under Oracle with partitioned
>> tables, we were able to drop entire partitions virtually instantaneously.
>> It also made copying the data out to near-line backups much more efficient
>> than index scanning.
>
> I think you can get a similar effect by using inherited tables. Create
> one "master" table, and then inherit individual "partition" tables from
> that. Then you can easily create or drop a "partition" while still
> being able to query the "master" and see all the rows.

I've done this in production, and it works fairly well. It's not as
clean as true partitioned tables (as a lot of things don't inherit),
but you can localise the nastiness in a pretty small bit of
application code.

Any query ends up looking like a long union of selects, which'll slow
things down somewhat, but I found that most of my queries had date-range
selection on them, so I could take advantage of that in the application
code to query only a subset of the inherited tables for most of the
application-generated queries, while I could still do ad-hoc work from
the psql command line using the parent table.
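
For illustration, the application-generated queries Steve describes might
look like this (table and column names are hypothetical, matching the
inheritance sketch earlier in the thread):

  -- When the date range is known to fall inside May 2004, the application
  -- queries just that child table directly ...
  SELECT bannerid, count(*)
  FROM impressions_2004_05
  WHERE logged_at >= '2004-05-01' AND logged_at < '2004-06-01'
  GROUP BY bannerid;

  -- ... while ad-hoc work against the parent scans every child,
  -- effectively a long union of selects.
  SELECT bannerid, count(*)
  FROM impressions
  WHERE logged_at >= '2004-05-01' AND logged_at < '2004-06-01'
  GROUP BY bannerid;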

Cheers,
Steve

Nov 23 '05 #15
