
Select for update, locks and transaction levels

Hi,

I am trying to gather stats about how many times a resource in our web
app is viewed, i.e. just a COUNT. There are potentially millions of
resources within the system.

I thought of two methods:

1. An extra column in the resource table which contains a count.
a. Each time a resource is viewed an UPDATE statement is run.

UPDATE res_table SET view_count = view_count + 1 WHERE
res_id=2177526::bigint;

b. The count is just SELECTed from the resource table.
2. A separate table that contains a count using an algorithm similar
to the method presented here:

http://archives.postgresql.org/pgsql...1/msg00059.php

a. Each time a resource is viewed a new row is inserted with a count
of 1.
b. Each time the view count is needed, rows from the table are SUMmed
together.
c. A compression script runs regularly to group and sum the rows
together.

I personally did not like the look of 1, so I thought about using 2. The
main reason being there would be no locks to interfere with "updating"
the view count, because in fact this is just an INSERT statement. Also,
vacuuming the new table is cheaper as it is considerably thinner (i.e.
fewer columns) than the resource table. The second method allows me to
capture more data too, such as who viewed the resource and which
resource they viewed next, but I digress :-).

Q1. Have I missed any methods?

I thought I would have a further look at 2, and I have some questions
about that too.

The schema for this new table is shown below.

-- SCHEMA
---------------------------------------------------------------
CREATE TABLE view_res (
res_id int8,
count int8
) WITHOUT OIDS;

CREATE INDEX view_res_res_id_idx ON view_res (res_id);
------------------------------------------------------------------------
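
With this schema the two hot paths are trivial (a sketch; the literal
res_id is just an example):

```sql
-- record one view; plain INSERTs never contend with other viewers
INSERT INTO view_res (res_id, count) VALUES (2177526::bigint, 1);

-- read the current total; COALESCE covers never-viewed resources
SELECT COALESCE(sum(count), 0) AS view_count
FROM view_res
WHERE res_id = 2177526::bigint;
```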

And the compression script should reduce the following rows:

-- QUERY ---------------------------------------------------------------
db_dev=# select * from view_res where res_id=2177526::bigint;
res_id | count
---------+-------
2177526 | 1
2177526 | 1
2177526 | 1
2177526 | 1
2177526 | 1
2177526 | 1
2177526 | 1
2177526 | 1
(8 rows)
------------------------------------------------------------------------

to the following

-- QUERY ---------------------------------------------------------------
db_dev=# select * from view_res where res_id=2177526::bigint;
res_id | count
---------+-------
2177526 | 8
(1 row)
------------------------------------------------------------------------

Now I must admit I have never really played around with select for
update, locks or transaction levels, hence the questions. I have looked
in the docs and think I figured out what I need to do. The following is
pseudo-code for the compression script.

------------------------------------------------------------------------
BEGIN;

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

SELECT res_id, sum(count) AS res_count FROM view_res GROUP BY res_id FOR
UPDATE;

For each row
{
DELETE FROM view_res WHERE res_id=<res_id>::bigint;

INSERT INTO view_res (res_id, count) VALUES (<res_id>,
<res_count>);
}

COMMIT;
------------------------------------------------------------------------
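
The pseudo-code above can be collapsed into plain SQL so all resources
are compressed in one statement pair (a sketch, assuming the view_res
schema above; it must run under SERIALIZABLE, and the caller should
retry on a serialization failure):

```sql
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- capture the current totals before touching the table
CREATE TEMP TABLE totals ON COMMIT DROP AS
    SELECT res_id, sum(count) AS res_count
    FROM view_res
    GROUP BY res_id;

-- remove only the rows visible to this transaction's snapshot...
DELETE FROM view_res WHERE res_id IN (SELECT res_id FROM totals);

-- ...and write back one summary row per resource
INSERT INTO view_res (res_id, count)
    SELECT res_id, res_count FROM totals;

COMMIT;
```

Rows inserted by concurrent viewers after the snapshot is taken are not
visible to the DELETE, so no counts are lost.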

Right, the questions for this method:

Q2. Will a "GROUP BY" used with a "SELECT ... FOR UPDATE" lock all the
rows used for the sum?
Q3. Am I right in saying freshly inserted rows will not be affected by
the delete because of the SERIALIZABLE transaction level?
Q4. Are there any other concurrency issues that I have not thought of?
BTW, this is still at the planning phase so a complete redesign is
perfectly fine. Just seeing if anyone has greater experience than me at
this sort of thing.
TIA
Nick Barr

Nov 22 '05 #1
"Nick Barr" <ni*******@webbased.co.uk> writes:
I personally did not like the look of 1 so I thought about using 2. The
main reason being there would be no locks that would interfere with
"updating" the view count because in fact this was just an INSERT
statement.
INSERTs are good.
Q2. Will a "GROUP BY" used with a "SELECT ... FOR UPDATE" lock all the
rows used for the sum?


No; it won't work at all.

regression=# select hundred,count(*) from tenk1 group by hundred for update;
ERROR: SELECT FOR UPDATE is not allowed with GROUP BY clause
regression=#

However, AFAICS it will not matter if you are using a serializable
transaction. If two such transactions try to delete the same row,
one of them will error out, so you do not need FOR UPDATE.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match
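
Tom's "one of them will error out" behaviour looks like this from the
losing session (a sketch; the exact message text varies by version):

```sql
-- session A                          -- session B
BEGIN;                                BEGIN;
SET TRANSACTION ISOLATION            SET TRANSACTION ISOLATION
    LEVEL SERIALIZABLE;                  LEVEL SERIALIZABLE;
DELETE FROM view_res
    WHERE res_id = 2177526;
                                      DELETE FROM view_res
                                          WHERE res_id = 2177526;
                                      -- blocks until A commits, then:
COMMIT;                               -- ERROR: could not serialize
                                      --   access due to concurrent update
```

The losing session simply re-runs its transaction from the top.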

Nov 22 '05 #2
on 2/16/04 10:51 AM, ni*******@webbased.co.uk purportedly said:
I am trying to gather stats about how many times a resource in our web
app is viewed, i.e. just a COUNT. There are potentially millions of
resources within the system.

I thought of two methods:

1. An extra column in the resource table which contains a count.
Not a good idea if you expect a high concurrency rate--you will create a
superfluous bottleneck in your app.
2. A separate table that contains a count using an algorithm similar
to the method presented here:

http://archives.postgresql.org/pgsql...1/msg00059.php

a. Each time a resource is viewed a new row is inserted with a count
of 1.
b. Each time the view count is needed, rows from the table are SUMmed
together.
c. A compression script runs regularly to group and sum the rows
together.


I am assuming that you are concerned about storage size, which is why you
want to "compress". You are probably better off (both by performance and
storage) with something like the following approach:

CREATE TABLE view_res (
res_id int8,
stamp timestamp
) WITHOUT OIDS;

CREATE TABLE view_res_arch (
res_id int8,
cycle date,
hits int8
);

By using a timestamp instead of count you can archive using a date/time
range and avoid any concurrency/locking issues:

INSERT INTO view_res_arch (res_id, cycle, hits)
SELECT res_id, '2003-12-31', COUNT(res_id) FROM view_res
WHERE stamp >= '2003-12-01' AND stamp <= '2003-12-31 23:59:59'
GROUP BY res_id;

then:

DELETE FROM view_res
WHERE stamp >= '2003-12-01' AND stamp <= '2003-12-31 23:59:59'

With this kind of approach you have historicity and extensibility, so you
could, for example, show historical trends with only minor modifications.
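
A running total then just combines the live rows with the archive (a
sketch using Keary's two tables; the literal res_id is illustrative):

```sql
SELECT COALESCE((SELECT sum(hits)  FROM view_res_arch WHERE res_id = 2177526), 0)
     + COALESCE((SELECT count(*)   FROM view_res      WHERE res_id = 2177526), 0)
       AS total_views;
```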

Best regards,

Keary Suska
Esoteritech, Inc.
"Leveraging Open Source for a better Internet"
---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 22 '05 #3
Maybe filesystem fragmentation is a problem?

It is said that fragmentation is not a problem on a multiuser system
(for example on the ext2 filesystem), because many users and tasks share
the HDD I/O subsystem and there is little benefit to keeping the disk
lightly fragmented. But...

In my situation I run PostgreSQL, and PHP as an Apache module. I made a
backup and ran an e2fs defragmentation program on the related partitions
(i.e. /home and /var/, where the PHP files and the database cluster
live).

Result? About a 40% (!) performance boost...

----- Original Message -----
From: "Keary Suska" <hi********@pcisys.net>
To: "Postgres General" <pg***********@postgresql.org>
Sent: Thursday, February 19, 2004 8:52 PM
Subject: Re: [GENERAL] Select for update, locks and transaction levels


Nov 23 '05 #4
