468,539 Members | 1,715 Online
Bytes | Developer Community
New Post

Home Posts Topics Members FAQ

Post your question to a community of 468,539 developers. It's quick & easy.

Page access pattern in query plan using index scan

Suppose I have a table as follows:

testdb=> \d person
Table "public.person"
Column | Type | Modifiers
------------+-------------------------+-----------
id | integer | not null
age | integer |
other_info | character varying(1000) |
Indexes: person_pkey primary key btree (id),
idx_person_age btree (age)

Here is the execution plan for a not very selective query on age:

testdb=> explain select * from person where age between 30 and 40;
QUERY PLAN
--------------------------------------------------------------------------------
Index Scan using idx_person_age on person (cost=0.00..17.08 rows=5 width=524)
Index Cond: ((age >= 30) AND (age <= 40))
(2 rows)

What is the pattern of access to data pages? I can think of two likely
answers:

1) The index is scanned for ages 30 through 40. As each index entry is
scanned, a row is retrieved.

2) The index is scanned for ages 30 and 40. The index entries are
sorted so that rows on the same page are grouped together, increasing
the odds that a needed page will be present in the shared buffers.

I'm using 7.3.4, and will be upgrading to 7.4.2 soon.

Jack Orenstein

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postgresql.org

Nov 23 '05 #1
6 1786
On Wed, Jun 02, 2004 at 08:38:58PM -0400, Jack Orenstein wrote:
What is the pattern of access to data pages? I can think of two likely
answers:

1) The index is scanned for ages 30 through 40. As each index entry is
scanned, a row is retrieved.


This one. There have been noises about doing the second, but it's non
trivial and there's no hacker currently working on it. Not to be
expected on the next version, I'd think.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
Licensee shall have no right to use the Licensed Software
for productive or commercial use. (Licencia de StarOffice 6.0 beta)
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 23 '05 #2
Alvaro Herrera wrote:
On Wed, Jun 02, 2004 at 08:38:58PM -0400, Jack Orenstein wrote:

What is the pattern of access to data pages? I can think of two likely
answers:

1) The index is scanned for ages 30 through 40. As each index entry is
scanned, a row is retrieved.

This one. There have been noises about doing the second, but it's non
trivial and there's no hacker currently working on it. Not to be
expected on the next version, I'd think.


Can you recommend an application-level workaround that will access pages
in the right order?

Jack Orenstein
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to ma*******@postgresql.org)

Nov 23 '05 #3
On Wed, Jun 02, 2004 at 11:22:53PM -0400, Jack Orenstein wrote:
Alvaro Herrera wrote:
On Wed, Jun 02, 2004 at 08:38:58PM -0400, Jack Orenstein wrote:

What is the pattern of access to data pages? I can think of two likely
answers:

1) The index is scanned for ages 30 through 40. As each index entry is
scanned, a row is retrieved.


This one. There have been noises about doing the second, but it's non
trivial and there's no hacker currently working on it. Not to be
expected on the next version, I'd think.


My naive guess about this is that you read the index entries, each one
contains a page number, and you sort by the page number. The set of
index entries could be large, requiring a potentially expensive sort.
I don't know enough about query plan internals to know if this sort of
plan can even be expressed currently, so maybe a major extension would
be needed. And then the optimizer would have to be extended to
consider the two approaches. Is this why you say the problem is
non-trivial, or is there some additional set of issues that I'm
missing?


Part of the problem is that the sort is not expressable in the current
code. Another part is that this kind of index scan is different and
violates some assumptions of the current indexscan, so it's not best in
all scenarios. Several planner/optimizer improvements are needed for
this to work. This has been discussed several times on the hackers'
list. If you want to read them, search for "bitmap indexes" on
http://www.pgsql.ru. Note that this seems to be different from what
other RDBMSs consider a bitmap index to be.

Regarding your other question: I don't know any application workaround.
There probably isn't any (unless you know how the tuples are sorted on
the heap, which is quite unlikely).

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"No hay cielo posible sin hundir nuestras races
en la profundidad de la tierra" (Malucha Pinto)
---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ma*******@postgresql.org so that your
message can get through to the mailing list cleanly

Nov 23 '05 #4

Jack Orenstein <ja*@geophile.com> writes:
Alvaro Herrera wrote:
On Wed, Jun 02, 2004 at 08:38:58PM -0400, Jack Orenstein wrote:
What is the pattern of access to data pages? I can think of two likely
answers:

1) The index is scanned for ages 30 through 40. As each index entry is
scanned, a row is retrieved.
This one. There have been noises about doing the second, but it's non
trivial and there's no hacker currently working on it. Not to be
expected on the next version, I'd think.


I think the main problem is dealing with locking issues. As long as the index
scan is using the tuples it sees right away it doesn't care if someone else
changes some tuple it hasn't gotten to yet or has already seen. If you read
them all and then sort them it's possible that while you're dealing with your
local copy that someone else comes along and does something that you would
want to know about.

Actually I'm not exactly clear on what types of changes would cause problems,
and thinking about it now I don't see how this is any different than a
materialize or other plan that stores intermediate results. So there's
something I'm missing.
Can you recommend an application-level workaround that will access pages
in the right order?


Well you could try CLUSTER-ing your table on that index.

Alternatively you could try partitioning your table so the optimizer can use a
sequential scan to efficiently get the records you want. But there's no native
support for partitioning so you would have to roll your own entirely manually.
That would mean e.g. having tables cheques_jan, cheques_feb, cheques_mar,...
And a union of all the months. Any query that only spanned one month would use
the monthly table so it could do a sequential scan of just that month. I don't
recommend this unless you have a _lot_ of data that you often deal with in
particular chunks. Even then unless you're purging and archiving data in those
chunks it's probably still not worth it.

--
greg
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to ma*******@postgresql.org)

Nov 23 '05 #5
Greg Stark wrote:
Can you recommend an application-level workaround that will access pages
in the right order?

Well you could try CLUSTER-ing your table on that index.


For my application, the problem with CLUSTER is that it reclusters the entire
table. So as the table grows, the cost of CLUSTER goes up.

What would be really nice is something like "cluster recent". The idea is to
cluster only rows added since the last cluster. This amortizes the time and storage
costs of clustering across the invocations of cluster in a much more palatable way.
If such a command were used this wisely, (e.g. whenever the table grows by 20%) then
I suspect applications would obtain most of the benefit of a sort following an index
lookup. (And I'm guessing this is a much simpler development task.)

Admittedly, this would not work so well in cases where the index keys change in value.
But this would be a huge win in situations (like mine) where the keys never change.

Jack Orenstein
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to ma*******@postgresql.org)

Nov 23 '05 #6
Jack Orenstein <ja*@geophile.com> writes:
What would be really nice is something like "cluster recent". The idea is to
cluster only rows added since the last cluster.


And you would put them where?

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 23 '05 #7

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

10 posts views Thread by Thomas R. Hummel | last post: by
13 posts views Thread by Dmitry Tkach | last post: by
6 posts views Thread by Steven D.Arnold | last post: by
14 posts views Thread by Sean C. | last post: by
reply views Thread by NPC403 | last post: by
By using this site, you agree to our Privacy Policy and Terms of Use.