473,791 Members | 3,193 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Page access pattern in query plan using index scan

Suppose I have a table as follows:

testdb=> \d person
Table "public.per son"
Column | Type | Modifiers
------------+-------------------------+-----------
id | integer | not null
age | integer |
other_info | character varying(1000) |
Indexes: person_pkey primary key btree (id),
idx_person_age btree (age)

Here is the execution plan for a not very selective query on age:

testdb=> explain select * from person where age between 30 and 40;
QUERY PLAN
--------------------------------------------------------------------------------
Index Scan using idx_person_age on person (cost=0.00..17. 08 rows=5 width=524)
Index Cond: ((age >= 30) AND (age <= 40))
(2 rows)

What is the pattern of access to data pages? I can think of two likely
answers:

1) The index is scanned for ages 30 through 40. As each index entry is
scanned, a row is retrieved.

2) The index is scanned for ages 30 and 40. The index entries are
sorted so that rows on the same page are grouped together, increasing
the odds that a needed page will be present in the shared buffers.

I'm using 7.3.4, and will be upgrading to 7.4.2 soon.

Jack Orenstein

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postg resql.org

Nov 23 '05 #1
6 2047
On Wed, Jun 02, 2004 at 08:38:58PM -0400, Jack Orenstein wrote:
What is the pattern of access to data pages? I can think of two likely
answers:

1) The index is scanned for ages 30 through 40. As each index entry is
scanned, a row is retrieved.


This one. There have been noises about doing the second, but it's non
trivial and there's no hacker currently working on it. Not to be
expected on the next version, I'd think.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
Licensee shall have no right to use the Licensed Software
for productive or commercial use. (Licencia de StarOffice 6.0 beta)
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 23 '05 #2
Alvaro Herrera wrote:
On Wed, Jun 02, 2004 at 08:38:58PM -0400, Jack Orenstein wrote:

What is the pattern of access to data pages? I can think of two likely
answers:

1) The index is scanned for ages 30 through 40. As each index entry is
scanned, a row is retrieved.

This one. There have been noises about doing the second, but it's non
trivial and there's no hacker currently working on it. Not to be
expected on the next version, I'd think.


Can you recommend an application-level workaround that will access pages
in the right order?

Jack Orenstein
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddres sHere" to ma*******@postg resql.org)

Nov 23 '05 #3
On Wed, Jun 02, 2004 at 11:22:53PM -0400, Jack Orenstein wrote:
Alvaro Herrera wrote:
On Wed, Jun 02, 2004 at 08:38:58PM -0400, Jack Orenstein wrote:

What is the pattern of access to data pages? I can think of two likely
answers:

1) The index is scanned for ages 30 through 40. As each index entry is
scanned, a row is retrieved.


This one. There have been noises about doing the second, but it's non
trivial and there's no hacker currently working on it. Not to be
expected on the next version, I'd think.


My naive guess about this is that you read the index entries, each one
contains a page number, and you sort by the page number. The set of
index entries could be large, requiring a potentially expensive sort.
I don't know enough about query plan internals to know if this sort of
plan can even be expressed currently, so maybe a major extension would
be needed. And then the optimizer would have to be extended to
consider the two approaches. Is this why you say the problem is
non-trivial, or is there some additional set of issues that I'm
missing?


Part of the problem is that the sort is not expressable in the current
code. Another part is that this kind of index scan is different and
violates some assumptions of the current indexscan, so it's not best in
all scenarios. Several planner/optimizer improvements are needed for
this to work. This has been discussed several times on the hackers'
list. If you want to read them, search for "bitmap indexes" on
http://www.pgsql.ru. Note that this seems to be different from what
other RDBMSs consider a bitmap index to be.

Regarding your other question: I don't know any application workaround.
There probably isn't any (unless you know how the tuples are sorted on
the heap, which is quite unlikely).

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"No hay cielo posible sin hundir nuestras raíces
en la profundidad de la tierra" (Malucha Pinto)
---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ma*******@postg resql.org so that your
message can get through to the mailing list cleanly

Nov 23 '05 #4

Jack Orenstein <ja*@geophile.c om> writes:
Alvaro Herrera wrote:
On Wed, Jun 02, 2004 at 08:38:58PM -0400, Jack Orenstein wrote:
What is the pattern of access to data pages? I can think of two likely
answers:

1) The index is scanned for ages 30 through 40. As each index entry is
scanned, a row is retrieved.
This one. There have been noises about doing the second, but it's non
trivial and there's no hacker currently working on it. Not to be
expected on the next version, I'd think.


I think the main problem is dealing with locking issues. As long as the index
scan is using the tuples it sees right away it doesn't care if someone else
changes some tuple it hasn't gotten to yet or has already seen. If you read
them all and then sort them it's possible that while you're dealing with your
local copy that someone else comes along and does something that you would
want to know about.

Actually I'm not exactly clear on what types of changes would cause problems,
and thinking about it now I don't see how this is any different than a
materialize or other plan that stores intermediate results. So there's
something I'm missing.
Can you recommend an application-level workaround that will access pages
in the right order?


Well you could try CLUSTER-ing your table on that index.

Alternatively you could try partitioning your table so the optimizer can use a
sequential scan to efficiently get the records you want. But there's no native
support for partitioning so you would have to roll your own entirely manually.
That would mean e.g. having tables cheques_jan, cheques_feb, cheques_mar,...
And a union of all the months. Any query that only spanned one month would use
the monthly table so it could do a sequential scan of just that month. I don't
recommend this unless you have a _lot_ of data that you often deal with in
particular chunks. Even then unless you're purging and archiving data in those
chunks it's probably still not worth it.

--
greg
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddres sHere" to ma*******@postg resql.org)

Nov 23 '05 #5
Greg Stark wrote:
Can you recommend an application-level workaround that will access pages
in the right order?

Well you could try CLUSTER-ing your table on that index.


For my application, the problem with CLUSTER is that it reclusters the entire
table. So as the table grows, the cost of CLUSTER goes up.

What would be really nice is something like "cluster recent". The idea is to
cluster only rows added since the last cluster. This amortizes the time and storage
costs of clustering across the invocations of cluster in a much more palatable way.
If such a command were used this wisely, (e.g. whenever the table grows by 20%) then
I suspect applications would obtain most of the benefit of a sort following an index
lookup. (And I'm guessing this is a much simpler development task.)

Admittedly, this would not work so well in cases where the index keys change in value.
But this would be a huge win in situations (like mine) where the keys never change.

Jack Orenstein
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddres sHere" to ma*******@postg resql.org)

Nov 23 '05 #6
Jack Orenstein <ja*@geophile.c om> writes:
What would be really nice is something like "cluster recent". The idea is to
cluster only rows added since the last cluster.


And you would put them where?

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 23 '05 #7

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

3
5225
by: Paul Mateer | last post by:
Hi, I have been running some queries against a table in a my database and have noted an odd (at least it seems odd to me) performance issue. The table has approximately 5 million rows and includes the following columns: DocID (INTEGER, PRIMARY KEY, CLUSTERED) IsRecord (INTEGER, NONCLUSTERED)
10
3747
by: Thomas R. Hummel | last post by:
I have a stored procedure that suddenly started performing horribly. The query plan didn't look right to me, so I copy/pasted the code and ran it (it's a single SELECT statement). That ran pretty well and used a query plan that made sense. Now, I know what you're all thinking... stored procedures have to optimize for variable parameters, etc. Here's what I've tried to fix the issue: 1. Recompiled the stored procedure 2. Created a new,...
13
1996
by: Dmitry Tkach | last post by:
Hi, everybody! Here is a weird problem, I ran into... I have two huge (80 million rows each) tables (a and b), with id as a PK on both of them and also an FK from b referencing a. When I try to run a query like: select * from a, b where a.id >= 7901288 and a.id=b.id limit 1; The query takes *forever*.
6
1934
by: Steven D.Arnold | last post by:
I have a query which does not use column indexes that it should use. I have discovered some interesting behaviors of Postgres which may indicate a bug in the database's query planning. Take a look at the query below. There is a btree index on both m.account_id and a.account_id. Query (1) does not use the index on the messages table, instead opting for a full table scan, thus killing performance. The messages table can contain...
8
3272
by: Együd Csaba | last post by:
Hi All, how can I improve the query performance in the following situation: I have a big (4.5+ million rows) table. One query takes approx. 9 sec to finish resulting ~10000 rows. But if I run simultaneously 4 similar queries it takes nearly 5 minutes instead of 4 times 9 seconds or something near of that. here is a sample query: select mertido, fomeazon, ertektipus, mertertek from t_me30 where fomeazon in (select distinct fomeazon...
14
5423
by: Sean C. | last post by:
Helpful folks, Most of my previous experience with DB2 was on s390 mainframe systems and the optimizer on this platform always seemed very predictable and consistent. Since moving to a WinNT/UDB 7.2 environment, the choices the optimizer makes often seem flaky. But this last example really floored me. I was hoping someone could explain why I get worse response time when the optimizer uses two indexes, than when it uses one. Some context:
5
6343
by: sql-db2-dba | last post by:
We have DB2 UDB v8.1 fixpak3 on AIX 5. Production and Development configuarations (at least for DB2) are identical albeit production is a 2-way server while development has only one processor. Tables and indexes have the same schema. In fact, the dev database was taken from a prod backup recently. Size of the tables differ slightly. Yet, on a given query (with 4 tables joined), it took 30-50 times longer to run in prod than on development....
21
6984
by: CSN | last post by:
I have a pretty simple select query that joins a table (p) with 125K rows with another table (pc) with almost one million rows: select p.* from product_categories pc inner join products p on pc.product_id = p.id where pc.category_id = $category_id order by p.title
3
2049
by: Kevin Macdonald | last post by:
I expected Postgresql to use an indexed access method, but in certain cases it is using a sequential scan. Details are below: Table: P1_NRN_ROAD ( sobjid int8 primary key, v int8 not null, ord int2 not null)
0
9669
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
10428
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
10207
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
10156
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
9997
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
6776
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
5435
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
0
5559
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
4110
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.