
performance of IN (subquery)

I'm using PG 7.4.3 on Mac OS X.

I am disappointed with the performance of queries like 'select foo from
bar where baz in (subquery)', or updates like 'update bar set foo = 2
where baz in (subquery)'. PG always seems to want to do a sequential
scan of the bar table. I wish there were a way of telling PG, "use the
index on baz in your plan, because I know that the subquery will return
very few results". Where it really matters, I have been constructing
dynamic queries by looping over the values for baz and building a
separate query for each one and combining with a UNION (or just
directly updating, in the update case). Depending on the size of the
bar table, I can get speedups of hundreds or even more than a thousand
times, but it is a big pain to have to do this.
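For concreteness, a minimal sketch of that manual rewrite, using the placeholder names foo/bar/baz from above and the two example IDs that appear further down:

select foo from bar where baz = 41209
union
select foo from bar where baz = 25047;

-- or, in the update case, one statement per value:
update bar set foo = 2 where baz = 41209;
update bar set foo = 2 where baz = 25047;

With a plain equality against a constant in each branch, the planner does use the index on baz.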

Any tips?

Thanks,
Kevin Murphy

Illustrated:

The query I want to do is very slow:

select bundle_id from build.elements
where elementid in (
SELECT superlocs_2.element_id
FROM superlocs_2 NATURAL JOIN bundle_superlocs_2
WHERE bundle_superlocs_2.protobundle_id = 1);
 bundle_id
-----------
      7644
      7644
(2 rows)
Time: 518.242 ms
The subquery is fast:

SELECT superlocs_2.element_id
FROM superlocs_2 NATURAL JOIN bundle_superlocs_2
WHERE bundle_superlocs_2.protobundle_id = 1;
 element_id
------------
      41209
      25047
(2 rows)
Time: 3.268 ms
And using indexes on the main table is fast:

select bundle_id from build.elements
where elementid in (41209, 25047);
 bundle_id
-----------
      7644
      7644
(2 rows)
Time: 2.468 ms

The plan for the slow query:

egenome_test=# explain analyze select bundle_id from build.elements
where elementid in (
SELECT superlocs_2.element_id
FROM superlocs_2 NATURAL JOIN bundle_superlocs_2
WHERE bundle_superlocs_2.protobundle_id = 1);
                                                    QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=70.33..72.86 rows=25 width=4) (actual time=583.051..583.059 rows=2 loops=1)
   Hash Cond: ("outer".element_id = "inner".elementid)
   ->  HashAggregate  (cost=47.83..47.83 rows=25 width=4) (actual time=0.656..0.658 rows=2 loops=1)
         ->  Hash Join  (cost=22.51..47.76 rows=25 width=4) (actual time=0.615..0.625 rows=2 loops=1)
               Hash Cond: ("outer".superloc_id = "inner".superloc_id)
               ->  Seq Scan on superlocs_2  (cost=0.00..20.00 rows=1000 width=8) (actual time=0.004..0.012 rows=9 loops=1)
               ->  Hash  (cost=22.50..22.50 rows=5 width=4) (actual time=0.076..0.076 rows=0 loops=1)
                     ->  Seq Scan on bundle_superlocs_2  (cost=0.00..22.50 rows=5 width=4) (actual time=0.024..0.033 rows=2 loops=1)
                           Filter: (protobundle_id = 1)
   ->  Hash  (cost=20.00..20.00 rows=1000 width=8) (actual time=581.802..581.802 rows=0 loops=1)
         ->  Seq Scan on elements  (cost=0.00..20.00 rows=1000 width=8) (actual time=0.172..405.243 rows=185535 loops=1)
 Total runtime: 593.843 ms
(12 rows)

Nov 23 '05 #1
On Thu, 26 Aug 2004, Kevin Murphy wrote:
I'm using PG 7.4.3 on Mac OS X.

I am disappointed with the performance of queries like 'select foo from bar
where baz in (subquery)', or updates like 'update bar set foo = 2 where baz
in (subquery)'. PG always seems to want to do a sequential scan of the bar
table. I wish there were a way of telling PG, "use the index on baz in your
plan, because I know that the subquery will return very few results". Where
it really matters, I have been constructing dynamic queries by looping over
the values for baz and building a separate query for each one and combining
with a UNION (or just directly updating, in the update case). Depending on
the size of the bar table, I can get speedups of hundreds or even more than a
thousand times, but it is a big pain to have to do this.

Any tips?

Thanks,
Kevin Murphy

Illustrated:

The query I want to do is very slow:

select bundle_id from build.elements
where elementid in (
SELECT superlocs_2.element_id
FROM superlocs_2 NATURAL JOIN bundle_superlocs_2
WHERE bundle_superlocs_2.protobundle_id = 1);
-----------
7644
7644
(2 rows)
Time: 518.242 ms


What field type is protobundle_id? If you typecast the '1' to be the
same, does the index get used?
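For example, if protobundle_id happened to be a smallint (just a guess; the column type isn't shown in the post), the literal could be cast to match:

SELECT superlocs_2.element_id
FROM superlocs_2 NATURAL JOIN bundle_superlocs_2
WHERE bundle_superlocs_2.protobundle_id = 1::smallint;

In 7.4, a cross-type comparison such as smallint = integer keeps the planner from using an index on that column.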

Email: sc*****@hub.org Yahoo!: yscrappy ICQ: 7615664


Nov 23 '05 #2
Kevin Murphy wrote:
------------------------------------------------------------------------
 Hash Join  (cost=70.33..72.86 rows=25 width=4) (actual time=583.051..583.059 rows=2 loops=1)
   Hash Cond: ("outer".element_id = "inner".elementid)
   ->  HashAggregate  (cost=47.83..47.83 rows=25 width=4) (actual time=0.656..0.658 rows=2 loops=1)
         ->  Hash Join  (cost=22.51..47.76 rows=25 width=4) (actual time=0.615..0.625 rows=2 loops=1)
               Hash Cond: ("outer".superloc_id = "inner".superloc_id)
               ->  Seq Scan on superlocs_2  (cost=0.00..20.00 rows=1000 width=8) (actual time=0.004..0.012 rows=9 loops=1)
               ->  Hash  (cost=22.50..22.50 rows=5 width=4) (actual time=0.076..0.076 rows=0 loops=1)
                     ->  Seq Scan on bundle_superlocs_2  (cost=0.00..22.50 rows=5 width=4) (actual time=0.024..0.033 rows=2 loops=1)
                           Filter: (protobundle_id = 1)
   ->  Hash  (cost=20.00..20.00 rows=1000 width=8) (actual time=581.802..581.802 rows=0 loops=1)
         ->  Seq Scan on elements  (cost=0.00..20.00 rows=1000 width=8) (actual time=0.172..405.243 rows=185535 loops=1)
The planner thinks that the sequential scan on elements will return 1000
rows, but it actually returned 185000. Did you ANALYZE this table recently?
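If not, a plain ANALYZE (sketched here against the table from the original post) followed by a re-run of EXPLAIN ANALYZE should show whether the estimates and the plan improve:

ANALYZE build.elements;

EXPLAIN ANALYZE
SELECT bundle_id FROM build.elements
WHERE elementid IN (
    SELECT superlocs_2.element_id
    FROM superlocs_2 NATURAL JOIN bundle_superlocs_2
    WHERE bundle_superlocs_2.protobundle_id = 1);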

Afterthought: It would be nice if the database was smart enough to
analyze a table of its own accord when a sequential scan returns more
than, say, 20 times what it was supposed to.

Paul

Nov 23 '05 #3
> Afterthought: It would be nice if the database was smart enough to
analyze a table of its own accord when a sequential scan returns more
than, say, 20 times what it was supposed to.


I've wondered on several occasions whether there is any good reason for PG not
to automatically perform an analyze concurrently with a seq scan as it's
happening. That way, no extra disk IO is needed and the stats could stay
up-to-date for almost free.

Any hackers around who can say why this might be a bad idea, or is it one
of those things that just needs a volunteer? (I'm not; at least not now.)


Nov 23 '05 #4

"Arthur Ward" <aw************ **@dominionscie nces.com> writes:
Any hackers around who can say why this might be a bad idea, or is it one
of those things that just needs a volunteer? (I'm not; at least not now.)


a) that would make plans change spontaneously. I hate being paged in the
middle of the night because some query is suddenly being slow when it had been
performing fine before.

b) Not all sequential scans will actually complete the scan. There could be a
limit imposed, or the sequential scan could be inside an EXISTS. In that case
the scan could be aborted at any point.

What I do think would be easy to do would be to keep statistics on the expense
of various components of the cost estimates: cpu_*_cost, random_page_cost, and
effective_cache_size ought to be values that can be solved for empirically
from the timing results.
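Today those are ordinary configuration parameters that have to be tuned by hand rather than solved for empirically; a rough sketch of the manual version (the numbers are arbitrary placeholders, not recommendations):

SHOW random_page_cost;
SHOW effective_cache_size;

-- try different estimates for this session and re-check plans with EXPLAIN ANALYZE
SET random_page_cost = 2.5;
SET effective_cache_size = 20000;   -- expressed in 8kB disk pages in this era of PostgreSQL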

But that still doesn't have to be done on every query. There's a trade-off
between the work spent planning each query and the benefit. Gathering
statistics and storing them on every sequential scan is way too much work,
slowing down every query for minimal gain.

--
greg

Nov 23 '05 #5
Paul Tillotson <pn***@shentel.net> writes:
The planner thinks that the sequential scan on elements will return 1000
rows, but it actually returned 185000. Did you ANALYZE this table recently?
Or either of the other ones? All those scan costs look like defaults
:-(
Afterthought: It would be nice if the database was smart enough to
analyze a table of its own accord when a sequential scan returns more
than, say, 20 times what it was supposed to.


I've thought about this before. One simple trick would be to get rid
of the current pg_class reltuples/relpages fields in favor of a
tuples-per-page estimate, which could be multiplied by
RelationGetNumberOfBlocks() during planning. In the absence of any
ANALYZE data the tuples-per-page estimate might be pretty bogus, but
it couldn't be off by more than an order of magnitude or so either way.
And in any case we'd have a guaranteed up-to-date number of blocks.
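For context, the existing per-table numbers live in pg_class, and a tuples-per-page figure can already be derived from them by hand; a quick sketch against the table from the original post:

SELECT relname, relpages, reltuples,
       CASE WHEN relpages > 0 THEN reltuples / relpages END AS tuples_per_page
FROM pg_class
WHERE relname = 'elements';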

The objections that could be raised to this are (AFAICS) two:

1. Adding at least an lseek() kernel call per table, and per index, to
every planning operation. I'm not sure this would be significant,
but I'm not sure it wouldn't be, either.

2. Instability of plans. Right now, the planner will not change plans
underneath you --- you have to issue an explicit VACUUM or ANALYZE
to change the terms of discussion. That would stop being true if
physical file size were always taken into account. Maybe this is a
problem, or maybe it isn't ... as someone who likes to be able to
debug planner behavior without actually creating umpteen-gig test
tables, my world view may be a bit skewed ...

It's certainly doable if we decide the pluses outweigh the minuses.
Thoughts?

regards, tom lane


Nov 23 '05 #6

Tom Lane <tg*@sss.pgh.pa.us> writes:
I've thought about this before. One simple trick would be to get rid of the
current pg_class reltuples/relpages fields in favor of a tuples-per-page
estimate, which could be multiplied by RelationGetNumberOfBlocks() during
planning.
This would do something interesting to one of the problem cases I have now. I
have trouble testing a particular batch job that generates a large amount of
precalculated denormalized data.

That's because the data is empty when the job starts, so the plpgsql function
that handles the job caches plans based on an empty table. But as the job
proceeds the data grows and I'm afraid the cached plan may start performing
poorly.

In order to test it I need to run it once, analyze, then reload the function,
truncate the data and run it another time. And hope I generated good
representative test data.

I'm thinking of changing to a non-plpgsql implementation. But that's only half
the issue. I'm not about to run analyze in the middle of the data generation
(which wouldn't work anyways since it's in a transaction). So I can't really
get good statistics for this job, not until we're actually in a steady state
in production.
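One partial workaround, staying inside plpgsql, is to issue the statements through EXECUTE so they are planned at call time instead of being cached from the first, empty-table run; it doesn't help with the missing statistics, only with the stale cached plan. A minimal sketch, with made-up table and column names:

CREATE OR REPLACE FUNCTION count_batch(integer) RETURNS integer AS '
DECLARE
    rec RECORD;
    n   integer := 0;
BEGIN
    -- EXECUTE plans the statement each time it runs, so the plan reflects
    -- the table as it is now, not as it was when the function was first called.
    FOR rec IN EXECUTE ''SELECT 1 FROM denorm_data WHERE batch_id = ''
                       || ($1)::text LOOP
        n := n + 1;
    END LOOP;
    RETURN n;
END;
' LANGUAGE plpgsql;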

Sometimes I wonder whether I wouldn't rather a more predictable system with
less focus on statistics. The resulting plans would be more predictable and
predictability is a good thing in production systems...
In the absence of any ANALYZE data the tuples-per-page estimate might be
pretty bogus, but it couldn't be off by more than an order of magnitude or
so either way.
I don't see why it couldn't. If you have a table badly in need of vacuuming
(or had one at the time of the last analyze) it could be off by way more than
an order of magnitude.

For that matter, a table that had undergone many deletes and then been
vacuumed would not change length for a long time afterward even as many new
inserts are performed. Until the table is analyzed the estimated table size
could be off by an arbitrary factor.
The objections that could be raised to this are (AFAICS) two:

1. Adding at least an lseek() kernel call per table, and per index, to
every planning operation. I'm not sure this would be significant,
but I'm not sure it wouldn't be, either.
That seems like something that could be addressed with enough time. A single
value per table, couldn't it be stored in shared memory and only updated
whenever the heap is extended or truncated? Even if you have thousands of
tables that would only be a few kilobytes of shared memory.
2. Instability of plans. Right now, the planner will not change plans
underneath you --- you have to issue an explicit VACUUM or ANALYZE
to change the terms of discussion. That would stop being true if
physical file size were always taken into account. Maybe this is a
problem, or maybe it isn't ... as someone who likes to be able to
debug planner behavior without actually creating umpteen-gig test
tables, my world view may be a bit skewed ...


This is what I'm afraid of. As an OLTP application programmer -- web sites, I
admit -- I care a lot more about plan stability than finding optimal plans.

A well-written OLTP application will only have a fixed number of queries that
are executed repeatedly with different parameters. I don't care how long the
queries take as long as they're always "fast enough".

Every time a new plan is tried there's a chance that it will be wrong. It only
takes one wrong plan out of the hundreds of queries to bring down the entire
application.

Ideally I would want a guarantee that every query would *always* result in the
same plan. Once I've tested them and approved the plans I want to know that
only those approved plans will ever run, and I want to be present and be able
to verify new plans before they go into production.

I doubt I'm going to convince anyone today, but I think there will be a
gradual change in mindset as the new binary protocol becomes more popular. And
postgres takes over some of mysql's web mindshare.

--
greg

Nov 23 '05 #7
Tom Lane wrote:

2. Instability of plans. Right now, the planner will not change plans
underneath you --- you have to issue an explicit VACUUM or ANALYZE
to change the terms of discussion. That would stop being true if
physical file size were always taken into account. Maybe this is a
problem, or maybe it isn't ... as someone who likes to be able to
debug planner behavior without actually creating umpteen-gig test
tables, my world view may be a bit skewed ...


Did you forget the autovacuum daemon? I haven't seen anyone bitten, or
paged during the night, because of the autovacuum daemon's work.
Regards
Gaetano Mendola
Nov 23 '05 #8
Tom Lane wrote:
I've thought about this before. One simple trick would be to get rid
of the current pg_class reltuples/relpages fields in favor of a
tuples-per-page estimate, which could be multiplied by
RelationGetNumberOfBlocks() during planning. In the absence of any
ANALYZE data the tuples-per-page estimate might be pretty bogus, but
it couldn't be off by more than an order of magnitude or so either way.
And in any case we'd have a guaranteed up-to-date number of blocks.

The objections that could be raised to this are (AFAICS) two:
[snip]
2. Instability of plans. Right now, the planner will not change plans
underneath you --- you have to issue an explicit VACUUM or ANALYZE
to change the terms of discussion. That would stop being true if
physical file size were always taken into account. Maybe this is a
problem, or maybe it isn't ... as someone who likes to be able to
debug planner behavior without actually creating umpteen-gig test
tables, my world view may be a bit skewed ...

It's certainly doable if we decide the pluses outweigh the minuses.
Thoughts?


My first reaction is to wonder whether this would give performance exactly
equal to running a true ANALYZE in every situation. If not, then you
would end up with an automated pseudo-ANALYZE (performance-wise).

In my opinion, it is almost a feature that non-ANALYZEd tables give
such horrendous performance: it kicks you in the butt to do some
thinking about when to correctly deal with ANALYZEing.

So, in short, I think it is a huge win if we could have automatic
ANALYZE with true ANALYZE performance, but a huge loss if the automatic
ANALYZE performance is not exactly as good as a true ANALYZE.

--
-**-*-*---*-*---*-*---*-----*-*-----*---*-*---*-----*-----*-*-----*---
Jon Lapham <la****@jandr.org> Rio de Janeiro, Brasil
Personal: http://www.jandr.org/
***-*--*----*-------*------------*--------------------*---------------

Nov 23 '05 #9
Thanks all for the reminders about analyzing, and I apologize for wasting
everyone's time. The main table did indeed need to be analyzed (I had
truncated it and repopulated it with "insert ... select" but forgotten to
analyze). The other tables are very small temporary tables, and I assumed,
whether correctly or not, that analyzing would not be helpful for them.
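The sequence that works amounts to remembering the ANALYZE at the end of the reload; sketched here with a placeholder source query, since the real "insert ... select" isn't shown:

truncate table build.elements;
insert into build.elements
    select * from elements_staging;   -- placeholder for the real "insert ... select"
analyze build.elements;               -- refresh the planner's row estimates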

All this is happening in the context of an algorithm in a PL/PGSQL function.
The three temporary tables are reused thousands of times. I wasn't sure if
it would be better to truncate them between uses to keep them small or just
allow them to grow. Unfortunately for the Tree of Knowledge, performance is
now more than adequate, so I may not do this experiment.

Thanks,
Kevin Murphy


Nov 23 '05 #10
