By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
426,115 Members | 919 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 426,115 IT Pros & Developers. It's quick & easy.

SELECT from a set of values really slow?

P: n/a
I've been benchmarking some very simple databases. By simple, I mean a table
like this:

CREATE TABLE bench (
id SERIAL,
data TEXT )
CREATE INDEX bench_data_index ON bench (data)

which is filled with 100k records, where the data values for each record are
distinct (the data for record N is "text_item_N").

I generate a random data value known to be in the table and lookup its id,
with "SELECT id FROM bench WHERE data = 'whatever'".

I run this for three seconds. PostgreSQL does OK: 4300 selects in 2999
msec. MySQL does a little better, but not much: 5500 selects in 2999 msec.

In my application, this kind of lookup turns out to be the bottleneck on
performance (which is why I'm benchmarking). On MySQL, I can speed it up
quite a bit by looking up more than one thing at a time:

SELECT id FROM bench WHERE data IN ('X', 'Y')

for example, gets about 4050 selects in 3 seconds, so that's 8100 records
looked up, compared to 5300 when they were done one at a time. It continues
to improve selecting more at a time. To my surprise, when I tried this
trick with PostgreSQL, it did not speed things up. In fact, it *massively*
slowed down--it only is getting 13 selects in 3 seconds, searching for two
at a time.

What's going on here?

PostgreSQL 7.4.2 on SuSE 9.1 Linux. (The version that comes with SuSE).

--
--Tim Smith
Nov 23 '05 #1
Share this Question
Share on Google+
3 Replies


P: n/a
CoL
hi,

Tim Smith wrote:

What's going on here?


#>explain analyze SELECT id FROM bench WHERE data IN ('X', 'Y');
#>set enable_seqscan to off;
#>explain analyze SELECT id FROM bench WHERE data IN ('X', 'Y');

result?

C.
Nov 23 '05 #2

P: n/a
Tim Smith <re************@mouse-potato.com> writes:
SELECT id FROM bench WHERE data IN ('X', 'Y') To my surprise, when I tried this trick with PostgreSQL, it did not
speed things up. In fact, it *massively* slowed down--it only is
getting 13 selects in 3 seconds, searching for two at a time. What's going on here?


Likely it's switching from index to sequential scan because of a poor
estimate of how many rows will be returned. Have you ever ANALYZEd
the test table? Without either ANALYZE stats or a unique index,
the planner will certainly not think that the column is unique.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Nov 23 '05 #3

P: n/a
test=> insert into bench (id,data) select id, 'text_item_'||id::text from
dummy where id<=100000 order by id;
INSERT 0 100001
test=> CREATE INDEX bench_data_index ON bench (data);
CREATE INDEX
test=> explain select * from bench where data = 'test_item_1';
Index Scan using bench_data_index on bench (cost=0.00..1478.85 rows=501
width=36)
Index Cond: (data = 'test_item_1'::text)
(2 lignes)

test=> explain select * from bench where data in ( 'test_item_1',
'test_item_2' );
Seq Scan on bench (cost=0.00..2190.01 rows=998 width=36)
Filter: ((data = 'test_item_1'::text) OR (data = 'test_item_2'::text))
(2 lignes)

test=> vacuum analyze bench;
VACUUM
test=> explain select * from bench where data in ( 'test_item_1',
'test_item_2' );
Index Scan using bench_data_index, bench_data_index on bench
(cost=0.00..7.91 rows=2 width=22)
Index Cond: ((data = 'test_item_1'::text) OR (data =
'test_item_2'::text))
(2 lignes)

=> once you analyze, it works...
***********************************************

With 1 item :

test=>explain analyze select * from bench where data in ( 'test_item_1' );
Total runtime: 0.127 ms

With 11 items :

test=>explain analyze select * from bench where data in ( 'test_item_1',
'test_item_2', 'test_item_55', 'test_item_64', 'test_item_1005',
'test_item_78541', 'test_item_96521', 'test_item_8574', 'test_item_89652',
'test_item_14527', 'test_item_48652' );
Total runtime: 0.352 ms

***********************************************

With a Join... see on psql-performance

On Sun, 10 Oct 2004 16:00:10 -0400, Tom Lane <tg*@sss.pgh.pa.us> wrote:
Tim Smith <re************@mouse-potato.com> writes:
SELECT id FROM bench WHERE data IN ('X', 'Y')

To my surprise, when I tried this trick with PostgreSQL, it did not
speed things up. In fact, it *massively* slowed down--it only is
getting 13 selects in 3 seconds, searching for two at a time.

What's going on here?


Likely it's switching from index to sequential scan because of a poor
estimate of how many rows will be returned. Have you ever ANALYZEd
the test table? Without either ANALYZE stats or a unique index,
the planner will certainly not think that the column is unique.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html


---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postgresql.org

Nov 23 '05 #4

This discussion thread is closed

Replies have been disabled for this discussion.