Hi,
Suppose I have 2 tables. TableA has about 3 million rows and TableB has
about 50,000 rows. Assuming a btree index is used in the time column.
I need to query a limited and time ordered data in TableA that are not in
TableB. Because TableA can be huge, I only want to compare the top 1000 rows
ordered by time in TableA. What would be the fastest way to write the SQL
query?
Method 1
--------
SELECT * FROM TableA a
WHERE date_part('epoch', CURRENT_TIMESTAMP - a.col2) > 3600 AND
NOT EXISTS (SELECT b.col1 FROM TableB b WHERE b.col1 = a.col1 AND b.col2 >=
100)
ORDER BY a.col1 LIMIT 1000;
Method 2
--------
SELECT * FROM (SELECT * FROM TableA WHERE date_part('epoch',
CURRENT_TIMESTAMP - col2) > 3600 ORDER BY col1 LIMIT 1000) a
WHERE NOT EXISTS (SELECT b.col1 FROM TableB b WHERE b.col1 = a.col1 AND
b.col2 >= 100);
Method 3
--------
SELECT * FROM (SELECT * FROM TableA ORDER BY col1 LIMIT 1000) a
WHERE date_part('epoch', CURRENT_TIMESTAMP - a.col2) > 3600 AND NOT EXISTS
(SELECT b.col1 FROM TableB b WHERE b.col1 = a.col1 AND b.col2 >= 100);
Method 4
--------
SELECT * FROM (SELECT * FROM TableA ORDER BY col1 LIMIT 1000) a
WHERE date_part('epoch', CURRENT_TIMESTAMP - a.col2) > 3600 AND a.col1 NOT
IN (SELECT b.col1 FROM TableB b WHERE b.col2 >= 100);
There are a few more ways to write this (using views, temp tables), but what
would be fastest in PostgreSQL? Please suggest other SQL statements. Method
1 may not work because it may end up looping the entire TableA.
Thanks.