correlated delete with "in" and "left outer join"

mike

I'm using PostgreSQL 7.3.2 and have a query that executes very slowly.

There are 2 tables: Item and LogEvent. ItemID (an int4) is the primary key
of Item, and is also a field in LogEvent. Some ItemIDs in LogEvent do not
correspond to ItemIDs in Item, and periodically we need to purge the
non-matching ItemIDs from LogEvent.

The query is:

delete from LogEvent where EventType != 'i' and ItemID in
(select distinct e.ItemID from LogEvent e left outer join Item i
on e.ItemID = i.ItemID where e.EventType != 'i' and i.ItemID is null);

I understand that using "in" is not very efficient.

Is there some other way to write this query without the "in"?
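Running the inner subquery on a few rows is a quick way to see what it selects. The sketch below is an illustration, not part of the original post. It uses Python's standard `sqlite3` module as a stand-in for PostgreSQL 7.3, which is an assumption: plans and performance differ, but LEFT OUTER JOIN semantics are standard SQL. The sample rows and the helper name `orphan_ids` are hypothetical.

```python
import sqlite3

def orphan_ids():
    """Run the original subquery on hypothetical sample rows and return the ItemIDs it selects."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Item (ItemID INTEGER PRIMARY KEY);
        CREATE TABLE LogEvent (ItemID INTEGER, EventType TEXT);
        INSERT INTO Item VALUES (1), (2);
        -- ItemIDs 3 and 4 have no matching Item row; 4 is an 'i' event
        INSERT INTO LogEvent VALUES (1, 'x'), (2, 'i'), (3, 'x'), (3, 'x'), (4, 'i');
    """)
    rows = conn.execute("""
        SELECT DISTINCT e.ItemID
        FROM LogEvent e LEFT OUTER JOIN Item i ON e.ItemID = i.ItemID
        WHERE e.EventType != 'i' AND i.ItemID IS NULL
        ORDER BY e.ItemID
    """).fetchall()
    conn.close()
    return [row[0] for row in rows]

print(orphan_ids())  # [3]
```

Only ItemID 3 comes back. The LEFT OUTER JOIN keeps every LogEvent row, and `i.ItemID IS NULL` in the WHERE clause then keeps only the rows that found no Item. The subquery therefore selects exactly the unmatched, non-'i' ItemIDs.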

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to ma*******@postgresql.org)

Nov 22 '05 #1

Michael Glaesemann

On Feb 27, 2004, at 11:26 AM, <mi**@linkify.com> wrote:

> I'm using PostgreSQL 7.3.2 and have a query that executes very slowly.
> <snip />
> I understand that using "in" is not very efficient.
>
> Is there some other way to write this query without the "in"?

NOT EXISTS ( ) is sometimes more efficient. If at all possible, upgrade
to 7.4.1. One of the many things that have improved since 7.3.2 is the
efficiency of queries using IN.

Michael Glaesemann
grzm myrealbox com

Nov 22 '05 #2

Stephan Szabo

On Thu, 26 Feb 2004 mi**@linkify.com wrote:

> I'm using PostgreSQL 7.3.2 and have a query that executes very slowly.
>
> There are 2 tables: Item and LogEvent. ItemID (an int4) is the primary key
> of Item, and is also a field in LogEvent. Some ItemIDs in LogEvent do not
> correspond to ItemIDs in Item, and periodically we need to purge the
> non-matching ItemIDs from LogEvent.
>
> The query is:
>
> delete from LogEvent where EventType != 'i' and ItemID in
> (select distinct e.ItemID from LogEvent e left outer join Item i
> on e.ItemID = i.ItemID where e.EventType != 'i' and i.ItemID is null);
>
> I understand that using "in" is not very efficient.
>
> Is there some other way to write this query without the "in"?

Perhaps
delete from LogEvent where EventType != 'i' and not exists
(select * from Item i where i.ItemID=LogEvent.ItemID);
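To check that this NOT EXISTS rewrite deletes the same rows as the original IN query, the sketch below runs both statements against fresh copies of the same data. It again uses Python's `sqlite3` as a stand-in for PostgreSQL, which is an assumption. The sample data is hypothetical and contains no NULL ItemIDs. The two statements are copied from the thread, while `SETUP` and `survivors` are illustrative names.

```python
import sqlite3

# Hypothetical sample data: ItemIDs 3, 4 and 5 have no Item row; 4 is an 'i' event.
SETUP = """
    CREATE TABLE Item (ItemID INTEGER PRIMARY KEY);
    CREATE TABLE LogEvent (ItemID INTEGER, EventType TEXT);
    INSERT INTO Item VALUES (1), (2);
    INSERT INTO LogEvent VALUES (1, 'x'), (2, 'i'), (3, 'x'), (4, 'i'), (5, 'y');
"""

ORIGINAL = """
    DELETE FROM LogEvent WHERE EventType != 'i' AND ItemID IN
      (SELECT DISTINCT e.ItemID FROM LogEvent e LEFT OUTER JOIN Item i
       ON e.ItemID = i.ItemID WHERE e.EventType != 'i' AND i.ItemID IS NULL)
"""

NOT_EXISTS = """
    DELETE FROM LogEvent WHERE EventType != 'i' AND NOT EXISTS
      (SELECT * FROM Item i WHERE i.ItemID = LogEvent.ItemID)
"""

def survivors(delete_sql):
    """Run one DELETE on a fresh copy of the sample data and return the remaining rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    conn.execute(delete_sql)
    rows = conn.execute(
        "SELECT ItemID, EventType FROM LogEvent ORDER BY ItemID").fetchall()
    conn.close()
    return rows

print(survivors(ORIGINAL))    # [(1, 'x'), (2, 'i'), (4, 'i')]
print(survivors(NOT_EXISTS))  # [(1, 'x'), (2, 'i'), (4, 'i')]
```

Both statements remove only the unmatched rows whose EventType is not 'i'. The orphaned 'i' row (ItemID 4) survives in both cases.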

Nov 22 '05 #3

Mike Mascari

Stephan Szabo wrote:

> On Thu, 26 Feb 2004 mi**@linkify.com wrote:
>
>> I'm using PostgreSQL 7.3.2 and have a query that executes very slowly.
>>
>> There are 2 tables: Item and LogEvent. ItemID (an int4) is the primary key
>> of Item, and is also a field in LogEvent. Some ItemIDs in LogEvent do not
>> correspond to ItemIDs in Item, and periodically we need to purge the
>> non-matching ItemIDs from LogEvent.
>>
>> The query is:
>>
>> delete from LogEvent where EventType != 'i' and ItemID in
>> (select distinct e.ItemID from LogEvent e left outer join Item i
>> on e.ItemID = i.ItemID where e.EventType != 'i' and i.ItemID is null);
>>
>> I understand that using "in" is not very efficient.
>>
>> Is there some other way to write this query without the "in"?
>
> Perhaps
> delete from LogEvent where EventType != 'i' and not exists
> (select * from Item i where i.ItemID=LogEvent.ItemID);

Maybe I'm not reading his subquery correctly, but the left outer
join will produce a row from LogEvent regardless of whether or not a
matching row exists in Item, correct? So doesn't it reduce to:

DELETE FROM LogEvent WHERE EventType <> 'i';

???

Mike Mascari

Nov 22 '05 #4

Mike Mascari

Mike Mascari wrote:

> Stephan Szabo wrote:
>> On Thu, 26 Feb 2004 mi**@linkify.com wrote:
>>> I'm using PostgreSQL 7.3.2 and have a query that executes very slowly.
>>>
>>> There are 2 tables: Item and LogEvent. ItemID (an int4) is the primary key
>>> of Item, and is also a field in LogEvent. Some ItemIDs in LogEvent do not
>>> correspond to ItemIDs in Item, and periodically we need to purge the
>>> non-matching ItemIDs from LogEvent.
>>
>> Perhaps
>> delete from LogEvent where EventType != 'i' and not exists
>> (select * from Item i where i.ItemID=LogEvent.ItemID);
>
> Maybe I'm not reading his subquery correctly, but the left outer join
> will produce a row from LogEvent regardless of whether or not a matching
> row exists in Item, correct? So doesn't it reduce to:
>
> DELETE FROM LogEvent WHERE EventType <> 'i';

I failed to read what he was trying to accomplish and assumed the
original query was precisely what he intended. My apologies...

Mike Mascari

Nov 22 '05 #5

mike

The subquery will always return a row from LogEvent, but that row's itemID
will be null if the itemID doesn't match a row from Item. That's why the
subquery has the "and i.ItemID is null".

> Stephan Szabo wrote:
>> On Thu, 26 Feb 2004 mi**@linkify.com wrote:
>>
>>> I'm using PostgreSQL 7.3.2 and have a query that executes very slowly.
>>>
>>> There are 2 tables: Item and LogEvent. ItemID (an int4) is the
>>> primary key of Item, and is also a field in LogEvent. Some ItemIDs in
>>> LogEvent do not correspond to ItemIDs in Item, and periodically we
>>> need to purge the non-matching ItemIDs from LogEvent.
>>>
>>> The query is:
>>>
>>> delete from LogEvent where EventType != 'i' and ItemID in
>>> (select distinct e.ItemID from LogEvent e left outer join Item i on
>>> e.ItemID = i.ItemID where e.EventType != 'i' and i.ItemID is null);
>>>
>>> I understand that using "in" is not very efficient.
>>>
>>> Is there some other way to write this query without the "in"?
>>
>> Perhaps
>> delete from LogEvent where EventType != 'i' and not exists
>> (select * from Item i where i.ItemID=LogEvent.ItemID);
>
> Maybe I'm not reading his subquery correctly, but the left outer
> join will produce a row from LogEvent regardless of whether or not a
> matching row exists in Item, correct? So doesn't it reduce to:
>
> DELETE FROM LogEvent WHERE EventType <> 'i';
>
> ???
>
> Mike Mascari

Nov 22 '05 #6

Mike Mascari

mi**@linkify.com wrote:

> The subquery will always return a row from LogEvent, but that row's itemID
> will be null if the itemID doesn't match a row from Item.
> That's why the subquery has the "and i.ItemID is null".

You lost me.

[test@lexus] \d foo
Table "public.foo"
Column | Type | Modifiers
--------+---------+-----------
key | integer |

[test@lexus] \d bar
Table "public.bar"
Column | Type | Modifiers
--------+---------+-----------
key | integer |
value | text |

[test@lexus] select * from foo;
key
-----
1
3
(2 rows)

[test@lexus] select * from bar;
key | value
-----+-------
1 | Mike
2 | Joe
(2 rows)

[test@lexus] select f.key from foo f left outer join bar b on f.key
= b.key and b.key is null;
key
-----
1
3
(2 rows)

To do what I think you believe to be happening w.r.t. outer joins,
you'd have to have a subquery like:

[test@lexus] select a.fookey
test-# FROM
test-# (SELECT foo.key AS fookey, bar.key as barkey FROM foo LEFT
OUTER JOIN bar ON foo.key = bar.key) AS a
test-# WHERE a.barkey IS NULL;
fookey
--------
3
(1 row)

Nevertheless, Stephan's solution matches your description of the
problem and executes the logical equivalent of the above much more
rapidly...

Mike Mascari

Nov 22 '05 #7

Michael Chaney

On Thu, Feb 26, 2004 at 06:26:19PM -0800, mi**@linkify.com wrote:

> I'm using PostgreSQL 7.3.2 and have a query that executes very slowly.
>
> There are 2 tables: Item and LogEvent. ItemID (an int4) is the
> primary key of Item, and is also a field in LogEvent. Some ItemIDs in
> LogEvent do not correspond to ItemIDs in Item, and periodically we
> need to purge the non-matching ItemIDs from LogEvent.

delete from LogEvent where EventType!='i' and
ItemID not in (select ItemID from Item);

or

delete from LogEvent where EventType!='i' and
not exists (select * from Item where Item.ItemID=LogEvent.ItemID);

You might also use a foreign key, cascading delete, etc. As for the
query style, I've had cases with the latest 7.4 where the "in" style
wasn't optimized but the "exists" style was. It's the exact same query,
and technically the optimizer should figure that out. Use "explain" to
see if it's being optimized to use indexes or if it's just doing table
scans.

Michael
--
Michael Darrin Chaney
md******@michaelchaney.com
http://www.michaelchaney.com/
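The foreign-key route mentioned above can be sketched as follows. This uses Python's `sqlite3` as a stand-in for PostgreSQL, which is an assumption: SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`, whereas PostgreSQL always enforces them. The data and the name `cascade_demo` are hypothetical. One trade-off goes beyond what the thread states. The constraint applies to every row regardless of EventType, so it would also reject or cascade away the 'i' rows that the purge query deliberately keeps. In PostgreSQL, any existing orphans would also have to be purged before the constraint could be added.

```python
import sqlite3

def cascade_demo():
    """Show ON DELETE CASCADE removing LogEvent rows when their Item is deleted."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
    conn.executescript("""
        CREATE TABLE Item (ItemID INTEGER PRIMARY KEY);
        CREATE TABLE LogEvent (
            ItemID INTEGER REFERENCES Item (ItemID) ON DELETE CASCADE,
            EventType TEXT
        );
        INSERT INTO Item VALUES (1), (2);
        INSERT INTO LogEvent VALUES (1, 'x'), (2, 'x'), (2, 'i');
    """)
    # Deleting Item 2 cascades to both of its LogEvent rows, including the 'i' row.
    conn.execute("DELETE FROM Item WHERE ItemID = 2")
    remaining = conn.execute("SELECT ItemID, EventType FROM LogEvent").fetchall()
    # With the constraint in place, a LogEvent row for a missing Item is rejected.
    try:
        conn.execute("INSERT INTO LogEvent VALUES (99, 'x')")
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True
    conn.close()
    return remaining, rejected

print(cascade_demo())  # ([(1, 'x')], True)
```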

Nov 22 '05 #8

Stephan Szabo

On Fri, 27 Feb 2004, Mike Mascari wrote:

> To do what I think you believe to be happening w.r.t. outer joins,
> you'd have to have a subquery like:
>
> [test@lexus] select a.fookey
> test-# FROM
> test-# (SELECT foo.key AS fookey, bar.key as barkey FROM foo LEFT
> OUTER JOIN bar ON foo.key = bar.key) AS a
> test-# WHERE a.barkey IS NULL;

This AFAICS is pretty much what he did, except that he didn't alias the
join which is okay I believe. He had one condition in on and two
conditions in where.

The original subquery looked like:
select distinct e.ItemID from LogEvent e left outer join Item i
on e.ItemID = i.ItemID where e.EventType != 'i' and i.ItemID is null
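Stephan's point can be checked directly. When the IS NULL test sits in the WHERE clause, the plain LEFT OUTER JOIN and Mike Mascari's derived-table version return the same rows. The minimal sketch below uses the foo/bar sample rows from this thread and Python's `sqlite3` as a stand-in for PostgreSQL (an assumption). The function name `compare` is illustrative.

```python
import sqlite3

def compare():
    """Return (derived-table result, plain LEFT JOIN result) on the thread's foo/bar rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE foo (key INTEGER);
        CREATE TABLE bar (key INTEGER, value TEXT);
        INSERT INTO foo VALUES (1), (3);
        INSERT INTO bar VALUES (1, 'Mike'), (2, 'Joe');
    """)
    # Join inside a derived table, then filter the NULL side outside it.
    derived = conn.execute("""
        SELECT a.fookey
        FROM (SELECT foo.key AS fookey, bar.key AS barkey
              FROM foo LEFT OUTER JOIN bar ON foo.key = bar.key) AS a
        WHERE a.barkey IS NULL
    """).fetchall()
    # Same join with the IS NULL test in the WHERE clause, as in the original subquery.
    direct = conn.execute("""
        SELECT foo.key
        FROM foo LEFT OUTER JOIN bar ON foo.key = bar.key
        WHERE bar.key IS NULL
    """).fetchall()
    conn.close()
    return derived, direct

print(compare())  # ([(3,)], [(3,)])
```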

Nov 22 '05 #9

Mike Mascari

Stephan Szabo wrote:

> On Fri, 27 Feb 2004, Mike Mascari wrote:
>> To do what I think you believe to be happening w.r.t. outer joins,
>> you'd have to have a subquery like:
>>
>> [test@lexus] select a.fookey
>> test-# FROM
>> test-# (SELECT foo.key AS fookey, bar.key as barkey FROM foo LEFT
>> OUTER JOIN bar ON foo.key = bar.key) AS a
>> test-# WHERE a.barkey IS NULL;
>
> This AFAICS is pretty much what he did, except that he didn't alias the
> join which is okay I believe. He had one condition in on and two
> conditions in where.
>
> The original subquery looked like:
> select distinct e.ItemID from LogEvent e left outer join Item i
> on e.ItemID = i.ItemID where e.EventType != 'i' and i.ItemID is null

That is indeed the original subquery. But the 'i.ItemID is null'
condition doesn't change the IN list one iota. He was somehow
expecting the subquery to yield records internally like:

1 NULL
2 NULL
3 3

and simultaneously have the condition 'i.ItemID is null' eliminate
the third tuple. But that is not how the left outer join executes.
The 'i.ItemID is null' condition is evaluated, probably always to
false, which ensures that the left outer join will never find a
matching row from the 'Item' relation and, if queried not as a
subquery but stand-alone as:

select distinct e.ItemID, i.ItemID
from LogEvent e left outer join Item i on e.ItemID = i.ItemID
where e.EventType != 'i' and i.ItemID is null

would always yield a relation of the form:

e.ItemID NULL

for every e.ItemID whose e.EventType != 'i'. That ain't right.

Another example:

[test@lexus] select * from foo;
key
-----
1
3
(2 rows)

[test@lexus] select * from bar;
key | value
-----+-------
1 | Mike
2 | Joe
(2 rows)

[test@lexus] select foo.key, bar.key from foo left outer join bar on
foo.key = bar.key and bar.key is null;
key | key
-----+-----
1 |
3 |
(2 rows)

[test@lexus] select foo.key, bar.key from foo left outer join bar on
foo.key = bar.key;
key | key
-----+-----
1 | 1
3 |
(2 rows)

[test@lexus] select a.fookey, a.barkey from (select foo.key as
fookey, bar.key as barkey from foo left outer join bar on foo.key =
bar.key) as a where a.barkey is null;
fookey | barkey
--------+--------
3 |
(1 row)

Mike Mascari

Nov 22 '05 #10

Michael Chaney

>> The original subquery looked like:
>> select distinct e.ItemID from LogEvent e left outer join Item i
>> on e.ItemID = i.ItemID where e.EventType != 'i' and i.ItemID is null

Please, before continuing this thread, read my post below. What you're
all getting around to, albeit painfully, is that this subquery is
worthless as-is. This is the mysql way of finding rows in one table
with no match in another without the convenience of the "in" or "exists"
constructs.

Because we're using Postgres and have those constructs, the original
query can be rewritten simply with either:

delete from LogEvent where EventType != 'i' and ItemID not in
(select ItemID from Item)

That's it. That's the whole query. It does what he wants.

Michael
--
Michael Darrin Chaney
md******@michaelchaney.com
http://www.michaelchaney.com/
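One point to weigh when choosing between the NOT IN form above and NOT EXISTS is NULL handling. If a LogEvent row has a NULL ItemID, `ItemID NOT IN (...)` evaluates to NULL rather than true, so the row is kept, just as the original IN query keeps it. NOT EXISTS, by contrast, finds no matching Item and deletes the row. The opposite trap, a NULL inside the NOT IN list, would make NOT IN delete nothing at all, but it cannot arise here because Item.ItemID is a primary key. The sketch below shows the difference using Python's `sqlite3` as a stand-in for PostgreSQL (an assumption). The rows and the helper `surviving_ids` are hypothetical.

```python
import sqlite3

ORIGINAL = """DELETE FROM LogEvent WHERE EventType != 'i' AND ItemID IN
    (SELECT DISTINCT e.ItemID FROM LogEvent e LEFT OUTER JOIN Item i
     ON e.ItemID = i.ItemID WHERE e.EventType != 'i' AND i.ItemID IS NULL)"""
NOT_IN = """DELETE FROM LogEvent WHERE EventType != 'i' AND
    ItemID NOT IN (SELECT ItemID FROM Item)"""
NOT_EXISTS = """DELETE FROM LogEvent WHERE EventType != 'i' AND
    NOT EXISTS (SELECT * FROM Item WHERE Item.ItemID = LogEvent.ItemID)"""

def surviving_ids(delete_sql):
    """Apply one DELETE to fresh sample data that includes a NULL ItemID; return the remaining ItemIDs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Item (ItemID INTEGER PRIMARY KEY);
        CREATE TABLE LogEvent (ItemID INTEGER, EventType TEXT);
        INSERT INTO Item VALUES (1);
        INSERT INTO LogEvent VALUES (1, 'x'), (2, 'x'), (NULL, 'x');
    """)
    conn.execute(delete_sql)
    ids = {row[0] for row in conn.execute("SELECT ItemID FROM LogEvent")}
    conn.close()
    return ids

print(surviving_ids(ORIGINAL))    # the NULL-ItemID row survives
print(surviving_ids(NOT_IN))      # the NULL-ItemID row survives
print(surviving_ids(NOT_EXISTS))  # the NULL-ItemID row is deleted
```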

Nov 22 '05 #11

Mike Mascari

Michael Chaney wrote:

> Please, before continuing this thread, read my post below. What you're
> all getting around to, albeit painfully, is that this subquery is
> worthless as-is. This is the mysql way of finding rows in one table
> with no match in another without the convenience of the "in" or "exists"
> constructs.
>
> Because we're using Postgres and have those constructs, the original
> query can be rewritten simply with either:
>
> delete from LogEvent where EventType != 'i' and ItemID not in
> (select ItemID from Item)
>
> That's it. That's the whole query. It does what he wants.

One more minor point. :-)

If you are using 7.3 or earlier, PostgreSQL will sequentially scan
the IN subquery result, which executes quite slowly and therefore
the EXISTS method Stephan stated should be used:

DELETE FROM LogEvent
WHERE EventType != 'i' AND NOT EXISTS (
SELECT 1
FROM Item
WHERE Item.ItemID = LogEvent.ItemID
);

If you are using >= 7.4, then your query above is optimal:

http://www.postgresql.org/docs/7.4/s...ml#RELEASE-7-4

Just something to consider,

Mike Mascari

Nov 22 '05 #12

Michael Chaney

On Fri, Feb 27, 2004 at 12:05:48PM -0500, Mike Mascari wrote:

> Michael Chaney wrote:
>> Please, before continuing this thread, read my post below. What you're
>> all getting around to, albeit painfully, is that this subquery is
>> worthless as-is. This is the mysql way of finding rows in one table
>> with no match in another without the convenience of the "in" or "exists"
>> constructs.
>>
>> Because we're using Postgres and have those constructs, the original
>> query can be rewritten simply with either:
>>
>> delete from LogEvent where EventType != 'i' and ItemID not in
>> (select ItemID from Item)
>>
>> That's it. That's the whole query. It does what he wants.
>
> One more minor point. :-)
>
> If you are using 7.3 or earlier, PostgreSQL will sequentially scan
> the IN subquery result, which executes quite slowly and therefore
> the EXISTS method Stephan stated should be used:
>
> DELETE FROM LogEvent
> WHERE EventType != 'i' AND NOT EXISTS (
> SELECT 1
> FROM Item
> WHERE Item.ItemID = LogEvent.ItemID
> );
>
> If you are using >= 7.4, then your query above is optimal:

Not necessarily. I had a query just last week that still wouldn't
optimize with the "in" notation, but did optimize with "exists"
notation. My other post about this showed both queries for that reason,
but I still feel that, for academic purposes, the "in" clause is far
more readable.

Anyway, good point.

Michael
--
Michael Darrin Chaney
md******@michaelchaney.com
http://www.michaelchaney.com/

Nov 22 '05 #13

Stephan Szabo

On Fri, 27 Feb 2004, Mike Mascari wrote:

> Stephan Szabo wrote:
>> On Fri, 27 Feb 2004, Mike Mascari wrote:
>>> To do what I think you believe to be happening w.r.t. outer joins,
>>> you'd have to have a subquery like:
>>>
>>> [test@lexus] select a.fookey
>>> test-# FROM
>>> test-# (SELECT foo.key AS fookey, bar.key as barkey FROM foo LEFT
>>> OUTER JOIN bar ON foo.key = bar.key) AS a
>>> test-# WHERE a.barkey IS NULL;
>> This AFAICS is pretty much what he did, except that he didn't alias the
>> join which is okay I believe. He had one condition in on and two
>> conditions in where.
>>
>> The original subquery looked like:
>> select distinct e.ItemID from LogEvent e left outer join Item i
>> on e.ItemID = i.ItemID where e.EventType != 'i' and i.ItemID is null
>
> That is indeed the original subquery. But the 'i.ItemID is null'
> condition doesn't change the IN list one iota. He was somehow
>
> .... Another example:
>
> [test@lexus] select * from foo;
> key
> -----
> 1
> 3
> (2 rows)
>
> [test@lexus] select * from bar;
> key | value
> -----+-------
> 1 | Mike
> 2 | Joe
> (2 rows)
>
> [test@lexus] select foo.key, bar.key from foo left outer join bar on
> foo.key = bar.key and bar.key is null;

ON conditions and WHERE conditions are different.

Try
select foo.key, bar.key from foo left outer join bar on foo.key=bar.key
where bar.key is null;
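The difference Stephan describes shows up on the same foo/bar rows. In a LEFT OUTER JOIN, a predicate in ON only decides which bar rows count as a match, so every foo row is still returned, with NULLs where nothing matched. The same predicate in WHERE filters the joined result afterwards. The minimal sketch below uses Python's `sqlite3` as a stand-in for PostgreSQL (an assumption). The name `on_vs_where` is illustrative.

```python
import sqlite3

def on_vs_where():
    """Return the join result with 'bar.key IS NULL' in ON, then with it in WHERE."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE foo (key INTEGER);
        CREATE TABLE bar (key INTEGER, value TEXT);
        INSERT INTO foo VALUES (1), (3);
        INSERT INTO bar VALUES (1, 'Mike'), (2, 'Joe');
    """)
    # In ON: the condition can never be true, so no bar row ever matches,
    # but the outer join still emits every foo row.
    in_on = conn.execute("""
        SELECT foo.key, bar.key FROM foo
        LEFT OUTER JOIN bar ON foo.key = bar.key AND bar.key IS NULL
        ORDER BY foo.key
    """).fetchall()
    # In WHERE: join first, then keep only the foo rows that found no bar row.
    in_where = conn.execute("""
        SELECT foo.key, bar.key FROM foo
        LEFT OUTER JOIN bar ON foo.key = bar.key
        WHERE bar.key IS NULL
        ORDER BY foo.key
    """).fetchall()
    conn.close()
    return in_on, in_where

print(on_vs_where())  # ([(1, None), (3, None)], [(3, None)])
```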

Nov 22 '05 #14

Mike Mascari

Stephan Szabo wrote:

> ON conditions and WHERE conditions are different.
>
> Try
> select foo.key, bar.key from foo left outer join bar on foo.key=bar.key
> where bar.key is null;

Yep. Sorry.

Mike Mascari

Nov 22 '05 #15

Similar topics

by: Dam | last post by:

Using SqlServer : Query 1 : SELECT def.lID as IdDefinition, TDC_AUneValeur.VALEURDERETOUR as ValeurDeRetour FROM serveur.Data_tblDEFINITIONTABLEDECODES def,...

Microsoft SQL Server

Need Help with "Left Outer Join"...

by: Steve | last post by:

I have a SQL query I'm invoking via VB6 & ADO 2.8, that requires three "Left Outer Joins" in order to return every transaction for a specific set of criteria. Using three "Left Outer Joins"...

Microsoft SQL Server

Can someone help me with multiple "Left Outer Joins"?

by: Steve | last post by:

I have a SQL query I'm invoking via VB6 & ADO 2.8, that requires three "Left Outer Joins" in order to return every transaction for a specific set of criteria. Using three "Left Outer Joins"...

Microsoft SQL Server

Alternatives for "OUTER JOIN"

by: Martin | last post by:

Hello everybody, I have the following question. As a join clause on Oracle we use " table1.field1 = table2.field1 (+) " On SQL Server we use " table1.field1 *= table2.field1 " Does DB2...

DB2 Database

regarding "goto" in C

by: M.B | last post by:

Guys, Need some of your opinion on an oft beaten track We have an option of using "goto" in C language, but most testbooks (even K&R) advice against use of it. My personal experience was that...

C / C++

Java code or Pseudo Code for "Outer Join"

by: SKB | last post by:

Hi, I want to implement the "outer join" functionality in Java. Can somebody explain the pseudo code for the same. OR what needs to be done to extend the hash-join Java code of equijoin(I have the...

MySQL Database

correct parameter usage for "select * where id in ..."

by: saniac | last post by:

I am working on a little project using pysqlite. It's going to be exposed on the web, so I want to make sure I quote all incoming data correctly. However, I've run into a brick wall trying to use...

Python

Array to "IN" operator

by: Kevin Chambers | last post by:

Hi all-- Quick question: has anyone come up with an easy way to take an array and use its elements as part of a WHERE clause? For example: <This obviously doesn't work> SELECT * FROM Table1...

Microsoft Access / VBA

More than 1 "LEFT OUTER JOIN"

by: nico3334 | last post by:

I have a query that currently pulls data from a main table and a second table using LEFT OUTER JOIN. I know how to do make another LEFT OUTER JOIN with the main table, but I want to add another LEFT...

Microsoft SQL Server
