Bytes | Developer Community

Please Help Obi-Wan: Efficient Join/Cursor/Something/Anything?

Hi all

I have a bit of a dilemma that I am hoping some of you smart dudes
might be able to help me with.

1. I have a table with about 50 million records in it and quite a few
columns. [Table A]

2. I have another table with just over 300 records in it and a single
column (besides the id). [Table B]

3. I want to:

Select all of those records from Table A where [table A].description
does NOT contain any of (select color from [table B])

4. An example

Table A
id   ... [other columns] ...   description
1    ...                       the green hornet
2    ...                       a red ball
3    ...                       a green dog
4    ...                       the yellow submarine
5    ...                       the pink panther

Table B
id   color
55   blue
56   gold
57   green
58   purple
59   pink
60   white

So I want to select all those rows in Table A where none of the words
from Table B.color appear in the description field in Table A.
I.e., the query would return the following rows from Table A:

2 a red ball
4 the yellow submarine
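The question says both "contain" and "none of the words", and those two readings are not quite the same thing. A small plain-Python reference check makes the difference concrete (an illustration only; `table_a`, `colors`, `no_color_substring` and `no_color_word` are made-up names, and "word" is assumed to mean a space-separated token). On this sample both readings give rows 2 and 4, but they diverge once a color is buried inside a longer word:

```python
# Sample data from the post above
table_a = {
    1: "the green hornet",
    2: "a red ball",
    3: "a green dog",
    4: "the yellow submarine",
    5: "the pink panther",
}
colors = ["blue", "gold", "green", "purple", "pink", "white"]


def no_color_substring(description: str) -> bool:
    """True if no color occurs anywhere in the text (what LIKE '%color%' tests)."""
    return not any(color in description for color in colors)


def no_color_word(description: str) -> bool:
    """True if no color occurs as a whole space-separated word (an assumed definition)."""
    words = set(description.split())
    return not any(color in words for color in colors)


substring_ids = [i for i, d in table_a.items() if no_color_substring(d)]
word_ids = [i for i, d in table_a.items() if no_color_word(d)]
print(substring_ids)  # [2, 4]
print(word_ids)       # [2, 4]

# The two readings disagree when a color is part of a longer word
print(no_color_substring("an evergreen tree"))  # False
print(no_color_word("an evergreen tree"))       # True
```

So if whole-word matching is what is really wanted, a plain `LIKE '%green%'` test will over-match (it also hits "evergreen").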

The real-life problem has more variables and is a little more
complicated than this, but this should suffice to give me the right
idea.

Due to the number of rows involved, I need this to be relatively
efficient. Can someone suggest the most efficient way to proceed?

PS. Please excuse my ignorance.
Cheers
Sean
Jul 20 '05 #1
On 7 Nov 2003 09:36:26 -0800, sf*****@efinancialnews.com (Sean) wrote:
> Due to the number of rows involved, I need this to be relatively
> efficient. Can someone suggest the most efficient way to proceed?


Well, I don't know exactly about efficient. :)

I came up with this method, using LIKE instead of NOT LIKE (and then
reversing the results), because LIKE is supposedly faster, and NOT IN
on integers isn't a big deal. Maybe the reverse is faster (NOT LIKE in
the cursor, IN on the integers), but I doubt it. I couldn't think of
a way around a big honkin' SQL statement built by a cursor and executed
in nicely limited 8000-character chunks. At least this cuts down the
number of executions (versus 300 separate ones...).

Test bed:

create table TableA (id int, description varchar(50))
insert into TableA values (1, 'the green hornet')
insert into TableA values (2, 'a red ball')
insert into TableA values (3, 'a green dog')
insert into TableA values (4, 'the yellow submarine')
insert into TableA values (5, 'the pink panther')
create table TableB (id int, color varchar(50))
insert into TableB values (55, 'blue')
insert into TableB values (56, 'gold')
insert into TableB values (57, 'green')
insert into TableB values (58, 'purple')
insert into TableB values (59, 'pink')
insert into TableB values (60, 'white')

SQL to get values:

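declare @SQLStr varchar(8000)
declare @Color varchar(50)

create table #resultstable (id int)

-- Builds "select id from TableA where ((1=0) or description like '%c1%' or ...)"
-- and runs it whenever it grows past ~7000 characters
declare color_cursor CURSOR for select distinct color from TableB
open color_cursor
fetch next from color_cursor into @Color
select @SQLStr = 'select id from TableA where ((1=0) '
while @@fetch_status = 0
begin
    if (len(@SQLStr) > 7000)
    begin
        -- Close off and run the current chunk, then start a new one.
        -- execute(...) needs the parentheses; without them the variable
        -- is treated as the name of a stored procedure.
        select @SQLStr = @SQLStr + ')'
        insert into #resultstable(id) execute(@SQLStr)
        select @SQLStr = 'select id from TableA where ((1=0) '
    end
    -- Always append the current color, including right after a flush;
    -- doubling single quotes keeps the dynamic SQL valid
    select @SQLStr = @SQLStr + ' or description like ''%'
        + replace(@Color, '''', '''''') + '%'' '
    fetch next from color_cursor into @Color
end

-- Run the last (possibly partial) chunk
select @SQLStr = @SQLStr + ')'
insert into #resultstable(id) execute(@SQLStr)
close color_cursor
deallocate color_cursor

-- Rows that matched none of the colors
select * from TableA
where id not in (select distinct id from #resultstable)
drop table #resultstable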
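The same chunking idea can be sketched in Python against SQLite (SQLite is an assumption here, used only because it is easy to run; the script above targets SQL Server). The hypothetical `matching_ids` helper ORs together at most `chunk_size` `LIKE` predicates per statement and binds each pattern as a parameter instead of splicing it into the SQL text. The default of 100 per chunk is an arbitrary choice that stays well under SQLite's default limit on bound parameters:

```python
import sqlite3


def matching_ids(conn, colors, chunk_size=100):
    """Return the ids of TableA rows whose description contains any color.

    Colors are OR-ed together, at most `chunk_size` LIKE predicates per
    statement, in the spirit of the ~7000-character batches above.
    Each pattern is bound as a parameter rather than spliced into the SQL.
    """
    ids = set()
    for start in range(0, len(colors), chunk_size):
        chunk = colors[start:start + chunk_size]
        predicate = " OR ".join(["description LIKE ?"] * len(chunk))
        sql = f"SELECT id FROM TableA WHERE {predicate}"
        params = [f"%{color}%" for color in chunk]
        ids.update(row[0] for row in conn.execute(sql, params))
    return ids


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (id INTEGER, description TEXT)")
conn.execute("CREATE TABLE TableB (id INTEGER, color TEXT)")
conn.executemany(
    "INSERT INTO TableA VALUES (?, ?)",
    [(1, "the green hornet"), (2, "a red ball"), (3, "a green dog"),
     (4, "the yellow submarine"), (5, "the pink panther")],
)
conn.executemany(
    "INSERT INTO TableB VALUES (?, ?)",
    [(55, "blue"), (56, "gold"), (57, "green"),
     (58, "purple"), (59, "pink"), (60, "white")],
)

colors = [row[0] for row in conn.execute("SELECT DISTINCT color FROM TableB")]
# A tiny chunk size forces two batches with only six colors
hits = matching_ids(conn, colors, chunk_size=4)
remaining = [
    row
    for row in conn.execute("SELECT id, description FROM TableA ORDER BY id")
    if row[0] not in hits
]
print(remaining)  # [(2, 'a red ball'), (4, 'the yellow submarine')]
```

Binding parameters means a color containing an apostrophe needs no escaping, although a `%` or `_` inside a color would still act as a LIKE wildcard. At 50 million rows the final "not in" step belongs in the database, as in the T-SQL version; filtering in Python here only keeps the sketch short.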
___________
To reply by email, chop off the head!
Jul 20 '05 #2
sf*****@efinancialnews.com (Sean) wrote in message news:<da**************************@posting.google.com>...
>> Hi all
1. I have a table with about 50 million records [sic] in it and quite a few
columns. [Table A] <<

Rows are not records -- big difference.
>> I want to select all of those records [sic] from Table A where TableA.description does NOT contain any of (SELECT color FROM TableB) <<

Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, datatypes, etc. in
your schema are. I think that description is free text, Latin
alphabet and lowercased, but who knows?
>> So I want to select all those rows in Table A where none of the words from TableB.color appear in the description field [sic] in
TableA. <<

SELECT *
FROM TableA AS A1
WHERE NOT EXISTS
(SELECT *
FROM TableB AS B1
WHERE A1.description LIKE '%' + B1.color +'%');
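A quick way to check this NOT EXISTS query against the sample data is to run it in SQLite from Python (SQLite is only a convenient stand-in for SQL Server here). One dialect difference matters: SQLite concatenates strings with `||`, and `+` means numeric addition, so `'%' + B1.color + '%'` would silently evaluate to 0 and match nothing:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE TableB (id INTEGER PRIMARY KEY, color TEXT);
    INSERT INTO TableA VALUES
        (1, 'the green hornet'), (2, 'a red ball'), (3, 'a green dog'),
        (4, 'the yellow submarine'), (5, 'the pink panther');
    INSERT INTO TableB VALUES
        (55, 'blue'), (56, 'gold'), (57, 'green'),
        (58, 'purple'), (59, 'pink'), (60, 'white');
""")

# Same shape as the T-SQL answer, but with || for string concatenation
rows = conn.execute("""
    SELECT A1.id, A1.description
      FROM TableA AS A1
     WHERE NOT EXISTS
           (SELECT *
              FROM TableB AS B1
             WHERE A1.description LIKE '%' || B1.color || '%')
     ORDER BY A1.id
""").fetchall()
print(rows)  # [(2, 'a red ball'), (4, 'the yellow submarine')]
```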
>> Due to the number of rows involved, I need this to be relatively
efficient. Can someone suggest the most efficient way to proceed? <<

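Don't use SQL for text searches; there are better products for that.
Because the pattern starts with a wildcard ('%green%'), an ordinary
index on description cannot be used to seek, so this is simply going
to be a slow table scan.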
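To see the scan for yourself, one option is to put an ordinary index on `description` and ask the planner what it would do. This sketch again uses SQLite as a stand-in (SQL Server's plan output looks different), and the index name `ix_tablea_description` is made up. With a leading `%` there is nothing to seek on, so the plan for `A1` is a SCAN of the table or of the whole index, never a SEARCH:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE TableB (id INTEGER PRIMARY KEY, color TEXT);
    -- An ordinary B-tree index on description, to see whether it gets used
    CREATE INDEX ix_tablea_description ON TableA (description);
""")

plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT A1.id
      FROM TableA AS A1
     WHERE NOT EXISTS
           (SELECT *
              FROM TableB AS B1
             WHERE A1.description LIKE '%' || B1.color || '%')
""").fetchall()

# Each plan row has four columns; the last one is the human-readable detail
details = [row[3] for row in plan]
for detail in details:
    print(detail)
```

The exact wording of each line varies between SQLite versions (for example "SCAN A1" versus "SCAN TABLE TableA AS A1"), which is why no output is shown here.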
Jul 20 '05 #3
Hi Sean,

You can use an anti-join: A LEFT OUTER JOIN B on the LIKE condition, keeping only the rows where b.x IS NULL (i.e., no color matched). -- Louis

create table #A (X varchar(100))
insert into #A values ('the green hornet')
insert into #A values ('a red ball')
insert into #A values ('a green dog')
insert into #A values ('the yellow submarine')
insert into #A values ('the pink panther')
create Table #B (X varchar(100))
insert into #B values ('green')
insert into #B values ('pink')

select a.*
from #A as a
left outer join #B as b
on a.x like '%'+b.x+'%'
where b.x is null

returns:
x
---------
a red ball
the yellow submarine
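One thing worth checking with the LEFT JOIN form is a description that matches more than one color. The join then produces one row per match, but every one of those rows has a non-NULL `b.x`, so `WHERE b.x IS NULL` drops them all and no duplicates can appear. The sketch below checks that it returns the same rows as the NOT EXISTS version. It runs on SQLite via Python (an assumption for illustration), the extra row 'a pink and green kite' is invented to exercise the multi-match case, and `#A`/`#B` are renamed to `A`/`B` because SQLite has no `#` temp-table syntax:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (x TEXT);
    CREATE TABLE B (x TEXT);
    INSERT INTO A VALUES ('the green hornet'), ('a red ball'), ('a green dog'),
                         ('the yellow submarine'), ('the pink panther'),
                         ('a pink and green kite');
    INSERT INTO B VALUES ('green'), ('pink');
""")

# Louis's anti-join: keep rows of A for which the LEFT JOIN found no color
left_join = conn.execute("""
    SELECT a.x
      FROM A AS a
      LEFT OUTER JOIN B AS b ON a.x LIKE '%' || b.x || '%'
     WHERE b.x IS NULL
     ORDER BY a.x
""").fetchall()

# The NOT EXISTS form, for comparison
not_exists = conn.execute("""
    SELECT a.x
      FROM A AS a
     WHERE NOT EXISTS (SELECT * FROM B AS b WHERE a.x LIKE '%' || b.x || '%')
     ORDER BY a.x
""").fetchall()

print(left_join)                # [('a red ball',), ('the yellow submarine',)]
print(left_join == not_exists)  # True
```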
Jul 20 '05 #4
