Mammoth amounts of self joins

sks

I have a table that contains keywords (Varchars) each one mapped to a
product. so the database schema is just an id column, product column and
keyword column.

I want to select the products that contain x,y,z keywords. Now if this query
involves many keywords I end up with a massive amount of self joins on the
keywords table, is there a better way to do this than self joins? What would
work perhaps is multiple unions where I could discard non duplicated rows
but you can't seem to do that. Any ideas... ?

Mar 22 '06 #1

Subscribe Post Reply

1503

noone

sks wrote:

I have a table that contains keywords (Varchars) each one mapped to a
product. so the database schema is just an id column, product column and
keyword column. I want to select the products that contain x,y,z keywords. Now if this query
involves many keywords I end up with a massive amount of self joins on the
keywords table, is there a better way to do this than self joins? What would
work perhaps is multiple unions where I could discard non duplicated rows
but you can't seem to do that. Any ideas... ?

Show us the queries you have come up with and why they don't work and
provide some sample data (does not have to be real data).

Is there more than one table involved?

What do you consider "many keywords"?

Why are you doing "self-joins"?

Do the keywords map to more than one product?

Is this for a school project or some job that you have?

Mar 22 '06 #2

sks

"noone" <no***@nowhere.com> wrote in message
news:f2********************************@www.firstd basource.com...

sks wrote:
I have a table that contains keywords (Varchars) each one mapped to a
product. so the database schema is just an id column, product column and
keyword column.

I want to select the products that contain x,y,z keywords. Now if this
query
involves many keywords I end up with a massive amount of self joins on
the
keywords table, is there a better way to do this than self joins? What
would
work perhaps is multiple unions where I could discard non duplicated rows
but you can't seem to do that. Any ideas... ?

Show us the queries you have come up with and why they don't work and
provide some sample data (does not have to be real data).
Is there more than one table involved?
What do you consider "many keywords"?

Why are you doing "self-joins"?
Do the keywords map to more than one product?
Is this for a school project or some job that you have?

It's for a software package I write.

The keywords are used as meta data about an item. Each of the keywords is
for a particular attribute that is relevant for certain items only. This is
all definable by the user at runtime. Let's say we had a site selling DVDs,
the attributes in this case might be Director, Actor, and Genre. Each of
these attributes might have multiple values (or none).

Here is a simple schema to demonstrate

Items
--------------
id
name

Attributes
----------
id
name

Attributes_Values
----------
id
item
value

I might want to find all DVDs that had Clint Eastwood, or Tommy Lee Jones,
or Richard Burton in as Actor, and where Comedies, Action or Drama in Genre.

I could self join like this

select distinct i.* from item i join attributes_values av1 on i.id=av1.item
join attributes_values av2 on i.id=av2.item where (av1.value='Clint
Eastwood' or av1.value='Tommy Lee Jones' ......) and av2.value='Comedy' or
av2.value='Drama' ....) where i.status='Live' .... (other item related
clauses here).

This runs extremely slowly when I have more than a few attributes. 9 or 10
is quite common on some items and then you could be searching for 5 to 10
values in each attribute.

My faster solution (and the one I use ) is to select the item column from
the attribute_values table and union it for each different attribute I am
searching against, then group that result and select from it where count is
equal to the number of attributes I searched for and then join that against
the item table. It's fast but I think ugly.

I am wondering if there is an obvious solution I have missed.

Mar 24 '06 #3

Bill Karwin

"sks" <sk*@privacy.me> wrote in message
news:GI******************************@eclipse.net. uk...

My faster solution (and the one I use ) is to select the item column from
the attribute_values table and union it for each different attribute I am
searching against, then group that result and select from it where count
is equal to the number of attributes I searched for and then join that
against the item table. It's fast but I think ugly.

This solution also scales much better than the self-join method. Keep in
mind that MySQL has a limit of 31 joined tables per query (or 63 if you use
a 64-bit operating system). It's likely that a 31-way join performs very
poorly.

The Entity-Attribute-Value data model (or EAV) is hard to use right, and it
violates a few principles of relational modeling and normalization.

See also my reply to today's thread with subject, "Joining data from 2
fields into the first one".

Regards,
Bill K.

Mar 24 '06 #4

Similar topics

Trying to SELECT a SELF JOIN Max value

by: Denis St-Michel | last post by:

Hello All, Hope some Guru will be able to help me with this. Let's take this example table A ------------------------------------------------------------------------------- id | ...

MySQL Database

Alternative to self-joins??

by: nuked | last post by:

I have a table that has values of variables for certain entities. The columns of interest are targetID, variableID, and valueID. A row (1, 5, 9) means that target number 1 has a value of 9 for...

MySQL Database

Stored procedure deadlock on self with subquery

by: Andrew Mayo | last post by:

Here's a really weird one for any SQL Server gurus out there... We have observed (SQL Server 2000) scenarios where a stored procedure which (a) begins a transaction (b) inserts some rows into...

Microsoft SQL Server

complexity of nested self left joins?

by: jbm05 | last post by:

Hi, I'm curious about the computational complexity of a query I have. The query contains multiple nested self left joins, starting with a simple select, then doing a self left join with the...

Microsoft SQL Server

Can I use PostgreSQL to develop a self-organizing database system?

by: Rada Chirkova | last post by:

Hi, at NC State University, my students and I are working on a project called "self-organizing databases," please see description below. I would like to use an open-source database system for...

PostgreSQL Database

Query won't pull amounts less than zero

by: Julia Baresch | last post by:

Hi everyone, I haven't found any reference to this problem on this group or in Access 97 help. I hope someone here knows this - probably a simple thing, but not obvious. I designed a query to...

Microsoft Access / VBA

Self-Referencing tables

by: John E | last post by:

I have a question about how to make records in a table reference other records in the same table.... Suppose I'm building a small database to track some basic information on several companies....

Microsoft Access / VBA

Creating self-join table

by: davegb | last post by:

I'm trying to create a self-join table to show the relationship between employee and supervisor. In another thread, I was advised to create a SupervisorID in the employee table, a separate...

Microsoft Access / VBA

Self Joins in Access - Error Encountered.

by: Darragh | last post by:

Hi all! I'm having a bit of an issue making a self join in access (similar to the example explained on Allen Browne's excellent site - http://allenbrowne.com/ser-06.html). I've made the self...

Microsoft Access / VBA

Easy Steps to Fix "Canon Printer Won't Connect to WiFi Network"

by: taylorcarr | last post by:

A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...

General

How to turn on java script in a villaon keypad mobile phone

by: Charles Arthur | last post by:

How do i turn on java script on a villaon, callus and itel keypad mobile phone

Java

Merging data from multiple Excel files

by: ryjfgjl | last post by:

In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...

Data Management

Navigating the Data Structures and Algorithms (DSA)

by: BarryA | last post by:

What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

Algorithms / Advanced Math

Looking to do Android software development, any suggestions? Is flutter better?

by: nemocccc | last post by:

hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

General

How to build RAID in BIOS?

by: Hystou | last post by:

There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

Computer Hardware

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...

General

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...

Windows Server

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

C / C++