By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
446,318 Members | 2,340 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 446,318 IT Pros & Developers. It's quick & easy.

Best way for a Set-based query

P: n/a
I have a database where each row has a set of keywords separated by
commas eg .

ROW 1 ID,<other columns>,"green,hat,man,boat"
ROW 2 ID,<other columns>,"green,boat,seaside"
ROW 3 ID,<other columns>,"red,hat"

I want users to be able to find similar records to the one they are
looking it (this is art BTW). ie if they are looking at row 1 and
click "Find Similar Works of Art" I want to do a query where I find
records that have a commonality of keywords and obviously the ones
that match the most keywords should be shown first. The keywords are
currently just stored comma-separated in a single column but I am not
attached to this. Nor are they guaranteed to come in any order (ie
they are not alpha sorted) but they could be.

Number of records is around 15000 on a hosted server.

Any tips for best query/algorithm/data storage to achieve this? or
references to useful sources?

Thanks,
RB
Jul 23 '05 #1
Share this Question
Share on Google+
3 Replies


P: n/a
Roland Burr wrote:
I have a database where each row has a set of keywords separated by
commas eg .

ROW 1 ID,<other columns>,"green,hat,man,boat"
ROW 2 ID,<other columns>,"green,boat,seaside"
ROW 3 ID,<other columns>,"red,hat"

I want users to be able to find similar records to the one they are
looking it (this is art BTW). ie if they are looking at row 1 and
click "Find Similar Works of Art" I want to do a query where I find
records that have a commonality of keywords and obviously the ones
that match the most keywords should be shown first. The keywords are
currently just stored comma-separated in a single column but I am not
attached to this. Nor are they guaranteed to come in any order (ie
they are not alpha sorted) but they could be.

Number of records is around 15000 on a hosted server.

Any tips for best query/algorithm/data storage to achieve this? or
references to useful sources?

Thanks,
RB


normally you break the keywords out into a child table, so that you have two
tables (there may be typos in my SQL, this is off the top of my head):

CREATE TABLE artwork (
ID char(10) PRIMARY KEY,
....other columns...
)

CREATE TABLE keywords (
ID char(10) FOREIGN KEY REFERENCES artwork(ID),
keyword char(10)
)

Index on keyword and the SQL queries should write themselves.

--
Kenneth Downs
Secure Data Software, Inc.
(Ken)nneth@(Sec)ure(Dat)a(.com)
Jul 23 '05 #2

P: n/a
Roland Burr (rb*****@hotmail.com) writes:
I have a database where each row has a set of keywords separated by
commas eg .

ROW 1 ID,<other columns>,"green,hat,man,boat"
ROW 2 ID,<other columns>,"green,boat,seaside"
ROW 3 ID,<other columns>,"red,hat"

I want users to be able to find similar records to the one they are
looking it (this is art BTW). ie if they are looking at row 1 and
click "Find Similar Works of Art" I want to do a query where I find
records that have a commonality of keywords and obviously the ones
that match the most keywords should be shown first. The keywords are
currently just stored comma-separated in a single column but I am not
attached to this. Nor are they guaranteed to come in any order (ie
they are not alpha sorted) but they could be.


I'm glad to hear that you are not attached to the format, because
this sort of format, is really ugly to work with in SQL, as it
violates a basic principle for relational database design. As Kenneth
said, store the keywords as rows.

Find the number of matches given one item would be as easy as this:

SELECT k1.id, COUNT(*)
JOIN keywords k1
WHERE EXISTS (SELECT *
FROM keywordss k2
WHERE k2.id = @currentid
AND k1.keyword = k2.keyword)
ORDER BY 2 DESC

But I will have to add the disclaimed that I have not tested the
above.

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 23 '05 #3

P: n/a
>> I have a database where each row has a set of keywords separated by
commas <<

You are in violation of First Normal Form, so we are not even in a
relational model at the start of this.
a query where I find records [sic] that have a commonality of keywords <<

CREATE TABLE Keywords
(title_id INTEGER NOT NULL,
keyword VARCHAR(15) NOT NULL,
PRIMARY KEY (title_id, keyword));

load this table with your search words:

CREATE TABLE Searchwords
(keyword VARCHAR(15) NOT NULL PRIMARY KEY);

now the query is easy.

SELECT K1.title_id, COUNT(*) AS matches
Keywords AS K1,
Searchwords AS S1
WHERE K1.keyword = S1.keyword
GROUP BY K1.title_id
ORDER BY matches DESC;
... and obviously the ones that match the most keywords should be

shown first.<<

Actually, ordering is a display issue for the front end and not the
database, but you can add an ORDER BY clause. Having said all of this,
have you considered using a text search tool instead of SQL? Cheap,
fast and much more powerful for this knd of thing.

Jul 23 '05 #4

This discussion thread is closed

Replies have been disabled for this discussion.