473,786 Members | 2,712 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Best way for a Set-based query

I have a database where each row has a set of keywords separated by
commas eg .

ROW 1 ID,<other columns>,"green ,hat,man,boat"
ROW 2 ID,<other columns>,"green ,boat,seaside"
ROW 3 ID,<other columns>,"red,h at"

I want users to be able to find similar records to the one they are
looking it (this is art BTW). ie if they are looking at row 1 and
click "Find Similar Works of Art" I want to do a query where I find
records that have a commonality of keywords and obviously the ones
that match the most keywords should be shown first. The keywords are
currently just stored comma-separated in a single column but I am not
attached to this. Nor are they guaranteed to come in any order (ie
they are not alpha sorted) but they could be.

Number of records is around 15000 on a hosted server.

Any tips for best query/algorithm/data storage to achieve this? or
references to useful sources?

Thanks,
RB
Jul 23 '05 #1
3 3192
Roland Burr wrote:
I have a database where each row has a set of keywords separated by
commas eg .

ROW 1 ID,<other columns>,"green ,hat,man,boat"
ROW 2 ID,<other columns>,"green ,boat,seaside"
ROW 3 ID,<other columns>,"red,h at"

I want users to be able to find similar records to the one they are
looking it (this is art BTW). ie if they are looking at row 1 and
click "Find Similar Works of Art" I want to do a query where I find
records that have a commonality of keywords and obviously the ones
that match the most keywords should be shown first. The keywords are
currently just stored comma-separated in a single column but I am not
attached to this. Nor are they guaranteed to come in any order (ie
they are not alpha sorted) but they could be.

Number of records is around 15000 on a hosted server.

Any tips for best query/algorithm/data storage to achieve this? or
references to useful sources?

Thanks,
RB


normally you break the keywords out into a child table, so that you have two
tables (there may be typos in my SQL, this is off the top of my head):

CREATE TABLE artwork (
ID char(10) PRIMARY KEY,
....other columns...
)

CREATE TABLE keywords (
ID char(10) FOREIGN KEY REFERENCES artwork(ID),
keyword char(10)
)

Index on keyword and the SQL queries should write themselves.

--
Kenneth Downs
Secure Data Software, Inc.
(Ken)nneth@(Sec )ure(Dat)a(.com )
Jul 23 '05 #2
Roland Burr (rb*****@hotmai l.com) writes:
I have a database where each row has a set of keywords separated by
commas eg .

ROW 1 ID,<other columns>,"green ,hat,man,boat"
ROW 2 ID,<other columns>,"green ,boat,seaside"
ROW 3 ID,<other columns>,"red,h at"

I want users to be able to find similar records to the one they are
looking it (this is art BTW). ie if they are looking at row 1 and
click "Find Similar Works of Art" I want to do a query where I find
records that have a commonality of keywords and obviously the ones
that match the most keywords should be shown first. The keywords are
currently just stored comma-separated in a single column but I am not
attached to this. Nor are they guaranteed to come in any order (ie
they are not alpha sorted) but they could be.


I'm glad to hear that you are not attached to the format, because
this sort of format, is really ugly to work with in SQL, as it
violates a basic principle for relational database design. As Kenneth
said, store the keywords as rows.

Find the number of matches given one item would be as easy as this:

SELECT k1.id, COUNT(*)
JOIN keywords k1
WHERE EXISTS (SELECT *
FROM keywordss k2
WHERE k2.id = @currentid
AND k1.keyword = k2.keyword)
ORDER BY 2 DESC

But I will have to add the disclaimed that I have not tested the
above.

--
Erland Sommarskog, SQL Server MVP, es****@sommarsk og.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 23 '05 #3
>> I have a database where each row has a set of keywords separated by
commas <<

You are in violation of First Normal Form, so we are not even in a
relational model at the start of this.
a query where I find records [sic] that have a commonality of keywords <<

CREATE TABLE Keywords
(title_id INTEGER NOT NULL,
keyword VARCHAR(15) NOT NULL,
PRIMARY KEY (title_id, keyword));

load this table with your search words:

CREATE TABLE Searchwords
(keyword VARCHAR(15) NOT NULL PRIMARY KEY);

now the query is easy.

SELECT K1.title_id, COUNT(*) AS matches
Keywords AS K1,
Searchwords AS S1
WHERE K1.keyword = S1.keyword
GROUP BY K1.title_id
ORDER BY matches DESC;
... and obviously the ones that match the most keywords should be

shown first.<<

Actually, ordering is a display issue for the front end and not the
database, but you can add an ORDER BY clause. Having said all of this,
have you considered using a text search tool instead of SQL? Cheap,
fast and much more powerful for this knd of thing.

Jul 23 '05 #4

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

33
9874
by: Frank | last post by:
What is the best IDE for developing PHP applications? What do you use and why? Thanks.
1
2097
by: Chris Uwins | last post by:
Hi there, i know theres a number of ways I can achieve this but want to know the best, (but still quite simple). Up until a year ago I never used Access but have designed a few databases for work. I am working on Access 2000. I have basic SQL/VB skills - and am pretty accomplished at putting the databases together. Anyway...I've created a database to keep track of "Dayworks" we are
5
4200
by: Schoo | last post by:
I have some program constants that I need to set and I thought it would be best to set them all up in a central location so that I can change them quickly when we go from development to production. What is the best way to do that? Should I put them in web.config? Should I create a special class for these settings? Wherever it is best to write them, can you give me a line or 2 of code to show the best way to set a constant string? ...
10
3484
by: jojobar | last post by:
Hello, I am trying to use vs.net 2005 to migrate a project originally in vs.net 2003. I started with creation of a "web site", and then created folders for each component of the site. I read somewhere that each folder under the "web site" is compiled in separate assembly. I however, did not find that the "web site" creation in vs.net 2005 created any AssemblyInfo.cs file.
16
3896
by: JD | last post by:
Hi guys What's the best way to specify font size using CSS? I try to avoid absolute units like pt and px because then users can't resize the fonts in IE, but % and em are a complete pain to use IMO. I read somewhere (a W3C tip I think) that the best way is to specify a "base" size and then have all your fonts relative that, but I'm not sure how that works. TIA
8
1732
by: Tim Chase | last post by:
When you have a set, known to be of length one, is there a "best" ("most pythonic") way to retrieve that one item? # given that I've got Python2.3. on hand, # hack the following two lines to get a "set" object >>> import sets >>> set = sets.Set >>> s = set() >>> len(s)
45
8579
by: Summercoolness | last post by:
it seems that range() can be really slow: the following program will run, and the last line shows how long it ran for: import time startTime = time.time() a = 1.0
41
2887
by: Jim | last post by:
Hi guys, I have an object which represents an "item" in a CMS "component" where an "item" in the most basic form just a field, and a "component" is effectively a table. "item" objects can be created and then added to "component" objects to build up the component definition. My dilemma comes in deciding how to read/write data to the "item"
7
6564
by: PeteSpeed | last post by:
Hi, I need help with creating an algorithm to solve the problem below in the most efficient manner (efficiency in this case means low processing, not memory) (Disclaimer: I studied discrete maths as part of my university course, but this was many years ago and I have forgotten most of the correct syntax. Sorry!) Imagine we have Set A (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20) Set B (3,3,4,5,5,5,5,5,6,6,7,8,12)
6
2762
by: tig2810 | last post by:
Hi I was wondering if someone could let me know the best/correct way of doing this. I have a shortened example of code below: In the beginning of the class I have a List of string containing all records. I have added each encapsulated field to the list in the set property and I need to maintain the order in the list. Is this way best way to do this? Or should I be doing this another cleaner way. I guess I’m asking for best coding...
0
9655
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
10169
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
0
9964
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
8993
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
1
7517
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
5398
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
1
4067
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
2
3670
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
3
2894
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.