473,854 Members | 1,457 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

SELECT DISTINCT with JOIN

Hi everyone

Have a problem I would areally appreciate help with.
I have 3 tables in a standard format for a Bookshop, eg

Products
Categories
Categories_Prod ucts

the latter allowing me to have products in multiple categories.

Everthing works well except for one annoying little thing.

When an individual product (which is in more than one topcategory) is added
to the Shopping Cart it displays twice, because in my select statement I
have the Category listed. I realise I could remove the TopCategory from the
statement and that makes my DISTINCT work as I wanted, but Id prefer to have
the TopCategory as it saves me later having to another SQL query (Im already
doing one to allow me not to list category in the Statement .... but If I
can overcome this one ... then I can remove this as well).

Here is my table structure (the necessary bits)
products
idProduct int
....

categories
idcategory int
idParentCategor y int
topcategory int
...

categories_prod ucts
idCatProd int
idProduct int
idCategory

When I run a query such as

SELECT DISTINCT a.idProduct, a.description,a .descriptionLon g,
a.listPrice,a.p rice,a.smallIma geUrl,a.stock, a.fileName,a.no ShipCharge,
c.topcategory
FROM products a, categories_prod ucts b, categories c
WHERE active = -1 AND homePage = -1
AND a.idProduct = b.idProduct
AND c.idcategory=b. idcategory
AND prodType = 1 ORDER BY a.idProduct DESC

This will return all products as expected, as well as any products which are
in more than one TopCategory.

Any ideas how to overcome this would be greatly appreciated.

Cheers

Craig
Jul 20 '05
14 45476
Hi Hugo

Hmmmm. Since SQL Server still won't choose for you, allow me to do it for
you. The following two queries will select the "first" topcategory (I
don't know the datatype, so it will either be the lowest numbered, the one
that comes first in the alphabet or the oldest).
Thanks for that ... it was your previous posting that gave me an idea about
the MAX. Good plan :-)

2: Using a subquery.

SELECT a.idProduct, a.description, a.descriptionLo ng,
a.listPrice, a.price, a.smallImageUrl , a.stock,
a.fileName, a.noShipCharge,
(SELECT MIN(c.topcatego ry)
FROM categories_prod ucts b, categories c
WHERE a.idProduct = b.idProduct
AND c.idcategory = b.idcategory)
FROM products a
WHERE a.active = -1
AND a.homePage = -1
AND a.prodType = 1
ORDER BY a.idProduct DESC

Both queries are untested. Test them both and check the results. Then
compare the execution plan to see which is most efficient with your data.
If you see little difference, pick the one you like best :-)


This last one looked the best.

From a cursory glance, which would be better, yours where the SELECT is a
subquery (forgove my terminiology) or mine where it is in the WHERE clause?

Both provide the same results (it appears) and appear OK performace wise ...
just wondering which is better for a larger number of records?

Thanks again for all your help ... truly very much appreciated.

Cheers

Craig
Jul 20 '05 #11
Hi Erland

Thanks for the reply.

I note you and Hugo both placed the "select Max()" in the SELECT part, as
opposed to one I made where I placed it in the WHERE clause.

Am I correct in assuming that, given you guys and your experience, that
yours will be more efficient ... since you both seem to have choosen the
same approach?

Cheers

Craig

"Erland Sommarskog" <so****@algonet .se> wrote in message
news:Xn******** **************@ 127.0.0.1...
Craig Hoskin (no****@infobah n.co.nz) writes:
Have a problem I would areally appreciate help with.
I have 3 tables in a standard format for a Bookshop, eg

Products
Categories
Categories_Prod ucts

the latter allowing me to have products in multiple categories.

Everthing works well except for one annoying little thing.

When an individual product (which is in more than one topcategory) is
added to the Shopping Cart it displays twice, because in my select
statement I have the Category listed. I realise I could remove the
TopCategory from the statement and that makes my DISTINCT work as I
wanted, but Id prefer to have the TopCategory as it saves me later
having to another SQL query (Im already doing one to allow me not to
list category in the Statement .... but If I can overcome this one ...
then I can remove this as well).


You want to display a distinct list, but you only want one TopCategory.
So which one do you want?

Well, assuming you only want the highest, you could do:

SELECT a.idProduct, a.description,a .descriptionLon g,
a.listPrice,a.p rice,a.smallIma geUrl,a.stock, a.fileName,a.no ShipCharge,
MAX(c.topcatego ry)
FROM products a, categories_prod ucts b, categories c
WHERE active = -1 AND homePage = -1
AND a.idProduct = b.idProduct
AND c.idcategory=b. idcategory
AND prodType = 1 ORDER BY a.idProduct DESC
GROUP BY a.idProduct, a.description,a .descriptionLon g,
a.listPrice,a.p rice,a.smallIma geUrl,a.stock,
a.fileName,a.no ShipCharge

--
Erland Sommarskog, SQL Server MVP, so****@algonet. se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp

Jul 20 '05 #12
On Fri, 14 May 2004 10:08:16 +1200, Craig Hoskin wrote:

(snip)
This last one looked the best.

From a cursory glance, which would be better, yours where the SELECT is a
subquery (forgove my terminiology) or mine where it is in the WHERE clause?
Hi Craig,

Your query seems to contain some redundant joins. This may not hurt you if
the optimizer "sees" that it doesn't need to perform the join twice, but
I'm not sure if the optimizer is intelligent enough for that.

Both provide the same results (it appears) and appear OK performace wise ...
just wondering which is better for a larger number of records?


Without testing, I'd say mine is quicker (probably) or they have equal
performance (slight chance). But you'd have to test to be sure. Make sure
you have an adequate (ie large) set of test data with as typical
distribution as possible. Run both queries with SET STATISTICS IO. Also,
compare the execution plans.

Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)
Jul 20 '05 #13
On Fri, 14 May 2004 00:48:02 +0200, Hugo Kornelis wrote:
On Fri, 14 May 2004 10:08:16 +1200, Craig Hoskin wrote:

(snip)
This last one looked the best.

From a cursory glance, which would be better, yours where the SELECT is a
subquery (forgove my terminiology) or mine where it is in the WHERE clause?


Hi Craig,

Your query seems to contain some redundant joins. This may not hurt you if
the optimizer "sees" that it doesn't need to perform the join twice, but
I'm not sure if the optimizer is intelligent enough for that.


Forgot to mention - my first suggestion (identical to Erland's) has a high
probability of being the fastest of them all (though it does depend on
what indexes are available for the optimizer to toy with).

I suspect the long group by list is what makes this one "look" less good
to you. However, this will probably look better to the optimizer than the
correlated subqueries in my second query and in your post. Also, do keep
in mind that you can mentally replace the entire group by list by "all
columns in the select clause except topcategory".

Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)
Jul 20 '05 #14
Craig Hoskin (no****@infobah n.co.nz) writes:
I note you and Hugo both placed the "select Max()" in the SELECT part, as
opposed to one I made where I placed it in the WHERE clause.

Am I correct in assuming that, given you guys and your experience, that
yours will be more efficient ... since you both seem to have choosen the
same approach?


For this case it is probably more effecient. It was also less typing...

--
Erland Sommarskog, SQL Server MVP, so****@algonet. se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 20 '05 #15

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

5
3672
by: Colman | last post by:
Howdy all! I guess I'm a newbie, because I am stumped (or maybe just too durned tired). Here's what I got... CREATE TABLE `nodecat_map` ( `nodecat_id` mediumint(8) unsigned NOT NULL auto_increment, `nodecat_cat_id` mediumint(8) unsigned NOT NULL default '0', `nodecat_node_id` mediumint(8) unsigned NOT NULL default '0', PRIMARY KEY (`nodecat_id`),
5
1819
by: Reestit Mutton | last post by:
Hi, I'm currently learning the ropes with CSS, PHP, MYSQL, Javascript etc... by redesigning my website as a database driven site. Okay, so I'm skilled at perl, data manipulation and data analysis and I've done a fair bit of (non CSS) HTML in the past and know what a relational database is. However, I'm having problems trying to formulate a particular MYSQL query (maybe I just don't know the lingo or maybe I've designed my data...
6
2806
by: RCS | last post by:
I've been running into more and more complexity with an application, because as time goes on - we need more and more high-level, rolled-up information. And so I've created views, and views that use other views.. and the queries are getting slower and slower. This morning, I'm working on something like this: select <some columns>, "calculatedcolumn" = (select top 1 crap from stuff where
3
6477
by: Tcs | last post by:
My backend is DB2 on our AS/400. While I do HAVE DB2 PE for my PC, I haven't loaded it yet. I'm still using MS Access. And no, I don't believe this is an Access question. (But who knows? I COULD be wrong... :) I've tried the access group...twice...and all I get is "Access doesn't like ".", which I know, or that my query names are too long, as there's a limit to the length of the SQL statement(s). But this works when I don't try to...
23
82065
by: Brian | last post by:
Hello All - I am wondering if anyone has any thoughts on which is better from a performance perspective: a nested Select statement or an Inner Join. For example, I could do either of the following: SELECT supplier_name FROM supplier
9
10916
by: Kelvin | last post by:
Okay so this is baking my noodle. I want to select all the attritbutes/fields from a table but then to excluded any row in which a single attributes data has been duplicated. I.E. Here's my table:- ID Ref Name DATE 1 AAA Joe 1/2 2 BBB Ken 1/2 3 AAA Len 6/3
2
7500
by: Andrew Lias | last post by:
Let us say that I have a table with two sets of values as such: Item Extension --- ---- 100023 1 100025 1 100025 2 100028 1 100029 1 100029 2
6
13772
by: Bob Stearns | last post by:
I am getting duplicate rows back from a select distinct statement of the form: SELECT DISTINCT 'jhough', '000111', t0.bhid FROM (SELECT lots of good stuff) t0 LEFT OUTER JOIN another_table t1 ON relevant_stuff WHERE (lots of conditions) After re-reading the relevant pat ofVol 1 of the SQL Reference I am unablee to see how this is possible.
22
12509
by: MP | last post by:
vb6,ado,mdb,win2k i pass the sql string to the .Execute method on the open connection to Table_Name(const) db table fwiw (the connection opened via class wrapper:) msConnString = "Data Source=" & msDbFilename moConn.Properties("Persist Security Info") = False moConn.ConnectionString = msConnString moConn.CursorLocation = adUseClient moConn.Mode = adModeReadWrite' or using default...same result
4
10693
tjc0ol
by: tjc0ol | last post by:
Hi guys, I'm a newbie in php and I got error in my index.php which is: 1054 - Unknown column 'p.products_id' in 'on clause' select p.products_image, pd.products_name, p.products_id, p.manufacturers_id, p.products_price, p.products_tax_class_id, IF(s.status, s.specials_new_products_price, NULL) as specials_new_products_price, IF(s.status, s.specials_new_products_price, p.products_price) as final_price from products_description pd,...
0
9752
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
11031
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
9517
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
1
7918
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
7082
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
5744
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
0
5942
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
4563
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
2
4159
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.