473,703 Members | 2,361 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Settle a disagreement with my boss re SELECT DISTINCT

My boss has been adamant that SELECT DISTINCT is a faster query than
SELECT all other factors being equal. I disagree. We are linking an
Access front end to a SQL Server back end and normally are only
returning one record. She states that with disctinct the query stops
as soon as it finds a matching record. Both of us are relative novices
in database theory (obviously).

Can someone help settle this?

Nov 13 '05 #1
18 3051
mathilda wrote:
My boss has been adamant that SELECT DISTINCT is a faster query than
SELECT all other factors being equal. I disagree. We are linking an
Access front end to a SQL Server back end and normally are only
returning one record. She states that with disctinct the query stops
as soon as it finds a matching record. Both of us are relative
novices in database theory (obviously).

Can someone help settle this?


It would vary from query to query, but I would expect DISTINCT to either be
slower or a tie, never faster. If your query is "unintellig ent" enough to
force a table scan then it is going to do that regardless of whether you
have the DISTINCT clause or not. After all, the engine wouldn't "know" that
you are only looking for one record and it won't know if other rows are
duplicates until AFTER they have been looked at.

Perhaps if a TOP 1 clause were added?
--
I don't check the Email account attached
to this message. Send instead to...
RBrandt at Hunter dot com
Nov 13 '05 #2
Thanks, looks like I win. But want do you mean by "forcing a table
scan"

Nov 13 '05 #3
mathilda wrote:
My boss has been adamant that SELECT DISTINCT is a faster query than
SELECT all other factors being equal. I disagree. We are linking an
Access front end to a SQL Server back end and normally are only
returning one record. She states that with disctinct the query stops
as soon as it finds a matching record. Both of us are relative novices
in database theory (obviously).


This seems entirely illogical.

From BOL: "The DISTINCT keyword eliminates duplicate rows from the
results of a SELECT statement. If DISTINCT is not specified, all rows
are returned, including duplicates."

So if there is only one row in the result set then there is none to
eliminate. If there are more then they will have to be tested for
duplicity (!). In the first case any difference would be unnoticeable;
in the second case the difference might be noticeable to an atomic clock.

Of course if you are using ODBC someone else would know.

--
Lyle
--
From ADO28.chm

Deprecated Components
Each of the following components is considered obsolete. While these
components are still supported in this release of the Microsoft® Data
Access Components (MDAC), they may be removed in the future. When
writing new applications, you should avoid using these deprecated
components. When modifying existing applications, you are strongly
encouraged to remove any dependency on these components.

ODBC Provider (MSDASQL)
You are strongly encouraged to use one of the native OLE DB Providers
instead of the Microsoft Open Database Connectivity (ODBC) Provider.
Native OLE DB Providers provide better application stability and
performance. Furthermore, native OLE DB Providers will be supported in
the future, whereas MSDASQL will not have any new features added to it,
will not be available on 64-bit, and will not be accessible from the OLE
DB NET Data Provider.

.....
Nov 13 '05 #4
mathilda wrote:
Thanks, looks like I win. But want do you mean by "forcing a table
scan"


If your query's WHERE clause is against an indexed field then only the index
is scanned and then the matching rows retrieved. If it is against a
non-indexed field then the entire table is scanned to look for matches.

--
I don't check the Email account attached
to this message. Send instead to...
RBrandt at Hunter dot com
Nov 13 '05 #5

mathilda wrote:
My boss has been adamant that SELECT DISTINCT is a faster query than
SELECT all other factors being equal. I disagree. We are linking an
Access front end to a SQL Server back end and normally are only
returning one record. She states that with disctinct the query stops
as soon as it finds a matching record. Both of us are relative novices in database theory (obviously).

Can someone help settle this?


You are correct: the DISTINCT command will actually slow the query.
First, the SELECT command is performed, and then the result set is
further processed to return the DISTINCT result only. SQL Server
processes DISTINCT like an aggregate function, it doesn't just stop
when it gets to one result.

The proof in in the pudding: open Query Analyzer, tell it to show the
Execution plan and then run the query with and without DISTINCT.
Probably the Execution Time will read "0:00:00" for both queries, but
you can show your boss the extra processing step.

Nov 13 '05 #6
"mathilda" <sm***********@ yahoo.com> wrote in message
news:11******** **************@ o13g2000cwo.goo glegroups.com.. .
My boss has been adamant that SELECT DISTINCT is a faster query than
SELECT all other factors being equal. I disagree. We are linking an
Access front end to a SQL Server back end and normally are only
returning one record.


There is some important things to be aware of here:

* since you are using sql server, then when you return ONLY one record
(either example....via a condition, or by the distinct) only ONE record is
transfer down the wire. Thus, assuming that both approaches can readably
return the ONE record in a short time, you as a end user likely will not
experience ANY difference. In other words, the LARGE time issue here is
transferring the one record...not that time of finding the one record. It is
possible that one approach might be faster then the other (say 1 / 10,000 of
a second), however, the time taken to transfer the data is a few 100's of s
second, so that 10,000 times faster to FIND and retrieve the record will NOT
be noticed by you. The MOST important issue here is that ONE record is being
returned down the wire.

* So, we have two issues:
Time to transfer the record (AFTER it has been found)
Time to find the ONE record, either via Distinct, or by some keyID
etc.

I have little doubt that using a condition to return one record is
considerably faster then telling the database to figure things out, and
NEVER return duplicates. Telling the database system to not return
duplicates
takes a lot of work. If you can return the record via:

select * from tblCustomer where custid = 123

The above is most certainly better then:

select distinct * from tblCustomer where custid = 123

However, adding the distinct keyword in the above is NOT going to improve
speed if ONLY one record exists. The time difference would be minimal..but
adding distinct would increase things by some amount...but likely not even a
amount that you can measure).

However, what about when there is going to be multiple records returned, and
we ONLY want one record?

Again:

select top 1 from tblCustomer where custid = 123

This I think again would be faster then distinct, with distinct all possible
matches
of custID = 123 have to be tested. There could be 50, or even 2000 records
with a custid = 123....and making distinct work on that can really cost.
With
a top 1, then only the first match need be returned.

However, you question was about one record, and all things being
equal. So, if only one record is to be normally retuned, and the distinct is
NOT needed, then I would leave it out. I am CERTAIN that putting
in distinct is NOT faster. The amount that distinct would be slower
by is likely not measure in this case, .but it would be a tiny tiny bit
slower.
--
Albert D. Kallal (Access MVP)
Edmonton, Alberta Canada
pl************* ****@msn.com
http://www.members.shaw.ca/AlbertKallal

Nov 13 '05 #7
On Feb 09 2005, 10:36 am, Lyle Fairfield <ly******@yahoo .ca> wrote in
news:M5******** ***********@rea d2.cgocable.net :
So if there is only one row in the result set then there is none to
eliminate. If there are more then they will have to be tested for
duplicity (!). In the first case any difference would be unnoticeable;
in the second case the difference might be noticeable to an atomic clock.


Actually the difference may be quite noticeable in the second case,
depending on the size of data set. This is similar to the difference (!)
between UNION and UNION ALL.

Paste the following into QA, hit Ctrl+L, and compare the plans:

use pubs

select state
from dbo.authors
go

select distinct state
from dbo.authors
go

--
remove a 9 to reply by email
Nov 13 '05 #8
Dimitri Furman wrote:
On Feb 09 2005, 10:36 am, Lyle Fairfield <ly******@yahoo .ca> wrote in
news:M5******** ***********@rea d2.cgocable.net :

So if there is only one row in the result set then there is none to
eliminate. If there are more then they will have to be tested for
duplicity (!). In the first case any difference would be unnoticeable;
in the second case the difference might be noticeable to an atomic clock.

Actually the difference may be quite noticeable in the second case,
depending on the size of data set. This is similar to the difference (!)
between UNION and UNION ALL.

Paste the following into QA, hit Ctrl+L, and compare the plans:

use pubs

select state
from dbo.authors
go

select distinct state
from dbo.authors
go


I feel
"We are linking an
Access front end to a SQL Server back end and normally are only
returning one record."
implies a very small number of records.

--
--
Lyle
--
Nov 13 '05 #9
Dimitri Furman wrote:
On Feb 09 2005, 10:36 am, Lyle Fairfield <ly******@yahoo .ca> wrote in
news:M5******** ***********@rea d2.cgocable.net :

So if there is only one row in the result set then there is none to
eliminate. If there are more then they will have to be tested for
duplicity (!). In the first case any difference would be unnoticeable;
in the second case the difference might be noticeable to an atomic clock.

Actually the difference may be quite noticeable in the second case,
depending on the size of data set. This is similar to the difference (!)
between UNION and UNION ALL.

Paste the following into QA, hit Ctrl+L, and compare the plans:

use pubs

select state
from dbo.authors
go

select distinct state
from dbo.authors
go


BTW the time difference is not so noticeable here.
--
--
Lyle
--
From ADO28.chm

Deprecated Components
Each of the following components is considered obsolete. While these
components are still supported in this release of the Microsoft® Data
Access Components (MDAC), they may be removed in the future. When
writing new applications, you should avoid using these deprecated
components. When modifying existing applications, you are strongly
encouraged to remove any dependency on these components.

ODBC Provider (MSDASQL)
You are strongly encouraged to use one of the native OLE DB Providers
instead of the Microsoft Open Database Connectivity (ODBC) Provider.
Native OLE DB Providers provide better application stability and
performance. Furthermore, native OLE DB Providers will be supported in
the future, whereas MSDASQL will not have any new features added to it,
will not be available on 64-bit, and will not be accessible from the OLE
DB NET Data Provider.

Remote Data Services (RDS)
Remote Data Services (RDS) is a proprietary Microsoft mechanism for
accessing remote data across the Internet or intranet. Microsoft is now
shipping the Microsoft Simple Object Access Protocol (SOAP) Toolkit 2.0
that enables you to access remote data using an open, XML-based
standard. Given the availability of the SOAP Toolkit 2.0, you should
migrate from RDS to SOAP. The SOAP 2.0 Toolkit 2.0 also includes sample
code for remotely accessing Microsoft ActiveX® Data Objects (ADO)
Recordsets.

Jet and Replication Objects (JRO)
The Microsoft Jet OLE DB Provider and other related components were
removed from MDAC 2.6. Microsoft has deprecated the Microsoft Jet
Engine, and plans no new releases or service packs for this component.
As a result, the Jet and Replication Objects (JRO) is being deprecated
in this release and will not be available in any future MDAC releases.

.....
Nov 13 '05 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

5
11687
by: Martin Feuersteiner | last post by:
Dear Group I'm having trouble with the clause below. I would like to select only records with a distinct TransactionDate but somehow it still lists duplicates. I need to select the TransactionDate once as smalldatetime and once as varchar as I'm populating a drop-down with Text/Value pairs. So I can't just use 'SELECT DISTINCT TransactionDate FROM...' I'm grateful for any hints.
3
10558
by: blue | last post by:
I'm trying to order a varchar column first numerically, and second alphanumerically using the following SQL: SELECT distinct doc_number FROM doc_line WHERE product_id = 'WD' AND doc_type = 'O' ORDER BY CASE WHEN IsNumeric(doc_number) = 1 THEN CONVERT(FLOAT, doc_number) ELSE 999999999 END,
2
12545
by: mfyahya | last post by:
I have two tables, both containing an 'authors' column. Is there a way to get a unique list of authors from the two tables? I tried SELECT DISTINCT `authors` from `table1`, `table2`; but I got an "Column 'authors' in field list is ambiguous" error. Is there also a query to return only the count of distinct authors from the two tables? Thanks for any help.
3
6468
by: Tcs | last post by:
My backend is DB2 on our AS/400. While I do HAVE DB2 PE for my PC, I haven't loaded it yet. I'm still using MS Access. And no, I don't believe this is an Access question. (But who knows? I COULD be wrong... :) I've tried the access group...twice...and all I get is "Access doesn't like ".", which I know, or that my query names are too long, as there's a limit to the length of the SQL statement(s). But this works when I don't try to...
9
10888
by: Kelvin | last post by:
Okay so this is baking my noodle. I want to select all the attritbutes/fields from a table but then to excluded any row in which a single attributes data has been duplicated. I.E. Here's my table:- ID Ref Name DATE 1 AAA Joe 1/2 2 BBB Ken 1/2 3 AAA Len 6/3
6
5551
by: John M | last post by:
Hi, The line below is used to feed a combobox. (It is from a database which is used to log pupil behaviour!) The 'incidents' table contains a list of students who have been involved in incidents. Some may appear several times, hence the Distinct. However, thelist generated should still be sorted by Surname. When I add Order by I'm told it conflict with the 'Distinct'. Surely it does not? SELECT DISTINCT ., . & " " & . FROM...
3
5652
by: orekinbck | last post by:
Hi There Our test database has duplicate data: COMPANYID COMPANYNAME 1 Grupple Group 2 Grupple Group 5 Grupple Group 3 Yada Inc 4 Yada Inc
6
13563
by: Bob Stearns | last post by:
I am getting unwanted duplicate rows in my result set, so I added the DISTINCT keyword to my outermost SELECT. My working query then returned the following message: DB2 SQL error: SQLCODE: -214, SQLSTATE: 42822, SQLERRMC: CASE...;ORDER BY;2 Message: An expression in the ORDER BY clause in the following position, or starting with "CASE..." in the "ORDER BY" clause is not valid. Reason code = "2".  More exceptions ... DB2 SQL error:...
5
6894
by: Daniel Wetzler | last post by:
Dear MSSQL experts, I use MSSQL 2000 and encountered a strange problem wqhile I tried to use a select into statement . If I perform the command command below I get only one dataset which has the described properties. If I use the same statement in a select into statement (see the second select) I get several datasets with the described properties like I didn't use distinct
0
8759
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
9122
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
9017
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
7872
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
0
5922
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
4687
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
3125
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
2
2453
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
3
2069
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.