By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
429,244 Members | 1,772 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 429,244 IT Pros & Developers. It's quick & easy.

Settle a disagreement with my boss re SELECT DISTINCT

P: n/a
My boss has been adamant that SELECT DISTINCT is a faster query than
SELECT all other factors being equal. I disagree. We are linking an
Access front end to a SQL Server back end and normally are only
returning one record. She states that with disctinct the query stops
as soon as it finds a matching record. Both of us are relative novices
in database theory (obviously).

Can someone help settle this?

Nov 13 '05 #1
Share this Question
Share on Google+
18 Replies


P: n/a
mathilda wrote:
My boss has been adamant that SELECT DISTINCT is a faster query than
SELECT all other factors being equal. I disagree. We are linking an
Access front end to a SQL Server back end and normally are only
returning one record. She states that with disctinct the query stops
as soon as it finds a matching record. Both of us are relative
novices in database theory (obviously).

Can someone help settle this?


It would vary from query to query, but I would expect DISTINCT to either be
slower or a tie, never faster. If your query is "unintelligent" enough to
force a table scan then it is going to do that regardless of whether you
have the DISTINCT clause or not. After all, the engine wouldn't "know" that
you are only looking for one record and it won't know if other rows are
duplicates until AFTER they have been looked at.

Perhaps if a TOP 1 clause were added?
--
I don't check the Email account attached
to this message. Send instead to...
RBrandt at Hunter dot com
Nov 13 '05 #2

P: n/a
Thanks, looks like I win. But want do you mean by "forcing a table
scan"

Nov 13 '05 #3

P: n/a
mathilda wrote:
My boss has been adamant that SELECT DISTINCT is a faster query than
SELECT all other factors being equal. I disagree. We are linking an
Access front end to a SQL Server back end and normally are only
returning one record. She states that with disctinct the query stops
as soon as it finds a matching record. Both of us are relative novices
in database theory (obviously).


This seems entirely illogical.

From BOL: "The DISTINCT keyword eliminates duplicate rows from the
results of a SELECT statement. If DISTINCT is not specified, all rows
are returned, including duplicates."

So if there is only one row in the result set then there is none to
eliminate. If there are more then they will have to be tested for
duplicity (!). In the first case any difference would be unnoticeable;
in the second case the difference might be noticeable to an atomic clock.

Of course if you are using ODBC someone else would know.

--
Lyle
--
From ADO28.chm

Deprecated Components
Each of the following components is considered obsolete. While these
components are still supported in this release of the Microsoft® Data
Access Components (MDAC), they may be removed in the future. When
writing new applications, you should avoid using these deprecated
components. When modifying existing applications, you are strongly
encouraged to remove any dependency on these components.

ODBC Provider (MSDASQL)
You are strongly encouraged to use one of the native OLE DB Providers
instead of the Microsoft Open Database Connectivity (ODBC) Provider.
Native OLE DB Providers provide better application stability and
performance. Furthermore, native OLE DB Providers will be supported in
the future, whereas MSDASQL will not have any new features added to it,
will not be available on 64-bit, and will not be accessible from the OLE
DB NET Data Provider.

.....
Nov 13 '05 #4

P: n/a
mathilda wrote:
Thanks, looks like I win. But want do you mean by "forcing a table
scan"


If your query's WHERE clause is against an indexed field then only the index
is scanned and then the matching rows retrieved. If it is against a
non-indexed field then the entire table is scanned to look for matches.

--
I don't check the Email account attached
to this message. Send instead to...
RBrandt at Hunter dot com
Nov 13 '05 #5

P: n/a

mathilda wrote:
My boss has been adamant that SELECT DISTINCT is a faster query than
SELECT all other factors being equal. I disagree. We are linking an
Access front end to a SQL Server back end and normally are only
returning one record. She states that with disctinct the query stops
as soon as it finds a matching record. Both of us are relative novices in database theory (obviously).

Can someone help settle this?


You are correct: the DISTINCT command will actually slow the query.
First, the SELECT command is performed, and then the result set is
further processed to return the DISTINCT result only. SQL Server
processes DISTINCT like an aggregate function, it doesn't just stop
when it gets to one result.

The proof in in the pudding: open Query Analyzer, tell it to show the
Execution plan and then run the query with and without DISTINCT.
Probably the Execution Time will read "0:00:00" for both queries, but
you can show your boss the extra processing step.

Nov 13 '05 #6

P: n/a
"mathilda" <sm***********@yahoo.com> wrote in message
news:11**********************@o13g2000cwo.googlegr oups.com...
My boss has been adamant that SELECT DISTINCT is a faster query than
SELECT all other factors being equal. I disagree. We are linking an
Access front end to a SQL Server back end and normally are only
returning one record.


There is some important things to be aware of here:

* since you are using sql server, then when you return ONLY one record
(either example....via a condition, or by the distinct) only ONE record is
transfer down the wire. Thus, assuming that both approaches can readably
return the ONE record in a short time, you as a end user likely will not
experience ANY difference. In other words, the LARGE time issue here is
transferring the one record...not that time of finding the one record. It is
possible that one approach might be faster then the other (say 1 / 10,000 of
a second), however, the time taken to transfer the data is a few 100's of s
second, so that 10,000 times faster to FIND and retrieve the record will NOT
be noticed by you. The MOST important issue here is that ONE record is being
returned down the wire.

* So, we have two issues:
Time to transfer the record (AFTER it has been found)
Time to find the ONE record, either via Distinct, or by some keyID
etc.

I have little doubt that using a condition to return one record is
considerably faster then telling the database to figure things out, and
NEVER return duplicates. Telling the database system to not return
duplicates
takes a lot of work. If you can return the record via:

select * from tblCustomer where custid = 123

The above is most certainly better then:

select distinct * from tblCustomer where custid = 123

However, adding the distinct keyword in the above is NOT going to improve
speed if ONLY one record exists. The time difference would be minimal..but
adding distinct would increase things by some amount...but likely not even a
amount that you can measure).

However, what about when there is going to be multiple records returned, and
we ONLY want one record?

Again:

select top 1 from tblCustomer where custid = 123

This I think again would be faster then distinct, with distinct all possible
matches
of custID = 123 have to be tested. There could be 50, or even 2000 records
with a custid = 123....and making distinct work on that can really cost.
With
a top 1, then only the first match need be returned.

However, you question was about one record, and all things being
equal. So, if only one record is to be normally retuned, and the distinct is
NOT needed, then I would leave it out. I am CERTAIN that putting
in distinct is NOT faster. The amount that distinct would be slower
by is likely not measure in this case, .but it would be a tiny tiny bit
slower.
--
Albert D. Kallal (Access MVP)
Edmonton, Alberta Canada
pl*****************@msn.com
http://www.members.shaw.ca/AlbertKallal

Nov 13 '05 #7

P: n/a
On Feb 09 2005, 10:36 am, Lyle Fairfield <ly******@yahoo.ca> wrote in
news:M5*******************@read2.cgocable.net:
So if there is only one row in the result set then there is none to
eliminate. If there are more then they will have to be tested for
duplicity (!). In the first case any difference would be unnoticeable;
in the second case the difference might be noticeable to an atomic clock.


Actually the difference may be quite noticeable in the second case,
depending on the size of data set. This is similar to the difference (!)
between UNION and UNION ALL.

Paste the following into QA, hit Ctrl+L, and compare the plans:

use pubs

select state
from dbo.authors
go

select distinct state
from dbo.authors
go

--
remove a 9 to reply by email
Nov 13 '05 #8

P: n/a
Dimitri Furman wrote:
On Feb 09 2005, 10:36 am, Lyle Fairfield <ly******@yahoo.ca> wrote in
news:M5*******************@read2.cgocable.net:

So if there is only one row in the result set then there is none to
eliminate. If there are more then they will have to be tested for
duplicity (!). In the first case any difference would be unnoticeable;
in the second case the difference might be noticeable to an atomic clock.

Actually the difference may be quite noticeable in the second case,
depending on the size of data set. This is similar to the difference (!)
between UNION and UNION ALL.

Paste the following into QA, hit Ctrl+L, and compare the plans:

use pubs

select state
from dbo.authors
go

select distinct state
from dbo.authors
go


I feel
"We are linking an
Access front end to a SQL Server back end and normally are only
returning one record."
implies a very small number of records.

--
--
Lyle
--
Nov 13 '05 #9

P: n/a
Dimitri Furman wrote:
On Feb 09 2005, 10:36 am, Lyle Fairfield <ly******@yahoo.ca> wrote in
news:M5*******************@read2.cgocable.net:

So if there is only one row in the result set then there is none to
eliminate. If there are more then they will have to be tested for
duplicity (!). In the first case any difference would be unnoticeable;
in the second case the difference might be noticeable to an atomic clock.

Actually the difference may be quite noticeable in the second case,
depending on the size of data set. This is similar to the difference (!)
between UNION and UNION ALL.

Paste the following into QA, hit Ctrl+L, and compare the plans:

use pubs

select state
from dbo.authors
go

select distinct state
from dbo.authors
go


BTW the time difference is not so noticeable here.
--
--
Lyle
--
From ADO28.chm

Deprecated Components
Each of the following components is considered obsolete. While these
components are still supported in this release of the Microsoft® Data
Access Components (MDAC), they may be removed in the future. When
writing new applications, you should avoid using these deprecated
components. When modifying existing applications, you are strongly
encouraged to remove any dependency on these components.

ODBC Provider (MSDASQL)
You are strongly encouraged to use one of the native OLE DB Providers
instead of the Microsoft Open Database Connectivity (ODBC) Provider.
Native OLE DB Providers provide better application stability and
performance. Furthermore, native OLE DB Providers will be supported in
the future, whereas MSDASQL will not have any new features added to it,
will not be available on 64-bit, and will not be accessible from the OLE
DB NET Data Provider.

Remote Data Services (RDS)
Remote Data Services (RDS) is a proprietary Microsoft mechanism for
accessing remote data across the Internet or intranet. Microsoft is now
shipping the Microsoft Simple Object Access Protocol (SOAP) Toolkit 2.0
that enables you to access remote data using an open, XML-based
standard. Given the availability of the SOAP Toolkit 2.0, you should
migrate from RDS to SOAP. The SOAP 2.0 Toolkit 2.0 also includes sample
code for remotely accessing Microsoft ActiveX® Data Objects (ADO)
Recordsets.

Jet and Replication Objects (JRO)
The Microsoft Jet OLE DB Provider and other related components were
removed from MDAC 2.6. Microsoft has deprecated the Microsoft Jet
Engine, and plans no new releases or service packs for this component.
As a result, the Jet and Replication Objects (JRO) is being deprecated
in this release and will not be available in any future MDAC releases.

.....
Nov 13 '05 #10

P: n/a
Unfortunately, I am dealing with a large table (30 fields, 40,000
records) with no indexes. Yes, no indexes. And no, I can't get them
to index anything (it's not broke, so we're not risking it). A query
without a ton of limiters can take several seconds, tens of seconds if
I have to use a like, and I can't even consider using an aggregate.
Given what I have, I've been trying to snare every second possible on
calls to the database.

Nov 13 '05 #11

P: n/a
"Albert D. Kallal" <ka****@msn.com> wrote in
news:uWwOd.344456$Xk.206929@pd7tw3no:
select top 1 from tblCustomer where custid = 123


Without an ORDER BY you'll never know what you're going to get,
unless CUSTID is the primary key. In that case, DISTINCT and TOP are
completely useless.

--
David W. Fenton http://www.bway.net/~dfenton
dfenton at bway dot net http://www.bway.net/~dfassoc
Nov 13 '05 #12

P: n/a
On 9 Feb 2005 18:15:05 -0800, "mathilda" <sm***********@yahoo.com>
wrote:
Unfortunately, I am dealing with a large table (30 fields, 40,000
records) with no indexes. Yes, no indexes. And no, I can't get them
to index anything (it's not broke, so we're not risking it).

<clip>

Wow, that's quite a position to take. <insert extrapolations into
absurdness here>
I would propose a beta test on a copy of the database. It should show
lots of payoff and no pain.
If the concern is with unique indexes, then one should realize that in
addition to performance they serve the important function of enforcing
business rules ("in our company, the OrderNumber + CustomerNumber
combination should be unique"), which can only reliably be done at the
DBMS level.

-Tom.

Nov 13 '05 #13

P: n/a
mathilda wrote:
Unfortunately, I am dealing with a large table (30 fields, 40,000
records) with no indexes. Yes, no indexes. And no, I can't get them
to index anything (it's not broke, so we're not risking it). A query
without a ton of limiters can take several seconds, tens of seconds if
I have to use a like, and I can't even consider using an aggregate.
Given what I have, I've been trying to snare every second possible on
calls to the database.


Select * vs Select distinct * is the least of your worries about speed,
if you think it's not broken...

--
This sig left intentionally blank
Nov 13 '05 #14

P: n/a
I think it is broken as all hell, but it is what I have to work with.
Problem was that they bought a prepackaged software system 4 years ago
but didn't have anyone with database knowledge when they set it up and
molded it to their business plan. Our main product is serialized, but
serial number is not indexed. In fact, I'm trying to work around
situations where two different products have the same serial number.

As a follow question on this very long thread, does having three or
four criteria in a where clause speed up the search through a pile of
records?

Nov 13 '05 #15

P: n/a
mathilda wrote:
I think it is broken as all hell, but it is what I have to work with.
Problem was that they bought a prepackaged software system 4 years ago
but didn't have anyone with database knowledge when they set it up and
molded it to their business plan. Our main product is serialized, but
serial number is not indexed. In fact, I'm trying to work around
situations where two different products have the same serial number.

As a follow question on this very long thread, does having three or
four criteria in a where clause speed up the search through a pile of
records?


It will speed up the part where the data is coming over the LAN because you
are pulling fewer rows the more you narrow the criteria. Without indexes it
will not speed up the actual processing of the query on the server. Once
you are scanning the whole table you are scanning the whole table. An index
is the ONLY thing that will prevent that.
--
I don't check the Email account attached
to this message. Send instead to...
RBrandt at Hunter dot com
Nov 13 '05 #16

P: n/a
"mathilda" <sm***********@yahoo.com> wrote in
news:11**********************@f14g2000cwb.googlegr oups.com:
I think it is broken as all hell, but it is what I have to work
with. Problem was that they bought a prepackaged software system 4
years ago but didn't have anyone with database knowledge when they
set it up and molded it to their business plan. Our main product
is serialized, but serial number is not indexed. In fact, I'm
trying to work around situations where two different products have
the same serial number.
Can't ADO do temporary indexes? If you can use an ADO driver to get
to the data, maybe you can return a dataset, create a temporary
index, then do your query on that.

I'm just speculating, as I don't use ADO myself.
As a follow question on this very long thread, does having three
or four criteria in a where clause speed up the search through a
pile of records?


The question can't be answered without considering:

1. the database engine -- different ones will optimize differently.

2. the indexes

If there are no indexes, then what's happening is a table scan for
each of the criteria. Without an index, there's no way to optimize,
since the DB engine has no information on sparseness of the index
(i.e., degree of uniqueness of the values within a field).

Without indexes, 3 or 4 criteria won't make any difference.

With indexes, most db engines will optimize in ways that get them as
quickly as possible to the smallest dataset, and then do the
non-indexed operations on that.

--
David W. Fenton http://www.bway.net/~dfenton
dfenton at bway dot net http://www.bway.net/~dfassoc
Nov 13 '05 #17

P: n/a
"mathilda" <sm***********@yahoo.com> wrote:
My boss has been adamant that SELECT DISTINCT is a faster query than
SELECT all other factors being equal. I disagree. We are linking an
Access front end to a SQL Server back end and normally are only
returning one record. She states that with disctinct the query stops
as soon as it finds a matching record. Both of us are relative novices
in database theory (obviously).

Can someone help settle this?


Trouble is you never win when you disagree with a boss.

Tony
--
Tony Toews, Microsoft Access MVP
Please respond only in the newsgroups so that others can
read the entire thread of messages.
Microsoft Access Links, Hints, Tips & Accounting Systems at
http://www.granite.ab.ca/accsmstr.htm
Nov 13 '05 #18

P: n/a
"mathilda" <sm***********@yahoo.com> wrote:
Unfortunately, I am dealing with a large table (30 fields, 40,000
records) with no indexes. Yes, no indexes. And no, I can't get them
to index anything (it's not broke, so we're not risking it).


But it is broke. Your query times are terrible.

Tony

--
Tony Toews, Microsoft Access MVP
Please respond only in the newsgroups so that others can
read the entire thread of messages.
Microsoft Access Links, Hints, Tips & Accounting Systems at
http://www.granite.ab.ca/accsmstr.htm
Nov 13 '05 #19

This discussion thread is closed

Replies have been disabled for this discussion.