473,406 Members | 2,710 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,406 software developers and data experts.

Is there a more efficient way to get unique results than by using DISTINCT?

I am successfully calling around half a million records, but I have a problem with duplicate records. Without using the DISTINCT function I can call these records in half a second, but when using the DISTINCT function it takes about five minutes for Access to weed out the duplicate records. Is there a more efficient way to get rid of these duplicate records?

(I am most familiar with SQL and only moderately familiar with Access, so I am working in the SQL View. If your solution is better implemented in the Design View I can work with that fine also.)

Let me know if you need more information.
Thank you!
Feb 11 '11 #1
8 2169
code green
1,726 Expert 1GB
I have found GROUP BY to be quicker than DISTINCT.
Not much, but I have never tried with that many records.

Post the SQL and maybe somebody can suggest a more efficient query.
Feb 11 '11 #2
TheSmileyCoder
2,322 Expert Mod 2GB
Is there any possibility of removing the duplicate records? Another posibility is if you can use a smaller query (i.e. on fewer fields) to determine uniqueness and then join that query with your main query.
Feb 11 '11 #3
NeoPa
32,556 Expert Mod 16PB
Interesting question Jesse. I would have expected a DISTINCT predicate to be implemented in such a way as to be more efficient than a GROUP BY personally, but I guess there may be complications with optimisation of the basic concept when many fields are involved and particularly when non-indexed fields are involved. This is not speaking from testing, but just the idea that there is less work to do and less checking involved logically. Each implementation would have its own way of doing things though, hence I expect you're finding DISTINCT slower than GROUP BY in your case.

I'd be interested to know from you what determines your idea of duplicate records? This is typically determined by comparing the PKs, but it may be that you are comparing each field (the full record essentially), which of course, is orders of magnitude less efficient.

Smiley's idea is a good one going forward, but I'd make sure, if at all possible, that the compared field(s) are all included in a single index. Determining how practical this will be is for you, who knows more about your use of the data, to do. This is one of the areas where db design experience comes in.
Feb 11 '11 #4
Here is my code:

Expand|Select|Wrap|Line Numbers
  1. SELECT [Call Log].[Client ID], DCount("[Call ID]","[Call Log]","[Call Code]<>'NOM' AND [Client ID]=" & [Clients].[Client ID]) AS CountactsCount, [Call Log].[Call Code], Clients.[First Name], Clients.[Last Name], Clients.Sfx, Clients.[Mailing Address], Clients.City, Clients.ST, Clients.Zip, Clients.Phone, Clients.Type, Clients.[Time Zone], Clients.Gender, DCount("Phone","Clients") AS countTotal
  2. FROM Clients LEFT JOIN [Call Log] ON Clients.[Client ID] = [Call Log].[Client ID]
  3. WHERE (((DCount("[Call ID]","[Call Log]","[Call Code]<>'NOM' AND [Client ID]=" & [Clients].[Client ID]))=0));
  4.  
Here's the background: I have a table of clients and a table that serves as a call log. I want to be able to find all of the people who have not given a reply to our inquiry. We do this by using the dcount shown above to count how many times in the call log people have given us a response other than the no response code. (So, how many times they have given us a response. If it is >0, we don't need to call them again.)

This works. The problem is that we get a record back for every call to that client. I only want the client to show once. DISTINCT works, but it takes quite a long time. Any thoughts would help.

Thanks.
Feb 13 '11 #5
NeoPa
32,556 Expert Mod 16PB
Jesse, if you're going to use a Domain Aggregate function within a query then you can't really expect anything in the way of performance. I'm only surprised it waited for you to add the DISTINCT predicate before slowing down. If you have a record source that needs to be included in the logic of your query then a JOIN of some form is a much more logical approach to use.

Unfortunately, most RDBMSs run so fast that even SQL like this example runs pretty quickly unless large numbers of records are involved. People often don't even consider what their design structure until they get to that stage, by which time it is too late really. It can be fixed. I'm not saying it's beyond repair, but that sort of approach does tend to waste a great deal of your time in the long run if you want to design a database that can handle the sort of numbers you're talking about.
Feb 13 '11 #6
Okay, thanks for the thought. I'm sorry, but I'm pretty unschooled in this. This database was my first and I am learning as I go. Unfortunately (from a technical side), the demand for this information has grown at a degree that was incomprehensible a few months ago. Am I reading that right to understand that I'm pretty much in big trouble? The size of this is only going to grow. I am in the process of converting to SQLServer, but we won't get there for a few weeks at least. I need filter running in the next day or two.

Any suggestions?
Feb 13 '11 #7
NeoPa
32,556 Expert Mod 16PB
Jesse Jones:
Am I reading that right to understand that I'm pretty much in big trouble?
I would say yes.

SQL Server does give you more options certainly, but I would stick with my earlier advice of designing a query to process all the inputs required. Domain Aggregate functions are not things that work very efficiently within SQL.
Feb 13 '11 #8
Rabbit
12,516 Expert Mod 8TB
For the WHERE clause I would try a subquery while for the count in the SELECT clause, I would try a cross join count if it's feasible. If not, I would use a subquery. I would also make sure all the appropriate indexes have been created.
Feb 13 '11 #9

Sign in to post your reply or Sign up for a free account.

Similar topics

3
by: mfyahya | last post by:
I have a table `books` with booktitle and authorname columns. In my applicatoin booktitle has unique values but authorname doesn't. ie an author can have many books but not the other way around. ...
2
by: MasterChief | last post by:
I have a Drop Down box that is being populated using the SQL Distinct command. How do I get it so it doesn't show Null values? Here is my SQL statement sSQL = "SELECT DISTINCT Title FROM...
4
by: Johnson, Shaunn | last post by:
Howdy: Can someone tell what the difference (and why you would use it) is between the following: select distinct on (col_1, col_2), col_1, col_2, col_3
9
by: LarryR | last post by:
The following XSLT works fine using MSXML 4.0 (e.g I receive a result in about 20 seconds), but effectively hangs in both .NET 1.0 sp2 with the XML hot fix and NET 1.1. My source XML file is...
1
by: Patrick.O.Ige | last post by:
I'm trying to do distinct below in a datatable but i get the error :- Syntax error: Missing operand after 'Code' operator. Any ideas? string strExpr = "DISTINCT Code LIKE '%ve%'" ; DataRow...
5
by: mikevde | last post by:
Hi, Can anyone confirm whether it is possible to: a) use subqueries in mySQL b) use DISTINCT TOP in mySQL? I am trying to find a way to return the top 5 results in my table. For simplicity...
8
by: bnashenas1984 | last post by:
I'm using DISTINCT to get one of each title out of my table. Here is my table: +---------------------------+ | title | weather | +---------------------------+ | Monday | ...
3
by: bnashenas1984 | last post by:
Hi everyone You see an example table below.. I'm trying to get one of each (day) using DISTINCT command... +---------------------------+ | day | weather |...
4
by: omnittha | last post by:
I would like to pull unique records of a SINGLE column but not the other related columns. When I use DISTINCT, it pulls up all the records which are unique across the columns. Here is the code.......
6
by: rahulrhegde | last post by:
Generate 10 digit unique number using VBA ? I used the Unix -Time formula Now the problem with this is 2 person from different computer can generate number at same sec then this will fail...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.