473,385 Members | 1,409 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,385 software developers and data experts.

Showing dupes in Database

I am having trouble with a query for my database. It is a name & address database with columns like firstname, lastname, email, etc. I am trying to show dupes in my 20,000 name database. I have written a query that shows the dupes. For example, if there are 10 email addresses as john@john.net, my output only shows that entry once. What I would like to do is output all the dupes, so I can edit or delete them. Also, another part of this is that I would like to output all dupes that match first name and last name (double column query). That is, the output would show all dupes for John Smith and all dupes for Bill Smith, not just all the Smiths.
Here is my query that shows only one instance of each email (single column query). It does not show each instance of dupes.

Expand|Select|Wrap|Line Numbers
  1. SELECT email, COUNT(email) AS cnt
  2. FROM tbl_data
  3. GROUP BY email
  4. HAVING cnt > 1
  5. ORDER BY cnt
PHP 5, MySQL 5
Dec 3 '07 #1
10 1646
code green
1,726 Expert 1GB
That is because you are GROUPing by email.
This works the same as DISTINCT.
Can't see any reason to use GROUP BY
Dec 3 '07 #2
That is because you are GROUPing by email.
This works the same as DISTINCT.
Can't see any reason to use GROUP BY
So, what would the query look like for a double column query as described above?
Dec 4 '07 #3
code green
1,726 Expert 1GB
I am not sure what you mean by double column query.
But if you use GROUP BY or DISTINCT you will only return unique records.
Have you got IDs in the table. You should get into the habit of using INT ids even if you see no immediate use for them.
I am confused about the names.
Do you have duplicate emails under different names?
There are varying methods to remove duplicates, a sub-query is the clearest example.
if you could give me the table construction and what result you are looking for it will help
Dec 4 '07 #4
mwasif
802 Expert 512MB
Hi username88,

Welcome to TSDN!
The following query will return all the results where the combination of first and last names are john smith and bill smith.
Expand|Select|Wrap|Line Numbers
  1. SELECT * from FROM tbl_data
  2. WHERE (firstname='John' and lastname='Smith') OR (firstname='Bill' and lastname='Smith')
Dec 4 '07 #5
I have 20,000 entries and some are duplicates. That is, the firstname and lastname are the same. I also have other fields such as email, city, state, zip, company, comments, and a few others. There are dupes such that all the fields have the same information except for the comments field. I would like to be able to display a list of dupes so that all the dupes are shown. If the dupes are shown, I can then edit the fields so that all the fields are complete.

Another point (to help describe this): The entries are sometimes added by different people and some fields are not entered. That is, one will enter john doe and leave out the state. The other will enter john doe and leave out the company. If these are displayed side by side, I can add the missing information and delete the one that is not needed.

Not sure what ID's in the table means. Can you explain that and how it will help. I will try to read about ID's in the mean time.
Dec 5 '07 #6
Expand|Select|Wrap|Line Numbers
  1. SELECT *, COUNT(*)
  2. FROM tbl_data
  3. GROUP BY lastname, firstname
  4. HAVING count(*) > 1
This is what I have now. If my database has 4 rows, each with a name entered that is John Doe, the above displays only ONE row with John Doe. What I want is to display ALL FOUR rows of John Doe ( and the other columns for each of the four rows). Any ideas what needs to change?
Dec 6 '07 #7
mwasif
802 Expert 512MB
Kindly use CODE tags when posting code.

If you want to show all the records, then remove GROUP BY and make simple select statement (if I am understanding correctly)
Expand|Select|Wrap|Line Numbers
  1. SELECT * FROM tbl_data
Dec 6 '07 #8
code green
1,726 Expert 1GB
As mwasif has kindly confirmed, GROUP BY is causing your problem.
Not sure what ID's in the table means. Can you explain that and how it will help. I will try to read about ID's in the mean time.
By ID I mean a numeric ID field,
The simplest way to implement this is using an extra column of type INT and the auto-increment feature.
This would help you because no matter how many duplicates you had, each record would still be unique because of the ID.
But I think what you are hoping to do is SELECT every occurrence of records with duplicated data.
You were on the right lines. Try this
Expand|Select|Wrap|Line Numbers
  1. SELECT * FROM tablename WHERE name IN
  2. (SELECT name FROM tablename 
  3. GROUP BY name 
  4. HAVING COUNT(*) > 1) 
  5.  
Dec 6 '07 #9
If by ID you mean the primary key, I do have a column called "recordid". Here is what I tried:
Expand|Select|Wrap|Line Numbers
  1. SELECT *
  2. FROM tbl_data
  3. WHERE lastname IN (SELECT lastname FROM tbl_data
  4. GROUP BY lastname
  5. HAVING COUNT(*) > 1)
  6.  
This times out the server after about 90 seconds. I do have about 20,000 records, and I think about 3,000 of them can be trimmed out because they are dupes. The above code does not use the primary key, so should I try to use it? If so, how?
Dec 9 '07 #10
code green
1,726 Expert 1GB
This times out the server after about 90 seconds
This type of query is a sub-query which are generally slow.
Try a JOIN with a temporary table.
First create a temp table
Expand|Select|Wrap|Line Numbers
  1. CREATE TABLE `temp` (SELECT * FROM `tbl_data` GROUP BY 'name`)
This is not strictly a temporary table. A table wil be created of unique names (or surnames or emails whichever you are looking for)
Then try this query
Expand|Select|Wrap|Line Numbers
  1. SELECT * 
  2. FROM `tbl_data`
  3. LEFT JOIN `temp` ON `temp`.`name` = `tbl_data`.`name` 
  4. WHERE `temp`.`prodid` IS NOT NULL 
I think will pull out all duplicated names only
Dec 10 '07 #11

Sign in to post your reply or Sign up for a free account.

Similar topics

1
by: Amber | last post by:
The DataGrid allows you to make columns visible or invisible on demand - even edit and other special columns. This article will show you how it is done. Some developers have reported problems...
12
by: google_groups3 | last post by:
Hi all. I currently have 2 text files which contain lists of file names. These text files are updated by my code. What I want to do is be able to merge these text files discarding the...
3
by: Georges Heinesch | last post by:
Hi. This issue semms trivial, but I didn't get it working so far. I have a database, which contains dupes. I'd like to create a query, which shows all dupes (not only one record, but all...
1
by: gdarian216 | last post by:
I am trying to write a program that will input numbers in a array and then loop throught the array and look for dupes. The final output of the array should have the numbers in the array printed only...
1
by: gdarian216 | last post by:
I am tring to get rid of dupes and his code is taking the first input and repeating it. I don't know why. this is what i have so far can anyone help #include <iostream> using namespace std; ...
2
by: mmr315 | last post by:
this is the code i had written in php to connect to mysql ======================================== <? php class Database { var $database_name; var $database_user; var...
7
by: Jan | last post by:
Hi: When I searched the newsgroup for this problem, I saw two or three instances of the question being asked, but it was never answered. Not too promising, but here goes: I have a form with...
6
by: painkiller | last post by:
language: vb.net environment: windows forms .net : v1.1 i am having a checkedlistbox control that display document category such as text, image, video, audio etc. these values are coming from...
3
by: reelrave | last post by:
Hello All, I've checked around a little on Google (and this group) and have found info similar to what I'm trying to do (i.e., changes that were made in the BE's tables are not reflected in the...
1
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
0
by: Faith0G | last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome former...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: aa123db | last post by:
Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.