473,321 Members | 1,669 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,321 software developers and data experts.

Deleting duplicates of rows

196 100+
Hi im new to msql, anyway i have made up a table which contains a bunch of results from a search and what i want to do is to delete any duplicate rows in my database.

I not actually sure how to go about achieve this, i thought perphaps performing a search combined with some check method and a delete statement.

Is there perphaps a msql function that can achieve this??

Also note that when i say duplicate copies, it does not necesarilly mean that all the colums two rows while be the same but will have a enough in common to be able to tell if duplicate.

Any help with this or a reference would be great, thanks
Jul 14 '08 #1
15 1760
coolsti
310 100+
Basically it sounds like you have the wrong table structure if you have duplicate rows that you wish to remove. Either your program that puts data into the database, or the database itself, through its structure, should filter out the rows that you consider to be duplicate before they even get put into the database tables.

Take a look at the mysql documentation (e.g. online) about the "INSERT INTO ..... ON DUPLICATE KEY UPDATE" syntax. This will do what you want.

For example, this query statement:

INSERT INTO mytable (id,xpos,ypos,string1,string2,string3)
values (25, 2, 5, 'hello','world','from me')
ON DUPLICATE KEY UPDATE string1 = VALUES(string1);

This is only an example, and surely does not do what you need. But take a look what it does. In this case, let us say that I have set up my table "mytable" to have a unique key composed of the three columns id, xpos and ypos.

Let us say I try to insert into the database a set of values for which the combination of id, xpos and ypos are not yet in the database. In this case, the insert part of the query will be carried out, and the row will be inserted into the database.

Now let us say I try to insert a set of values for which the combination of id, xpos and ypos already are present in the table. Because of the "ON DUPLICATE KEY" clause, mysql will not return an error saying I am trying to insert a duplicate key, but instead will carry out the statement after the ON DUPLICATE KEY keywords. In my case above, I ask to update the value stored in the table for string1 with the value in my insert statement, but to leave the current values of string2 and string3 alone. You can update any or none of the other values, as you need.

What I am trying to say here is, instead of trying to find a query to remove the rows you consider to be duplicate, you should instead change your queries and your database structure (by adding unique keys) so that these duplicate rows do not even get into the database.
Jul 14 '08 #2
chazzy69
196 100+
First thanks for replying, secondly this sounds like what i probably need to do, also i do have a few unique colums approximatley about five of them if needed to differenatate between rows and another six or so rows that contain arbitry data that only relates to the other colums but can vary in the case of a duplicate of of a row.

i.e. The first five colums can be used to find a duplicate but the rest does vary.
meaning that i have check for duplicate in the first five colums, but i need to enter data also in the rest that may not be the same. You may ask why are there differences between duplicates? its because im retrieve the same information from different sources so that the sources are identical to each other.

Next i kinda understand the code example your using, correct me if i wrong but it seems that a specific key/colum name is been update when there is a duplicate rather than actually not adding the row at all.

What im asking is it possible to not update a certain key/colum but rather just skip over the row.

Note im using a php script to input data into the database.

Thanks for help,
Jul 14 '08 #3
chazzy69
196 100+
I think i just realised that the duplicate key means that it will skip putting in the next entry and just update a certain colum or not.

This will work fine but can it determine a duplicate key with only five of 10 colums been identical?

Thanks for help
Jul 14 '08 #4
coolsti
310 100+
You need to determine which columns in your table need to be identical in order for it to be considered by you as being a duplicate row.

If your table has 10 columns, and you say that 5 of them need to be unique, then you need to create a unique key for that table which consists of these 5 columns, and not the others.

Here an example:

CREATE TABLE `mytable` (
`id` smallint(5) unsigned NOT NULL auto_increment,
`firstname` varchar(12) NOT NULL,
`lastname` varchar(12) NOT NULL,
`zipcode` smallint(3) unsigned NOT NULL default '0',
`age` smallint(3) unsigned default NULL,
`haircolor` varchar(20) NOT NULL default '',
PRIMARY KEY (`id`),
UNIQUE KEY (`firstname`,`lastname`,`zipcode`) )

In the above case I declare that the columns firstname, lastname and zipcode have to be unique. But not the age or haircolor. So if I tried to add a row which had the same combination of firstname, lastname and zipcode of another row, I would not be allowed to. But the database does not care about what is in the other two columns, age and haircolor. Only the other three columns must be unique.

And yes, if you use the "insert ... on duplicate key .... " command, it will insert the row if a duplicate row would not occur, but instead would perform what you ask it to do after the "on duplicate key" clause otherwise. So you can decide yourself which of the columns will be updated or left alone in the event that an attempt is made to create a duplicate key.
Jul 15 '08 #5
chazzy69
196 100+
Thanks this helps heaps, so i only need to specify something as a unique key just once and then the duplicate will only look at the colums specified.


Thanks agains, this really helps
Jul 15 '08 #6
chazzy69
196 100+
This seems to work quite well except when one of the unique keys is different it still recognises that it is a duplicate.

Is it possible to it to recognise a duplicate only when say all five unique keys are idenitical.


Thanks agian for all the help
Jul 15 '08 #7
coolsti
310 100+
This should not happen if you have it configured correctly.

Try to post your table structure here, and the query you are using to insert.

If you have access to it, try to do a "SHOW CREATE TABLE tablename"
and post that. It will tell the indices, keys, etc. that you have set up.

You may have some other key as well, because if you have created a unique key composed of 5 columns, all 5 columns would need to be the same for the insert to be disallowed. So if only a subset of those 5 columns is the same, then you may have some other key.

You also can try to do the query without the "on duplicate key" part, and have the mysql error printed out and look at that. If it is not inserting a row because it thinks the row is a duplicate, the error should give you an indication of the criteria that mysql used for this decision.
Jul 15 '08 #8
chazzy69
196 100+
Heres the query im using in php

Expand|Select|Wrap|Line Numbers
  1. $con = mysql_connect("127.0.0.1","a_table","password");
  2. if (!$con)
  3.           {
  4.               die('Could not connect: ' . mysql_error());
  5.           }
  6.  
  7.         else {
  8.  
  9.             mysql_select_db("realestate_aus", $con);
  10.  
  11.  
  12.             $sql = "INSERT INTO search_results (prices, beds, baths, car, logo, houseimage, propadd, blur,                     moreinfo, state, pcode, suburb) 
  13.             VALUES 
  14.             ('12000', 
  15.             '2', 
  16.             '2', 
  17.             '4', 
  18.             'logos', 
  19.             'image', 
  20.             'address', 
  21.             'blur4', 
  22.             'info', 
  23.             'nsw', 
  24.             '1100', 
  25.             'liverpool' )
  26.             ON DUPLICATE KEY UPDATE blur = VALUES(blur)";
  27.  
  28.         if (!mysql_query($sql,$con))
  29.           {
  30.               die('Error: ' . mysql_error());
  31.           }
  32.         echo "Your Form Successfully submitted" . '<br />';
  33.  
  34.         mysql_close($con);
  35.  
  36.  
  37. }
-------------------

and table is like this but can't draw a actual table sorry:

Prices| Beds| Baths| Car| Logo| Houseimage| Propadd| Blur| Moreinfo| Index| State| Pcode| Suburb|

thats the best way i can desiplay;

The colums that are currently unique are prices, beds, baths, car, state, pcode, suburb.

[EDIT]

I got that show table thing to work heres the results, looks a mess though sorry,

CREATE TABLE `search_results` (\n `prices` varchar(30) NOT NULL,\n `beds` int(10) NOT NULL,\n `baths` int(10) NOT NULL,\n `car` int(10) NOT NULL,\n `logo` varchar(200) NOT NULL,\n `houseimage` varchar(200) NOT NULL,\n `propadd` varchar(50) NOT NULL,\n `blur` varchar(400) NOT NULL,\n `moreinfo` varchar(50) NOT NULL,\n `Index` int(11) NOT NULL auto_increment,\n `state` varchar(10) NOT NULL,\n `pcode` int(15) NOT NULL,\n `suburb` varchar(30) NOT NULL,\n UNIQUE KEY `prices` (`prices`),\n UNIQUE KEY `beds` (`beds`),\n UNIQUE KEY `baths` (`baths`),\n UNIQUE KEY `car` (`car`),\n UNIQUE KEY `state` (`state`),\n UNIQUE KEY `pcode` (`pcode`),\n UNIQUE KEY `suburb` (`suburb`),\n KEY `Index` (`Index`)\n) ENGINE=MyISAM AUTO_INCREMENT=482 DEFAULT CHARSET=utf8
Jul 15 '08 #9
amitpatel66
2,367 Expert 2GB
Heres the query im using in php

Expand|Select|Wrap|Line Numbers
  1. $con = mysql_connect("127.0.0.1","a_table","password");
  2. if (!$con)
  3.           {
  4.               die('Could not connect: ' . mysql_error());
  5.           }
  6.  
  7.         else {
  8.  
  9.             mysql_select_db("realestate_aus", $con);
  10.  
  11.  
  12.             $sql = "INSERT INTO search_results (prices, beds, baths, car, logo, houseimage, propadd, blur,                     moreinfo, state, pcode, suburb) 
  13.             VALUES 
  14.             ('12000', 
  15.             '2', 
  16.             '2', 
  17.             '4', 
  18.             'logos', 
  19.             'image', 
  20.             'address', 
  21.             'blur4', 
  22.             'info', 
  23.             'nsw', 
  24.             '1100', 
  25.             'liverpool' )
  26.             ON DUPLICATE KEY UPDATE blur = VALUES(blur)";
  27.  
  28.         if (!mysql_query($sql,$con))
  29.           {
  30.               die('Error: ' . mysql_error());
  31.           }
  32.         echo "Your Form Successfully submitted" . '<br />';
  33.  
  34.         mysql_close($con);
  35.  
  36.  
  37. }
-------------------

and table is like this but can't draw a actual table sorry:

Prices| Beds| Baths| Car| Logo| Houseimage| Propadd| Blur| Moreinfo| Index| State| Pcode| Suburb|

thats the best way i can desiplay;

The colums that are currently unique are prices, beds, baths, car, state, pcode, suburb.

[EDIT]

I got that show table thing to work heres the results, looks a mess though sorry,

CREATE TABLE `search_results` (\n `prices` varchar(30) NOT NULL,\n `beds` int(10) NOT NULL,\n `baths` int(10) NOT NULL,\n `car` int(10) NOT NULL,\n `logo` varchar(200) NOT NULL,\n `houseimage` varchar(200) NOT NULL,\n `propadd` varchar(50) NOT NULL,\n `blur` varchar(400) NOT NULL,\n `moreinfo` varchar(50) NOT NULL,\n `Index` int(11) NOT NULL auto_increment,\n `state` varchar(10) NOT NULL,\n `pcode` int(15) NOT NULL,\n `suburb` varchar(30) NOT NULL,\n UNIQUE KEY `prices` (`prices`),\n UNIQUE KEY `beds` (`beds`),\n UNIQUE KEY `baths` (`baths`),\n UNIQUE KEY `car` (`car`),\n UNIQUE KEY `state` (`state`),\n UNIQUE KEY `pcode` (`pcode`),\n UNIQUE KEY `suburb` (`suburb`),\n KEY `Index` (`Index`)\n) ENGINE=MyISAM AUTO_INCREMENT=482 DEFAULT CHARSET=utf8
Please use [code] tags for any source code that you post in this forum. That would make toe source code look much clear and readable for other users.
Jul 15 '08 #10
coolsti
310 100+
Your dump of the "show create table" shows your error clearly.

Examine again the way you need to define a unique key composed of more than one column. You have done this incorrectly.

You have defined many unique keys each composed of one table column. Instead, you want to have one unique key composed of all columns:
...
UNIQUE KEY `prices` (`prices`,`beds`,`baths`,`car`,`state`,`pcode`,`su burb`),
...

Look to the documentation (e.g. online) on how to add keys to your database tables.

Hint: in the above, the unique key composed of 7 columns has been given the name "prices" but this is not important what it is called.
Jul 15 '08 #11
chazzy69
196 100+
Ok i check the manual and i found a reference for making a single key for more than one colum e.g. -

Expand|Select|Wrap|Line Numbers
  1. UNIQUE KEY (col1, col2)
So then i took your example from above removing all the quotation marks and im getting sql syntax, i also tried it with the quotations; i tried this directly in mysql database rather than a php file just to make sure it was right.

So im little confused as to what i supposed to do.

Note i now understand that the unique key can be for more than one colum.

Thanks for the help
Jul 15 '08 #12
chazzy69
196 100+
I did a little more research and is it possible that you have to make the unique key on the creation of the table or is possible to edit the table.

Thanks again
Jul 15 '08 #13
chazzy69
196 100+
First sorry for the triple posting;

I think i finally figured out how to implement the unique key - used this statement does it look right???


Expand|Select|Wrap|Line Numbers
  1. ALTER TABLE search_results ADD UNIQUE KEY mykey(prices,beds,baths,car,state,pcode,suburb)
[EDIT]

OK i tested ok the my inserting code and it seems to be working,

thank you very much for you help
Jul 15 '08 #14
coolsti
310 100+
Exactly, you found the right syntax for adding a unique key consisting of multiple columns.

Good luck! Glad I could help. I also have learned all of this rather recently, mostly by surfing through the online mysql documentation.

By the way, those were not quotation marks in the create table statements, they were these funny little slanted accent marks. They are not necessary if your column name does not have any spaces in it. Single quotation marks will not work here, as Mysql will interpret this then as a string variable and not a column name.
Jul 15 '08 #15
r035198x
13,262 8TB
Here is a thread that might help about the deleting duplicates part.
Jul 15 '08 #16

Sign in to post your reply or Sign up for a free account.

Similar topics

3
by: ScottH | last post by:
I was looking for thw SQL to delete dupes from a table, and came across this. All who saw it agreed in principle, but I can't quite figure out the logic. If we are deleting all rows whose rowid...
2
by: Zak McGregor | last post by:
Hi all I have a table, for simplicity's sake containing one field, called unid. for example, select unid, oid from table gives me something like this: unid | oid ---------+---------
3
by: AK | last post by:
Hi Our product uses MS-SQL Server 2000. One of our customer has 10 installations with each installation stroring data in its own database. Now the customer wants to consolidate these databases...
4
by: Newbie | last post by:
Can someone help: Is there a way with ado.net and it's disconnected dataset to delete a record in a table if you can't specify a unique field in an SQL WHERE clause. The current Dbase .dbf...
3
by: skennd | last post by:
Here's my problem in exact replication: I have used the find duplicate query in Access, and the query determined the following duplicate values by the following query: In (SELECT FROM As...
3
by: Maury | last post by:
Hello, I have a stored procedure that deletes duplicates in one table: ..... AS BEGIN DELETE FROM mytable WHERE Id IN( SELECT Max(id)
6
by: write2ashokkumar | last post by:
hi... i have the table values with duplicate records, i want to delete the duplicate records, How to delete the duplicate records? I need the query to delete the duplicate rows Example: ...
40
by: kylie991 | last post by:
Hi I am stuck on checking if a 2D array has duplicates. I have to write a function to check if a column has duplicates and then another function to check if the rows have duplicates from a file. ...
1
by: tskmjk55 | last post by:
Recently, I have a requirement to develop a vb.net application wherein the input excel sheet data which has an average of 5000 records should be checked for Internal duplicates (duplicates within the...
0
by: DolphinDB | last post by:
Tired of spending countless mintues downsampling your data? Look no further! In this article, you’ll learn how to efficiently downsample 6.48 billion high-frequency records to 61 million...
0
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
1
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
0
by: jfyes | last post by:
As a hardware engineer, after seeing that CEIWEI recently released a new tool for Modbus RTU Over TCP/UDP filtering and monitoring, I actively went to its official website to take a look. It turned...
0
by: ArrayDB | last post by:
The error message I've encountered is; ERROR:root:Error generating model response: exception: access violation writing 0x0000000000005140, which seems to be indicative of an access violation...
1
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
1
by: Defcon1945 | last post by:
I'm trying to learn Python using Pycharm but import shutil doesn't work
0
by: af34tf | last post by:
Hi Guys, I have a domain whose name is BytesLimited.com, and I want to sell it. Does anyone know about platforms that allow me to list my domain in auction for free. Thank you
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome former...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.