By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
435,491 Members | 3,273 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 435,491 IT Pros & Developers. It's quick & easy.

De-Duplicate SQL Script?

P: n/a
I just put together a PHP mailing list sign-up page that posts to a
mySQL DB. The PHP script prevents INSERTs when the email address is
already located in the database.

Problem: I need to import some flat-files that stored the signups,
prior to this new form. Email addresses weren't checked, so there are
a lot of records that have the same email address.

Once I import these into the table, how would I go about putting
together a SQL statement against that table that would locate
duplicate records (ie. records that share the email address field) and
delete all subsequent ones, but not the first record with that email
address?

Any help would be appreciated.
Jul 20 '05 #1
Share this Question
Share on Google+
1 Reply


P: n/a
JStrummer wrote:
I just put together a PHP mailing list sign-up page that posts to a
mySQL DB. The PHP script prevents INSERTs when the email address is
already located in the database.

Problem: I need to import some flat-files that stored the signups,
prior to this new form. Email addresses weren't checked, so there are
a lot of records that have the same email address.

Once I import these into the table, how would I go about putting
together a SQL statement against that table that would locate
duplicate records (ie. records that share the email address field) and
delete all subsequent ones, but not the first record with that email
address?


One solution would be to update rather than insert, when a duplicate
exists. There seem to be two ways to do this with MySQL (using MySQL
extensions to standard SQL):

REPLACE, which works like INSERT but performs an UPDATE instead if the
data you are entering matches an existing record by primary key.
See http://dev.mysql.com/doc/mysql/en/REPLACE.html

INSERT... ON DUPLICATE KEY UPDATE
See http://dev.mysql.com/doc/mysql/en/INSERT.html

If this doesn't work in your case (for instance, if the email field
isn't the primary key), here's another solution: create a second table
with similar structure to the old email addresses table. Import your
flat file to this new table. Then do an outer join between the two
tables to find the newly imported addresses that _aren't_ in the old table:
select n.email
from new_emails n left outer join old_emails o on n.email = o.email
where o.email is null

The result is the set that you need to add to the table. You can insert
directly from a select query:
insert into old_emails select email ...

Regards,
Bill K.
Jul 20 '05 #2

This discussion thread is closed

Replies have been disabled for this discussion.