OK, say you're trying to find all the non-exact duplicates in a table, and especially
do it with a single query, so that the check can be part of a user interface.
Then naturally you're going to have a slow query, because it has to compare
every row with every other row.
You ought to be able to cut out exactly half of the records in the second table,
by just saying:
SELECT * FROM TABLE
LEFT JOIN users dups ON users.created<dups.created
But I can't get MySQL to use the index I have on 'created' (same results for
integer-primary key, & a string field).
My best guess why not is that (according to the manual) "MySQL will not use
an index when ... the use of the index would require MySQL to access more than
30% of the rows in the table" - is this the reason?
Another option: say that I thought that my duplicates will always have the same
letter in their surname, then I could make an index on the first letter of the
surname, and make the join just on *that*. This should make the number of rows
needed up to 26 times less, right?
But exactly how is this done? MySQL won't use my index on the following query:
SELECT * FROM TABLE
LEFT JOIN users dups
ON SUBSTRING(users.name,0,1)=SUBSTRING(dups.name,0,1)
Thank you all in advance.
Tom Cunningham.
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/my***********...ie.nctu.edu.tw