r***@citidc.com
2007-03-19 03:49:23 UTC
I'm trying to setup a way to find possible duplicate or near-duplicate
records of contact people. I have a table of 50k records doing an
inner join with itself, something like this:
SELECT p1.fields, p2.fields
FROM table AS p1 INNER JOIN table AS p2 ON p1.lastname = p2.lastname
WHERE p1.id <> p2.id AND
(other criteria)
The id and name and other criteria fields are indexed as well as they
can be. It works fine as long as I add a LIMIT statement to keep it
down to just 10 or 20 records, but if I take this off, it runs on
forever -- so I can't even tell how many potentail records there will
be. If the table grows to 100k or 200k, it'll only get worse.
So my question is -- are there any better ways to go about this?
records of contact people. I have a table of 50k records doing an
inner join with itself, something like this:
SELECT p1.fields, p2.fields
FROM table AS p1 INNER JOIN table AS p2 ON p1.lastname = p2.lastname
WHERE p1.id <> p2.id AND
(other criteria)
The id and name and other criteria fields are indexed as well as they
can be. It works fine as long as I add a LIMIT statement to keep it
down to just 10 or 20 records, but if I take this off, it runs on
forever -- so I can't even tell how many potentail records there will
be. If the table grows to 100k or 200k, it'll only get worse.
So my question is -- are there any better ways to go about this?