467,917 Members | 1,346 Online
Bytes | Developer Community
New Post

Home Posts Topics Members FAQ

Post your question to a community of 467,917 developers. It's quick & easy.

It's Me again!

Okay,
I have a table with a LONGTEXT field. Some records have 1000+ characters in the suspect field. Since it's a LONGTEXT field, i can't figure out how to set a no-dupe index on it.

I get duplicate records where the first 100 characters are the same. If the first 100 are the same, the entire record is the same. Guaranteed. I have a field that includes the first 50 chars, but sometimes they are the same. (fld 1 "James d. Smith" fld 2= "James d. Smith has deployed" or "James d. Smith has been promoted"

I would like to eliminate existing duplicates. My thought is to create a TEXT field with 100 length, use a query to copy the first 100 chars over to it for each record. THEN, copy the structure, make that field UNIQUE then copy the records over, thus eliminating the duplicates.

Anyone have any better ideas? anyone have any code that does this?
Sep 15 '08 #1
  • viewed: 888
Share:
1 Reply
iburyak
Expert 512MB
Here is my little test to simulate what you have and remove duplicates. I worked with first 14 characters just to make it small.


Expand|Select|Wrap|Line Numbers
  1. 1. Create a table
  2. create table #a (name varchar(100))
  3. 2. Insert some records
  4. insert into #a values ('James d. Smith has deployed')
  5. insert into #a values ('James d. Smith has deployed')
  6. insert into #a values ('b James d. Smith')
  7. insert into #a values ('a James d. Smith has deployed')
  8. insert into #a values ('James d. Smith has deployed')
  9. insert into #a values ('James d. Smith has deployed')
  10. insert into #a values ('b James d. Smith')
  11. insert into #a values ('a James d. Smith has deployed')
  12.  
  13. 3. Run code: 
  14. set rowcount 1
  15.  
  16. select 1
  17. While @@Rowcount > 0
  18. BEGIN
  19. delete from #a where substring(name,1,14) = (select top 1 substring(name,1,14)
  20.                          from #a
  21.                          group by substring(name,1,14)
  22.                          having count(*) > 1)
  23. END
  24. set rowcount 0 
  25.  
  26. 4. Check result
  27. select * from #a order by 1 
Your result should be:

a James d. Smith has deployed
b James d. Smith
James d. Smith has deployed

Then you can add a column to the table with first 100 characters to be able to have unique records.

Good Luck.
Sep 16 '08 #2

Post your reply

Sign in to post your reply or Sign up for a free account.

Similar topics

18 posts views Thread by Phill Long | last post: by
1 post views Thread by Elaine | last post: by
16 posts views Thread by junky_fellow | last post: by
1 post views Thread by ]-[aTc]-[ | last post: by
4 posts views Thread by ad | last post: by
By using this site, you agree to our Privacy Policy and Terms of Use.