Help | Site Map
Connecting Tech Pros Worldwide
Reply
 
LinkBack Thread Tools
  #1  
Old September 15th, 2008, 11:16 PM
Newbie
 
Join Date: Dec 2007
Posts: 17
Default It's Me again!

Okay,
I have a table with a LONGTEXT field. Some records have 1000+ characters in the suspect field. Since it's a LONGTEXT field, i can't figure out how to set a no-dupe index on it.

I get duplicate records where the first 100 characters are the same. If the first 100 are the same, the entire record is the same. Guaranteed. I have a field that includes the first 50 chars, but sometimes they are the same. (fld 1 "James d. Smith" fld 2= "James d. Smith has deployed" or "James d. Smith has been promoted"

I would like to eliminate existing duplicates. My thought is to create a TEXT field with 100 length, use a query to copy the first 100 chars over to it for each record. THEN, copy the structure, make that field UNIQUE then copy the records over, thus eliminating the duplicates.

Anyone have any better ideas? anyone have any code that does this?
Reply
  #2  
Old September 16th, 2008, 06:42 PM
iburyak's Avatar
Expert
 
Join Date: Nov 2006
Posts: 1,013
Default

Here is my little test to simulate what you have and remove duplicates. I worked with first 14 characters just to make it small.


Code:
1. Create a table
create table #a (name varchar(100))
2. Insert some records
insert into #a values ('James d. Smith has deployed')
insert into #a values ('James d. Smith has deployed')
insert into #a values ('b James d. Smith')
insert into #a values ('a James d. Smith has deployed')
insert into #a values ('James d. Smith has deployed')
insert into #a values ('James d. Smith has deployed')
insert into #a values ('b James d. Smith')
insert into #a values ('a James d. Smith has deployed')

3. Run code: 
set rowcount 1

select 1
While @@Rowcount > 0
BEGIN
delete from #a where substring(name,1,14) = (select top 1 substring(name,1,14)
					     from #a
					     group by substring(name,1,14)
					     having count(*) > 1)
END
set rowcount 0 

4. Check result
select * from #a order by 1
Your result should be:

a James d. Smith has deployed
b James d. Smith
James d. Smith has deployed

Then you can add a column to the table with first 100 characters to be able to have unique records.

Good Luck.
Reply
Reply

Bookmarks


Currently Active Users Viewing This Thread: 1 (0 members and 1 guests)
 
Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are Off
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On

What is Bytes?

We are a network of experts and professionals in IT and software development that help one another with answers to tough questions and share insights. Get the best answers to your questions from over network members.
Post your question now . . .
It's fast and it's free

Popular Articles