Connecting Tech Pros Worldwide Forums | Help | Site Map

Deduping

DaveL
Guest
 
Posts: n/a
#1: Aug 6 '08
Hi Hows it going , I need a Fast way to check for duplicate rows in a single
table

the identity keys and Primary Keys dont play in this scenario


i have like 10 columns
Column1 thru column10 as the columns that make it a duplicate row,

any help is welcome
Thanks



DaveL
Guest
 
Posts: n/a
#2: Aug 27 '08

re: Deduping


The Responses gets counts and that lets me know i do have duplicates

What i really am after is the best Way to traverse the
duplicates to mark them as Rejected, i can't delete them
i also need to keep the oldest and reject the newest records

since i have a thread calling the Storedproc and this thread is a
MultiThread 1 to many

for instance thread 1 mite call rowset based on groupid
thread 1 groupid = 11111111
thread 2 groupid = 44444444
etc

each group has a set of rows 1 to 5000

the main table has millions of rows

so the query not only has the 10 columns but returns datecreated also

Im thinking a TmpFile is too much resource on the machine since its a multi
threaded application...
maybe a TmpFile named via the Threadid is better

Select all the dups into a tmp (physical) table and
work from there,

I would like to here all ideas pertaining to this
Thanks Very much
DaveL




"DaveL" <dvs_bis@sbcglobal.netwrote in message
news:uximk.9259$vn7.3415@flpi147.ffdc.sbc.com...
Quote:
Hi Hows it going , I need a Fast way to check for duplicate rows in a
single table
>
the identity keys and Primary Keys dont play in this scenario
>
>
i have like 10 columns
Column1 thru column10 as the columns that make it a duplicate row,
>
any help is welcome
Thanks
>

--CELKO--
Guest
 
Posts: n/a
#3: Aug 27 '08

re: Deduping


>What i really am after is the best Way to traverse [traverse? in a set oriented language?] the duplicates to mark them as Rejected, I can't delete them I also need to keep the oldest and reject the newest records [sic: rows are not records] <<

You are talking about a sequential file system and not SQL. We select
sets; we do not traverse records. Apparently, there is a column
which holds a timestamp from which we can determine which row in each
grouping is the oldest.

Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are.

If you know how, follow ISO-11179 data element naming conventions and
formatting rules. Put temporal data in ISO-8601 formats. Avoid
proprietary syntax when standard syntax is available.

Sample data is also a good idea, along with clear specifications. It
is very hard to debug code when you do not let us see it. If you want
to learn how to ask a question on a Newsgroup, look at:
http://www.catb.org/~esr/faqs/smart-questions.html


Plamen Ratchev
Guest
 
Posts: n/a
#4: Aug 27 '08

re: Deduping


The WITH keyword defines a Common Table Expression (available only on
SQL Server 2005/2008). It is pretty much the same as using a derived
table, with a few extra additions like reusing multiple times within the
query, stacking multiple CTEs, and recursive features.

--
Plamen Ratchev
http://www.SQLStudio.com
DaveL
Guest
 
Posts: n/a
#5: Aug 27 '08

re: Deduping


thank you very much
DaveL

"Plamen Ratchev" <Plamen@SQLStudio.comwrote in message
news:vNWdncj8areDJCjVnZ2dnUVZ_jKdnZ2d@speakeasy.ne t...
Quote:
The WITH keyword defines a Common Table Expression (available only on SQL
Server 2005/2008). It is pretty much the same as using a derived table,
with a few extra additions like reusing multiple times within the query,
stacking multiple CTEs, and recursive features.
>
--
Plamen Ratchev
http://www.SQLStudio.com

Closed Thread