
What is the best algorithm to check for duplicated rows?

Hi,

If I have tens of thousands of DataRows in a DataTable and allow the end user to
pick any DataColumn(s) to check for duplicated lines, the data set is quite large.
Is there a better API or algorithm that can be used for this purpose?

Thanks a lot!
Ryan
May 12 '07 #1
6 Replies
JR
SELECT?

JR


May 12 '07 #2
Hello Ryan,

Hmm,
I see two ways: using a hashtable, or sorting + binary search.
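
As a rough sketch of the sorting route (not necessarily what was meant above): sort the rows on the user-selected columns, then compare each row with its neighbour, which replaces the binary search with a single linear pass. The names SortedDuplicateFinder, CompareRows and FindDuplicates are made up for illustration, and the code assumes every selected column holds an IComparable type such as string, int or DateTime.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class SortedDuplicateFinder
{
    // Compares two rows on the chosen columns only; DBNull sorts first.
    // Assumption: non-null values in these columns implement IComparable.
    private static int CompareRows(DataRow a, DataRow b, DataColumn[] keys)
    {
        foreach (DataColumn col in keys)
        {
            object x = a[col];
            object y = b[col];
            bool xNull = x is DBNull;
            bool yNull = y is DBNull;
            int c;
            if (xNull || yNull)
                c = xNull == yNull ? 0 : (xNull ? -1 : 1);
            else
                c = ((IComparable)x).CompareTo(y);
            if (c != 0) return c;
        }
        return 0;
    }

    // Returns every row that repeats an earlier row in sorted order.
    // List<T>.Sort is not stable, so which copy of a duplicate group is
    // treated as the "original" is unspecified.
    public static List<DataRow> FindDuplicates(DataTable table, params DataColumn[] keys)
    {
        List<DataRow> rows = new List<DataRow>(table.Rows.Count);
        foreach (DataRow row in table.Rows) rows.Add(row);

        rows.Sort((a, b) => CompareRows(a, b, keys));

        List<DataRow> duplicates = new List<DataRow>();
        for (int i = 1; i < rows.Count; i++)
        {
            if (CompareRows(rows[i - 1], rows[i], keys) == 0)
                duplicates.Add(rows[i]);
        }
        return duplicates;
    }
}
```

The sort costs O(n log n) comparisons and the scan is linear, so tens of thousands of rows should be manageable. The DataTable itself is left in its original order.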

---
WBR, Michael Nemtsev [.NET/C# MVP].
My blog: http://spaces.live.com/laflour
Team blog: http://devkids.blogspot.com/

"The greatest danger for most of us is not that our aim is too high and we
miss it, but that it is too low and we reach it" (c) Michelangelo

May 12 '07 #3
Ryan,
The first question I would ask is: how did you get tens of thousands of
rows into this DataTable? If they came out of a database, shouldn't that be
where you are enforcing your referential and unique column integrity?
Peter

--
Site: http://www.eggheadcafe.com
UnBlog: http://petesbloggerama.blogspot.com
Short urls & more: http://ittyurl.net


May 12 '07 #4
Hi Peter,

The data is imported by the end user from an external file, most of the time a
CSV text file.

After the import into the DataTable, the end user specifies the criteria used
to pick rows from the DataTable.

Then, for the selected rows, the end user wants to check for duplicated lines
before inserting them into the database. The criteria for checking duplicated
lines are also specified by the end user.

I am also required to check for duplicated entries against data already in the
database.

Thank you, and thanks to everyone who replied to this message!

Ryan


May 13 '07 #5
Thanks!

I just hope the hash algorithm for strings is as efficient as the one for ints.

Also, the criteria for checking duplicated DataRows could be based on multiple
DataColumns (AND logic), which makes it difficult for me to come up with a hash
algorithm.
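
One possible way around the multi-column problem, sketched under assumptions: build an object[] key from the selected columns and give a HashSet a comparer that combines the per-column hash codes, so no custom string hashing is needed. CompositeKeyComparer and HashedDuplicateFinder are hypothetical names, and the 17/31 multipliers are just a common convention, not anything prescribed by .NET.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Treats two keys as equal only when every column value is equal (AND logic).
public sealed class CompositeKeyComparer : IEqualityComparer<object[]>
{
    public bool Equals(object[] x, object[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null || x.Length != y.Length) return false;
        for (int i = 0; i < x.Length; i++)
        {
            if (!object.Equals(x[i], y[i])) return false;
        }
        return true;
    }

    // Combines each value's own GetHashCode, so strings, ints and dates
    // all reuse the hashing that .NET already provides for them.
    public int GetHashCode(object[] key)
    {
        unchecked
        {
            int hash = 17;
            foreach (object part in key)
                hash = hash * 31 + (part == null ? 0 : part.GetHashCode());
            return hash;
        }
    }
}

public static class HashedDuplicateFinder
{
    // Returns every row whose key was already seen in an earlier row.
    public static List<DataRow> FindDuplicates(DataTable table, params DataColumn[] keys)
    {
        HashSet<object[]> seen = new HashSet<object[]>(new CompositeKeyComparer());
        List<DataRow> duplicates = new List<DataRow>();
        foreach (DataRow row in table.Rows)
        {
            object[] key = new object[keys.Length];
            for (int i = 0; i < keys.Length; i++) key[i] = row[keys[i]];

            // HashSet.Add returns false when an equal key is already present.
            if (!seen.Add(key)) duplicates.Add(row);
        }
        return duplicates;
    }
}
```

Hashing a string takes time proportional to its length, so it costs more than hashing an int, but the whole check is still a single pass over the rows. String equality here is ordinal and case-sensitive; a case-insensitive check would need the values normalised before they go into the key.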

Ryan



May 13 '07 #6
You can try adding a unique constraint to one or more columns in the DataTable.
I'm not sure exactly how to treat exceptions or error messages during an
import, but I'm sure if you look it up in the MSDN documentation you can find
some examples.
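
A minimal sketch of the unique-constraint idea, assuming the constraint is added while the DataTable is still empty and the rows are then added one at a time. ImportRejectingDuplicates and its parameters are invented for this example; UniqueConstraint and ConstraintException are the standard System.Data types.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class UniqueConstraintImport
{
    // Adds a unique constraint over the chosen columns, then imports the
    // records, collecting the ones the constraint rejects.
    // Assumption: the table has no duplicates on these columns yet
    // (for example, it is empty), otherwise adding the constraint fails.
    public static List<object[]> ImportRejectingDuplicates(
        DataTable table, IEnumerable<object[]> records, params DataColumn[] keys)
    {
        table.Constraints.Add(new UniqueConstraint(keys));

        List<object[]> rejected = new List<object[]>();
        foreach (object[] record in records)
        {
            try
            {
                table.Rows.Add(record);
            }
            catch (ConstraintException)
            {
                // The row duplicates an existing key and was not added.
                rejected.Add(record);
            }
        }
        return rejected;
    }
}
```

Because the constraint is fixed at import time, this fits best when the duplicate criteria are known before loading. Throwing and catching an exception per duplicate is also comparatively slow, so a file with many repeated lines may be better served by a hash-based check.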
Peter

--
Site: http://www.eggheadcafe.com
UnBlog: http://petesbloggerama.blogspot.com
Short urls & more: http://ittyurl.net




May 13 '07 #7
