
What is the best algorithm to check for duplicated rows?

Hi,

If I have tens of thousands of DataRows in a DataTable and let the end user
pick any DataColumn(s) to check for duplicated rows, the data set is quite large.
Is there a better API or algorithm that can be used for this purpose?

Thanks a lot!
Ryan
May 12 '07 #1
JR
SELECT?

JR
May 12 '07 #2
Hello Ryan,

Hmm,
I see two ways: using a hashtable, or sorting plus binary search.
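
A minimal sketch of the hashtable route, not necessarily what was meant above. It assumes the user's chosen column names arrive as a list of strings and that comparing values by their string form is acceptable; `DuplicateFinder`, `BuildKey` and `FindDuplicates` are hypothetical names made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Text;

public static class DuplicateFinder
{
    // Builds one string key from the chosen columns. Each value is prefixed
    // with its length so ("ab","c") and ("a","bc") cannot produce the same key.
    private static string BuildKey(DataRow row, IReadOnlyList<string> columns)
    {
        var sb = new StringBuilder();
        foreach (string column in columns)
        {
            bool isNull = row.IsNull(column);
            string text = isNull ? "" : (Convert.ToString(row[column]) ?? "");
            // Nulls are marked with -1 so they stay distinct from empty strings.
            sb.Append(isNull ? -1 : text.Length).Append(':').Append(text).Append('|');
        }
        return sb.ToString();
    }

    // Returns every row whose chosen columns match an earlier row.
    public static List<DataRow> FindDuplicates(DataTable table, IReadOnlyList<string> columns)
    {
        var seen = new HashSet<string>(StringComparer.Ordinal);
        var duplicates = new List<DataRow>();
        foreach (DataRow row in table.Rows)
        {
            // HashSet.Add returns false when the key is already present.
            if (!seen.Add(BuildKey(row, columns)))
                duplicates.Add(row);
        }
        return duplicates;
    }
}
```

Each row is visited once, so for tens of thousands of rows the cost should be dominated by building the key strings rather than by the lookups.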

---
WBR, Michael Nemtsev [.NET/C# MVP].
My blog: http://spaces.live.com/laflour
Team blog: http://devkids.blogspot.com/

"The greatest danger for most of us is not that our aim is too high and we
miss it, but that it is too low and we reach it" (c) Michelangelo

May 12 '07 #3
Ryan,
The first question I would ask is: how did you get tens of thousands of
rows into this DataTable? If they came out of a database, shouldn't that be
where you enforce your referential and unique-column integrity?
Peter

--
Site: http://www.eggheadcafe.com
UnBlog: http://petesbloggerama.blogspot.com
Short urls & more: http://ittyurl.net
May 12 '07 #4
Hi Peter,

The data is imported by the end user from an external file; most of the time
it is a CSV text file.

After the import into the DataTable, the end user specifies the criteria
used to pick rows from the DataTable.

Then, for the selected rows, the end user wants to check for duplicated lines
before inserting them into the database. The criteria for checking duplicated
lines are also specified by the end user.

I am also required to check for duplicated entries against data already in
the database.

Thank you, and thanks to everyone who replied to this message!

Ryan
May 13 '07 #5
Thanks!

I just hope the hash algorithm for strings is as efficient as the one for ints.

Also, the criteria for checking duplicated DataRows could be based on multiple
DataColumns (AND logic), which makes it difficult for me to come up with a hash
algorithm.
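
One possible way around the multi-column concern, sketched under assumptions: there is no need to write a hash algorithm from scratch, because every column value (string, int, and so on) already supplies its own GetHashCode, and those can be folded together. `RowKeyComparer` is a hypothetical name for this illustration; the 17/31 multipliers are a common convention, not a requirement.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Treats two rows as equal when ALL chosen columns are equal (AND logic).
public sealed class RowKeyComparer : IEqualityComparer<DataRow>
{
    private readonly string[] _columns;

    public RowKeyComparer(params string[] columns)
    {
        _columns = columns;
    }

    public bool Equals(DataRow x, DataRow y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        foreach (string column in _columns)
        {
            // object.Equals compares boxed values by value; DBNull equals DBNull.
            if (!object.Equals(x[column], y[column])) return false;
        }
        return true;
    }

    public int GetHashCode(DataRow row)
    {
        // Fold each column's own hash code into one value. Collisions are
        // harmless because Equals makes the final decision.
        unchecked
        {
            int hash = 17;
            foreach (string column in _columns)
                hash = hash * 31 + row[column].GetHashCode();
            return hash;
        }
    }
}
```

With this comparer, `new HashSet<DataRow>(new RowKeyComparer("Name", "City"))` can be filled row by row, and `Add` returns false for any row that duplicates an earlier one on those columns.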

Ryan
May 13 '07 #6
You can try adding a unique constraint to one or more columns in the DataTable.
I'm not sure exactly how to treat exceptions or error messages during an
import, but I'm sure if you look it up in the MSDN documentation you can find
some examples.
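
As a rough sketch of what that could look like, assuming the constraint is added before the rows are imported: a UniqueConstraint over the chosen columns makes Rows.Add throw a ConstraintException for a duplicate, which can be caught and counted. `UniqueImport` and `AddRowsRejectingDuplicates` are hypothetical names for this illustration.

```csharp
using System;
using System.Data;

public static class UniqueImport
{
    // Adds a UniqueConstraint over the given columns, then tries to add each
    // candidate row; returns how many were rejected as duplicates.
    public static int AddRowsRejectingDuplicates(
        DataTable table, string[] keyColumns, object[][] candidateRows)
    {
        var columns = new DataColumn[keyColumns.Length];
        for (int i = 0; i < keyColumns.Length; i++)
            columns[i] = table.Columns[keyColumns[i]];

        // Throws ArgumentException if rows already in the table violate it,
        // so the constraint has to go on before the import starts.
        table.Constraints.Add(new UniqueConstraint(columns));

        int rejected = 0;
        foreach (object[] values in candidateRows)
        {
            try
            {
                table.Rows.Add(values);
            }
            catch (ConstraintException)
            {
                // The offending row is not added; just count it.
                rejected++;
            }
        }
        return rejected;
    }
}
```

Exceptions are comparatively slow, so if many duplicates are expected a hash-based check before adding is likely the cheaper route.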
Peter

May 13 '07 #7
