What is the best algorithm to check for duplicated rows?

Hi,

If I have tens of thousands of DataRows in a DataTable and let the end user
pick any DataColumn(s) to check for duplicate rows, is there a better API or
algorithm for this purpose, given how large the data is?

Thanks a lot!
Ryan
May 12 '07 #1
JR
SELECT?

JR
May 12 '07 #2
Hello Ryan,

Hmm,
I see two ways: using a hashtable, or sorting plus binary search.
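
A rough sketch of the sorting plus binary search route, for a single column, might look like the following. The names `SortedLookup`, `BuildSortedKeys`, `Contains`, and `RepeatedKeys` are made up for illustration, and the sketch assumes values can be compared as ordinal strings; only `Array.Sort`, `Array.BinarySearch`, and the `DataTable` members are real .NET APIs.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class SortedLookup
{
    // Copies one column into an array and sorts it once: O(n log n).
    // Null cells are skipped (an assumption; adjust if nulls should count).
    public static string[] BuildSortedKeys(DataTable table, string columnName)
    {
        var keys = new List<string>();
        foreach (DataRow row in table.Rows)
        {
            if (!row.IsNull(columnName))
                keys.Add(Convert.ToString(row[columnName]));
        }
        string[] sorted = keys.ToArray();
        Array.Sort(sorted, StringComparer.Ordinal);
        return sorted;
    }

    // Binary search on the sorted array: O(log n) per lookup.
    public static bool Contains(string[] sortedKeys, string key)
    {
        return Array.BinarySearch(sortedKeys, key, StringComparer.Ordinal) >= 0;
    }

    // After sorting, equal values sit next to each other, so a single
    // linear pass is enough to list every value that occurs more than once.
    public static List<string> RepeatedKeys(string[] sortedKeys)
    {
        var repeated = new List<string>();
        for (int i = 1; i < sortedKeys.Length; i++)
        {
            bool sameAsPrevious = string.Equals(
                sortedKeys[i], sortedKeys[i - 1], StringComparison.Ordinal);
            bool alreadyReported = repeated.Count > 0 && string.Equals(
                repeated[repeated.Count - 1], sortedKeys[i], StringComparison.Ordinal);
            if (sameAsPrevious && !alreadyReported)
                repeated.Add(sortedKeys[i]);
        }
        return repeated;
    }
}
```

`RepeatedKeys` covers duplicates inside one table, while `Contains` suits the case where a sorted set of existing values is probed once per incoming row.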

---
WBR, Michael Nemtsev [.NET/C# MVP].
My blog: http://spaces.live.com/laflour
Team blog: http://devkids.blogspot.com/

"The greatest danger for most of us is not that our aim is too high and we
miss it, but that it is too low and we reach it" (c) Michelangelo

May 12 '07 #3
Ryan,
The first question I would ask is: how did you get tens of thousands of rows
into this DataTable? If they came out of a database, shouldn't that be
where you enforce referential integrity and column uniqueness?
Peter

--
Site: http://www.eggheadcafe.com
UnBlog: http://petesbloggerama.blogspot.com
Short urls & more: http://ittyurl.net
May 12 '07 #4
Hi Peter,

The data is imported by the end user from an external file, most of the time
a CSV text file.

After the import into the DataTable, the end user specifies the criteria
used to pick rows from it.

Then, for the selected rows, the end user wants to check for duplicate rows
before inserting them into the database. The criteria for detecting
duplicates are also specified by the end user.

I am also required to check for duplicates against data already in the
database.

Thank you, and thanks to everyone who replied to this message!

Ryan
May 13 '07 #5
Thanks!

I just hope that hashing a string is as efficient as hashing an int.

Also, the criteria for detecting duplicate DataRows can be based on multiple
DataColumns (AND logic), which makes it difficult for me to come up with a
hash algorithm.
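
One possible way around the multi-column problem is to join the selected columns into a single composite string key and let a hash set detect repeats, so no hand-written hash function is needed. The sketch below is only an illustration: `DuplicateFinder`, `BuildKey`, and `FindDuplicates` are hypothetical names, values are compared as invariant-culture strings (an assumption), and `HashSet<T>` requires .NET 3.5 or later (a `Dictionary<string, bool>` could play the same role on .NET 2.0).

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Globalization;
using System.Text;

public static class DuplicateFinder
{
    // Builds one key from the chosen columns. Each value is prefixed with
    // its length so that ("ab", "c") and ("a", "bc") give different keys;
    // a null cell is written as "-1:" so it differs from an empty string.
    private static string BuildKey(DataRow row, DataColumn[] columns)
    {
        var sb = new StringBuilder();
        foreach (DataColumn col in columns)
        {
            if (row.IsNull(col))
            {
                sb.Append("-1:");
                continue;
            }
            string value = Convert.ToString(row[col], CultureInfo.InvariantCulture);
            sb.Append(value.Length).Append(':').Append(value);
        }
        return sb.ToString();
    }

    // Returns every row whose key was already seen on an earlier row.
    // The first row of each group is treated as the original.
    public static List<DataRow> FindDuplicates(DataTable table, params DataColumn[] columns)
    {
        var seen = new HashSet<string>(StringComparer.Ordinal);
        var duplicates = new List<DataRow>();
        foreach (DataRow row in table.Rows)
        {
            // HashSet.Add returns false when the key is already present.
            if (!seen.Add(BuildKey(row, columns)))
                duplicates.Add(row);
        }
        return duplicates;
    }
}
```

Hashing a string takes time proportional to its length, so it costs more than hashing an int, but the whole pass still visits each row only once.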

Ryan
May 13 '07 #6
You can try adding a unique constraint to one or more columns in the DataTable.
I'm not sure exactly how to treat exceptions or error messages during an
import, but I'm sure if you look it up in the MSDN documentation you can find
some examples.
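
A minimal sketch of the unique constraint idea, assuming the imported lines are already split into string fields in column order: `ConstrainedImport`, `RequireUnique`, and `AddRows` are hypothetical names, while `UniqueConstraint` and `ConstraintException` are the actual `System.Data` types.

```csharp
using System.Collections.Generic;
using System.Data;

public static class ConstrainedImport
{
    // Declares that the combination of the named columns must be unique.
    // This is meant to be called before the rows are loaded.
    public static void RequireUnique(DataTable table, params string[] columnNames)
    {
        var columns = new DataColumn[columnNames.Length];
        for (int i = 0; i < columnNames.Length; i++)
            columns[i] = table.Columns[columnNames[i]];
        table.Constraints.Add(new UniqueConstraint(columns));
    }

    // Adds each parsed line as a row and returns the lines that the
    // constraint rejected, so they can be reported to the user.
    public static List<string[]> AddRows(DataTable table, IEnumerable<string[]> lines)
    {
        var rejected = new List<string[]>();
        foreach (string[] fields in lines)
        {
            try
            {
                table.Rows.Add(fields);
            }
            catch (ConstraintException)
            {
                // The row duplicates an existing key and was not added.
                rejected.Add(fields);
            }
        }
        return rejected;
    }
}
```

Catching an exception per duplicate is simple, but it may become slow if a large share of the rows are duplicates.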
Peter

May 13 '07 #7
