Hi all
I have a DataTable containing around 25000 rows. For each row I want to
query the entire DT to find out if potentially duplicate items exist. The
problem is that this is reeeeeeaaaaal slow.
I'm using the Select method:
DataRow[] found = allClients.Select(
    string.Format("ClientId <> '{0}' AND cl_sname = '{1}' AND cl_fname = '{2}'",
        row["ClientId"],
        (row["cl_sname"] as string).Replace("'", "''"),
        (row["cl_fname"] as string).Replace("'", "''")));
My question - is there a quicker way of doing this? BTW, the reason why I'm
doing this locally is because the DT is made up of data from 7 different
sources.
Thanks
Kev
Mantorok,
Why not cycle through the rows, and then keep a record of which rows are
duplicates?
I would have a Dictionary<<type of unique identifier>, List<int>>. The
<type of unique identifier> would be the type of a field in the row which
would be duplicated among rows. Either this, or some sort of key which
indicates the values in the row that are duplicated (a structure might work
well here since it will generate the same hash code for the same values in
it, whereas the hash code generated for the DataRow will not).
Then, the list would be a list of row indexes which share that value.
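A minimal sketch of the dictionary idea above, not a definitive implementation. It assumes the column names from the original post (cl_sname, cl_fname), assumes a compiler with value tuples (C# 7 or later), and uses hypothetical names DuplicateFinder, GroupByName and FindDuplicates. The tuple key compares strings case-sensitively, which may differ from how Select compares them, and the sketch treats each row as a distinct client rather than checking ClientId.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class DuplicateFinder
{
    // Groups row indexes by (cl_sname, cl_fname) in a single pass over the table.
    public static Dictionary<(string, string), List<int>> GroupByName(DataTable table)
    {
        var groups = new Dictionary<(string, string), List<int>>();
        for (int i = 0; i < table.Rows.Count; i++)
        {
            DataRow row = table.Rows[i];
            // A value tuple hashes and compares by its contents, unlike a DataRow.
            // DBNull values are treated as empty strings (an assumption of this sketch).
            var key = ((row["cl_sname"] as string) ?? "", (row["cl_fname"] as string) ?? "");
            if (!groups.TryGetValue(key, out List<int> indexes))
            {
                indexes = new List<int>();
                groups[key] = indexes;
            }
            indexes.Add(i);
        }
        return groups;
    }

    // Keeps only the index lists that contain more than one row.
    public static List<List<int>> FindDuplicates(DataTable table)
    {
        var result = new List<List<int>>();
        foreach (List<int> indexes in GroupByName(table).Values)
        {
            if (indexes.Count > 1)
            {
                result.Add(indexes);
            }
        }
        return result;
    }
}
```

Each list returned by FindDuplicates holds the indexes of rows that share a surname and first name, so the table is walked once instead of being queried once per row.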
Hope this helps.
--
- Nicholas Paldino [.NET/C# MVP]
- mv*@spam.guard.caspershouse.com
Sort the table based on your key (test) values. While iterating via loop,
see if your current row fields match the fields of the next row.
Hi
How do you sort a DataTable?
Thanks
Kev
Hi
Although I understand that there are work-arounds, my main gripe is that
I shouldn't really have to use them. The DataTable provides a Select method which
should handle [almost] anything I throw at it, so it's quite disappointing really.
When I said slow, I meant real slow, as in abnormal: it looked like it would take
around half an hour if I'd left it to continue.
As it goes, I've managed to speed it up by removing the "ClientId <> " criterion,
which sped it up immensely (by about 20x). I've had a quick search on the
MSDN forums and it seems the DataTable does have an issue with the
Select method.
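For illustration, here is one way the change described above might look: the filter keeps only the two name comparisons, and the current client is excluded in code afterwards. This is a sketch under assumptions, not the poster's actual code. NameMatcher, FindOthersWithSameName and Escape are hypothetical names, and the 20x figure is the poster's observation, not something measured here.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class NameMatcher
{
    // Returns rows that share the given row's surname and first name
    // but carry a different ClientId.
    public static List<DataRow> FindOthersWithSameName(DataTable allClients, DataRow row)
    {
        // Equality-only filter: the "ClientId <>" test is left out of the expression.
        string filter = string.Format(
            "cl_sname = '{0}' AND cl_fname = '{1}'",
            Escape(row["cl_sname"]),
            Escape(row["cl_fname"]));

        var others = new List<DataRow>();
        foreach (DataRow candidate in allClients.Select(filter))
        {
            // Exclude the current client here instead of in the filter string.
            if (!object.Equals(candidate["ClientId"], row["ClientId"]))
            {
                others.Add(candidate);
            }
        }
        return others;
    }

    // Doubles single quotes so the value can sit inside a quoted filter literal.
    private static string Escape(object value)
    {
        return Convert.ToString(value).Replace("'", "''");
    }
}
```

This still runs one Select per row, so it only removes the inequality test from the expression; it does not change the overall row-by-row approach.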
Thanks for the help.
Kev
You don't. Use the DefaultView of the DataTable, or create a new
DataView, to sort the data.
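As a rough sketch of how the DataView suggestion could combine with the earlier advice to compare each row with its neighbour: sort a view on the name columns, then walk it once. SortedScan, FindRowsWithDuplicateNames and SameName are hypothetical names, and the column names come from the original post. The view sorts according to the table's own CaseSensitive and Locale settings, so the case-insensitive equality check below is an assumption that should be adjusted to match them.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class SortedScan
{
    // Returns every row whose surname and first name also appear on another row.
    public static List<DataRow> FindRowsWithDuplicateNames(DataTable table)
    {
        // The DataView does the sorting; the underlying table is left untouched.
        var view = new DataView(table);
        view.Sort = "cl_sname, cl_fname";

        var duplicates = new List<DataRow>();
        bool previousAdded = false;
        for (int i = 1; i < view.Count; i++)
        {
            DataRow previous = view[i - 1].Row;
            DataRow current = view[i].Row;
            if (SameName(previous, current))
            {
                // Add the first row of a run once, then each following match.
                if (!previousAdded)
                {
                    duplicates.Add(previous);
                }
                duplicates.Add(current);
                previousAdded = true;
            }
            else
            {
                previousAdded = false;
            }
        }
        return duplicates;
    }

    // Assumed equality rule: case-insensitive match on both name columns.
    private static bool SameName(DataRow a, DataRow b)
    {
        return string.Equals(a["cl_sname"] as string, b["cl_sname"] as string,
                   StringComparison.OrdinalIgnoreCase)
            && string.Equals(a["cl_fname"] as string, b["cl_fname"] as string,
                   StringComparison.OrdinalIgnoreCase);
    }
}
```

After the sort, matching names sit next to each other, so a single pass over the view is enough.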