Efficient local data-handling

Hi all

I have a DataTable containing around 25000 rows. For each row I want to
query the entire DT to find out if potentially duplicate items exist, problem
is - this is reeeeeeaaaaal slow.

I'm using the Select method:

DataRow[] found = allClients.Select(
    string.Format("ClientId <> '{0}' AND cl_sname = '{1}' AND cl_fname = '{2}'",
        row["ClientId"],
        (row["cl_sname"] as string).Replace("'", "''"),
        (row["cl_fname"] as string).Replace("'", "''")));

My question - is there a quicker way of doing this? BTW, the reason why I'm
doing this locally is because the DT is made up of data from 7 different
sources.

Thanks
Kev
May 3 '06 #1
Mantorok,

Why not cycle through the rows, and then keep a record of which rows are
duplicates?

I would have a Dictionary<<type of unique identifier>, List<int>>. The
<type of unique identifier> would be the type of a field in the row which
would be duplicated among rows. Either this, or some sort of key which
indicates the values in the row that are duplicated (a structure might work
well here since it will generate the same hash code for the same values in
it, whereas the hash code generated for the DataRow will not).

Then, the list would be a list of row indexes which share that value.
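
A minimal sketch of the dictionary approach described above, assuming the column names from the original post (cl_sname, cl_fname). The class name, the method name and the combined string key are made up for illustration; the string key stands in for the structure suggested above. The case-insensitive comparer is also an assumption, so swap in StringComparer.Ordinal if case matters for your data.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class DuplicateFinder
{
    // Groups row indexes by (surname, first name) in a single pass.
    // Any list holding more than one index is a set of potential duplicates.
    public static Dictionary<string, List<int>> GroupByName(DataTable allClients)
    {
        var groups = new Dictionary<string, List<int>>(StringComparer.OrdinalIgnoreCase);

        for (int i = 0; i < allClients.Rows.Count; i++)
        {
            DataRow row = allClients.Rows[i];

            // Convert.ToString turns DBNull into an empty string.
            // "\u0001" is an arbitrary separator assumed not to occur in names.
            string key = Convert.ToString(row["cl_sname"]) + "\u0001" +
                         Convert.ToString(row["cl_fname"]);

            List<int> indexes;
            if (!groups.TryGetValue(key, out indexes))
            {
                indexes = new List<int>();
                groups.Add(key, indexes);
            }
            indexes.Add(i);
        }
        return groups;
    }
}
```

Each row is visited once, so the work grows with the number of rows rather than with rows times rows, which is what calling Select once per row amounts to.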

Hope this helps.
--
- Nicholas Paldino [.NET/C# MVP]
- mv*@spam.guard.caspershouse.com

May 3 '06 #2
Sort the table based on your key (test) values. While iterating via loop,
see if your current row fields match the fields of the next row.
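
A rough sketch of the sort-then-compare idea, assuming the column names from the original post. It uses the DataTable.Select(filter, sort) overload with an empty filter to get the rows in sorted order; the class and method names are hypothetical, and the case-insensitive comparison is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class SortedScan
{
    // Returns pairs of neighbouring rows (in sorted order) whose names match.
    public static List<DataRow[]> FindAdjacentDuplicates(DataTable allClients)
    {
        // Empty filter keeps every row; the second argument is the sort order.
        DataRow[] sorted = allClients.Select("", "cl_sname, cl_fname");

        var pairs = new List<DataRow[]>();
        for (int i = 0; i + 1 < sorted.Length; i++)
        {
            if (SameName(sorted[i], sorted[i + 1]))
            {
                pairs.Add(new[] { sorted[i], sorted[i + 1] });
            }
        }
        return pairs;
    }

    // Compares the two name columns, treating DBNull as an empty string.
    private static bool SameName(DataRow a, DataRow b)
    {
        return string.Equals(Convert.ToString(a["cl_sname"]),
                             Convert.ToString(b["cl_sname"]),
                             StringComparison.OrdinalIgnoreCase)
            && string.Equals(Convert.ToString(a["cl_fname"]),
                             Convert.ToString(b["cl_fname"]),
                             StringComparison.OrdinalIgnoreCase);
    }
}
```

The table is sorted once and then walked once, instead of being queried again for every row.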

May 3 '06 #3
Hi

How do you sort a DataTable?

Thanks
Kev
May 4 '06 #4
Hi

Although I understand that there are workarounds, my main gripe is that
I shouldn't really have to use them. The DataTable provides a Select method
which should handle [almost] anything I throw at it, so it's quite
disappointing really. When I said slow, I meant really slow, as in abnormal:
it was looking to take around half an hour if I'd left it to continue.

As it goes, I've managed to speed it up immensely (by about 20x) by removing
the "ClientId <> " criterion. I've had a quick search on the MSDN forums and
it seems the DataTable does have an issue with the Select method.
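
For anyone wanting to try the same change, here is one way it might look, assuming the table and columns from the original post: the "ClientId <> " test is dropped from the filter and applied in code afterwards. The class and method names are hypothetical, and the 20x figure is only what was observed above, not something this sketch guarantees.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class SelectWorkaround
{
    // Finds rows with the same names as 'row' but a different ClientId.
    public static List<DataRow> FindOthersWithSameName(DataTable allClients, DataRow row)
    {
        // Filter on the names only; single quotes are doubled to escape them.
        string filter = string.Format(
            "cl_sname = '{0}' AND cl_fname = '{1}'",
            Convert.ToString(row["cl_sname"]).Replace("'", "''"),
            Convert.ToString(row["cl_fname"]).Replace("'", "''"));

        var others = new List<DataRow>();
        foreach (DataRow candidate in allClients.Select(filter))
        {
            // The ClientId test that used to be in the filter expression.
            if (!Equals(candidate["ClientId"], row["ClientId"]))
            {
                others.Add(candidate);
            }
        }
        return others;
    }
}
```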

Thanks for the help.

Kev
May 4 '06 #5
You don't. Use the DefaultView of the DataTable, or create a new
DataView, to sort the data.
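
A short sketch of the DataView route, assuming the column names from the original post; the class and method names are made up. It creates a new DataView rather than changing DefaultView, so anything already bound to the default view keeps its order.

```csharp
using System.Data;

public static class ViewSorting
{
    // Returns a view of the table ordered by surname, then first name.
    // The underlying DataTable is not reordered.
    public static DataView SortByName(DataTable allClients)
    {
        var view = new DataView(allClients);
        view.Sort = "cl_sname ASC, cl_fname ASC";
        return view;
    }
}
```

The rows can then be read in sorted order through the view, for example view[0]["cl_sname"], or with a foreach over DataRowView items.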

May 4 '06 #6
