Efficient local data-handling

Mantorok

Hi all

I have a DataTable containing around 25000 rows. For each row I want to
query the entire DT to find out if potentially duplicate items exist. The
problem is that this is reeeeeeaaaaal slow.

I'm using the Select method:

DataRow[] found = allClients.Select(
    string.Format(
        "ClientId <> '{0}' AND cl_sname = '{1}' AND cl_fname = '{2}'",
        row["ClientId"],
        (row["cl_sname"] as string).Replace("'", "''"),
        (row["cl_fname"] as string).Replace("'", "''")));

My question - is there a quicker way of doing this? BTW, the reason why I'm
doing this locally is because the DT is made up of data from 7 different
sources.

Thanks
Kev

May 3 '06 #1

Nicholas Paldino [.NET/C# MVP]

Mantorok,

Why not cycle through the rows once and keep a record of which rows are
duplicates?

I would use a Dictionary<<type of unique identifier>, List<int>>. The
<type of unique identifier> would be the type of the field in the row whose
value is duplicated among rows. Alternatively, use some sort of key made up
of the values in the row that are duplicated (a structure might work well
here, since it will generate the same hash code for the same values in it,
whereas the hash code generated for the DataRow will not).

The list would then hold the indexes of the rows that share that value.
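
A minimal sketch of the dictionary idea, using the column names from the original post. The class and method names are made up for illustration, and a value tuple stands in for the "structure" key because it is a struct whose hash code is built from its fields. It assumes exact (case-sensitive) name matches are wanted, whereas Select follows the table's CaseSensitive setting.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical helper; not part of the original thread.
static class DuplicateFinder
{
    // Returns, for every (surname, first name) pair that occurs more than
    // once, the indexes of the rows that share it.
    public static Dictionary<(string, string), List<int>> FindDuplicates(DataTable allClients)
    {
        var groups = new Dictionary<(string, string), List<int>>();

        // Single pass over the table: one hash lookup per row.
        for (int i = 0; i < allClients.Rows.Count; i++)
        {
            DataRow row = allClients.Rows[i];

            // DBNull or non-string values are treated as empty strings.
            var key = (row["cl_sname"] as string ?? "", row["cl_fname"] as string ?? "");

            if (!groups.TryGetValue(key, out List<int> indexes))
            {
                indexes = new List<int>();
                groups.Add(key, indexes);
            }
            indexes.Add(i);
        }

        // Keep only the keys shared by two or more rows.
        var duplicates = new Dictionary<(string, string), List<int>>();
        foreach (var pair in groups)
        {
            if (pair.Value.Count > 1)
            {
                duplicates.Add(pair.Key, pair.Value);
            }
        }
        return duplicates;
    }
}
```

This visits each row once, instead of running one Select over the whole table per row.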

Hope this helps.
--
- Nicholas Paldino [.NET/C# MVP]
- mv*@spam.guard.caspershouse.com

May 3 '06 #2

Mark Newmister

Sort the table based on your key (test) values. While iterating via loop,
see if your current row fields match the fields of the next row.
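
A rough sketch of the sort-and-compare idea. Here the rows are copied into an array and sorted with an ordinal comparison, so the table itself is left untouched. The class and method names are made up for illustration, and the column names are taken from the original post.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical helper; not part of the original thread.
static class SortedDuplicateScan
{
    // Orders rows by surname, then first name, using ordinal comparison.
    static int CompareNames(DataRow a, DataRow b)
    {
        int bySurname = string.CompareOrdinal(a["cl_sname"] as string, b["cl_sname"] as string);
        return bySurname != 0
            ? bySurname
            : string.CompareOrdinal(a["cl_fname"] as string, b["cl_fname"] as string);
    }

    // Returns every row whose name fields match at least one other row.
    public static List<DataRow> FindDuplicates(DataTable allClients)
    {
        DataRow[] rows = new DataRow[allClients.Rows.Count];
        allClients.Rows.CopyTo(rows, 0);
        Array.Sort(rows, CompareNames);

        var duplicates = new List<DataRow>();
        for (int i = 0; i < rows.Length; i++)
        {
            // After sorting, equal keys sit next to each other, so only
            // the immediate neighbours need checking.
            bool matchesPrevious = i > 0 && CompareNames(rows[i - 1], rows[i]) == 0;
            bool matchesNext = i + 1 < rows.Length && CompareNames(rows[i], rows[i + 1]) == 0;
            if (matchesPrevious || matchesNext)
            {
                duplicates.Add(rows[i]);
            }
        }
        return duplicates;
    }
}
```

The sort costs O(n log n) and the scan is a single pass, compared with one full-table query per row in the original approach.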

May 3 '06 #3

Mantorok

Hi

How do you sort a DataTable?

Thanks
Kev

May 4 '06 #4

Mantorok

Hi

Although I understand that there are work-arounds, my main gripe is that
I shouldn't really have to use them. The DataTable provides a Select method
which should handle [almost] anything I throw at it, so it's quite
disappointing really. When I said slow, I meant real slow, as in abnormal: it
was looking to take around half an hour if I'd left it to continue.

As it goes, I've managed to speed it up by removing the "ClientId <> "
criterion, which sped it up immensely (by about 20x). I've had a quick search
on the MSDN forums and it seems the DataTable does have an issue with the
Select method.
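
For illustration, one way to drop the "ClientId <> " criterion and still exclude the current row is to filter on the name columns only and skip the row's own ClientId in code. This is a sketch under that assumption, not necessarily what was done here; the class and method names are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical helper; not part of the original thread.
static class SelectWithoutInequality
{
    // Finds the other rows that share the given row's surname and first name.
    public static List<DataRow> FindOthersWithSameName(DataTable allClients, DataRow row)
    {
        // Equality-only filter; single quotes are doubled as in the original code.
        string filter = string.Format(
            "cl_sname = '{0}' AND cl_fname = '{1}'",
            (row["cl_sname"] as string).Replace("'", "''"),
            (row["cl_fname"] as string).Replace("'", "''"));

        var others = new List<DataRow>();
        foreach (DataRow candidate in allClients.Select(filter))
        {
            // The ClientId check is done here instead of inside the filter.
            if (!object.Equals(candidate["ClientId"], row["ClientId"]))
            {
                others.Add(candidate);
            }
        }
        return others;
    }
}
```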

Thanks for the help.

Kev

May 4 '06 #5

Göran Andersson

You don't. Use the DefaultView of the DataTable, or create a new
DataView, to sort the data.
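
A small sketch of sorting through a DataView, assuming the column names from the original post. The class and method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical helper; not part of the original thread.
static class SortedViewExample
{
    // Reads the rows back in surname, first-name order without
    // changing the order of the underlying DataTable.
    public static List<string> NamesInSortedOrder(DataTable allClients)
    {
        DataView view = new DataView(allClients);
        view.Sort = "cl_sname ASC, cl_fname ASC";

        var names = new List<string>();
        foreach (DataRowView rowView in view)
        {
            names.Add(rowView["cl_sname"] + ", " + rowView["cl_fname"]);
        }
        return names;
    }
}
```

Setting allClients.DefaultView.Sort works the same way if a separate view is not needed.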

May 4 '06 #6
