Data Comparison Strategy Question

All,

I'm hoping one of you XML or data gurus can offer an opinion. I'm working
on an app to compare two semi-structured data files (e.g. Excel /
CSV) to one another. But I need to compare them as if they were
DataTables (i.e. a simple diff-type tool won't work).

More specifically, the process I envision so far is:
1) Read in two files -- if Excel, select the appropriate sheet (this would also
require users to structure their worksheets as tables -- i.e. none of the mixed
data so common with Excel)
2) Determine worksheet -- if Excel
4) Scan for columns
5) Have user determine which is source and which is target (the file being
compared)
6) Have user match one or more "key" columns between the two files --
e.g. Source File "ID" column = Target File "UserID" column (allows for
arbitrary column names)
7) Have user select (similar to above) columns to compare (simple text,
integer or boolean equality for now)
8) Run the comparison determining --
a) Rows in one but not the other file (based on the Keys)
b) Rows with matching keys but non-matching (changed) comparison columns
c) Rows that match key / compare columns
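
To make the three outcomes in step 8 concrete, here is a minimal sketch in Python (chosen only for brevity; in .NET a Dictionary keyed on the key columns could play the same role). The function name compare_tables, the row-as-dict layout and the sample column names beyond "ID" / "UserID" are assumptions for illustration, not part of the app described above. It also assumes keys are unique within each file and that values have already been normalized to comparable types.

```python
def compare_tables(source_rows, target_rows, key_map, compare_map):
    """Classify rows into the outcomes of step 8 (hypothetical helper).

    source_rows / target_rows: lists of dicts, one dict per row.
    key_map:     {source_column: target_column} for the key columns.
    compare_map: {source_column: target_column} for the columns to compare.
    """
    src_keys = list(key_map.keys())
    tgt_keys = list(key_map.values())

    # Index the target once by its key tuple (assumes unique keys;
    # a duplicate key would silently overwrite the earlier row).
    target_index = {tuple(row[c] for c in tgt_keys): row for row in target_rows}

    only_in_source, changed, matched = [], [], []
    seen = set()
    for row in source_rows:
        key = tuple(row[c] for c in src_keys)
        other = target_index.get(key)
        if other is None:
            only_in_source.append(key)          # 8a: in source only
            continue
        seen.add(key)
        diffs = {s: (row[s], other[t])
                 for s, t in compare_map.items() if row[s] != other[t]}
        if diffs:
            changed.append((key, diffs))        # 8b: keys match, columns differ
        else:
            matched.append(key)                 # 8c: keys and columns match

    # 8a again, from the other side: target keys never seen in the source.
    only_in_target = [k for k in target_index if k not in seen]
    return only_in_source, only_in_target, changed, matched


source = [
    {"ID": 1, "Name": "Ann", "Active": True},
    {"ID": 2, "Name": "Bob", "Active": True},
    {"ID": 3, "Name": "Cy", "Active": False},
]
target = [
    {"UserID": 1, "FullName": "Ann", "Enabled": True},
    {"UserID": 2, "FullName": "Robert", "Enabled": True},
    {"UserID": 4, "FullName": "Dee", "Enabled": True},
]
only_src, only_tgt, changed, matched = compare_tables(
    source, target,
    key_map={"ID": "UserID"},
    compare_map={"Name": "FullName", "Active": "Enabled"},
)
```

With this sample data, ID 3 exists only in the source, UserID 4 only in the target, ID 2 is flagged as changed on Name, and ID 1 matches.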

Here's what I'm asking:
- I know how to do / have written 1 - 7.
- I also know how to do 8 above via brute force -- e.g. going from the source,
reading row by row, column by column, and comparing keys and comparison columns
between the two...
* What do you recommend as the data structure? I'm currently using an
OleDb-generated DataSet. Does XML or a collection make more sense?
* If DataSet or XML, is there a better way than brute force? DataView with
Find? Merge and detect differences?
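
For the "merge and detect differences" idea, one possible shape is a sorted merge over the key values: sort both key lists, then walk them in step. The sketch below is in Python for brevity and only illustrates the technique; merge_compare_keys is a hypothetical name, and it assumes keys are unique within each file and mutually comparable (all integers or all strings, not a mix).

```python
def merge_compare_keys(source_keys, target_keys):
    """Split keys into source-only, target-only and shared via a sorted merge."""
    src = sorted(source_keys)
    tgt = sorted(target_keys)
    i = j = 0
    only_src, only_tgt, both = [], [], []

    # Advance whichever side currently holds the smaller key.
    while i < len(src) and j < len(tgt):
        if src[i] == tgt[j]:
            both.append(src[i])
            i += 1
            j += 1
        elif src[i] < tgt[j]:
            only_src.append(src[i])
            i += 1
        else:
            only_tgt.append(tgt[j])
            j += 1

    # Whatever is left on either side has no partner.
    only_src.extend(src[i:])
    only_tgt.extend(tgt[j:])
    return only_src, only_tgt, both


result = merge_compare_keys([3, 1, 2], [2, 4, 1])
```

Rows whose keys land in the shared list would then still need their comparison columns checked one pair at a time.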

Would appreciate any advice from anyone who's been there before. One
concern with the brute-force approach is that these files could contain hundreds
to thousands of rows (yes, I wish my fellow employees would do more with
databases) -- in the worst case, a thousand rows in each
file means, I believe, up to 1 million key comparisons (1,000 x 1,000).
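
The worst-case arithmetic can be checked with a small counting sketch (Python, for brevity; both function names are made up for illustration). It counts key comparisons for a nested loop against the number of inserts plus lookups when the target keys are indexed in a set first. The worst case assumed here is that no key in one file appears in the other.

```python
def brute_force_key_comparisons(source_keys, target_keys):
    """Count key comparisons made by a nested-loop search."""
    comparisons = 0
    for s in source_keys:
        for t in target_keys:
            comparisons += 1
            if s == t:
                break  # stop scanning once the key is found
    return comparisons


def indexed_operations(source_keys, target_keys):
    """Count set inserts plus lookups when the target keys are indexed first."""
    index = set(target_keys)
    operations = len(target_keys)  # one insert per target key
    for s in source_keys:
        operations += 1            # one lookup per source key
        _ = s in index
    return operations


# Worst case: 1,000 keys per file with no overlap at all.
source_keys = list(range(1000))
target_keys = list(range(1000, 2000))
brute = brute_force_key_comparisons(source_keys, target_keys)
indexed = indexed_operations(source_keys, target_keys)
```

Under that assumption the nested loop makes 1,000,000 comparisons, while the indexed version performs 2,000 insert/lookup operations.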

Any thoughts, comments, suggestions? Converting the source data to Access,
SQL Server or Oracle is not an option... tried it already (politics).

thanks in advance,
tim
Nov 12 '05 #1