All,
I'm hoping one of you Xml or Data gurus can offer an opinion. I'm working
on an app to basically compare two semi-structured data files (e.g. Excel /
CSV) to one another. But I need to compare them as if they were
datatables... (ie a simple diff type tool won't work)...
More specifically, the process I envision so far is:
1) Read in two files -- if Excel select appropriate sheet (would also
require users to structure their worksheets as tables -- ie no mixed data so
common with Excel)
2) Determine worksheet -- if Excel
4) Scan for columns
5) Have user determine which is source and which is target (or file being
compared)
6) Have user match one or more "key" columns between two files --
eg Source File "ID" column = Target File "UserID" column (allows for
arbitray column names)
7) Have user select (similar to above) columns to compare (simple text,
integer or boolean equality for now)
8) Run the comparison determining --
a) Rows in one but not the other file (based on the Keys)
b) Rows with matching keys but non-matching (changed) comparison columns
c) Rows that match key / compare columns
Here's what I'm asking:
- I know how to do / have written 1 - 7.
- I also know how to do 8 above via brute force -- eg going from the source,
read row by row, column by column and comparing keys and comparison columns
between the two...
* What do you recommend as the data structure? Currently using an OleDB
generated dataset. Does Xml or a collection make more sense?
* If dataset or Xml, is there a better way than brute force? Dataview with
Find? Merge and detect differences?
Would appreciate any advice from anyone who's been there before. One
concern with brute force approach is that these files could contain hundreds
to thousands of rows (yes, I wish my fellow employees would do more with
databases) -- which, worst case scenario of, say, a thousand rows in each
file results in potentially, I believe, 1 Million comparisons....
Any thoughts, comments, suggestions? Conversion of source data to Access,
SQL Server, Oracle not an option......tried it already (politics)....
thanks in advance,
tim