Bytes | Software Development & Data Engineering Community

Data Comparison Strategy Question

All,

I'm hoping one of you XML or data gurus can offer an opinion. I'm working
on an app to compare two semi-structured data files (e.g. Excel /
CSV) to one another. But I need to compare them as if they were
datatables (i.e. a simple diff-type tool won't work).

More specifically, the process I envision so far is:
1) Read in two files -- if Excel, select the appropriate sheet (this would also
require users to structure their worksheets as tables -- i.e. none of the mixed
data so common in Excel)
2) Determine worksheet -- if Excel
4) Scan for columns
5) Have user determine which is source and which is target (or file being
compared)
6) Have user match one or more "key" columns between the two files --
e.g. Source File "ID" column = Target File "UserID" column (allows for
arbitrary column names)
7) Have user select (similar to above) columns to compare (simple text,
integer or boolean equality for now)
8) Run the comparison, determining --
a) Rows in one file but not the other (based on the keys)
b) Rows with matching keys but non-matching (changed) comparison columns
c) Rows that match on both key and comparison columns
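
A minimal sketch of how step 8 might look with a keyed lookup instead of nested loops. It is in Python rather than .NET purely for brevity; the function name `compare_tables`, the rows-as-dictionaries representation, and the sample column names are assumptions for illustration, not part of the app described above. It also assumes key values are unique within each file and already have comparable types (e.g. not `1` in one file and `"1"` in the other).

```python
def compare_tables(source_rows, target_rows, key_map, compare_map):
    """Classify rows by key match and compared-column equality.

    source_rows / target_rows: lists of dicts, one dict per row.
    key_map:     {source key column: target key column}
    compare_map: {source column: target column} to test for equality.
    """
    src_keys = list(key_map.keys())
    tgt_keys = list(key_map.values())

    # Index the target rows once by their key tuple (assumes unique keys).
    target_index = {tuple(row[c] for c in tgt_keys): row for row in target_rows}

    only_in_source, changed, matched = [], [], []
    seen = set()
    for row in source_rows:
        key = tuple(row[c] for c in src_keys)
        other = target_index.get(key)
        if other is None:
            only_in_source.append(row)          # case (a), source side
            continue
        seen.add(key)
        # Collect (source value, target value) for every differing column.
        diffs = {s: (row[s], other[t])
                 for s, t in compare_map.items() if row[s] != other[t]}
        if diffs:
            changed.append((key, diffs))        # case (b)
        else:
            matched.append(key)                 # case (c)

    # Case (a), target side: keys never reached from the source.
    only_in_target = [r for k, r in target_index.items() if k not in seen]
    return only_in_source, only_in_target, changed, matched


# Hypothetical sample data: "ID" in the source maps to "UserID" in the target.
source = [
    {"ID": 1, "Name": "Ann", "Active": True},
    {"ID": 2, "Name": "Bob", "Active": True},
    {"ID": 3, "Name": "Cy", "Active": False},
]
target = [
    {"UserID": 1, "FullName": "Ann", "Enabled": True},
    {"UserID": 2, "FullName": "Robert", "Enabled": True},
    {"UserID": 4, "FullName": "Dee", "Enabled": True},
]
only_src, only_tgt, changed, matched = compare_tables(
    source, target,
    key_map={"ID": "UserID"},
    compare_map={"Name": "FullName", "Active": "Enabled"},
)
```

The same idea should carry over to a .NET dictionary or hashtable keyed on the concatenated key columns; whether that beats a DataView with Find on real files would need measuring.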

Here's what I'm asking:
- I know how to do / have written 1 - 7.
- I also know how to do 8 above via brute force -- e.g. starting from the source,
reading row by row, column by column, and comparing keys and comparison columns
between the two...
* What do you recommend as the data structure? I'm currently using an
OleDB-generated dataset. Does XML or a collection make more sense?
* If dataset or XML, is there a better way than brute force? A DataView with
Find? Merge and detect differences?

Would appreciate any advice from anyone who's been there before. One
concern with the brute force approach is that these files could contain hundreds
to thousands of rows (yes, I wish my fellow employees would do more with
databases). In the worst case of, say, a thousand rows in each file, that
means up to 1,000 x 1,000 = 1 million key comparisons, I believe.
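
To put rough numbers on that worry, the sketch below counts key comparisons for the nested-loop approach against the number of hash operations for an indexed approach, in the worst case where no keys match. The function names and the synthetic integer keys are made up for illustration, and hash operations are only constant-time on average, so treat this as an order-of-magnitude check rather than a benchmark.

```python
def brute_force_key_comparisons(source_keys, target_keys):
    """Count key comparisons when each source key is scanned against the target."""
    count = 0
    for s in source_keys:
        for t in target_keys:
            count += 1          # one key comparison
            if s == t:
                break           # stop scanning once the key is found
    return count


def indexed_key_operations(source_keys, target_keys):
    """Count hash operations when the target keys are indexed first."""
    index = set()
    ops = 0
    for t in target_keys:
        index.add(t)            # one insert per target row
        ops += 1
    for s in source_keys:
        _found = s in index     # one lookup per source row
        ops += 1
    return ops


# Worst case: 1,000 rows per file and no keys in common.
source_keys = list(range(1000))
target_keys = list(range(1000, 2000))

brute = brute_force_key_comparisons(source_keys, target_keys)    # 1,000,000
indexed = indexed_key_operations(source_keys, target_keys)       # 2,000
```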

Any thoughts, comments, suggestions? Converting the source data to Access,
SQL Server or Oracle is not an option... tried it already (politics).

thanks in advance,
tim
Nov 12 '05 #1