I'm restarting this thread with a different focus.
The project I'm working on now is coming along and will be made to work, and
it's too late to start over with a new strategy. Still, I haven't come to a
firm conclusion about whether it was the better approach, and I wonder if I
should do it differently the next time I'm faced with a similar issue.
I needed an app to automatically import from spreadsheets with a semi-dynamic
structure, and I needed to handle two different but similar spreadsheet formats
implying different normalized target structures.
The obvious part of the design, which is what I did and would do again, is to
create a set of data dictionary tables that describe the target tables and
fields, and how they extend from a central table representing the fixed
import-to-row mapped part. Then I created a rules table that users can update
over time to drive the importer correctly, telling it the meanings of new
or differently named columns.
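Just to make the idea concrete (the table and column names below are invented for this post, not my actual schema), the rules table amounts to something along these lines:

' Hypothetical sketch of a user-editable rules table; the real thing is
' more elaborate and ties into the data dictionary tables.
Dim strDDL As String
strDDL = "CREATE TABLE tblImportRules (" & _
         "RuleID COUNTER CONSTRAINT pkImportRules PRIMARY KEY, " & _
         "SourceColumn TEXT(64), " & _
         "TargetTable TEXT(64), " & _
         "TargetField TEXT(64), " & _
         "TargetType TEXT(20), " & _
         "NullAlias TEXT(16))"
CurrentDb.Execute strDDL, dbFailOnError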
Now, the part I'm not so certain about, and the real question I'm mulling over,
is whether I was right to design this as query driven or whether it should
have been loop driven. Today, during debugging, I realized a whole new can of
worms I opened when I chose to do it primarily by building and running dynamic
insert queries rather than loops.
So - I already brought up the issue that the pure query approach didn't nicely
handle our strange case of some columns with long strings of delimited entries
that needed to be split out into 1-M branches, and we've covered that somewhat
in the other thread. Today, though, I realized the other issue, which is
error handling.
When the new data doesn't conform to the old rules, you have to fix the rules
in the rules table, but you often discover the broken rules when the import
fails (such as when the alias for Null in the input is now a dash instead of
"NA"), and the error description and code information aren't specific enough
to determine which column and which row or rows had what problem!
So as I'm testing it myself, I have to debug the code, rip out the offending
SQL, test it in vitro, and diagnose the problem. That's not hard for -me-,
but users can't do that every month when they run an import! So, with the
approach I took on this, it looks like I'll have to have my code generate even
more queries and pre-checks to catch and meaningfully report error conditions
before running the actual import queries, and hope I guess all the common,
general cases correctly.
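Roughly, the sort of generated pre-check I'm picturing looks like this (the column name here is made up; the real code would build the SQL from the rules table):

' Count the staged values that the import conversion would choke on,
' and report them before any insert query runs.
Dim rs As DAO.Recordset
Set rs = CurrentDb.OpenRecordset( _
    "SELECT Count(*) AS BadRows FROM tblStaging " & _
    "WHERE [Amount] Is Not Null AND Not IsNumeric([Amount])")
If rs!BadRows > 0 Then
    MsgBox "Column [Amount]: " & rs!BadRows & _
           " value(s) will not convert to a number.", vbExclamation
End If
rs.Close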
Based on this issue, the cards seem to be stacking more in favor of a simple,
loop-based system if I ever do it again.
Now, I wouldn't bother bringing this issue to this forum if I didn't still
have a nagging discomfort with the conclusion. After all, if I looked at
someone else's code for this kind of system, and saw nested loops, my thought
would usually be that the programmer was incompetent at database programming.
After all, every database programmer knows you try to let the database engine
handle nearly all data transformations.
So, am I questioning my own conclusions simply because I've grown accustomed
to thinking there's only one right way to handle relational data operations,
or is there some real reason for my discomfort that I have simply forgotten
with my conscious mind because it's so long since I've attempted to do
anything like this the "wrong" way with loops.
Your thoughts?
Steve Jorgensen wrote: I'm restarting this thread with a different focus.
The project I'm working on now is coming along and will be made to work, and it's too late to start over with a new strategy. Still, I haven't come to a firm conclusion about whether it was the better approach, and I wonder if I should do it differently the next time I'm faced with a similar issue.
I needed an app to automatically import from spreadsheets with a semi-dynamic structure, and I needed to handle two different but similar spreadsheet formats implying different normalized target structures.
The obvious part of the design, which is what I did and would do again, is to create a set of data dictionary tables that describe the target tables and fields, and how they extend from a central table representing the fixed import-to-row mapped part. Then I created a rules table that users can update over time to drive the importer correctly, telling it the meanings of new or differently named columns.
Now, the part I'm not so certain about, and the real question I'm mulling over, is whether I was right to design this as query driven or whether it should have been loop driven. Today, during debugging, I realized a whole new can of worms I opened when I chose to do it primarily by building and running dynamic insert queries rather than loops.
So - I already brought up the issue that the pure query approach didn't nicely handle our strange case of some columns with long strings of delimited entries that needed to be split out into 1-M branches, and we've covered that somewhat in the other thread. Today, though, I realized the other issue, which is error handling.
When the new data doesn't conform to the old rules, you have to fix the rules in the rules table, but you often discover the broken rules when the import fails (such as when the alias for Null in the input is now a dash instead of "NA"), and the error description and code information aren't specific enough to determine which column and which row or rows had what problem! So as I'm testing it myself, I have to debug the code, rip out the offending SQL, test it in vitro, and diagnose the problem. That's not hard for -me-, but users can't do that every month when they run an import! So, with the approach I took on this, it looks like I'll have to have my code generate even more queries and pre-checks to catch and meaningfully report error conditions before running the actual import queries, and hope I guess all the common, general cases correctly.
Based on this issue, the cards seem to be stacking more in favor of a simple, loop-based system if I ever do it again.
Now, I wouldn't bother bringing this issue to this forum if I didn't still have a nagging discomfort with the conclusion. After all, if I looked at someone else's code for this kind of system, and saw nested loops, my thought would usually be that the programmer was incompetent at database programming. After all, every database programmer knows you try to let the database engine handle nearly all data transformations.
I like forms to handle my data transformations. Lots of good validation.
You have to look at what you are doing. Instead of doing validation one record at
a time like a form, you are performing a batch update. That's nice if you have
clean data or are changing all dates from today to tomorrow. But you have dirty
data... perhaps lots of it.
I further doubt that if any of us were to follow you and maintain what you've done
we'd say "Damn, look at what Steve did, he used a loop! Can you believe he ever
helped others on CDMA?"
So, am I questioning my own conclusions simply because I've grown accustomed to thinking there's only one right way to handle relational data operations, or is there some real reason for my discomfort that I have simply forgotten with my conscious mind because it's so long since I've attempted to do anything like this the "wrong" way with loops?
Your thoughts?
In the old days you might perform a validation check first. If the data was
dirty, the entire job was rejected... to be resubmitted when the data was clean.
That is my first impression. It appears there are few validation checks in
your Excel sheets when data is entered, and you are discovering the errors during
the import process. Can't Excel be set up to have better data entry checks?
In your case you may be better off appending the records to a temp table and
validating each record. Tag those records that pass and those that fail, and
export the bad rows back to Excel. If the entire set needs to be clean, tell
them the import process in Access failed. You could even provide them with a
report of all records that failed and the reason.
I would think that looping records gives you the control you want and need. You
can add your validation checks easily and they can be modified easily. Once I
have a complex query set up I hate to go back and modify the thing.
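As a rough sketch (the staging table and field names here are made up), the whole validation pass can be as simple as:

' Walk the staging table, tag each row pass/fail, and record why.
Dim rs As DAO.Recordset
Set rs = CurrentDb.OpenRecordset("tblStaging", dbOpenDynaset)
Do Until rs.EOF
    rs.Edit
    If IsNull(rs!SourceKey) Then
        rs!ImportStatus = "FAIL"
        rs!FailReason = "SourceKey is blank"
    ElseIf Not IsNull(rs!Amount) And Not IsNumeric(rs!Amount) Then
        rs!ImportStatus = "FAIL"
        rs!FailReason = "Amount is not numeric"
    Else
        rs!ImportStatus = "PASS"
        rs!FailReason = Null
    End If
    rs.Update
    rs.MoveNext
Loop
rs.Close

The failure report is then just a query on ImportStatus = "FAIL".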
In your case I would be more concerned about accuracy than the speed of the
process... or what some non-existent programmer is going to think of you years
from now because you opted to use a loop. Once he futzes around with some queries
he may think "Steve was a genius!"
One last thing. If you do need to use queries, you can have your queries perform
validation checks via functions. You can pass your fields to the function(s) and
return true/false. The only negative I've encountered using functions is that the
datasheet has a slower repaint.
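For example (the function name and the particular rule are just made up, and it has to live in a standard module to be visible to queries):

' Returns True when the cell converts cleanly to a number or is one of
' the known Null aliases.
Public Function IsCleanNumber(v As Variant) As Boolean
    Dim s As String
    s = Trim$(v & "")
    If s = "" Or s = "NA" Or s = "-" Then
        IsCleanNumber = True
    Else
        IsCleanNumber = IsNumeric(s)
    End If
End Function

Then a query like SELECT * FROM tblStaging WHERE IsCleanNumber([Amount]) = False lists exactly the rows that would break the import.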
One other thing. Are you importing from Excel or linking? If linking, you could
query the sheet on your bad data. If you have the file name, I'd think you could
set up a temp link and run some queries to pinpoint the problem to the user. Or
do the updates right then and there. For example: anything not between 1 and 10
becomes NA, and any row where the column is blank is updated to Unknown. Once
you've done the update queries on the linked spreadsheet, import it and kill the link.
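Something along these lines (the path, link name, and column names are invented):

' Link the workbook, fix a known problem in place, pull the rows into a
' staging table with the same column layout, then kill the link.
DoCmd.TransferSpreadsheet acLink, acSpreadsheetTypeExcel9, _
    "xlsLinked", "C:\Imports\Monthly.xls", True
CurrentDb.Execute _
    "UPDATE xlsLinked SET [Status] = 'Unknown' WHERE [Status] Is Null", _
    dbFailOnError
CurrentDb.Execute _
    "INSERT INTO tblStaging SELECT * FROM xlsLinked", dbFailOnError
DoCmd.DeleteObject acTable, "xlsLinked"

If you'd rather not write to the workbook itself, the same updates work against the temp table after importing.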
Steve Jorgensen wrote: Your thoughts?
Ditch the static notions of competent and incompetent and base your
decisions on an analysis of the problem at hand.
Sounds to me like you made up your mind how you were going to approach
the solution before you looked at the problem.
Steve Jorgensen <no****@nospam.nospam> wrote in
news:83********************************@4ax.com: So, am I questioning my own conclusions simply because I've grown accustomed to thinking there's only one right way to handle relational data operations, or is there some real reason for my discomfort that I have simply forgotten with my conscious mind because it's so long since I've attempted to do anything like this the "wrong" way with loops?
Well, there's nothing wrong with looping where it is appropriate to
the task. It's just that many people use it when it's not, because
it's the only way they can think of (usually because they come
from procedural database environments like xBase).
That said, I still think you should stay with your present method
and simply insert a validation stage before the actual import.
I designed an application for a client who was importing data
exported from the UNOS organ transplant database. I did it like
this:
1. link to the text file.
2. check a few things in it to see that it's valid (has the expected
column names and so forth).
3. if it passes the structural tests of #2, import the text file
into the buffer import table.
4. now run tests on the data in the fields that are problematic.
5. create discrepancy report records for all the failures.
6. import the data and retain the attachment to the discrepancy
report record.
7. the user can then go through all the discrepancy records and fix
the problems manually.
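Step 5, for instance, can be nothing more than append queries of roughly this shape, one per test (the table and field names here are only illustrative, not the real ones):

' Record which buffer row failed which test, and why.
CurrentDb.Execute _
    "INSERT INTO tblDiscrepancy (BufferRowID, FieldName, Problem) " & _
    "SELECT RowID, 'DOB', 'Not a valid date' FROM tblImportBuffer " & _
    "WHERE DOB Is Not Null AND Not IsDate(DOB)", dbFailOnError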
Actually, this was a system where the client's application was
capturing some data that overlapped with the imported data (to
ensure accuracy of some crucial fields) and many of the
discrepancies were between the two data sources -- one of the most
common such discrepancies was reported date of birth! The
discrepancy resolution system included a system for creating an
email to the data supervisor of the institution where the data came
from so they could check their records and provide a definitive
answer as to the correct value.
Maybe that gives you some ideas.
I can't remember if I built in a threshold for number of
discrepancies that would abort the import if, say, 90% of the records
had a discrepancy in a single field. But you could consider such a
mechanism.
All that said, if I were in your position, I'd not do anything of
the sort. If you're chasing a moving target, I don't know that
there's any way to ever program for anything, so it's probably
cheaper to fix problems as they occur than it is to attempt to build
a system that will accommodate the errors.
--
David W. Fenton http://www.bway.net/~dfenton
dfenton at bway dot net http://www.bway.net/~dfassoc
On Sat, 31 Jan 2004 16:55:28 GMT, "David W. Fenton"
<dX********@bway.net.invalid> wrote: Steve Jorgensen <no****@nospam.nospam> wrote in news:83********************************@4ax.com :
So, am I questioning my own conclusions simply because I've grown accustomed to thinking there's only one right way to handle relational data operations, or is there some real reason for my discomfort that I have simply forgotten with my conscious mind because it's so long since I've attempted to do anything like this the "wrong" way with loops? Well, there's nothing wrong with looping where it is appropriate to the task. It's just that many people use it when it's not, because it's the only way they can think of (usually because they come from procedural database environments like xBase).
That said, I still think you should stay with your present method and simply insert a validation stage before the actual import.
Well, for the current project, certainly. I can't get a budget to
code half the app a second time!
I designed an application for a client who was importing data exported from the UNOS organ transplant database. I did it like this:
1. link to the text file.
2. check a few things in it to see that it's valid (has the expected column names and so forth).
Right, in my case, that's making sure every column has exactly one rule that
matches on its name.
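In code terms, that check comes down to something like this (the table names are stand-ins for my actual dictionary tables):

' Every staged column must match exactly one rule.
Dim fld As DAO.Field, n As Long
For Each fld In CurrentDb.TableDefs("tblStaging").Fields
    n = DCount("*", "tblImportRules", _
               "SourceColumn = '" & Replace(fld.Name, "'", "''") & "'")
    If n <> 1 Then
        Debug.Print fld.Name & " matched " & n & " rule(s)"
    End If
Next fld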
3. if it passes the structural tests of #2, import the text file into the buffer import table.
Yup. First, I copy the spreadsheet, check how many header rows there are,
mush them together, insert a row under the header with "!!!" in each cell, so
Access won't try and fail to treat columns as something other than text, then
create a temporary staging database, import the sheet copy into there, and
link from my importer app to the staging database. After that, I do some
basic pre-cleanup like changing blank strings to Nulls, changing Null alias
text to Nulls, etc.
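The pre-cleanup amounts to a set of generated update queries, roughly like this (the staging table name is a stand-in):

' Turn blank strings and the current Null alias into real Nulls in every
' text column of the staging table.
Dim fld As DAO.Field
For Each fld In CurrentDb.TableDefs("tblStaging").Fields
    If fld.Type = dbText Or fld.Type = dbMemo Then
        CurrentDb.Execute _
            "UPDATE tblStaging SET [" & fld.Name & "] = Null " & _
            "WHERE Trim([" & fld.Name & "] & '') IN ('', 'NA', '-')", _
            dbFailOnError
    End If
Next fld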
4. now run tests on the data in the fields that are problematic.
Here's the fun part, eh? One example of an issue I had was that a field with
mostly numeric data had a special code in just 3 rows. Until now, I had been
making no attempt to validate type conversions in columns prior to import
because the import query does type conversions, and I'd been assuming I would
just trap errors there. Obviously, that idea should have been tested in a
prototype. Since this system is table-driven, I guess my code will have to
inspect each destination column type, then run a check on the source column
that will feed it, applying the same transformation expression that will be
used in the import stage, and see if there's a problem.
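So the check will probably be along these lines (the rules table layout here is simplified and the names are made up):

' For each rule that feeds a numeric destination, apply the same kind of
' test the import conversion implies and count the values that fail it.
Dim rsRules As DAO.Recordset, rsBad As DAO.Recordset
Set rsRules = CurrentDb.OpenRecordset( _
    "SELECT SourceColumn FROM tblImportRules WHERE TargetType = 'Double'")
Do Until rsRules.EOF
    Set rsBad = CurrentDb.OpenRecordset( _
        "SELECT Count(*) AS BadRows FROM tblStaging " & _
        "WHERE [" & rsRules!SourceColumn & "] Is Not Null " & _
        "AND Not IsNumeric([" & rsRules!SourceColumn & "])")
    If rsBad!BadRows > 0 Then
        Debug.Print rsRules!SourceColumn & ": " & rsBad!BadRows & _
                    " value(s) will not convert"
    End If
    rsBad.Close
    rsRules.MoveNext
Loop
rsRules.Close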
5. create discrepancy report records for all the failures.
Clearly
6. import the data and retain the attachment to the discrepancy report record.
Actually, we'll want to fail if there are any discrepancies, and have the user
fix the import rules and/or target schema to eliminate the discrepancies.
Apply, lather, rinse, repeat.
7. the user can then go through all the discrepancy records and fix the problems manually.
We keep the old export files, and it's nice if the process is repeatable in
case we find any important errors or omissions in the import process later.
Actually, this was a system where the client's application was capturing some data that overlapped with the imported data (to ensure accuracy of some crucial fields) and many of the discrepancies were between the two data sources -- one of the most common such discrepancies was reported date of birth! The discrepancy resolution system included a system for creating an email to the data supervisor of the institution where the data came from so they could check their records and provide a definitive answer as to the correct value.
Maybe that gives you some ideas.
I can't remember if I built in a threshold for number of discrepancies that would abort the import if, say, 90% of the records had a discrepancy in a single field. But you could consider such a mechanism.
All that said, if I were in your position, I'd not do anything of the sort. If you're chasing a moving target, I don't know that there's any way to ever program for anything, so it's probably cheaper to fix problems as they occur than it is to attempt to build a system that will accommodate the errors.
Well, what I did was try to make a system that could handle all expected
categories of discrepancies by changing only the schema, the dictionary data,
and the rules, without needing to deal with the VBA code. I think, except for
validation, I've succeeded pretty well. I've batch imported a sequence of 4
months so far, and that included a month in which they changed the alias for
Null from "NA" to "-" and changed a few column names. There are bigger
changes in later data, but the way my code is structured, if the known cases
of these were going to be problematic, my code would not be working now, so I'm pretty
confident.