473,770 Members | 1,899 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Import/Normalize approach - column-DML, or loop

Hi all,

I've just finished almost all of what has turned out to be a real bear of a
project. It has to import data from a monthly spreadsheet export from another
program, and convert that into normalized data. The task is made more
difficult by the fact that the structure itself can vary from month to month
(in well defined ways).

So, I used the SQL-centric approach, taking vertical stripes at a time so
that, for instance, for each field with simple, repeating text data, I make a
group-by pass, inserting into the destination lookup table, then I do another
query, joining from the input table to the text fields in the lookups to get
the foreign keys for the main destination table. When I have multiple columns
that need to become 1-M, I make a pass for each of those columns, inserting
lookup record that identifies the split, then inserting the rows into the
many-side table, yadda, yadda, yadda.

All that was going swimmingly, and performing pretty well until I got to the
fields containing multiple, delimited values. My whole dedign is based on
using SQL/DML passes for everything, but the only way I could figure out to
make that work was to call a user defined function from within the query to
pull out, in successive query slices, argument 1, argument 2, etc. I was
going to use a where clause to exclude null results (input doesn't have
arguments n or above), and quit after the first pass with .RecordsAffecte d =
0.

Sounds good, but with a mere 8000 input rows, I had to cancel the first pass
after waiting about 20 minutes. Note that a whole import process takes about
3 minutes if this part is ommitted, and that includes about 40 vertical split
passes. The UDF is very simple, and performs quite fast from VB, but I guess
the overhead of calling this function from a query is VERY SEVERE. Just to
get this thing out the door, I've decided to just tnot split these for now
since we're not doing queries of that data yet.

So, my next thought is that, perhaps this step is better done by simply,
brute-force, cycling through a recordset of the input table, and inserting
rows into the destination tables. If I'm soing that, though, then am I really
getting any benefit worth speaking of by doing everything -else- as SQL-DML in
vertical slices, or would I have been much better off, just doing it the
procedural way, walking through the input table, and creating destination rows
one row at a time? I know it would have been easier to write, but here I was
trying to do things the "right" way.

If I do something like this again, would my code end up performing just as
well, and being easier to write and maintain using simple iteration rather
than SQL-DML?

Thanks for any opinions,

- Steve J
Nov 12 '05
20 2767
On Sat, 31 Jan 2004 22:35:32 GMT, Steve Jorgensen
<no****@nospam. nospam> wrote:
I got the impression your SQL statements were executing VBA functions.
I was talking (unclearly) of building SQL statements that contain only
literal values, no function calls.


Oh, right. I see what you're getting at.

The issue was that the only way I could figure out to use a query to split up
the multi-vlaued fields was to call a UDF.


And you were trying to approach this in a set-based way. Sorry; I
lost the context somewhere.

Did you consider using third-party software like DataJunction? I had
a project that involved moving loads of data between a SQL system and
Lotus Notes, and a third-party program really made my life easier.
(It wasn't DataJunction, but it was the same kind of thing.)

--
Mike Sherrill
Information Management Systems
Nov 12 '05 #21

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

3
4148
by: Christos TZOTZIOY Georgiou | last post by:
I found at least one case where decombining and recombining a unicode character does not result in the same character (see at end). I have no extensive knowledge about Unicode, yet I believe that this must be a problem of the Unicode 3.2 specification and not Python's. However, I haven't found out how the decomp_data (in unicodedata_db.h) is built, and neither did I find much more info about the specifics of Unicode 3.2. I thought about...
4
1615
by: Torsten Bronger | last post by:
Hallöchen! I have a file that looks a little bit like a C header file with a long list of variables (actually constants) definitions, e.g. VI_ATTR_TIMO = 0x54378 .... Actually I need this in a couple of low-level modules that are imported into the main module, and in the main module itself. They
3
6272
by: Doug Baroter | last post by:
Hi, One of my clients has the following situation. They use Access DB for data update etc. some business functions while they also want to view the Access data quickly and more efficiently in SQL Server 2000. Huge Access db with over 100 user tables, over 60 MB data. The DTS package that comes with SQL Server 2000 seems pretty "messy" in the sense that it assumes that one needs to do one time import only or accurately it does not...
4
3025
by: Steve Jorgensen | last post by:
I'm restarting this thread with a different focus. The project I'm working on now id coming along and will be made to work, and it's too late to start over with a new strategy. Still, I'm not coming to a firm conclusion over whether it was the better approach, and wonder if I should do it differently the next time I'm faced with a similar issue. I needed an app to automatically import from spreadsheets with a semi-dynamic structure,...
5
1931
by: Colleyville Alan | last post by:
I need to import a spreadsheet into an Access table. The spreadsheet has performance for mutual funds for various periods. The problem is that if there is no info for a particular period, the spreadsheet contains a dash in the cell (for example if the fun is only 4 years old and I look in the columns for 5-year and 10-year performance). I cannot change the dashes to blank with a global change, the negative numbers would lose the...
2
4533
by: lwhite | last post by:
MS SQL 2000 sp4 on WinXp Pro SP2 I am very new to this so please let me know what I can do to make it easier for you to understand the problem. I have a non delimited text file. This text file has several columns that for the most part are fixed length but.. The fixed format starts with a variable length number( char 10) as the first column and a max (char 30) description field and another (char 50) long description field.
6
3095
by: Hemant Shah | last post by:
Folks, Today, I was exporting a table in one database and then importing it in another database. The table in destination database was missing one column (my mistake while creating the table), but import did not complain about it. Source table: Column Type Type
8
3003
by: bbcrock | last post by:
I have three tables with a relationship I've never worked with before. Can anyone suggest/comment on the best way to create a third normal form relationship between these tables? The tables basically are: TRAIN (TRAIN_ID and 15 columns about train specs, etc) TRUCK (TRUCK_ID and 12 columns about truck specs, etc) TRANSPORTATION_ITEM This table has, among others, two columns, TRUCK_ID and TRAIN_ID. If
0
1012
bugboy
by: bugboy | last post by:
At what point is normalizing worth forgetting? I have a column varchar (4) that will repeat one of only eight possible values for each row. Is it really more efficient to break this out into it's own table and link it with keys than to just have it repeat? ... it's only 4 characters.. I suppose if there are only 8 values i could make the column a char (1).. would this be more efficient or am i just splitting hairs here? Everywhere i...
7
1096
by: Frank Millman | last post by:
Hi all I am familiar enough with the normal use of 'import'. However, I have found a use for it which seems effective, but I have not seen it used like this before, so I am not sure if there are any downsides. I know that when a module is imported the first time, it is 'executed'. This normally entails setting up constants, classes, functions, etc, that you want to make available to the importer.
0
9617
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
10254
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
9904
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
8929
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
1
7451
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
5354
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
0
5481
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
4007
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
2
3607
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.