Bytes IT Community

Combining data from different Excel sheets

Hi all
I need to add data from two Excel sheets (both in the same workbook)
to an existing table in my SQL DB.
The problem is that each sheet holds different fields for the same
record, though the records are sorted by row numbers.
(I had to split the fields across two sheets because Excel has a
limit of 256 columns per sheet)
My sheets are quite large (~55,000 rows and 200 columns each) and I'll
have to repeat this action many times, so some kind of automation
would be nice (VB, DTS package?)

I am new to SQL Server, so an example would be most helpful.

Many thanks,
Ilan
Jul 20 '05 #1
8 Replies


"Ilan" <le***@walla.co.il> wrote in message
news:78**************************@posting.google.com...
[original question snipped]


I don't really understand your description, but you might look at using DTS
to load the data into staging tables, then use T-SQL to move the data to the
production tables. That's a common general approach, but without more
specific details of what you're trying to do (ideally an exact description
of the data and the destination table structure), it's not easy to be more
precise.

Simon
Jul 20 '05 #2

Simon,
Sorry for not being clear enough.
What I have are two sheets, each holding 200 columns (fields 1-200 at
the first and fields 201-400 at the second), where each row (or
record) starts at the first sheet and continues at the second. All the
data is numeric.
What I need to do is somehow concatenate the data on the corresponding
rows from both sheets and (eventually) append them to an existing table
in MSSQL.
I could try to export each sheet as a separate CSV file and then work
with them as text files with some DOS manipulation, but since they are
very long (50,000 rows) that doesn't look any easier.
The data in the sheets is arranged according to the target SQL table;
that is, columns 1-200 in the first sheet are fields 1-200 in the
table, and columns 1-200 in the second sheet are fields 201-400.
I am new to MSSQL so I don't know how to work with the tables within
SQL, but if you show me the path, I'll follow your lead.

Thanks,
Ilan

"Simon Hayes" <sq*@hayes.ch> wrote in message news:<40**********@news.bluewin.ch>...
[previous reply snipped]

Jul 20 '05 #3

Here's how I might go about this:

1) Create the 400-field table on SQL Server. (I don't have an easy way to do
that without knowing the data.) If you don't have real field names, then use
Fld001, Fld002, Fld003, ... Fld400

2) Add a 401st field, RowID, int NOT NULL

3) Open your Excel workbook, insert a new column A on both sheets, put the
formula "=ROW(A1)" in cell A1, and fill it down.

4) Insert a row at row 1, and put in the same field names you had in your SQL
table: RowID, Fld001, Fld002, etc.

5) Place that Excel workbook either on the SQL Server machine directly, or on
a machine that the SQL Server can access with admin privileges.

6) Import the data (by right-clicking the destination table and choosing All
Tasks > Import Data). When the wizard comes up, choose the Excel file as the
source, and your database as the destination.

In the next wizard screen, choose "Use a query to copy data".

In the next screen, add Sheet1 and Sheet2, all fields.

In the next screen (I think it's the sort screen), you can blow by it.

In the next screen, specify the condition as Sheet1.RowID = Sheet2.RowID

In the next screen you tell it which table to append to. Pick the destination
table, and finish (run it!)

That should do it...assuming your destination table and your Excel file have
exactly the same field names.
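For reference, the query that the wizard builds in that "condition" step would look something like the sketch below. The sheet names (DTS shows Excel sheets with a trailing $) and the FldNNN names are assumptions; adjust to whatever your workbook actually uses.

```sql
-- Sketch of the DTS "Use a query" step: join the two sheets on RowID.
-- Sheet names and field names are assumptions; adjust to your workbook.
SELECT s1.RowID,
       s1.Fld001, s1.Fld002,   -- ... through s1.Fld200
       s2.Fld201, s2.Fld202    -- ... through s2.Fld400
FROM [Sheet1$] s1
INNER JOIN [Sheet2$] s2
    ON s1.RowID = s2.RowID
```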
Jul 20 '05 #4

le***@walla.co.il (Ilan) wrote in message news:<78**************************@posting.google.com>...
[previous messages snipped]

First of all, if you have a 400-column table, then you should probably
review your data model - there aren't many situations where there is a
genuine need for such a table. In fact, depending on what your data
is, you might find that you can't do this anyway, since you are
limited to 8060 bytes per row.

But if you really do need to do this, then one possible solution is
along the following lines:

1. Add a column in Excel to identify each row. Make sure that this
correctly 'links' the data on each sheet, i.e. row 1 on sheet A is
'continued' on row 1 on sheet B.

2. Create two staging tables, each with 201 columns, for the ID column
plus the 200 data columns.

3. Load the sheets into the staging tables with DTS

4. Insert the data into the final table:

INSERT INTO dbo.Destination
SELECT a.col1, a.col2, ..., a.col200,
       b.col201, b.col202, ..., b.col400
FROM dbo.StagingA a
JOIN dbo.StagingB b
    ON a.RowID = b.RowID
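As an alternative for step 3, the staging tables can also be loaded straight from the workbook with OPENROWSET instead of a DTS package. This is only a sketch: the Jet provider string, the workbook path, and the sheet names are assumptions you'd need to adjust.

```sql
-- Load each sheet into its staging table via the Jet OLE DB provider.
-- Provider version, path and sheet names are assumptions; adjust as needed.
INSERT INTO dbo.StagingA
SELECT * FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
    'Excel 8.0;Database=C:\Data\MetData.xls;HDR=YES',
    'SELECT * FROM [Sheet1$]');

INSERT INTO dbo.StagingB
SELECT * FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
    'Excel 8.0;Database=C:\Data\MetData.xls;HDR=YES',
    'SELECT * FROM [Sheet2$]');
```

Because it's plain T-SQL, this is also easy to wrap in a stored procedure and repeat for each new workbook, which addresses the automation requirement.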

However, I strongly suggest you review your data model, for the
reasons I mentioned above.

Simon
Jul 20 '05 #5

sq*@hayes.ch (Simon Hayes) wrote in message news:<60**************************@posting.google.com>...
[previous messages snipped]

Simon,
I spent the last weekend trying to get Excel to work with this amount
of data, without success. I guess I've reached its limits (there is a
good description of Excel memory limits at
http://www.decisionmodels.com/memlimitsc.htm , FYI).
So I decided I'll export the data to 10 separate CSV files (each
holding ~50 columns) and then use another application (e.g. MATLAB)
to join them back together. Another option is to run all the
calculations in MATLAB.
As for the data, what I have is meteorological data every 5 minutes
for 6 months (hence the 52,560 rows). From this data I'm calculating 4
other parameters for every 1 hour, 2 hours, ..., 96 hours (hence the
4x96 plus 20 original columns, giving 404 columns). Then I have a
total of 4 years of data for each meteorological station (yielding
420,480 rows) that needs to be multiplied by 15 meteorological
stations. Altogether I have over 6,000,000 records and 400 fields to
analyze. The only solution I can think of for this amount of data is
to put it in an SQL DB, and run queries that will be analyzed with a
statistical program (e.g. SPSS, JMP or other). For the record, this is
part of my PhD thesis on air pollution and meteorology in a coastal
region.
If anyone has any experience analyzing this amount of data that they
are willing to share, I'll be happy to learn.

Ilan
Jul 20 '05 #6

Ilan,

You didn't say if the data load procedure that I suggested (and Simon
reiterated, almost word-for-word) got your data into SQL Server.

Now you seem to be asking a different question, like "Gee, what tool should I
be using for this data?"

We can't make that decision unless we know exactly what kinds of calculations
you plan to do. If you're doing complex math analysis (using advanced
statistical models, complex numbers, or hyperbolic trigonometry), then you're
probably better off with a product like MATLAB.

If you're just doing normal stats (Avg, StDev, basic trig), then SQL Server
will shine.

Now, about your data model:

<<As for the data, what I have is meteorological data every 5 minutes
for 6 months (hence the 52,560 rows). For this data I'm calculating 4
other parameters for every 1 hour, 2 hours, ..., 96 hours (hence the
4x96 plus 20 original columns, giving 404 columns>>

OK, so you have 20 values coming in every 5 minutes, and then 4 calculated
values every hour up to 96 hours. I have to assume that your 4 calculated
values are "rolling" calculations; otherwise, your model would contain 4X96
duplicate columns of data every 12 rows...

...do I have it right so far?

Without knowing how the data is going to be used, it's hard to suggest a good
data model. Maybe the one-table design is good because you know you're always
showing all 400+ fields in your reports! (I don't know how any human could
intelligently parse such printed data, but meteorologists might be special...)

More likely, you're pulling reports like this: Show the 24/48/72/96 hour trends
for these 3 locations from May to August for [special calculation 1 of the 4].
That's a 12-cell report...much more manageable, useful and meaningful. If
that's more like how the data will be used, then you might break the massive
table into 5 tables:

Table1: The 20 fields
Table2: 96 fields for the first calc (+ the key field(s) from Table1)
Table3: 96 fields for the 2nd calc (+ the key field(s) from Table1)
Table4: 96 fields for the 3rd calc (+ the key field(s) from Table1)
Table5: 96 fields for the 4th calc (+ the key field(s) from Table1)
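A sketch of that split in DDL, with hypothetical key fields (a StationID plus a measurement timestamp are my assumptions about what identifies a row):

```sql
-- Hypothetical split of the wide table; key fields are assumptions.
CREATE TABLE dbo.Measurements (
    StationID   int      NOT NULL,
    MeasuredAt  datetime NOT NULL,
    Fld001 float, Fld002 float,  -- ... through Fld020
    PRIMARY KEY (StationID, MeasuredAt)
);

CREATE TABLE dbo.Calc1 (
    StationID   int      NOT NULL,
    MeasuredAt  datetime NOT NULL,
    Hr001 float, Hr002 float,    -- ... through Hr096
    PRIMARY KEY (StationID, MeasuredAt),
    FOREIGN KEY (StationID, MeasuredAt)
        REFERENCES dbo.Measurements (StationID, MeasuredAt)
);
-- Calc2 through Calc4 follow the same pattern as Calc1.
```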

Another approach would be to ditch the 96-field idea altogether, instead using
4X96 RECORDS in a related table, using sound relational models, just in case
your 4 calc fields become 5 some day! And what if your 20 fields become 21,
22?

That might be ultra-ugly in your case, because you'd have a parent table with 6
million records, and a child table with 2.4 BILLION records (each of the 6
million would have ~400 child records).

I only suggest that as a matter of completeness...a true relational approach
probably ISN'T what you need. Technically, the relational model dictates that
you'd have 120,000,000 Measurement records (6 Mil X 20), with everything else
(the 4X96) being calculated as needed! That's the absolute most-flexible way to
do this, because it doesn't care how many original (20) or calc (4) fields
there are.
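In that fully relational shape, each of the 20 readings becomes its own row. Table and column names here are hypothetical:

```sql
-- One row per station, timestamp and parameter: 6M x 20 = 120M rows.
-- All names are illustrative, not a prescribed schema.
CREATE TABLE dbo.Reading (
    StationID   int      NOT NULL,
    MeasuredAt  datetime NOT NULL,
    ParameterID smallint NOT NULL,  -- which of the 20 measured values
    Value       float    NOT NULL,
    PRIMARY KEY (StationID, MeasuredAt, ParameterID)
);
```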

But it might also be the SLOWEST execution. (if your calcs are just averages,
that's not a problem)
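For example, assuming a narrow readings table like the relational model above implies (all names hypothetical), a 24-hour rolling average can be computed on demand with a correlated subquery rather than stored in 96 columns:

```sql
-- 24-hour rolling average of one parameter, computed when queried.
-- Table and column names are hypothetical.
SELECT r.StationID, r.MeasuredAt,
       (SELECT AVG(r2.Value)
        FROM dbo.Reading r2
        WHERE r2.StationID   = r.StationID
          AND r2.ParameterID = r.ParameterID
          AND r2.MeasuredAt BETWEEN DATEADD(hour, -24, r.MeasuredAt)
                                AND r.MeasuredAt) AS Avg24h
FROM dbo.Reading r
WHERE r.ParameterID = 1;   -- hypothetical parameter of interest
```

With an index on (StationID, ParameterID, MeasuredAt) this stays tolerable; without one, it will indeed be slow at these volumes.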

Anyway, I'm just throwing all of this out as food for thought. This much I
know:

SQL Server can certainly load the data from those Excel files...
Jul 20 '05 #7

dc****@aol.comSPNOAM (DCM Fan) wrote in message news:<20***************************@mb-m19.aol.com>...
Ilan,

<<You didn't say if the data load procedure that I suggested (and Simon
reiterated, almost word-for-word) got your data into SQL Server.>>
I did try your procedure with a sample file and it works great,
except for two problems: First, Excel couldn't handle the calculations
of files this size (it stopped responding), so obviously I cannot use
it for my needs (I only found this limit AFTER posting my original
message). Second, I have to repeat this action many times, so I'm
hoping for a way to automate the process, like using VB.
<<We can't make that decision unless we know exactly what kinds of
calculations you plan to do. [...] If you're just doing normal stats (Avg,
StDev, basic trig), then SQL Server will shine.>>
Well, this is why I wanted to use Excel in the first place. I am
calculating some basic trig (sin, cos) and algebra, both across multiple
rows (e.g. every hour) and across columns. This is easier to do on a
spreadsheet or a matrix (hence MATLAB) than on a database.
<<OK, so you have 20 values coming in every 5 minutes, and then 4 calculated
values every hour up to 96 hours. I have to assume that your 4 calculated
values are "rolling" calculations; otherwise, your model would contain 4X96
duplicate columns of data every 12 rows... do I have it right so far?>>
Yes.
<<Without knowing how the data is going to be used, it's hard to suggest a
good data model. Maybe the one-table design is good because you know you're
always showing all 400+ fields in your reports! (I don't know how any human
could intelligently parse such printed data, but meteorologists might be
special...)>>
That's where 3D graphics come in, displaying the data in such a way
that a human eye can SEE it and make some sense of it all.
<<More likely, you're pulling reports like this: Show the 24/48/72/96 hour
trends for these 3 locations from May to August for [special calculation 1 of
the 4]. [...] If that's more like how the data will be used, then you might
break the massive table into 5 tables. [...]>>
Yes, I will also be making these kinds of analyses/queries on the data,
but I'm not sure I understand why 5 tables are better than putting all
the data in one. This is another issue where I know I need the advice of
experts.
<<Another approach would be to ditch the 96-field idea altogether, instead
using 4X96 RECORDS in a related table, using sound relational models, just in
case your 4 calc fields become 5 some day! And what if your 20 fields become
21, 22?>>

Hmmm. Well, this is certainly food for thought, and I see I'll have
to re-think the way I want to use the database and what kind of data
I'll need to pull out of it.
On second thought, this is one of the attractions of doing pure
research: you don't always know what kind of answers you'll get, and
what new questions these answers will raise.

Many thanks,
Ilan
PS, I'll be doing this thinking next week in Paris, France, on
vacation. Please excuse me if I don't post replies.
Jul 20 '05 #8

<<First, Excel couldn't handle the calculations
of files this size (it stopped responding), so obviously I cannot use
it for my needs (I only found this limit AFTER posting my original
message). Second, I have to repeat this action many times, so I'm
hoping for a way to automate the process, like using VB.>>

Oh, so you still need the Excel files, but you also need a way to load
historical data into SQL Server ultimately. I was under the impression that
Excel would go away completely....

<<Well, this is why I wanted to use Excel in the first place. I am
calculating some basic trig (sin, cos) and algebra, both across multiple
rows (e.g. every hour) and across columns. This is easier to do on a
spreadsheet or a matrix (hence MATLAB) than on a database.>>

Yes, easier for humans because of its 2-dimensional element. Databases are
more 3-D-ish, and a bit more difficult to grasp AT FIRST, but once you see
the light, you'll wish you had started in SQL. The programming language of
SQL Server (T-SQL) will give you every set operation you need...
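For instance, the kind of trig-plus-aggregate work described above maps directly onto built-in T-SQL functions. The table and column names below are made up for illustration, and the wind decomposition is just one possible convention:

```sql
-- Hourly averages plus a wind-vector style decomposition, in one pass.
-- dbo.FiveMinuteData and its columns are hypothetical names.
SELECT StationID,
       CONVERT(char(13), MeasuredAt, 120)     AS HourBucket,  -- yyyy-mm-dd hh
       AVG(Temperature)                       AS AvgTemp,
       STDEV(Temperature)                     AS SdTemp,
       AVG(WindSpeed * COS(RADIANS(WindDir))) AS AvgWindX,
       AVG(WindSpeed * SIN(RADIANS(WindDir))) AS AvgWindY
FROM dbo.FiveMinuteData
GROUP BY StationID, CONVERT(char(13), MeasuredAt, 120);
```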

<<That's where 3D graphics come in, displaying the data in such a way
that a human eye can SEE it and make some sense of it all.>>

No doubt....then you're obviously going to be using some other tool for that!

<<Yes, I will also be making these kinds of analyses/queries on the data,
but I'm not sure I understand why 5 tables are better than putting all
the data in one. This is another issue where I know I need the advice of
experts.>>

Well, if you're the only person who's going to use this tool, it might not
matter. But if you're going to build a web front-end where hundreds of users
can query the data and get reports, then it makes sense to break it up. If you
ever need data from all 5 tables, great, join them on the key fields. But if
you don't, then you can save processing time by limiting the domain to the
target calculations.

<<On second thought, this is one of the attractions of doing pure
research: you don't always know what kind of answers you'll get, and
what new questions these answers will raise.>>

But it seems that the technology is actually getting in the way of your
research...

Jul 20 '05 #9
