By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
446,361 Members | 1,980 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 446,361 IT Pros & Developers. It's quick & easy.

BCP Order on SQL Server

P: n/a
I have two SQL Server 2000 machines. The same file is sent nightly to
each server and a stored proc uses BULK INSERT to load it into a
staging table for processing.

Once I've bcp'ed it in, I put it into a temp table with an IDENTITY
column appended to it. (I need this identity column to group by later
on to remove duplicates.)

ie

select tempo.*,
IDENTITY(int, 1,1) AS ID_Num
into #test1
from tempExtract tempo

My question is : can I expect the ID_Num and the corresponding line of
the file copied to the table to be the same on each server? Ie will
each BCP into the staging table occur in the same order on both
servers given that the file, the BULK INSERT command and the indexes
are the same on each server.
Jul 20 '05 #1
Share this Question
Share on Google+
10 Replies


P: n/a
Thomas Richards (to**********@rocketmail.com) writes:
I have two SQL Server 2000 machines. The same file is sent nightly to
each server and a stored proc uses BULK INSERT to load it into a
staging table for processing.

Once I've bcp'ed it in, I put it into a temp table with an IDENTITY
column appended to it. (I need this identity column to group by later
on to remove duplicates.)

ie

select tempo.*,
IDENTITY(int, 1,1) AS ID_Num
into #test1
from tempExtract tempo

My question is : can I expect the ID_Num and the corresponding line of
the file copied to the table to be the same on each server? Ie will
each BCP into the staging table occur in the same order on both
servers given that the file, the BULK INSERT command and the indexes
are the same on each server.


No, you would need to have the identity column on the table you load
the file into. I don't know for sure that you can trust the IDENTITY
value to match the input file exactly, and if it works, it is likely
to by mere chance. That is, there is no committment from Microsoft
that it should work, and it could change in a future version of SQL
Server.

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 20 '05 #2

P: n/a
Erland Sommarskog <es****@sommarskog.se> wrote in message news:<Xn**********************@127.0.0.1>...
Thomas Richards (to**********@rocketmail.com) writes:
I have two SQL Server 2000 machines. The same file is sent nightly to
each server and a stored proc uses BULK INSERT to load it into a
staging table for processing.

Once I've bcp'ed it in, I put it into a temp table with an IDENTITY
column appended to it. (I need this identity column to group by later
on to remove duplicates.)

ie

select tempo.*,
IDENTITY(int, 1,1) AS ID_Num
into #test1
from tempExtract tempo

My question is : can I expect the ID_Num and the corresponding line of
the file copied to the table to be the same on each server? Ie will
each BCP into the staging table occur in the same order on both
servers given that the file, the BULK INSERT command and the indexes
are the same on each server.


No, you would need to have the identity column on the table you load
the file into. I don't know for sure that you can trust the IDENTITY
value to match the input file exactly, and if it works, it is likely
to by mere chance. That is, there is no committment from Microsoft
that it should work, and it could change in a future version of SQL
Server.


Thanks for that. The problem that I'm trying to get round is that I
have a key field and then one or more addresses. The key field and the
fields that make up the address are all chars/varchars. I want to pick
one arbitrary address to associate with the key and put in another
table. There are no business rules (eg always take the one with the
lowest street number) that will always identify just one of the
addresses. Originally, I thought group by key and line number and pick
the one with the lowest number. I would prefer to do this as it would
get the first entry in the file which would more than likely give me
the better address details. However as you've pointed out I can't
depend on the order when bcp'ed in. Is there any other way to do this
or would I have to get line number added to the file before SQL Server
processes it?
Jul 20 '05 #3

P: n/a
Thomas Richards (to**********@rocketmail.com) writes:
Thanks for that. The problem that I'm trying to get round is that I
have a key field and then one or more addresses. The key field and the
fields that make up the address are all chars/varchars. I want to pick
one arbitrary address to associate with the key and put in another
table. There are no business rules (eg always take the one with the
lowest street number) that will always identify just one of the
addresses. Originally, I thought group by key and line number and pick
the one with the lowest number. I would prefer to do this as it would
get the first entry in the file which would more than likely give me
the better address details. However as you've pointed out I can't
depend on the order when bcp'ed in. Is there any other way to do this
or would I have to get line number added to the file before SQL Server
processes it?


The only way to be sure is to add the line numbers yourself. This can be
done in two ways: 1) Manipulate the file, by running it through a program
that adds a line number. 2) Instead of writing a to file, have the program
to insert the data. In fact, you can still use bulk load, but you would
bulk from variables, using the BCP API.

However, BULK INSERT into a table with an IDENTITY gives you fairly good
odds, and as I understand your case, it does not seem to be a disaster,
if number is not what you expect. So I would go for that.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 20 '05 #4

P: n/a
Thanks,

I'm going to have problems adding the line numbers to the file unless
there's an easy way to do it that uses windows built-in functionality.
Unfortunately the file comes from a mainframe extract so the I can't
change the program either.

Just to be clear about what I'm doing, I bulk insert into a table that
has a KEY field and then one or more addresses eg:

KEY, ADDRESS1, ADDRESS2
-----------------------------
FRED, HOG STREET, HOGLAND
FRED, HOG STREET, HOGLANDIO

I need to take the key (ie FRED) and one address (doesn't matter
which) and put it into another table. However, I have to get the same
address on each server. That's more important than trying to get the
first one in the file.

I'm going to try out the bulk insert with identity. If I put a
clustered key on the table that is bulk inserted to on all columns, I
would assume that would force the order in the table to be the same on
both servers - what do you think?

Cheers
Tom
The only way to be sure is to add the line numbers yourself. This can be
done in two ways: 1) Manipulate the file, by running it through a program
that adds a line number. 2) Instead of writing a to file, have the program
to insert the data. In fact, you can still use bulk load, but you would
bulk from variables, using the BCP API.

However, BULK INSERT into a table with an IDENTITY gives you fairly good
odds, and as I understand your case, it does not seem to be a disaster,
if number is not what you expect. So I would go for that.

Jul 20 '05 #5

P: n/a
My latest thinking on this is to create a table the same as the table
holding the 'key' and address components but with an extra identity
type field.

Then insert into this table ordering by key + all columns. This will
force the sequence number to match the same row on each server and the
'key' fields to be sequentially next to each other. Then I can do a
group by, picking up the lowest sequence number.

eg

SELECT KEY,
ADDRESS1,
ADDRESS2,
IDENTITY(int,1,1) as Seq
INTO tempTable
FROM tempExtract
WHERE 1=2

INSERT INTO tempTable
SELECT KEY,
ADDRESS1,
ADDRESS2
FROM tempExtract
ORDER BY KEY,
ADDRESS1,
ADDRESS2

-- Finally get a key with just one address
SELECT KEY,
ADDRESS1,
ADDRESS2
FROM tempTable
WHERE SEQ = (SELECT MIN (Seq)
FROM tempTable sub
WHERE sub.KEY = tempTable.KEY)

Can you see any holes in that?!

Cheers
Tom
Jul 20 '05 #6

P: n/a
Thomas Richards (to**********@rocketmail.com) writes:
I'm going to have problems adding the line numbers to the file unless
there's an easy way to do it that uses windows built-in functionality.
Adding such line numbers is a very simple program that can be written
VBscript, Perl, C or whatever your preference is. The one catch is that
this is not very effcient if the file is huge.
I'm going to try out the bulk insert with identity. If I put a
clustered key on the table that is bulk inserted to on all columns, I
would assume that would force the order in the table to be the same on
both servers - what do you think?
What matters is the order that the rows hit the tables. My guess is that
a completely indexless table is better.
INSERT INTO tempTable
SELECT KEY,
ADDRESS1,
ADDRESS2
FROM tempExtract
ORDER BY KEY,
ADDRESS1,
ADDRESS2
...

Can you see any holes in that?!


The sad case is that neither is there any guarantee with an INSERT
statement that the IDENITTY value will reflect the ORDER BY statement.
But if you add OPTION (MAXDOP 1) to the query it usually works. MAXDOP 1
turns off parallelism which is the major reason the ORDER BY gets messed up.

But maybe you should rethink completely. Maybe you should bulk into one
server, remove the duplicates, and the replicate the result to the
second server. This could be done by a linked query, or bulking out and
in again.


--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 20 '05 #7

P: n/a
My latest thinking on this is to create a table the same as the table
holding the 'key' and address components but with an extra identity
type field.

Then insert into this table ordering by key + all columns. This will
force the sequence number to match the same row on each server and the
'key' fields to be sequentially next to each other. Then I can do a
group by, picking up the lowest sequence number.

eg

SELECT KEY,
ADDRESS1,
ADDRESS2,
IDENTITY(int,1,1) as Seq
INTO tempTable
FROM tempExtract
WHERE 1=2

INSERT INTO tempTable
SELECT KEY,
ADDRESS1,
ADDRESS2
FROM tempExtract
ORDER BY KEY,
ADDRESS1,
ADDRESS2

-- Finally get a key with just one address
SELECT KEY,
ADDRESS1,
ADDRESS2
FROM tempTable
WHERE SEQ = (SELECT MIN (Seq)
FROM tempTable sub
WHERE sub.KEY = tempTable.KEY)

Can you see any holes in that?!

Cheers
Tom
Jul 20 '05 #8

P: n/a
Thomas Richards (to**********@rocketmail.com) writes:
My latest thinking on this is to create a table the same as the table
holding the 'key' and address components but with an extra identity
type field.

Then insert into this table ordering by key + all columns. This will
force the sequence number to match the same row on each server and the
'key' fields to be sequentially next to each other. Then I can do a
group by, picking up the lowest sequence number.
...


That seems to the same suggestion, to which I answered once, so I
simply repear that answer:

The sad case is that neither is there any guarantee with an INSERT
statement that the IDENITTY value will reflect the ORDER BY statement.
But if you add OPTION (MAXDOP 1) to the query it usually works. MAXDOP 1
turns off parallelism which is the major reason the ORDER BY gets messed up.

But maybe you should rethink completely. Maybe you should bulk into one
server, remove the duplicates, and the replicate the result to the
second server. This could be done by a linked query, or bulking out and
in again.

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 20 '05 #9

P: n/a
Sorry, what I meant to post was:

I'm thinking of doing the following to ensure I get the same row on
each server (assuming identical collations).
It is supposed to only bring back the row that matches the TOP value
of all the fields concatenated. I've tested it on identical servers
and it appears to work and the theory seems fine to me. Can you see
anything wrong with this??

Thanks
Tom

SELECT KEY,
ADDRESS1,
ADDRESS2
FROM TABLE main
WHERE ISNULL(KEY,'Z') +
ISNULL(ADDRESS1, 'Z') +
ISNULL(ADDRESS2, 'Z') =
(SELECT TOP 1 ISNULL(KEY, 'Z') +
ISNULL(ADDRESS1, 'Z') +
ISNULL(ADDRESS2, 'Z') =
FROM TABLE sub
WHERE sub.KEY = main.KEY
ORDER BY ISNULL(KEY,'Z') +
ISNULL(ADDRESS1, 'Z') +
ISNULL(ADDRESS2, 'Z'))
Jul 20 '05 #10

P: n/a
Thomas Richards (to**********@rocketmail.com) writes:
I'm thinking of doing the following to ensure I get the same row on
each server (assuming identical collations).
It is supposed to only bring back the row that matches the TOP value
of all the fields concatenated. I've tested it on identical servers
and it appears to work and the theory seems fine to me. Can you see
anything wrong with this??

Thanks
Tom

SELECT KEY,
ADDRESS1,
ADDRESS2
FROM TABLE main
WHERE ISNULL(KEY,'Z') +
ISNULL(ADDRESS1, 'Z') +
ISNULL(ADDRESS2, 'Z') =
(SELECT TOP 1 ISNULL(KEY, 'Z') +
ISNULL(ADDRESS1, 'Z') +
ISNULL(ADDRESS2, 'Z') =
FROM TABLE sub
WHERE sub.KEY = main.KEY
ORDER BY ISNULL(KEY,'Z') +
ISNULL(ADDRESS1, 'Z') +
ISNULL(ADDRESS2, 'Z'))


You could get duplicates if you have some really weird data which
gives the same result for two concatenations, but I guess that is
a calculated risk. And you would get the same duplicates on both
servers.

Ah, there is one more catch - you must make sure that both databases
have the same collation. But since you can specify the collation per
column when you create the table, you can take care of that.

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 20 '05 #11

This discussion thread is closed

Replies have been disabled for this discussion.