473,386 Members | 2,129 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,386 software developers and data experts.

BCP Order on SQL Server

I have two SQL Server 2000 machines. The same file is sent nightly to
each server and a stored proc uses BULK INSERT to load it into a
staging table for processing.

Once I've bcp'ed it in, I put it into a temp table with an IDENTITY
column appended to it. (I need this identity column to group by later
on to remove duplicates.)

ie

select tempo.*,
IDENTITY(int, 1,1) AS ID_Num
into #test1
from tempExtract tempo

My question is : can I expect the ID_Num and the corresponding line of
the file copied to the table to be the same on each server? Ie will
each BCP into the staging table occur in the same order on both
servers given that the file, the BULK INSERT command and the indexes
are the same on each server.
Jul 20 '05 #1
10 4473
Thomas Richards (to**********@rocketmail.com) writes:
I have two SQL Server 2000 machines. The same file is sent nightly to
each server and a stored proc uses BULK INSERT to load it into a
staging table for processing.

Once I've bcp'ed it in, I put it into a temp table with an IDENTITY
column appended to it. (I need this identity column to group by later
on to remove duplicates.)

ie

select tempo.*,
IDENTITY(int, 1,1) AS ID_Num
into #test1
from tempExtract tempo

My question is : can I expect the ID_Num and the corresponding line of
the file copied to the table to be the same on each server? Ie will
each BCP into the staging table occur in the same order on both
servers given that the file, the BULK INSERT command and the indexes
are the same on each server.


No, you would need to have the identity column on the table you load
the file into. I don't know for sure that you can trust the IDENTITY
value to match the input file exactly, and if it works, it is likely
to by mere chance. That is, there is no committment from Microsoft
that it should work, and it could change in a future version of SQL
Server.

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 20 '05 #2
Erland Sommarskog <es****@sommarskog.se> wrote in message news:<Xn**********************@127.0.0.1>...
Thomas Richards (to**********@rocketmail.com) writes:
I have two SQL Server 2000 machines. The same file is sent nightly to
each server and a stored proc uses BULK INSERT to load it into a
staging table for processing.

Once I've bcp'ed it in, I put it into a temp table with an IDENTITY
column appended to it. (I need this identity column to group by later
on to remove duplicates.)

ie

select tempo.*,
IDENTITY(int, 1,1) AS ID_Num
into #test1
from tempExtract tempo

My question is : can I expect the ID_Num and the corresponding line of
the file copied to the table to be the same on each server? Ie will
each BCP into the staging table occur in the same order on both
servers given that the file, the BULK INSERT command and the indexes
are the same on each server.


No, you would need to have the identity column on the table you load
the file into. I don't know for sure that you can trust the IDENTITY
value to match the input file exactly, and if it works, it is likely
to by mere chance. That is, there is no committment from Microsoft
that it should work, and it could change in a future version of SQL
Server.


Thanks for that. The problem that I'm trying to get round is that I
have a key field and then one or more addresses. The key field and the
fields that make up the address are all chars/varchars. I want to pick
one arbitrary address to associate with the key and put in another
table. There are no business rules (eg always take the one with the
lowest street number) that will always identify just one of the
addresses. Originally, I thought group by key and line number and pick
the one with the lowest number. I would prefer to do this as it would
get the first entry in the file which would more than likely give me
the better address details. However as you've pointed out I can't
depend on the order when bcp'ed in. Is there any other way to do this
or would I have to get line number added to the file before SQL Server
processes it?
Jul 20 '05 #3
Thomas Richards (to**********@rocketmail.com) writes:
Thanks for that. The problem that I'm trying to get round is that I
have a key field and then one or more addresses. The key field and the
fields that make up the address are all chars/varchars. I want to pick
one arbitrary address to associate with the key and put in another
table. There are no business rules (eg always take the one with the
lowest street number) that will always identify just one of the
addresses. Originally, I thought group by key and line number and pick
the one with the lowest number. I would prefer to do this as it would
get the first entry in the file which would more than likely give me
the better address details. However as you've pointed out I can't
depend on the order when bcp'ed in. Is there any other way to do this
or would I have to get line number added to the file before SQL Server
processes it?


The only way to be sure is to add the line numbers yourself. This can be
done in two ways: 1) Manipulate the file, by running it through a program
that adds a line number. 2) Instead of writing a to file, have the program
to insert the data. In fact, you can still use bulk load, but you would
bulk from variables, using the BCP API.

However, BULK INSERT into a table with an IDENTITY gives you fairly good
odds, and as I understand your case, it does not seem to be a disaster,
if number is not what you expect. So I would go for that.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 20 '05 #4
Thanks,

I'm going to have problems adding the line numbers to the file unless
there's an easy way to do it that uses windows built-in functionality.
Unfortunately the file comes from a mainframe extract so the I can't
change the program either.

Just to be clear about what I'm doing, I bulk insert into a table that
has a KEY field and then one or more addresses eg:

KEY, ADDRESS1, ADDRESS2
-----------------------------
FRED, HOG STREET, HOGLAND
FRED, HOG STREET, HOGLANDIO

I need to take the key (ie FRED) and one address (doesn't matter
which) and put it into another table. However, I have to get the same
address on each server. That's more important than trying to get the
first one in the file.

I'm going to try out the bulk insert with identity. If I put a
clustered key on the table that is bulk inserted to on all columns, I
would assume that would force the order in the table to be the same on
both servers - what do you think?

Cheers
Tom
The only way to be sure is to add the line numbers yourself. This can be
done in two ways: 1) Manipulate the file, by running it through a program
that adds a line number. 2) Instead of writing a to file, have the program
to insert the data. In fact, you can still use bulk load, but you would
bulk from variables, using the BCP API.

However, BULK INSERT into a table with an IDENTITY gives you fairly good
odds, and as I understand your case, it does not seem to be a disaster,
if number is not what you expect. So I would go for that.

Jul 20 '05 #5
My latest thinking on this is to create a table the same as the table
holding the 'key' and address components but with an extra identity
type field.

Then insert into this table ordering by key + all columns. This will
force the sequence number to match the same row on each server and the
'key' fields to be sequentially next to each other. Then I can do a
group by, picking up the lowest sequence number.

eg

SELECT KEY,
ADDRESS1,
ADDRESS2,
IDENTITY(int,1,1) as Seq
INTO tempTable
FROM tempExtract
WHERE 1=2

INSERT INTO tempTable
SELECT KEY,
ADDRESS1,
ADDRESS2
FROM tempExtract
ORDER BY KEY,
ADDRESS1,
ADDRESS2

-- Finally get a key with just one address
SELECT KEY,
ADDRESS1,
ADDRESS2
FROM tempTable
WHERE SEQ = (SELECT MIN (Seq)
FROM tempTable sub
WHERE sub.KEY = tempTable.KEY)

Can you see any holes in that?!

Cheers
Tom
Jul 20 '05 #6
Thomas Richards (to**********@rocketmail.com) writes:
I'm going to have problems adding the line numbers to the file unless
there's an easy way to do it that uses windows built-in functionality.
Adding such line numbers is a very simple program that can be written
VBscript, Perl, C or whatever your preference is. The one catch is that
this is not very effcient if the file is huge.
I'm going to try out the bulk insert with identity. If I put a
clustered key on the table that is bulk inserted to on all columns, I
would assume that would force the order in the table to be the same on
both servers - what do you think?
What matters is the order that the rows hit the tables. My guess is that
a completely indexless table is better.
INSERT INTO tempTable
SELECT KEY,
ADDRESS1,
ADDRESS2
FROM tempExtract
ORDER BY KEY,
ADDRESS1,
ADDRESS2
...

Can you see any holes in that?!


The sad case is that neither is there any guarantee with an INSERT
statement that the IDENITTY value will reflect the ORDER BY statement.
But if you add OPTION (MAXDOP 1) to the query it usually works. MAXDOP 1
turns off parallelism which is the major reason the ORDER BY gets messed up.

But maybe you should rethink completely. Maybe you should bulk into one
server, remove the duplicates, and the replicate the result to the
second server. This could be done by a linked query, or bulking out and
in again.


--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 20 '05 #7
My latest thinking on this is to create a table the same as the table
holding the 'key' and address components but with an extra identity
type field.

Then insert into this table ordering by key + all columns. This will
force the sequence number to match the same row on each server and the
'key' fields to be sequentially next to each other. Then I can do a
group by, picking up the lowest sequence number.

eg

SELECT KEY,
ADDRESS1,
ADDRESS2,
IDENTITY(int,1,1) as Seq
INTO tempTable
FROM tempExtract
WHERE 1=2

INSERT INTO tempTable
SELECT KEY,
ADDRESS1,
ADDRESS2
FROM tempExtract
ORDER BY KEY,
ADDRESS1,
ADDRESS2

-- Finally get a key with just one address
SELECT KEY,
ADDRESS1,
ADDRESS2
FROM tempTable
WHERE SEQ = (SELECT MIN (Seq)
FROM tempTable sub
WHERE sub.KEY = tempTable.KEY)

Can you see any holes in that?!

Cheers
Tom
Jul 20 '05 #8
Thomas Richards (to**********@rocketmail.com) writes:
My latest thinking on this is to create a table the same as the table
holding the 'key' and address components but with an extra identity
type field.

Then insert into this table ordering by key + all columns. This will
force the sequence number to match the same row on each server and the
'key' fields to be sequentially next to each other. Then I can do a
group by, picking up the lowest sequence number.
...


That seems to the same suggestion, to which I answered once, so I
simply repear that answer:

The sad case is that neither is there any guarantee with an INSERT
statement that the IDENITTY value will reflect the ORDER BY statement.
But if you add OPTION (MAXDOP 1) to the query it usually works. MAXDOP 1
turns off parallelism which is the major reason the ORDER BY gets messed up.

But maybe you should rethink completely. Maybe you should bulk into one
server, remove the duplicates, and the replicate the result to the
second server. This could be done by a linked query, or bulking out and
in again.

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 20 '05 #9
Sorry, what I meant to post was:

I'm thinking of doing the following to ensure I get the same row on
each server (assuming identical collations).
It is supposed to only bring back the row that matches the TOP value
of all the fields concatenated. I've tested it on identical servers
and it appears to work and the theory seems fine to me. Can you see
anything wrong with this??

Thanks
Tom

SELECT KEY,
ADDRESS1,
ADDRESS2
FROM TABLE main
WHERE ISNULL(KEY,'Z') +
ISNULL(ADDRESS1, 'Z') +
ISNULL(ADDRESS2, 'Z') =
(SELECT TOP 1 ISNULL(KEY, 'Z') +
ISNULL(ADDRESS1, 'Z') +
ISNULL(ADDRESS2, 'Z') =
FROM TABLE sub
WHERE sub.KEY = main.KEY
ORDER BY ISNULL(KEY,'Z') +
ISNULL(ADDRESS1, 'Z') +
ISNULL(ADDRESS2, 'Z'))
Jul 20 '05 #10
Thomas Richards (to**********@rocketmail.com) writes:
I'm thinking of doing the following to ensure I get the same row on
each server (assuming identical collations).
It is supposed to only bring back the row that matches the TOP value
of all the fields concatenated. I've tested it on identical servers
and it appears to work and the theory seems fine to me. Can you see
anything wrong with this??

Thanks
Tom

SELECT KEY,
ADDRESS1,
ADDRESS2
FROM TABLE main
WHERE ISNULL(KEY,'Z') +
ISNULL(ADDRESS1, 'Z') +
ISNULL(ADDRESS2, 'Z') =
(SELECT TOP 1 ISNULL(KEY, 'Z') +
ISNULL(ADDRESS1, 'Z') +
ISNULL(ADDRESS2, 'Z') =
FROM TABLE sub
WHERE sub.KEY = main.KEY
ORDER BY ISNULL(KEY,'Z') +
ISNULL(ADDRESS1, 'Z') +
ISNULL(ADDRESS2, 'Z'))


You could get duplicates if you have some really weird data which
gives the same result for two concatenations, but I guess that is
a calculated risk. And you would get the same duplicates on both
servers.

Ah, there is one more catch - you must make sure that both databases
have the same collation. But since you can specify the collation per
column when you create the table, you can take care of that.

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 20 '05 #11

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

1
by: Bennett Haselton | last post by:
I want to get an ASP.Net hosting account with my ISP, and I'm trying to find out what level of access to the server is requried in order for me to view the server in Server Explorer in Visual...
4
by: Tryfon Gavriel | last post by:
Hi all I recently noticed when trying to optimise a major query of a chess website I am the webmaster of, that adding an order by for "gamenumber" which is a clustered index field as in for...
8
by: Bill Sonia | last post by:
I've written a Windows Service to send e-mails on events like OnStart, OnStop, OnShutDown using System.Web.Mail. It works for everything but OnShutdown. My guess is that for OnShutDown, once my...
0
by: Mamatha | last post by:
Hi i have a code to retrieve table structures of SQL Server database. It gives all the table names in particular databse with the column names and datatypes also. But the output displays not in...
104
by: Beowulf | last post by:
I have the view below and if I use vwRouteReference as the rowsource for a combo box in an MS Access form or run "SELECT * FROM vwRouteReference" in SQL Query Analyzer, the rows don't come through...
13
by: bevanward | last post by:
Hi All I am finding unexpected results when inserted into a newly created table that has a field of datatype int identity (1,1). Basically the order I sort on when inserting into the table is...
24
by: Hurricane | last post by:
When I create a view in SQL and include an ORDER BY clause i can see it in Management Studio. However, when I call the same view from an ASP page the order goes completely haywire. Any ideas?
10
by: Sharif Islam | last post by:
this is my query= "SELECT i.itemid,title,SortKey from Items AS i JOIN Links AS L ON (i.ItemID=L.ItemID) WHERE L.instructorID='12232' AND courseID='12' AND type='Audio' order by CASE WHEN Sortkey...
3
by: Shum | last post by:
Hi .. another quey... i just noticed that the entries being done to the sql server tables ( insert query in C# )are not in the correct order.. the last entry is not always the last record in the...
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.