BCP Order on SQL Server

Thomas Richards

I have two SQL Server 2000 machines. The same file is sent nightly to
each server and a stored proc uses BULK INSERT to load it into a
staging table for processing.

Once I've bcp'ed it in, I put it into a temp table with an IDENTITY
column appended to it. (I need this identity column to group by later
on to remove duplicates.)

ie

select tempo.*,
IDENTITY(int, 1,1) AS ID_Num
into #test1
from tempExtract tempo

My question is : can I expect the ID_Num and the corresponding line of
the file copied to the table to be the same on each server? Ie will
each BCP into the staging table occur in the same order on both
servers given that the file, the BULK INSERT command and the indexes
are the same on each server.

Jul 20 '05 #1

Subscribe Post Reply

4473

Erland Sommarskog

Thomas Richards (to**********@rocketmail.com) writes:

I have two SQL Server 2000 machines. The same file is sent nightly to
each server and a stored proc uses BULK INSERT to load it into a
staging table for processing.

Once I've bcp'ed it in, I put it into a temp table with an IDENTITY
column appended to it. (I need this identity column to group by later
on to remove duplicates.)

ie

select tempo.*,
IDENTITY(int, 1,1) AS ID_Num
into #test1
from tempExtract tempo

My question is : can I expect the ID_Num and the corresponding line of
the file copied to the table to be the same on each server? Ie will
each BCP into the staging table occur in the same order on both
servers given that the file, the BULK INSERT command and the indexes
are the same on each server.

No, you would need to have the identity column on the table you load
the file into. I don't know for sure that you can trust the IDENTITY
value to match the input file exactly, and if it works, it is likely
to by mere chance. That is, there is no committment from Microsoft
that it should work, and it could change in a future version of SQL
Server.

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp

Jul 20 '05 #2

Thomas Richards

Erland Sommarskog <es****@sommarskog.se> wrote in message news:<Xn**********************@127.0.0.1>...

Thomas Richards (to**********@rocketmail.com) writes:
I have two SQL Server 2000 machines. The same file is sent nightly to
each server and a stored proc uses BULK INSERT to load it into a
staging table for processing.

Once I've bcp'ed it in, I put it into a temp table with an IDENTITY
column appended to it. (I need this identity column to group by later
on to remove duplicates.)

ie

select tempo.*,
IDENTITY(int, 1,1) AS ID_Num
into #test1
from tempExtract tempo

My question is : can I expect the ID_Num and the corresponding line of
the file copied to the table to be the same on each server? Ie will
each BCP into the staging table occur in the same order on both
servers given that the file, the BULK INSERT command and the indexes
are the same on each server.

No, you would need to have the identity column on the table you load
the file into. I don't know for sure that you can trust the IDENTITY
value to match the input file exactly, and if it works, it is likely
to by mere chance. That is, there is no committment from Microsoft
that it should work, and it could change in a future version of SQL
Server.

Thanks for that. The problem that I'm trying to get round is that I
have a key field and then one or more addresses. The key field and the
fields that make up the address are all chars/varchars. I want to pick
one arbitrary address to associate with the key and put in another
table. There are no business rules (eg always take the one with the
lowest street number) that will always identify just one of the
addresses. Originally, I thought group by key and line number and pick
the one with the lowest number. I would prefer to do this as it would
get the first entry in the file which would more than likely give me
the better address details. However as you've pointed out I can't
depend on the order when bcp'ed in. Is there any other way to do this
or would I have to get line number added to the file before SQL Server
processes it?

Jul 20 '05 #3

Erland Sommarskog

Thomas Richards (to**********@rocketmail.com) writes:

Thanks for that. The problem that I'm trying to get round is that I
have a key field and then one or more addresses. The key field and the
fields that make up the address are all chars/varchars. I want to pick
one arbitrary address to associate with the key and put in another
table. There are no business rules (eg always take the one with the
lowest street number) that will always identify just one of the
addresses. Originally, I thought group by key and line number and pick
the one with the lowest number. I would prefer to do this as it would
get the first entry in the file which would more than likely give me
the better address details. However as you've pointed out I can't
depend on the order when bcp'ed in. Is there any other way to do this
or would I have to get line number added to the file before SQL Server
processes it?

The only way to be sure is to add the line numbers yourself. This can be
done in two ways: 1) Manipulate the file, by running it through a program
that adds a line number. 2) Instead of writing a to file, have the program
to insert the data. In fact, you can still use bulk load, but you would
bulk from variables, using the BCP API.

However, BULK INSERT into a table with an IDENTITY gives you fairly good
odds, and as I understand your case, it does not seem to be a disaster,
if number is not what you expect. So I would go for that.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp

Jul 20 '05 #4

Thomas Richards

Thanks,

I'm going to have problems adding the line numbers to the file unless
there's an easy way to do it that uses windows built-in functionality.
Unfortunately the file comes from a mainframe extract so the I can't
change the program either.

Just to be clear about what I'm doing, I bulk insert into a table that
has a KEY field and then one or more addresses eg:

KEY, ADDRESS1, ADDRESS2
-----------------------------
FRED, HOG STREET, HOGLAND
FRED, HOG STREET, HOGLANDIO

I need to take the key (ie FRED) and one address (doesn't matter
which) and put it into another table. However, I have to get the same
address on each server. That's more important than trying to get the
first one in the file.

I'm going to try out the bulk insert with identity. If I put a
clustered key on the table that is bulk inserted to on all columns, I
would assume that would force the order in the table to be the same on
both servers - what do you think?

Cheers
Tom

The only way to be sure is to add the line numbers yourself. This can be
done in two ways: 1) Manipulate the file, by running it through a program
that adds a line number. 2) Instead of writing a to file, have the program
to insert the data. In fact, you can still use bulk load, but you would
bulk from variables, using the BCP API.

However, BULK INSERT into a table with an IDENTITY gives you fairly good
odds, and as I understand your case, it does not seem to be a disaster,
if number is not what you expect. So I would go for that.

Jul 20 '05 #5

Thomas Richards

My latest thinking on this is to create a table the same as the table
holding the 'key' and address components but with an extra identity
type field.

Then insert into this table ordering by key + all columns. This will
force the sequence number to match the same row on each server and the
'key' fields to be sequentially next to each other. Then I can do a
group by, picking up the lowest sequence number.

eg

SELECT KEY,
ADDRESS1,
ADDRESS2,
IDENTITY(int,1,1) as Seq
INTO tempTable
FROM tempExtract
WHERE 1=2

INSERT INTO tempTable
SELECT KEY,
ADDRESS1,
ADDRESS2
FROM tempExtract
ORDER BY KEY,
ADDRESS1,
ADDRESS2

-- Finally get a key with just one address
SELECT KEY,
ADDRESS1,
ADDRESS2
FROM tempTable
WHERE SEQ = (SELECT MIN (Seq)
FROM tempTable sub
WHERE sub.KEY = tempTable.KEY)

Can you see any holes in that?!

Cheers
Tom

Jul 20 '05 #6

Erland Sommarskog

Thomas Richards (to**********@rocketmail.com) writes:

I'm going to have problems adding the line numbers to the file unless
there's an easy way to do it that uses windows built-in functionality.
Adding such line numbers is a very simple program that can be written
VBscript, Perl, C or whatever your preference is. The one catch is that
this is not very effcient if the file is huge.
I'm going to try out the bulk insert with identity. If I put a
clustered key on the table that is bulk inserted to on all columns, I
would assume that would force the order in the table to be the same on
both servers - what do you think?
What matters is the order that the rows hit the tables. My guess is that
a completely indexless table is better.
INSERT INTO tempTable
SELECT KEY,
ADDRESS1,
ADDRESS2
FROM tempExtract
ORDER BY KEY,
ADDRESS1,
ADDRESS2
...

Can you see any holes in that?!

The sad case is that neither is there any guarantee with an INSERT
statement that the IDENITTY value will reflect the ORDER BY statement.
But if you add OPTION (MAXDOP 1) to the query it usually works. MAXDOP 1
turns off parallelism which is the major reason the ORDER BY gets messed up.

But maybe you should rethink completely. Maybe you should bulk into one
server, remove the duplicates, and the replicate the result to the
second server. This could be done by a linked query, or bulking out and
in again.

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp

Jul 20 '05 #7

Thomas Richards

Jul 20 '05 #8

Erland Sommarskog

Thomas Richards (to**********@rocketmail.com) writes:

My latest thinking on this is to create a table the same as the table
holding the 'key' and address components but with an extra identity
type field.

Then insert into this table ordering by key + all columns. This will
force the sequence number to match the same row on each server and the
'key' fields to be sequentially next to each other. Then I can do a
group by, picking up the lowest sequence number.
...

That seems to the same suggestion, to which I answered once, so I
simply repear that answer:

The sad case is that neither is there any guarantee with an INSERT
statement that the IDENITTY value will reflect the ORDER BY statement.
But if you add OPTION (MAXDOP 1) to the query it usually works. MAXDOP 1
turns off parallelism which is the major reason the ORDER BY gets messed up.

But maybe you should rethink completely. Maybe you should bulk into one
server, remove the duplicates, and the replicate the result to the
second server. This could be done by a linked query, or bulking out and
in again.

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp

Jul 20 '05 #9

Thomas Richards

Sorry, what I meant to post was:

I'm thinking of doing the following to ensure I get the same row on
each server (assuming identical collations).
It is supposed to only bring back the row that matches the TOP value
of all the fields concatenated. I've tested it on identical servers
and it appears to work and the theory seems fine to me. Can you see
anything wrong with this??

Thanks
Tom

SELECT KEY,
ADDRESS1,
ADDRESS2
FROM TABLE main
WHERE ISNULL(KEY,'Z') +
ISNULL(ADDRESS1, 'Z') +
ISNULL(ADDRESS2, 'Z') =
(SELECT TOP 1 ISNULL(KEY, 'Z') +
ISNULL(ADDRESS1, 'Z') +
ISNULL(ADDRESS2, 'Z') =
FROM TABLE sub
WHERE sub.KEY = main.KEY
ORDER BY ISNULL(KEY,'Z') +
ISNULL(ADDRESS1, 'Z') +
ISNULL(ADDRESS2, 'Z'))

Jul 20 '05 #10

Erland Sommarskog

Thomas Richards (to**********@rocketmail.com) writes:

I'm thinking of doing the following to ensure I get the same row on
each server (assuming identical collations).
It is supposed to only bring back the row that matches the TOP value
of all the fields concatenated. I've tested it on identical servers
and it appears to work and the theory seems fine to me. Can you see
anything wrong with this??

Thanks
Tom

SELECT KEY,
ADDRESS1,
ADDRESS2
FROM TABLE main
WHERE ISNULL(KEY,'Z') +
ISNULL(ADDRESS1, 'Z') +
ISNULL(ADDRESS2, 'Z') =
(SELECT TOP 1 ISNULL(KEY, 'Z') +
ISNULL(ADDRESS1, 'Z') +
ISNULL(ADDRESS2, 'Z') =
FROM TABLE sub
WHERE sub.KEY = main.KEY
ORDER BY ISNULL(KEY,'Z') +
ISNULL(ADDRESS1, 'Z') +
ISNULL(ADDRESS2, 'Z'))

You could get duplicates if you have some really weird data which
gives the same result for two concatenations, but I guess that is
a calculated risk. And you would get the same duplicates on both
servers.

Ah, there is one more catch - you must make sure that both databases
have the same collation. But since you can specify the collation per
column when you create the table, you can take care of that.

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp

Jul 20 '05 #11

by: Bennett Haselton | last post by:

I want to get an ASP.Net hosting account with my ISP, and I'm trying to find out what level of access to the server is requried in order for me to view the server in Server Explorer in Visual...

.NET Framework

Adding Order by on clustered index

by: Tryfon Gavriel | last post by:

Hi all I recently noticed when trying to optimise a major query of a chess website I am the webmaster of, that adding an order by for "gamenumber" which is a clustered index field as in for...

Microsoft SQL Server

Windows Services shutdown order

by: Bill Sonia | last post by:

I've written a Windows Service to send e-mails on events like OnStart, OnStop, OnShutDown using System.Web.Mail. It works for everything but OnShutdown. My guess is that for OnShutDown, once my...

C# / C Sharp

Exactly like SQL Server table order

by: Mamatha | last post by:

Hi i have a code to retrieve table structures of SQL Server database. It gives all the table names in particular databse with the column names and datatypes also. But the output displays not in...

Visual Basic .NET

104

ORDER BY in VIEW not working

by: Beowulf | last post by:

I have the view below and if I use vwRouteReference as the rowsource for a combo box in an MS Access form or run "SELECT * FROM vwRouteReference" in SQL Query Analyzer, the rows don't come through...

Microsoft SQL Server

out of order identity field - sql2000

by: bevanward | last post by:

Hi All I am finding unexpected results when inserted into a newly created table that has a field of datatype int identity (1,1). Basically the order I sort on when inserting into the table is...

Microsoft SQL Server

View Order Dissappearing

by: Hurricane | last post by:

When I create a view in SQL and include an ORDER BY clause i can see it in Management Studio. However, when I call the same view from an ASP page the order goes completely haywire. Any ideas?

Microsoft SQL Server

ORDER BY CASE

by: Sharif Islam | last post by:

this is my query= "SELECT i.itemid,title,SortKey from Items AS i JOIN Links AS L ON (i.ItemID=L.ItemID) WHERE L.instructorID='12232' AND courseID='12' AND type='Audio' order by CASE WHEN Sortkey...

Microsoft SQL Server

Order of insertion.

by: Shum | last post by:

Hi .. another quey... i just noticed that the entries being done to the sql server tables ( insert query in C# )are not in the correct order.. the last entry is not always the last record in the...

C# / C Sharp

Batch import of multiple excel files into the database

by: ryjfgjl | last post by:

If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...

Data Management

Merging data from multiple Excel files

by: ryjfgjl | last post by:

In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...

Data Management

Migrating Website to Cloud - Emmanuel Katto

by: emmanuelkatto | last post by:

Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel

General

Navigating the Data Structures and Algorithms (DSA)

by: BarryA | last post by:

What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

Algorithms / Advanced Math

Is that possible of reading the .csv file in column wise and the column have different lengths ?

by: Sonnysonu | last post by:

This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

C / C++

How to build RAID in BIOS?

by: Hystou | last post by:

There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

Computer Hardware

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...

General

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...

Windows Server

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

C / C++

Similar topics