By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
435,631 Members | 1,144 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 435,631 IT Pros & Developers. It's quick & easy.

Preventing Duplicate Rows on Insert

P: n/a
I have a table using an identity column as its Primary Key and two
columns (table reduced for simplicity) EmployeeNumber and ArrivalTime.

CREATE TABLE [tblRecords] (
[ID] [bigint] IDENTITY (1, 1) NOT NULL ,
[EmployeeNumber] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS
NOT NULL ,
[ArrivalTime] [datetime] NOT NULL ,
CONSTRAINT [PK_tblRecords] PRIMARY KEY CLUSTERED
(
[ID]
) ON [PRIMARY]
) ON [PRIMARY]
GO

I have an insert procedure that checks for duplicates before inserting
a new record:

IF (SELECT TOP 1 [ID] FROM tblRecords WHERE EmployeeNumber =
@SocialSecurity) IS NULL
BEGIN
INSERT INTO tblRecords(EmployeeNumber,ArrivalTime)
VALUES (@EmployeeNumber, @ArrivalTime)
SELECT SCOPE_IDENTITY()
END
ELSE
SELECT 0 AS DuplicateRecord

In 99.9% of the cases, this works well. However, in the event that the
insert attempts are literally "ticks" apart, the "SELECT TOP 1..."
command completes on both attempts before the first attempt completes.
So I end up with duplicate entries if the procedure is called multiple
times vey quickly. The system needs to prevent duplicate
EmployeeNumbers within the past 45 days so setting the EmployeeNumber
to UNIQUE would not work. I can check for older entries (45 days or
newer) very easily, but I do not know how to handle the times when the
procedure is called multiple times within milliseconds. Would a
TRANSACTION with a duplicate check after the INSERT with a ROLLBACK
work in this case? Any help is greatly appreciated!

-E

Jul 23 '05 #1
Share this Question
Share on Google+
18 Replies


P: n/a
Try this:

INSERT INTO tblRecords (employeenumber,arrivaltime)
SELECT @employeenumber, @arrivaltime
WHERE NOT EXISTS
(SELECT *
FROM tblRecords
WHERE employeenumber = @employeenumber
AND arrivaltime >= DATEADD(DAY,-45,@arrivaltime)) ;

As posted your table design is flawed because it has no alternate keys.
IDENTITY should never be the only key of a table.

--
David Portas
SQL Server MVP
--
Jul 23 '05 #2

P: n/a
A better solution is probably to use a trigger.

First, add the all-important UNIQUE constraint:

ALTER TABLE tblRecords ADD CONSTRAINT ak_employees UNIQUE
(employeenumber,arrivaltime)

Now here's a trigger which will enforce your business rule for UPDATEs as
well as INSERTs:

CREATE TRIGGER trg_employee ON tblRecords FOR UPDATE, INSERT
AS
IF EXISTS
(SELECT T1.employeenumber
FROM inserted AS T1
JOIN tblrecords AS T2
ON T1.employeenumber = T2.employeenumber
AND T1.arrivaltime <> T2.arrivaltime
AND T2.arrivaltime <= DATEADD(DAY,45,T1.arrivaltime)
AND T2.arrivaltime >= DATEADD(DAY,-45,T1.arrivaltime))
BEGIN
ROLLBACK TRAN
RAISERROR('Employee number repeated within 45 days',16,1)
END

GO

--
David Portas
SQL Server MVP
--
Jul 23 '05 #3

P: n/a
David Portas (RE****************************@acm.org) writes:
Try this:

INSERT INTO tblRecords (employeenumber,arrivaltime)
SELECT @employeenumber, @arrivaltime
WHERE NOT EXISTS
(SELECT *
FROM tblRecords
WHERE employeenumber = @employeenumber
AND arrivaltime >= DATEADD(DAY,-45,@arrivaltime)) ;


You would need serializable isolation level here. I think the UPDLOCK
in the SELECT statement is the best. (With plain HOLDLOCK you could
get a deadlock.)

As for whether a trigger is best... You don't have to lose sleep over
concurrent inserts, but in case of an error you do more work (INSERT +
ROLLBACK). Whether an error message is desired we don't know. Then again,
you can have both. The trigger to enforce the business rule, and the
application code to give the nice behaviour.

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 23 '05 #4

P: n/a
As mentioned by Erland, you could place your batch in a transaction, and
serialize the action either by settings the TRANSACTION ISOLATION LEVEL
to SERIALIZABLE or to use locking hints, as shown below:
BEGIN TRANSACTION

If NOT EXISTS (
SELECT *
FROM tblRecords (UPDLOCK HOLDLOCK)
WHERE EmployeeNumber = @SocialSecurity
AND ArrivalTime >= DATEADD(day,-45,@ArrivalTime)
)
Begin
INSERT INTO tblRecords(EmployeeNumber,ArrivalTime)
VALUES (@EmployeeNumber, @ArrivalTime)
SELECT SCOPE_IDENTITY()
End
Else
SELECT 0 AS DuplicateRecord

COMMIT TRANSACTION
HTH,
Gert-Jan
Elroyskimms wrote:

I have a table using an identity column as its Primary Key and two
columns (table reduced for simplicity) EmployeeNumber and ArrivalTime.

CREATE TABLE [tblRecords] (
[ID] [bigint] IDENTITY (1, 1) NOT NULL ,
[EmployeeNumber] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS
NOT NULL ,
[ArrivalTime] [datetime] NOT NULL ,
CONSTRAINT [PK_tblRecords] PRIMARY KEY CLUSTERED
(
[ID]
) ON [PRIMARY]
) ON [PRIMARY]
GO

I have an insert procedure that checks for duplicates before inserting
a new record:

IF (SELECT TOP 1 [ID] FROM tblRecords WHERE EmployeeNumber =
@SocialSecurity) IS NULL
BEGIN
INSERT INTO tblRecords(EmployeeNumber,ArrivalTime)
VALUES (@EmployeeNumber, @ArrivalTime)
SELECT SCOPE_IDENTITY()
END
ELSE
SELECT 0 AS DuplicateRecord

In 99.9% of the cases, this works well. However, in the event that the
insert attempts are literally "ticks" apart, the "SELECT TOP 1..."
command completes on both attempts before the first attempt completes.
So I end up with duplicate entries if the procedure is called multiple
times vey quickly. The system needs to prevent duplicate
EmployeeNumbers within the past 45 days so setting the EmployeeNumber
to UNIQUE would not work. I can check for older entries (45 days or
newer) very easily, but I do not know how to handle the times when the
procedure is called multiple times within milliseconds. Would a
TRANSACTION with a duplicate check after the INSERT with a ROLLBACK
work in this case? Any help is greatly appreciated!

-E

Jul 23 '05 #5

P: n/a
>> I have a table using an identity column [sic] Primary Key and two columns (table reduced for simplicity) EmployeeNumber and ArrivalTime. <<

Have you ever read the definition of a key??? An IDENTITY can never.
never be a PRIMARY KEY by defintion because it is not a subset of the
entities attributes. It is an exposed physical storage locator derived
from the state of the hardware and not the data model. Why did you
give it the vague, useless name "ID"?

Why did you put ":tbl-" on your table names? Read ISO-11179 for the
right way to name things. Rows are not records. Columns are not
fields. Why do you unneedlessly use proprietary datatypes and the most
procedural T-SQL you could write? Do you see destruction of portable
code and nonrelational programming as a good thing?

How do you have an arrival without a departure? Is the ISO model of
time wrong?

CREATE TABLE Events -- needs a better name
(emp_nbr CHAR (10) NOT NULL
REFERENCES Personnel (emp_nbr)
ON UPDATE CASCADE,
arrival_time DATETIME DEFAULT CURRRENT_TIMESTAMP NOT NULL,
depart_time DATETIME, -- null means current
PRIMARY KEY (emp_nbr, arrival_time));

You have missing the major points of RDBMS and nee to get some help,
more than the kludges you will get a newsgroup.

Jul 23 '05 #6

P: n/a
AK
>> An IDENTITY can never.
never be a PRIMARY KEY by defintion because it is not a subset of the
entities attributes. It is an exposed physical storage locator derived

from the state of the hardware and not the data model.
<<

with all due respect, Oracle's rowid is what are speking about -
"an exposed physical storage locator". Identity is an entirely
different beast...

Jul 23 '05 #7

P: n/a
>> Oracle's rowid is what are speaking about - "an exposed physical storage locator". Identity is an entirely different beast... <<

No, it is not significantly different. Both of them are numbers that
have nothing to do with the data model or the reality of the data
model. Both are created inside the hardware used to store the data.
Both use a bit of software to move the read-write head on a disk drive
directly to a physical location.

The rowid is merely easier programming and faster than IDENTITY, which
is using the indexing code. Both are exposed physical locators, not
surrogates as per Dr. Codd. That is not a logical difference.

The IDENTITY property is a hold-over from the days when the only way we
knew to storage data involved physically contigous records, made of
physically contigous fields, with tree-structured indexing. This is
the worse way to store relational data.

Jul 23 '05 #8

P: n/a
For those of you who offered suggestions to my question, thanks for
your help. It is greatly appreciated.

Celko,
I don't know what to make of your response, but I will try and address
your concerns individually:
Have you ever read the definition of a key??? Yes, I have. Thanks for asking.
An IDENTITY can never never be a PRIMARY KEY by defintion
because it is not a subset of the entities attributes. I appreciate your concern, but you are assuming things about these
entities that are incorrect. In this case the IDENTITY _is_ a subset
of the attributes. I only displayed a simplified version of the tables
because my question was not really specific to the tables in my
database.
Why did you give it the vague, useless name "ID"? Because it is neither vague nor useless to everyone who works with the
data in these tables. Again, you are assuming that you know everything
about this data and how it is being used. I will remind you that I
posted a simplified set of tables.
Why did you put ":tbl-" on your table names? Read ISO-11179 for the
right way to name things. I prefixed my table names with tbl because there is a lot more code
that I work with besides SQL. So to keep all of my objects straight,
their names also reflect what type of object they are. So, my textboxes
are named txt_____ and my buttons are named btn___... I'm sure you get
the idea.
Rows are not records. Columns are not fields. You are right. I actually knew that, but I prefer to use lamens terms
because I hate sounding like a prick no-it-all that is out to teach
everyone something. No one likes those types of people, don't you
agree?
Why do you unneedlessly use proprietary datatypes and the most
procedural T-SQL you could write? Do you see destruction of portable
code and nonrelational programming as a good thing? I'm not sure I understand your concern here. Again, you are assuming
that by seeing these 2 sample tables you have an omniscient view of my
database. My code is portable and my data relational. I know some
people are better than me, I'm OK with that. However, I choose not to
bash them simply because they know less than me. I hate those types of
people, don't you?
How do you have an arrival without a departure? Again, you assume that because you see a couple of tables you know all
about my data. I know you are the smartest person in these groups, but
here is something you didn't know. I did not include a departure time
because each record can have one arrival time and many departure times.
I think that is called a one-to-many relationship. If I'm wrong, I'm
sure you'll point it out. Because of this "relationship" between my
tables, departures are stored in a separate table. I think that is
called relational data, but I could be wrong... no one likes being
right all of the time. I hate those types of people, don't you?
CREATE TABLE Events -- needs a better name Events is a better name? What kind of vague and useless name is
"Event"? For your information, my actual table name is not Record I
just used it to irritate you. Did it work?
(emp_nbr CHAR (10) NOT NULL I prefer to spell out the table name so I would use number instead of
nbr. Sometimes, the people I work with don't read English very well and
they don't know what to do with nbr. But 'number' is in their
vocabulary.
depart_time DATETIME, -- null means current Only one departure per arrival? This doesn't look very relational to
me.
You have missing the major points of RDBMS and
nee to get some help, more than the kludges you
will get a newsgroup.

My data is relational, my code is portable, and my table names are
readable to anyone with a 3rd grade English vocabulary. I'm sorry if
that is not to your liking. I'll try and do better next time.

Jul 23 '05 #9

P: n/a
> In this case the IDENTITY _is_ a subset
of the attributes


That's dangerous. IDENTITY is generally suitable only as an artificial
key. It should not have any business meaning, in fact it should be
completely hidden from users of the system. If you assign any business
meaning to an IDENTITY key you set yourself up for a whole lot of
potential problems to do with migrating and deploying data in a live
environment. Also, your design is fundamental flawed if you lack a
natural candidate key - this is a frequent cause of incorrect results
and invalid data in poorly designed databases.

--
David Portas
SQL Server MVP
--

Jul 23 '05 #10

P: n/a
David Portas (RE****************************@acm.org) writes:
In this case the IDENTITY _is_ a subset
of the attributes
That's dangerous. IDENTITY is generally suitable only as an artificial
key. It should not have any business meaning, in fact it should be
completely hidden from users of the system. If you assign any business
meaning to an IDENTITY key you set yourself up for a whole lot of
potential problems to do with migrating and deploying data in a live
environment.


Another one that competes with Joe Celko to know everything about everyone
else's system, I see.

Well, it's often the case that what you call "natural keys" are in fact
someone else's arbitrary sequential number, assigned with IDENTITY or
something else.

There is a whole of real-worl data out there that does not have any
natural key that obeys the laws of the relational model.
Also, your design is fundamental flawed if you lack a natural candidate
key - this is a frequent cause of incorrect results and invalid data in
poorly designed databases.


Personally, I would be very careful with anyone else's design "fundamentally
flawed" when I have only seen glimpses of that design. Particularly when
it's clear from Elroyskimms that he has quite some experience of system
design. Just because he has not made the design that you or I(*) would
have done, does not mean that the design is wrong. When I think of it,
you should learn Perl. If nothing else, to learn the motto of Perl:
"There's more than one way to do it".
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 23 '05 #11

P: n/a
> Another one that competes with Joe Celko to know everything about everyone
else's system, I see.
I was giving my opinion and suggestions based on the information given.
I can't do more than that and I don't claim omniscience.

There is a whole of real-worl data out there that does not have any
natural key that obeys the laws of the relational model.


No-one says the world has to obey the laws of the relational model
(whatever that means). This is purely a question of how to model the
world in a table. We can represent ANY facts as relations (i.e. with
keys). Whether we choose to use IDENTITY or not is a practical issue in
SQL Server that has absolutely zero to do with the logical data model.
That's why I said "fundamental flaw".

--
David Portas
SQL Server MVP
--

Jul 23 '05 #12

P: n/a
AK
>> There is a whole of real-worl data out there that does not have any
natural key that obeys the laws of the relational model.
<<

I could not agree more.

Jul 23 '05 #13

P: n/a
Actually I may have mis-read Erland's statement. It could be read more
than one way and I'm not sure what interpretation you are agreeing
with.
"There is a whole of real-worl data out there that does not have any
natural key that obeys the laws of the relational model."


If this means [1] "there is a lot of data in database systems that
doesn't have a natural key" then I agree. If it means [2] "there is a
lot of information that cannot be modelled in an RDBMS without a
surrogate key" then I disagree and I think Erland is confusing logical
and physical models.

--
David Portas
SQL Server MVP
--

Jul 23 '05 #14

P: n/a
David Portas (RE****************************@acm.org) writes:
"There is a whole of real-worl data out there that does not have any
natural key that obeys the laws of the relational model."


If this means [1] "there is a lot of data in database systems that
doesn't have a natural key" then I agree. If it means [2] "there is a
lot of information that cannot be modelled in an RDBMS without a
surrogate key" then I disagree and I think Erland is confusing logical
and physical models.


I simply mean that there is a lot of data for which you cannot find
any useful natural primary key. Whether there exists one in theory, or none
at all, is completely uninterestering for us who gets paid to implement such
systems. (Because our customers could not care less.)

Typical examples of entities that do not have any useful natural key:

Persons (For certain subsets a natural key can be found)
Addresses (The entire address is a key in itself, but not useful as one.
And a lot of it can be NULL.)
Financial instruments
(Several competing schemes of natural keys exist, but none of
them are whole-covering.)


--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 23 '05 #15

P: n/a
> Whether there exists one in theory, or none
at all, is completely uninterestering for us who gets paid to implement such
systems. (Because our customers could not care less.)


If customers aren't interested it's surely because they pay database
professionals to worry about such things. That's what keeps you and I
gainfully employed.

--
David Portas
SQL Server MVP
--

Jul 23 '05 #16

P: n/a
David Portas (RE****************************@acm.org) writes:
Whether there exists one in theory, or none at all, is completely
uninterestering for us who gets paid to implement such systems.
(Because our customers could not care less.)


If customers aren't interested it's surely because they pay database
professionals to worry about such things. That's what keeps you and I
gainfully employed.


No, my customers pay me to get a system where they can store transactions
about financial instruments. If I tell them "no, you can't store this
instrument in the database, because there is no identification on it
than I can use for a primary key", they will give me very sinister glances.
Likewise if I tell them "for the system to be able to identify a customer,
you will need to enter, this, this and this". Or, "it will take 50 hours
to find out whether this entity has a potential primary key that we are
likely to be able to use anyway".

Simply put, our customers pays us to get solutions. They don't pay us
for do theoretical investigations. It's part of our professional role
to quickly identify that here is an entity, we are not likely to find
a useful primary key for.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 23 '05 #17

P: n/a
> If I tell them "no, you can't store this
instrument in the database, because there is no identification on it
than I can use for a primary key", they will give me very sinister glances.


Keys don't prevent you storing information. They prevent you storing
redundant data. You lose nothing by eliminating that redundancy and you
may gain a lot because redundancy can cause incorrect results.

Of course it is possible to produce correct results from redundant data
and you may still be able to meet the customer's immediate needs.
However, it's a golden rule that enterprise data outlives and extends
beyond the applications that manage it. That's true without exception
in my experience. Storing redundant data today means that not only your
application but also future developers and other applications have to
cope with the potential anomalies that result. Coping with data quality
issues costs money, plus it may prove impossible to resolve those
issues to a satisfactory level of confidence in the context of some
future set of requirements. Anyone who works on data integration
projects knows those experiences too well.

Customers who understand that data quality is valuable to the business
should also understand that the time taken to design a consistent
logical data model is necessary to guarantee that data quality. That's
the philosophy I work to anyway. There are other imperatives too of
course and it's always interesting and sometimes instructive to hear
how others approach problems.

--
David Portas
SQL Server MVP
--

Jul 23 '05 #18

P: n/a
David Portas (RE****************************@acm.org) writes:
If I tell them "no, you can't store this instrument in the database,
because there is no identification on it than I can use for a primary
key", they will give me very sinister glances.


Keys don't prevent you storing information. They prevent you storing
redundant data. You lose nothing by eliminating that redundancy and you
may gain a lot because redundancy can cause incorrect results.


If I require that when users registers instruments that they must
specify an ISIN code, users have two choices when they have an instrument
that do not have an ISIN code (or the users don't know the code):

1) don't register until they have found an ISIN code for the instrument.
2) make one up.

1) is not not a serious option, since an OTC instrument agreed on between
two parties never will have an ISIN code.

And what with 2)? Well, what if users invents a new code, instead of
using an existing instrument?

So much did that key help from preventing redudant data.

Another example: for customers you get the idea to use the national
registration number. Never mind the fine detail that potential customers
without a national registration number cannot get into the database.
Anyway, at the end of the year we need to produce reporting to the
tax authorities. A customer has exactly one tax country, but oops, here
is this guy who changed his taxation country in the middle of the
year. Of course, rather than making the tax country an attribute of
a customer, we could have a customertaxcountryhistory table, but, eh,
we can't fix that this year. Anyway, the workaround is simple: register
the customer anew, with a new accounts, so his transactions for the
part of the year we need to report him. Since we do not require uniqueness
on national registration number (and even less are foolish to make
this a key), this was possible.

Building a system too much on natural keys means that when you have
goofed on the model and missed the 0.1% exceptional case, mean that
users will have to fund ugly workarounds - which in worst cases can
mean entirely manual procedures. And don't tell me goofs in the design
don't happen. They do.

Bad data can also get into the system in ways you cannot protect the
system from. You can prevent a user from entering that a person is
living in Flance, by having countries as entity, and only permit a
set of defined countries. But you cannot prevent a user from entering
Switzerland for a customer who is living in Sweden.

So while you by means of constraints, keys etc can prevent some crap data
from getting into the database, you can not prevent all sorts of crap.
But if you are too restrictive, you put users in a straightjacket
where your system hinders their job.

In this context is a fallacy to turn every stone to find a natural primary
key. Sometimes it is simply not there. In an order registration system,
the key is the order number generated by the application. That key can
serve as a natural key in other system, but in the order registration
system we should not try to find some other key. If you want to detect
double-registration of orders, it's probably better to develop some sort
of fuzzy-logic report, where the users manually have to consider the
suspects.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 23 '05 #19

This discussion thread is closed

Replies have been disabled for this discussion.