By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
438,856 Members | 2,163 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 438,856 IT Pros & Developers. It's quick & easy.

Particularly Challenging SQL Problem.

P: n/a
Table DDL:

create table test
(
inId int primary key identity,
inFK int not null,
inSeq int not null,
dtDate datetime
)

Data Insert:
insert into test
select 1,1, getdate()

WAITFOR DELAY '000:00:01'

insert into test
select 1,1, getdate()

WAITFOR DELAY '000:00:01'

insert into test
select 1,1, getdate()

WAITFOR DELAY '000:00:01'

insert into test
select 2,1, getdate()

WAITFOR DELAY '000:00:01'

insert into test
select 2,1, getdate()

WAITFOR DELAY '000:00:01'

insert into test
select 3,1, getdate()

WAITFOR DELAY '000:00:01'

insert into test
select 4,1, getdate()
If we select on this table:
inId inFK inSeq dtDate

----------- ----------- -----------
------------------------------------------------------
1 1 1 2005-02-01 12:54:40.967
2 1 1 2005-02-01 12:54:41.967
3 1 1 2005-02-01 12:54:42.967
4 2 1 2005-02-01 12:54:43.967
5 2 1 2005-02-01 12:54:44.967
6 3 1 2005-02-01 12:54:45.983
7 4 1 2005-02-01 12:54:47.077

(7 row(s) affected)
Problem:
What I would like to do (using SQL and not a cursor) is to update the
value of inSeq to its ordinal position, this will be based on the # of
occurences of inFK. For Example, I would like to run a sql statement
that would transform the data to:

update test
set inSEQ = (some sql)

select * from test - This would then produce:

inId inFK inSeq dtDate

----------- ----------- -----------
------------------------------------------------------
1 1 1 2005-02-01 12:54:40.967
2 1 2 2005-02-01 12:54:41.967
3 1 3 2005-02-01 12:54:42.967
4 2 1 2005-02-01 12:54:43.967
5 2 2 2005-02-01 12:54:44.967
6 3 1 2005-02-01 12:54:45.983
7 4 1 2005-02-01 12:54:47.077
(7 row(s) affected)
Any help would be greatly appreciated.

TFD

Jul 23 '05 #1
Share this Question
Share on Google+
17 Replies


P: n/a
SSK
Hi

The following solution will work if the inserted records (with
same inFK value ) are in sequential order only.

update test set inSeq = (inid - (select min(inid) from test b where
a.infk = b.infk ) + 1) from test a

Thanks
Ssk

If the Records are
LineVoltageHalogen wrote:
Table DDL:

create table test
(
inId int primary key identity,
inFK int not null,
inSeq int not null,
dtDate datetime
)

Data Insert:
insert into test
select 1,1, getdate()

WAITFOR DELAY '000:00:01'

insert into test
select 1,1, getdate()

WAITFOR DELAY '000:00:01'

insert into test
select 1,1, getdate()

WAITFOR DELAY '000:00:01'

insert into test
select 2,1, getdate()

WAITFOR DELAY '000:00:01'

insert into test
select 2,1, getdate()

WAITFOR DELAY '000:00:01'

insert into test
select 3,1, getdate()

WAITFOR DELAY '000:00:01'

insert into test
select 4,1, getdate()
If we select on this table:
inId inFK inSeq dtDate

----------- ----------- -----------
------------------------------------------------------
1 1 1 2005-02-01 12:54:40.967
2 1 1 2005-02-01 12:54:41.967
3 1 1 2005-02-01 12:54:42.967
4 2 1 2005-02-01 12:54:43.967
5 2 1 2005-02-01 12:54:44.967
6 3 1 2005-02-01 12:54:45.983
7 4 1 2005-02-01 12:54:47.077

(7 row(s) affected)
Problem:
What I would like to do (using SQL and not a cursor) is to update the
value of inSeq to its ordinal position, this will be based on the # of occurences of inFK. For Example, I would like to run a sql statement that would transform the data to:

update test
set inSEQ = (some sql)

select * from test - This would then produce:

inId inFK inSeq dtDate

----------- ----------- -----------
------------------------------------------------------
1 1 1 2005-02-01 12:54:40.967
2 1 2 2005-02-01 12:54:41.967
3 1 3 2005-02-01 12:54:42.967
4 2 1 2005-02-01 12:54:43.967
5 2 2 2005-02-01 12:54:44.967
6 3 1 2005-02-01 12:54:45.983
7 4 1 2005-02-01 12:54:47.077
(7 row(s) affected)
Any help would be greatly appreciated.

TFD


Jul 23 '05 #2

P: n/a
I can probably control this by creating a clusterd primary key on inId,
inFk, and sdDate, do you concur?

TFD

Jul 23 '05 #3

P: n/a
No. A clustered index is irrelevant. It has no impact on the standard,
documented behaviour of an UPDATE statement. The results of certain unusual
and ambiguous UPDATE statements which are undefined in the documentation may
be affected by indexes but the behaviour of those remains the same in each
case: the result is undefined and therefore unreliable.

If inseq is based purely on the values of infk and dtdate then there is no
obvious need to put the column in the table at all. Derive the sequence when
you query the table or add the numbering at client-side:

SELECT T1.inid, T1.infk,
COUNT(*) AS inseq, T1.dtdate
FROM Test AS T1
JOIN Test AS T2
ON T1.infk = T2.infk
AND T1.dtdate >= T2.dtdate
GROUP BY T1.inid, T1.infk, T1.dtdate

--
David Portas
SQL Server MVP
--
Jul 23 '05 #4

P: n/a
On 1 Feb 2005 09:58:01 -0800, LineVoltageHalogen wrote:

(snip)
Problem:
What I would like to do (using SQL and not a cursor) is to update the
value of inSeq to its ordinal position, this will be based on the # of
occurences of inFK. For Example, I would like to run a sql statement
that would transform the data to:

(snip)

Hi TFD,

Thanks for providing the create table and insert statements and the
expected output to clarify your need.

The following update statement will set the inSeq values as requested:

UPDATE test
SET inSeq = (SELECT COUNT(*)
FROM test AS t
WHERE t.inFK = test.inFK
AND t.dtDate <= test.dtDate)

By the way - using datatype-prefixes to column names is really not a
recommended practice. Would you really want to go over all your code if
the datatype of one of your columns has to be changed?

Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)
Jul 23 '05 #5

P: n/a
David, thank you for your response. Doing stuff client side is not an
option for me, the example I put forth is just the gist of what I need
to do. I am actually implementing an ETL for a BI tool and this
example demonstrates the nature of the real problem I am solving. So,
an update statement is required, I also don't have the luxury of
knowing how many inFk,inSeq pairs will be coming in until after the
load completes. Can your query be modified to accomodate an update?
TFD

Jul 23 '05 #6

P: n/a
Hugo, if I understand correctly this query does not depend on the
physical order of the data on disk?

Jul 23 '05 #7

P: n/a
SQL statements never depend on the physical order of the data on disk.
That's a basic principle of an RDBMS.

Why do you ask?

--
David Portas
SQL Server MVP
--
Jul 23 '05 #8

P: n/a
I just want to understand, that is all. I was thinking that if they
had been sorted when I kicked off the update that SQL would just start
with the first record and then rip through them all from begining to
end. That would seem like the most natural action on SQL's part, how
else would it decide on the order in which is was going to execute the
update? Let's assume we have a simple query such as:

update table
set row = 'new value'

Wouldn't SQL just go to the head of the set and begin updating until it
reaches the end?

L

Jul 23 '05 #9

P: n/a
David Portas (RE****************************@acm.org) writes:
If inseq is based purely on the values of infk and dtdate then there is
no obvious need to put the column in the table at all. Derive the
sequence when you query the table or add the numbering at client-side:

SELECT T1.inid, T1.infk,
COUNT(*) AS inseq, T1.dtdate
FROM Test AS T1
JOIN Test AS T2
ON T1.infk = T2.infk
AND T1.dtdate >= T2.dtdate
GROUP BY T1.inid, T1.infk, T1.dtdate


The performance of this type of query is not always that fantastic, so
if you need the row numbering often, it may indeed be a good idea to
compute it once.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 23 '05 #10

P: n/a
The Relational Model and set-based SQL keep the logical results of a any
data manipulation abstracted from the way the data is physically stored. A
table is logically unordered and cannot be "sorted" but irrespective of the
order in which data is physically stored the result of any operation on the
data is the same. This principle is called Physical Data Independence. You
should read a book on relational basics.

--
David Portas
SQL Server MVP
--
Jul 23 '05 #11

P: n/a
David, again thank you for taking the time to inform me. I am quite
aware of Physical Independence and I would bet that the question I am
trying to ask probably won't be answered in a book on the basics of
theory. I am just curious how SQL actually handles this internally,
how it writes to disk, how it chooses an access path, etc. I know it
is not relevant to my problem but sometimes I like to visualize what is
going on at the physical implementation level.

Regards, TFD.

Jul 23 '05 #12

P: n/a
SQL Server's Query Optimizer chooses the access path to the data based on
analysis of indexes and statistics for your table(s). The Optimizer may
choose a different plan on different occassions, even for the same query and
table structure.

If you want to read more about SQL Server internals I recommend "Inside SQL
Server" by Kalen Delaney.

--
David Portas
SQL Server MVP
--
Jul 23 '05 #13

P: n/a
LineVoltageHalogen (tr****************@yahoo.com) writes:
David, again thank you for taking the time to inform me. I am quite
aware of Physical Independence and I would bet that the question I am
trying to ask probably won't be answered in a book on the basics of
theory. I am just curious how SQL actually handles this internally,
how it writes to disk, how it chooses an access path, etc. I know it
is not relevant to my problem but sometimes I like to visualize what is
going on at the physical implementation level.


The bottom line is that you cannot rely on physical ordering.

In 6.5 and previous version if you had a clustered index on a table
and said

SELECT * FROM tbl

you could be pretty sure that you would get the rows ordered according
to the clustered index. Same if you had an unindexed table, you would
get back the rows in the order they had been inserted.

But from SQL7 this is no longer true. This is because the SQL Server
engine has improved, and now can return data from multiple output
streams. If you have a somewhat large table with, say, 20000 rows, and
say SELECT * FRON tbl, you may contiguous blocks of ids, but there
will be some random variation of the blocks.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 23 '05 #14

P: n/a
On 1 Feb 2005 14:22:13 -0800, LineVoltageHalogen wrote:
I just want to understand, that is all. I was thinking that if they
had been sorted when I kicked off the update that SQL would just start
with the first record and then rip through them all from begining to
end.
Hi L,

In theory, an UPDATE statement is carried out instantaneously. One moment,
all rows have the "old" value; the next moment, all rows are changed to
the "new" value.

In practice, nothing can be done instantaneously. The computer needs some
time to fetch data from disk or cache, calculate the new values, push the
new values back to the cache and write the log entries to disk. Depending
on table size, available indexes, workload and lots of other factors, your
UPDATE statement will take anything from microseconds to hours or even
more.

During the time the UPDATE statement is processing, the number of changed
rows will be gradually increasing. Each changed row is also locked - this
means that no other process will get access to the data. All these locks
are held until the entire transaction is finished.

What happens if you fire a long-running UPDATE, then try to read some rows
from another connection? Well, there are basically two possibilities:
either the row you read has not yet been affected by the UPDATE - in that
case, you'll get to see the "old" value. But if the row has already been
changed by the UPDATE, you'll see nothing - the execution of your query is
postponed until the lock on the required row is lifted. Once the UPDATE is
completely done, the required row can be read, and you'll get the new data
returned.

The net effect of this is, that until the UPDATE statement has finished,
other people can't ever see the new data - either they see the old data,
or they are "put on hold" (blocked). After the UPDATE statement has
finished, other people will only see the new data. So even though the
UPDATE can't really be carried out instanteneously, the combination of
locking and blocking ensures that the system behaves as if it is.
That would seem like the most natural action on SQL's part, how
else would it decide on the order in which is was going to execute the
update? Let's assume we have a simple query such as:

update table
set row = 'new value'

Wouldn't SQL just go to the head of the set and begin updating until it
reaches the end?


SQL Server is entirely free to choose it's own strategy for carrying out
the update. The chosen order might depend on many factors, like indexes
available, workload, table statistics or maybe even the weather - but
since the system behaves as if all rows were affected at the same time, it
really doesn't matter what order SQL Server picks.

Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)
Jul 23 '05 #15

P: n/a
Understood. Thank You.

Jul 23 '05 #16

P: n/a
Excellent information. Thank you very much. This is exactly what I
was looking for.

TFD

Jul 23 '05 #17

P: n/a
>> Let's assume we have a simple query such as:
update table
set row = 'new value'
Wouldn't SQL just go to the head of the set and begin updati*ng until
it reaches the end? <<

No! And the phrase "head of the set" and "end" make absolutley no
sense. Sets have no ordering. This is high school math, not
programming.

SQL is a set-oriented language. The UPDATE has to act as if it follows
these steps:
1) Mark all the rows that qualify in the WHERE clause
2) Construct a new image of the set of rows to be modified
3) Delete the old rows as a set
4) Insert the new rows as a set
5) Do a commit or rollback as needed

This is parallel behavior, not sequential file behavior. For example,

UPDATE Foobar
SET a=b, b=a;

will swap the the values of a and b because the entire new row is
constructed at once as an element of a set. In a sequential file
system you would the assignments left to right and both columns (fields
in a file system) would get the value of b.

Jul 23 '05 #18

This discussion thread is closed

Replies have been disabled for this discussion.