select * in views

Peter (so*****@someplace.com) writes:

I dont know exactly why but some say:

A select * from makes sql server does a table scan.

Is that true,

No. A query like:

SELECT * FROM sometable WHERE primarykey = 12

will use the index on PK.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx

Apr 15 '06 #3

Peter wrote:

People are telling me it is bad to put

select * from <atable>

in a view. I better should list all fields of the table inside the
definition of the view.

I dont know exactly why but some say:

A select * from makes sql server does a table scan.

Is that true, even if i put a where on the select on the view? And what if i
dont list all fields in the select on the view?

There are two reasons speaking against select * in views and select * in
general.
1. Typically your app does not need all columns.
So by using select * you will:
a) flow wider rows than needed
b) force the DBMS to access the data page even if, in reality a mere
index access would have been sufficient. Once you access the data page
anyway the optimizer will be tempted to use more table scans, but that's
really secondary damage.
Within teh context of a view you will force the optimizer to do more
work than needed. I.e. it needs to drop unused columns which may or may
not work depending on teh capabilities of the DBMS.

2. When a column is added to the table after the view is created the SQL
standard required "conservative" semantics. That is teh existing view
will not pick up the new column.
If however you drop and recreate the view (for whatever reason) the view
will pick up the new column. This can cause some rather unexpected
behavior which may be hard to debug.
The select * in a way is a time bomb.

I use SELECT * as a convenience for throw-away, ad-hoc queries, but not
for anything related to a production system.

Cheers
Serge
--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab

Apr 15 '06 #4

Peter

Serge,

I see you are a IBM expert. Is your story true for DB2 only or do you know
it is true for Microsoft SQL Server 2000 too?

Because, if i do a select on a view (select * from <atable>) with only a
projection (only a few colums in the select list), I see in the execution
plan a clustered index scan with a minimized returned data row size. Also,
when I use a where clause on a indexed column, I see an index seek in the
query plan. Also, in the query plan, I dont see view names, but only table
names. It seems to me it works like documented by Microsoft, in the
execution plan, the view is replaces by the underlying view logics, but only
what is nessecary.

Indeed, when I add a new column to the underlying table, the view is not
changed until I recompile the view. What is the problem with that if
existing code never do a select * on the view? And if existing code does a
select *, then, I think this code wants all columns. So it is better to
recompile.

Do I oversee something? Please give a reaction if I am right or I oversee
something.

"Serge Rielau" <sr*****@ca.ibm.com> wrote in message
news:4a************@individual.net...

Peter wrote:
People are telling me it is bad to put

select * from <atable>

in a view. I better should list all fields of the table inside the
definition of the view.

I dont know exactly why but some say:

A select * from makes sql server does a table scan.

Is that true, even if i put a where on the select on the view? And what
if i dont list all fields in the select on the view?

There are two reasons speaking against select * in views and select * in
general.
1. Typically your app does not need all columns.
So by using select * you will:
a) flow wider rows than needed
b) force the DBMS to access the data page even if, in reality a mere
index access would have been sufficient. Once you access the data page
anyway the optimizer will be tempted to use more table scans, but that's
really secondary damage.
Within teh context of a view you will force the optimizer to do more work
than needed. I.e. it needs to drop unused columns which may or may not
work depending on teh capabilities of the DBMS.

2. When a column is added to the table after the view is created the SQL
standard required "conservative" semantics. That is teh existing view will
not pick up the new column.
If however you drop and recreate the view (for whatever reason) the view
will pick up the new column. This can cause some rather unexpected
behavior which may be hard to debug.
The select * in a way is a time bomb.

I use SELECT * as a convenience for throw-away, ad-hoc queries, but not
for anything related to a production system.

Cheers
Serge
--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab

Apr 16 '06 #5

Steve Dassin

Perhaps we can get a hole of an MS developer by posting ms sql server
questions in comp.databases.ibm-db2.

:)

Apr 16 '06 #6

Peter (so*****@someplace.com) writes:

I see you are a IBM expert. Is your story true for DB2 only or do you know
it is true for Microsoft SQL Server 2000 too?

Because, if i do a select on a view (select * from <atable>) with only a
projection (only a few colums in the select list), I see in the
execution plan a clustered index scan with a minimized returned data
row size. Also, when I use a where clause on a indexed column, I see an
index seek in the query plan. Also, in the query plan, I dont see view
names, but only table names. It seems to me it works like documented by
Microsoft, in the execution plan, the view is replaces by the underlying
view logics, but only what is nessecary.

Indeed, when I add a new column to the underlying table, the view is not
changed until I recompile the view. What is the problem with that if
existing code never do a select * on the view? And if existing code does a
select *, then, I think this code wants all columns. So it is better to
recompile.

While Serge has more experienc of DB2 than SQL Server, I don't think his
observations are out of whack. The current implementation may forgive you,
but the next may not.

Since I use views very rarely overall muyself, I'm not sure why people are
so keen on using SELECT * in views.

I can think of two cases where it makes sense to use SELECT * in a view
definition:
1) The view presents a subset of table for row-level security.
2) The view is logically a table, that is implemented as several. That is,
partitioned views.

Then again, since you have to refresh the view when you change the
underlying tables, you could just as well update the source code for
it as well.

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx

Apr 16 '06 #7

Peter

Erland,

Thanks for your reply.

The views are a interface to another database, or are a interface to the
database where they live. To make things can change in the underlaying
table, for example, to put a restriction on rows on them or to place them in
another database.

The concept is the views returns the same columns as the underlaying tables.

In one case, some tables are in different databases, within another
database, those tables are all available by views. The transaction tables
are in its own database. Reference tables are in others.

The problem I have with this solution is I can not make indexed views on the
interface views.

The way I look at it is that when the implementation of SQL Server changes
in a next version, so the select * causes problems, I can change that anyway
and replace the asterix with the column names. I then make the cost of extra
maintenance overhead when nessecary. In the mean time, I didnt put any extra
development time in naming each column. When something change in the table,
I just recompile the view, without worrying about which columns are added.

Let me put my question in another way:

Is there an extra performance overhead now in SQL Server 2000/2005 when I
use select * in views instead of naming each column of the table?

Peter

"Erland Sommarskog" <es****@sommarskog.se> wrote in message
news:Xn**********************@127.0.0.1...

Peter (so*****@someplace.com) writes:
I see you are a IBM expert. Is your story true for DB2 only or do you
know
it is true for Microsoft SQL Server 2000 too?

Because, if i do a select on a view (select * from <atable>) with only a
projection (only a few colums in the select list), I see in the
execution plan a clustered index scan with a minimized returned data
row size. Also, when I use a where clause on a indexed column, I see an
index seek in the query plan. Also, in the query plan, I dont see view
names, but only table names. It seems to me it works like documented by
Microsoft, in the execution plan, the view is replaces by the underlying
view logics, but only what is nessecary.

Indeed, when I add a new column to the underlying table, the view is not
changed until I recompile the view. What is the problem with that if
existing code never do a select * on the view? And if existing code does
a
select *, then, I think this code wants all columns. So it is better to
recompile.

While Serge has more experienc of DB2 than SQL Server, I don't think his
observations are out of whack. The current implementation may forgive you,
but the next may not.

Since I use views very rarely overall muyself, I'm not sure why people are
so keen on using SELECT * in views.

I can think of two cases where it makes sense to use SELECT * in a view
definition:
1) The view presents a subset of table for row-level security.
2) The view is logically a table, that is implemented as several. That is,
partitioned views.

Then again, since you have to refresh the view when you change the
underlying tables, you could just as well update the source code for
it as well.

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx

Apr 16 '06 #8

Peter (so*****@someplace.com) writes:

Is there an extra performance overhead now in SQL Server 2000/2005 when I
use select * in views instead of naming each column of the table?

No.

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx

Apr 16 '06 #9

David Portas

Peter wrote:

The way I look at it is that when the implementation of SQL Server changes
in a next version, so the select * causes problems, I can change that anyway
and replace the asterix with the column names. I then make the cost of extra
maintenance overhead when nessecary. In the mean time, I didnt put any extra
development time in naming each column. When something change in the table,
I just recompile the view, without worrying about which columns are added.

Let me put my question in another way:

Is there an extra performance overhead now in SQL Server 2000/2005 when I
use select * in views instead of naming each column of the table?

If the queries are executed by SQL Server on the same server as the
view then there may not be any measurable performance overhead of
SELECT * (except perhaps during compilation). However, you are mistaken
about something else. Recompiling a view containing SELECT * will NOT
necessarily cause it to reflect changes made to the base tables. For
this reason alone, it is a bad idea to use SELECT * in views. See the
following example, which was tested on 2000 SP4 and 2005.

CREATE TABLE dbo.t1 (x INT NOT NULL PRIMARY KEY, z1 INT NULL);
GO
CREATE VIEW dbo.v1 AS
SELECT * FROM dbo.t1
GO
ALTER TABLE dbo.t1 DROP COLUMN z1 ;
ALTER TABLE dbo.t1 ADD z2 INT ;
GO
EXEC dbo.sp_recompile 'dbo.v1' ;
/* Notice that the second column still exists as Z1 in the view */
SELECT x,z1 FROM dbo.v1 ;

--
David Portas, SQL Server MVP

Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.

SQL Server Books Online:
http://msdn2.microsoft.com/library/m...S,SQL.90).aspx
--

Apr 16 '06 #10

Well, first of all IBM is not a product. I could very well be an IBM SQL
Server expert (working for IGS or Websphere for example), but I am a DB2
expert, which I take it was what you meant. :-)

My comments were generic and I'm confident they apply to any SQL RDBMS
out there.

Couple of observations:
1. A thumb rule for using TBSCAN over (index scan & page fetch) is when
the selectivity is worse than, say 5%. Sure any DBSM will pick it's own
secret number, but I don't doubt for a moment that SQL Server, having a
cost based optimizer will eventually say: "Crap. You make me go after
columns not in my index and I'm going to touch next to at least one row
in any page anyway!".
Enabling queries to use covering indexes is important. SELECT * is
counter productive on that end.

2. If all you did in you experiment was a simple CREATE VIEW V AS SELECT
* FROM T then this doesn't say too much.
Thsi is way I made my statement relative. If you SQL is more complex
then mileage will vary depending e.g. on the version of your RDBMS since
you are relying on the optimizer.
SQL is about:
You tell the RDBMS WHAT you want. The RDBMS takes care of the HOW to get it.
As developers it is our responsibility to specify the WHAT correctly.

I'm referring back to the view vs. stored procedure thread here. In that
thread users didn't trust views at all. I don't go that far, but I
wouldn't trust any optimizer ( SQL Server, DB2, you name it) to fix up
all my lazyness through divine intuition.

Lastly I respectfully disagree with the subsequent note in this thread
that over specification of columns in a view has no performance impact.
While this may be true in most cases once the statement invoking the
view is compiled, you are relying on the compiler to optimize out the
unused columns. And that is code path and costs you CPU on first
compile. Whether you can actually "feel" that depends on cache friendly
your app is.

Cheers
Serge
--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab

Apr 16 '06 #11

Serge Rielau (sr*****@ca.ibm.com) writes:

Lastly I respectfully disagree with the subsequent note in this thread
that over specification of columns in a view has no performance impact.

I think Peter's question actually was whether there was a performance
impact of saying "SELECT *" instead of listing all columns in the table
explicitly, and for all my dislike for SELECT * in production code, I
can't think of any reason why SELECT * should be any more expensive in
SQL Server.

I completely agree with that best is to include exactly those columns
for which there is an actual need.

One problem I often fight at work is that I want to drop a column, or
drastically change the meaning of it. I suspect that it is not really
in use, but still I find a bunch of stored procedures that return this
column in a result set. But I can also see that these procedures aim
at returning the universe, so I have no idea whether the value is used
for something. (Usually, I end up dropping the column anyway.)
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx

Apr 16 '06 #12

Serge Rielau wrote:

1. A thumb rule for using TBSCAN over (index scan & page fetch) is when
the selectivity is worse than, say 5%. Sure any DBSM will pick it's own
secret number,

Serge,

I think that threshold selectivity is not a constant - for SQL Server
it also depends on the width of bookmarks. So, if bookmarks are 4 byte
integers, the threshold selectivity is lower than if bookmarks are
50-byte character fields.

Similarly, besides selectivity DB2 also considers clustering factor,
and if an index has a high clustering factor, even low 50% selectivity
might be good enough for an access via an index, correct?

Apr 16 '06 #13

Peter

CREATE TABLE dbo.t1 (x INT NOT NULL PRIMARY KEY, z1 INT NULL);
GO
CREATE VIEW dbo.v1 AS
SELECT * FROM dbo.t1
GO
ALTER TABLE dbo.t1 DROP COLUMN z1 ;
ALTER TABLE dbo.t1 ADD z2 INT ;
GO
EXEC dbo.sp_recompile 'dbo.v1' ;
/* Notice that the second column still exists as Z1 in the view */
SELECT x,z1 FROM dbo.v1 ;

--

O that is bad!!! I gonne test that immidiatly tuesday. Is it different when
naming the columns?

Apr 16 '06 #14

Peter

"Erland Sommarskog" <es****@sommarskog.se> wrote in message
news:Xn**********************@127.0.0.1...

Serge Rielau (sr*****@ca.ibm.com) writes:
Lastly I respectfully disagree with the subsequent note in this thread
that over specification of columns in a view has no performance impact.

I think Peter's question actually was whether there was a performance
impact of saying "SELECT *" instead of listing all columns in the table
explicitly, and for all my dislike for SELECT * in production code, I
can't think of any reason why SELECT * should be any more expensive in
SQL Server.

I completely agree with that best is to include exactly those columns
for which there is an actual need.

One problem I often fight at work is that I want to drop a column, or
drastically change the meaning of it. I suspect that it is not really
in use, but still I find a bunch of stored procedures that return this
column in a result set. But I can also see that these procedures aim
at returning the universe, so I have no idea whether the value is used
for something. (Usually, I end up dropping the column anyway.)
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx

Erland,

In my case, the views only virtualizes the tables. The views must have the
same structure as the table. So if there are colums added to the table, the
view needs to be updated to. In that case, you have the same problem as the
table without view. You want to delete a column but you dont know if it is
used anywere.

What I mean is, declaring the view as select * is exactly what I functional
want. Give me all columns of the table, no matter what columns there are.

So, the statement

SELECT * FROM sometable

comes closer to what I want and what I mean than

SELECT
column1,
column2,
column3,
column4
FROM
sometable

Apr 16 '06 #15

Peter

"Serge Rielau" <sr*****@ca.ibm.com> wrote in message
news:4a************@individual.net...

Well, first of all IBM is not a product. I could very well be an IBM SQL
Server expert (working for IGS or Websphere for example), but I am a DB2
expert, which I take it was what you meant. :-)

My comments were generic and I'm confident they apply to any SQL RDBMS out
there.

Couple of observations:
1. A thumb rule for using TBSCAN over (index scan & page fetch) is when
the selectivity is worse than, say 5%. Sure any DBSM will pick it's own
secret number, but I don't doubt for a moment that SQL Server, having a
cost based optimizer will eventually say: "Crap. You make me go after
columns not in my index and I'm going to touch next to at least one row in
any page anyway!".
Enabling queries to use covering indexes is important. SELECT * is counter
productive on that end.

I never execute select * from sometable. I only use it in a definition of a
view. If I only select a few columns from that view, I see in the query plan
the select * is never executed. Even I dont even see the name of the view.

So, the query plan shows:

select
column1
from
someviewwithselect*

returns less bytes per row compared to

select
column1,
column2
from
sameviewwithselect*

Also, if the columns I select are in a alternate index, and SQL server uses
that alternate index, SQL server doesnt go to the datapages of the table.
SQL Server gets is data only form the index blocks.

Unless the query plan is lying to me.

So to me te code

create table x (column1, column2, colum3)
select column1 from x

has the same execution plan as

create table x (column1, column2, column3)
create view y as select * from x
select column1 from y
Serge, what do I miss? I dont know what your story about indexes and scans
has to do whith selecting on views.

Apr 16 '06 #16

Alexander Kuznetsov wrote:

Serge Rielau wrote:
1. A thumb rule for using TBSCAN over (index scan & page fetch) is when
the selectivity is worse than, say 5%. Sure any DBSM will pick it's own
secret number,
Serge,

I think that threshold selectivity is not a constant - for SQL Server
it also depends on the width of bookmarks. So, if bookmarks are 4 byte
integers, the threshold selectivity is lower than if bookmarks are
50-byte character fields.

Struggling to research bookmarks I ran into:
http://msdn2.microsoft.com/en-US/lib...0(SQL.90).aspx
"Bookmark Lookup is not used in SQL Server 2005. Instead, Clustered
Index Seek and RID Lookup provide bookmark lookup functionality."
Cool, let's bypass that one.
Similarly, besides selectivity DB2 also considers clustering factor,
and if an index has a high clustering factor, even low 50% selectivity
might be good enough for an access via an index, correct?

Certainly clustering will affect the decision. But the point of my note
was not to get into the gory details (see "thumb rule") as to state that
a covering index scan is certainly preferable in the vast majority of
cases over a "index seek and RID lookup" (in SQL Server words, aka
ISCAN/FETCH in DB2 words) and that "index seek and RID lookup" compete
with table scans.
So whenever a query for more columns that it consumes it's at risk of
getting suboptimal plans.

Back to Peter, if you select all rows and all columns in your view, then
what good is it? Are you blindly applying some lesson in some book?
views are used for access control, as shorthands for complex queries and
to hide details of the tables. I don't see how you do any of that...
The simplicity of your views also prevents you from seeing my point that
the optimizer (any optimizer) is fallible and you don't want to find out
about the unused column that wasn't dropped when it's too late.

Cheers
Serge
--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab

Apr 16 '06 #17

Peter wrote:

Serge, what do I miss?

I think you are using the wrong tool:
http://msdn2.microsoft.com/en-us/lib...4(SQL.90).aspx
try CREATE SYNONYM

Cheers
Serge
--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab

Apr 16 '06 #18

Serge Rielau wrote:

Struggling to research bookmarks I ran into:
http://msdn2.microsoft.com/en-US/lib...0(SQL.90).aspx
"Bookmark Lookup is not used in SQL Server 2005. Instead, Clustered
Index Seek and RID Lookup provide bookmark lookup functionality."
Cool, let's bypass that one.

Interesting. My understanding is that SQL Server 2000 does not store
clustering factor as part of statistics, and the reason is simple: it
makes no sence for bookmark lookups. However, for more efficient RID
lookups, it would make perfect sence to both calculate clustering
factor as part of statistics gathering and have the optimizer use it.
Yet I tried to google up "clustering factor"+"SQL server" and that
brought up nothing relevant to SQL Server.

Does SQL Server 2005 optimizer use clustering factor, and if yes,
what's the proper word for it?

Apr 17 '06 #19

Alexander Kuznetsov (AK************@hotmail.COM) writes:

Serge Rielau wrote:
Struggling to research bookmarks I ran into:
http://msdn2.microsoft.com/en-US/lib...0(SQL.90).aspx
"Bookmark Lookup is not used in SQL Server 2005. Instead, Clustered
Index Seek and RID Lookup provide bookmark lookup functionality."
Cool, let's bypass that one.

I think this is mainly a terminology issue. Instead of talking of
bookmark lookups, they present it as a join between the NC index and
the clustered index. But there is no real difference between SQL 2000
and SQL 2005, as far as I understand.
Interesting. My understanding is that SQL Server 2000 does not store
clustering factor as part of statistics, and the reason is simple: it
makes no sence for bookmark lookups. However, for more efficient RID
lookups, it would make perfect sence to both calculate clustering
factor as part of statistics gathering and have the optimizer use it.
Yet I tried to google up "clustering factor"+"SQL server" and that
brought up nothing relevant to SQL Server.

Does SQL Server 2005 optimizer use clustering factor, and if yes,
what's the proper word for it?

I'm not really sure what you mean with clustering factor. As for RID
lookups, they only occur with heaps, that is tables without clustered
indexes.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx

Apr 17 '06 #20

Peter (so*****@someplace.com) writes:

CREATE TABLE dbo.t1 (x INT NOT NULL PRIMARY KEY, z1 INT NULL);
GO
CREATE VIEW dbo.v1 AS
SELECT * FROM dbo.t1
GO
ALTER TABLE dbo.t1 DROP COLUMN z1 ;
ALTER TABLE dbo.t1 ADD z2 INT ;
GO
EXEC dbo.sp_recompile 'dbo.v1' ;
/* Notice that the second column still exists as Z1 in the view */
SELECT x,z1 FROM dbo.v1 ;

O that is bad!!! I gonne test that immidiatly tuesday. Is it different
when naming the columns?

Yes, then you get an error when you try to access the view.

Then again, what David's example really shows is that you should use
sp_refreshview when you've changed the underlying table, not sp_recompile.
To wit, with sp_refreshview, v1 picks up the changed in columns.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx

Apr 17 '06 #21

Serge Rielau (sr*****@ca.ibm.com) writes:

Peter wrote:
Serge, what do I miss?

I think you are using the wrong tool:
http://msdn2.microsoft.com/en-us/lib...4(SQL.90).aspx
try CREATE SYNONYM

I still haven't really understood why Peter want these views. But you
are right that if he has a one-to-one mapping from table to view, then
synonyms are a better choice. If he is on SQL 2005, that is. (SQL2000
does not have synonyms.)

However, there is also the case of partitioned views, where you have a
suite of identical tables with a CHECK constraint on the first primary-
key column. Such a view would look like:

SELECT * FROM sales2000
UNION ALL
SELECT * FROM sales2001
UNION ALL
...

Of course, you can list all columns here as well, but say that you add two
new columns of the same data type to the tables, and when you change the
view late Friday afternoon when your mind elsewhere, you end up with:

SELECT year, col1, col2, .... newcol1, newcol2 FROM sales2000
UNION ALL
SELECT year, col1, col2, .... newcol2, newcol1 FROM sales2000
UNION ALL

In unfortunate cases, this can lead to errors that can unnoticed for
quite a while, and in the mean while lead to incorrect decisions.

The counter-argument to this is that you may only want to add the
columns to the sales2006 table, but a SELECT * in the view would
force you to add the columns to all tables.

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx

Apr 17 '06 #22

Erland Sommarskog wrote:

I'm not really sure what you mean with clustering factor.

To make long story short,
suppose you have a Customer table clustered on phone number, having on
average 20 rows per page. Suppose you want to retrieve customers with
DOB between January 1st and January 15th, which is about 4% of data.
Because phone number and date of birth are not correlated, qualifying
rows are scattered all over the table, and it is very likely that there
is a customer with DOB between January 1st and January 15th on almost
every page. So Oracle/DB2 optimizer will look up clustering factor of
the index on DOB (it is low) and go for a table scan. On the other
hand, phone number and city are very correlated. As a result, if 10%
customers live in the city of Someville, the rows matching the criteria
city='SOMEVILLE' are located on adjacent pages, because they have phone
numbers with the same beginning. The index on city has a high
clustering factor. The Oracle/DB2 optimizer will choose to access the
table via the index on city, and it will be more efficient than a table
scan - almost 90% of data pages will not be read.

Apr 17 '06 #23

Alexander Kuznetsov (AK************@hotmail.COM) writes:

To make long story short,
suppose you have a Customer table clustered on phone number, having on
average 20 rows per page. Suppose you want to retrieve customers with
DOB between January 1st and January 15th, which is about 4% of data.
Because phone number and date of birth are not correlated, qualifying
rows are scattered all over the table, and it is very likely that there
is a customer with DOB between January 1st and January 15th on almost
every page. So Oracle/DB2 optimizer will look up clustering factor of
the index on DOB (it is low) and go for a table scan. On the other
hand, phone number and city are very correlated. As a result, if 10%
customers live in the city of Someville, the rows matching the criteria
city='SOMEVILLE' are located on adjacent pages, because they have phone
numbers with the same beginning. The index on city has a high
clustering factor. The Oracle/DB2 optimizer will choose to access the
table via the index on city, and it will be more efficient than a table
scan - almost 90% of data pages will not be read.

That explains why I haven't heard of it. SQL Server performs no such
deliberations, as far as I know.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx

Apr 17 '06 #24

Erland Sommarskog wrote:

That explains why I haven't heard of it. SQL Server performs no such
deliberations, as far as I know.

Well if on the leaf level of a non clustered index you've got an RID, a
structure looking like (extent#, page#, row_on_page#), and you know
that matching rows are likely to be stored on adjacent pages, you can
traverse a range of the index, sort RIDs of matching rows, and read
every page with matching rows only once. This is why clustering factor
is a very useful piece of statistics for DB2/Oracle.

If on the leaf level of a non clustered index you've got a bookmark,
then you just know absolutely nothing about the physical location of
the rows you need to retrieve. I don't see any way SQL Server could
utilize clustering factor as long as bookmarks are used to access rows
from non-clustered indexes (unless it's a heap table, which is not the
best practice). Please correct me if I'm wrong.

Apr 17 '06 #25

Erland,

I think one more scenario for select * views is when a view is used as
a security tool. Suppose I expose a whole table to a user

grant select on sales to johnsmith

If later on I add a column to sales table, it is visible to johnsmith.
Suppose I expose to danbrown only his own sales via a view:
create view persons_own_sales
as
select * from sales where ...

grant select on persons_own_sales to danbrown

Again, If later on I add a column to sales table, I want it to be
visible to danbrown with minimum maintenance - just refresh the view.
Makes sense?

Apr 18 '06 #26

Peter

Erland,

Thanks, I tested it and indeed I did not notice a difference in using
sp_recompile or not.

And I see also the difference in naming the columns instead of using the
asterix. It seems to me the binding is different. With the asterix, the
binding is by column number instead of by name.

mmmm
"Erland Sommarskog" <es****@sommarskog.se> wrote in message
news:Xn**********************@127.0.0.1...

Peter (so*****@someplace.com) writes:
CREATE TABLE dbo.t1 (x INT NOT NULL PRIMARY KEY, z1 INT NULL);
GO
CREATE VIEW dbo.v1 AS
SELECT * FROM dbo.t1
GO
ALTER TABLE dbo.t1 DROP COLUMN z1 ;
ALTER TABLE dbo.t1 ADD z2 INT ;
GO
EXEC dbo.sp_recompile 'dbo.v1' ;
/* Notice that the second column still exists as Z1 in the view */
SELECT x,z1 FROM dbo.v1 ;

O that is bad!!! I gonne test that immidiatly tuesday. Is it different
when naming the columns?

Yes, then you get an error when you try to access the view.

Then again, what David's example really shows is that you should use
sp_refreshview when you've changed the underlying table, not sp_recompile.
To wit, with sp_refreshview, v1 picks up the changed in columns.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx

Apr 18 '06 #27

Peter

From this discussion I conclude:

1. Select * in a view is not a performance issue when using a current
versions of SQL Server.
2. Select * in a view has benefits and drawbacks in a maintenance aspect.

Performance
I do not believe it is a big change Microsoft will change this behavior.
But, there is always a change.

Maintenance benefit
In the case I regular want the view to expose all columns of the table, I
don not have to name all the columns, but I can simply express what I want.
If the underlying table stucture is changed, I simple run exec
sp_refreshview and everything is ok again.

Maintenance drawback
If the underlying table structure is changed: columns are added and columns
are deleted but the count of column keeps the same, I do not get a error
message when I select the view and there is a change I get wrong results
without noticing it in time. I need to be very disciplinic to always use
sp_refreshview on all views that do a select * on the changed table.

For me, the drawback is a real issue, I like robust code. I also like
maintenance simplicity. I need to make a choice.

Thanks to all, I know what is the impact of this choice.

Unless someone has to add some new point in this discussion :-)

Apr 18 '06 #28

Jim Underwood

Regarding this one point below...
If you always specify the column names then you will get an error instead of
getting bad data. It will take only a moment to add or remove a column from
your view to correct the error, which you will find immediately because the
code will not run.

It will take much longer to fix problems caused by bad data that is not
discovered for a month.

IMHO, you are much, much better off having the entire application stop
running while you fix a column name that you overlooked, rather than have
the application continue running and corrupt all of your data.

"Peter" <so*****@someplace.com> wrote in message
news:2b**************************@news.chello.nl.. .

Maintenance drawback
If the underlying table structure is changed: columns are added and columns are deleted but the count of column keeps the same, I do not get a error
message when I select the view and there is a change I get wrong results
without noticing it in time.

Apr 18 '06 #29

Alexander Kuznetsov (AK************@hotmail.COM) writes:

I think one more scenario for select * views is when a view is used as
a security tool. Suppose I expose a whole table to a user

grant select on sales to johnsmith

If later on I add a column to sales table, it is visible to johnsmith.
Suppose I expose to danbrown only his own sales via a view:
create view persons_own_sales
as
select * from sales where ...

grant select on persons_own_sales to danbrown

Again, If later on I add a column to sales table, I want it to be
visible to danbrown with minimum maintenance - just refresh the view.
Makes sense?

Yes, I think that I mentioned this in previous post in the thread.

Then again, it may be the case that you don't want to expose the
columns that controls access.

Also, in this case, the maintenance problem is smaller as there is
on one SELECT on the view. (I'm assuming the view has something like
WHERE user = SYSTEM_USER.) With a partitioned view, you have a
repetition with more room for errors.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx

Apr 18 '06 #30

Hugo Kornelis

(excessive crossposting limited to two of the foour groups)

On Tue, 18 Apr 2006 17:57:02 +0200, Peter wrote:

From this discussion I conclude: (snip)Thanks to all, I know what is the impact of this choice.

Unless someone has to add some new point in this discussion :-)

Hi Peter,

The point I'm missing in your summary is the ease of impact analysis.

If you use SELECT * anywhere (other than an EXISTS subquery), you lose
the possibility to do an impact analysis of a proposed change by
searching the source code (CREATE scripts) of your database.

--
Hugo Kornelis, SQL Server MVP

Apr 18 '06 #31

Hugo Kornelis (hu**@perFact.REMOVETHIS.info.INVALID) writes:

The point I'm missing in your summary is the ease of impact analysis.

If you use SELECT * anywhere (other than an EXISTS subquery), you lose
the possibility to do an impact analysis of a proposed change by
searching the source code (CREATE scripts) of your database.

A good point, but as Peter seems to using the views as a sort of
synonyms, I think he is covered on that point. That is, when you
work with his system, you need to know that the views mirror the
tables in some way.

But in the general case, you are perfectly right.

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx

Apr 18 '06 #32

Gert-Jan Strik

Alexander Kuznetsov wrote:

Erland Sommarskog wrote:
That explains why I haven't heard of it. SQL Server performs no such
deliberations, as far as I know.

Well if on the leaf level of a non clustered index you've got an RID, a
structure looking like (extent#, page#, row_on_page#), and you know
that matching rows are likely to be stored on adjacent pages, you can
traverse a range of the index, sort RIDs of matching rows, and read
every page with matching rows only once. This is why clustering factor
is a very useful piece of statistics for DB2/Oracle.

If on the leaf level of a non clustered index you've got a bookmark,
then you just know absolutely nothing about the physical location of
the rows you need to retrieve. I don't see any way SQL Server could
utilize clustering factor as long as bookmarks are used to access rows
from non-clustered indexes (unless it's a heap table, which is not the
best practice). Please correct me if I'm wrong.

The bookmark does give (some) information about the physical location!

In some situations, sorting on the clustered index key could be of use.
But more information would be need to be used. The rows are physically
stored in the order of the clustered index key. So statistics of the
clustered index can determine the likelyhood that two consecutive
clustered keys (found in the nonclustered index) can be found on the
same page. This approach would guarantee a maximum of one read per page.
Without such a mechanism each bookmark lookup could lead to a physical
I/O, even if the average numbers of rows per page is much higher. This
could occur if there is not enough memory to hold the data pages in
cache, and the bookmark lookups are done out of (clustered index) order.

But even then, it is just an estimate. The index statistics do not give
information about the density of individual pages, and two consecutive
rows could still be on two different pages. So there is a large margin
of error which makes the tradeoff between 'random' bookmark lookups,
sorted bookmark lookups and clustered index scan (with the advantage of
sequential I/O over random I/O) difficult.

Gert-Jan

Apr 18 '06 #33