People are telling me it is bad to put
select * from <atable>
in a view. I better should list all fields of the table inside the
definition of the view.
I dont know exactly why but some say:
A select * from makes sql server does a table scan.
Is that true, even if i put a where on the select on the view? And what if i
dont list all fields in the select on the view?
Thanks for the answer.
Peter 33 6497
Try the following code:
create table t
(
ID int primary key
, Col1 int not null
)
go
create view v
as
select * from t
go
insert v values (1, 2)
go
alter table t
add
Col2 int not null constraint CK_t default (0)
go
alter table t
drop constraint CK_t
go
select * from v
go
select * from t
go
insert v values (2, 2)
go
drop view v
drop table t
You'll see that the SELECT from the view did not pick up the extra column.
Also, the second insert failed - even though the SELECT on the view
suggested there were only 3 columns.
--
Tom
----------------------------------------------------
Thomas A. Moreau, BSc, PhD, MCSE, MCDBA
SQL Server MVP
Toronto, ON Canada
"Peter" <so*****@someplace.com> wrote in message
news:68***************************@news.chello.nl. ..
People are telling me it is bad to put
select * from <atable>
in a view. I better should list all fields of the table inside the
definition of the view.
I dont know exactly why but some say:
A select * from makes sql server does a table scan.
Is that true, even if i put a where on the select on the view? And what if i
dont list all fields in the select on the view?
Thanks for the answer.
Peter
Peter wrote: People are telling me it is bad to put
select * from <atable>
in a view. I better should list all fields of the table inside the definition of the view.
I dont know exactly why but some say:
A select * from makes sql server does a table scan.
Is that true, even if i put a where on the select on the view? And what if i dont list all fields in the select on the view?
There are two reasons speaking against select * in views and select * in
general.
1. Typically your app does not need all columns.
So by using select * you will:
a) flow wider rows than needed
b) force the DBMS to access the data page even if, in reality a mere
index access would have been sufficient. Once you access the data page
anyway the optimizer will be tempted to use more table scans, but that's
really secondary damage.
Within teh context of a view you will force the optimizer to do more
work than needed. I.e. it needs to drop unused columns which may or may
not work depending on teh capabilities of the DBMS.
2. When a column is added to the table after the view is created the SQL
standard required "conservative" semantics. That is teh existing view
will not pick up the new column.
If however you drop and recreate the view (for whatever reason) the view
will pick up the new column. This can cause some rather unexpected
behavior which may be hard to debug.
The select * in a way is a time bomb.
I use SELECT * as a convenience for throw-away, ad-hoc queries, but not
for anything related to a production system.
Cheers
Serge
--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab
Serge,
I see you are a IBM expert. Is your story true for DB2 only or do you know
it is true for Microsoft SQL Server 2000 too?
Because, if i do a select on a view (select * from <atable>) with only a
projection (only a few colums in the select list), I see in the execution
plan a clustered index scan with a minimized returned data row size. Also,
when I use a where clause on a indexed column, I see an index seek in the
query plan. Also, in the query plan, I dont see view names, but only table
names. It seems to me it works like documented by Microsoft, in the
execution plan, the view is replaces by the underlying view logics, but only
what is nessecary.
Indeed, when I add a new column to the underlying table, the view is not
changed until I recompile the view. What is the problem with that if
existing code never do a select * on the view? And if existing code does a
select *, then, I think this code wants all columns. So it is better to
recompile.
Do I oversee something? Please give a reaction if I am right or I oversee
something.
"Serge Rielau" <sr*****@ca.ibm.com> wrote in message
news:4a************@individual.net... Peter wrote: People are telling me it is bad to put
select * from <atable>
in a view. I better should list all fields of the table inside the definition of the view.
I dont know exactly why but some say:
A select * from makes sql server does a table scan.
Is that true, even if i put a where on the select on the view? And what if i dont list all fields in the select on the view? There are two reasons speaking against select * in views and select * in general. 1. Typically your app does not need all columns. So by using select * you will: a) flow wider rows than needed b) force the DBMS to access the data page even if, in reality a mere index access would have been sufficient. Once you access the data page anyway the optimizer will be tempted to use more table scans, but that's really secondary damage. Within teh context of a view you will force the optimizer to do more work than needed. I.e. it needs to drop unused columns which may or may not work depending on teh capabilities of the DBMS.
2. When a column is added to the table after the view is created the SQL standard required "conservative" semantics. That is teh existing view will not pick up the new column. If however you drop and recreate the view (for whatever reason) the view will pick up the new column. This can cause some rather unexpected behavior which may be hard to debug. The select * in a way is a time bomb.
I use SELECT * as a convenience for throw-away, ad-hoc queries, but not for anything related to a production system.
Cheers Serge -- Serge Rielau DB2 Solutions Development IBM Toronto Lab
Perhaps we can get a hole of an MS developer by posting ms sql server
questions in comp.databases.ibm-db2.
:)
Peter (so*****@someplace.com) writes: I see you are a IBM expert. Is your story true for DB2 only or do you know it is true for Microsoft SQL Server 2000 too?
Because, if i do a select on a view (select * from <atable>) with only a projection (only a few colums in the select list), I see in the execution plan a clustered index scan with a minimized returned data row size. Also, when I use a where clause on a indexed column, I see an index seek in the query plan. Also, in the query plan, I dont see view names, but only table names. It seems to me it works like documented by Microsoft, in the execution plan, the view is replaces by the underlying view logics, but only what is nessecary.
Indeed, when I add a new column to the underlying table, the view is not changed until I recompile the view. What is the problem with that if existing code never do a select * on the view? And if existing code does a select *, then, I think this code wants all columns. So it is better to recompile.
While Serge has more experienc of DB2 than SQL Server, I don't think his
observations are out of whack. The current implementation may forgive you,
but the next may not.
Since I use views very rarely overall muyself, I'm not sure why people are
so keen on using SELECT * in views.
I can think of two cases where it makes sense to use SELECT * in a view
definition:
1) The view presents a subset of table for row-level security.
2) The view is logically a table, that is implemented as several. That is,
partitioned views.
Then again, since you have to refresh the view when you change the
underlying tables, you could just as well update the source code for
it as well.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se
Books Online for SQL Server 2005 at http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinf...ons/books.mspx
Erland,
Thanks for your reply.
The views are a interface to another database, or are a interface to the
database where they live. To make things can change in the underlaying
table, for example, to put a restriction on rows on them or to place them in
another database.
The concept is the views returns the same columns as the underlaying tables.
In one case, some tables are in different databases, within another
database, those tables are all available by views. The transaction tables
are in its own database. Reference tables are in others.
The problem I have with this solution is I can not make indexed views on the
interface views.
The way I look at it is that when the implementation of SQL Server changes
in a next version, so the select * causes problems, I can change that anyway
and replace the asterix with the column names. I then make the cost of extra
maintenance overhead when nessecary. In the mean time, I didnt put any extra
development time in naming each column. When something change in the table,
I just recompile the view, without worrying about which columns are added.
Let me put my question in another way:
Is there an extra performance overhead now in SQL Server 2000/2005 when I
use select * in views instead of naming each column of the table?
Peter
"Erland Sommarskog" <es****@sommarskog.se> wrote in message
news:Xn**********************@127.0.0.1... Peter (so*****@someplace.com) writes: I see you are a IBM expert. Is your story true for DB2 only or do you know it is true for Microsoft SQL Server 2000 too?
Because, if i do a select on a view (select * from <atable>) with only a projection (only a few colums in the select list), I see in the execution plan a clustered index scan with a minimized returned data row size. Also, when I use a where clause on a indexed column, I see an index seek in the query plan. Also, in the query plan, I dont see view names, but only table names. It seems to me it works like documented by Microsoft, in the execution plan, the view is replaces by the underlying view logics, but only what is nessecary.
Indeed, when I add a new column to the underlying table, the view is not changed until I recompile the view. What is the problem with that if existing code never do a select * on the view? And if existing code does a select *, then, I think this code wants all columns. So it is better to recompile.
While Serge has more experienc of DB2 than SQL Server, I don't think his observations are out of whack. The current implementation may forgive you, but the next may not.
Since I use views very rarely overall muyself, I'm not sure why people are so keen on using SELECT * in views.
I can think of two cases where it makes sense to use SELECT * in a view definition: 1) The view presents a subset of table for row-level security. 2) The view is logically a table, that is implemented as several. That is, partitioned views.
Then again, since you have to refresh the view when you change the underlying tables, you could just as well update the source code for it as well.
-- Erland Sommarskog, SQL Server MVP, es****@sommarskog.se
Books Online for SQL Server 2005 at http://www.microsoft.com/technet/pro...ads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinf...ons/books.mspx
Peter wrote: The way I look at it is that when the implementation of SQL Server changes in a next version, so the select * causes problems, I can change that anyway and replace the asterix with the column names. I then make the cost of extra maintenance overhead when nessecary. In the mean time, I didnt put any extra development time in naming each column. When something change in the table, I just recompile the view, without worrying about which columns are added.
Let me put my question in another way:
Is there an extra performance overhead now in SQL Server 2000/2005 when I use select * in views instead of naming each column of the table?
If the queries are executed by SQL Server on the same server as the
view then there may not be any measurable performance overhead of
SELECT * (except perhaps during compilation). However, you are mistaken
about something else. Recompiling a view containing SELECT * will NOT
necessarily cause it to reflect changes made to the base tables. For
this reason alone, it is a bad idea to use SELECT * in views. See the
following example, which was tested on 2000 SP4 and 2005.
CREATE TABLE dbo.t1 (x INT NOT NULL PRIMARY KEY, z1 INT NULL);
GO
CREATE VIEW dbo.v1 AS
SELECT * FROM dbo.t1
GO
ALTER TABLE dbo.t1 DROP COLUMN z1 ;
ALTER TABLE dbo.t1 ADD z2 INT ;
GO
EXEC dbo.sp_recompile 'dbo.v1' ;
/* Notice that the second column still exists as Z1 in the view */
SELECT x,z1 FROM dbo.v1 ;
--
David Portas, SQL Server MVP
Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.
SQL Server Books Online: http://msdn2.microsoft.com/library/m...S,SQL.90).aspx
--
Well, first of all IBM is not a product. I could very well be an IBM SQL
Server expert (working for IGS or Websphere for example), but I am a DB2
expert, which I take it was what you meant. :-)
My comments were generic and I'm confident they apply to any SQL RDBMS
out there.
Couple of observations:
1. A thumb rule for using TBSCAN over (index scan & page fetch) is when
the selectivity is worse than, say 5%. Sure any DBSM will pick it's own
secret number, but I don't doubt for a moment that SQL Server, having a
cost based optimizer will eventually say: "Crap. You make me go after
columns not in my index and I'm going to touch next to at least one row
in any page anyway!".
Enabling queries to use covering indexes is important. SELECT * is
counter productive on that end.
2. If all you did in you experiment was a simple CREATE VIEW V AS SELECT
* FROM T then this doesn't say too much.
Thsi is way I made my statement relative. If you SQL is more complex
then mileage will vary depending e.g. on the version of your RDBMS since
you are relying on the optimizer.
SQL is about:
You tell the RDBMS WHAT you want. The RDBMS takes care of the HOW to get it.
As developers it is our responsibility to specify the WHAT correctly.
I'm referring back to the view vs. stored procedure thread here. In that
thread users didn't trust views at all. I don't go that far, but I
wouldn't trust any optimizer ( SQL Server, DB2, you name it) to fix up
all my lazyness through divine intuition.
Lastly I respectfully disagree with the subsequent note in this thread
that over specification of columns in a view has no performance impact.
While this may be true in most cases once the statement invoking the
view is compiled, you are relying on the compiler to optimize out the
unused columns. And that is code path and costs you CPU on first
compile. Whether you can actually "feel" that depends on cache friendly
your app is.
Cheers
Serge
--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab
Serge Rielau (sr*****@ca.ibm.com) writes: Lastly I respectfully disagree with the subsequent note in this thread that over specification of columns in a view has no performance impact.
I think Peter's question actually was whether there was a performance
impact of saying "SELECT *" instead of listing all columns in the table
explicitly, and for all my dislike for SELECT * in production code, I
can't think of any reason why SELECT * should be any more expensive in
SQL Server.
I completely agree with that best is to include exactly those columns
for which there is an actual need.
One problem I often fight at work is that I want to drop a column, or
drastically change the meaning of it. I suspect that it is not really
in use, but still I find a bunch of stored procedures that return this
column in a result set. But I can also see that these procedures aim
at returning the universe, so I have no idea whether the value is used
for something. (Usually, I end up dropping the column anyway.)
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se
Books Online for SQL Server 2005 at http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinf...ons/books.mspx
Serge Rielau wrote: 1. A thumb rule for using TBSCAN over (index scan & page fetch) is when the selectivity is worse than, say 5%. Sure any DBSM will pick it's own secret number,
Serge,
I think that threshold selectivity is not a constant - for SQL Server
it also depends on the width of bookmarks. So, if bookmarks are 4 byte
integers, the threshold selectivity is lower than if bookmarks are
50-byte character fields.
Similarly, besides selectivity DB2 also considers clustering factor,
and if an index has a high clustering factor, even low 50% selectivity
might be good enough for an access via an index, correct?
> CREATE TABLE dbo.t1 (x INT NOT NULL PRIMARY KEY, z1 INT NULL); GO CREATE VIEW dbo.v1 AS SELECT * FROM dbo.t1 GO ALTER TABLE dbo.t1 DROP COLUMN z1 ; ALTER TABLE dbo.t1 ADD z2 INT ; GO EXEC dbo.sp_recompile 'dbo.v1' ; /* Notice that the second column still exists as Z1 in the view */ SELECT x,z1 FROM dbo.v1 ;
--
O that is bad!!! I gonne test that immidiatly tuesday. Is it different when
naming the columns?
"Erland Sommarskog" <es****@sommarskog.se> wrote in message
news:Xn**********************@127.0.0.1... Serge Rielau (sr*****@ca.ibm.com) writes: Lastly I respectfully disagree with the subsequent note in this thread that over specification of columns in a view has no performance impact.
I think Peter's question actually was whether there was a performance impact of saying "SELECT *" instead of listing all columns in the table explicitly, and for all my dislike for SELECT * in production code, I can't think of any reason why SELECT * should be any more expensive in SQL Server.
I completely agree with that best is to include exactly those columns for which there is an actual need.
One problem I often fight at work is that I want to drop a column, or drastically change the meaning of it. I suspect that it is not really in use, but still I find a bunch of stored procedures that return this column in a result set. But I can also see that these procedures aim at returning the universe, so I have no idea whether the value is used for something. (Usually, I end up dropping the column anyway.)
-- Erland Sommarskog, SQL Server MVP, es****@sommarskog.se
Books Online for SQL Server 2005 at http://www.microsoft.com/technet/pro...ads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinf...ons/books.mspx
Erland,
In my case, the views only virtualizes the tables. The views must have the
same structure as the table. So if there are colums added to the table, the
view needs to be updated to. In that case, you have the same problem as the
table without view. You want to delete a column but you dont know if it is
used anywere.
What I mean is, declaring the view as select * is exactly what I functional
want. Give me all columns of the table, no matter what columns there are.
So, the statement
SELECT * FROM sometable
comes closer to what I want and what I mean than
SELECT
column1,
column2,
column3,
column4
FROM
sometable
"Serge Rielau" <sr*****@ca.ibm.com> wrote in message
news:4a************@individual.net... Well, first of all IBM is not a product. I could very well be an IBM SQL Server expert (working for IGS or Websphere for example), but I am a DB2 expert, which I take it was what you meant. :-)
My comments were generic and I'm confident they apply to any SQL RDBMS out there.
Couple of observations: 1. A thumb rule for using TBSCAN over (index scan & page fetch) is when the selectivity is worse than, say 5%. Sure any DBSM will pick it's own secret number, but I don't doubt for a moment that SQL Server, having a cost based optimizer will eventually say: "Crap. You make me go after columns not in my index and I'm going to touch next to at least one row in any page anyway!". Enabling queries to use covering indexes is important. SELECT * is counter productive on that end.
I never execute select * from sometable. I only use it in a definition of a
view. If I only select a few columns from that view, I see in the query plan
the select * is never executed. Even I dont even see the name of the view.
So, the query plan shows:
select
column1
from
someviewwithselect*
returns less bytes per row compared to
select
column1,
column2
from
sameviewwithselect*
Also, if the columns I select are in a alternate index, and SQL server uses
that alternate index, SQL server doesnt go to the datapages of the table.
SQL Server gets is data only form the index blocks.
Unless the query plan is lying to me.
So to me te code
create table x (column1, column2, colum3)
select column1 from x
has the same execution plan as
create table x (column1, column2, column3)
create view y as select * from x
select column1 from y
Serge, what do I miss? I dont know what your story about indexes and scans
has to do whith selecting on views.
Alexander Kuznetsov wrote: Serge Rielau wrote: 1. A thumb rule for using TBSCAN over (index scan & page fetch) is when the selectivity is worse than, say 5%. Sure any DBSM will pick it's own secret number, Serge,
I think that threshold selectivity is not a constant - for SQL Server it also depends on the width of bookmarks. So, if bookmarks are 4 byte integers, the threshold selectivity is lower than if bookmarks are 50-byte character fields.
Struggling to research bookmarks I ran into: http://msdn2.microsoft.com/en-US/lib...0(SQL.90).aspx
"Bookmark Lookup is not used in SQL Server 2005. Instead, Clustered
Index Seek and RID Lookup provide bookmark lookup functionality."
Cool, let's bypass that one.
Similarly, besides selectivity DB2 also considers clustering factor, and if an index has a high clustering factor, even low 50% selectivity might be good enough for an access via an index, correct?
Certainly clustering will affect the decision. But the point of my note
was not to get into the gory details (see "thumb rule") as to state that
a covering index scan is certainly preferable in the vast majority of
cases over a "index seek and RID lookup" (in SQL Server words, aka
ISCAN/FETCH in DB2 words) and that "index seek and RID lookup" compete
with table scans.
So whenever a query for more columns that it consumes it's at risk of
getting suboptimal plans.
Back to Peter, if you select all rows and all columns in your view, then
what good is it? Are you blindly applying some lesson in some book?
views are used for access control, as shorthands for complex queries and
to hide details of the tables. I don't see how you do any of that...
The simplicity of your views also prevents you from seeing my point that
the optimizer (any optimizer) is fallible and you don't want to find out
about the unused column that wasn't dropped when it's too late.
Cheers
Serge
--
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab
Serge Rielau wrote: Struggling to research bookmarks I ran into: http://msdn2.microsoft.com/en-US/lib...0(SQL.90).aspx "Bookmark Lookup is not used in SQL Server 2005. Instead, Clustered Index Seek and RID Lookup provide bookmark lookup functionality." Cool, let's bypass that one.
Interesting. My understanding is that SQL Server 2000 does not store
clustering factor as part of statistics, and the reason is simple: it
makes no sence for bookmark lookups. However, for more efficient RID
lookups, it would make perfect sence to both calculate clustering
factor as part of statistics gathering and have the optimizer use it.
Yet I tried to google up "clustering factor"+"SQL server" and that
brought up nothing relevant to SQL Server.
Does SQL Server 2005 optimizer use clustering factor, and if yes,
what's the proper word for it?
Alexander Kuznetsov (AK************@hotmail.COM) writes: Serge Rielau wrote: Struggling to research bookmarks I ran into: http://msdn2.microsoft.com/en-US/lib...0(SQL.90).aspx "Bookmark Lookup is not used in SQL Server 2005. Instead, Clustered Index Seek and RID Lookup provide bookmark lookup functionality." Cool, let's bypass that one.
I think this is mainly a terminology issue. Instead of talking of
bookmark lookups, they present it as a join between the NC index and
the clustered index. But there is no real difference between SQL 2000
and SQL 2005, as far as I understand.
Interesting. My understanding is that SQL Server 2000 does not store clustering factor as part of statistics, and the reason is simple: it makes no sence for bookmark lookups. However, for more efficient RID lookups, it would make perfect sence to both calculate clustering factor as part of statistics gathering and have the optimizer use it. Yet I tried to google up "clustering factor"+"SQL server" and that brought up nothing relevant to SQL Server.
Does SQL Server 2005 optimizer use clustering factor, and if yes, what's the proper word for it?
I'm not really sure what you mean with clustering factor. As for RID
lookups, they only occur with heaps, that is tables without clustered
indexes.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se
Books Online for SQL Server 2005 at http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinf...ons/books.mspx
Peter (so*****@someplace.com) writes: CREATE TABLE dbo.t1 (x INT NOT NULL PRIMARY KEY, z1 INT NULL); GO CREATE VIEW dbo.v1 AS SELECT * FROM dbo.t1 GO ALTER TABLE dbo.t1 DROP COLUMN z1 ; ALTER TABLE dbo.t1 ADD z2 INT ; GO EXEC dbo.sp_recompile 'dbo.v1' ; /* Notice that the second column still exists as Z1 in the view */ SELECT x,z1 FROM dbo.v1 ;
O that is bad!!! I gonne test that immidiatly tuesday. Is it different when naming the columns?
Yes, then you get an error when you try to access the view.
Then again, what David's example really shows is that you should use
sp_refreshview when you've changed the underlying table, not sp_recompile.
To wit, with sp_refreshview, v1 picks up the changed in columns.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se
Books Online for SQL Server 2005 at http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinf...ons/books.mspx
Serge Rielau (sr*****@ca.ibm.com) writes: Peter wrote: Serge, what do I miss? I think you are using the wrong tool: http://msdn2.microsoft.com/en-us/lib...4(SQL.90).aspx try CREATE SYNONYM
I still haven't really understood why Peter want these views. But you
are right that if he has a one-to-one mapping from table to view, then
synonyms are a better choice. If he is on SQL 2005, that is. (SQL2000
does not have synonyms.)
However, there is also the case of partitioned views, where you have a
suite of identical tables with a CHECK constraint on the first primary-
key column. Such a view would look like:
SELECT * FROM sales2000
UNION ALL
SELECT * FROM sales2001
UNION ALL
...
Of course, you can list all columns here as well, but say that you add two
new columns of the same data type to the tables, and when you change the
view late Friday afternoon when your mind elsewhere, you end up with:
SELECT year, col1, col2, .... newcol1, newcol2 FROM sales2000
UNION ALL
SELECT year, col1, col2, .... newcol2, newcol1 FROM sales2000
UNION ALL
In unfortunate cases, this can lead to errors that can unnoticed for
quite a while, and in the mean while lead to incorrect decisions.
The counter-argument to this is that you may only want to add the
columns to the sales2006 table, but a SELECT * in the view would
force you to add the columns to all tables.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se
Books Online for SQL Server 2005 at http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinf...ons/books.mspx
Erland Sommarskog wrote: I'm not really sure what you mean with clustering factor.
To make long story short,
suppose you have a Customer table clustered on phone number, having on
average 20 rows per page. Suppose you want to retrieve customers with
DOB between January 1st and January 15th, which is about 4% of data.
Because phone number and date of birth are not correlated, qualifying
rows are scattered all over the table, and it is very likely that there
is a customer with DOB between January 1st and January 15th on almost
every page. So Oracle/DB2 optimizer will look up clustering factor of
the index on DOB (it is low) and go for a table scan. On the other
hand, phone number and city are very correlated. As a result, if 10%
customers live in the city of Someville, the rows matching the criteria
city='SOMEVILLE' are located on adjacent pages, because they have phone
numbers with the same beginning. The index on city has a high
clustering factor. The Oracle/DB2 optimizer will choose to access the
table via the index on city, and it will be more efficient than a table
scan - almost 90% of data pages will not be read.
Alexander Kuznetsov (AK************@hotmail.COM) writes: To make long story short, suppose you have a Customer table clustered on phone number, having on average 20 rows per page. Suppose you want to retrieve customers with DOB between January 1st and January 15th, which is about 4% of data. Because phone number and date of birth are not correlated, qualifying rows are scattered all over the table, and it is very likely that there is a customer with DOB between January 1st and January 15th on almost every page. So Oracle/DB2 optimizer will look up clustering factor of the index on DOB (it is low) and go for a table scan. On the other hand, phone number and city are very correlated. As a result, if 10% customers live in the city of Someville, the rows matching the criteria city='SOMEVILLE' are located on adjacent pages, because they have phone numbers with the same beginning. The index on city has a high clustering factor. The Oracle/DB2 optimizer will choose to access the table via the index on city, and it will be more efficient than a table scan - almost 90% of data pages will not be read.
That explains why I haven't heard of it. SQL Server performs no such
deliberations, as far as I know.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se
Books Online for SQL Server 2005 at http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinf...ons/books.mspx
Erland Sommarskog wrote: That explains why I haven't heard of it. SQL Server performs no such deliberations, as far as I know.
Well if on the leaf level of a non clustered index you've got an RID, a
structure looking like (extent#, page#, row_on_page#), and you know
that matching rows are likely to be stored on adjacent pages, you can
traverse a range of the index, sort RIDs of matching rows, and read
every page with matching rows only once. This is why clustering factor
is a very useful piece of statistics for DB2/Oracle.
If on the leaf level of a non clustered index you've got a bookmark,
then you just know absolutely nothing about the physical location of
the rows you need to retrieve. I don't see any way SQL Server could
utilize clustering factor as long as bookmarks are used to access rows
from non-clustered indexes (unless it's a heap table, which is not the
best practice). Please correct me if I'm wrong.
Erland,
I think one more scenario for select * views is when a view is used as
a security tool. Suppose I expose a whole table to a user
grant select on sales to johnsmith
If later on I add a column to sales table, it is visible to johnsmith.
Suppose I expose to danbrown only his own sales via a view:
create view persons_own_sales
as
select * from sales where ...
grant select on persons_own_sales to danbrown
Again, If later on I add a column to sales table, I want it to be
visible to danbrown with minimum maintenance - just refresh the view.
Makes sense?
Erland,
Thanks, I tested it and indeed I did not notice a difference in using
sp_recompile or not.
And I see also the difference in naming the columns instead of using the
asterix. It seems to me the binding is different. With the asterix, the
binding is by column number instead of by name.
mmmm
"Erland Sommarskog" <es****@sommarskog.se> wrote in message
news:Xn**********************@127.0.0.1... Peter (so*****@someplace.com) writes: CREATE TABLE dbo.t1 (x INT NOT NULL PRIMARY KEY, z1 INT NULL); GO CREATE VIEW dbo.v1 AS SELECT * FROM dbo.t1 GO ALTER TABLE dbo.t1 DROP COLUMN z1 ; ALTER TABLE dbo.t1 ADD z2 INT ; GO EXEC dbo.sp_recompile 'dbo.v1' ; /* Notice that the second column still exists as Z1 in the view */ SELECT x,z1 FROM dbo.v1 ;
O that is bad!!! I gonne test that immidiatly tuesday. Is it different when naming the columns?
Yes, then you get an error when you try to access the view.
Then again, what David's example really shows is that you should use sp_refreshview when you've changed the underlying table, not sp_recompile. To wit, with sp_refreshview, v1 picks up the changed in columns.
-- Erland Sommarskog, SQL Server MVP, es****@sommarskog.se
Books Online for SQL Server 2005 at http://www.microsoft.com/technet/pro...ads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinf...ons/books.mspx
From this discussion I conclude:
1. Select * in a view is not a performance issue when using a current
versions of SQL Server.
2. Select * in a view has benefits and drawbacks in a maintenance aspect.
Performance
I do not believe it is a big change Microsoft will change this behavior.
But, there is always a change.
Maintenance benefit
In the case I regular want the view to expose all columns of the table, I
don not have to name all the columns, but I can simply express what I want.
If the underlying table stucture is changed, I simple run exec
sp_refreshview and everything is ok again.
Maintenance drawback
If the underlying table structure is changed: columns are added and columns
are deleted but the count of column keeps the same, I do not get a error
message when I select the view and there is a change I get wrong results
without noticing it in time. I need to be very disciplinic to always use
sp_refreshview on all views that do a select * on the changed table.
For me, the drawback is a real issue, I like robust code. I also like
maintenance simplicity. I need to make a choice.
Thanks to all, I know what is the impact of this choice.
Unless someone has to add some new point in this discussion :-)
Regarding this one point below...
If you always specify the column names then you will get an error instead of
getting bad data. It will take only a moment to add or remove a column from
your view to correct the error, which you will find immediately because the
code will not run.
It will take much longer to fix problems caused by bad data that is not
discovered for a month.
IMHO, you are much, much better off having the entire application stop
running while you fix a column name that you overlooked, rather than have
the application continue running and corrupt all of your data.
"Peter" <so*****@someplace.com> wrote in message
news:2b**************************@news.chello.nl.. . Maintenance drawback If the underlying table structure is changed: columns are added and
columns are deleted but the count of column keeps the same, I do not get a error message when I select the view and there is a change I get wrong results without noticing it in time.
Alexander Kuznetsov (AK************@hotmail.COM) writes: I think one more scenario for select * views is when a view is used as a security tool. Suppose I expose a whole table to a user
grant select on sales to johnsmith
If later on I add a column to sales table, it is visible to johnsmith. Suppose I expose to danbrown only his own sales via a view: create view persons_own_sales as select * from sales where ...
grant select on persons_own_sales to danbrown
Again, If later on I add a column to sales table, I want it to be visible to danbrown with minimum maintenance - just refresh the view. Makes sense?
Yes, I think that I mentioned this in previous post in the thread.
Then again, it may be the case that you don't want to expose the
columns that controls access.
Also, in this case, the maintenance problem is smaller as there is
on one SELECT on the view. (I'm assuming the view has something like
WHERE user = SYSTEM_USER.) With a partitioned view, you have a
repetition with more room for errors.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se
Books Online for SQL Server 2005 at http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinf...ons/books.mspx
(excessive crossposting limited to two of the foour groups)
On Tue, 18 Apr 2006 17:57:02 +0200, Peter wrote: From this discussion I conclude:
(snip)Thanks to all, I know what is the impact of this choice.
Unless someone has to add some new point in this discussion :-)
Hi Peter,
The point I'm missing in your summary is the ease of impact analysis.
If you use SELECT * anywhere (other than an EXISTS subquery), you lose
the possibility to do an impact analysis of a proposed change by
searching the source code (CREATE scripts) of your database.
--
Hugo Kornelis, SQL Server MVP
Hugo Kornelis (hu**@perFact.REMOVETHIS.info.INVALID) writes: The point I'm missing in your summary is the ease of impact analysis.
If you use SELECT * anywhere (other than an EXISTS subquery), you lose the possibility to do an impact analysis of a proposed change by searching the source code (CREATE scripts) of your database.
A good point, but as Peter seems to using the views as a sort of
synonyms, I think he is covered on that point. That is, when you
work with his system, you need to know that the views mirror the
tables in some way.
But in the general case, you are perfectly right.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se
Books Online for SQL Server 2005 at http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinf...ons/books.mspx
Alexander Kuznetsov wrote: Erland Sommarskog wrote: That explains why I haven't heard of it. SQL Server performs no such deliberations, as far as I know.
Well if on the leaf level of a non clustered index you've got an RID, a structure looking like (extent#, page#, row_on_page#), and you know that matching rows are likely to be stored on adjacent pages, you can traverse a range of the index, sort RIDs of matching rows, and read every page with matching rows only once. This is why clustering factor is a very useful piece of statistics for DB2/Oracle.
If on the leaf level of a non clustered index you've got a bookmark, then you just know absolutely nothing about the physical location of the rows you need to retrieve. I don't see any way SQL Server could utilize clustering factor as long as bookmarks are used to access rows from non-clustered indexes (unless it's a heap table, which is not the best practice). Please correct me if I'm wrong.
The bookmark does give (some) information about the physical location!
In some situations, sorting on the clustered index key could be of use.
But more information would be need to be used. The rows are physically
stored in the order of the clustered index key. So statistics of the
clustered index can determine the likelyhood that two consecutive
clustered keys (found in the nonclustered index) can be found on the
same page. This approach would guarantee a maximum of one read per page.
Without such a mechanism each bookmark lookup could lead to a physical
I/O, even if the average numbers of rows per page is much higher. This
could occur if there is not enough memory to hold the data pages in
cache, and the bookmark lookups are done out of (clustered index) order.
But even then, it is just an estimate. The index statistics do not give
information about the density of individual pages, and two consecutive
rows could still be on two different pages. So there is a large margin
of error which makes the tradeoff between 'random' bookmark lookups,
sorted bookmark lookups and clustered index scan (with the advantage of
sequential I/O over random I/O) difficult.
Gert-Jan This thread has been closed and replies have been disabled. Please start a new discussion. Similar topics
by: Carmine |
last post by:
Is there anyway to execute this select statement with dbi?
select
'grant select on ' concat substr(viewschema,1,7) concat '.' concat
substr(viewname,1,18) concat
' to ' concat dbc_xid_user_id...
|
by: Rowland Hills |
last post by:
I have a table which is returning inconsistent results when I query
it!
In query analyzer:
If I do "SELECT * FROM TABLE_NAME" I get no rows returned.
If I do "SELECT COL1, COL2 FROM...
|
by: RCS |
last post by:
I've been running into more and more complexity with an application, because
as time goes on - we need more and more high-level, rolled-up information.
And so I've created views, and views that use...
|
by: CharlesEF |
last post by:
Hi All,
I have run into another problem that is eating my lunch. Should be
simple but I am having one heck of a time. Please look at this SELECT
statement:
SELECT FROM States WHERE ] =...
|
by: serge |
last post by:
Using "SELECT * " is a bad practice even
when using a VIEW instead of a table?
I have some stored procedures that are
identical with the difference of one statement
in the WHERE clause. If I...
|
by: Paul Reddin |
last post by:
Hi,
using ROWNUMBER() is affecting the plan of a view very badly...
is there a way of writing the following view to ensure rownumber()
is done as the last thing done?
i.e after the calling...
|
by: serge |
last post by:
I was working on figuring out where a certain application was
storing the multiple selection choices I was doing through the app.
I finally figured out that they were being store in an IMAGE
data...
|
by: Eitan |
last post by:
Hello,
I want a solutions for a compicateds sql select statments.
The selects can use anything : views, stored procedures, analytic functions,
etc...
(not use materialized view, unless it is...
|
by: Patrick.Laprise |
last post by:
Here's the thing.
We have a SQL Server with more than 25 Databases on it. Each database
is completely independant (but they all have the same structure) and
is use to manage different project....
|
by: isladogs |
last post by:
The next Access Europe meeting will be on Wednesday 2 August 2023 starting at 18:00 UK time (6PM UTC+1) and finishing at about 19:15 (7.15PM)
The start time is equivalent to 19:00 (7PM) in Central...
|
by: erikbower65 |
last post by:
Using CodiumAI's pr-agent is simple and powerful. Follow these steps:
1. Install CodiumAI CLI: Ensure Node.js is installed, then run 'npm install -g codiumai' in the terminal.
2. Connect to...
|
by: linyimin |
last post by:
Spring Startup Analyzer generates an interactive Spring application startup report that lets you understand what contributes to the application startup time and helps to optimize it. Support for...
|
by: isladogs |
last post by:
The next Access Europe meeting will be on Wednesday 6 Sept 2023 starting at 18:00 UK time (6PM UTC+1) and finishing at about 19:15 (7.15PM)
The start time is equivalent to 19:00 (7PM) in Central...
|
by: DJRhino1175 |
last post by:
When I run this code I get an error, its Run-time error# 424 Object required...This is my first attempt at doing something like this. I test the entire code and it worked until I added this -
If...
|
by: DJRhino |
last post by:
Private Sub CboDrawingID_BeforeUpdate(Cancel As Integer)
If = 310029923 Or 310030138 Or 310030152 Or 310030346 Or 310030348 Or _
310030356 Or 310030359 Or 310030362 Or...
|
by: lllomh |
last post by:
Define the method first
this.state = {
buttonBackgroundColor: 'green',
isBlinking: false, // A new status is added to identify whether the button is blinking or not
}
autoStart=()=>{
|
by: lllomh |
last post by:
How does React native implement an English player?
|
by: Mushico |
last post by:
How to calculate date of retirement from date of birth
| |