Bytes | Developer Community

Increasing performance by selecting one table

Hello all,

I have the following problem. Please forgive me for not posting the full script, but I don't think it would help anyway.

I have a table which is quite big (over 5 million records). This table contains one field (varchar(100)) which holds several pieces of data concatenated into a single string.

There is a view on this table to present the data to the user. The problem is that this view needs to display some parts of this one large field (using the substring function, or an inline function returning the value).

In the application, the user is able to filter and sort by these fields. The situation gets more complicated when I combine this table with another one that has an additional, much larger field of the same kind, from which I need to extract data in the same way.

The problem: it takes TOO LONG to select the data according to the user's request (the user accesses the view, not the table directly).

Now the questions:
- Is using substring (as in the example) a good solution, or is it better to use an inline function that returns the part of the dataset I need (probably there is no difference)?
- Would it be much faster if I added some extra varchar fields to Source_Table, containing only the parts I'm interested in, and bound these fields in the view instead of using the substring function?

Small example:

CREATE TABLE [dbo].[Source_Table] (
[CID] [numeric](18, 0) IDENTITY (1, 1) NOT NULL ,
[MSrepl_tran_version] uniqueidentifier ROWGUIDCOL NULL ,
[Date_Event] [datetime] NOT NULL ,
[mama_id] [varchar] (6) COLLATE Latin1_General_CI_AS NOT NULL ,
[mama_type] [varchar] (4) COLLATE Latin1_General_CI_AS NULL ,
[tata_id] [varchar] (4) COLLATE Latin1_General_CI_AS NOT NULL ,
[tata_type] [varchar] (2) COLLATE Latin1_General_CI_AS NULL ,
[loc_id] [nvarchar] (64) COLLATE Latin1_General_CI_AS NOT NULL ,
[sn_no] [smallint] NOT NULL ,
[tel_type] [smallint] NULL ,
[loc_status] [smallint] NULL ,
[sq_break] [bit] NULL ,
[cmpl_data] [varchar] (100) COLLATE Latin1_General_CI_AS NOT NULL ,
[fk_cmpl_erp_data] [numeric](18, 0) NULL ,
[erp_dynia] [bigint] NULL
) ON [PRIMARY]
GO

create view VIEW_AllData
as
select top 100 percent
isnull(substring(RODZ.cmpl_data,27,10),'-') as ASO_NO,
(RODZ.mama_type + RODZ.mama_Id) as MAMA,
isnull(substring(RODZ.cmpl_data,45,5),'-') as MI,
isnull(substring(RODZ.cmpl_data,57,3),'-') as ctl_EC,
isnull(substring(RODZ.cmpl_data,60,3),'-') as ctl_IC,
RODZ.Date_Event as time_time,
RODZ.sn_no as SN
FROM
Source_Table RODZ with (nolock)
go


Thanks in advance

Mateusz

Sep 6 '05 #1
Hi

Just wondering why this data is not stored as separate fields, if that is how
it is to be accessed?

John

"Matik" <ma****@sauron.xo.pl> wrote in message
news:11**********************@o13g2000cwo.googlegroups.com...
(snip original post)

Sep 6 '05 #2
On 6 Sep 2005 09:53:37 -0700, Matik wrote:

(snip)

Hi Mateusz,

I agree with John Bell. Store the individual data elements. That allows
you to create indexes on the columns that are frequently used in
searches. With your table + view, all searches will always result in a
table scan, which will take a looooong time.

If you want the concatenated data as well, add it as a computed column
(that way, it won't take additional storage, but the individual strings
will be concatenated when you read the table).

Example:

CREATE TABLE Mateusz
(Col1 varchar(10) NOT NULL,
Col2 varchar(20) NOT NULL,
Combined AS Col1 + Col2,
PRIMARY KEY (Col1, Col2)
)
go
INSERT INTO Mateusz (Col1, Col2)
VALUES ('Abc', 'deF')
go
SELECT * FROM Mateusz
go
DROP TABLE Mateusz
go

By the way: why do you have both an IDENTITY and a UNIQUEIDENTIFIER
column in your table, but no PRIMARY KEY? Do you really expect a
quintillion rows in your table? And why do you add the unneeded "top 100
percent" to your view's definition?
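If the posted script really reflects the table, a minimal fix would look like this (untested sketch; the constraint name is just a placeholder, and Matik's actual table may already have a key that the script omitted):

```sql
-- Hypothetical: add a primary key on the existing IDENTITY column
ALTER TABLE dbo.Source_Table
ADD CONSTRAINT PK_Source_Table PRIMARY KEY (CID);
```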

Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)
Sep 6 '05 #3
Hello to all,

Thank you for the responses. Hugo, an additional explanation: I actually
posted just part of the view; the real one uses a predefined ORDER BY at
the end. To use ORDER BY in a view, you need to specify a TOP :(

John: I was wondering about that too... but the explanation was simple:
historical reasons ;) To me it is also more or less obvious that storing
the data in separate columns (especially with long strings and lots of
data) is much more efficient than keeping it concatenated (especially
with these scans).

You're right that I don't expect that many rows in my table. I don't
actually know why they introduced this column, but I need to keep it
(I think it is used for replication).

The primary key is set up on the CID column (nonclustered, with a fill
factor of 90%). I don't know why the script didn't include that statement :(

I think your idea is great. I was actually thinking along exactly these
lines; I just needed confirmation from someone with experience.

Thank you very much

Mateusz

Sep 9 '05 #4
Hi

As you seem to have reached the limits of acceptability with your current
design, maybe it is time to have a rethink and implement a better
structure. It sounds like this may have come from a flat-file-based system,
and that is the history that is dragging you down! Migrating to separate
columns may not mean a single table. You can ease backward compatibility
by creating a view that re-creates the way the old table presented data; this
may mean you don't have to change everything at once, although I would
make sure everything that inserts/updates the data is in the first tranche.
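Just to illustrate the idea - an untested sketch, where the split-out column names and types are made up and the real offsets/padding of cmpl_data would need checking against the actual data:

```sql
-- Hypothetical split of the fixed-position cmpl_data field into real columns
CREATE TABLE dbo.Source_Table_New (
    CID        numeric(18, 0) IDENTITY (1, 1) NOT NULL PRIMARY KEY,
    Date_Event datetime       NOT NULL,
    sn_no      smallint       NOT NULL,
    aso_no     varchar(10)    NULL,  -- was substring(cmpl_data, 27, 10)
    mi         varchar(5)     NULL,  -- was substring(cmpl_data, 45, 5)
    ctl_ec     varchar(3)     NULL,  -- was substring(cmpl_data, 57, 3)
    ctl_ic     varchar(3)     NULL   -- was substring(cmpl_data, 60, 3)
);
GO

-- Backward-compatibility view that approximates the old single-field layout
-- (the padding between the old fixed positions is omitted here for brevity)
CREATE VIEW dbo.Source_Table_Compat
AS
SELECT CID, Date_Event, sn_no,
       aso_no + mi + ctl_ec + ctl_ic AS cmpl_data
FROM dbo.Source_Table_New;
GO
```

With the parts stored as real columns, the ones used for filtering and sorting can then be indexed, which is what the substring-based view can never benefit from.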

John

"Matik" <ma****@sauron.xo.pl> wrote in message
news:11**********************@o13g2000cwo.googlegroups.com...
(snip previous post)

Sep 9 '05 #5
On 8 Sep 2005 18:55:04 -0700, Matik wrote:
(snip)
To use order by in a view, you need to specify a top :(


Hi Matik,

And even then, it doesn't serve any practical use.

The rows in a view, like the rows in a table, are UNordered _by
definition_. The only way to guarantee that you retrieve rows in a
predefined order is by using an ORDER BY clause on the final SELECT
statement that actually retrieves the data.

In the ANSI SQL-92 standard, it's not permitted to use ORDER BY in a
view definition for exactly this reason. And this carries over to the
behaviour of SQL Server - with only one exception: when you add a TOP
clause (which is not defined in SQL-92 either), you can (no, SHOULD)
use an ORDER BY clause to define what ordering to use **for the
evaluation of the TOP clause**. This is meaningful when the TOP clause
specifies a fixed number, or something less than 100 percent ("this view
should only contain the 10 most expensive articles" - TOP 10 ... ORDER
BY Price).

If you use TOP 100 PERCENT ... ORDER BY ... in the definition of your
view, you'll almost certainly see the results in the desired order
every time you test it. But be warned - there is no guarantee that you'll
never have a different order. And since Murphy's law says that anything
that CAN go wrong, WILL go wrong, be prepared to do some serious
debugging sometime in the middle of the night, just when you need your
sleep most.

Or fix your code now - add ORDER BY to the SELECT statements that need
to return data in a specific order, remove the ORDER BY and the TOP 100
PERCENT from your view definition, and fix any code that appears to
depend on the order in which rows are returned or processed, even in the
absence of any ORDER BY.
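For example (untested sketch, abbreviated to a few columns of the posted view):

```sql
-- View without TOP 100 PERCENT or ORDER BY
CREATE VIEW VIEW_AllData
AS
SELECT isnull(substring(RODZ.cmpl_data, 27, 10), '-') AS ASO_NO,
       RODZ.Date_Event AS time_time,
       RODZ.sn_no AS SN
FROM Source_Table RODZ;
GO

-- Ordering belongs on the query that actually retrieves the data
SELECT ASO_NO, time_time, SN
FROM VIEW_AllData
ORDER BY time_time;
```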

Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)
Sep 9 '05 #6
Hugo - Murphy is right :)

Thanks

Matik

Oct 3 '05 #7
