Bytes IT Community

How does DB2's estimate size tool calculate avg row size when the table is empty?

Does DB2 just fudge it when the table is empty? Is there a "formula"
for average row size when you have variable-length records, or do you
really have to know what your application is packing into those
VARCHAR columns?

Bill Leung
le****@aptea.com
Nov 12 '05 #1
7 Replies


Until you've run RUNSTATS, most of the stats will be -1, which is DB2's
way of saying it does not know. So there is no fudging or guessing; the
optimizer simply proceeds on the assumption that the data is
unavailable. Take cardinality (the number of rows in the table) as an
example: for a new table, DB2 could assume 0 or 1, but the -1 value for
cardinality says DB2 does not know; maybe the table was created with an
import that populated it with 10,000 rows.


Nov 12 '05 #2

For most datatypes, the average row length is calculated by adding up
AVGCOLLEN in SYSCAT.COLUMNS (which is updated when you run RUNSTATS).
Space usage for long columns is based on their defined size, and on top
of everything else a 10-byte row overhead is added. So even though the
table may be empty, the average row length will have some value.
The query below calculates the average row length; I use it in a common
table expression in my SQL version of REORGCHK, and the results it
produces match those from REORGCHK in DB2 v8 FixPak 4.
By the way, I've noticed in the past that some of the calculations for
REORGCHK output are done differently from those of the size estimator;
it looks like the two tools were developed independently of each other.

SELECT TABSCHEMA, TABNAME,
       CASE
           WHEN MIN(CASE
                        WHEN A.LOGGED <> '' THEN 0
                        ELSE A.COLCARD
                    END) >= 0
           THEN SUM(CASE
                        WHEN A.LOGGED = ''          THEN A.AVGCOLLEN
                        WHEN A.LENGTH <= 1024       THEN 72
                        WHEN A.LENGTH <= 8192       THEN 96
                        WHEN A.LENGTH <= 65536      THEN 120
                        WHEN A.LENGTH <= 524000     THEN 144
                        WHEN A.LENGTH <= 4190000    THEN 168
                        WHEN A.LENGTH <= 134000000  THEN 200
                        WHEN A.LENGTH <= 536000000  THEN 224
                        WHEN A.LENGTH <= 1070000000 THEN 256
                        WHEN A.LENGTH <= 1470000000 THEN 280
                        WHEN A.LENGTH <= 2147483647 THEN 316
                    END
                    + CASE
                          WHEN A.LOGGED <> '' AND A.NULLS = 'Y' THEN 1
                          ELSE 0
                      END) + 10
           ELSE -1
       END AS AVGROWLEN
FROM SYSCAT.COLUMNS A
GROUP BY TABSCHEMA, TABNAME
Nov 12 '05 #3

Doesn't the optimizer use "guesstimates" in this case?
If you run an explain on SELECT * FROM tablename with no data and no
stats, Visual Explain will show you default values that it calculates.
These are not stored anywhere but are generated at explain time, since
the optimizer has nothing else to work with. So I would guess that in
this case it does do some fudging, because when looking at the output of
show statistics, there are values!
Does that make sense? Pierre.


Nov 12 '05 #4

How did you come up with those numbers in your CASE statement for long
columns? I have a table with a CLOB(1M) column. If I do
AVG(LENGTH(CLOB_COL)), I get about 2000; but if I were to use your
formula, it would be 168.

Bill Leung
le****@aptea.com


Nov 12 '05 #5

The table itself stores only descriptors for LOB columns; these
descriptors point to the location of the LOB data. The row size
therefore includes the size of the descriptor, not the actual LOB data.
The descriptor sizes are given in the DB2 SQL Reference, under the
CREATE TABLE section:
Maximum LOB Length    LOB Descriptor Size
             1 024                     72
             8 192                     96
            65 536                    120
           524 000                    144
         4 190 000                    168
       134 000 000                    200
       536 000 000                    224
     1 070 000 000                    256
     1 470 000 000                    280
     2 147 483 647                    316
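[Editor's note: the descriptor-size table can be expressed as a small lookup. This is a hypothetical Python sketch (the function name and structure are not part of any DB2 API) that returns the descriptor size charged in-row for a LOB column of a given defined length:]

```python
import bisect

# (maximum defined LOB length, descriptor size in bytes),
# from the DB2 SQL Reference table quoted above
_LOB_DESCRIPTOR_SIZES = [
    (1024, 72), (8192, 96), (65536, 120), (524000, 144),
    (4190000, 168), (134000000, 200), (536000000, 224),
    (1070000000, 256), (1470000000, 280), (2147483647, 316),
]

def lob_descriptor_size(defined_length):
    """Return the in-row descriptor size for a LOB of this defined length."""
    limits = [limit for limit, _ in _LOB_DESCRIPTOR_SIZES]
    i = bisect.bisect_left(limits, defined_length)  # first limit >= length
    if i == len(limits):
        raise ValueError("LOB length exceeds the 2 GB maximum")
    return _LOB_DESCRIPTOR_SIZES[i][1]
```

For the CLOB(1M) column discussed above, `lob_descriptor_size(1048576)` gives 168, matching the CASE expression in the earlier query.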



Nov 12 '05 #6

Thanks Amit.

Not to beat a dead horse, but IBM probably should have left it at -1 or
0 when there are zero rows in the table. I have described my CLOB table
below. Using Amit's calculation, it would yield an AVGROWLEN of 210, or
211 with the nullability byte: (17 + 4 + 168 + 11) + 10. The Control
Center's estimator gives 193. I am beginning to think that it might have
factored in the "Nulls" attribute; perhaps nullable columns account for
80% or 90% of the allowable column length. Unless the Toronto developers
divulge their formula, I am not convinced.
$ db2 "describe table TNX_DOC"

Column Type
name name Length Scale Nulls
--------------------------- ------------------ -------- ----- ------
TNX_ID VARCHAR 13 0 No
TNX_DOC_CNT INTEGER 4 0 No
TNX_PY_CLB CLOB 1048576 0 Yes
TNX_INS_DTTIME TIMESTAMP 10 0 Yes

4 record(s) selected.

$ db2 "select SUBSTR(TABSCHEMA,1,8) as tabschema,
SUBSTR(TABNAME,1,16) as tabname, SUBSTR(COLNAME,1,16) as colname,
LOGGED, AVGCOLLEN from SYSCAT.COLUMNS WHERE TABSCHEMA='LDEV' and
TABNAME='TNX_DOC'" | pg

TABSCHEMA TABNAME COLNAME LOGGED AVGCOLLEN
--------- ------------ ---------------- ------ ---------
LDEV TNX_DOC TNX_ID 17
LDEV TNX_DOC TNX_INS_DTTIME 11
LDEV TNX_DOC TNX_PY_CLB Y -1
LDEV TNX_DOC TNX_DOC_CNT 4

4 record(s) selected.



Nov 12 '05 #7

For your example, DB2 RUNSTATS sets the average column lengths to
4, 4, -1, 11 even when the table is empty. My query is based on those
average column lengths and will yield 198 (5 more than the Control
Center).
The query is actually a subquery I use for doing REORGCHK via SQL: using
the average row length it produces, I can mimic the TSIZE result
(estimated table size = average row length * cardinality) of the DB2
REORGCHK utility. I have no clue why IBM chose to use different formulas
for the Control Center and REORGCHK; I used the REORGCHK formula since
it is documented.
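[Editor's note: the arithmetic behind that 198 can be sketched in Python. This is a hypothetical mock-up (the function and tuple layout are not DB2 artifacts) of the REORGCHK-style formula: sum the RUNSTATS AVGCOLLEN values for regular columns, charge each LOB its descriptor size plus one byte if nullable, then add the 10-byte row overhead:]

```python
def avg_row_len(columns):
    """columns: (avgcollen, is_lob, descriptor_size, nullable) tuples."""
    total = 10  # fixed per-row overhead
    for avgcollen, is_lob, descriptor_size, nullable in columns:
        if is_lob:
            total += descriptor_size  # e.g. 168 for a CLOB(1M)
            if nullable:
                total += 1            # extra byte for a nullable LOB
        else:
            total += avgcollen        # AVGCOLLEN from SYSCAT.COLUMNS
    return total

# TNX_DOC with the table empty: RUNSTATS reports AVGCOLLEN 4, 4, -1, 11
tnx_doc = [
    (4,  False, None, False),  # TNX_ID         VARCHAR(13)
    (4,  False, None, False),  # TNX_DOC_CNT    INTEGER
    (-1, True,  168,  True),   # TNX_PY_CLB     CLOB(1M), descriptor 168
    (11, False, None, True),   # TNX_INS_DTTIME TIMESTAMP
]
print(avg_row_len(tnx_doc))  # 4 + 4 + (168 + 1) + 11 + 10 = 198
```

Multiplying this by the table's cardinality gives the TSIZE estimate that REORGCHK reports; the Control Center's 193 evidently comes from a different, undocumented formula.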
Nov 12 '05 #8

This discussion thread is closed; replies have been disabled.