Bytes IT Community

CHAR to VARCHAR conversion in DB2(iSeries)

Background
When remodeling a legacy application, one of an architect's important tasks is to make optimum use of the database's storage capabilities. Most legacy applications are constrained by the technology available at the time of their development and are therefore no longer optimal by current standards. One such case is the extensive use of CHAR fields, which are no longer an efficient choice for storage. This article presents a systematic approach to converting CHAR data fields to VARCHAR in DB2 for iSeries.

Problem Statement
The basic limitation of the CHAR data type is its inflexibility: it cannot allocate only the space actually required. A CHAR(50) field always reserves 50 bytes in the database, irrespective of its content. If most values in the field are shorter than 35 characters, the remaining 15 bytes are pure waste, since the database cannot use them for anything else. The result is extra disk space consumption.
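As a back-of-the-envelope illustration (with hypothetical numbers), the wasted space can be estimated as the declared length minus the average data length, times the row count:

```python
def char_waste_bytes(declared_len, avg_data_len, row_count):
    """Bytes lost to trailing pad in a fixed-length CHAR column."""
    return (declared_len - avg_data_len) * row_count

# a CHAR(50) column averaging 35 characters of real data,
# in a hypothetical table of one million rows:
print(char_waste_bytes(50, 35, 1_000_000))  # 15000000, i.e. about 15 MB
```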

Solution
In such a case, the most immediate option for saving storage space is to let the database store only the relevant data, without unnecessary trailing blanks. For this, architects simply convert CHAR fields to VARCHAR.

Is it really a solution?
Is simply converting all CHAR columns to VARCHAR an appropriate solution, and if not, what is the optimum way to do it? This is where the real catch lies; ignoring it can lead to performance degradation that is hard to track down at later stages.
Ideally it is a two-step process:
(a) Identifying the relevant columns for conversion: Blindly changing all columns from CHAR to VARCHAR may require huge effort, as not only the table definitions but also the related programs that insert/update/delete these columns need to be changed accordingly. Therefore only those columns should be chosen where the space saving is significant. Please refer to the section “Recommendation for architects” to select appropriate columns.
(b) Converting the data definition of the identified columns and related programs from CHAR to VARCHAR: This requires some knowledge of how DB2 on iSeries implements VARCHAR storage. Please refer to the section “Implementation of VARCHAR storage in DB2 (iSeries)” below.

Implementation of VARCHAR storage in DB2 (iSeries)
Variable-length column (VARCHAR) support allows you to define any number of columns in a table as variable length, and using it can usually reduce the size of a table. Data in a variable-length column is stored internally in two areas: a fixed-length (ALLOCATE) area and an overflow area. The ALLOCATE area behaves like a CHAR column: irrespective of the actual data length, the database reserves as many bytes as specified in the ALLOCATE clause. Any data beyond that length is stored in the variable, or overflow, area. If a default value is specified, the allocated length is at least as large as that value.
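The two-area behaviour described above can be sketched as a small model (an illustration of the splitting rule only, not DB2's actual on-disk format):

```python
def varchar_layout(data_len, allocate):
    """Split a VARCHAR value's length between the fixed (ALLOCATE)
    area and the overflow area, per the rule described above."""
    fixed_part = min(data_len, allocate)
    overflow_part = max(0, data_len - allocate)
    return fixed_part, overflow_part

print(varchar_layout(5, 10))   # (5, 0): fits entirely in the fixed area
print(varchar_layout(38, 10))  # (10, 28): 28 bytes go to overflow
```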
Recommendation for architects
The following points help you determine the best way to use your storage area. If the primary goal is:
 Space saving: simply use the VARCHAR data type (instead of CHAR) in the table definition.
 Performance: use VARCHAR with the ALLOCATE clause. Create the table with the ALLOCATE keyword, for example:
CREATE TABLE tab (col1 VARCHAR(40) ALLOCATE(10),
                  col2 VARCHAR(40) ALLOCATE(10),
                  col3 VARCHAR(40) ALLOCATE(7))

In many applications, performance must be considered. If you use the default VARCHAR definition, i.e. ALLOCATE(0), it can double the disk unit traffic: ALLOCATE(0) requires two reads, one for the fixed-length portion of the row and one for the overflow area. A variable-length implementation with a carefully chosen ALLOCATE minimizes overflow and space and maximizes performance.

It is possible to balance space savings and performance by using the ALLOCATE clause properly. When you define a table with variable-length data, you must decide the width of the ALLOCATE area: it should be wide enough to hold at least 90% to 95% of the column's values.
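The 90%-95% rule can be applied mechanically once a length histogram is available; a minimal sketch, using a made-up histogram:

```python
def choose_allocate(length_counts, target=0.95):
    """Smallest ALLOCATE width covering `target` fraction of rows,
    given a {data_length: row_count} histogram."""
    total = sum(length_counts.values())
    cumulative = 0
    for length in sorted(length_counts):
        cumulative += length_counts[length]
        if cumulative / total >= target:
            return length
    return max(length_counts)

# toy histogram (hypothetical numbers): 95% of rows are 20 bytes or shorter
print(choose_allocate({5: 60, 10: 25, 20: 10, 40: 5}))  # 20
```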

Now the question is how to find the columns that need to be converted from CHAR to VARCHAR, and how to arrive at the magical figure for each column's ALLOCATE clause, because choosing irrelevant columns for conversion may lead to high effort but few gains. Normally, columns with very high variation in data length are the most suitable for conversion. Columns such as remarks, comments, etc. fall into this category, and columns in this category usually have length 35 or above. Also, the corresponding table should have enough rows to justify the effort involved, as the space saving is directly proportional to the number of rows. To get the list of relevant columns and their corresponding tables, run the SQL script listed below:

SELECT a.table_name, a.column_name, a.data_type, a.length, b.number_rows
FROM qsys2.syscolumns a , qsys2.systablestat b
WHERE a.table_schema = 'lib'
AND a.length >= 35
AND b.table_schema = a.table_schema
AND b.table_name = a.table_name
AND b.number_rows > 5000

Here, lib is the name of the library where the database tables reside. This SQL provides the list of all relevant columns. Architects should adjust the values for a.length and b.number_rows to suit the nature of their application. Next, we need to analyze the existing data and find the ALLOCATE figure that will hold 90-95% of the data. For this, run the following SQL against the database concerned, where col1 and tab1 are the column and its table under consideration.

SELECT LENGTH(RTRIM(col1)) Column_Length, count(*) Count
FROM tab1
GROUP BY LENGTH(RTRIM(col1))
ORDER BY LENGTH(RTRIM(col1))

This SQL classifies the column's values by their data length. Paste this information into an Excel spreadsheet and derive the cumulative sum and related percentage. Below is such data for a column defined as CHAR(50) in a table of 1,626,986 rows.

Length   Count     Cumulative Sum   Percentage

null 997280 997280
0 581 997861 0.613318738
1 116 997977 0.613390035
2 188 998165 0.613505586
3 603 998768 0.61387621
4 961 999729 0.614466873
5 14 999743 0.614475478
6 2 999745 0.614476707
7 23 999768 0.614490844
8 14 999782 0.614499449
9 26 999808 0.614515429
10 2786 1002594 0.616227798
11 500 1003094 0.616535115
12 2454 1005548 0.618043425
13 4438 1009986 0.620771168
14 12623 1022609 0.628529686
15 21423 1044032 0.641696978
16 33377 1077409 0.662211599
17 43236 1120645 0.68878589
18 57116 1177761 0.723891293
19 59734 1237495 0.760605807
20 65324 1302819 0.800756122
21 64123 1366942 0.840168262
22 59600 1426542 0.876800415
23 49011 1475553 0.906924214
24 40671 1516224 0.931921971
25 30856 1547080 0.9508871
26 23382 1570462 0.96525846
27 17017 1587479 0.975717677
28 12170 1599649 0.983197766
29 8537 1608186 0.988444891
30 5674 1613860 0.991932321
31 4108 1617968 0.994457236
32 2778 1620746 0.996164687
33 1922 1622668 0.997346013
34 1315 1623983 0.998154256
35 1030 1625013 0.998787328
36 686 1625699 0.999208967
37 523 1626222 0.99953042
38 280 1626502 0.999702517
39 169 1626671 0.99980639
40 145 1626816 0.999895512
41 56 1626872 0.999929932
42 37 1626909 0.999952673
43 20 1626929 0.999964966
44 10 1626939 0.999971112
45 19 1626958 0.99998279
46 5 1626963 0.999985863
47 3 1626966 0.999987707
50 20 1626986 1
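The cumulative sum and percentage columns above can also be derived with a short script instead of a spreadsheet; a sketch using toy data, not the real column:

```python
def cumulative_pct(length_rows, total_rows):
    """length_rows: (length, count) pairs from the GROUP BY query;
    total_rows: total rows in the table. Returns rows of
    (length, count, cumulative_sum, cumulative_fraction)."""
    out, running = [], 0
    for length, count in length_rows:
        running += count
        out.append((length, count, running, running / total_rows))
    return out

# hypothetical query output for a 10-row table:
for row in cumulative_pct([(0, 2), (10, 5), (25, 2), (40, 1)], 10):
    print(row)
```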

Almost 95% of the data has length 25 or less, so choosing ALLOCATE(25) will save around 35 MB of space with virtually no impact on performance. This can be roughly estimated as:
(Total_rows * percentile * saved_space_per_row) - (Total_rows * remaining_percentile * average_variable_space_per_overflowed_row)
As the latter half is not expected to be significant, an approximate value of the first half provides a rough estimate. Moreover, we need not be very precise; any close estimate can aid the decision. For tables with millions of rows, this saving can reach 300 MB from just one column.
The result of this analysis also helps decide whether a column is worth converting at all: a small storage saving indicates that the conversion effort for that column is not justified.
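Plugging the example's figures into the first half of the estimate gives the quoted figure of roughly 35 MB; a sketch, assuming a 2-byte length prefix per VARCHAR value (an assumption, not a figure stated in the article):

```python
def varchar_saving_bytes(rows, pct_within, char_len, allocate, prefix=2):
    """First-order saving from converting CHAR(char_len) to VARCHAR
    with ALLOCATE(allocate), counting only rows that fit in the
    allocate area and ignoring the overflow term, as in the text.
    The 2-byte length prefix per value is an assumption."""
    saved_per_row = char_len - allocate - prefix
    return rows * pct_within * saved_per_row

# the example above: 1,626,986 rows, ~95% within 25 bytes, CHAR(50)
print(varchar_saving_bytes(1_626_986, 0.95, 50, 25) / 1e6)  # roughly 35.5 MB
```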

This type of analysis is easy for existing columns, but when designing new tables, what should the ALLOCATE size be? Here the data architect initially has to choose an appropriate value according to the functionality of the field (75% of the total length is a reasonable starting point if there is no other input to aid the decision). After a considerable amount of production data has accumulated, the data architect should check the performance of the field with the SQL listed below:

SELECT DOUBLE_PRECISION(overflow)/(number_rows)
FROM qsys2.systablestat
WHERE table_name = 'tab1'
AND table_schema = 'lib'

Any value below 5% is fine; anything above 5% means the ALLOCATE length needs to be increased.
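The 5% rule of thumb can be expressed as a trivial check on the ratio returned by that query (the row counts here are hypothetical):

```python
def allocate_ok(overflow_rows, total_rows, threshold=0.05):
    """True when the SYSTABLESTAT overflow ratio is within the ~5%
    rule of thumb; False means the ALLOCATE width should grow."""
    return overflow_rows / total_rows <= threshold

print(allocate_ok(3_000, 100_000))   # True: 3% overflow, width is fine
print(allocate_ok(12_000, 100_000))  # False: 12% overflow, widen ALLOCATE
```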

Possible impact on code
Apart from the table definition, programs that use these fields may also need to be changed if host variables are involved. If you use host variables to insert or update variable-length columns, the host variables should themselves be variable length. Because blanks are not truncated from fixed-length host variables, using fixed-length host variables would cause more rows to spill into the overflow area and would increase the size of the table. In the following example, fixed-length host variables are used to insert a row into a table:
01 COL1 PIC X(40).
...
MOVE "SMITH" TO COL1.
EXEC SQL INSERT INTO tab1 VALUES
(:COL1, :COL2, :COL3)
END-EXEC.
The host variable COL1 is not variable length. The string "SMITH", followed by 35 blanks, is inserted into the VARCHAR column COL1. The 40-byte value is longer than the allocate size of 25, so fifteen of the thirty-five trailing blanks end up in the overflow area. In the next example, variable-length host variables are used to insert a row into a table:
01 VCol1.
49 Col1-LEN PIC S9(4) BINARY.
49 Col1-DATA PIC X(40).
...
MOVE "SMITH" TO Col1-DATA.
MOVE 5 TO Col1-LEN.
EXEC SQL INSERT INTO tab1 VALUES
(:VCol1, :VCol2, :VCol3)
END-EXEC.
The host variable VCol1 is variable length. The actual length of the data is set to 5. The value is shorter than the allocated length, so it can be placed in the fixed portion of the column and incurs no performance overhead.
Jan 16 '08 #1