By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
429,054 Members | 1,635 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 429,054 IT Pros & Developers. It's quick & easy.

which data type to select

P: n/a
Hi,

I have a web application with a table to store terms and
conditions of a Company.
This may some times run into many pages and some times it may be just a
few sentences. It is a character text field. I want to know which Data
type I need to use so that it doesnt waste memory.

thanks in advance,
rAinDeEr

May 10 '06 #1
Share this Question
Share on Google+
5 Replies


P: n/a

"rAinDeEr" <ta**********@gmail.com> wrote in message
news:11*********************@v46g2000cwv.googlegro ups.com...
Hi,

I have a web application with a table to store terms and
conditions of a Company.
This may some times run into many pages and some times it may be just a
few sentences. It is a character text field. I want to know which Data
type I need to use so that it doesnt waste memory.


I'm going to assume that you are worried about wasting disk space, not
memory, since most DBAs care much more about disk space than memory.

VARCHAR is a good choice for text fields of varying lengths if the lengths
aren't too great; DB2 stores only the actual amount of data desired in the
field, plus two bytes of overhead for the length. Therefore, if the terms
and conditions of one company is 100 bytes, only 102 bytes of storage are
needed and if the terms and conditions of another company are 5000 bytes,
only 5002 bytes are needed.

Stay away from CHAR and LONG VARCHAR for the terms and conditions data. CHAR
fields are probably not going to be long enough in many cases - they can
only be 254 bytes long at most and if you store your data in a CHAR(254)
field, every value in that column will occupy the full 254 bytes because
CHAR values are always padded with spaces to bring them up the maximum
length of the column. That can waste a lot of space. Avoid LONG VARCHAR
because those columns have their sizes calculated by the system and DB2 will
always make the column as big as it can possibly be. This has the
unfortunate side effect of making it impossible to add new columns to the
table without dropping and recreating the table.

Please note that VARCHAR columns cannot be more than 32,672 bytes long and
LONG VARCHARs cannot be more than 32,700 bytes long (assuming you are using
DB2 for Windows, Unix, and Linux and are running Version 8). If you expect
the terms and conditions for some companies to be longer than that, you will
want to use the CLOB datatype, which can store up to 2,147,483,647 bytes. If
your terms and conditions are even bigger than that, you will need to use
multiple columns to store it all or you will have to store the terms and
conditions separately from the data tables. When you use the LOB (Large
Object) datatypes, which include BLOB, CLOB, and DBCLOB, it is possible to
simply store a link to a document in the table column, rather than storing
the data itself; you might find that this approach makes your database
significantly smaller but it can also complicate the administration and
programming side of things for you.

--
Rhino
May 10 '06 #2

P: n/a
rAinDeEr wrote:
Hi,

I have a web application with a table to store terms and
conditions of a Company.
This may some times run into many pages and some times it may be just
a few sentences. It is a character text field. I want to know which
Data type I need to use so that it doesnt waste memory.

thanks in advance,
rAinDeEr


If you're sure it'll never exceed about 32k characters LONG VARCHAR
might be a viable choice. Failing that, use CLOB(size) with whatever
the absolute maximum size of data is (1M, 1G, whatever). Both LONG
VARCHAR and CLOB are variable length "long" storage types.

If the data isn't always plain text (in other words, if your definition
of "text" is rather looser and could include things like rich text
documents) use a BLOB(size) instead.
HTH,

Dave.

--

May 10 '06 #3

P: n/a
Rhino wrote:

"rAinDeEr" <ta**********@gmail.com> wrote in message
news:11*********************@v46g2000cwv.googlegro ups.com...
Hi,

I have a web application with a table to store terms and
conditions of a Company.
This may some times run into many pages and some times it may be
just a few sentences. It is a character text field. I want to know
which Data type I need to use so that it doesnt waste memory.

I'm going to assume that you are worried about wasting disk space,
not memory, since most DBAs care much more about disk space than
memory.

VARCHAR is a good choice for text fields of varying lengths if the
lengths aren't too great; DB2 stores only the actual amount of data
desired in the field, plus two bytes of overhead for the length.
Therefore, if the terms and conditions of one company is 100 bytes,
only 102 bytes of storage are needed and if the terms and conditions
of another company are 5000 bytes, only 5002 bytes are needed.

Stay away from CHAR and LONG VARCHAR for the terms and conditions
data. CHAR fields are probably not going to be long enough in many
cases - they can only be 254 bytes long at most and if you store your
data in a CHAR(254) field, every value in that column will occupy the
full 254 bytes because CHAR values are always padded with spaces to
bring them up the maximum length of the column. That can waste a lot
of space. Avoid LONG VARCHAR because those columns have their sizes
calculated by the system and DB2 will always make the column as big
as it can possibly be. This has the unfortunate side effect of making
it impossible to add new columns to the table without dropping and
recreating the table.


Interesting ... I didn't know about that limitation of LONG VARCHARs
(not that I've used them much; don't think I've ever used them in a
production environment)
Please note that VARCHAR columns cannot be more than 32,672 bytes
long and LONG VARCHARs cannot be more than 32,700 bytes long
(assuming you are using DB2 for Windows, Unix, and Linux and are
running Version 8). If you expect the terms and conditions for some
companies to be longer than that, you will want to use the CLOB
datatype, which can store up to 2,147,483,647 bytes. If your terms
and conditions are even bigger than that, you will need to use
multiple columns to store it all or you will have to store the terms
and conditions separately from the data tables. When you use the LOB
(Large Object) datatypes, which include BLOB, CLOB, and DBCLOB, it is
possible to simply store a link to a document in the table column,
rather than storing the data itself; you might find that this
approach makes your database significantly smaller but it can also
complicate the administration and programming side of things for you.


IIRC, LONG VARCHARs are more-or-less Large Objects as well (i.e. like
LOBs their data isn't stored entirely within the table)? The reason I
didn't suggest a 32k VARCHAR was that VARCHARs are stored entirely
within the table which would mean using a tablespace with a big page
size.

Not necssarily a bad thing as page sizes bigger than the default 4k can
have a performance benefit, but it does include a little bit of extra
complexity what with having to create the extra tablespace, unless one
creates the entire database with a bigger pagesize (which in the case
of a 32k page size I wouldn't recommend given that there are likely to
be other tables in the database with smaller row sizes and given that
pages can hold a maximum of 255 rows this would result in a *lot* of
wasted space).
Dave.

--

May 10 '06 #4

P: n/a

"Dave Hughes" <da**@waveform.plus.com> wrote in message
news:44*********************@ptn-nntp-reader01.plus.net...
Rhino wrote:

"rAinDeEr" <ta**********@gmail.com> wrote in message
news:11*********************@v46g2000cwv.googlegro ups.com...
> Hi,
>
> I have a web application with a table to store terms and
> conditions of a Company.
> This may some times run into many pages and some times it may be
> just a few sentences. It is a character text field. I want to know
> which Data type I need to use so that it doesnt waste memory.
>
I'm going to assume that you are worried about wasting disk space,
not memory, since most DBAs care much more about disk space than
memory.

VARCHAR is a good choice for text fields of varying lengths if the
lengths aren't too great; DB2 stores only the actual amount of data
desired in the field, plus two bytes of overhead for the length.
Therefore, if the terms and conditions of one company is 100 bytes,
only 102 bytes of storage are needed and if the terms and conditions
of another company are 5000 bytes, only 5002 bytes are needed.

Stay away from CHAR and LONG VARCHAR for the terms and conditions
data. CHAR fields are probably not going to be long enough in many
cases - they can only be 254 bytes long at most and if you store your
data in a CHAR(254) field, every value in that column will occupy the
full 254 bytes because CHAR values are always padded with spaces to
bring them up the maximum length of the column. That can waste a lot
of space. Avoid LONG VARCHAR because those columns have their sizes
calculated by the system and DB2 will always make the column as big
as it can possibly be. This has the unfortunate side effect of making
it impossible to add new columns to the table without dropping and
recreating the table.


Interesting ... I didn't know about that limitation of LONG VARCHARs
(not that I've used them much; don't think I've ever used them in a
production environment)

In the early days of the MVS version of DB2, before row-level locking became
available, some people liked LONG VARCHARs because they effectively made
each row occupy an entire page. When every row was the only row on its page,
page-locking effectively became row-locking. Obviously, there can be
application benefits to locking only one row instead of a whole page full of
rows.

But if you consider that rows containing LONG VARCHAR columns occupy an
entire page but are probably only using a small fraction of the available
space, it's obvious that you are paying a very high price in storage for the
locking-benefit you are getting.

For example, I remember a table that some people on one of my project teams
developed back in the late 1980s. It was an audit table of some kind that
was intended to hold comments about particular transactions. There was a
small multicolumn key, then a single LONG VARCHAR column for comments about
the transaction identified in the key. I'm not sure what page size they were
using but I think it was 32K. Most of the comments were 100 characters or
less. By using LONG VARCHAR for the column, they were guaranteeing
themselves that their comments could be as long 32,700 bytes but they were
also guaranteeing that 32,700 bytes would be unavailable for other rows on
that same page! Since most comments were quite brief, they were wasting an
incredible amount of space. Now, if I remember correctly, they were only
writing rows to that table for a handful of transactions so the space wasted
was relatively trivial but if they changed the design to write a row to the
table for every transaction in the system, whether it was likely to need a
comment or not, the space wasted would quickly become staggering as the
number of transactions grew.

On top of that, if they ever needed to add a column to the table, they would
have had to save the data, drop the table, re-create it with the new
column(s) in it, then reload all old data. You can't add columns to a table
if it already occupies the whole page since there is no place to put the
data for the new column.
Please note that VARCHAR columns cannot be more than 32,672 bytes
long and LONG VARCHARs cannot be more than 32,700 bytes long
(assuming you are using DB2 for Windows, Unix, and Linux and are
running Version 8). If you expect the terms and conditions for some
companies to be longer than that, you will want to use the CLOB
datatype, which can store up to 2,147,483,647 bytes. If your terms
and conditions are even bigger than that, you will need to use
multiple columns to store it all or you will have to store the terms
and conditions separately from the data tables. When you use the LOB
(Large Object) datatypes, which include BLOB, CLOB, and DBCLOB, it is
possible to simply store a link to a document in the table column,
rather than storing the data itself; you might find that this
approach makes your database significantly smaller but it can also
complicate the administration and programming side of things for you.


IIRC, LONG VARCHARs are more-or-less Large Objects as well (i.e. like
LOBs their data isn't stored entirely within the table)?


Yes, that's right. According to the current manual for DB2 for
Windows/Unix/Linux, the 'LONG IN' clause:

Identifies the table space in which the values of any long columns (LONG
VARCHAR, LONG VARGRAPHIC, LOB data types, distinct types with any of these
as source types, or any columns defined with user-defined structured types
with values that cannot be stored inline) will be stored. This option is
allowed only when the primary table space specified in the IN clause is a
DMS table space. The table space must exist, must be a LARGE DMS table space
over which the authorization ID of the statement has USE privilege, and must
be in the same database partition group as tablespace-name1 (SQLSTATE
42838).
The reason I
didn't suggest a 32k VARCHAR was that VARCHARs are stored entirely
within the table which would mean using a tablespace with a big page
size.

Not necssarily a bad thing as page sizes bigger than the default 4k can
have a performance benefit, but it does include a little bit of extra
complexity what with having to create the extra tablespace, unless one
creates the entire database with a bigger pagesize (which in the case
of a 32k page size I wouldn't recommend given that there are likely to
be other tables in the database with smaller row sizes and given that
pages can hold a maximum of 255 rows this would result in a *lot* of
wasted space).


The important thing is that you want to make sure you've chosen optimum page
sizes otherwise you will waste a lot of space, as you've said. You also want
to choose those sizes BEFORE you start loading the database if possible; it
can be a lot of work to move data that was in tablespaces with poor page
sizes to tablespaces with better page sizes. As with most things, it best to
plan things out ahead of time as best you can than to go into things without
any forethought and then spent all your time doing damage control.

--
Rhino

May 11 '06 #5

P: n/a
> IIRC, LONG VARCHARs are more-or-less Large Objects as well (i.e. like
LOBs their data isn't stored entirely within the table)?


That's correct. However, the use of LONG VARCHARs is discouraged and you
should stick with CLOBs in this case. For example, many of the new
features added to DB2 don't even consider/accept LONG VARCHARs.

Btw, if you want to have some mix of both, the VARCHAR and the CLOB world,
i.e. store short strings like VARCHARs inline with the other data and long
strings like CLOBs, then you could wrap a CLOB value in a structured type
and the inline length for the structured type (or rather the column where
the structured type is used) determines the threshold between the two
storage models.

--
Knut Stolze
DB2 Information Integration Development
IBM Germany
May 12 '06 #6

This discussion thread is closed

Replies have been disabled for this discussion.