Question about VARCHAR Vs. CHAR fields

jdokos

Hello,

I have a question about VARCHAR fields. Our application groups here
are starting to use VARCHARs much more frequently. Even VARCHAR (2) to
(9) length fields. They say this is because some of the application
programs, specifically Java Beans cannot handle the spaces after the
value in CHAR fields.

Is anyone else seeing this trend?

I know that VARCHAR fields have 2 extra bytes of overhead. Does anyone
know if there is a significant performance impact in DML against these
fields due to tracking the length?

Thanks in advance for any and all information,

Jeff

Nov 12 '05 #1

Subscribe Post Reply

11527

Ian

jdokos wrote:

Hello,

I have a question about VARCHAR fields. Our application groups here
are starting to use VARCHARs much more frequently. Even VARCHAR (2) to
(9) length fields. They say this is because some of the application
programs, specifically Java Beans cannot handle the spaces after the
value in CHAR fields.

Is anyone else seeing this trend?

I know that VARCHAR fields have 2 extra bytes of overhead. Does anyone
know if there is a significant performance impact in DML against these
fields due to tracking the length?

There is the additional storage overhead as you mention, plus this can
lead to performance issues with row overflows and/or page reorgs.

IMO, this is usually a symptom of lazy programmers rather than "the
app can't handle extra spaces".

Nov 12 '05 #2

DA Morgan

Ian wrote:

jdokos wrote:
Hello,

I have a question about VARCHAR fields. Our application groups here
are starting to use VARCHARs much more frequently. Even VARCHAR (2) to
(9) length fields. They say this is because some of the application
programs, specifically Java Beans cannot handle the spaces after the
value in CHAR fields.

Is anyone else seeing this trend?

I know that VARCHAR fields have 2 extra bytes of overhead. Does anyone
know if there is a significant performance impact in DML against these
fields due to tracking the length?

There is the additional storage overhead as you mention, plus this can
lead to performance issues with row overflows and/or page reorgs.

IMO, this is usually a symptom of lazy programmers rather than "the
app can't handle extra spaces".

Is it true that in Informix VARCHAR takes more space than CHAR? In
Oracle the waste of space and CPU comes with working with CHAR and
it has been almost completely abandoned.

Thanks.
--
Daniel A. Morgan
http://www.psoug.org
da******@x.washington.edu
(replace x with u to respond)

Nov 12 '05 #3

rkusenet

"DA Morgan" <da******@psoug.org> wrote

Is it true that in Informix VARCHAR takes more space than CHAR?
Not at all. Just one byte extra at the beginning to record the length.
In Oracle the waste of space and CPU comes with working with CHAR and
it has been almost completely abandoned.

same is true with informix, except that upto a length of char(15) (some say even 20)
the performance gain in char is worth the space wasted. so I would always recommend
a char field upto char(15).
The difference between char and varchar becomes stark in delete-and-load tables.
That is tables which are periodically cleaned and loaded again. A table with fixed-length
columns (char, int, date etc) loads much faster than with variable length table. I think
it is something to do with the engine calculating before inserting on which page the
row will fit.

Nov 12 '05 #4

Knut Stolze

Ian wrote:

There is the additional storage overhead as you mention, plus this can
lead to performance issues with row overflows and/or page reorgs.

This is certainly true, but you also have to keep in mind that more short
VARCHARs might fit on a page than (padded) CHAR values. So you could
easily have a performance benefit because less pages need to be loaded to
satisfy a query.

--
Knut Stolze
Information Integration Development
IBM Germany / University of Jena

Nov 12 '05 #5

Art S. Kagel

DA Morgan wrote:

Ian wrote:
jdokos wrote:
Hello,
<SNIP>

Is it true that in Informix VARCHAR takes more space than CHAR? In
Oracle the waste of space and CPU comes with working with CHAR and
it has been almost completely abandoned.

Yes and no. A VARCHAR stored a 1 byte length and an LVARCHAR a 2 byte
length as part of the field, however, both types are true variable length
fields. So, if the field is full it will take a byte or two extra storage
than the equivalent CHAR, if not full it will take less storage than CHAR.

The rub and performance hit for using VARCHAR or LVARCHAR in IDS is similar
to that in DB2. If expanding a variable column in a row causes the row to
no longer fit on the data page which is its home then the row is moved to
another page and a forwarding pointer is left behind in its original
location. This is so that index keys do not have to be rewritten. Beyond
that if expanding a variable lenght field on a row causes the row to become
larger than a page (not possible in DB2, but that's another issue) IDS will
create a remainder page to hold the tails of oversized rows from several
pages and a forwarding pointer to the location of the tail is left at the
end of the home row portion. Since LVARCHARs can be as long as 32K this can
happen several times with a row taking up several full pages and a tail entry.

Usually, performance wise, I find breaking long strings into multiple fixed
length CHAR rows in a child table or by using a MULTISET is far more
efficient and no harder to code for.

Art S. Kagel

Nov 12 '05 #6

DA Morgan

rkusenet wrote:

"DA Morgan" <da******@psoug.org> wrote
Is it true that in Informix VARCHAR takes more space than CHAR?

Not at all. Just one byte extra at the beginning to record the length.
In Oracle the waste of space and CPU comes with working with CHAR and
it has been almost completely abandoned.

same is true with informix, except that upto a length of char(15) (some
say even 20)
the performance gain in char is worth the space wasted. so I would
always recommend
a char field upto char(15).
The difference between char and varchar becomes stark in delete-and-load
tables.
That is tables which are periodically cleaned and loaded again. A table
with fixed-length
columns (char, int, date etc) loads much faster than with variable
length table. I think
it is something to do with the engine calculating before inserting on
which page the
row will fit.

Thanks. I will update my information on Informix accordingly.

Daniel Morgan
--
Daniel A. Morgan
http://www.psoug.org
da******@x.washington.edu
(replace x with u to respond)

Nov 12 '05 #7

DA Morgan

Art S. Kagel wrote:

DA Morgan wrote:
Ian wrote:
jdokos wrote:

Hello,
<SNIP>

Is it true that in Informix VARCHAR takes more space than CHAR? In
Oracle the waste of space and CPU comes with working with CHAR and
it has been almost completely abandoned.

Yes and no. A VARCHAR stored a 1 byte length and an LVARCHAR a 2 byte
length as part of the field, however, both types are true variable
length fields. So, if the field is full it will take a byte or two
extra storage than the equivalent CHAR, if not full it will take less
storage than CHAR.

The rub and performance hit for using VARCHAR or LVARCHAR in IDS is
similar to that in DB2. If expanding a variable column in a row causes
the row to no longer fit on the data page which is its home then the row
is moved to another page and a forwarding pointer is left behind in its
original location. This is so that index keys do not have to be
rewritten. Beyond that if expanding a variable lenght field on a row
causes the row to become larger than a page (not possible in DB2, but
that's another issue) IDS will create a remainder page to hold the tails
of oversized rows from several pages and a forwarding pointer to the
location of the tail is left at the end of the home row portion. Since
LVARCHARs can be as long as 32K this can happen several times with a row
taking up several full pages and a tail entry.

Usually, performance wise, I find breaking long strings into multiple
fixed length CHAR rows in a child table or by using a MULTISET is far
more efficient and no harder to code for.

Art S. Kagel

Thanks.

Daniel Morgan
--
Daniel A. Morgan
http://www.psoug.org
da******@x.washington.edu
(replace x with u to respond)

Nov 12 '05 #8

Madison Pruet

As has been said before. The issue of any varchar vrs a char boils down to
how hard it is to get to the variable portion of the row. With fixed length
objects, sequential scan filtering is much easier (i.e. require fewer cpu
cycles) than having to calculate where the variable portion is. There are
several strageties used to calculate variable sized objects, but all
solutions do require some form of calculation and calculations do require
cpu cycles.

Some DBMS use null terminated strings, but then that means you have to
always scan to find the variable lengthed object. Some will split the row
into fixed sized objects and then create an array of offsets for the
variable length objects. The variable length objects may be stored in a
variable portion of the row, or outside of the row in a LOB type of object.,
or possibably in a secondary row. In that case, the varchar still requires
additional space because you have to have some form of a pointer to the
physical location of the varchar. The cost may be hidden, but it is still
there.

Even using a null terminated string for a var char requires an additional
space for the NULL. So that solution is going to use the same space as the
IDS varchar, except instead of a column size, there is the column null
character. And that solution is probably the worst to navigate because each
character in the column must be examined just to get to the end of the
column.

"rkusenet" <rk******@hotmail.com> wrote in message
news:3j************@individual.net...

"DA Morgan" <da******@psoug.org> wrote
Is it true that in Informix VARCHAR takes more space than CHAR?
Not at all. Just one byte extra at the beginning to record the length.
In Oracle the waste of space and CPU comes with working with CHAR and
it has been almost completely abandoned.

same is true with informix, except that upto a length of char(15) (some

say even 20) the performance gain in char is worth the space wasted. so I would always recommend a char field upto char(15).
The difference between char and varchar becomes stark in delete-and-load tables. That is tables which are periodically cleaned and loaded again. A table with fixed-length columns (char, int, date etc) loads much faster than with variable length table. I think it is something to do with the engine calculating before inserting on which page the row will fit.

Nov 12 '05 #9

Madison Pruet

Please see my comments about varchar vrs char. Don't forget, a null
terminator requires a character position as well. ;-)

"DA Morgan" <da******@psoug.org> wrote in message
news:1120679155.210606@yasure...

Ian wrote:
jdokos wrote:
Hello,

I have a question about VARCHAR fields. Our application groups here
are starting to use VARCHARs much more frequently. Even VARCHAR (2) to
(9) length fields. They say this is because some of the application
programs, specifically Java Beans cannot handle the spaces after the
value in CHAR fields.

Is anyone else seeing this trend?

I know that VARCHAR fields have 2 extra bytes of overhead. Does anyone
know if there is a significant performance impact in DML against these
fields due to tracking the length?

There is the additional storage overhead as you mention, plus this can
lead to performance issues with row overflows and/or page reorgs.

IMO, this is usually a symptom of lazy programmers rather than "the
app can't handle extra spaces".

Is it true that in Informix VARCHAR takes more space than CHAR? In
Oracle the waste of space and CPU comes with working with CHAR and
it has been almost completely abandoned.

Thanks.
--
Daniel A. Morgan
http://www.psoug.org
da******@x.washington.edu
(replace x with u to respond)

Nov 12 '05 #10

DA Morgan

Madison Pruet wrote:

Please see my comments about varchar vrs char. Don't forget, a null
terminator requires a character position as well. ;-)

Given the cost of storage these days I haven't, well at least not since
mainframes and COBOL and Y2K, cared about one byte one way or the other.

What I do care about is the horrible waste of CPU spent trimming
trailing spaces.
--
Daniel A. Morgan
http://www.psoug.org
da******@x.washington.edu
(replace x with u to respond)

Nov 12 '05 #11

Jan M. Nelken

Knut Stolze wrote:

This is certainly true, but you also have to keep in mind that more short
VARCHARs might fit on a page than (padded) CHAR values. So you could
easily have a performance benefit because less pages need to be loaded to
satisfy a query.

That would be true providing that application programmer shows some restrains
and will not use construct like VARCHAR(1) which will use up to 6 bytes
(4 bytes of length - at least on DB2 for Linux, Unix and Windows - 1 byte for
actual data and 1 byte for null indicator). Contrast that with CHAR(1) NOT NULL
which will use exactly 1 byte.

Trend to use VARCHAR(n) for anything appears to come from java programmers
because easy mapping of VARCHAR(n) to java strings.

Jan M. Nelken

Nov 12 '05 #12

Knut Stolze

Jan M. Nelken wrote:

Knut Stolze wrote:
This is certainly true, but you also have to keep in mind that more short
VARCHARs might fit on a page than (padded) CHAR values. So you could
easily have a performance benefit because less pages need to be loaded to
satisfy a query.

That would be true providing that application programmer shows some
restrains and will not use construct like VARCHAR(1) which will use up to
6 bytes (4 bytes of length - at least on DB2 for Linux, Unix and Windows -
1 byte for actual data and 1 byte for null indicator). Contrast that with
CHAR(1) NOT NULL which will use exactly 1 byte.

No question that this is indeed not very smart.

--
Knut Stolze
Information Integration Development
IBM Germany / University of Jena

Nov 12 '05 #13

Superboer

ahum how about when the engine considers a page to be full. this can be
horrible with varchars.

Superboer.

Nov 12 '05 #14

Ian

DA Morgan wrote:

Madison Pruet wrote:
Please see my comments about varchar vrs char. Don't forget, a null
terminator requires a character position as well. ;-)

Given the cost of storage these days I haven't, well at least not since
mainframes and COBOL and Y2K, cared about one byte one way or the other.

1 byte isn't much to worry about, but if you have 15 columns in a table
that each have 1 extra byte, and your table has 1 billion rows, you're
looking at a lot of extra space, which does affect performance,
maintenance, etc.

Nov 12 '05 #15

Madison Pruet

Only if the varchar column needs to be expanded.

"Superboer" <su********@planet.nl> wrote in message
news:11**********************@f14g2000cwb.googlegr oups.com...

ahum how about when the engine considers a page to be full. this can be
horrible with varchars.

Superboer.

Nov 12 '05 #16

DA Morgan

Ian wrote:

DA Morgan wrote:
Madison Pruet wrote:
Please see my comments about varchar vrs char. Don't forget, a null
terminator requires a character position as well. ;-)

Given the cost of storage these days I haven't, well at least not since
mainframes and COBOL and Y2K, cared about one byte one way or the other.

1 byte isn't much to worry about, but if you have 15 columns in a table
that each have 1 extra byte, and your table has 1 billion rows, you're
looking at a lot of extra space, which does affect performance,
maintenance, etc.

15 columns 1 extra byte = 15GB. Hardly worthy of consideration for many
reasons.

Cost: I currently purchase storage at about $2.14 per GB so the cost of
this storage, and I'm including the cost of everything, is $32.10.

Performance: Unless I have a very poor design I am selecting only a few
percent of the rows in the table (remember it is a billion row table).
So the difference between having a CHAR with a space I have to trim
versus a byte saved to identify the string size, if anything, favors the
variable length string.

Your mileage based on product and version may differ but I can no
longer, in Oracle, justify CHAR for anything other than a CHAR(1) and
those don't save enough to make me want to use one.
--
Daniel A. Morgan
http://www.psoug.org
da******@x.washington.edu
(replace x with u to respond)

Nov 12 '05 #17

hikums

So, what's the recommendation for java programmers if a char(15) is
used to store an attribute like city, and it is padded with spaces to
the right?

Knut Stolze wrote:

Jan M. Nelken wrote:
Knut Stolze wrote:
This is certainly true, but you also have to keep in mind that more short
VARCHARs might fit on a page than (padded) CHAR values. So you could
easily have a performance benefit because less pages need to be loaded to
satisfy a query.

That would be true providing that application programmer shows some
restrains and will not use construct like VARCHAR(1) which will use up to
6 bytes (4 bytes of length - at least on DB2 for Linux, Unix and Windows -
1 byte for actual data and 1 byte for null indicator). Contrast that with
CHAR(1) NOT NULL which will use exactly 1 byte.

No question that this is indeed not very smart.

--
Knut Stolze
Information Integration Development
IBM Germany / University of Jena

Nov 12 '05 #18

Knut Stolze

DA Morgan wrote:

Performance: Unless I have a very poor design I am selecting only a few
percent of the rows in the table (remember it is a billion row table).
So the difference between having a CHAR with a space I have to trim
versus a byte saved to identify the string size, if anything, favors the
variable length string.

It all depends on the actual situation, I'd say. If an index is based on
such a column, the padding could have a more significant impact because
less index keys will fit on a page and, thus, the index might grow by 1
level, requiring an additional page access for each query. On the other
side, with such a large table you'd probably have a big buffer pool and the
index data has a good chance to remain in the BP so no additional I/O
occurs and the performance impact might be negligible.

So it basically comes down to the "school" of the developer in question.
Some will pay more attention on such details (and depending on the
application, this might be a really good thing). Others care less about it
and are also right.

--
Knut Stolze
Information Integration Development
IBM Germany / University of Jena

Nov 12 '05 #19

DA Morgan

Knut Stolze wrote:

So it basically comes down to the "school" of the developer in question.
Some will pay more attention on such details (and depending on the
application, this might be a really good thing). Others care less about it
and are also right.

Lets also acknowledge the third school. The one that benchmarks it both
ways and actually has metrics to support their decision.

Ok ok I know ... just a fantasy from an academic. ;-)
--
Daniel A. Morgan
http://www.psoug.org
da******@x.washington.edu
(replace x with u to respond)

Nov 12 '05 #20

Knut Stolze

DA Morgan wrote:

Ok ok I know ... just a fantasy from an academic. ;-)

Tell me about it. ;-)

--
Knut Stolze
Information Integration Development
IBM Germany / University of Jena

Nov 12 '05 #21

Phil Sherman

No single correct answer but I'd use VARCHAR(15) NOT NULL and make sure
that there are no fixed length columns in the row to the right of the
varchar.

1. Searches and index construction on fixed length columns will not have
to compute where the column is in the row.

2. City is very likely to be used to construct city, state for addresses
which would require truncation anyway. If you are transmitting this data
to a terminal; transmission costs for the spaces can add up. IBM studies
done over 25 years ago demonstrated that truncating trailing blanks
before transmission was always a beneficial tradeoff. (Increased network
speeds have changed this metric but not everyone connects with a T1 or
faster link.)

3. Most city names are less than 15 chars. I'd suspect that an analysis
of name length * frequency (your specific data) would show that varchar
ends up occupying less space than fixed length.

4. NOT NULL should always be used for City unless you have a need for a
null because the length can be set to zero. This also (usually)
simplifies application programming.

Phil Sherman

hi****@gmail.com wrote:

So, what's the recommendation for java programmers if a char(15) is
used to store an attribute like city, and it is padded with spaces to
the right?

Knut Stolze wrote:
Jan M. Nelken wrote:

Knut Stolze wrote:
This is certainly true, but you also have to keep in mind that more short
VARCHARs might fit on a page than (padded) CHAR values. So you could
easily have a performance benefit because less pages need to be loaded to
satisfy a query.

That would be true providing that application programmer shows some
restrains and will not use construct like VARCHAR(1) which will use up to
6 bytes (4 bytes of length - at least on DB2 for Linux, Unix and Windows -
1 byte for actual data and 1 byte for null indicator). Contrast that with
CHAR(1) NOT NULL which will use exactly 1 byte.

No question that this is indeed not very smart.

--
Knut Stolze
Information Integration Development
IBM Germany / University of Jena

Nov 12 '05 #22

hikums

Thanks Phil.

Phil Sherman wrote:

No single correct answer but I'd use VARCHAR(15) NOT NULL and make sure
that there are no fixed length columns in the row to the right of the
varchar.

1. Searches and index construction on fixed length columns will not have
to compute where the column is in the row.

2. City is very likely to be used to construct city, state for addresses
which would require truncation anyway. If you are transmitting this data
to a terminal; transmission costs for the spaces can add up. IBM studies
done over 25 years ago demonstrated that truncating trailing blanks
before transmission was always a beneficial tradeoff. (Increased network
speeds have changed this metric but not everyone connects with a T1 or
faster link.)

3. Most city names are less than 15 chars. I'd suspect that an analysis
of name length * frequency (your specific data) would show that varchar
ends up occupying less space than fixed length.

4. NOT NULL should always be used for City unless you have a need for a
null because the length can be set to zero. This also (usually)
simplifies application programming.

Phil Sherman

hi****@gmail.com wrote:
So, what's the recommendation for java programmers if a char(15) is
used to store an attribute like city, and it is padded with spaces to
the right?

Knut Stolze wrote:
Jan M. Nelken wrote:
Knut Stolze wrote:
>This is certainly true, but you also have to keep in mind that more short
>VARCHARs might fit on a page than (padded) CHAR values. So you could
>easily have a performance benefit because less pages need to be loaded to
>satisfy a query.

That would be true providing that application programmer shows some
restrains and will not use construct like VARCHAR(1) which will use up to
6 bytes (4 bytes of length - at least on DB2 for Linux, Unix and Windows -
1 byte for actual data and 1 byte for null indicator). Contrast that with
CHAR(1) NOT NULL which will use exactly 1 byte.

No question that this is indeed not very smart.

--
Knut Stolze
Information Integration Development
IBM Germany / University of Jena

Nov 12 '05 #23

Question about VARCHAR Vs. CHAR fields

Similar topics