473,395 Members | 1,539 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,395 software developers and data experts.

Question about VARCHAR Vs. CHAR fields

Hello,

I have a question about VARCHAR fields. Our application groups here
are starting to use VARCHARs much more frequently. Even VARCHAR (2) to
(9) length fields. They say this is because some of the application
programs, specifically Java Beans cannot handle the spaces after the
value in CHAR fields.

Is anyone else seeing this trend?

I know that VARCHAR fields have 2 extra bytes of overhead. Does anyone
know if there is a significant performance impact in DML against these
fields due to tracking the length?

Thanks in advance for any and all information,

Jeff

Nov 12 '05 #1
22 11527
Ian
jdokos wrote:
Hello,

I have a question about VARCHAR fields. Our application groups here
are starting to use VARCHARs much more frequently. Even VARCHAR (2) to
(9) length fields. They say this is because some of the application
programs, specifically Java Beans cannot handle the spaces after the
value in CHAR fields.

Is anyone else seeing this trend?

I know that VARCHAR fields have 2 extra bytes of overhead. Does anyone
know if there is a significant performance impact in DML against these
fields due to tracking the length?


There is the additional storage overhead as you mention, plus this can
lead to performance issues with row overflows and/or page reorgs.

IMO, this is usually a symptom of lazy programmers rather than "the
app can't handle extra spaces".

Nov 12 '05 #2
Ian wrote:
jdokos wrote:
Hello,

I have a question about VARCHAR fields. Our application groups here
are starting to use VARCHARs much more frequently. Even VARCHAR (2) to
(9) length fields. They say this is because some of the application
programs, specifically Java Beans cannot handle the spaces after the
value in CHAR fields.

Is anyone else seeing this trend?

I know that VARCHAR fields have 2 extra bytes of overhead. Does anyone
know if there is a significant performance impact in DML against these
fields due to tracking the length?

There is the additional storage overhead as you mention, plus this can
lead to performance issues with row overflows and/or page reorgs.

IMO, this is usually a symptom of lazy programmers rather than "the
app can't handle extra spaces".


Is it true that in Informix VARCHAR takes more space than CHAR? In
Oracle the waste of space and CPU comes with working with CHAR and
it has been almost completely abandoned.

Thanks.
--
Daniel A. Morgan
http://www.psoug.org
da******@x.washington.edu
(replace x with u to respond)
Nov 12 '05 #3
"DA Morgan" <da******@psoug.org> wrote
Is it true that in Informix VARCHAR takes more space than CHAR?
Not at all. Just one byte extra at the beginning to record the length.
In Oracle the waste of space and CPU comes with working with CHAR and
it has been almost completely abandoned.


same is true with informix, except that upto a length of char(15) (some say even 20)
the performance gain in char is worth the space wasted. so I would always recommend
a char field upto char(15).
The difference between char and varchar becomes stark in delete-and-load tables.
That is tables which are periodically cleaned and loaded again. A table with fixed-length
columns (char, int, date etc) loads much faster than with variable length table. I think
it is something to do with the engine calculating before inserting on which page the
row will fit.
Nov 12 '05 #4
Ian wrote:
There is the additional storage overhead as you mention, plus this can
lead to performance issues with row overflows and/or page reorgs.


This is certainly true, but you also have to keep in mind that more short
VARCHARs might fit on a page than (padded) CHAR values. So you could
easily have a performance benefit because less pages need to be loaded to
satisfy a query.

--
Knut Stolze
Information Integration Development
IBM Germany / University of Jena
Nov 12 '05 #5
DA Morgan wrote:
Ian wrote:
jdokos wrote:
Hello,
<SNIP>


Is it true that in Informix VARCHAR takes more space than CHAR? In
Oracle the waste of space and CPU comes with working with CHAR and
it has been almost completely abandoned.


Yes and no. A VARCHAR stored a 1 byte length and an LVARCHAR a 2 byte
length as part of the field, however, both types are true variable length
fields. So, if the field is full it will take a byte or two extra storage
than the equivalent CHAR, if not full it will take less storage than CHAR.

The rub and performance hit for using VARCHAR or LVARCHAR in IDS is similar
to that in DB2. If expanding a variable column in a row causes the row to
no longer fit on the data page which is its home then the row is moved to
another page and a forwarding pointer is left behind in its original
location. This is so that index keys do not have to be rewritten. Beyond
that if expanding a variable lenght field on a row causes the row to become
larger than a page (not possible in DB2, but that's another issue) IDS will
create a remainder page to hold the tails of oversized rows from several
pages and a forwarding pointer to the location of the tail is left at the
end of the home row portion. Since LVARCHARs can be as long as 32K this can
happen several times with a row taking up several full pages and a tail entry.

Usually, performance wise, I find breaking long strings into multiple fixed
length CHAR rows in a child table or by using a MULTISET is far more
efficient and no harder to code for.

Art S. Kagel
Nov 12 '05 #6
rkusenet wrote:
"DA Morgan" <da******@psoug.org> wrote
Is it true that in Informix VARCHAR takes more space than CHAR?

Not at all. Just one byte extra at the beginning to record the length.
In Oracle the waste of space and CPU comes with working with CHAR and
it has been almost completely abandoned.

same is true with informix, except that upto a length of char(15) (some
say even 20)
the performance gain in char is worth the space wasted. so I would
always recommend
a char field upto char(15).
The difference between char and varchar becomes stark in delete-and-load
tables.
That is tables which are periodically cleaned and loaded again. A table
with fixed-length
columns (char, int, date etc) loads much faster than with variable
length table. I think
it is something to do with the engine calculating before inserting on
which page the
row will fit.


Thanks. I will update my information on Informix accordingly.

Daniel Morgan
--
Daniel A. Morgan
http://www.psoug.org
da******@x.washington.edu
(replace x with u to respond)
Nov 12 '05 #7
Art S. Kagel wrote:
DA Morgan wrote:
Ian wrote:
jdokos wrote:

Hello,
<SNIP>

Is it true that in Informix VARCHAR takes more space than CHAR? In
Oracle the waste of space and CPU comes with working with CHAR and
it has been almost completely abandoned.

Yes and no. A VARCHAR stored a 1 byte length and an LVARCHAR a 2 byte
length as part of the field, however, both types are true variable
length fields. So, if the field is full it will take a byte or two
extra storage than the equivalent CHAR, if not full it will take less
storage than CHAR.

The rub and performance hit for using VARCHAR or LVARCHAR in IDS is
similar to that in DB2. If expanding a variable column in a row causes
the row to no longer fit on the data page which is its home then the row
is moved to another page and a forwarding pointer is left behind in its
original location. This is so that index keys do not have to be
rewritten. Beyond that if expanding a variable lenght field on a row
causes the row to become larger than a page (not possible in DB2, but
that's another issue) IDS will create a remainder page to hold the tails
of oversized rows from several pages and a forwarding pointer to the
location of the tail is left at the end of the home row portion. Since
LVARCHARs can be as long as 32K this can happen several times with a row
taking up several full pages and a tail entry.

Usually, performance wise, I find breaking long strings into multiple
fixed length CHAR rows in a child table or by using a MULTISET is far
more efficient and no harder to code for.

Art S. Kagel


Thanks.

Daniel Morgan
--
Daniel A. Morgan
http://www.psoug.org
da******@x.washington.edu
(replace x with u to respond)
Nov 12 '05 #8
As has been said before. The issue of any varchar vrs a char boils down to
how hard it is to get to the variable portion of the row. With fixed length
objects, sequential scan filtering is much easier (i.e. require fewer cpu
cycles) than having to calculate where the variable portion is. There are
several strageties used to calculate variable sized objects, but all
solutions do require some form of calculation and calculations do require
cpu cycles.

Some DBMS use null terminated strings, but then that means you have to
always scan to find the variable lengthed object. Some will split the row
into fixed sized objects and then create an array of offsets for the
variable length objects. The variable length objects may be stored in a
variable portion of the row, or outside of the row in a LOB type of object.,
or possibably in a secondary row. In that case, the varchar still requires
additional space because you have to have some form of a pointer to the
physical location of the varchar. The cost may be hidden, but it is still
there.

Even using a null terminated string for a var char requires an additional
space for the NULL. So that solution is going to use the same space as the
IDS varchar, except instead of a column size, there is the column null
character. And that solution is probably the worst to navigate because each
character in the column must be examined just to get to the end of the
column.

"rkusenet" <rk******@hotmail.com> wrote in message
news:3j************@individual.net...
"DA Morgan" <da******@psoug.org> wrote
Is it true that in Informix VARCHAR takes more space than CHAR?
Not at all. Just one byte extra at the beginning to record the length.
In Oracle the waste of space and CPU comes with working with CHAR and
it has been almost completely abandoned.


same is true with informix, except that upto a length of char(15) (some

say even 20) the performance gain in char is worth the space wasted. so I would always recommend a char field upto char(15).
The difference between char and varchar becomes stark in delete-and-load tables. That is tables which are periodically cleaned and loaded again. A table with fixed-length columns (char, int, date etc) loads much faster than with variable length table. I think it is something to do with the engine calculating before inserting on which page the row will fit.

Nov 12 '05 #9
Please see my comments about varchar vrs char. Don't forget, a null
terminator requires a character position as well. ;-)

"DA Morgan" <da******@psoug.org> wrote in message
news:1120679155.210606@yasure...
Ian wrote:
jdokos wrote:
Hello,

I have a question about VARCHAR fields. Our application groups here
are starting to use VARCHARs much more frequently. Even VARCHAR (2) to
(9) length fields. They say this is because some of the application
programs, specifically Java Beans cannot handle the spaces after the
value in CHAR fields.

Is anyone else seeing this trend?

I know that VARCHAR fields have 2 extra bytes of overhead. Does anyone
know if there is a significant performance impact in DML against these
fields due to tracking the length?

There is the additional storage overhead as you mention, plus this can
lead to performance issues with row overflows and/or page reorgs.

IMO, this is usually a symptom of lazy programmers rather than "the
app can't handle extra spaces".


Is it true that in Informix VARCHAR takes more space than CHAR? In
Oracle the waste of space and CPU comes with working with CHAR and
it has been almost completely abandoned.

Thanks.
--
Daniel A. Morgan
http://www.psoug.org
da******@x.washington.edu
(replace x with u to respond)

Nov 12 '05 #10
Madison Pruet wrote:
Please see my comments about varchar vrs char. Don't forget, a null
terminator requires a character position as well. ;-)


Given the cost of storage these days I haven't, well at least not since
mainframes and COBOL and Y2K, cared about one byte one way or the other.

What I do care about is the horrible waste of CPU spent trimming
trailing spaces.
--
Daniel A. Morgan
http://www.psoug.org
da******@x.washington.edu
(replace x with u to respond)
Nov 12 '05 #11
Knut Stolze wrote:
This is certainly true, but you also have to keep in mind that more short
VARCHARs might fit on a page than (padded) CHAR values. So you could
easily have a performance benefit because less pages need to be loaded to
satisfy a query.


That would be true providing that application programmer shows some restrains
and will not use construct like VARCHAR(1) which will use up to 6 bytes
(4 bytes of length - at least on DB2 for Linux, Unix and Windows - 1 byte for
actual data and 1 byte for null indicator). Contrast that with CHAR(1) NOT NULL
which will use exactly 1 byte.

Trend to use VARCHAR(n) for anything appears to come from java programmers
because easy mapping of VARCHAR(n) to java strings.

Jan M. Nelken
Nov 12 '05 #12
Jan M. Nelken wrote:
Knut Stolze wrote:
This is certainly true, but you also have to keep in mind that more short
VARCHARs might fit on a page than (padded) CHAR values. So you could
easily have a performance benefit because less pages need to be loaded to
satisfy a query.


That would be true providing that application programmer shows some
restrains and will not use construct like VARCHAR(1) which will use up to
6 bytes (4 bytes of length - at least on DB2 for Linux, Unix and Windows -
1 byte for actual data and 1 byte for null indicator). Contrast that with
CHAR(1) NOT NULL which will use exactly 1 byte.


No question that this is indeed not very smart.

--
Knut Stolze
Information Integration Development
IBM Germany / University of Jena
Nov 12 '05 #13
ahum how about when the engine considers a page to be full. this can be
horrible with varchars.

Superboer.

Nov 12 '05 #14
Ian
DA Morgan wrote:
Madison Pruet wrote:
Please see my comments about varchar vrs char. Don't forget, a null
terminator requires a character position as well. ;-)

Given the cost of storage these days I haven't, well at least not since
mainframes and COBOL and Y2K, cared about one byte one way or the other.


1 byte isn't much to worry about, but if you have 15 columns in a table
that each have 1 extra byte, and your table has 1 billion rows, you're
looking at a lot of extra space, which does affect performance,
maintenance, etc.

Nov 12 '05 #15
Only if the varchar column needs to be expanded.

"Superboer" <su********@planet.nl> wrote in message
news:11**********************@f14g2000cwb.googlegr oups.com...
ahum how about when the engine considers a page to be full. this can be
horrible with varchars.

Superboer.

Nov 12 '05 #16
Ian wrote:
DA Morgan wrote:
Madison Pruet wrote:
Please see my comments about varchar vrs char. Don't forget, a null
terminator requires a character position as well. ;-)


Given the cost of storage these days I haven't, well at least not since
mainframes and COBOL and Y2K, cared about one byte one way or the other.

1 byte isn't much to worry about, but if you have 15 columns in a table
that each have 1 extra byte, and your table has 1 billion rows, you're
looking at a lot of extra space, which does affect performance,
maintenance, etc.


15 columns 1 extra byte = 15GB. Hardly worthy of consideration for many
reasons.

Cost: I currently purchase storage at about $2.14 per GB so the cost of
this storage, and I'm including the cost of everything, is $32.10.

Performance: Unless I have a very poor design I am selecting only a few
percent of the rows in the table (remember it is a billion row table).
So the difference between having a CHAR with a space I have to trim
versus a byte saved to identify the string size, if anything, favors the
variable length string.

Your mileage based on product and version may differ but I can no
longer, in Oracle, justify CHAR for anything other than a CHAR(1) and
those don't save enough to make me want to use one.
--
Daniel A. Morgan
http://www.psoug.org
da******@x.washington.edu
(replace x with u to respond)
Nov 12 '05 #17
So, what's the recommendation for java programmers if a char(15) is
used to store an attribute like city, and it is padded with spaces to
the right?

Knut Stolze wrote:
Jan M. Nelken wrote:
Knut Stolze wrote:
This is certainly true, but you also have to keep in mind that more short
VARCHARs might fit on a page than (padded) CHAR values. So you could
easily have a performance benefit because less pages need to be loaded to
satisfy a query.


That would be true providing that application programmer shows some
restrains and will not use construct like VARCHAR(1) which will use up to
6 bytes (4 bytes of length - at least on DB2 for Linux, Unix and Windows -
1 byte for actual data and 1 byte for null indicator). Contrast that with
CHAR(1) NOT NULL which will use exactly 1 byte.


No question that this is indeed not very smart.

--
Knut Stolze
Information Integration Development
IBM Germany / University of Jena


Nov 12 '05 #18
DA Morgan wrote:
Performance: Unless I have a very poor design I am selecting only a few
percent of the rows in the table (remember it is a billion row table).
So the difference between having a CHAR with a space I have to trim
versus a byte saved to identify the string size, if anything, favors the
variable length string.


It all depends on the actual situation, I'd say. If an index is based on
such a column, the padding could have a more significant impact because
less index keys will fit on a page and, thus, the index might grow by 1
level, requiring an additional page access for each query. On the other
side, with such a large table you'd probably have a big buffer pool and the
index data has a good chance to remain in the BP so no additional I/O
occurs and the performance impact might be negligible.

So it basically comes down to the "school" of the developer in question.
Some will pay more attention on such details (and depending on the
application, this might be a really good thing). Others care less about it
and are also right.

--
Knut Stolze
Information Integration Development
IBM Germany / University of Jena
Nov 12 '05 #19
Knut Stolze wrote:
So it basically comes down to the "school" of the developer in question.
Some will pay more attention on such details (and depending on the
application, this might be a really good thing). Others care less about it
and are also right.


Lets also acknowledge the third school. The one that benchmarks it both
ways and actually has metrics to support their decision.

Ok ok I know ... just a fantasy from an academic. ;-)
--
Daniel A. Morgan
http://www.psoug.org
da******@x.washington.edu
(replace x with u to respond)
Nov 12 '05 #20
DA Morgan wrote:
Ok ok I know ... just a fantasy from an academic. ;-)


Tell me about it. ;-)

--
Knut Stolze
Information Integration Development
IBM Germany / University of Jena
Nov 12 '05 #21
No single correct answer but I'd use VARCHAR(15) NOT NULL and make sure
that there are no fixed length columns in the row to the right of the
varchar.

1. Searches and index construction on fixed length columns will not have
to compute where the column is in the row.

2. City is very likely to be used to construct city, state for addresses
which would require truncation anyway. If you are transmitting this data
to a terminal; transmission costs for the spaces can add up. IBM studies
done over 25 years ago demonstrated that truncating trailing blanks
before transmission was always a beneficial tradeoff. (Increased network
speeds have changed this metric but not everyone connects with a T1 or
faster link.)

3. Most city names are less than 15 chars. I'd suspect that an analysis
of name length * frequency (your specific data) would show that varchar
ends up occupying less space than fixed length.

4. NOT NULL should always be used for City unless you have a need for a
null because the length can be set to zero. This also (usually)
simplifies application programming.

Phil Sherman

hi****@gmail.com wrote:
So, what's the recommendation for java programmers if a char(15) is
used to store an attribute like city, and it is padded with spaces to
the right?

Knut Stolze wrote:
Jan M. Nelken wrote:

Knut Stolze wrote:
This is certainly true, but you also have to keep in mind that more short
VARCHARs might fit on a page than (padded) CHAR values. So you could
easily have a performance benefit because less pages need to be loaded to
satisfy a query.

That would be true providing that application programmer shows some
restrains and will not use construct like VARCHAR(1) which will use up to
6 bytes (4 bytes of length - at least on DB2 for Linux, Unix and Windows -
1 byte for actual data and 1 byte for null indicator). Contrast that with
CHAR(1) NOT NULL which will use exactly 1 byte.


No question that this is indeed not very smart.

--
Knut Stolze
Information Integration Development
IBM Germany / University of Jena


Nov 12 '05 #22
Thanks Phil.

Phil Sherman wrote:
No single correct answer but I'd use VARCHAR(15) NOT NULL and make sure
that there are no fixed length columns in the row to the right of the
varchar.

1. Searches and index construction on fixed length columns will not have
to compute where the column is in the row.

2. City is very likely to be used to construct city, state for addresses
which would require truncation anyway. If you are transmitting this data
to a terminal; transmission costs for the spaces can add up. IBM studies
done over 25 years ago demonstrated that truncating trailing blanks
before transmission was always a beneficial tradeoff. (Increased network
speeds have changed this metric but not everyone connects with a T1 or
faster link.)

3. Most city names are less than 15 chars. I'd suspect that an analysis
of name length * frequency (your specific data) would show that varchar
ends up occupying less space than fixed length.

4. NOT NULL should always be used for City unless you have a need for a
null because the length can be set to zero. This also (usually)
simplifies application programming.

Phil Sherman

hi****@gmail.com wrote:
So, what's the recommendation for java programmers if a char(15) is
used to store an attribute like city, and it is padded with spaces to
the right?

Knut Stolze wrote:
Jan M. Nelken wrote:
Knut Stolze wrote:
>This is certainly true, but you also have to keep in mind that more short
>VARCHARs might fit on a page than (padded) CHAR values. So you could
>easily have a performance benefit because less pages need to be loaded to
>satisfy a query.

That would be true providing that application programmer shows some
restrains and will not use construct like VARCHAR(1) which will use up to
6 bytes (4 bytes of length - at least on DB2 for Linux, Unix and Windows -
1 byte for actual data and 1 byte for null indicator). Contrast that with
CHAR(1) NOT NULL which will use exactly 1 byte.

No question that this is indeed not very smart.

--
Knut Stolze
Information Integration Development
IBM Germany / University of Jena



Nov 12 '05 #23

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

3
by: Chris Dean | last post by:
this is really a mysql question rather than php but as the two are used togeter closely I hope someone will be able to advise me basically I am wondering how the addition of many large text field...
8
by: Randell D. | last post by:
Folks, I once read an article in Linux Format whereby a technical writer had made performance recommendations on a LAMP environment. One of the points raised was for small columns in a database,...
0
by: Krist | last post by:
Hi All, I have a database design question, pls give me some help.. I want to define tables for salesman's sales target commission . The commission could be given per EITHER sales amount of :...
1
by: Krist | last post by:
Hi All, There is some additional info I forget on this same topic I just posted. I have a database design question, pls give me some help.. I want to define tables for salesman's sales target...
2
by: Doug Crowson | last post by:
I have some questions about co-located joins and db partitions. I've heard various things from various people and was looking for confirmation. Assume the following tables and partitioning keys:...
10
by: hilz | last post by:
Hi all. I have a table that i create in MsAccess, using ado connection as follows: create table PAYITEM ( PAYITEM_ID COUNTER PRIMARY KEY, PAYITEM_NAME ...
4
by: Andrew S | last post by:
Hello Mr. Expert: - I have 3 tables in mysql in MyISAM table format, I am using mysql4.0 on freebsd5.3 - producttbl, productdetailentbl, pricetblN - they all have "productid" as the Primary KEY....
6
by: mike | last post by:
so I keep optimizing my fields down to the minimum character length necessary i.e., varchar(15), then I find out a month later its gotta get bigger, then a few months later, bigger again, etc. ...
9
by: JJ297 | last post by:
Plamen Ratchev helped me at this post: http://groups.google.com/group/comp.databases.ms-sqlserver/browse_thread/thread/a57716c65bc21a2f# You mentioned this: You can still keep all weekly data...
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.