Datatype sizes; a space and speed issue?

Madison Kelly

Hi again,

Something I have been wondering about and haven't found an answer to
yet is how the size of a datatype (I hope that is the right term)
effects performance. What effect is there if I specify "TEXT" instead of
say "VARCHAR(255)"? How much benefit is there do being more aggresive
and say cutting it in half again by using "VARCHAR(128)"?

I hope this isn't too basic a question! Thanks!

Madison

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ma*******@postgresql.org so that your
message can get through to the mailing list cleanly

Nov 23 '05 #1

Subscribe Post Reply

3232

Vams

On Tuesday 22 June 2004 11:26 pm, Madison Kelly wrote:

Something I have been wondering about and haven't found an answer to
yet is how the size of a datatype (I hope that is the right term)
effects performance. What effect is there if I specify "TEXT" instead of
say "VARCHAR(255)"? How much benefit is there do being more aggresive
and say cutting it in half again by using "VARCHAR(128)"?

There is no performance difference between varchar, char and text. Infact,
text is equal to varchar with no length specified (a Postgresql extension).

I use varchar when I want a max limit on the number of characters. Char when
I want a set number of characters (auto space padded). And use text all
other times, because it is more readable than varchar.

All this and more can be found in the Docs - 7.4.2: Ch 8: Data Types, Sec 8.3:
Character Types.

good luck,

Vams

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 23 '05 #2

Joel Matthew

> What effect is there if I specify "TEXT" instead of

say "VARCHAR(255)"?
Well, theoretically, in the latter case, the database will allocate 256
(257? 259?) bytes for that field in the record itself. That is, that
field will consume 256 bytes for each record stored.

In the former case, the database will (again, theoretically) allocate a
string pool (think of a separate file, as one way to do it) for the
field, and will allocate a pointer (4 or 8 bytes) into the string pool
for the field. This means that the field will consume only four or eight
bytes for each record stored, plus (don't forget this, it's important)
however many bytes are actually needed in the string pool for the text
value. If you have, for instance, a fifteen character name stored in
Unicode UTF32, there will be ninety, erm, octets (think bytes for most
CPUs) in the pool for that field of that record, plus a few pointers and
such so the system can tell where to collect garbage.
How much benefit is there do being more aggresive
and say cutting it in half again by using "VARCHAR(128)"?
Well, that would reduce the storage requirements for that field by half.
It might also prevent you from storing necessary information. That's
easily 128 characters if you're only using US-ASCII in UTF-8, but it's
only 32 characters of Unicode in UTF-32, and it could be anywhere
between 128 and 32 in Unicode UTF-8. Making a good prediction about the
tradeoff is one of the things a database engineer is paid for.
I hope this isn't too basic a question!

Since you ask, yeah, it is, but unless things have changed here recently,
the people here aren't going to raise the oven temperature too high when
they roast me for answering it.

--
Joel Matthew <re**@ddcom.co.jp>
---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ma*******@postgresql.org so that your
message can get through to the mailing list cleanly

Nov 23 '05 #3

Stephan Szabo

On Wed, 23 Jun 2004, Joel Matthew wrote:

What effect is there if I specify "TEXT" instead of
say "VARCHAR(255)"?

Well, theoretically, in the latter case, the database will allocate 256
(257? 259?) bytes for that field in the record itself. That is, that
field will consume 256 bytes for each record stored.

Both text and varchar are stored in PostgreSQL as length + string so
varchar(256) doesn't not require storage of the unused bytes. char space
pads so it's the oddball (requiring length + padded string).

How much benefit is there do being more aggresive
and say cutting it in half again by using "VARCHAR(128)"?

Well, that would reduce the storage requirements for that field by half.
It might also prevent you from storing necessary information. That's
easily 128 characters if you're only using US-ASCII in UTF-8, but it's
only 32 characters of Unicode in UTF-32, and it could be anywhere
between 128 and 32 in Unicode UTF-8. Making a good prediction about the
tradeoff is one of the things a database engineer is paid for.

varchar lengths should be in characters not bytes, so that should be 128
characters in any of the encodings, but the actual number of bytes that
those 128 characters can take up may vary.

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postgresql.org

Nov 23 '05 #4

Madison Kelly

Vams wrote:

On Tuesday 22 June 2004 11:26 pm, Madison Kelly wrote:
Something I have been wondering about and haven't found an answer to
yet is how the size of a datatype (I hope that is the right term)
effects performance. What effect is there if I specify "TEXT" instead of
say "VARCHAR(255)"? How much benefit is there do being more aggresive
and say cutting it in half again by using "VARCHAR(128)"?

There is no performance difference between varchar, char and text. Infact,
text is equal to varchar with no length specified (a Postgresql extension).

I use varchar when I want a max limit on the number of characters. Char when
I want a set number of characters (auto space padded). And use text all
other times, because it is more readable than varchar.

All this and more can be found in the Docs - 7.4.2: Ch 8: Data Types, Sec 8.3:
Character Types.

good luck,

Vams

Thank you very much Vams!!

Madison

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ma*******@postgresql.org so that your
message can get through to the mailing list cleanly

Nov 23 '05 #5

Joel Matthew

> Both text and varchar are stored in PostgreSQL as length + string so

varchar(256) doesn't not require storage of the unused bytes. char space
pads so it's the oddball (requiring length + padded string).
Yeah, I saw Vams's post and r3m3mb3r3d that I'd f0rg0t.
varchar lengths should be in characters not bytes, so that should be 128
characters in any of the encodings, but the actual number of bytes that
those 128 characters can take up may vary.

And that's actually the best rationale, IMO, for making the optimization
a full-fledged feature. Boy, the things you forget ...

--
Joel Matthew <re**@ddcom.co.jp>
---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 23 '05 #6

Alvaro Herrera

On Wed, Jun 23, 2004 at 01:15:23PM +0900, Joel Matthew wrote:

What effect is there if I specify "TEXT" instead of
say "VARCHAR(255)"?

Well, theoretically, in the latter case, the database will allocate 256
(257? 259?) bytes for that field in the record itself. That is, that
field will consume 256 bytes for each record stored.

Nope. Actually, in both cases the length will be stored first (4 bytes)
and the actual content following it, using the indicated amount of
bytes. There's absolutely no difference in storage.

A varchar(256) field will allow you to store a text not with 256 bytes
max, but 256 _chars_ max. Think multibyte encodings such as utf8 -- the
varchar(256) can take anything from 4 + 1 bytes (a single byte string) to
4 + 256 * max_bytes_per_char.
The difference you cite is for char(N) fields, which are always padded
with blanks to fill the N chars.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"There is evil in the world. There are dark, awful things. Occasionally, we get
a glimpse of them. But there are dark corners; horrors almost impossible to
imagine... even in our worst nightmares." (Van Helsing, Dracula A.D. 1972)
---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postgresql.org

Nov 23 '05 #7

Tom Lane

Alvaro Herrera <al******@dcc.uchile.cl> writes:

The difference you cite is for char(N) fields, which are always padded
with blanks to fill the N chars.

And, again, that's N logical *characters*, not bytes.

We used to have some attempts at optimizing on the assumption that
char(n) fields were physically fixed-width, but we gave it up as a
bad job several major releases back ... it was never more than a
very marginal optimization anyway ...

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 23 '05 #8

Madison Kelly

Alvaro Herrera wrote:

On Wed, Jun 23, 2004 at 01:15:23PM +0900, Joel Matthew wrote:
What effect is there if I specify "TEXT" instead of
say "VARCHAR(255)"?

Well, theoretically, in the latter case, the database will allocate 256
(257? 259?) bytes for that field in the record itself. That is, that
field will consume 256 bytes for each record stored.

Nope. Actually, in both cases the length will be stored first (4 bytes)
and the actual content following it, using the indicated amount of
bytes. There's absolutely no difference in storage.

A varchar(256) field will allow you to store a text not with 256 bytes
max, but 256 _chars_ max. Think multibyte encodings such as utf8 -- the
varchar(256) can take anything from 4 + 1 bytes (a single byte string) to
4 + 256 * max_bytes_per_char.
The difference you cite is for char(N) fields, which are always padded
with blanks to fill the N chars.

Thank you everyone for enswering, that actually clears up a lot. Thank
you too for not flaming this n00b!

Madison

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Nov 23 '05 #9

Joel Matthew

> ... the length will be stored first (4 bytes)

and the actual content following it

Section 8.3 of the manual seems to indicate that the actual character
data is stored in a separate file (background table -- I guess that
might not be a separate file?) for text, unspecified width char, and
char fields which exceed the specified length.

--
Joel Matthew <re**@ddcom.co.jp>
---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postgresql.org

Nov 23 '05 #10

Joel Matthew

> We used to have some attempts at optimizing on the assumption that

char(n) fields were physically fixed-width, but we gave it up as a
bad job several major releases back ... it was never more than a
very marginal optimization anyway ...

Does that mean that PostGreSQL fixes character width at thirty-two bits,
or that it uses UTF-8, or that it just stores what it gets?

(Checked chapter 8.3 in the manual, didn't see the answer there. Not
that I really want to know. With Unicode, trying to optimize record
sizes for char/text fields is a little like trying to play Russian
Roulette. Wait, is that no longer politically correct? Should it be
called six-chamber roulette, now? Don't want to offend anyone.)

--
Joel Matthew <re**@ddcom.co.jp>
---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 23 '05 #11

Tom Lane

Joel Matthew <re**@ddcom.co.jp> writes:

Does that mean that PostGreSQL fixes character width at thirty-two bits,
or that it uses UTF-8, or that it just stores what it gets?
We store text data in the form indicated by the database encoding
setting. UCS-32 is not a supported encoding, but UTF-8 is --- among
others.
(Checked chapter 8.3 in the manual, didn't see the answer there. Not
that I really want to know. With Unicode, trying to optimize record
sizes for char/text fields is a little like trying to play Russian
Roulette.

No, it's entirely like pointless. You just don't know how many bytes
will be taken up by N characters.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Nov 23 '05 #12

Tom Lane

Joel Matthew <re**@ddcom.co.jp> writes:

... the length will be stored first (4 bytes)
and the actual content following it
Section 8.3 of the manual seems to indicate that the actual character
data is stored in a separate file (background table -- I guess that
might not be a separate file?) for text, unspecified width char, and
char fields which exceed the specified length.

We may push very-wide fields out to a separate table ("toast table").
This has nothing to do with whether the field is text, varchar(n),
unspecified varchar, or whatever, but only with the physical size
of the data.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 23 '05 #13

Alvaro Herrera

On Wed, Jun 23, 2004 at 02:43:06PM +0900, Joel Matthew wrote:

We used to have some attempts at optimizing on the assumption that
char(n) fields were physically fixed-width, but we gave it up as a
bad job several major releases back ... it was never more than a
very marginal optimization anyway ...
Does that mean that PostGreSQL fixes character width at thirty-two bits,
or that it uses UTF-8, or that it just stores what it gets?

It'll use utf8 if configured to do so; it'll just store what it gets if
configured as SQL_ASCII (it isn't really ASCII); or it will convert from
the client encoding to the server encoding before storing, if they are
different.
(Checked chapter 8.3 in the manual, didn't see the answer there. Not
that I really want to know. With Unicode, trying to optimize record
sizes for char/text fields is a little like trying to play Russian
Roulette. Wait, is that no longer politically correct? Should it be
called six-chamber roulette, now? Don't want to offend anyone.)

Of course, the optimization could have only worked on fixed-width
encodings (not utf8 -- maybe possible with utf32, but Postgres doesn't
support that AFAIK), but since current versions enable multibyte by
default there's really no point in trying.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"I would rather have GNU than GNOT." (ccchips, lwn.net/Articles/37595/)
---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 23 '05 #14

by: Sanjay Minni | last post by:

What is the datatype to be used for Primary Key columns for most optimised access given that - There will be a single column primary key only - The values will only be integers (but as...

Oracle Database

A way to decrease executable sizes?

by: Filipe Martins | last post by:

Hello. I've read somewhere that the executable is smaller if we use a source file for each function! So, I tested this with gcc and it seams to confirm! What seams to happen is that if we call...

C / C++

Column Sizes

by: Brett Maton | last post by:

Hi NG, How do I find out the size of a column ? I am retrieving large objects from the pg_largeobject table and creating a files, I would like to know the length of the data column so that I...

PostgreSQL Database

Key Numeric Datatype

by: Elias Farah | last post by:

Hi All, What are people's experience in timing Query Performance? I have tried in vain, but get many unpredictable results, with execution proceeding in the background and problems of timing...

Microsoft Access / VBA

Sizes of integers

by: Michael Brennan | last post by:

Hi, I wonder if there is any good reason to let different systems have different sizes of short, int and long? I think that makes it harder to plan which type to use for your program. For...

C / C++

About font sizes

by: deko | last post by:

As I understand it, most browser manufacturers have agreed on 16px for their default font size. So, this should be an accurate conversion for percentages: px % 16 = 100 14 = 87.5 13 =...

HTML / CSS

CHAR datatype issue

by: shsandeep | last post by:

Hi, I have a column of datatype CHAR(20). When the DataStage job loads data of length 15 into it, it pads the remaining 5 spaces with blank space. This is happening with all CHAR datatype...

DB2 Database

Extent Sizes and Page sizes

by: Asphalt Blazer | last post by:

Can anybody explain to me the relation between extentsize and pagesize. When I am creating a new tablespace what do I need to be careful of? Say pagesize is 8K is keeping extentsize(pages) 64 too...

DB2 Database

159

Sizes of Integer Types

by: Bob Timpkinson | last post by:

Hi, I have a 32-bit machine... Is there anyway I can get gcc to use the following integer sizes? char: 8 bits short: 16 bits int: 32 bits long: 64 bits long long: 128 bits

C / C++

Easy Steps to Fix "Canon Printer Won't Connect to WiFi Network"

by: taylorcarr | last post by:

A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...

General

How to turn on java script in a villaon keypad mobile phone

by: Charles Arthur | last post by:

How do i turn on java script on a villaon, callus and itel keypad mobile phone

Java

Batch import of multiple excel files into the database

by: ryjfgjl | last post by:

If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...

Data Management

Migrating Website to Cloud - Emmanuel Katto

by: emmanuelkatto | last post by:

Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel

General

Navigating the Data Structures and Algorithms (DSA)

by: BarryA | last post by:

What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

Algorithms / Advanced Math

Is that possible of reading the .csv file in column wise and the column have different lengths ?

by: Sonnysonu | last post by:

This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

C / C++

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...

Windows Server

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

C / C++

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

Online Marketing

Datatype sizes; a space and speed issue?

Similar topics