473,387 Members | 1,463 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,387 software developers and data experts.

Datatype sizes; a space and speed issue?

Hi again,

Something I have been wondering about and haven't found an answer to
yet is how the size of a datatype (I hope that is the right term)
effects performance. What effect is there if I specify "TEXT" instead of
say "VARCHAR(255)"? How much benefit is there do being more aggresive
and say cutting it in half again by using "VARCHAR(128)"?

I hope this isn't too basic a question! Thanks!

Madison

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ma*******@postgresql.org so that your
message can get through to the mailing list cleanly

Nov 23 '05 #1
13 3232
On Tuesday 22 June 2004 11:26 pm, Madison Kelly wrote:
Something I have been wondering about and haven't found an answer to
yet is how the size of a datatype (I hope that is the right term)
effects performance. What effect is there if I specify "TEXT" instead of
say "VARCHAR(255)"? How much benefit is there do being more aggresive
and say cutting it in half again by using "VARCHAR(128)"?


There is no performance difference between varchar, char and text. Infact,
text is equal to varchar with no length specified (a Postgresql extension).

I use varchar when I want a max limit on the number of characters. Char when
I want a set number of characters (auto space padded). And use text all
other times, because it is more readable than varchar.

All this and more can be found in the Docs - 7.4.2: Ch 8: Data Types, Sec 8.3:
Character Types.

good luck,

Vams

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 23 '05 #2
> What effect is there if I specify "TEXT" instead of
say "VARCHAR(255)"?
Well, theoretically, in the latter case, the database will allocate 256
(257? 259?) bytes for that field in the record itself. That is, that
field will consume 256 bytes for each record stored.

In the former case, the database will (again, theoretically) allocate a
string pool (think of a separate file, as one way to do it) for the
field, and will allocate a pointer (4 or 8 bytes) into the string pool
for the field. This means that the field will consume only four or eight
bytes for each record stored, plus (don't forget this, it's important)
however many bytes are actually needed in the string pool for the text
value. If you have, for instance, a fifteen character name stored in
Unicode UTF32, there will be ninety, erm, octets (think bytes for most
CPUs) in the pool for that field of that record, plus a few pointers and
such so the system can tell where to collect garbage.
How much benefit is there do being more aggresive
and say cutting it in half again by using "VARCHAR(128)"?
Well, that would reduce the storage requirements for that field by half.
It might also prevent you from storing necessary information. That's
easily 128 characters if you're only using US-ASCII in UTF-8, but it's
only 32 characters of Unicode in UTF-32, and it could be anywhere
between 128 and 32 in Unicode UTF-8. Making a good prediction about the
tradeoff is one of the things a database engineer is paid for.
I hope this isn't too basic a question!


Since you ask, yeah, it is, but unless things have changed here recently,
the people here aren't going to raise the oven temperature too high when
they roast me for answering it.

--
Joel Matthew <re**@ddcom.co.jp>
---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ma*******@postgresql.org so that your
message can get through to the mailing list cleanly

Nov 23 '05 #3

On Wed, 23 Jun 2004, Joel Matthew wrote:
What effect is there if I specify "TEXT" instead of
say "VARCHAR(255)"?


Well, theoretically, in the latter case, the database will allocate 256
(257? 259?) bytes for that field in the record itself. That is, that
field will consume 256 bytes for each record stored.


Both text and varchar are stored in PostgreSQL as length + string so
varchar(256) doesn't not require storage of the unused bytes. char space
pads so it's the oddball (requiring length + padded string).
How much benefit is there do being more aggresive
and say cutting it in half again by using "VARCHAR(128)"?


Well, that would reduce the storage requirements for that field by half.
It might also prevent you from storing necessary information. That's
easily 128 characters if you're only using US-ASCII in UTF-8, but it's
only 32 characters of Unicode in UTF-32, and it could be anywhere
between 128 and 32 in Unicode UTF-8. Making a good prediction about the
tradeoff is one of the things a database engineer is paid for.


varchar lengths should be in characters not bytes, so that should be 128
characters in any of the encodings, but the actual number of bytes that
those 128 characters can take up may vary.

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postgresql.org

Nov 23 '05 #4
Vams wrote:
On Tuesday 22 June 2004 11:26 pm, Madison Kelly wrote:
Something I have been wondering about and haven't found an answer to
yet is how the size of a datatype (I hope that is the right term)
effects performance. What effect is there if I specify "TEXT" instead of
say "VARCHAR(255)"? How much benefit is there do being more aggresive
and say cutting it in half again by using "VARCHAR(128)"?

There is no performance difference between varchar, char and text. Infact,
text is equal to varchar with no length specified (a Postgresql extension).

I use varchar when I want a max limit on the number of characters. Char when
I want a set number of characters (auto space padded). And use text all
other times, because it is more readable than varchar.

All this and more can be found in the Docs - 7.4.2: Ch 8: Data Types, Sec 8.3:
Character Types.

good luck,

Vams


Thank you very much Vams!!

Madison

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ma*******@postgresql.org so that your
message can get through to the mailing list cleanly

Nov 23 '05 #5
> Both text and varchar are stored in PostgreSQL as length + string so
varchar(256) doesn't not require storage of the unused bytes. char space
pads so it's the oddball (requiring length + padded string).
Yeah, I saw Vams's post and r3m3mb3r3d that I'd f0rg0t.
varchar lengths should be in characters not bytes, so that should be 128
characters in any of the encodings, but the actual number of bytes that
those 128 characters can take up may vary.


And that's actually the best rationale, IMO, for making the optimization
a full-fledged feature. Boy, the things you forget ...

--
Joel Matthew <re**@ddcom.co.jp>
---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 23 '05 #6
On Wed, Jun 23, 2004 at 01:15:23PM +0900, Joel Matthew wrote:
What effect is there if I specify "TEXT" instead of
say "VARCHAR(255)"?


Well, theoretically, in the latter case, the database will allocate 256
(257? 259?) bytes for that field in the record itself. That is, that
field will consume 256 bytes for each record stored.


Nope. Actually, in both cases the length will be stored first (4 bytes)
and the actual content following it, using the indicated amount of
bytes. There's absolutely no difference in storage.

A varchar(256) field will allow you to store a text not with 256 bytes
max, but 256 _chars_ max. Think multibyte encodings such as utf8 -- the
varchar(256) can take anything from 4 + 1 bytes (a single byte string) to
4 + 256 * max_bytes_per_char.
The difference you cite is for char(N) fields, which are always padded
with blanks to fill the N chars.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"There is evil in the world. There are dark, awful things. Occasionally, we get
a glimpse of them. But there are dark corners; horrors almost impossible to
imagine... even in our worst nightmares." (Van Helsing, Dracula A.D. 1972)
---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postgresql.org

Nov 23 '05 #7
Alvaro Herrera <al******@dcc.uchile.cl> writes:
The difference you cite is for char(N) fields, which are always padded
with blanks to fill the N chars.


And, again, that's N logical *characters*, not bytes.

We used to have some attempts at optimizing on the assumption that
char(n) fields were physically fixed-width, but we gave it up as a
bad job several major releases back ... it was never more than a
very marginal optimization anyway ...

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Nov 23 '05 #8
Alvaro Herrera wrote:
On Wed, Jun 23, 2004 at 01:15:23PM +0900, Joel Matthew wrote:
What effect is there if I specify "TEXT" instead of
say "VARCHAR(255)"?


Well, theoretically, in the latter case, the database will allocate 256
(257? 259?) bytes for that field in the record itself. That is, that
field will consume 256 bytes for each record stored.

Nope. Actually, in both cases the length will be stored first (4 bytes)
and the actual content following it, using the indicated amount of
bytes. There's absolutely no difference in storage.

A varchar(256) field will allow you to store a text not with 256 bytes
max, but 256 _chars_ max. Think multibyte encodings such as utf8 -- the
varchar(256) can take anything from 4 + 1 bytes (a single byte string) to
4 + 256 * max_bytes_per_char.
The difference you cite is for char(N) fields, which are always padded
with blanks to fill the N chars.


Thank you everyone for enswering, that actually clears up a lot. Thank
you too for not flaming this n00b!

Madison

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Nov 23 '05 #9
> ... the length will be stored first (4 bytes)
and the actual content following it


Section 8.3 of the manual seems to indicate that the actual character
data is stored in a separate file (background table -- I guess that
might not be a separate file?) for text, unspecified width char, and
char fields which exceed the specified length.

--
Joel Matthew <re**@ddcom.co.jp>
---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postgresql.org

Nov 23 '05 #10
> We used to have some attempts at optimizing on the assumption that
char(n) fields were physically fixed-width, but we gave it up as a
bad job several major releases back ... it was never more than a
very marginal optimization anyway ...


Does that mean that PostGreSQL fixes character width at thirty-two bits,
or that it uses UTF-8, or that it just stores what it gets?

(Checked chapter 8.3 in the manual, didn't see the answer there. Not
that I really want to know. With Unicode, trying to optimize record
sizes for char/text fields is a little like trying to play Russian
Roulette. Wait, is that no longer politically correct? Should it be
called six-chamber roulette, now? Don't want to offend anyone.)

--
Joel Matthew <re**@ddcom.co.jp>
---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 23 '05 #11
Joel Matthew <re**@ddcom.co.jp> writes:
Does that mean that PostGreSQL fixes character width at thirty-two bits,
or that it uses UTF-8, or that it just stores what it gets?
We store text data in the form indicated by the database encoding
setting. UCS-32 is not a supported encoding, but UTF-8 is --- among
others.
(Checked chapter 8.3 in the manual, didn't see the answer there. Not
that I really want to know. With Unicode, trying to optimize record
sizes for char/text fields is a little like trying to play Russian
Roulette.


No, it's entirely like pointless. You just don't know how many bytes
will be taken up by N characters.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Nov 23 '05 #12
Joel Matthew <re**@ddcom.co.jp> writes:
... the length will be stored first (4 bytes)
and the actual content following it
Section 8.3 of the manual seems to indicate that the actual character
data is stored in a separate file (background table -- I guess that
might not be a separate file?) for text, unspecified width char, and
char fields which exceed the specified length.


We may push very-wide fields out to a separate table ("toast table").
This has nothing to do with whether the field is text, varchar(n),
unspecified varchar, or whatever, but only with the physical size
of the data.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 23 '05 #13
On Wed, Jun 23, 2004 at 02:43:06PM +0900, Joel Matthew wrote:
We used to have some attempts at optimizing on the assumption that
char(n) fields were physically fixed-width, but we gave it up as a
bad job several major releases back ... it was never more than a
very marginal optimization anyway ...
Does that mean that PostGreSQL fixes character width at thirty-two bits,
or that it uses UTF-8, or that it just stores what it gets?


It'll use utf8 if configured to do so; it'll just store what it gets if
configured as SQL_ASCII (it isn't really ASCII); or it will convert from
the client encoding to the server encoding before storing, if they are
different.
(Checked chapter 8.3 in the manual, didn't see the answer there. Not
that I really want to know. With Unicode, trying to optimize record
sizes for char/text fields is a little like trying to play Russian
Roulette. Wait, is that no longer politically correct? Should it be
called six-chamber roulette, now? Don't want to offend anyone.)


Of course, the optimization could have only worked on fixed-width
encodings (not utf8 -- maybe possible with utf32, but Postgres doesn't
support that AFAIK), but since current versions enable multibyte by
default there's really no point in trying.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"I would rather have GNU than GNOT." (ccchips, lwn.net/Articles/37595/)
---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 23 '05 #14

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

14
by: Sanjay Minni | last post by:
What is the datatype to be used for Primary Key columns for most optimised access given that - There will be a single column primary key only - The values will only be integers (but as...
17
by: Filipe Martins | last post by:
Hello. I've read somewhere that the executable is smaller if we use a source file for each function! So, I tested this with gcc and it seams to confirm! What seams to happen is that if we call...
6
by: Brett Maton | last post by:
Hi NG, How do I find out the size of a column ? I am retrieving large objects from the pg_largeobject table and creating a files, I would like to know the length of the data column so that I...
14
by: Elias Farah | last post by:
Hi All, What are people's experience in timing Query Performance? I have tried in vain, but get many unpredictable results, with execution proceeding in the background and problems of timing...
14
by: Michael Brennan | last post by:
Hi, I wonder if there is any good reason to let different systems have different sizes of short, int and long? I think that makes it harder to plan which type to use for your program. For...
60
by: deko | last post by:
As I understand it, most browser manufacturers have agreed on 16px for their default font size. So, this should be an accurate conversion for percentages: px % 16 = 100 14 = 87.5 13 =...
1
by: shsandeep | last post by:
Hi, I have a column of datatype CHAR(20). When the DataStage job loads data of length 15 into it, it pads the remaining 5 spaces with blank space. This is happening with all CHAR datatype...
3
by: Asphalt Blazer | last post by:
Can anybody explain to me the relation between extentsize and pagesize. When I am creating a new tablespace what do I need to be careful of? Say pagesize is 8K is keeping extentsize(pages) 64 too...
159
by: Bob Timpkinson | last post by:
Hi, I have a 32-bit machine... Is there anyway I can get gcc to use the following integer sizes? char: 8 bits short: 16 bits int: 32 bits long: 64 bits long long: 128 bits
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.