
ShmemAlloc errors

Hi,

I'm having a little difficulty with a database that I'm upgrading from
Postgres 7.2.1 to 7.3.2.

On the 7.2.1 server I did:

$ pg_dump fiveminute >fiveminute.db
$ pg_dump bandwidth >bandwidth.db

On the 7.3.2 server I did:

$ psql fiveminute <fiveminute.db
$ psql bandwidth <bandwidth.db

Both appear to import successfully. However, if I then try to dump the
full database on 7.3.2 I get:

$ pg_dumpall >full.db
pg_dump: WARNING: ShmemAlloc: out of memory
pg_dump: Attempt to lock table "vs_dfa554862ac" failed. ERROR:
LockAcquire: lock table 1 is out of memory
pg_dumpall: pg_dump failed on bandwidth, exiting
$
I tried dumping just a single database, and also a different database. I get
the same error (note that pg_dump gives no output):

$ pg_dump fiveminute
pg_dump: WARNING: ShmemAlloc: out of memory
pg_dump: Attempt to lock table "vs_surfinsavers" failed. ERROR:
LockAcquire: lock table 1 is out of memory
$

In these cases the postgres server processes don't consume much memory:

USER       PID %CPU %MEM    VSZ   RSS TTY   STAT START  TIME COMMAND
postgres 31279  0.0  0.0 414852   960 pts/0 S    10:26  0:01 /usr/bin/postmaster -p 5432
postgres 31281  0.0  0.0 415824   868 pts/0 S    10:26  0:00  \_ postgres: stats buffer process
postgres 31282  0.0  0.0 414892   968 pts/0 S    10:26  0:00  |  \_ postgres: stats collector process
postgres 32161 98.9  1.2 416076 12476 pts/0 R    12:59  1:13  \_ postgres: postgres fiveminute [local] SELECT

The machine has 1Gb RAM, runs RedHat 9, kernel-smp-2.4.20-20.9, i386
dual PIII. gcc 3.2.2 was the compiler used.

I set the kernel shared memory higher:

$ cat /proc/sys/kernel/shmmax
805306368

In postgresql.conf I have set:
shared_buffers=50000
wal_buffers = 64

I have tried default values for these and also shared_buffers=80000.
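
For scale, shared_buffers is counted in 8kB pages, so the settings above work
out to roughly:

```
50000 buffers x 8kB = 400MB of shared buffers
80000 buffers x 8kB = 640MB of shared buffers
shmmax              = 805306368 bytes = 768MB
```

Both values fit under shmmax, which is consistent with the postmaster starting
at all; the error message points at the shared lock table rather than total
shared memory.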

I have also tried deleting /var/lib/pgsql, running initdb and
re-importing the data.

Same results every time. 100% reproducible.

The fiveminute database is a 2.9Gb dump file. The bandwidth database is a
100Mb dump file. With indexes, 16Gb of disk space is required. They are
simple databases with about 3000 tables. Columns are just date, time
and 64-bit integer fields.

Upgrading to postgres 7.3 would be useful. But if I cannot dump the
data after importing it, then I stand no chance of upgrading to 7.4 in
the future.

Does anybody have any thoughts?

Regards,
Nick.
--
Nick Burrett
Network Engineer, Designer Servers Ltd. http://www.dsvr.co.uk
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

20 Replies


Nick Burrett wrote:
> The machine has 1Gb RAM, runs RedHat 9, kernel-smp-2.4.20-20.9, i386
> dual PIII. gcc 3.2.2 was the compiler used.

What does uname -a say?

Nick Burrett <ni**@dsvr.net> writes:
> $ pg_dumpall >full.db
> pg_dump: WARNING: ShmemAlloc: out of memory
> pg_dump: Attempt to lock table "vs_dfa554862ac" failed. ERROR:
> LockAcquire: lock table 1 is out of memory
> pg_dumpall: pg_dump failed on bandwidth, exiting


Looks like you need to increase max_locks_per_transaction in postgresql.conf.
(You'll need to restart the postmaster to make this take effect.)
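
In postgresql.conf that change would look something like this (the value shown
is only an illustrative guess; it needs to cover roughly one lock per table
touched in a single transaction, and pg_dump locks every table it dumps):

```
# postgresql.conf -- requires a postmaster restart to take effect
max_locks_per_transaction = 4000    # default is 64
```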

We don't normally hear of people needing that --- is there anything
unusual about the schema of this database?

regards, tom lane


> > $ pg_dumpall >full.db
> > pg_dump: WARNING: ShmemAlloc: out of memory
> > pg_dump: Attempt to lock table "vs_dfa554862ac" failed. ERROR:
> > LockAcquire: lock table 1 is out of memory
> > pg_dumpall: pg_dump failed on bandwidth, exiting
>
> Looks like you need to increase max_locks_per_transaction in
> postgresql.conf. (You'll need to restart the postmaster to make
> this take effect.)
>
> We don't normally hear of people needing that --- is there anything
> unusual about the schema of this database?


I've bumped into this a few times way back when... any chance the
error message could be improved to include a hint to increase
max_locks_per_transaction? It took a bit of time digging to figure
out that was a valid solution. -sc

--
Sean Chittenden


Sean Chittenden <se**@chittenden.org> writes:
> I've bumped into this a few times way back when... any chance the
> error message could be improved to include a hint to increase
> max_locks_per_transaction? It took a bit of time digging to figure
> out that was a valid solution. -sc


Good idea --- done. (I may get some flak about violating the strings
freeze for 7.4 release, but I'll take the heat ...)

regards, tom lane


Tom Lane wrote:
> Nick Burrett <ni**@dsvr.net> writes:
> > $ pg_dumpall >full.db
> > pg_dump: WARNING: ShmemAlloc: out of memory
> > pg_dump: Attempt to lock table "vs_dfa554862ac" failed. ERROR:
> > LockAcquire: lock table 1 is out of memory
> > pg_dumpall: pg_dump failed on bandwidth, exiting
>
> Looks like you need to increase max_locks_per_transaction in postgresql.conf.
> (You'll need to restart the postmaster to make this take effect.)


I've tried that and indeed it works. Thanks.

> We don't normally hear of people needing that --- is there anything
> unusual about the schema of this database?

Not particularly. The database consists of around 3000 tables created
using this:

CREATE TABLE vs_foo (date date NOT NULL,
                     time time NOT NULL,
                     bytesin int8 CHECK (bytesin >= 0),
                     bytesout int8 CHECK (bytesout >= 0));

Each table has around 1500 rows.

Incidentally, the dump and import reduced the disk space requirements from
25Gb to 9Gb. The database is vacuumed monthly (data is only deleted
monthly) using VACUUM FULL. I can only presume that vacuuming is not
designed to be *that* aggressive.

Cheers,
Nick.
--
Nick Burrett
Network Engineer, Designer Servers Ltd. http://www.dsvr.co.uk

Nick Burrett <ni**@dsvr.net> writes:
> Tom Lane wrote:
> > We don't normally hear of people needing that --- is there anything
> > unusual about the schema of this database?
>
> Not particularly. The database consists of around 3000 tables created
> using this:
>
> CREATE TABLE vs_foo (date date NOT NULL,
>                      time time NOT NULL,
>                      bytesin int8 CHECK (bytesin >= 0),
>                      bytesout int8 CHECK (bytesout >= 0));
>
> Each table has around 1500 rows.


3000 tables? That's why you need so many locks. Have you thought about
collapsing these into *one* table with an extra key column? Also, it'd
likely be better to combine the date and time into a timestamp column.

regards, tom lane


Tom Lane wrote:
> Nick Burrett <ni**@dsvr.net> writes:
> > Tom Lane wrote:
> > > We don't normally hear of people needing that --- is there anything
> > > unusual about the schema of this database?
> >
> > Not particularly. The database consists of around 3000 tables created
> > using this:
> >
> > CREATE TABLE vs_foo (date date NOT NULL,
> >                      time time NOT NULL,
> >                      bytesin int8 CHECK (bytesin >= 0),
> >                      bytesout int8 CHECK (bytesout >= 0));
> >
> > Each table has around 1500 rows.
>
> 3000 tables? That's why you need so many locks.

I'm surprised that I've never hit this problem before though.

> Have you thought about
> collapsing these into *one* table with an extra key column? Also, it'd
> likely be better to combine the date and time into a timestamp column.


I tried it back in the days when we only had around 1000 tables.
Problem was that inserts and deletes took a *very* long time. IIRC a
one row insert was taking over 10 seconds. I think this was because the
index files were growing to several gigabytes.

Having everything in one large table would have been great and would
have made life much easier.

date and time were split to simplify queries. I think it had an impact
on index sizes.

Regards,
Nick.

--
Nick Burrett
Network Engineer, Designer Servers Ltd. http://www.dsvr.co.uk

Nick Burrett <ni**@dsvr.net> writes:
> Tom Lane wrote:
> > Have you thought about
> > collapsing these into *one* table with an extra key column? Also, it'd
> > likely be better to combine the date and time into a timestamp column.
>
> I tried it back in the days when we only had around 1000 tables.
> Problem was that inserts and deletes took a *very* long time. IIRC a
> one row insert was taking over 10 seconds. I think this was because the
> index files were growing to several gigabytes.


That really shouldn't happen. If the date/time range that you are
storing moves over time (ie, you delete old stuff at the same rate you
are adding new stuff) then I can see that you would have had an
index-bloat problem, since the old btree code was unable to recycle
no-longer-needed subtrees. (This issue should go away in 7.4, FWIW.)
However, that should only affect the size of the index on disk; I can't
see that it would have much of an impact on insert or delete time.
For sure I don't see how an insert could take 10 seconds. What indexes
and foreign keys do you have on these tables?

It would be worth revisiting this and trying to understand where the
performance problem is, rather than just assuming that you have to work
around it.

regards, tom lane


Tom Lane wrote:
> Nick Burrett <ni**@dsvr.net> writes:
> > Tom Lane wrote:
> > > Have you thought about
> > > collapsing these into *one* table with an extra key column? Also, it'd
> > > likely be better to combine the date and time into a timestamp column.
> >
> > I tried it back in the days when we only had around 1000 tables.
> > Problem was that inserts and deletes took a *very* long time. IIRC a
> > one row insert was taking over 10 seconds. I think this was because the
> > index files were growing to several gigabytes.
>
> That really shouldn't happen. If the date/time range that you are
> storing moves over time (ie, you delete old stuff at the same rate you
> are adding new stuff) then I can see that you would have had an
> index-bloat problem, since the old btree code was unable to recycle
> no-longer-needed subtrees. (This issue should go away in 7.4, FWIW.)
> However, that should only affect the size of the index on disk; I can't
> see that it would have much of an impact on insert or delete time.
> For sure I don't see how an insert could take 10 seconds. What indexes
> and foreign keys do you have on these tables?
>
> It would be worth revisiting this and trying to understand where the
> performance problem is, rather than just assuming that you have to work
> around it.


Luckily my CVS tree has such information. Basically this database holds
samples of the number of bytes transferred in a five-minute period for 3
months for 3000 servers. The data is used for billing purposes, which
work on bytes transferred in a month, so things like RRD and MRTG are
not suitable. Five-minute data is useful for spotting unusual traffic
patterns such as you might get from DoS attacks.

At the start of the fourth month, the data from the first month is
deleted and the database vacuumed.

The original table definition was exactly this:

CREATE TABLE fiveminute (server CHAR(32),
                         stamp TIMESTAMP,
                         bytesin BIGINT CHECK (bytesin >= 0),
                         bytesout BIGINT CHECK (bytesout >= 0));

CREATE UNIQUE INDEX fiveminute_idx ON fiveminute(server,stamp);

It probably would have made sense to just have an index on the server
column, but I can't remember why (decision was made Nov-2000). I
suspect that is the cause of the index bloat.

The database never survived operational use. The original import of
data took so long (2+ days) that the re-design was done almost immediately.

Converting back would be most useful. I might try to do this on a 7.4
beta database.

Regards,
Nick.

--
Nick Burrett
Network Engineer, Designer Servers Ltd. http://www.dsvr.co.uk

Nick Burrett <ni**@dsvr.net> writes:
> The original table definition was exactly this:
>
> CREATE TABLE fiveminute (server CHAR(32),
>                          stamp TIMESTAMP,
>                          bytesin BIGINT CHECK (bytesin >= 0),
>                          bytesout BIGINT CHECK (bytesout >= 0));
>
> CREATE UNIQUE INDEX fiveminute_idx ON fiveminute(server,stamp);
>
> The database never survived operational use. The original import of
> data took so long (2+ days) that the re-design was done almost immediately.


It'd probably be better to load the initial data first and then build
the index; boosting sort_mem while building the index should be a help
too. But there's still something very fishy here. You said you had
1500 * 3000 = 4.5million rows in the tables. Even with the dumbest
approach to loading the data, I can't conceive of it taking 2 days.
Even on my oldest and slowest development machine, I can load a table
that size in about ten minutes, and create an index on it in another ten.
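
That ordering, with the table names from this thread, might look like the
sketch below (the sort_mem value and the file path are only guesses):

```sql
-- Bulk-load first, while no index exists, then build the index once.
SET sort_mem = 65536;        -- more sort memory speeds up the index build
COPY fiveminute FROM '/path/to/fiveminute.data';
CREATE UNIQUE INDEX fiveminute_idx ON fiveminute (server, stamp);
```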

One thing that comes to mind is that what you eliminated appears to have
been an indexed char(32) column. We have seen reports of very slow
behavior on certain platforms if a non-C locale is used --- apparently
strcoll() is just horrendously slow in some implementations. It'd be
worth ensuring that your database is initdb'd in C locale. Also, is
there a good reason to pad every server name to 32 characters? Is the
32-character limit even real, or did you pluck that number from the air?
Usually I recommend type text (or equivalently, varchar with no specific
length limit) unless you have a clear application-driven need for a
specific length limit --- and even then it should be varchar(n) not
char(n).

regards, tom lane


On Fri, 17 Oct 2003 15:47:10 +0100, Nick Burrett <ni**@dsvr.net> wrote:
> CREATE TABLE fiveminute (server CHAR(32),
>                          stamp TIMESTAMP,
>                          bytesin BIGINT CHECK (bytesin >= 0),
>                          bytesout BIGINT CHECK (bytesout >= 0));
>
> CREATE UNIQUE INDEX fiveminute_idx ON fiveminute(server,stamp);


Making this

CREATE TABLE server (
id int NOT NULL PRIMARY KEY,
name text NOT NULL
);
CREATE TABLE fiveminute (
serverid int NOT NULL REFERENCES server,
stamp timestamp NOT NULL,
bytesin bigint CHECK (bytesin >= 0),
bytesout bigint CHECK (bytesout >= 0),
PRIMARY KEY (serverid, stamp)
);

should give you a much smaller index on fiveminute.

If you have to load lots of data initially, better create the tables
without primary and foreign keys, import data and then

ALTER TABLE server ADD PRIMARY KEY (id);
ALTER TABLE fiveminute ADD PRIMARY KEY (serverid, stamp);
ALTER TABLE fiveminute
ADD FOREIGN KEY (serverid) REFERENCES server;

Servus
Manfred


Manfred Koizar <mk*****@aon.at> writes:
> [ factoring out server names into a separate table ]
> should give you a much smaller index on fiveminute.


True, but it would also turn all his queries into joins. Might or might
not be a win in practice. (Worth considering, though.)

regards, tom lane



Nick Burrett <ni**@dsvr.net> writes:
> At the start of the fourth month, the data from the first month is deleted and
> the database vacuumed.

When dropping a quarter of the records of a large table you would need a very
large free space map setting. For an occasional periodic job like this you can
just use VACUUM FULL to rebuild the table and free up the space.

> CREATE TABLE fiveminute ( server CHAR(32),

You know CHAR() is space padded, right?

> CREATE UNIQUE INDEX fiveminute_idx ON fiveminute(server,stamp);
>
> It probably would have made sense to just have an index on the server column,
> but I can't remember why (decision was made Nov-2000). I suspect that is the
> cause of the index bloat.

Well, having a unique index is good conceptually because it prevents duplicate
insertions from application errors. But it's probably not worth the
performance hit, given that there are a lot more errors it won't catch.

> The database never survived operational use. The original import of data took
> so long (2+ days) that the re-design was done almost immediately.


How were you importing? The fastest way would be to generate an ascii file in
the format \copy expects.
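
That format is, by default, one tab-separated row per line with columns in
table order; for example (hypothetical file name and data):

```sql
-- In psql: a data line looks like
--   server0001<TAB>2003-10-17 10:05:00<TAB>123456<TAB>654321
\copy fiveminute from 'fiveminute.data'
```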

--
greg


Greg Stark <gs*****@MIT.EDU> writes:
> > CREATE UNIQUE INDEX fiveminute_idx ON fiveminute(server,stamp);
> >
> > It probably would have made sense to just have an index on the server column,
> > but I can't remember why (decision was made Nov-2000). I suspect that is the
> > cause of the index bloat.


There's also another option.

Create a single normalized table, but create separate partial indexes for each
server.

Ie:

create index idx_server_1 on tab (fiveminute) where serverid = 1;
create index idx_server_2 on tab (fiveminute) where serverid = 2;
create index idx_server_3 on tab (fiveminute) where serverid = 3;
....

This is a management headache, creating a new index every time you add a new
server. And the performance of the optimizer when there are thousands of
indexes to choose from would be, well, it would be an interesting experiment
:)

But it should be faster than having a single two-column index, largely because
it has half the data to store and read through.

This assumes all queries on the table have a clause like "where serverid = n".
It won't work for "where serverid between n and m" or for things like "order
by serverid, fiveminute".

Also, incidentally, do you have a lot of people downloading more than 4Gb in
five minutes? Maybe just regular integers would be big enough? They would be
faster.

--
greg

On Fri, 17 Oct 2003, Nick Burrett wrote:
> > It would be worth revisiting this and trying to understand where the
> > performance problem is, rather than just assuming that you have to work
> > around it.
>
> Luckily my CVS tree has such information. Basically this database holds
> samples of the number of bytes transferred in a five minute period for 3
> months for 3000 servers. The data is used for billing purposes which
> work on bytes transferred in a month, so things like RRD and MRTG are
> not suitable. Five minute data is useful for spotting unusual traffic
> patterns such as you might get from DoS attacks.


I use PostgreSQL for counting network traffic, I use a sample every five
minutes. Because my queries became too slow I simply added another table
that holds the data per day. Every day, yesterday's data get added,
inserted into the "day"-table and deleted from the 5-minutes-table. I
don't need the 5 minutes accuracy for all of the data.
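
Sketched in SQL (the table and column names are hypothetical; the actual
schema isn't shown in the post):

```sql
-- Roll yesterday's five-minute samples up into a per-day table,
-- then delete them from the detail table; one transaction keeps it atomic.
BEGIN;
INSERT INTO traffic_day (server, day, bytesin, bytesout)
SELECT server, date_trunc('day', stamp), sum(bytesin), sum(bytesout)
  FROM fiveminute
 WHERE stamp >= current_date - 1 AND stamp < current_date
 GROUP BY server, date_trunc('day', stamp);

DELETE FROM fiveminute
 WHERE stamp >= current_date - 1 AND stamp < current_date;
COMMIT;
```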



Holger Marzen <ho****@marzen.de> writes:
> I use PostgreSQL for counting network traffic, I use a sample every five
> minutes. Because my queries became too slow I simply added another table
> that holds the data per day. Every day, yesterday's data get added,
> inserted into the "day"-table and deleted from the 5-minutes-table. I
> don't need the 5 minutes accuracy for all of the data.


The original poster said he needed the 5 minute data.

However, perhaps a combination could be a good compromise. We used to keep raw
one-record-per-hit data in a table and queried that for statistics. Later we
aggregated the data once per hour but kept the raw data as well. The reports
used the aggregate data for speed but the raw data was still available for
debugging or auditing.

This was very handy when the database became too large: we started purging the
raw data after 30 days, but the reports were all still fine as we could keep
the aggregate data indefinitely.

--
greg

On Sun, 19 Oct 2003, Greg Stark wrote:
> Holger Marzen <ho****@marzen.de> writes:
> > I use PostgreSQL for counting network traffic, I use a sample every five
> > minutes. Because my queries became too slow I simply added another table
> > that holds the data per day. Every day, yesterday's data get added,
> > inserted into the "day"-table and deleted from the 5-minutes-table. I
> > don't need the 5 minutes accuracy for all of the data.
>
> The original poster said he needed the 5 minute data.


Yes, but how long? Really for months? The above way of compressing data
can be altered, e.g. he could keep the 5 minutes data for a week or a
month and use the daily data for billing.
> However, perhaps a combination could be a good compromise. We used to keep raw
> one-record-per-hit data in a table and queried that for statistics. Later we
> aggregated the data once per hour but kept the raw data as well. The reports
> used the aggregate data for speed but the raw data was still available for
> debugging or auditing.

Yes, exactly.

> This was very handy when the database became too large, we started purging the
> raw data after 30 days but the reports were all still fine as we could keep
> the aggregate data indefinitely.


Yup.

--
PGP/GPG Key-ID:
http://blackhole.pca.dfn.de:11371/pk...rch=0xB5A1AFE1


Holger Marzen wrote:
> On Sun, 19 Oct 2003, Greg Stark wrote:
> > Holger Marzen <ho****@marzen.de> writes:
> > > I use PostgreSQL for counting network traffic, I use a sample every five
> > > minutes. Because my queries became too slow I simply added another table
> > > that holds the data per day. Every day, yesterday's data get added,
> > > inserted into the "day"-table and deleted from the 5-minutes-table. I
> > > don't need the 5 minutes accuracy for all of the data.
> >
> > The original poster said he needed the 5 minute data.
>
> Yes, but how long? Really for months? The above way of compressing data
> can be altered, e.g. he could keep the 5 minutes data for a week or a
> month and use the daily data for billing.


As was stated previously, the five-minute data is kept for 3 months to
examine traffic trends in more detail than can be gathered from weekly
or monthly data.

Each day, end-of-day totals for the five-minute data are calculated and
stored in a separate database and kept for a minimum of 7 years.

The end-of-day data is used for billing. The five-minute data is used
when customers start to question the billing. We can look at the
five-minute data to see whether something unusual has happened and take
appropriate action.

I see nothing wrong with what I'm trying to achieve. Why should I
compromise the data that I store because I am having problems with the
speed of the database?

The end-of-day data is held in a separate database because the table
names are exactly the same as those in the five-minute database.
However, the table formats are different (we don't have a time column).

Regards,
Nick.
--
Nick Burrett
Network Engineer, Designer Servers Ltd. http://www.dsvr.co.uk

Tom Lane wrote:
> Nick Burrett <ni**@dsvr.net> writes:
> > The original table definition was exactly this:
> >
> > CREATE TABLE fiveminute (server CHAR(32),
> >                          stamp TIMESTAMP,
> >                          bytesin BIGINT CHECK (bytesin >= 0),
> >                          bytesout BIGINT CHECK (bytesout >= 0));
> >
> > CREATE UNIQUE INDEX fiveminute_idx ON fiveminute(server,stamp);
> >
> > The database never survived operational use. The original import of
> > data took so long (2+ days) that the re-design was done almost immediately.
>
> It'd probably be better to load the initial data first and then build
> the index; boosting sort_mem while building the index should be a help
> too. But there's still something very fishy here. You said you had
> 1500 * 3000 = 4.5million rows in the tables. Even with the dumbest
> approach to loading the data, I can't conceive of it taking 2 days.
> Even on my oldest and slowest development machine, I can load a table
> that size in about ten minutes, and create an index on it in another ten.


What I'll do is try the setup again based on the recommendations that
have appeared here and see what results I get.

The original data was inserted using INSERT statements rather than COPY.
I can't remember whether indexes were dropped beforehand or not.

> One thing that comes to mind is that what you eliminated appears to have
> been an indexed char(32) column. We have seen reports of very slow
> behavior on certain platforms if a non-C locale is used --- apparently
> strcoll() is just horrendously slow in some implementations. It'd be
> worth ensuring that your database is initdb'd in C locale. Also, is
> there a good reason to pad every server name to 32 characters? Is the
> 32-character limit even real, or did you pluck that number from the air?
> Usually I recommend type text (or equivalently, varchar with no specific
> length limit) unless you have a clear application-driven need for a
> specific length limit --- and even then it should be varchar(n) not
> char(n).


The 32-character limit is a very real limit. However, all the data
is machine-generated, so removing the limits is certainly possible.
Though the limit was put in place to retain some data integrity.

The original database used to be in MySQL. The CREATE INDEX
documentation for MySQL states that you must specify a prefix length for
TEXT columns.

I expected that specifying a specific length for a column would allow
for more efficient indexing and searching because:

a) you already know the exact length of the column
b) potentially the column-data could be stored starting on a word-boundary
c) apply string comparison functions that are optimised for data
   starting on word-boundaries (i.e. by comparing words rather than bytes).
   Certainly for the C-locale.

Regards,
Nick.
--
Nick Burrett
Network Engineer, Designer Servers Ltd. http://www.dsvr.co.uk

Nick Burrett <ni**@dsvr.net> writes:
> I expected that specifying a specific length for a column would allow
> for more efficient indexing and searching because:
>
> a) you already know the exact length of the column
> b) potentially the column-data could be stored starting on a word-boundary
> c) apply string comparison functions that are optimised for data
>    starting on word-boundaries (i.e. by comparing words rather than bytes).
>    Certainly for the C-locale.


None of those considerations really apply for Postgres. We used to have
some (very marginal anyway) optimizations that assumed CHAR(n) is
fixed-width, but they went away when we added support for multibyte
character encodings. In any case there isn't anything here that
wouldn't be swamped by increased I/O demands due to the wasted space.
Maybe if all your machine names run between 29 and 32 characters it'd
be worth doing, but if you're paying any actual space cost to padding
I think it has to be a net loss.

regards, tom lane

