Bytes IT Community

How does licensing work for non-profit companies?

A non-profit organization is interested in a new data application that
would use a SQL storage system. I'm interested to know how non-profit
companies that are not selling products are considered for licensing.
Can they use PostgreSQL just like anyone else or do non-profits qualify
as commercial use and need to consider other licensing?

D. Stimits, stimits AT comcast DOT net

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 23 '05 #1
16 Replies


"D. Stimits" <st*****@comcast.net> wrote:
> A non-profit organization is interested in a new data application that
> would use a SQL storage system. I'm interested to know how non-profit
> companies that are not selling products are considered for licensing.
> Can they use PostgreSQL just like anyone else or do non-profits qualify
> as commercial use and need to consider other licensing?


I am not a lawyer.

The BSD license (under which PostgreSQL is distributed) does not differentiate
between different types or classes of users. You are free to use PostgreSQL
for whatever you need, for no fee, as long as you don't violate the terms of the
license.

--
Bill Moran
Potential Technologies
http://www.potentialtech.com


Nov 23 '05 #2


On Tue, 2004-06-15 at 08:20, D. Stimits wrote:
> A non-profit organization is interested in a new data application that
> would use a SQL storage system. I'm interested to know how non-profit
> companies that are not selling products are considered for licensing.
> Can they use PostgreSQL just like anyone else or do non-profits qualify
> as commercial use and need to consider other licensing?
>
> D. Stimits, stimits AT comcast DOT net


The basic philosophy of the PostgreSQL Global Development Group seems to
be: Do what you want with it, just don't sue us.

Nov 23 '05 #4


On Tue, 15 Jun 2004 14:20:21 +0000 "D. Stimits" <st*****@comcast.net> wrote:
> A non-profit organization is interested in a new data application that
> would use a SQL storage system. I'm interested to know how non-profit
> companies that are not selling products are considered for licensing.
> Can they use PostgreSQL just like anyone else or do non-profits qualify
> as commercial use and need to consider other licensing?


umm, go read the license.

the BSD license is quite simple and quite clear. nobody has to pay.
ever.

richard
--
Richard Welty rw****@averillpark.net
Averill Park Networking 518-573-7592
Java, PHP, PostgreSQL, Unix, Linux, IP Network Engineering, Security

Nov 23 '05 #6


On Tue, 2004-06-15 at 22:38, Richard Welty wrote:
> On Tue, 15 Jun 2004 14:20:21 +0000 "D. Stimits" <st*****@comcast.net> wrote:
> > A non-profit organization is interested in a new data application that
> > would use a SQL storage system. I'm interested to know how non-profit
> > companies that are not selling products are considered for licensing.
> > Can they use PostgreSQL just like anyone else or do non-profits qualify
> > as commercial use and need to consider other licensing?
>
> umm, go read the license.
>
> the BSD license is quite simple and quite clear. nobody has to pay.
> ever.


Doesn't the open source community just ROCK?!

It's like Findhorn for Techies.

--
Andrew Kelly
Online Products Developer
Transparency International e.V.
Otto-Suhr-Allee 97/99
10585 Berlin, Germany
Tel: +4930 3530 5701
Fax: +4930 3530 5702
mailto:de***@corisweb.org

http://www.corisweb.org
http://www.transparency.org

Nov 23 '05 #8


Hi,

We've been using Postgres for four years in a web-based e-CRM
application that handles emails.

As our customer base keeps growing (more than a million rows in several
tables), we started to hit performance issues with 7.2 and upgraded to
7.4 a month ago.

The upgrade process went fine, but we still have performance problems.

We decided to import the database into SQL Server to run some
benchmarks between the two on the same hardware with the same data.

Please find below the main query we're testing, the explain plan, and
the index list.

Here is the result:

Total running time, Postgres: 10-11 s
Total running time, SQL Server: < 1 s

We must admit that we're quite disappointed with the result and
currently think there is something wrong with the way we're working
with Postgres.

What do you think? Is there something wrong with the configuration, or
with the way we're building the query?

Thanks a lot for your answers.

Christophe Musielak
Christine Bruzaud
Akio Software


PS: Concerning the seq_scan shown in the explain plan: since there are
indexes on the tables, as shown below, we assume Postgres is choosing a
seq_scan over an index_scan to improve performance. We tried to force
index scans with SET enable_seqscan = false, but there was no
improvement in the total duration of the query.
------------------------------------------------------------------------

We're running our tests on :

Table threads row count: 125,000
Table emails row count: 650,000
Table operators row count: 50
Total rows returned by the select: 116,000 (without the LIMIT)

explain SELECT t.*, substring(t.subject::text, 0, 30) AS subject,
e.email,
o.lastname AS "operator"
FROM threads t JOIN emails e ON (e.id = t.client_email_id)
JOIN operators o ON (o.id = t.operator_id)
WHERE t.mailbox_id IN ( 2,3,5,20,21,13,22,23,24,25,26,19 )
and t.desktop = 2
ORDER BY t.date_last asc, t.id asc
LIMIT 16 OFFSET 0;

QUERY PLAN
------------------------------------------------------------------

Limit  (cost=93539.02..93539.06 rows=16 width=619)
  ->  Sort  (cost=93539.02..93745.69 rows=82669 width=619)
        Sort Key: t.date_last, t.id
        ->  Hash Join  (cost=26186.03..73789.00 rows=82669 width=619)
              Hash Cond: ("outer".operator_id = "inner".id)
              ->  Merge Join  (cost=26183.07..69487.26 rows=82668 width=609)
                    Merge Cond: ("outer".id = "inner".client_email_id)
                    ->  Index Scan using emails_pkey on emails e  (cost=0.00..37782.57 rows=654511 width=30)
                    ->  Sort  (cost=26183.07..26389.74 rows=82667 width=583)
                          Sort Key: t.client_email_id
                          ->  Seq Scan on threads t  (cost=0.00..19431.23 rows=82667 width=583)
                                Filter: (((mailbox_id = 2) OR (mailbox_id = 3) OR (mailbox_id = 5) OR (mailbox_id = 20) OR (mailbox_id = 21) OR (mailbox_id = 13) OR (mailbox_id = 22) OR (mailbox_id = 23) OR (mailbox_id = 24) OR (mailbox_id = 25) OR (mailbox_id = 26) OR (mailbox_id = 19)) AND (desktop = 2))
              ->  Hash  (cost=2.85..2.85 rows=44 width=14)
                    ->  Seq Scan on operators o  (cost=0.00..2.85 rows=44 width=14)
------------------------------------------------------------------

Indexes on threads :

threads_pkey PRIMARY KEY (id)
CREATE INDEX threads_date_last_idx ON threads (date_last);
CREATE INDEX threads_desktop_idx ON threads (desktop);
CREATE INDEX threads_operator_id_idx ON threads (operator_id);
CREATE UNIQUE INDEX threads_pri_dlast_id_idx ON threads (priority, date_last, id);

Indexes on emails :

emails_pkey PRIMARY KEY (id)

Indexes on operators :

operators_pkey PRIMARY KEY (id)

-----------------------------------------------------------------
postgresql.conf modified parameters :

shared_buffers = 14000 # = 112 MB, with 10 MB used by wal_buffers
sort_mem = 51200 # = 50 MB
vacuum_mem = 102400 # = 100 MB
# Recommended: 64 MB for 1-2 GB RAM
wal_buffers = 1280 # = 10240 KB = 10 MB
effective_cache_size = 65536 # Choice: 50% of RAM
# <=> 1 * 1024 * 1024 * .50 / 8 = 65536
random_page_cost = 2 # make the planner favor indexscans
cpu_tuple_cost = 0.042 # got the planner to choose the index

stats_command_string = true
stats_block_level = true
stats_row_level = true

datestyle = 'iso, dmy'
------------------------------------------------------------------




Nov 23 '05 #10


Christophe Musielak wrote:
> Hi,
>
> We've been using Postgres for four years in a web-based e-CRM
> application that handles emails.

A few immediate observations:

1. Try not to reply to an existing message when posting a new question;
   it can mess up threaded views.
2. The performance list is probably the best place for this.

> Total running time, Postgres: 10-11 s
> Total running time, SQL Server: < 1 s
>
> We must admit that we're quite disappointed with the result and
> currently think there is something wrong with the way we're working
> with Postgres.

Hmm - it's not common to get that sort of difference.

> PS: Concerning the seq_scan shown in the explain plan: since there are
> indexes on the tables, as shown below, we assume Postgres is choosing a
> seq_scan over an index_scan to improve performance. We tried to force
> index scans with SET enable_seqscan = false, but there was no
> improvement in the total duration of the query.

Which suggests PG is getting it right.

> explain SELECT t.*, substring(t.subject::text, 0, 30) AS subject,
>        e.email,
>        o.lastname AS "operator"
> FROM threads t JOIN emails e ON (e.id = t.client_email_id)
>      JOIN operators o ON (o.id = t.operator_id)
> WHERE t.mailbox_id IN ( 2,3,5,20,21,13,22,23,24,25,26,19 )
>   and t.desktop = 2
> ORDER BY t.date_last asc, t.id asc
> LIMIT 16 OFFSET 0;

OK, so you're filtering mostly on desktop and mailbox_id. Could you post
the output of EXPLAIN ANALYSE rather than just EXPLAIN? That'll show us
not just what PG thinks should happen, but what actually did happen too.
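For anyone unfamiliar with the difference: EXPLAIN prints only the planner's cost estimates, while EXPLAIN ANALYZE (spelled ANALYSE above) actually executes the query and adds real row counts and timings next to each plan node. Using the query from the original post:

```sql
-- Actually runs the query (so it takes the full 10-11 s once), but
-- reports actual rows and times alongside the planner's estimates.
EXPLAIN ANALYZE
SELECT t.*, substring(t.subject::text, 0, 30) AS subject,
       e.email,
       o.lastname AS "operator"
FROM threads t JOIN emails e ON (e.id = t.client_email_id)
     JOIN operators o ON (o.id = t.operator_id)
WHERE t.mailbox_id IN (2,3,5,20,21,13,22,23,24,25,26,19)
  AND t.desktop = 2
ORDER BY t.date_last ASC, t.id ASC
LIMIT 16 OFFSET 0;
```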

From the explain, the greatest cost seems to be the join on threads to
email, where it's using the index scan on emails_pkey.

QUERY PLAN
------------------------------------------------------------------

Limit  (cost=93539.02..93539.06 rows=16 width=619)
  ->  Sort  (cost=93539.02..93745.69 rows=82669 width=619)
        Sort Key: t.date_last, t.id
        ->  Hash Join  (cost=26186.03..73789.00 rows=82669 width=619)
              Hash Cond: ("outer".operator_id = "inner".id)
              ->  Merge Join  (cost=26183.07..69487.26 rows=82668 width=609)
                    Merge Cond: ("outer".id = "inner".client_email_id)
                    ->  Index Scan using emails_pkey on emails e  (cost=0.00..37782.57 rows=654511 width=30)
                    ->  Sort  (cost=26183.07..26389.74 rows=82667 width=583)
                          Sort Key: t.client_email_id
                          ->  Seq Scan on threads t  (cost=0.00..19431.23 rows=82667 width=583)
                                Filter: (((mailbox_id = 2) OR (mailbox_id = 3) OR (mailbox_id = 5) OR (mailbox_id = 20) OR (mailbox_id = 21) OR (mailbox_id = 13) OR (mailbox_id = 22) OR (mailbox_id = 23) OR (mailbox_id = 24) OR (mailbox_id = 25) OR (mailbox_id = 26) OR (mailbox_id = 19)) AND (desktop = 2))
              ->  Hash  (cost=2.85..2.85 rows=44 width=14)
                    ->  Seq Scan on operators o  (cost=0.00..2.85 rows=44 width=14)
------------------------------------------------------------------

> Indexes on threads :
>
> threads_pkey PRIMARY KEY (id)
> CREATE INDEX threads_date_last_idx ON threads (date_last);
> CREATE INDEX threads_desktop_idx ON threads (desktop);
> CREATE INDEX threads_operator_id_idx ON threads (operator_id);
> CREATE UNIQUE INDEX threads_pri_dlast_id_idx ON threads (priority, date_last, id);

None of these indexes are going to be much use here - you're mostly
filtering on mailbox_id. Unless "desktop" is very selective, a
sequential scan seems sensible.
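Along the same lines, if the desktop = 2 filter combined with this ORDER BY is the hot path, a composite index matching both the filter and the sort order might let the planner walk rows already in (date_last, id) order and stop after 16 matches. This is only a sketch worth testing with EXPLAIN ANALYSE, and the index name is made up:

```sql
-- Hypothetical composite index: equality column first, then the sort keys.
-- The planner can then scan desktop = 2 in (date_last, id) order,
-- filter mailbox_id on the fly, and satisfy LIMIT 16 without a big sort.
CREATE INDEX threads_desktop_dlast_id_idx
    ON threads (desktop, date_last, id);
ANALYZE threads;  -- refresh statistics before re-checking the plan
```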

I'd recommend running an EXPLAIN ANALYSE and posting that to the
performance list.

You might want to include your postgresql.conf settings along with some
more details on your hardware.
--
Richard Huxton
Archonet Ltd


Nov 23 '05 #12


Just to cover the obvious solution ...

How often do you run "vacuum analyze" on this database? If the data has been
around for a long time, have you considered using CLUSTER to physically reorder
the tables?
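The two suggestions can be sketched as follows; the table and index names are taken from the original post, and the CLUSTER syntax shown is the old 7.x form (note that CLUSTER takes an exclusive lock on the table while it runs):

```sql
-- Refresh planner statistics and reclaim dead row versions:
VACUUM ANALYZE threads;
VACUUM ANALYZE emails;

-- Physically reorder threads by an existing index (PostgreSQL 7.4 syntax):
CLUSTER threads_date_last_idx ON threads;
```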

Christophe Musielak <cm*******@akio-software.com> wrote:
> Hi,
>
> We've been using Postgres for four years in a web-based e-CRM
> application that handles emails.


--
Bill Moran
Potential Technologies
http://www.potentialtech.com


Nov 23 '05 #14

P: n/a
Just to cover the ovbious solution ...

How often do you run "vacuum analyze" on this database? If the data has been
around for a long time, have you considered using CLUSTER to physically reorder
the tables?

Christophe Musielak <cm*******@akio-software.com> wrote:
Hi,

We're working since 4 years on Postgres for a e-crm web based
application that deals with emails.

As our customers bases keep growing (more than one million rows in
sereval tables) we started to face performance issue with 7.2 and
decided to upgrade to 7.4 since one month.

The upgrade process was ok.

But we still face performances problems.

We decided to import the database into SQLServer to make some benchmarks
between the two with same hardware and same data.

Please find included main query we're testing on, the explain plan and
indexes list.

Here is the result :

Total running time Postges: 10 - 11s
Total running time sql Server : < 1s

We must admit that we're quite disappointed with the result and
currently think there is something wrong in the way we're working with
Postgres.

What do you think? is there some wrong with the configuration? with the
way we're building the query?

Thanks a lot for your answers.

Christophe Musielak
Christine Bruzaud
Akio Software


PS : Concerning the seq_scan shown in the explain plan, as there is
indexes on the tables as show below, we think Postgres is choosing
seq_scan versus index_scan to improve performance. We tried to force
using index_scan with 'set enable_seqscan = false) but there is no
improvement in the total duration of the query.
------------------------------------------------------------------------

We're running our tests on :

Table threads rows number : 125 000
Table emails rows number : 650 000
Table operators rows number : 50
Total nb rows returned by the select : 116 000 (without the LIMIT)
explain SELECT t.*, substring(t.subject::text, 0, 30) AS subject,
e.email,
o.lastname AS "operator"
FROM threads t JOIN emails e ON (e.id = t.client_email_id)
JOIN operators o ON (o.id = t.operator_id)
WHERE t.mailbox_id IN ( 2,3,5,20,21,13,22,23,24,25,26,19 )
and t.desktop = 2
ORDER BY t.date_last asc, t.id asc
LIMIT 16 OFFSET 0;

QUERY PLAN
------------------------------------------------------------------

 Limit  (cost=93539.02..93539.06 rows=16 width=619)
   ->  Sort  (cost=93539.02..93745.69 rows=82669 width=619)
         Sort Key: t.date_last, t.id
         ->  Hash Join  (cost=26186.03..73789.00 rows=82669 width=619)
               Hash Cond: ("outer".operator_id = "inner".id)
               ->  Merge Join  (cost=26183.07..69487.26 rows=82668 width=609)
                     Merge Cond: ("outer".id = "inner".client_email_id)
                     ->  Index Scan using emails_pkey on emails e
                           (cost=0.00..37782.57 rows=654511 width=30)
                     ->  Sort  (cost=26183.07..26389.74 rows=82667 width=583)
                           Sort Key: t.client_email_id
                           ->  Seq Scan on threads t
                                 (cost=0.00..19431.23 rows=82667 width=583)
                                 Filter: (((mailbox_id = 2) OR (mailbox_id = 3)
                                   OR (mailbox_id = 5) OR (mailbox_id = 20)
                                   OR (mailbox_id = 21) OR (mailbox_id = 13)
                                   OR (mailbox_id = 22) OR (mailbox_id = 23)
                                   OR (mailbox_id = 24) OR (mailbox_id = 25)
                                   OR (mailbox_id = 26) OR (mailbox_id = 19))
                                   AND (desktop = 2))
               ->  Hash  (cost=2.85..2.85 rows=44 width=14)
                     ->  Seq Scan on operators o
                           (cost=0.00..2.85 rows=44 width=14)
------------------------------------------------------------------

Indexes on threads :

threads_pkey PRIMARY KEY (id)
CREATE INDEX threads_date_last_idx ON threads (date_last);
CREATE INDEX threads_desktop_idx ON threads (desktop);
CREATE INDEX threads_operator_id_idx ON threads (operator_id);
CREATE UNIQUE INDEX threads_pri_dlast_id_idx ON threads (priority, date_last, id);

Indexes on emails :

emails_pkey PRIMARY KEY (id)

Indexes on operators :

operators_pkey PRIMARY KEY (id)

-----------------------------------------------------------------
postgresql.conf modified parameters :

shared_buffers = 14000 # = 112 MB, with 10 MB used by wal_buffers
sort_mem = 51200 # = 50 MB
vacuum_mem = 102400 # = 100 MB
# Recommended : 64 MB for 1 - 2 GB of RAM
wal_buffers = 1280 # = 10240 KB = 10 MB
effective_cache_size = 65536 # Choice : 50% of RAM
# <=> 1 * 1024 * 1024 * .50 / 8 = 65536
random_page_cost = 2 # make the planner favor index scans
cpu_tuple_cost = 0.042 # got the planner to choose the index

stats_command_string = true
stats_block_level = true
stats_row_level = true

datestyle = 'iso, dmy'
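To confirm the running server actually picked these values up (a restart or
reload may be needed after editing postgresql.conf), the current settings
can be inspected from a psql session with standard SHOW commands:

```sql
-- Values are reported in the server's native units (8 KB pages or KB here):
SHOW shared_buffers;
SHOW sort_mem;
SHOW effective_cache_size;
SHOW random_page_cost;
```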
------------------------------------------------------------------



---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend


Nov 23 '05 #15

P: n/a
Christophe Musielak <cm*******@akio-software.com> writes:
 ->  Seq Scan on threads t  (cost=0.00..19431.23 rows=82667 width=583)
       Filter: (((mailbox_id = 2) OR (mailbox_id = 3) OR (mailbox_id = 5)
         OR (mailbox_id = 20) OR (mailbox_id = 21) OR (mailbox_id = 13)
         OR (mailbox_id = 22) OR (mailbox_id = 23) OR (mailbox_id = 24)
         OR (mailbox_id = 25) OR (mailbox_id = 26) OR (mailbox_id = 19))
         AND (desktop = 2))


Although it's hard to be sure without EXPLAIN ANALYZE output, it seems
like an index on threads(mailbox_id) might help here. You have a bunch
of other indexes on threads that are useless for this query :-( ...
I hope they do some good for other queries, else they're just slowing
down updates.

Also a larger value of sort_mem might help. At least by the planner's
estimates, the other big hit is the use of a merge join between emails
and threads. I think a hash join would work better, but it's not going
to pick that unless the inner relation will fit in sort_mem.
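A sketch of both suggestions above, using the table and column names from
the post; the index name and the sort_mem value are illustrative
assumptions, not part of the original advice:

```sql
-- Index matching the WHERE clause on threads (name is an assumption):
CREATE INDEX threads_mailbox_id_idx ON threads (mailbox_id);
ANALYZE threads;  -- refresh statistics so the new index is costed realistically

-- Larger per-session sort memory (in KB; 102400 = 100 MB, illustrative):
SET sort_mem = 102400;

-- Then re-run EXPLAIN ANALYZE on the query and check whether the planner
-- now chooses a hash join between emails and threads.
```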

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Nov 23 '05 #16

This discussion thread is closed

Replies have been disabled for this discussion.