At 11:25 AM 1/27/2004 -0500, Tom Lane wrote:
Martijn van Oosterhout <kl*****@svana. org> writes:I'm afraid I'll have to defer to someone else (Tom?) as why the
estimate wasout by three orders of magnitude.
I'd like to know that, too.I'd suggest playing around with statistics and seeing if you can work out
why they were so bad.
Could we see the pg_stats row for the ip_saddr column? Also, does the
estimate get better if you increase the stats target for ip_saddr and
re-analyze?
regards, tom lane
OK - the statistics and various queries follow. If n_distinct is the
estimate of unique values then it is still off, but the plan does change.
How does postgresql do the analysis?
select version();
version
-------------------------------------------------------------
PostgreSQL 7.4.1 on i686-pc-linux-gnu, compiled by GCC 2.96
(1 row)
VACUUM full cust_ulog ;
VACUUM
Time: 96916.323 ms
analyze cust_ulog;
Time: 25969.562 ms
explain select ip_saddr,count( *),sum(ip_totle n) from cust_ulog
where true
group by ip_saddr order by sum(ip_totlen) desc limit 10
QUERY PLAN
---------------------------------------------------------------------------------------
Limit (cost=331739.22 ..331739.24 rows=10 width=13)
-> Sort (cost=331739.22 ..331743.03 rows=1523 width=13)
Sort Key: sum(ip_totlen)
-> HashAggregate (cost=331651.09 ..331658.71 rows=1523 width=13)
-> Seq Scan on cust_ulog (cost=0.00..262 156.91
rows=9265891 width=13)
(5 rows)
Time: 163.512 ms
select * from pg_stats where tablename ='cust_ulog' and attname='ip_sad dr';
schemaname | tablename | attname | null_frac | avg_width | n_distinct |
most_common_val s
| most_common_fre qs |
histogram_bound s | correlation
------------+-----------+----------+-----------+-----------+------------+---------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+-------------
public | cust_ulog | ip_saddr | 0 | 11 | 1523 |
{202.185.236.30 ,202.185.236.6, 202.185.236.11, 202.185.236.25, 202.185.236.7,2 03.121.96.19,20 2.185.236.23,20 2.75.129.102,19 2.228.128.20,20 2.188.1.5}
|
{0.243667,0.092 6667,0.0553333, 0.0433333,0.031 ,0.0183333,0.01 43333,0.0116667 ,0.011,0.011}
|
{4.2.49.2,61.14 9.58.98,66.128. 175.107,69.56.6 3.214,195.92.67 .209,202.185.23 1.40,202.185.23 6.47,203.70.228 .182,208.41.95. 194,212.253.48. 34,221.232.160. 6}
| 0.00267716
(1 row)
alter table cust_ulog alter column ip_saddr set statistics 50;
ALTER TABLE
Time: 15.605 ms
analyze cust_ulog;
ANALYZE
Time: 51251.862 ms
select correlation,att name from pg_stats where tablename ='cust_ulog' and
attname='ip_sad dr';
correlation | attname
-------------+----------
0.0286273 | ip_saddr
select n_distinct from pg_stats where tablename ='cust_ulog' and
attname='ip_sad dr';
n_distinct
------------
6714
(1 row)
Time: 16.134 ms
alter table cust_ulog alter column ip_saddr set statistics 100;
ALTER TABLE
Time: 13.728 ms
analyze cust_ulog;
ANALYZE
Time: 55412.288 ms
select n_distinct from pg_stats where tablename ='cust_ulog' and
attname='ip_sad dr';
n_distinct
------------
12939
(1 row)
Time: 26.338 ms
explain select ip_saddr,count( *),sum(ip_totle n) from cust_ulog
where true
group by ip_saddr order by sum(ip_totlen) desc limit 10
;
QUERY PLAN
---------------------------------------------------------------------------------------------
Limit (cost=2096080.3 4..2096080.37 rows=10 width=13)
-> Sort (cost=2096080.3 4..2096112.69 rows=12939 width=13)
Sort Key: sum(ip_totlen)
-> GroupAggregate (cost=2002684.6 6..2095196.65 rows=12939 width=13)
-> Sort (cost=2002684.6 6..2025796.48 rows=9244729 width=13)
Sort Key: ip_saddr
-> Seq Scan on cust_ulog (cost=0.00..261 945.29
rows=9244729 width=13)
(7 rows)
Time: 29.819 ms
select * from pg_stats where tablename ='cust_ulog' and attname='ip_sad dr';
schemaname | tablename | attname | null_frac | avg_width | n_distinct |
most_common_val s | most_common_fre qs | histogram_bound s | correlation
------------+-----------+----------+-----------+-----------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----
-
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------
-----
-
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------
-----
-
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
public | cust_ulog | ip_saddr | 0 | 11 | 12939 |
{202.185.236.30 ,202.185.236.6, 202.185.236.11, 202.185.236.25, 202.185.236.7,2 03.121.96.19,20 2.185.236.23,20 2.185.236.47,19 2.228.128.20,20 2.75.129.102,20 2.188.1.5,202.1 85.236.3,202.75 .129.101,203.12 1.96.24,203.121 .96.16,202.185. 236.34,202.185. 236.31,202.188. 0.132,202.188.0 .133,202.185.23 6.12,202.12.28. 131,202.188.1.2 3,202.188.1.25, 202.188.1.4,202 .12.29.25,202.1 85.231.40,202.1 85.236.45,202.1 85.236.26,192.2 28.128.11,193.0 .0.193,202.129. 169.40,202.146. 72.66,210.186.7 5.154,219.93.92 .82,202.185.236 .35,192.228.128 .1
2,202
..
185.90.118,202. 185.236.5,202.1 85.236.8,202.18 7.125.12,211.97 .56.133,103.221 .179.184,202.18 5.66.41,202.186 .47.155,202.186 .140.50,202.188 .172.163,216.39 .48.40,216.218. 158.87,216.218. 201.91,216.218. 224.26,216.219. 254.60,202.185. 254.231,202.186 .49.133,202.187 .31.12,202.187. 58.165,202.188. 0.164,203.121.1 5.140,203.193.9 6.68,207.218.16 5.207,61.6.38.1 39,61.175.132.2 28,62.62.139.25 3,63.163.210.22 7,64.61.25.155, 66.48.78.31}
|
{0.248767,0.099 7333,0.0566667, 0.0382667,0.034 5667,0.0147667, 0.0138,0.0118,0 .0107667,0.0098 3333,0.00946667 ,0.0093,0.00926 667,0.00903333, 0.00793333,0.00 776667,0.0077,0 .0074,0.0073333 3,0.0049,0.0044 3333,0.0044,0.0 044,0.00426667, 0.00406667,0.00 38,0.0031,0.002 8,0.00193333,0. 0018,0.0016,0.0 0146667,0.0014, 0.00123333,0.00 113333,0.0011,0 .00103333,0.000 8,0.0008,0.0007 33333,0.0006,0. 0006,0.00056666 7,0.000566667,0 .000566667,0.00 0533333,0.00053 3333,0.00046666 7,0.000433333,0 .000433333,0.00 04,0.000366667, 0.000366667,0.0 00
333,0
.. 000133333} |
{4.3.7.155,4.40 .42.46,12.8.192 .29,12.148.209. 196,18.206.0.76 ,24.81.163.171, 24.145.188.188, 24.194.25.199,2 4.228.82.49,61. 6.98.138,61.6.1 54.62,61.48.54. 46,61.88.0.6,61 .149.232.192,61 .222.96.205,62. 12.112.134,63.1 3.136.190,63.18 5.112.170,64.40 .227.22,64.146. 63.170,64.218.2 31.6,64.231.230 .239,65.43.172. 73,65.92.251.30 ,65.124.135.2,6 5.220.24.19,66. 50.11.33,66.92. 93.137,66.142.2 30.36,66.190.20 1.9,66.250.5.18 2,67.39.64.151, 67.72.200.129,6 7.119.178.208,6 7.227.12.5,68.4 7.247.99,68.76. 186.67,68.97.15 .56,68.146.69.1 71
..22,2
0
7.206.201.250,2 08.188.24.184,2 09.122.233.245, 210.3.135.230,2 10.54.85.168,21 0.169.237.132,2 10.186.123.51,2 10.187.228.241, 210.201.230.58, 211.24.46.39,21 1.104.217.229,2 11.173.53.3,212 .162.76.197,213 .152.72.122,216 .68.94.36,216.1 76.89.162,217.4 2.54.38,217.233 .45.74,218.47.2 49.76,218.117.8 4.68,218.244.59 .57,219.92.84.8 ,219.93.108.1,2 19.95.1.248,219 .95.164.90,219. 109.238.196,219 .252.67.13,220. 201.96.35,221.2 32.160.6}
| 0.0258759
(1 row)
Time: 4.600 ms
select count(*) from cust_ulog ;
count
---------
9461332
(1 row)
Time: 46468.211 ms
fwlogs=# explain select count(distinct( ip_saddr)) from cust_ulog;
QUERY PLAN
---------------------------------------------------------------------------
Aggregate (cost=285057.11 ..285057.11 rows=1 width=11)
-> Seq Scan on cust_ulog (cost=0.00..261 945.29 rows=9244729 width=11)
(2 rows)
Time: 1.087 ms
fwlogs=# select count(distinct( ip_saddr)) from cust_ulog;
count
---------
1391110
(1 row)
Time: 226998.021 ms
***Version 7.3.4 statistics:
select * from pg_stats where tablename ='cust_ulog' and attname='ip_sad dr'
schemaname | tablename | attname | null_frac | avg_width | n_distinct |
most_common_val s
| most_common_fre qs |
histogram_bound s | correlation
------------+-----------+----------+-----------+-----------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------+-------------
public | hlbb_ulog | ip_saddr | 0 | 12 | 1643 |
{202.185.236.30 ,202.185.236.6, 202.185.236.11, 202.185.236.25, 202.185.236.7,2 03.121.96.19,20 2.185.236.23,19 2.228.128.20,20 2.185.236.47,20 2.188.0.133}
|
{0.238333,0.091 ,0.0536667,0.03 7,0.035,0.01633 33,0.015,0.012, 0.0103333,0.009 66667}
|
{4.2.49.3,61.6. 158.105,66.44.2 .9,74.16.241.11 3,192.228.128.1 1,202.108.249.2 1,202.185.236.3 4,203.69.19.3,2 09.112.0.54,216 .87.206.244,221 .199.145.131}
| 0.0245979
---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faqs/FAQ.html
---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster