473,503 Members | 11,968 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Help with query: indexes on timestamps

Ok, I've tried a number of things here and I know I'm missing something but at
this point my head is spinning (i.e. lack of sleep, too much coffee, etc...)

My environment is PG 7.4.3 on Linux with 512Mb of ram and swap. This was just
upgraded from 7.4 (just to make sure I'm current). Some of my settings in
postgresql are giving fatal errors but I don't think my issue is related to my
query problems. I also have a laptop running with the same basic specs (no
raid, slower processor).

I use a recent pgadmin-III as my client.

We're also running this query in MS-SQL.

I have a table with with 1 million records in it. Here is the definition

CREATE TABLE report
(
match int4,
action varchar(16),
stamp timestamptz,
account varchar(32),
ipaddress inet,
profile varchar(16),
rating text,
url text
)
WITHOUT OIDS;

The is one index:

CREATE INDEX stamp_idx
ON report
USING btree
(stamp);

That query I'm running is:

SELECT date_part('hour'::text, report.stamp) AS "hour", count(*) AS count
FROM report
GROUP BY date_part('hour'::text, report.stamp)
ORDER BY date_part('hour'::text, report.stamp);

Here is the plan I get:

QUERY PLAN
----------------------------------------------------------------------------
Sort (cost=47420.64..47421.14 rows=200 width=8)
Sort Key: date_part('hour'::text, stamp)
-> HashAggregate (cost=47412.00..47413.00 rows=200 width=8)
-> Seq Scan on report (cost=0.00..42412.00 rows=1000000 width=8)
(4 rows)
Now from from I understand that, the index I created would not be used since I
would be looking at every row to do the date part. The query under 7.4 ran in
about 8 seconds. In 7.4.3, its taking 37 seconds for the same plan (which is
fine for the system not be tuned yet). On my laptop its taking 6 seconds.
MS-SQL is taking 8 seconds. These runs are after I do vacuum full, vacuum
analyse and reindex on the database and table respectively

My question: How can I get this query to use an index build on the date_part
function. On the MS-SQL side, creating a computed column with the date part and
then don't an index on that column bring the query done to 2 seconds.

I tried creating this function:

CREATE OR REPLACE FUNCTION whathour(timestamptz)
RETURNS int4 AS
'begin
return date_part(\'hour\',$1);
end;'
LANGUAGE 'plpgsql' IMMUTABLE;

and then and index:

CREATE INDEX hour_idx
ON report
USING btree
(stamp)
WHERE whathour(stamp) >= 0 AND whathour(stamp) <= 23;

but I get the same plan- which makes sense to me because I'm again inspect
quiet a few row. I'm sure I'm missing something...

I couldn't see from the docs how to make a column equal a function (like
MS-SQL's computed column) but to me it seems like I should not have to do
something like that since it really is wasting space in the table. I hoping a
partial index or a function index will solve this and be just as efficient.
However, that method **does** work. Is there a better way?

Thanks to all in advance.

--
Keith C. Perry, MS E.E.
Director of Networks & Applications
VCSN, Inc.
http://vcsn.com

____________________________________
This email account is being host by:
VCSN, Inc : http://vcsn.com

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postgresql.org

Nov 23 '05 #1
2 3813
Keith C. Perry wrote:

I have a table with with 1 million records in it. Here is the definition

CREATE TABLE report
(
match int4,
action varchar(16),
stamp timestamptz,
account varchar(32),
ipaddress inet,
profile varchar(16),
rating text,
url text
)
WITHOUT OIDS;

The is one index:

CREATE INDEX stamp_idx
ON report
USING btree
(stamp);

That query I'm running is:

SELECT date_part('hour'::text, report.stamp) AS "hour", count(*) AS count
FROM report
GROUP BY date_part('hour'::text, report.stamp)
ORDER BY date_part('hour'::text, report.stamp);


You will always get a sequential scan with this query - there is no
other way to count the rows.

With PostgreSQL being MVCC based, you can't know whether a row is
visible to you without checking it - visiting the index won't help. Even
if it could, you'd still have to visit every row in the index.

Assuming the table is a log, with always increasing timestamps, I'd
create a summary table and query that.

--
Richard Huxton
Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Nov 23 '05 #2
Quoting Richard Huxton <de*@archonet.com>:
Keith C. Perry wrote:

I have a table with with 1 million records in it. Here is the definition

CREATE TABLE report
(
match int4,
action varchar(16),
stamp timestamptz,
account varchar(32),
ipaddress inet,
profile varchar(16),
rating text,
url text
)
WITHOUT OIDS;

The is one index:

CREATE INDEX stamp_idx
ON report
USING btree
(stamp);

That query I'm running is:

SELECT date_part('hour'::text, report.stamp) AS "hour", count(*) AS count
FROM report
GROUP BY date_part('hour'::text, report.stamp)
ORDER BY date_part('hour'::text, report.stamp);


You will always get a sequential scan with this query - there is no
other way to count the rows.

With PostgreSQL being MVCC based, you can't know whether a row is
visible to you without checking it - visiting the index won't help. Even
if it could, you'd still have to visit every row in the index.

Assuming the table is a log, with always increasing timestamps, I'd
create a summary table and query that.


Yea, actually it a proxy server log each month the databasae is 500k records. I
have two months loaded only to put some stress on the server. Some ever month
I'm loading the data just so I can do some analysis. The optimization question
came up when one of the other database folks wanted to play with the database in
MS-SQL server.

How can I add a column that respresents a function that returns just the
date_part? I wondering if that will increase the speed of the query in similar
fashion as the MS-SQL did.

I hadn't though about the MVCC vs. file locking issue. The MS-SQL server does
not have any load on it and I'm sure if other users were hitting it the same
table with the same query, PG would be perform better.

--
Keith C. Perry, MS E.E.
Director of Networks & Applications
VCSN, Inc.
http://vcsn.com

____________________________________
This email account is being host by:
VCSN, Inc : http://vcsn.com

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 23 '05 #3

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

7
3921
by: lkrubner | last post by:
This might be an idiot question, but how do you group by timestamps by date? I mean, given a large number of timestamps, spanning many months, how do grab them and say how many are from each day?...
1
3178
by: Gregory.Spencer | last post by:
Hi there, Using PHPMyAdmin and it is very usefully reporting problems with my MySQL DB. "PRIMARY and INDEX keys should not both be set for column `column_name`" and
7
31537
by: Steve | last post by:
I have a SQL query I'm invoking via VB6 & ADO 2.8, that requires three "Left Outer Joins" in order to return every transaction for a specific set of criteria. Using three "Left Outer Joins"...
4
1629
by: trint | last post by:
Ok, This script is something I wrote for bringing up a report in reporting services and it is really slow...Is their any problems with it or is their better syntax to speed it up and still provide...
2
1578
by: g_chime | last post by:
I have a large table (~200 columns, ~10 million rows) and I am trying to query by two float rows, say f1 and f2: SELECT... FROM large_table WHERE f1 BETWEEN 10.82 AND 113.998 AND f2 BETWEEN...
6
1885
by: Steven D.Arnold | last post by:
I have a query which does not use column indexes that it should use. I have discovered some interesting behaviors of Postgres which may indicate a bug in the database's query planning. Take a...
2
720
by: Zygo Blaxell | last post by:
I have a table with a few million rows of temperature data keyed by timestamp. I want to group these rows by timestamp intervals (e.g. every 32 seconds), compute aggregate functions on the...
8
2622
by: Xu, Wei | last post by:
Hi, I have wrote the following sql sentence.Do you have comments to improve the performance.I have created all the indexed. But it's still very slow.Thanks The primary key is proj_ID and...
1
1325
by: pootlecat | last post by:
Hello everyone, I have a fair sized table now (1,955,041 rows) and it currently has two indexes: PRIMARY is the ID number and Keywords is a FULLTEXT index of the Keywords column (Text). ...
0
7296
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
7364
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
7470
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
0
5604
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...
0
3186
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The...
0
3174
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
0
1524
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated ...
1
751
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
0
405
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.