Bytes IT Community

Help with query: indexes on timestamps

Ok, I've tried a number of things here and I know I'm missing something, but at
this point my head is spinning (lack of sleep, too much coffee, etc...)

My environment is PG 7.4.3 on Linux with 512 MB of RAM and swap. This was just
upgraded from 7.4 (just to make sure I'm current). Some of my settings in
postgresql.conf are giving fatal errors, but I don't think that issue is related
to my query problems. I also have a laptop running with the same basic specs (no
RAID, slower processor).

I use a recent pgadmin-III as my client.

We're also running this query in MS-SQL.

I have a table with 1 million records in it. Here is the definition:

CREATE TABLE report
(
match int4,
action varchar(16),
stamp timestamptz,
account varchar(32),
ipaddress inet,
profile varchar(16),
rating text,
url text
)
WITHOUT OIDS;

There is one index:

CREATE INDEX stamp_idx
ON report
USING btree
(stamp);

The query I'm running is:

SELECT date_part('hour'::text, report.stamp) AS "hour", count(*) AS count
FROM report
GROUP BY date_part('hour'::text, report.stamp)
ORDER BY date_part('hour'::text, report.stamp);

Here is the plan I get:

QUERY PLAN
----------------------------------------------------------------------------
Sort (cost=47420.64..47421.14 rows=200 width=8)
Sort Key: date_part('hour'::text, stamp)
-> HashAggregate (cost=47412.00..47413.00 rows=200 width=8)
-> Seq Scan on report (cost=0.00..42412.00 rows=1000000 width=8)
(4 rows)
Now, from what I understand, the index I created would not be used, since every
row has to be visited to compute the date_part. The query under 7.4 ran in
about 8 seconds. In 7.4.3, it's taking 37 seconds for the same plan (which is
fine, since the system isn't tuned yet). On my laptop it's taking 6 seconds.
MS-SQL is taking 8 seconds. These runs are after I do vacuum full, vacuum
analyze, and reindex on the database and table respectively.

My question: how can I get this query to use an index built on the date_part
function? On the MS-SQL side, creating a computed column with the date part and
then an index on that column brought the query down to 2 seconds.

I tried creating this function:

CREATE OR REPLACE FUNCTION whathour(timestamptz)
RETURNS int4 AS
'begin
return date_part(\'hour\',$1);
end;'
LANGUAGE 'plpgsql' IMMUTABLE;

and then an index:

CREATE INDEX hour_idx
ON report
USING btree
(stamp)
WHERE whathour(stamp) >= 0 AND whathour(stamp) <= 23;

but I get the same plan, which makes sense to me because I'm again inspecting
quite a few rows. I'm sure I'm missing something...

I couldn't see from the docs how to make a column equal a function (like
MS-SQL's computed column), but it seems to me that I should not have to do
something like that, since it really is wasting space in the table. I'm hoping a
partial index or a functional index will solve this and be just as efficient.
However, the computed-column method **does** work. Is there a better way?
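For what it's worth, PostgreSQL 7.4 supports indexes on arbitrary expressions, so one could index the result of the whathour() function directly rather than using a partial index (index name below is illustrative). Note, though, that an expression index only helps when a WHERE clause restricts which rows are fetched; a GROUP BY over the entire table still has to read every row:

```sql
-- Expression index on the function's result (double parentheses required):
CREATE INDEX report_hour_idx ON report ((whathour(stamp)));

-- The planner can use it when a WHERE clause matches the expression,
-- e.g. counting rows for a single hour:
SELECT count(*) FROM report WHERE whathour(stamp) = 12;

-- But the original GROUP BY over all one million rows still visits
-- every row, so it will still choose a sequential scan.
```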

Thanks to all in advance.

--
Keith C. Perry, MS E.E.
Director of Networks & Applications
VCSN, Inc.
http://vcsn.com

____________________________________
This email account is being hosted by:
VCSN, Inc : http://vcsn.com

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postgresql.org

Nov 23 '05 #1
2 Replies


Keith C. Perry wrote:

I have a table with 1 million records in it. Here is the definition:

CREATE TABLE report
(
match int4,
action varchar(16),
stamp timestamptz,
account varchar(32),
ipaddress inet,
profile varchar(16),
rating text,
url text
)
WITHOUT OIDS;

There is one index:

CREATE INDEX stamp_idx
ON report
USING btree
(stamp);

The query I'm running is:

SELECT date_part('hour'::text, report.stamp) AS "hour", count(*) AS count
FROM report
GROUP BY date_part('hour'::text, report.stamp)
ORDER BY date_part('hour'::text, report.stamp);


You will always get a sequential scan with this query - there is no
other way to count the rows.

With PostgreSQL being MVCC based, you can't know whether a row is
visible to you without checking it - visiting the index won't help. Even
if it could, you'd still have to visit every row in the index.

Assuming the table is a log, with always increasing timestamps, I'd
create a summary table and query that.
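A summary table along these lines could look like the following sketch (table and column names are illustrative, not from the thread). Since the log only ever grows, the table can be refreshed periodically rather than on every query:

```sql
-- Pre-aggregated counts, one row per hour of day:
CREATE TABLE report_hourly
(
  hour int4 PRIMARY KEY,  -- 0..23
  cnt  int8
);

-- Populate it once, or re-run after each monthly load:
INSERT INTO report_hourly
SELECT date_part('hour', stamp)::int4, count(*)
FROM report
GROUP BY 1;

-- The original 37-second query then becomes a 24-row lookup:
SELECT hour, cnt FROM report_hourly ORDER BY hour;
```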

--
Richard Huxton
Archonet Ltd


Nov 23 '05 #2

Quoting Richard Huxton <de*@archonet.com>:
Keith C. Perry wrote:

I have a table with 1 million records in it. Here is the definition:

CREATE TABLE report
(
match int4,
action varchar(16),
stamp timestamptz,
account varchar(32),
ipaddress inet,
profile varchar(16),
rating text,
url text
)
WITHOUT OIDS;

There is one index:

CREATE INDEX stamp_idx
ON report
USING btree
(stamp);

The query I'm running is:

SELECT date_part('hour'::text, report.stamp) AS "hour", count(*) AS count
FROM report
GROUP BY date_part('hour'::text, report.stamp)
ORDER BY date_part('hour'::text, report.stamp);


You will always get a sequential scan with this query - there is no
other way to count the rows.

With PostgreSQL being MVCC based, you can't know whether a row is
visible to you without checking it - visiting the index won't help. Even
if it could, you'd still have to visit every row in the index.

Assuming the table is a log, with always increasing timestamps, I'd
create a summary table and query that.


Yeah, actually it's a proxy server log; each month the database grows by 500k
records. I have two months loaded just to put some stress on the server. So
every month I'm loading the data so I can do some analysis. The optimization
question came up when one of the other database folks wanted to play with the
database in MS-SQL server.

How can I add a column that represents a function returning just the
date_part? I'm wondering if that will increase the speed of the query in a
similar fashion to what it did on MS-SQL.
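PostgreSQL (as of 7.4) has no computed columns, but a plain column kept in sync by a trigger gives the same effect. A sketch, with illustrative names (column, index, function, and trigger names are assumptions, not from the thread):

```sql
-- Add the column, backfill it, and index it:
ALTER TABLE report ADD COLUMN stamp_hour int4;
UPDATE report SET stamp_hour = date_part('hour', stamp);
CREATE INDEX report_stamp_hour_idx ON report (stamp_hour);

-- Keep it in sync on insert/update with a trigger
-- (single-quoted body with escaped quotes, as 7.4 requires):
CREATE OR REPLACE FUNCTION set_stamp_hour() RETURNS trigger AS
'begin
  NEW.stamp_hour := date_part(\'hour\', NEW.stamp);
  return NEW;
end;'
LANGUAGE 'plpgsql';

CREATE TRIGGER report_stamp_hour_trg
  BEFORE INSERT OR UPDATE ON report
  FOR EACH ROW EXECUTE PROCEDURE set_stamp_hour();
```

Note that grouping on stamp_hour avoids recomputing date_part for every row, which may account for part of the MS-SQL speedup, but the GROUP BY over the whole table will still be a sequential scan.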

I hadn't thought about the MVCC vs. file locking issue. The MS-SQL server does
not have any load on it, and I'm sure that if other users were hitting the same
table with the same query, PG would perform better.

--
Keith C. Perry, MS E.E.
Director of Networks & Applications
VCSN, Inc.
http://vcsn.com



Nov 23 '05 #3
