473,657 Members | 2,690 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Help with query: indexes on timestamps

Ok, I've tried a number of things here and I know I'm missing something but at
this point my head is spinning (i.e. lack of sleep, too much coffee, etc...)

My environment is PG 7.4.3 on Linux with 512Mb of ram and swap. This was just
upgraded from 7.4 (just to make sure I'm current). Some of my settings in
postgresql are giving fatal errors but I don't think my issue is related to my
query problems. I also have a laptop running with the same basic specs (no
raid, slower processor).

I use a recent pgadmin-III as my client.

We're also running this query in MS-SQL.

I have a table with with 1 million records in it. Here is the definition

CREATE TABLE report
(
match int4,
action varchar(16),
stamp timestamptz,
account varchar(32),
ipaddress inet,
profile varchar(16),
rating text,
url text
)
WITHOUT OIDS;

The is one index:

CREATE INDEX stamp_idx
ON report
USING btree
(stamp);

That query I'm running is:

SELECT date_part('hour '::text, report.stamp) AS "hour", count(*) AS count
FROM report
GROUP BY date_part('hour '::text, report.stamp)
ORDER BY date_part('hour '::text, report.stamp);

Here is the plan I get:

QUERY PLAN
----------------------------------------------------------------------------
Sort (cost=47420.64. .47421.14 rows=200 width=8)
Sort Key: date_part('hour '::text, stamp)
-> HashAggregate (cost=47412.00. .47413.00 rows=200 width=8)
-> Seq Scan on report (cost=0.00..424 12.00 rows=1000000 width=8)
(4 rows)
Now from from I understand that, the index I created would not be used since I
would be looking at every row to do the date part. The query under 7.4 ran in
about 8 seconds. In 7.4.3, its taking 37 seconds for the same plan (which is
fine for the system not be tuned yet). On my laptop its taking 6 seconds.
MS-SQL is taking 8 seconds. These runs are after I do vacuum full, vacuum
analyse and reindex on the database and table respectively

My question: How can I get this query to use an index build on the date_part
function. On the MS-SQL side, creating a computed column with the date part and
then don't an index on that column bring the query done to 2 seconds.

I tried creating this function:

CREATE OR REPLACE FUNCTION whathour(timest amptz)
RETURNS int4 AS
'begin
return date_part(\'hou r\',$1);
end;'
LANGUAGE 'plpgsql' IMMUTABLE;

and then and index:

CREATE INDEX hour_idx
ON report
USING btree
(stamp)
WHERE whathour(stamp) >= 0 AND whathour(stamp) <= 23;

but I get the same plan- which makes sense to me because I'm again inspect
quiet a few row. I'm sure I'm missing something...

I couldn't see from the docs how to make a column equal a function (like
MS-SQL's computed column) but to me it seems like I should not have to do
something like that since it really is wasting space in the table. I hoping a
partial index or a function index will solve this and be just as efficient.
However, that method **does** work. Is there a better way?

Thanks to all in advance.

--
Keith C. Perry, MS E.E.
Director of Networks & Applications
VCSN, Inc.
http://vcsn.com

_______________ _______________ ______
This email account is being host by:
VCSN, Inc : http://vcsn.com

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postg resql.org

Nov 23 '05 #1
2 3825
Keith C. Perry wrote:

I have a table with with 1 million records in it. Here is the definition

CREATE TABLE report
(
match int4,
action varchar(16),
stamp timestamptz,
account varchar(32),
ipaddress inet,
profile varchar(16),
rating text,
url text
)
WITHOUT OIDS;

The is one index:

CREATE INDEX stamp_idx
ON report
USING btree
(stamp);

That query I'm running is:

SELECT date_part('hour '::text, report.stamp) AS "hour", count(*) AS count
FROM report
GROUP BY date_part('hour '::text, report.stamp)
ORDER BY date_part('hour '::text, report.stamp);


You will always get a sequential scan with this query - there is no
other way to count the rows.

With PostgreSQL being MVCC based, you can't know whether a row is
visible to you without checking it - visiting the index won't help. Even
if it could, you'd still have to visit every row in the index.

Assuming the table is a log, with always increasing timestamps, I'd
create a summary table and query that.

--
Richard Huxton
Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Nov 23 '05 #2
Quoting Richard Huxton <de*@archonet.c om>:
Keith C. Perry wrote:

I have a table with with 1 million records in it. Here is the definition

CREATE TABLE report
(
match int4,
action varchar(16),
stamp timestamptz,
account varchar(32),
ipaddress inet,
profile varchar(16),
rating text,
url text
)
WITHOUT OIDS;

The is one index:

CREATE INDEX stamp_idx
ON report
USING btree
(stamp);

That query I'm running is:

SELECT date_part('hour '::text, report.stamp) AS "hour", count(*) AS count
FROM report
GROUP BY date_part('hour '::text, report.stamp)
ORDER BY date_part('hour '::text, report.stamp);


You will always get a sequential scan with this query - there is no
other way to count the rows.

With PostgreSQL being MVCC based, you can't know whether a row is
visible to you without checking it - visiting the index won't help. Even
if it could, you'd still have to visit every row in the index.

Assuming the table is a log, with always increasing timestamps, I'd
create a summary table and query that.


Yea, actually it a proxy server log each month the databasae is 500k records. I
have two months loaded only to put some stress on the server. Some ever month
I'm loading the data just so I can do some analysis. The optimization question
came up when one of the other database folks wanted to play with the database in
MS-SQL server.

How can I add a column that respresents a function that returns just the
date_part? I wondering if that will increase the speed of the query in similar
fashion as the MS-SQL did.

I hadn't though about the MVCC vs. file locking issue. The MS-SQL server does
not have any load on it and I'm sure if other users were hitting it the same
table with the same query, PG would be perform better.

--
Keith C. Perry, MS E.E.
Director of Networks & Applications
VCSN, Inc.
http://vcsn.com

_______________ _______________ ______
This email account is being host by:
VCSN, Inc : http://vcsn.com

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 23 '05 #3

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

7
3928
by: lkrubner | last post by:
This might be an idiot question, but how do you group by timestamps by date? I mean, given a large number of timestamps, spanning many months, how do grab them and say how many are from each day? If the timestamps measure visits to a web site, how to easily say there were 45 visits on January 4th? The first idea that occurs to me is to put them all in an array and then loop through the array and use date() on each, one at a time,...
1
3186
by: Gregory.Spencer | last post by:
Hi there, Using PHPMyAdmin and it is very usefully reporting problems with my MySQL DB. "PRIMARY and INDEX keys should not both be set for column `column_name`" and
7
31549
by: Steve | last post by:
I have a SQL query I'm invoking via VB6 & ADO 2.8, that requires three "Left Outer Joins" in order to return every transaction for a specific set of criteria. Using three "Left Outer Joins" slows the system down considerably. I've tried creating a temp db, but I can't figure out how to execute two select commands. (It throws the exception "The column prefix 'tempdb' does not match with a table name or alias name used in the query.")
4
1640
by: trint | last post by:
Ok, This script is something I wrote for bringing up a report in reporting services and it is really slow...Is their any problems with it or is their better syntax to speed it up and still provide the same report results?: SELECT t1.MemberId, t1.PeriodID, t8.start_date, t6.amount_type_id, t6.amount_type, SUM(CASE WHEN t2.amountTypeId = 7 THEN t2.amount WHEN t2.amountTypeId = 23 THEN - t2.amount END) AS Purchase,
2
1582
by: g_chime | last post by:
I have a large table (~200 columns, ~10 million rows) and I am trying to query by two float rows, say f1 and f2: SELECT... FROM large_table WHERE f1 BETWEEN 10.82 AND 113.998 AND f2 BETWEEN -124.99 AND 23.5; The query is extremely slow; it takes hours.
6
1907
by: Steven D.Arnold | last post by:
I have a query which does not use column indexes that it should use. I have discovered some interesting behaviors of Postgres which may indicate a bug in the database's query planning. Take a look at the query below. There is a btree index on both m.account_id and a.account_id. Query (1) does not use the index on the messages table, instead opting for a full table scan, thus killing performance. The messages table can contain...
2
720
by: Zygo Blaxell | last post by:
I have a table with a few million rows of temperature data keyed by timestamp. I want to group these rows by timestamp intervals (e.g. every 32 seconds), compute aggregate functions on the columns, and ultimately feed the result into a graph-drawing web thingy. I'm trying a few different ways to get what seems to be the same data, and seeing some odd behavior from the query planner. The table looks like this:
8
2633
by: Xu, Wei | last post by:
Hi, I have wrote the following sql sentence.Do you have comments to improve the performance.I have created all the indexed. But it's still very slow.Thanks The primary key is proj_ID and Task_UID. SELECT PR.PROJ_NAME AS PRName, PR.PROJ_ID As PRProjID, PR.TASK_UID As PRTaskUID, 'Dev' AS GroupType, Feat.PROJ_ID As FeatProjID, Feat.TASK_UID As FeatTaskUID, Feat.FeatureID AS FeatureID,
1
1334
by: pootlecat | last post by:
Hello everyone, I have a fair sized table now (1,955,041 rows) and it currently has two indexes: PRIMARY is the ID number and Keywords is a FULLTEXT index of the Keywords column (Text). +----------------------+------------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +----------------------+------------------+------+-----+---------+----------------+
0
8394
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
8306
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
8825
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
8732
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
8503
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
7327
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
1
6164
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
4152
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
0
4304
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.