
Inner join question

Greetings all,

I am trying to do what should be a simple join but the tables are large
and it is taking a long, long time. I have the feeling that I have
stuffed up something in the syntax.

Here is what I have:

telemetry=> select (tq1.timestamp = tq2.timestamp) as timestamp,
            tq1.value as q1, tq2.value as q2
            from cal_quat_1 tq1
            inner join cal_quat_2 as tq2 using (timestamp)
            where timestamp > '2004-01-12 09:47:56.0000 +0'
            and timestamp < '2004-01-12 09:50:44.7187 +0'
            order by timestamp;

telemetry=> \d cal_quat_1
Table "cal_quat_1"
Column | Type | Modifiers
-----------+--------------------------+-----------
timestamp | timestamp with time zone |
value | double precision |

telemetry=> \d cal_quat_2
Table "cal_quat_2"
Column | Type | Modifiers
-----------+--------------------------+-----------
timestamp | timestamp with time zone |
value | double precision |

My understanding of an inner join is that the query above will first
restrict tq1 to the timestamp range (tq1.timestamp, tq1.value) and then
search that subset for the matching tq2.value. I have tried this with
and without the '=' sign and it isn't clear whether it makes any
difference at all (the timestamps are identical in the range of
interest). I have not allowed the query to finish as it seems to take
more than 10 minutes. Both timestamps are indexed and I expect about
150 rows to be returned. At the end of the day, I have four identical
tables of quaternions (timestamp, value) and I need to extract them all
for a range of timestamps.
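
Ultimately I will need the four-table version of the same range join;
here is a sketch of what I have in mind (guessing at cal_quat_3 and
cal_quat_4 as the names of the other two tables):

select timestamp,
       tq1.value as q1, tq2.value as q2, tq3.value as q3, tq4.value as q4
from cal_quat_1 tq1
join cal_quat_2 tq2 using (timestamp)  -- using() merges the shared key column
join cal_quat_3 tq3 using (timestamp)
join cal_quat_4 tq4 using (timestamp)
where timestamp > '2004-01-12 09:47:56.0000 +0'
  and timestamp < '2004-01-12 09:50:44.7187 +0'
order by timestamp;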

Cheers,
Randall

Nov 22 '05 #1
5 Replies

Randall Skelton wrote:
[original question quoted in full; snipped]


We need more information to be able to help further. Can you supply:

1. Total number of rows in each table.
2. Results from "explain analyze <your query>"
3. Key configuration values from postgresql.conf
4. Basic hardware config. (CPU type and number, Total RAM, HDD type,
size and speed)

But in the meantime, can you try the following query instead:

select tq1.timestamp as timestamp, tq1.value as q1, tq2.value as q2
from cal_quat_1 tq1, cal_quat_2 tq2
where tq1.timestamp = tq2.timestamp
and tq1.timestamp between '2004-01-12 09:47:56.0000 +0'::timestamp with time zone
and '2004-01-12 09:50:44.7187 +0'::timestamp with time zone
order by tq1.timestamp;

As far as I know, and someone please correct me, this allows the planner
the most flexibility when figuring out the optimum plan.
Thanks

Nick


Nov 22 '05 #2

Hi,

Try this in the psql console:

explain analyze select tq1.*, tq2.*
from cal_quat_1 tq1, cal_quat_2 tq2
where tq1.timestamp = tq2.timestamp
and tq1.timestamp > '2004-01-12 09:47:56.0000 +0'::timestamp with time zone
and tq1.timestamp < '2004-01-12 09:50:44.7187 +0'::timestamp with time zone
order by tq1.timestamp;

... and examine the generated query plan (or post it).
regards, pajout
P.S.
And what about vacuum full analyze? :)
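In 7.2 that can be run table by table, along these lines (a sketch
using the table names from the thread):

vacuum full analyze cal_quat_1;  -- reclaim dead space and refresh planner statistics
vacuum full analyze cal_quat_2;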

Randall Skelton wrote:
[original question quoted in full; snipped]


Nov 22 '05 #3

The main problem was that there wasn't an index on cal_quat_1. The
other indexes were fine so I don't know what happened to the first
one...

Nevertheless, it still takes longer than I would like. As requested:

telemetry=> explain analyze select tq1.*, tq2.* from
telemetry-> cal_quat_1 tq1, cal_quat_2 tq2
telemetry-> where tq1.timestamp = tq2.timestamp
telemetry-> and tq1.timestamp > '2004-01-12 09:47:56.0000 +0'::timestamp with time zone
telemetry-> and tq1.timestamp < '2004-01-12 09:50:44.7187 +0'::timestamp with time zone
telemetry-> order by tq1.timestamp;
NOTICE:  QUERY PLAN:

Merge Join  (cost=517417.89..2795472.80 rows=177664640 width=32) (actual time=64878.04..64936.41 rows=142 loops=1)
  ->  Index Scan using cal_quat_1_timestamp on cal_quat_1 tq1  (cost=0.00..50549.03 rows=13329 width=16) (actual time=73.29..129.66 rows=142 loops=1)
  ->  Sort  (cost=517417.89..517417.89 rows=2665818 width=16) (actual time=62310.53..63727.33 rows=1020155 loops=1)
        ->  Seq Scan on cal_quat_2 tq2  (cost=0.00..43638.18 rows=2665818 width=16) (actual time=14.12..13462.19 rows=2665818 loops=1)
Total runtime: 65424.79 msec

Each table currently has 2665818 rows but grows by 86400 rows/day (one
per second). With regard to hardware, the machine is a Sunfire 3600
(4 x 750MHz, 4GB RAM; the DB is on a fiber channel disk array).

We are using PostgreSQL 7.2.1.

Cheers,
Randall


Nov 22 '05 #4

On Thu, Feb 19, 2004 at 01:23:34PM -0500, Randall Skelton wrote:
The main problem was that there wasn't an index on cal_quat_1. The
other indexes were fine so I don't know what happened to the first
one...
How about an index on cal_quat_2.timestamp?
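If it is missing, something along these lines should create it (the
name just mirrors the cal_quat_1_timestamp index that shows up in the
plan):

create index cal_quat_2_timestamp on cal_quat_2 (timestamp);  -- btree index to support the range scan and merge join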

Hope this helps,

--
Martijn van Oosterhout <kl*****@svana.org> http://svana.org/kleptog/
If the Catholic church can survive the printing press, science fiction
will certainly weather the advent of bookwarez.
http://craphound.com/ebooksneitherenorbooks.txt - Cory Doctorow



Nov 22 '05 #5

Randall Skelton <sk*****@brutus.uwaterloo.ca> writes:
Nevertheless, it still takes longer than I would like. As requested:
[EXPLAIN ANALYZE output quoted above; snipped]


I think the problem is the gross misestimation of the number of rows
involved --- first within the timestamp interval (13329 vs actual 142)
and then for the join result (177664640 is just silly). With more
accurate estimates you would probably have gotten the double indexscan
plan that you really want.

The estimates look remarkably default-ish, however --- if I'm doing the
math correctly, the selectivity is being estimated as 0.005 at each
step, which just happens to be the default estimate in the absence of
any statistics. Have you ANALYZEd these tables lately? If you have,
try increasing the statistics target for the timestamp columns (see
ALTER TABLE) and analyze again.
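
The arithmetic does check out: 2,665,818 × 0.005 is about 13,329, which
is exactly the tq1 row estimate, and 13,329.09 squared is about
177,664,640, the join estimate. A sketch of the statistics change (the
target value 100 is only illustrative):

alter table cal_quat_1 alter column "timestamp" set statistics 100;  -- keep more histogram detail
alter table cal_quat_2 alter column "timestamp" set statistics 100;
analyze cal_quat_1;  -- recollect stats so the planner sees the new target
analyze cal_quat_2;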

regards, tom lane


Nov 22 '05 #6
