problem with huge joins

Kolus Maximiliano

Hello,

I don't know if this is *entirely* a postgresql question or if it's
my screwup with the database design.

First, a little introduction to the problem: I maintain a database
of open proxies (they're used for tagging and/or rejecting email) and I use
postgres as it's backend. The list itself is contained in three tables:

tHost (458K rows)
Hostid integer (PK + HASH INDEX)
Ip inet
Rdns varchar

tList (403K rows)
Listid integer (PK)
Input integer
Output integer (HASH INDEX)
Date timestamp

tPort (986K rows)
Portid integer (PK)
Listid integer (HASH INDEX)
Protocol integer
Port integer (BTREE INDEX)

Every hour, I must build a couple of files with the list itself, in
order to do this I need to retrieve output ip address (list+host) and for
each ip address port/protocol combinations, ie:

192.168.1.1 1080 3 (for socks 4)

In order to get this, I run this query:

SELECT ip, TO_CHAR(date, 'YYYY-MM-DD'), protocol, port
FROM tProxyPort, tProxyList, tProxyHost
WHERE tProxyPort.listId=tProxyList.listId
AND tProxyList.output=tProxyHost.hostId
ORDER BY ip, port

Whose query plan is:

Sort (cost=311874.07..311874.07 rows=986130 width=44) (actual
time=300086.42..302580.25 rows=986130 loops=1)
-> Hash Join (cost=39735.96..96907.83 rows=986130 width=44) (actual
time=86226.28..223195.50 rows=986130 loops=1)
-> Seq Scan on tport (cost=0.00..18629.30 rows=986130 width=12)
(actual time=0.15..25910.56 rows=986130 loops=1)
-> Hash (cost=35972.38..35972.38 rows=403034 width=32) (actual
time=86194.99..86194.99 rows=0 loops=1)
-> Hash Join (cost=9787.92..35972.38 rows=403034 width=32)
(actual time=12180.64..84316.65 rows=403927 loops=1)
-> Seq Scan on thost (cost=0.00..7850.41 rows=457341
width=16) (actual time=619.09..10032.85 rows=458787 loops=1)
-> Hash (cost=6812.34..6812.34 rows=403034 width=16)
(actual time=6656.36..6656.36 rows=0 loops=1)
-> Seq Scan on tlist (cost=0.00..6812.34
rows=403034 width=16) (actual time=6.90..5030.22 rows=403927 loops=1)
Total runtime: 317046.69 msec

As you can see, it takes eons to complete and I couldn't find a way
to make it faster (the hash indexes are one of the attempts and yes, I did a
vacuum). I don't think that a table with nearly one million entries is "that
much" for postgres and I'm convinced I'm doing something wrong. Any ideas on
how to speed this thing up will be appreciated.

Version:
PostgreSQL 7.2.1 on i686-pc-linux-gnu, compiled by GCC 2.95.4
(shared buffers: 128)

CPU:
Pentium III (Katmai) (600Mhz)

/proc/meminfo:

total: used: free: shared: buffers: cached:
Mem: 263475200 218435584 45039616 0 17240064 161374208
Swap: 511959040 7389184 504569856
MemTotal: 257300 kB
MemFree: 43984 kB
MemShared: 0 kB
Buffers: 16836 kB
Cached: 155348 kB
SwapCached: 2244 kB
Active: 131204 kB
Inactive: 68760 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 257300 kB
LowFree: 43984 kB
SwapTotal: 499960 kB
SwapFree: 492744 kB

Thanks in advance.

Nov 12 '05 #1

Subscribe Post Reply

1305

Tom Lane

Kolus Maximiliano <Ko***************@bcr.com.ar> writes:

In order to get this, I run this query: SELECT ip, TO_CHAR(date, 'YYYY-MM-DD'), protocol, port
FROM tProxyPort, tProxyList, tProxyHost
WHERE tProxyPort.listId=tProxyList.listId
AND tProxyList.output=tProxyHost.hostId
ORDER BY ip, port Whose query plan is: Sort (cost=311874.07..311874.07 rows=986130 width=44) (actual
time=300086.42..302580.25 rows=986130 loops=1)
-> Hash Join (cost=39735.96..96907.83 rows=986130 width=44) (actual
time=86226.28..223195.50 rows=986130 loops=1)
-> Seq Scan on tport (cost=0.00..18629.30 rows=986130 width=12)
(actual time=0.15..25910.56 rows=986130 loops=1)
-> Hash (cost=35972.38..35972.38 rows=403034 width=32) (actual
time=86194.99..86194.99 rows=0 loops=1)
-> Hash Join (cost=9787.92..35972.38 rows=403034 width=32)
(actual time=12180.64..84316.65 rows=403927 loops=1)
-> Seq Scan on thost (cost=0.00..7850.41 rows=457341
width=16) (actual time=619.09..10032.85 rows=458787 loops=1)
-> Hash (cost=6812.34..6812.34 rows=403034 width=16)
(actual time=6656.36..6656.36 rows=0 loops=1)
-> Seq Scan on tlist (cost=0.00..6812.34
rows=403034 width=16) (actual time=6.90..5030.22 rows=403927 loops=1)
Total runtime: 317046.69 msec

The joins and sort steps seem to take rather a long time. What do you
have sort_mem set to? You probably want it on the order of 10Mb so that
these joins are done in memory rather than spilling to disk.

The hash indexes are a waste of time for this :-(

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to ma*******@postgresql.org)

Nov 12 '05 #2

by: john67 | last post by:

The company I work for is about to embark on developing a commercial application that will cost us tens-of-millions to develop. When all is said and done it will have thousands of business...

Python

Multi-tables LEFT JOIN problem

by: Marek Kotowski | last post by:

In MySQL online documentation there are some examples with multi-tables left joins. But all of them are like this (taken from the documentation): SELECT ... FROM table1 LEFT JOIN table2 on...

MySQL Database

Problem with multiple joins.

by: Prem | last post by:

Hi All Database Gurus, I am trying to write code which will produce all the possible valid queries, given tables and join information for tables. Right now i am just trying to construct all the...

Microsoft SQL Server

Help with Many-to-Many-to-Many Problem

by: Keith | last post by:

I am having a problem creating a many-to-many-to-many type relationship. It works fine, but when I create a view to query it and test it, it does not generate the results I expected. Below...

Microsoft SQL Server

OUTER JOIN problem

by: Matt | last post by:

Hello I have to tables ar and arb, ar holds articles and a swedish description, arb holds descriptions in other languages. I want to retreive all articles that match a criteria from ar and...

Microsoft SQL Server

while loops within while loops causing huge memory problem

by: monomaniac21 | last post by:

hi all i have a script that retrieves rows from a single table, rows are related to eachother and are retrieved by doing a series of while loops within while loops. bcos each row contains a text...

PHP

Select Count(*) problem

by: MP | last post by:

vb6,ado,mdb,win2k i pass the sql string to the .Execute method on the open connection to Table_Name(const) db table fwiw (the connection opened via class wrapper:) msConnString = "Data Source="...

Microsoft Access / VBA

Re: PROBLEM WITH DISCOVERER JOINS

by: Ana RM | last post by:

Mark.Powell@eds.com (Mark D Powell) wrote in message news:<2687bb95.0308010642.1fc4ff1f@posting.google.com>... Hi Mark, Thanks por answer me. I do not think it is important thw warehouse...

Oracle Database

Vanishing Joins

by: TC | last post by:

I've used Access for many years. Several times, I've encountered a bug which I refer to as the "Vanishing Joins" bug. When it happens, joins vanish randomly from queries. More specifically, all...

Microsoft Access / VBA

Cloud Servers without Credit Card and Email Registration: A Simpler Way to Get on the Cloud

by: CloudSolutions | last post by:

Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...

General

One-click Importing Excel Data into a*Database

by: ryjfgjl | last post by:

In our work, we often need to import Excel data into databases (such as MySQL, SQL Server, Oracle) for data analysis and processing. Usually, we use database tools like Navicat or the Excel import...

Microsoft Excel

Easy Steps to Fix "Canon Printer Won't Connect to WiFi Network"

by: taylorcarr | last post by:

A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...

General

How to turn on java script in a villaon keypad mobile phone

by: Charles Arthur | last post by:

How do i turn on java script on a villaon, callus and itel keypad mobile phone

Java

Batch import of multiple excel files into the database

by: ryjfgjl | last post by:

If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...

Data Management

Navigating the Data Structures and Algorithms (DSA)

by: BarryA | last post by:

What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

Algorithms / Advanced Math

Looking to do Android software development, any suggestions? Is flutter better?

by: nemocccc | last post by:

hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

General

Is that possible of reading the .csv file in column wise and the column have different lengths ?

by: Sonnysonu | last post by:

This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

C / C++

How to build RAID in BIOS?

by: Hystou | last post by:

There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

Computer Hardware

Similar topics