473,800 Members | 2,640 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

problem with huge joins

Hello,

I don't know if this is *entirely* a postgresql question or if it's
my screwup with the database design.

First, a little introduction to the problem: I maintain a database
of open proxies (they're used for tagging and/or rejecting email) and I use
postgres as it's backend. The list itself is contained in three tables:

tHost (458K rows)
Hostid integer (PK + HASH INDEX)
Ip inet
Rdns varchar

tList (403K rows)
Listid integer (PK)
Input integer
Output integer (HASH INDEX)
Date timestamp

tPort (986K rows)
Portid integer (PK)
Listid integer (HASH INDEX)
Protocol integer
Port integer (BTREE INDEX)

Every hour, I must build a couple of files with the list itself, in
order to do this I need to retrieve output ip address (list+host) and for
each ip address port/protocol combinations, ie:

192.168.1.1 1080 3 (for socks 4)

In order to get this, I run this query:

SELECT ip, TO_CHAR(date, 'YYYY-MM-DD'), protocol, port
FROM tProxyPort, tProxyList, tProxyHost
WHERE tProxyPort.list Id=tProxyList.l istId
AND tProxyList.outp ut=tProxyHost.h ostId
ORDER BY ip, port

Whose query plan is:

Sort (cost=311874.07 ..311874.07 rows=986130 width=44) (actual
time=300086.42. .302580.25 rows=986130 loops=1)
-> Hash Join (cost=39735.96. .96907.83 rows=986130 width=44) (actual
time=86226.28.. 223195.50 rows=986130 loops=1)
-> Seq Scan on tport (cost=0.00..186 29.30 rows=986130 width=12)
(actual time=0.15..2591 0.56 rows=986130 loops=1)
-> Hash (cost=35972.38. .35972.38 rows=403034 width=32) (actual
time=86194.99.. 86194.99 rows=0 loops=1)
-> Hash Join (cost=9787.92.. 35972.38 rows=403034 width=32)
(actual time=12180.64.. 84316.65 rows=403927 loops=1)
-> Seq Scan on thost (cost=0.00..785 0.41 rows=457341
width=16) (actual time=619.09..10 032.85 rows=458787 loops=1)
-> Hash (cost=6812.34.. 6812.34 rows=403034 width=16)
(actual time=6656.36..6 656.36 rows=0 loops=1)
-> Seq Scan on tlist (cost=0.00..681 2.34
rows=403034 width=16) (actual time=6.90..5030 .22 rows=403927 loops=1)
Total runtime: 317046.69 msec

As you can see, it takes eons to complete and I couldn't find a way
to make it faster (the hash indexes are one of the attempts and yes, I did a
vacuum). I don't think that a table with nearly one million entries is "that
much" for postgres and I'm convinced I'm doing something wrong. Any ideas on
how to speed this thing up will be appreciated.

Version:
PostgreSQL 7.2.1 on i686-pc-linux-gnu, compiled by GCC 2.95.4
(shared buffers: 128)

CPU:
Pentium III (Katmai) (600Mhz)

/proc/meminfo:

total: used: free: shared: buffers: cached:
Mem: 263475200 218435584 45039616 0 17240064 161374208
Swap: 511959040 7389184 504569856
MemTotal: 257300 kB
MemFree: 43984 kB
MemShared: 0 kB
Buffers: 16836 kB
Cached: 155348 kB
SwapCached: 2244 kB
Active: 131204 kB
Inactive: 68760 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 257300 kB
LowFree: 43984 kB
SwapTotal: 499960 kB
SwapFree: 492744 kB

Thanks in advance.
Nov 12 '05 #1
1 1320
Kolus Maximiliano <Ko************ ***@bcr.com.ar> writes:
In order to get this, I run this query: SELECT ip, TO_CHAR(date, 'YYYY-MM-DD'), protocol, port
FROM tProxyPort, tProxyList, tProxyHost
WHERE tProxyPort.list Id=tProxyList.l istId
AND tProxyList.outp ut=tProxyHost.h ostId
ORDER BY ip, port Whose query plan is: Sort (cost=311874.07 ..311874.07 rows=986130 width=44) (actual
time=300086.42. .302580.25 rows=986130 loops=1)
-> Hash Join (cost=39735.96. .96907.83 rows=986130 width=44) (actual
time=86226.28.. 223195.50 rows=986130 loops=1)
-> Seq Scan on tport (cost=0.00..186 29.30 rows=986130 width=12)
(actual time=0.15..2591 0.56 rows=986130 loops=1)
-> Hash (cost=35972.38. .35972.38 rows=403034 width=32) (actual
time=86194.99.. 86194.99 rows=0 loops=1)
-> Hash Join (cost=9787.92.. 35972.38 rows=403034 width=32)
(actual time=12180.64.. 84316.65 rows=403927 loops=1)
-> Seq Scan on thost (cost=0.00..785 0.41 rows=457341
width=16) (actual time=619.09..10 032.85 rows=458787 loops=1)
-> Hash (cost=6812.34.. 6812.34 rows=403034 width=16)
(actual time=6656.36..6 656.36 rows=0 loops=1)
-> Seq Scan on tlist (cost=0.00..681 2.34
rows=403034 width=16) (actual time=6.90..5030 .22 rows=403927 loops=1)
Total runtime: 317046.69 msec


The joins and sort steps seem to take rather a long time. What do you
have sort_mem set to? You probably want it on the order of 10Mb so that
these joins are done in memory rather than spilling to disk.

The hash indexes are a waste of time for this :-(

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddres sHere" to ma*******@postg resql.org)

Nov 12 '05 #2

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

53
3564
by: john67 | last post by:
The company I work for is about to embark on developing a commercial application that will cost us tens-of-millions to develop. When all is said and done it will have thousands of business objects/classes, some of which will have hundreds-of-thousands of instances stored in a DB. Our clients will probably have somewhere between 50-200 users working on the app during the day, possibly in mutiple offices, and then a large number of batch...
5
2365
by: Marek Kotowski | last post by:
In MySQL online documentation there are some examples with multi-tables left joins. But all of them are like this (taken from the documentation): SELECT ... FROM table1 LEFT JOIN table2 on (table1.id = table2.id) LEFT JOIN table3 on (table1.id2 = table3.id2) LEFT JOIN table4 on (table1.id3 = table4.id3)
1
4663
by: Prem | last post by:
Hi All Database Gurus, I am trying to write code which will produce all the possible valid queries, given tables and join information for tables. Right now i am just trying to construct all the sequential joins. eg. if i have 4 tables A, B, C, D and the join conditions are A Inner join B, B Inner Join C, C Left Outer join D then i am constructing joins as :
2
908
by: Keith | last post by:
I am having a problem creating a many-to-many-to-many type relationship. It works fine, but when I create a view to query it and test it, it does not generate the results I expected. Below if the DDL for the tables and the SQL for the view.
8
4975
by: Matt | last post by:
Hello I have to tables ar and arb, ar holds articles and a swedish description, arb holds descriptions in other languages. I want to retreive all articles that match a criteria from ar and also display their corresponding entries in arb, but if there is NO entry in arb I still want it to show up as NULL or something, so that I can get the attention that there IS no language associated with that article.
3
2423
by: monomaniac21 | last post by:
hi all i have a script that retrieves rows from a single table, rows are related to eachother and are retrieved by doing a series of while loops within while loops. bcos each row contains a text field they are fairly large. the net result is that when 60 or so results are reitreved the page size is 400kb! which takes too long to load. is there a way of shorterning this? freeing up the memory say, bcos what is actually displayed is not...
22
12496
by: MP | last post by:
vb6,ado,mdb,win2k i pass the sql string to the .Execute method on the open connection to Table_Name(const) db table fwiw (the connection opened via class wrapper:) msConnString = "Data Source=" & msDbFilename moConn.Properties("Persist Security Info") = False moConn.ConnectionString = msConnString moConn.CursorLocation = adUseClient moConn.Mode = adModeReadWrite' or using default...same result
1
2853
by: Ana RM | last post by:
Mark.Powell@eds.com (Mark D Powell) wrote in message news:<2687bb95.0308010642.1fc4ff1f@posting.google.com>... Hi Mark, Thanks por answer me. I do not think it is important thw warehouse knowledge to solve my problem because at the end all the tables are relational. The point is that I do not want make changes in the database, I want
36
2501
by: TC | last post by:
I've used Access for many years. Several times, I've encountered a bug which I refer to as the "Vanishing Joins" bug. When it happens, joins vanish randomly from queries. More specifically, all joins vanish from at least one (seemingly random) query. I've always regarded the Vanishing Joins bug as a symptom of corruption. When it happens, I usually give my users advice on how to recover from corruption, and how to avoid it in the future....
0
9550
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
10501
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
10273
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
0
10032
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
9085
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
1
7574
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
1
4149
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
2
3764
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
3
2944
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.