By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
434,701 Members | 1,964 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 434,701 IT Pros & Developers. It's quick & easy.

Join works in 7.3.6, fails in 7.4.2

P: n/a
I have a query that works in 7.3.6 but not in 7.4.2 unless I turn
off enable_hashjoin. I'm joining a table of network interfaces and
a table of networks so I can find additional info about a particular
interface's network. To speed up the join, I'm indexing the
interface IP addresses using a function that converts the IP address
to its network address; this way the join doesn't have to scan using
the << or >> operator.

Here's a reduced example of what I'm doing:

CREATE FUNCTION inet2net (INET) RETURNS INET AS '
SELECT NETWORK(SET_MASKLEN($1, 24));
' LANGUAGE SQL IMMUTABLE;

CREATE TABLE ipinterface (
ifid INTEGER NOT NULL PRIMARY KEY,
ifaddr INET NOT NULL
);

CREATE INDEX ipinterface_ifaddr_idx ON ipinterface (ifaddr);
CREATE INDEX ipinterface_ifaddrnet_idx ON ipinterface (inet2net(ifaddr));

CREATE TABLE ipnet (
netid INTEGER NOT NULL PRIMARY KEY,
netaddr INET NOT NULL,
CONSTRAINT uniq_netaddr UNIQUE (netaddr)
);

CREATE INDEX ipnet_netaddr_idx ON ipnet (netaddr);

After populating the tables, I ran VACUUM ANALYZE on both of them,
so the planner's statistics should be current.

Here's a query that illustrates the problem:

SELECT ifid, ifaddr, netid, netaddr
FROM ipinterface AS i
JOIN ipnet AS n ON (inet2net(i.ifaddr) = n.netaddr)
WHERE netid IN (10, 20);

From my sample data set (available upon request), this query returns
24 rows in 7.3.6, which is correct. Here's the 7.3.6 EXPLAIN ANALZYE:

Nested Loop (cost=0.00..533.78 rows=24 width=32) (actual time=0.20..0.37 rows=24 loops=1)
-> Index Scan using ipnet_pkey, ipnet_pkey on ipnet n (cost=0.00..6.03 rows=2 width=16) (actual time=0.11..0.12 rows=2 loops=1)
Index Cond: ((netid = 10) OR (netid = 20))
-> Index Scan using ipinterface_ifaddrnet_idx on ipinterface i (cost=0.00..262.58 rows=92 width=16) (actual time=0.06..0.10 rows=12 loops=2)
Index Cond: (inet2net(i.ifaddr) = "outer".netaddr)
Total runtime: 0.52 msec
(6 rows)

The same query in 7.4.2 returns no results. Here's its plan:

Hash Join (cost=6.04..483.92 rows=24 width=30) (actual time=299.948..299.948 rows=0 loops=1)
Hash Cond: (network(set_masklen("outer".ifaddr, 24)) = "inner".netaddr)
-> Seq Scan on ipinterface i (cost=0.00..293.32 rows=18432 width=15) (actual time=0.039..130.604 rows=18432 loops=1)
-> Hash (cost=6.03..6.03 rows=2 width=15) (actual time=0.257..0.257 rows=0 loops=1)
-> Index Scan using ipnet_pkey, ipnet_pkey on ipnet n (cost=0.00..6.03 rows=2 width=15) (actual time=0.142..0.196 rows=2 loops=1)
Index Cond: ((netid = 10) OR (netid = 20))
Total runtime: 300.775 ms
(7 rows)

If I turn off enable_hashjoin in 7.4.2 I get 24 rows, as expected:

Nested Loop (cost=0.00..534.87 rows=24 width=30) (actual time=0.301..1.094 rows=24 loops=1)
-> Index Scan using ipnet_pkey, ipnet_pkey on ipnet n (cost=0.00..6.03 rows=2 width=15) (actual time=0.132..0.180 rows=2 loops=1)
Index Cond: ((netid = 10) OR (netid = 20))
-> Index Scan using ipinterface_ifaddrnet_idx on ipinterface i (cost=0.00..262.81 rows=92 width=15) (actual time=0.088..0.242 rows=12 loops=2)
Index Cond: (network(set_masklen(i.ifaddr, 24)) = "outer".netaddr)
Total runtime: 1.914 ms
(6 rows)

Am I doing something wrong, or should I report this to the bugs
list?

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 23 '05 #1
Share this Question
Share on Google+
4 Replies


P: n/a
Michael Fuhr <mi**@fuhr.org> writes:
I have a query that works in 7.3.6 but not in 7.4.2 unless I turn
off enable_hashjoin. I'm joining a table of network interfaces and
a table of networks so I can find additional info about a particular
interface's network.
Hmm. The inet = operator is marked hashable in 7.4 but not in 7.3 ...
I wonder if that is a mistake? I recall looking at the datatype and
deciding there were no insignificant bits in it, but that could be
wrong. Or it could be that the network() function is taking some
shortcut it shouldn't.

Is any of this data IPv6 addresses by any chance?
From my sample data set (available upon request),


Could we see the specific values that join in 7.3 and fail to do so in
7.4?

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ma*******@postgresql.org

Nov 23 '05 #2

P: n/a
I wrote:
Michael Fuhr <mi**@fuhr.org> writes:
I have a query that works in 7.3.6 but not in 7.4.2 unless I turn
off enable_hashjoin. I'm joining a table of network interfaces and
a table of networks so I can find additional info about a particular
interface's network.
Hmm. The inet = operator is marked hashable in 7.4 but not in 7.3 ...
I wonder if that is a mistake?


Digging further, I find that indeed this seems to be a mistake. CIDR
and INET values that have the same address and masklen compare as equal
according to network_eq(), but they will not hash the same because
there's a flag identifying whether a given value is considered CIDR or
INET. And what the network() function returns is marked as a CIDR.
It's a bit surprising that your hash join produces any matches at all...

I believe I got misled on this because there is a hash index operator
class for inet; at one point during the 7.4 cycle I went around and
cleaned up cases where the equality operator's canhash flag was
inconsistent with the set of hash index opclasses. Arguably the hash
opclass is broken, although in practice people probably don't notice the
failure since a given column is likely to contain either all inet or all
cidr values. (And of course it's entirely likely that there *aren't*
any people using the inet hash opclass, period...)

I can think of a number of possible fixes:

1. Mark inet = as not hashjoinable. We'd probably want to remove the
inet hash opclass too.

2. Redefine inet = so that CIDR and INET values are never considered
equal, thus eliminating the unused field. This could be back-patched
into 7.4 but otherwise seems to have little to recommend it. It
would certainly not help solve Michael's problem.

3. Provide a specialized hash method for type inet that ignores the
iptype field.

#3 seems the most desirable going forward, but is probably impractical
to back-patch into 7.4.*, so I'm not sure what to do about the problem
in that branch. Given the relatively low incidence of the problem,
maybe it's okay to just clear the oprcanhash flag in future 7.4.*
releases. This would not fix the problem for existing installations
(unless they initdb) but any complainers could be told how to adjust
their catalogs manually.

Can anyone think of any other approaches?

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 23 '05 #3

P: n/a
On Tue, Apr 13, 2004 at 03:42:54PM -0400, Tom Lane wrote:
Michael Fuhr <mi**@fuhr.org> writes:
I have a query that works in 7.3.6 but not in 7.4.2 unless I turn
off enable_hashjoin. I'm joining a table of network interfaces and
a table of networks so I can find additional info about a particular
interface's network.
Hmm. The inet = operator is marked hashable in 7.4 but not in 7.3 ...
I wonder if that is a mistake? I recall looking at the datatype and
deciding there were no insignificant bits in it, but that could be
wrong. Or it could be that the network() function is taking some
shortcut it shouldn't.


So would a workaround be to set oprcanhash to false for that
operator? I did the following and it appeared to solve the
problem:

UPDATE pg_operator SET oprcanhash = FALSE WHERE oid = 1201;

Or, without knowing that 1201 is the correct OID:

UPDATE pg_operator SET oprcanhash = FALSE
WHERE oprname = '='
AND oprleft IN (SELECT oid FROM pg_type WHERE typname = 'inet');
Is any of this data IPv6 addresses by any chance?


Nope -- all IPv4.
From my sample data set (available upon request),


Could we see the specific values that join in 7.3 and fail to do so in
7.4?


I can duplicate the problem with the following data:

INSERT INTO ipinterface VALUES (1, '10.0.1.1');
INSERT INTO ipinterface VALUES (2, '10.0.2.1');
INSERT INTO ipnet VALUES (10, '10.0.1.0/24');
INSERT INTO ipnet VALUES (20, '10.0.2.0/24');

Thanks for looking into this.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Nov 23 '05 #4

P: n/a
Michael Fuhr <mi**@fuhr.org> writes:
So would a workaround be to set oprcanhash to false for that
operator?


Right, see my followup note. This may in fact be the only workable
solution for the 7.4.* series.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ma*******@postgresql.org so that your
message can get through to the mailing list cleanly

Nov 23 '05 #5

This discussion thread is closed

Replies have been disabled for this discussion.