473,386 Members | 1,715 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,386 software developers and data experts.

gist index build produces corrupt result on first access totable.

We've implemented a 5D box data type and have implemented both RTree and
GiST access methods under PostgresSQL 7.4 and PostgresSQL 7.4.1. The 5D box
internally looks like:
struct Box5D{
float minBounds[5];
float maxBounds[5];
};
and so takes up 40 bytes and is of fixed length. The GiST access methods
are basically a generalization of the 2D box indexing access methods from
Refraction Research's PostGIS.

We've tested this by building an index on a table that contains 30000 rows
(no nulls) representing small non-overlapping boxes on a 2D plane (ie, all
the values for indices 0 and 1 vary; the values for indices 2,3, and 4 are
constant). Then we try a select on the table with an overlaps test in the
where clause that should result in 66 rows being returned.
We've used an "explain select ..... " to verify that the index is used
during the search.

An RTree index built on the table works fine, always. The ability to build
a GiST index is less reliable; sometimes the index is built properly and
subsequent searchs on it work perfectly. Sometimes the index is built
incorrectly, and subsequent searchs on it fail. We have examined the keys
passed to our GiST consistent access method during the select and have
observed that when the index hasn't been built properly, the first key
passed to the consistent method is gibberish (strange values for indices 0
and 1, strange nonzero values for indices 2,3 and 4). When the index has
been built properly, the first key passed to the consistent method has
reasonable looking values.

We've been able to isolate this behavior somewhat: the index build only
seems to produce corrupt results if the data connection that sent the
command did not perform any preceding statements that would have caused the
boxes to have been read from disk. For example, the sequence:
<open connection to db>
create index box_index on box_table using gist(box_column);

would produce a corrupt index.
However, the sequence

<open connection to db>
select count(*) from box_table where box_column && '0 0 0 0 0 1 1 1 1 1';
create index box_index on box_table using gist(box_column);

produces a valid index,
as does the sequence:

<open connection to db>
create index box_index on box_table using gist(box_column);
drop index box_index;
create index box_index on box_table using gist(box_column);

In short, it appears that an operation that forces the boxes to read from a
table
needs to be done to "prime the index building pump".

We've tried installing PostGIS and it does not seem to be affected by any
similar problems.
The notable differences between our code and Refraction's are:
* our input geometries are fixed length, rather than variable length
* our compress method just returns the GISTENTRY * it was passed as an
argument. We've tried returning a deep copy of the GISENTRY *, but that
made no difference.
We've looked at the contrib/cube code, but it's also a variable length
parameter, and it uses a very different parameter passing mechanism.
In the short term, we can work around the issue, but it would be nice to be
able to fix the problem. Any ideas are welcome.

Thank you.
Eric.

**********************************************
Eric Davies, M.Sc.
Barrodale Computing Services Ltd.
Tel: (250) 472-4372 Fax: (250) 472-4373
Web: http://www.barrodale.com
Email: er**@barrodale.com
**********************************************
Mailing Address:
P.O. Box 3075 STN CSC
Victoria BC Canada V8W 3W2

Shipping Address:
Hut R, McKenzie Avenue
University of Victoria
Victoria BC Canada V8W 3W2
**********************************************

Nov 23 '05 #1
3 2179
Eric Davies <er**@barrodale.com> writes:
We've looked at the contrib/cube code, but it's also a variable length
parameter, and it uses a very different parameter passing mechanism.


Different how? I have a hard-to-pin-down intuition that your problem is
closely associated with this issue, but without details it's impossible
to say more. IIRC there were once assumptions in rtree and/or gist
about the indexable datatype being pass-by-ref and/or varlena, and I'm
not sure how much of that has been fixed.

This seems a bit far afield for pgsql-general, however. I would
suggest offering more details on -hackers. If you could post your code
for people to play with, you'd likely get answers pretty quickly.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 23 '05 #2


Tom Lane wrote:
Eric Davies <er**@barrodale.com> writes:
We've looked at the contrib/cube code, but it's also a variable length
parameter, and it uses a very different parameter passing mechanism.

Different how? I have a hard-to-pin-down intuition that your problem is
closely associated with this issue, but without details it's impossible
to say more. IIRC there were once assumptions in rtree and/or gist
about the indexable datatype being pass-by-ref and/or varlena, and I'm
not sure how much of that has been fixed.

contribs rtree_gist, btree_gist and seg uses pass-by-ref fixed length value for
storage, and there isn't such problem.
It seems to me, the problem may be in union or picksplit methods, but I needs
more info.

I advice to play with gevel module (
http://www.sai.msu.su/~megera/postgr...l/README.gevel,
http://www.sai.msu.su/~megera/postgr...l/gevel.tar.gz
)

--
Teodor Sigaev E-mail: te****@sigaev.ru

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Nov 23 '05 #3

At 10:28 PM 3/3/2004, Tom Lane wrote:
Eric Davies <er**@barrodale.com> writes:
We've looked at the contrib/cube code, but it's also a variable length
parameter, and it uses a very different parameter passing mechanism.
Different how? I have a hard-to-pin-down intuition that your problem is


the parameter passing mechanism used by Postgis (and our 5D box) is to
pass a FunctionCallInfoData structure, via a PG_FUNCTION_ARGS macro.
The Postgis penalty function starts with:
/*
** The GiST Penalty method for boxes
** As in the R-tree paper, we use change in area as our penalty metric
*/
PG_FUNCTION_INFO_V1(gbox_penalty);
Datum gbox_penalty(PG_FUNCTION_ARGS)
{
GISTENTRY *origentry = (GISTENTRY *) PG_GETARG_POINTER(0);

the cube code appears to pass each argument separately, as one would
normally do with a
C program. Ie, the g_cube_penalty function starts with:
/*
** The GiST Penalty method for boxes
** As in the R-tree paper, we use change in area as our penalty metric
*/
float *
g_cube_penalty(GISTENTRY *origentry, GISTENTRY *newentry, float *result)
{
NDBOX *ud;
double tmp1,

Again, the postgis code seems to work fine with GiST indexes, so I would
rule out the argument passing mechanism as being of influence.

closely associated with this issue, but without details it's impossible
to say more. IIRC there were once assumptions in rtree and/or gist
about the indexable datatype being pass-by-ref and/or varlena, and I'm
not sure how much of that has been fixed.

This seems a bit far afield for pgsql-general, however. I would
suggest offering more details on -hackers. If you could post your code
for people to play with, you'd likely get answers pretty quickly.
I figured likewise, but the etiquete for the postgresql.org group required
I try the more general group first. :-).
I'll give that a shot now.
regards, tom lane


**********************************************
Eric Davies, M.Sc.
Barrodale Computing Services Ltd.
Tel: (250) 472-4372 Fax: (250) 472-4373
Web: http://www.barrodale.com
Email: er**@barrodale.com
**********************************************
Mailing Address:
P.O. Box 3075 STN CSC
Victoria BC Canada V8W 3W2

Shipping Address:
Hut R, McKenzie Avenue
University of Victoria
Victoria BC Canada V8W 3W2
**********************************************

Nov 23 '05 #4

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

4
by: Dmitry Tkach | last post by:
Hi, everybody! I am trying to create a custom GiST index in 7.3, but getting an error, that I don't know how to interpret: testdb=# create table gist_test (field int8); CREATE TABLE testdb=#...
0
by: George Essig | last post by:
I have installed tsearch2 and have noticed that the gist index used to do searches grows and grows as I update rows, delete rows, or run VACUUM FULL ANALYZE. Below are some details: PostgreSQL...
14
by: Sean C. | last post by:
Helpful folks, Most of my previous experience with DB2 was on s390 mainframe systems and the optimizer on this platform always seemed very predictable and consistent. Since moving to a WinNT/UDB...
8
by: Mike | last post by:
Hello, I have a few rather urgent questions that I hope someone can help with (I need to figure this out prior to a meeting tomorrow.) First, a bit of background: The company I work for is...
4
by: aaj | last post by:
Hi all I have been given a corrupt access 2000 backend and asked to salvage the data.(youve gussed it, no backup) On trying to open it just kicks me out saying its not a valid database file -...
25
by: kapilk | last post by:
Sir, I know that the array index starts in C from 0 and not 1 can any body pls. tell me the reason. Is it because in the subscript i can have a unsigned integer and these start from 0 ...
10
by: Reynard Hilman | last post by:
Hi, I have been having this problem where the database size suddenly grows from the normal size of about 300Mb to 12Gb in one night. When I look up the table size, the biggest one is only 41Mb...
3
by: Net Virtual Mailing Lists | last post by:
Hello, I have a table like this with some indexes as identified: CREATE TABLE sometable ( data TEXT, data_fti TSVECTOR, category1 INTEGER, category2 INTEGER,
4
by: BravoFoxtrot | last post by:
Hi, I'm trying to build an index into a multi dimensional associative array. I may not know how many dimensions there are so i want to pass the array indexes as a variable. $arrayToAccess =...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.