473,326 Members | 2,012 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,326 software developers and data experts.

locating set of points close to one another (within a threshold)

Hi all

I have a large data set of points situated in 3d space. I have a simple
primary key and an x, y and z value.

What I would like is an efficient method for finding the group of
points within a threshold.

So far I have tested the following however it is very slow.

---------------
select *
from locations a full outer join locations b
on a.ID < b.ID and a.X-b.X<2 and a.Y-b.Y<2 and a.Z-b.Z<2
where a.ID is not null and b.ID is not null
---------------

If anyone knows of a more efficient method to arrive at this results it
would be most appreciated.

Thanks in advance
Bevan

May 6 '06 #1
8 1540
(be*******@gmail.com) writes:
I have a large data set of points situated in 3d space. I have a simple
primary key and an x, y and z value.
Why is (x, y, z) not the primary key? Or put differently: what does it
mean if you have to points with the same coordinates?

If nothing else, the ID column serves to make the rows wider, which
means fewer rows per pages, which means worse performance.
What I would like is an efficient method for finding the group of
points within a threshold.

So far I have tested the following however it is very slow.

---------------
select *
from locations a full outer join locations b
on a.ID < b.ID and a.X-b.X<2 and a.Y-b.Y<2 and a.Z-b.Z<2
where a.ID is not null and b.ID is not null
---------------


Not that I think that it matters much for execution time, but FULL JOIN
seems flat wrong here, and with the conditions in the WHERE clause you
make it an inner join anyway. I would write the query as:

select *
from locations a
cross join locations b
WHERE a.X < b.X + 2 and a.Y < b.Y + 2 and a.Z < b.Z + 2

It's important to have one coordinate alone on operator, that increases
the chance for an index being used. You do have an index on (x, y, z),
don't you?

Still I am far from certain that SQL Server will use an index. It would be
interesting to see the query plan for this query.

This is quite a difficult problem for a relational engine, and it could
be a case where a cursor is the best. Add a clustered index on (x, y, z).
Iterate over the points ordered by X. If the current point and the
previous point are less than 2 from each other, save the combination
into a temp table. The tricky part is that if you are at point N, it
may proximite to more than point you've already passed over. So you can
keep data in varaiables. You could use a table variable, but I think
that it's better to use the table itself. This is where it is important
that you have a clustered index on X.

Normally cursors are a poor solution, but the point here is that you
would only need to make one iteration over the table.

Faster than a cursor though, would be to get all data to a client
program (or a .Net stored procedure if you are on SQL 2005), since
traditional languages have better structures for this sort of problem.

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx
May 6 '06 #2
Erland Sommarskog (es****@sommarskog.se) writes:
This is quite a difficult problem for a relational engine, and it could
be a case where a cursor is the best.


I should stress that this is just a wild idea. It may very well prove
that a cursor solution is horribly slow - as cursor solutions often are.

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx
May 6 '06 #3
Erland really appreciate the effort you've put into this message it has
helped alot.

The reason there is a key as well as the location data is that is the
number the samples are collected and tacked against. There can be
points with the same coordinates in real life however there is a
problem where the coordinate location is confirmed with a more precise
method and it is given a new name - ideally it would not be possible to
enter this into the database however since the planning phase is not
mandatory it has to have a merge character... I hope in the future the
work flow can develop to stop these types of things happening.

I wasn't able to get the query to use the index (I do have a clustered
index on x, y, z) with either the cross join restricting on the where
or the full outer restricted on the null key.

I've had to give up on it for now but have printed your message and
will try to get something effective working - I will test a cursor
later today and see how it will run.

Unfortunately this database is still in 2000 however once it moves to
2005 I can see as you state this problem can be solved in a more
conventional manner.

Thanks again for your effort and I hope to be able to post a method
back that runs more efficiently in the near futuer.

Cheers
Bevan

May 6 '06 #4
Why would you have a NULL identifier? I guess that two things can have
the same location, otherwise (x,y,z) would be a key. And should your
query look like this, since you want to go in both directions.on an
axis? The OUTER JOIN makes no sense.

SELECT A.node_id, B.node_id
FROM Locations AS A, Locations AS b
WHERE A.node_id < B.node_id
AND ABS(A.x-B.x) <= 2
AND ABS(A.y-B.y) <= 2
AND ABS(A.z-B.z) <= 2

I did something like in 2-D for neighborhoods based on quadrants. If a
node was in one quadrant, I assumed that the nine adjacent quads
centered on his quad would have the nearest neighbor, so i only did a
distanve formula on popint in those quads. It meant carrying an extra
pair of quad co-ordinates, but it saved searching all the nodes -- 9
quads versus ~15,000 quads.

May 7 '06 #5
(be*******@gmail.com) writes:
I have a large data set of points situated in 3d space. I have a simple
primary key and an x, y and z value.

What I would like is an efficient method for finding the group of
points within a threshold.

So far I have tested the following however it is very slow.

---------------
select *
from locations a full outer join locations b
on a.ID < b.ID and a.X-b.X<2 and a.Y-b.Y<2 and a.Z-b.Z<2
where a.ID is not null and b.ID is not null
---------------


As indicated by Celko's posting there is most probably an error here:
you need abs(a.X - b.X) etc, else you will get far too many points.

Mathematically it would also make more sense to use

sqrt(square(a.X - b.X) + square(a.Y - b.Y) + square(a.Z - b.Z)) < cutoff
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx
May 7 '06 #6
I haven't implemented this on a large scale, but I'll pass on
what I think is a good approach. I suggested it in a different
context in the thread found here:
http://groups.google.com/groups/sear...exemplar+skass
Variations on this idea are used in GIS and other such systems.
Note that my comments aren't specific to your exact question,
but they should apply reasonably well.

Very briefly, the idea is to maintain a set of points *not*
in your set that are evenly scattered through space - the
lattice points of a coarse grid. These will serve as neighborhood
markers, and with each point, you will store its NeighborhoodID.
You can think of these as neighborhood post offices, for example.
The coarseness of the grid will be chosen depending on what kinds
of questions you ask, but typically the size of the neighborhoods
reflects the meaning of "close".

Points can only be close to each other if they live in neighborhoods
that are close to each other, or equivalently, if their neighborhood
post offices are close to each other.

For example, if your 3D coordinates range from 0 to 1000
each, you might slice up space every 100 units and use these
as lattice points: {i*100+50,j*100+50,k+100*50) for 0<=i,j,k<10.
So you have 1000 lattice points in this case. No point in
space is more than 100*SQRT(3) ~ 86.6 units from one of these 1000
points, so this will work if the "closeness" you need is on this
order ("close" should mean same or adjacent neighborhood).

No matter how many points you have here, you have
only 1K neighborhoods. If you had 10M points, you
now have 10K points/neighborhood on average.

Now the part that reduces the cost of querying: For many queries,
you can first search or look up pairs of post offices, not points,
that meet your requirement of close. (This could be "adjacent
pairs" in most cases, and post office adjacencies could be
maintained permanently). There is one Mega-pair of post offices
in this example, but 100 Tera-pairs of points
(in nanoseconds, this is 1 millisecond vs. 1 day).
Even if you still have to look at all pairs of points in
adjacent neighborhoods, you have made progress, but for many
problems, you might be able to narrow down the search further.

You might maintain various values in your tables: for points,
the distance to the neighborhood post office and the post
office ID, or the same for all adjacent neighborhood post
offices. For post offices, you might maintain the number of
points in the neighborhood. You might use a fixed rectangular
lattice with a canonical way of enumerating neighbors, so you
can pack things into arrays, or you might allow the lattice to
be dynamic, depending on where the points are. You might maintain
all inter-postoffice distances or just nearby ones. And you
might use triggers a lot if there are many updates and
inserts, but if the data is relatively static, you might
maintain larger and well-indexed collections of
relevant calculations. Depending on how real-time the
system is, you might treat parts of it like a warehousing
problem. And on and on (there are many books on the subject,
on subjects like "spatial databases" and geodatabases -
I haven't read whole books on it)

This is only a sketch, and it is not trivial (but not impossible)
to implement, but maybe it will help. The alternative is
probably to look for GIS-type third-party products that can
connect to SQL data. I'm sure there are some out there, but
they may not be cheap.

The shorter answer is that there is probably no single fast
query to do most things like this if none of this framework is
in place. At best, maybe you can generate a crude lattice on
the fly with derived tables from Numbers tables - but that is
basically putting some of this framework in place.

Steve Kass
Drew University

be*******@gmail.com wrote:
Hi all

I have a large data set of points situated in 3d space. I have a simple
primary key and an x, y and z value.

What I would like is an efficient method for finding the group of
points within a threshold.

So far I have tested the following however it is very slow.

---------------
select *
from locations a full outer join locations b
on a.ID < b.ID and a.X-b.X<2 and a.Y-b.Y<2 and a.Z-b.Z<2
where a.ID is not null and b.ID is not null
---------------

If anyone knows of a more efficient method to arrive at this results it
would be most appreciated.

Thanks in advance
Bevan

May 7 '06 #7
Another MVP colleague pointed me to this link:

http://research.microsoft.com/resear...SR-TR-2005-122

The link is for SQL 2005, but the algorithm as such might be applicable.
--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pro...ads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinf...ons/books.mspx
May 7 '06 #8
Square both sides to avoid conversions to FLOAT that SQRT() will give
you.

POWER ((A.x - B.x), 2) + POWER ((A.y - B.y), 2)+ POWER ((A.z - B.z), 2)
< POWER(cutoff , 2)

May 8 '06 #9

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

0
by: Anne | last post by:
hie there, i have a couple of questions that i need help on. the first is regarding the closing of a browser.i have an exit button which on click i would like it to totally close the browser, and...
0
by: Johann Blake | last post by:
I'm having trouble grasping how ASP.NET correctly locates resources. There is plenty of documentation on this subject but some things are not clear at all. In my ASP.NET application, I have...
0
by: dmlinliverpool | last post by:
I am running VS.net 2005 Express and Sql Server 2005 Express. The DB and VS are both on the same PC. I cannot connect to a database from within VS. In Database Explorer I click Connect To...
2
by: niks | last post by:
Is there a standard way of traversing the DOM to find all the javascript in a document? As far as I know, the only legal positions for javascript in the DOM is within a <scriptelement or in the...
6
by: Jack | last post by:
I have a WebRequest object that I use to log into a site and then post some XML. In doing this I set the KeepAlive = true so that it maintains the connection and does operates undo the initial...
3
by: azrael | last post by:
Hy guys. I'd like to ask you for a favour. I tried several times to implement the otsu threshold filter in python. but I failed every time. I found the soucre code i n Java from the ImageJ...
1
by: =?Utf-8?B?TWF5?= | last post by:
We are getting WebForm_DoPostBackWithOptions is not defined and other Java script errors related to WebResource.axd file. I narrowed down the problem and found out the cause but can't come up with...
4
by: David24 | last post by:
i am modifying a simple motion detector from the Aforge Library. It simply compares 2 frames and draws the difference as semitransparent red pixels. I need to store the (constantly changing) average...
2
by: oprah.chopra | last post by:
I have a large data file of upto 1 million x,y,z coordinates of points. I want to identify which points are within 0.01 mm from each other. I can compare the distance from each point to every other...
0
by: ryjfgjl | last post by:
ExcelToDatabase: batch import excel into database automatically...
1
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
0
by: Vimpel783 | last post by:
Hello! Guys, I found this code on the Internet, but I need to modify it a little. It works well, the problem is this: Data is sent from only one cell, in this case B5, but it is necessary that data...
1
by: PapaRatzi | last post by:
Hello, I am teaching myself MS Access forms design and Visual Basic. I've created a table to capture a list of Top 30 singles and forms to capture new entries. The final step is a form (unbound)...
1
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
1
by: Shællîpôpï 09 | last post by:
If u are using a keypad phone, how do u turn on JavaScript, to access features like WhatsApp, Facebook, Instagram....
0
by: af34tf | last post by:
Hi Guys, I have a domain whose name is BytesLimited.com, and I want to sell it. Does anyone know about platforms that allow me to list my domain in auction for free. Thank you
0
by: Faith0G | last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome former...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.