Nimit wrote:[color=blue]
> Hi NC. You've helped me with this problem a few months back if you
> remember. The post is at:
http://tinyurl.com/8d2ro. The query I am
> using is pretty much the one you suggested, where the knows table joins
> on itself once for each degree of separation we want to calculate.
>
> To fill everyone else in, what I have is a social network (like
> friendster) stored in a database. The database has a "knows" table that
> has a row for each relation between two people. If you go to the post
> I've mentioned above, you will probably get a better understanding of
> what exactly my problem is. I didn't mention it in my original post
> because I didn't want to make it too long or go off topic, but I guess
> it was necessary.
>
> I took your suggestion to try out Heap tables and it slashed my time a
> LOT. I can do the 3rd degree search in about 3.75 seconds now, verus
> about 17-18 seconds using the normal tables. That is awesome! I think
> that is the answer I was looking for.
>
> However, for that 3.75 seconds, my system totally freezes...like,
> winamp stops playing, and everything stops responding. This scares me a
> bit, because how many of these queries could this system handle at the
> same time? Also, my laptop is has pretty good specs, pentium M 1.7ghz,
> with 1GB of RAM, so I don't think it's the hardware. Any thoughts on
> this?[/color]
As suspected, the real problem lay elsewhere.
Here is an approach that should sinificantly reduce your load.
People are not going to be updating their contacts that frequently and
the amount of updating is (in the grand scheme of things) relatively
minor. Ie. once established, the table does not change quickly.
Therefore, have a separate table where for each pair of people you make
a one time computation of their degree of separation (or whatever is
important) and put it in as: Person1, Person2, separationInfo (degree,
otherDetails) indexed on Person1 and Person2, of course. otherDetails
could be a string: a minimal chain to get to the person or whatever
might be interesting to you. At 10^8 pairs, that is a lot of entries
so you can be happy that you live in an era where 100 Gig hardrives can
be had.
But the important point is that updating can be done in the background
with an appropriate message pump so other apps can get their slices of
time (By the way, I'm sure you've come across Dijkstra's algorithm for
these types of computations). At the same time, extraction of the info
you want is now fast because mySQL just has to return a contiguous
block of 10K entries (if you wanted them all), that the info for each
person is grouped in one block. The central idea here is that you've
cached all your possible responses.
But beware that the memory requirements for this solution grow
quadratically with the number of people.
Csaba Gabor from Vienna