Advice about fetching user information

Sandman

So, I have this content management system I've developed myself. The
system has a solid community part where members can register and then
participate in forums, write weblogs and a ton of other things.

So, in instances where I list, for example, the latest weblogs. I list
the headline of the weblog, the date and the name of the member who
wrote it.

Now, the name isn't just "Smith", but rather Smith's online status,
his nick and his "label", and can look like this:

<imageSmith M:34 (Newbie)

The image is either a green or red dot depicting his online status.
When I want to display this, I use this function:

print member_name(12);

With 12 being the id number of Smith, and the only information about
the writer of the weblog (i.e. I don't save his nick or label in the
weblog post, and certainly not his online status).

So, when using member_name() I look up the id 12 in a prefetched array
of members, displaying the correct information. In a compact format:

## index.php
require_once("member.php");
print member_name(12);

## member.php
$q=mysql_query("select id, name, label from members");
while ($r=mysql_fetch_assoc($q)){
$GLOBALS["members"][$r["id"]] = $r;
}

function member_name($id){
$m = $GLOBALS["members"][$id];
$online = in_array($id, $GLOBALS["surfers"]) ? "on" : "off";
return icon("online/$online") . " $m[name] $m[label]";
}

##

So, whenever requiring member.php, all members are prefetched and put
into $GLOBALS["members"], which has worked just fine. Then I have
infromation about any given member at my fingertips.

But now I have some sites which are beginning to get a pretty large
number of members, so prefetching them all becomes timeconsuming
(sometimes up to one second, which is unacceptably slow).

I can't fetch information about each single individual at the point of
showin his or hers information since there can be long lists of for
example weblogs and there might be 40-50 member names to be shown,
which would result in 40-50 seperate mysql requests, which would flood
the MySSQL server, I'm afraid.
So, my question is this; what other way can I do this in? Is there a
faster way to read thousands or even tens of thousands of information
points and put them in an array quickly?

Or is there a better way to read information about single members at
runtime?

Aggregation is most certainly an option. Can I write a tab separated
text file and read that one in member.txt - would it be faster? Should
I write one text file per user and read that file when needed?

Anyone got any experience in doing this? How did you solve it?

Thanks in advance. :)

--
Sandman[.net]

Nov 23 '06 #1

Subscribe Post Reply

2649

Jerry Stuckle

Sandman wrote:

So, I have this content management system I've developed myself. The
system has a solid community part where members can register and then
participate in forums, write weblogs and a ton of other things.

So, in instances where I list, for example, the latest weblogs. I list
the headline of the weblog, the date and the name of the member who
wrote it.

Now, the name isn't just "Smith", but rather Smith's online status,
his nick and his "label", and can look like this:

<imageSmith M:34 (Newbie)

The image is either a green or red dot depicting his online status.
When I want to display this, I use this function:

print member_name(12);

With 12 being the id number of Smith, and the only information about
the writer of the weblog (i.e. I don't save his nick or label in the
weblog post, and certainly not his online status).

So, when using member_name() I look up the id 12 in a prefetched array
of members, displaying the correct information. In a compact format:

## index.php
require_once("member.php");
print member_name(12);

## member.php
$q=mysql_query("select id, name, label from members");
while ($r=mysql_fetch_assoc($q)){
$GLOBALS["members"][$r["id"]] = $r;
}

function member_name($id){
$m = $GLOBALS["members"][$id];
$online = in_array($id, $GLOBALS["surfers"]) ? "on" : "off";
return icon("online/$online") . " $m[name] $m[label]";
}

##

So, whenever requiring member.php, all members are prefetched and put
into $GLOBALS["members"], which has worked just fine. Then I have
infromation about any given member at my fingertips.

But now I have some sites which are beginning to get a pretty large
number of members, so prefetching them all becomes timeconsuming
(sometimes up to one second, which is unacceptably slow).

I can't fetch information about each single individual at the point of
showin his or hers information since there can be long lists of for
example weblogs and there might be 40-50 member names to be shown,
which would result in 40-50 seperate mysql requests, which would flood
the MySSQL server, I'm afraid.
So, my question is this; what other way can I do this in? Is there a
faster way to read thousands or even tens of thousands of information
points and put them in an array quickly?

Or is there a better way to read information about single members at
runtime?

Aggregation is most certainly an option. Can I write a tab separated
text file and read that one in member.txt - would it be faster? Should
I write one text file per user and read that file when needed?

Anyone got any experience in doing this? How did you solve it?

Thanks in advance. :)

Looks like you need as SQL database.

You may have to change your logic a little - I don't know enough of the
detaisl. But it's the right way to go.

You can implement a small, lightly loaded site like this in flat files,
as you're doing. But there comes a time where that is no longer
feasible, and it's time to start using a database of some sort - that's
what they were made for. And a SQL RDB makes a lot of sense.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================

Nov 23 '06 #2

Sandman

In article <Ef******************************@comcast.com>,
Jerry Stuckle <js*******@attglobal.netwrote:

<snip>

## index.php
require_once("member.php");
print member_name(12);

## member.php
$q=mysql_query("select id, name, label from members");
while ($r=mysql_fetch_assoc($q)){
$GLOBALS["members"][$r["id"]] = $r;
}

function member_name($id){
$m = $GLOBALS["members"][$id];
$online = in_array($id, $GLOBALS["surfers"]) ? "on" : "off";
return icon("online/$online") . " $m[name] $m[label]";
}

##

<snip>

Looks like you need as SQL database.

You may have to change your logic a little - I don't know enough of the
detaisl. But it's the right way to go.

You can implement a small, lightly loaded site like this in flat files,
as you're doing. But there comes a time where that is no longer
feasible, and it's time to start using a database of some sort - that's
what they were made for. And a SQL RDB makes a lot of sense.

Eh? I *AM* using MySQL. As shown in my code above. Using MySQL for
this takes too long.
--
Sandman[.net]

Nov 23 '06 #3

Erwin Moller

Sandman wrote:

So, I have this content management system I've developed myself. The
system has a solid community part where members can register and then
participate in forums, write weblogs and a ton of other things.

So, in instances where I list, for example, the latest weblogs. I list
the headline of the weblog, the date and the name of the member who
wrote it.

Now, the name isn't just "Smith", but rather Smith's online status,
his nick and his "label", and can look like this:

<imageSmith M:34 (Newbie)

The image is either a green or red dot depicting his online status.
When I want to display this, I use this function:

print member_name(12);

With 12 being the id number of Smith, and the only information about
the writer of the weblog (i.e. I don't save his nick or label in the
weblog post, and certainly not his online status).

So, when using member_name() I look up the id 12 in a prefetched array
of members, displaying the correct information. In a compact format:

## index.php
require_once("member.php");
print member_name(12);

## member.php
$q=mysql_query("select id, name, label from members");
while ($r=mysql_fetch_assoc($q)){
$GLOBALS["members"][$r["id"]] = $r;
}

function member_name($id){
$m = $GLOBALS["members"][$id];
$online = in_array($id, $GLOBALS["surfers"]) ? "on" : "off";
return icon("online/$online") . " $m[name] $m[label]";
}

##

So, whenever requiring member.php, all members are prefetched and put
into $GLOBALS["members"], which has worked just fine. Then I have
infromation about any given member at my fingertips.

But now I have some sites which are beginning to get a pretty large
number of members, so prefetching them all becomes timeconsuming
(sometimes up to one second, which is unacceptably slow).

I can't fetch information about each single individual at the point of
showin his or hers information since there can be long lists of for
example weblogs and there might be 40-50 member names to be shown,
which would result in 40-50 seperate mysql requests, which would flood
the MySSQL server, I'm afraid.
So, my question is this; what other way can I do this in? Is there a
faster way to read thousands or even tens of thousands of information
points and put them in an array quickly?

Or is there a better way to read information about single members at
runtime?

Aggregation is most certainly an option. Can I write a tab separated
text file and read that one in member.txt - would it be faster? Should
I write one text file per user and read that file when needed?

Anyone got any experience in doing this? How did you solve it?

Thanks in advance. :)

Hi,
First: Prefetching all members into memory for each page where you need one
or two members is absolutely a no-go. You must change that of course.
Just fetch the information you need WHEN you need it.

If I understand correctly your main problem lies in the fact you must fetch
the online/offline information for each possible user.
Correct?

This is how I solved this problem when I was in a similar situation:
(a rough quick overview to show the idea)

1) Create a table like
CREATE TABLE tblOnlineUsers(
userid integer REFERENCES tbluser.userid,
lastActive timestamp
)

2) Whenever a logged-in user does something, update the timestamp in this
table for that userid.

3) When you are on a page that needs to know the status of online users,
just load it into a array with a query like:
SELECT userid FROM tblOnlineUsers WHERE (
lastActive XXX
)

Where XXX is (now - your sessiontimeout).

Now you need some logic to delete the userids for stale users.
(Users that didn't log out, but just closed their browser or walked away to
get a nice good lunch of three hours)
eg
When a random number between 0 and 1 is smaller than 0.1 you delete all
records older than the sessiontimeout.

You also need to make sure the userid in that table is unique.

But this solution has the advantage that:
- You don't load all users into memory for each scriptinvocation
- Your tblOnlineUsers can be searched very quickly.

Hope that helps.

Regards,
Erwin Moller

Nov 23 '06 #4

Sandman

In article <45*********************@news.xs4all.nl>,
Erwin Moller
<si******************************************@spam yourself.com>
wrote:

First: Prefetching all members into memory for each page where you need one
or two members is absolutely a no-go. You must change that of course.
Just fetch the information you need WHEN you need it.

Yes, but if I need to fetch information for 100 members on one page
making 100 individual SQL queries is also a no-go.

If I understand correctly your main problem lies in the fact you must fetch
the online/offline information for each possible user.
Correct?

No, there are lots of other information also being fetched. The
online/offline status is kept in a seperate db and isn't a problem at
all actually. Nicknames, labels, ages, full name and such things are
what is needed. All of that is in the member DB

This is how I solved this problem when I was in a similar situation:
(a rough quick overview to show the idea)

1) Create a table like
CREATE TABLE tblOnlineUsers(
userid integer REFERENCES tbluser.userid,
lastActive timestamp
)

2) Whenever a logged-in user does something, update the timestamp in this
table for that userid.

That's exactly what I'm doing.

3) When you are on a page that needs to know the status of online users,
just load it into a array with a query like:
SELECT userid FROM tblOnlineUsers WHERE (
lastActive XXX
)

Where XXX is (now - your sessiontimeout).

Now you need some logic to delete the userids for stale users.

I do that in my five minute aggregation script (that aggregates lots
of stuff for the site). There I delete from member.surfers where
datetime < date_sub(now(), interval 10 minute)

(Users that didn't log out, but just closed their browser or walked away to
get a nice good lunch of three hours)
eg
When a random number between 0 and 1 is smaller than 0.1 you delete all
records older than the sessiontimeout.

You also need to make sure the userid in that table is unique.

But this solution has the advantage that:
- You don't load all users into memory for each scriptinvocation
- Your tblOnlineUsers can be searched very quickly.

Yeah, I'm already doing that. I still need some way to get information
about arbitrary user X whenever I need it. One way would be to just
print "[[MEMBERINFO:123]]" and then save "123" in an array, fetch
information about all id's in that array and then preg_replace() them
all in the output buffer. But that's not very intelligent either,
since a SQL query with "select x, x, x from member where id in(<list
of 100's of ids>)" isn't very good.

So, basically, I need a function to get information about user X from
a member count of tens of thousands while not being time consuming if
I do it one time on a page or if I do it 100 times on a page.

--
Sandman[.net]

Nov 23 '06 #5

Jerry Stuckle

Sandman wrote:

In article <Ef******************************@comcast.com>,
Jerry Stuckle <js*******@attglobal.netwrote:

>><snip>

>>>## index.php
require_once("member.php");
print member_name(12);

## member.php
$q=mysql_query("select id, name, label from members");
while ($r=mysql_fetch_assoc($q)){
$GLOBALS["members"][$r["id"]] = $r;
}

function member_name($id){
$m = $GLOBALS["members"][$id];
$online = in_array($id, $GLOBALS["surfers"]) ? "on" : "off";
return icon("online/$online") . " $m[name] $m[label]";
}

##

<snip>

Looks like you need as SQL database.

You may have to change your logic a little - I don't know enough of the
detaisl. But it's the right way to go.

You can implement a small, lightly loaded site like this in flat files,
as you're doing. But there comes a time where that is no longer
feasible, and it's time to start using a database of some sort - that's
what they were made for. And a SQL RDB makes a lot of sense.

Eh? I *AM* using MySQL. As shown in my code above. Using MySQL for
this takes too long.

OK, I thought you were saying the information you're after was not on
the database. My misunderstand.

What's "too slow"? 1 second? 10 seconds? 100 seconds? How many rows
are you talking about?

Sure, you can implement this in flat files. But it's going to take
longer to search your flat files in PHP than MySQL does in compiled
code. The same is pretty much true for searching an array in PHP -
small arrays are faster, but as the array grows, the difference becomes
less.

You'd be much better off optimizing your database (and the database
design, if necessary), and potentially your queries. I can't see why it
should take all those queries to get the info you want.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================

Nov 23 '06 #6

Jerry Stuckle

Sandman wrote:

In article <45*********************@news.xs4all.nl>,
Erwin Moller
<si******************************************@spam yourself.com>
wrote:

>>First: Prefetching all members into memory for each page where you need one
or two members is absolutely a no-go. You must change that of course.
Just fetch the information you need WHEN you need it.

Yes, but if I need to fetch information for 100 members on one page
making 100 individual SQL queries is also a no-go.

This is the part I don't understand. I don't make 100 individual SQL
queries.

>
>>If I understand correctly your main problem lies in the fact you must fetch
the online/offline information for each possible user.
Correct?

No, there are lots of other information also being fetched. The
online/offline status is kept in a seperate db and isn't a problem at
all actually. Nicknames, labels, ages, full name and such things are
what is needed. All of that is in the member DB

OK, that's the problem. They should be in the same DB, but perhaps in
different tables. Then a JOIN should get you everything you want in one
query.

>
>>This is how I solved this problem when I was in a similar situation:
(a rough quick overview to show the idea)

1) Create a table like
CREATE TABLE tblOnlineUsers(
userid integer REFERENCES tbluser.userid,
lastActive timestamp
)

2) Whenever a logged-in user does something, update the timestamp in this
table for that userid.

That's exactly what I'm doing.

>>3) When you are on a page that needs to know the status of online users,
just load it into a array with a query like:
SELECT userid FROM tblOnlineUsers WHERE (
lastActive XXX
)

Where XXX is (now - your sessiontimeout).

Now you need some logic to delete the userids for stale users.

I do that in my five minute aggregation script (that aggregates lots
of stuff for the site). There I delete from member.surfers where
datetime < date_sub(now(), interval 10 minute)

>>(Users that didn't log out, but just closed their browser or walked away to
get a nice good lunch of three hours)
eg
When a random number between 0 and 1 is smaller than 0.1 you delete all
records older than the sessiontimeout.

You also need to make sure the userid in that table is unique.

But this solution has the advantage that:
- You don't load all users into memory for each scriptinvocation
- Your tblOnlineUsers can be searched very quickly.

Yeah, I'm already doing that. I still need some way to get information
about arbitrary user X whenever I need it. One way would be to just
print "[[MEMBERINFO:123]]" and then save "123" in an array, fetch
information about all id's in that array and then preg_replace() them
all in the output buffer. But that's not very intelligent either,
since a SQL query with "select x, x, x from member where id in(<list
of 100's of ids>)" isn't very good.

So, basically, I need a function to get information about user X from
a member count of tens of thousands while not being time consuming if
I do it one time on a page or if I do it 100 times on a page.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================

Nov 23 '06 #7

Sandman

In article <45**************@attglobal.net>,
Jerry Stuckle <js*******@attglobal.netwrote:

>First: Prefetching all members into memory for each page where you need one
or two members is absolutely a no-go. You must change that of course.
Just fetch the information you need WHEN you need it.
Yes, but if I need to fetch information for 100 members on one page
making 100 individual SQL queries is also a no-go.

This is the part I don't understand. I don't make 100 individual SQL
queries.

I didn't say you did. For my problem there seems to be two obvious
solutions:

1. Fetch all information for every page (today this takes 0.8 seconds
generally, which is far too long).

2. Fetch information once per instance it is needed
1 is still the best way, performance-wise, to solve a page with 100
instances of members, while 2 works best when there is maybe 1-5
instances.

>If I understand correctly your main problem lies in the fact you must fetch
the online/offline information for each possible user.
Correct?
No, there are lots of other information also being fetched. The
online/offline status is kept in a seperate db and isn't a problem at
all actually. Nicknames, labels, ages, full name and such things are
what is needed. All of that is in the member DB

OK, that's the problem. They should be in the same DB, but perhaps in
different tables. Then a JOIN should get you everything you want in one
query.

(they are in the same db but in another table, I misspoke)
But that query would still have to fetch all thousands of members at
one go.

Again - online/offline status is *not* the reason I need this. I have
to be able to display any part of the static information about any
arbitrary member at any given time in my pages.

>But this solution has the advantage that:
- You don't load all users into memory for each scriptinvocation
- Your tblOnlineUsers can be searched very quickly.

Yeah, I'm already doing that. I still need some way to get information
about arbitrary user X whenever I need it. One way would be to just
print "[[MEMBERINFO:123]]" and then save "123" in an array, fetch
information about all id's in that array and then preg_replace() them
all in the output buffer. But that's not very intelligent either,
since a SQL query with "select x, x, x from member where id in(<list
of 100's of ids>)" isn't very good.

So, basically, I need a function to get information about user X from
a member count of tens of thousands while not being time consuming if
I do it one time on a page or if I do it 100 times on a page.

--
Sandman[.net]

Nov 23 '06 #8

Sandman

In article <qa******************************@comcast.com>,
Jerry Stuckle <js*******@attglobal.netwrote:

Eh? I *AM* using MySQL. As shown in my code above. Using MySQL for
this takes too long.

OK, I thought you were saying the information you're after was not on
the database. My misunderstand.

No problem.

What's "too slow"? 1 second? 10 seconds? 100 seconds? How many rows
are you talking about?

Everything over 0.1 seconds is way too slow. Today the tables contains
about 9000 members, and the time it takes to fetch information about
them takes around 0.8 seconds, but actually *selecting* them in MySQL
only takes about 0.16 seconds, the rest of the time, I'm assuming, is
moving that data from MySQL to a PHP array.

I assure you that I am quite conscious about time consumptions in my
PHP scripts and use a gauge() function to time pretty much every
important part of producing the HTML for a web page. Each millisecond
counts and I have a performance warning system that activates when a
page takes longer than 2 seconds to create.

All my MySQL tables are quite optimized along the usual optimization
tricks with regards to indexing and direct/matching queries.

Sure, you can implement this in flat files. But it's going to take
longer to search your flat files in PHP than MySQL does in compiled
code. The same is pretty much true for searching an array in PHP -
small arrays are faster, but as the array grows, the difference becomes
less.

But I wouldn't be searching. The flatfile would be named
"/path/to/aggregate/members/2837.txt" or something like that.

The data in that could even be serialized for easy management, so
reading it would be like:

function member_name($id){
$member = unserialize(join("", file("/path/to//2837.txt")));
return $member["name"];
}

Problem with this is that this information needs to be aggregated once
in a while...

Plus, what I'd like to know is if:

$member = unserialize(join("", file("/path/to//2837.txt")));

Is faster than:

$q=mysql_query("select x, x, x from member where id = 2837");
$member = mysql_fetch_array($q);

I guess I have to test it. I was hoping that someone here might have
known or knew of a third option for doing the same thing.

You'd be much better off optimizing your database (and the database
design, if necessary), and potentially your queries. I can't see why it
should take all those queries to get the info you want.

It's just one query... And that query takes, today, 0.8 seconds, which
is too long. And when the member count grows, it will take longer and
longer.

--
Sandman[.net]

Nov 23 '06 #9

Michael Fesser

..oO(Sandman)

>In article <45*********************@news.xs4all.nl>,
Erwin Moller
<si******************************************@spam yourself.com>
wrote:

>First: Prefetching all members into memory for each page where you need one
or two members is absolutely a no-go. You must change that of course.
Just fetch the information you need WHEN you need it.

Yes, but if I need to fetch information for 100 members on one page
making 100 individual SQL queries is also a no-go.

If you have to show informations about 100 members on a page then your
script should already know which members that are. If you want to show
some blog articles then you know who wrote them, so you can fetch all
the member informations with a single query, maybe even together with
the blog articles.

>[...] But that's not very intelligent either,
since a SQL query with "select x, x, x from member where id in(<list
of 100's of ids>)" isn't very good.

What's wrong with that?

>So, basically, I need a function to get information about user X from
a member count of tens of thousands while not being time consuming if
I do it one time on a page or if I do it 100 times on a page.

Is your database indexed properly? A database with some ten thousand
entries is peanuts. It shouldn't take much time to fetch a hundred
records from it.

Micha

Nov 23 '06 #10

Erwin Moller

Sandman wrote:

In article <45*********************@news.xs4all.nl>,
Erwin Moller
<si******************************************@spam yourself.com>
wrote:

>First: Prefetching all members into memory for each page where you need
one or two members is absolutely a no-go. You must change that of course.
Just fetch the information you need WHEN you need it.

Yes, but if I need to fetch information for 100 members on one page
making 100 individual SQL queries is also a no-go.

Hi Sandman,

As Jerry and Micha indicated: Do not fetch all members into PHP memory each
page, just fetch the ones you need.
I made a few community sites that had excactly the same problems as you are
facing now, and good design solves everything.
(Luckily I designed it right from the start, in your case you must change
existing code which is always a lot more trouble.)

A few pointers that might speed things up:
1) At the start of your page decide which articles or user you need on that
page.
2) Make a query that only fetches that information.
3) Do NOT store the array (or resultset, or whatever approach you have to
represent data from the database) in a session.
Reason: I have the impression this datastructure is huge. If you are using
filebased sessionstorage (which is default), you are asking PHP to
serialize that data, safe it in a file, and unpack it next invocation of
any script that uses sessions.
This is serious overhead and could easily be a significant part of your 0.8
seconds.
4) Design your database smart. I mean by this: Design your database in such
a way that the queries you typically need run fast.
Did you normalize your database? (In case you don't know what that is: One
important thing about normalization is the fact that you represent each
piece of data only in one place(=table/column) and make the relations by
means of primary and foreign keys.)

The more joins, the slower the queries.
SO it might help to duplicate some data (denormalization) in some tables.
This can help a lot, but use it cautiously, since a good designed database
stay as close to 100% nomalized as possible.

5) Indexing a few columns might also give a huge speedadvantage.
If you are familiar with profilers, use one. MySQL, Postgres, all databases
have some tools that tell you WHAT they are doing and how much time it
takes.
If you find out that a certain column/tablescan takes up 50% of the time or
so of the total querytime, then you know that is a firstclass candidate for
indexing.

6) Prepared statements can also give some speedincrease.

7) Prepared data in the database (Last resort)
Do not use this method unless the others do not help enough.
I don't like this method, but was forced to use it myself in a few
occasions.
I'll describe a case I had and how I solved it.
It is a huge discussionplatform, and all participants had the possibility to
vote on articles of others. The weight their vote carried varried on a lot
of factors and the logic was also totally recursive, meaning that if one
visitor did one thing (eg read an article, or make a vote), all others had
to be updated.
So realtime calculation of the whole system was no option for me.
I decided to create a table that contained the calculated data, use that
data whenever I need it, and run an update every hour or so.

This method is not considered 'good programming' (at least by me), but gets
the job done, and in some situation you have no choice.
If all other approaches fail, you might do the same: create a table that
contains all data assembled excactly in the way you need it. NOT A VIEW! A
real table.

Hope that helps.

Regards,
Erwin Moller

>
>If I understand correctly your main problem lies in the fact you must
fetch the online/offline information for each possible user.
Correct?

No, there are lots of other information also being fetched. The
online/offline status is kept in a seperate db and isn't a problem at
all actually. Nicknames, labels, ages, full name and such things are
what is needed. All of that is in the member DB

>This is how I solved this problem when I was in a similar situation:
(a rough quick overview to show the idea)

1) Create a table like
CREATE TABLE tblOnlineUsers(
userid integer REFERENCES tbluser.userid,
lastActive timestamp
)

2) Whenever a logged-in user does something, update the timestamp in this
table for that userid.

That's exactly what I'm doing.

>3) When you are on a page that needs to know the status of online users,
just load it into a array with a query like:
SELECT userid FROM tblOnlineUsers WHERE (
lastActive XXX
)

Where XXX is (now - your sessiontimeout).

Now you need some logic to delete the userids for stale users.

I do that in my five minute aggregation script (that aggregates lots
of stuff for the site). There I delete from member.surfers where
datetime < date_sub(now(), interval 10 minute)

>(Users that didn't log out, but just closed their browser or walked away
to get a nice good lunch of three hours)
eg
When a random number between 0 and 1 is smaller than 0.1 you delete all
records older than the sessiontimeout.

You also need to make sure the userid in that table is unique.

But this solution has the advantage that:
- You don't load all users into memory for each scriptinvocation
- Your tblOnlineUsers can be searched very quickly.

Yeah, I'm already doing that. I still need some way to get information
about arbitrary user X whenever I need it. One way would be to just
print "[[MEMBERINFO:123]]" and then save "123" in an array, fetch
information about all id's in that array and then preg_replace() them
all in the output buffer. But that's not very intelligent either,
since a SQL query with "select x, x, x from member where id in(<list
of 100's of ids>)" isn't very good.

So, basically, I need a function to get information about user X from
a member count of tens of thousands while not being time consuming if
I do it one time on a page or if I do it 100 times on a page.

Nov 24 '06 #11

Sandman

In article <ra********************************@4ax.com>,
Michael Fesser <ne*****@gmx.dewrote:

>>First: Prefetching all members into memory for each page where you
need

one or two members is absolutely a no-go. You must change that of
course. Just fetch the information you need WHEN you need it.

Yes, but if I need to fetch information for 100 members on one page
making 100 individual SQL queries is also a no-go.

If you have to show informations about 100 members on a page then your
script should already know which members that are. If you want to show
some blog articles then you know who wrote them, so you can fetch all
the member informations with a single query, maybe even together with
the blog articles.

No, not really. For instance, forum posts can be made by a member or a
visitor (depending if the forum is open or not). So making a joined
select from the forum DB and the member DB wouldn't select posts that
aren't associated with a member.

Plus, just tagging DB posts with the member ID associated with it is
just very easy. It's when I need to show who belongs to that ID that
problems arise.

[...] But that's not very intelligent either,
since a SQL query with "select x, x, x from member where id in(<list
of 100's of ids>)" isn't very good.

What's wrong with that?

It's a slow select. Try it.

So, basically, I need a function to get information about user X from
a member count of tens of thousands while not being time consuming if
I do it one time on a page or if I do it 100 times on a page.

Is your database indexed properly?

Yep

A database with some ten thousand entries is peanuts. It shouldn't
take much time to fetch a hundred records from it.

It doesn't. Maybe you haven't been reading from the start...

How I do it today:

1. Prefetch information about ALL members (9000 entries)
+ all information always available
- Very time consuming. Somehwere around 0.8 seconds

2. When something is associated by member id, use a function
that references the array created in 1. Done.

Now, 1 is very time consuming and fetches lots of information that
isn't needed on every page, which is dumb. But when 2 arrives, how do
I fetch the information about arbitrary member id's at any given time
without spending 0.8 seconds prefetching them all? it should be
efficient when I want information about one member per page or 300
members per page.

--
Sandman[.net]

Nov 24 '06 #12

Sandman

In article <45*********************@news.xs4all.nl>,
Erwin Moller
<si******************************************@spam yourself.com>
wrote:

First: Prefetching all members into memory for each page where you need
one or two members is absolutely a no-go. You must change that of course.
Just fetch the information you need WHEN you need it.
Yes, but if I need to fetch information for 100 members on one page
making 100 individual SQL queries is also a no-go.

Hi Sandman,

As Jerry and Micha indicated: Do not fetch all members into PHP memory each
page, just fetch the ones you need.

Right, but how?

I made a few community sites that had excactly the same problems as you are
facing now, and good design solves everything.

As usual :)

(Luckily I designed it right from the start, in your case you must change
existing code which is always a lot more trouble.)

A few pointers that might speed things up:
1) At the start of your page decide which articles or user you need on that
page.

That's impossible to know beforehand. Articles listed may or may not
contain references to members. Any given page might contain references
to member ids from 30 different sources in 30 different tables. I
can't know, at the beginning of any page, exactly what member
information will be displayed on that page.

2) Make a query that only fetches that information.

Which is good if you're looking at fetching member information about
1-50 members, but if you've got member information about 500 members
displayed on one page, that select would be dog slow.

3) Do NOT store the array (or resultset, or whatever approach you have to
represent data from the database) in a session.

As in $_SESSION? No, no way. :) I don't even use sessions. :)

4) Design your database smart. I mean by this: Design your database in such
a way that the queries you typically need run fast.

They do, they run very fast. The query runs in 0.16 seconds, the
action takes 0.8 seconds (and I'm assuming the extra time is moving
the resultset to the array).

Did you normalize your database? (In case you don't know what that is: One
important thing about normalization is the fact that you represent each
piece of data only in one place(=table/column) and make the relations by
means of primary and foreign keys.)

Yes.

The more joins, the slower the queries.

No joins in this select. It's a very simple select and looks pretty
much like this:

select field, field, field from member.member where status = 'ok' and
origin = 'LVL';

"LVL" being the code for the particular site. "origin" is indexed. The
above query, when fetching 9000 posts, takes 0.16 seconds. That I can
live with it. It's moving that information to a PHP array that seems
to take too long.

SO it might help to duplicate some data (denormalization) in some tables.
This can help a lot, but use it cautiously, since a good designed database
stay as close to 100% nomalized as possible.

Yes, the data I want is all in seperate "field, field, field".

5) Indexing a few columns might also give a huge speedadvantage.

In the query above, the only column that needs to be indexed is status
and origin, and both are.

If you are familiar with profilers, use one. MySQL, Postgres, all databases
have some tools that tell you WHAT they are doing and how much time it
takes.

Yeah, I know exactly how much time they use for these queries. And
it's too long :-D

If you find out that a certain column/tablescan takes up 50% of the time or
so of the total querytime, then you know that is a firstclass candidate for
indexing.

Exactly. These are all good advices, but I'm afraid that my main
problem is the time it takes to create the PHP array.

6) Prepared statements can also give some speedincrease.

Yeah, I've never actually used them, and wouldn't you agree that that
wouldn't help me?

7) Prepared data in the database (Last resort)
Do not use this method unless the others do not help enough.
I don't like this method, but was forced to use it myself in a few
occasions.
I'll describe a case I had and how I solved it.
It is a huge discussionplatform, and all participants had the possibility to
vote on articles of others. The weight their vote carried varried on a lot
of factors and the logic was also totally recursive, meaning that if one
visitor did one thing (eg read an article, or make a vote), all others had
to be updated.
So realtime calculation of the whole system was no option for me.
I decided to create a table that contained the calculated data, use that
data whenever I need it, and run an update every hour or so.

Yes, I do aggregation a lot in five minute intervals (such as labels,
scores and such). But that's not a part of fetching this information.

This method is not considered 'good programming' (at least by me), but gets
the job done, and in some situation you have no choice.

Indeed. I have a "What's new?" function that lists new content on the
site, from several tables (articles, calendar, project rooms, forum,
and so on). That data hgas to be aggregated into a seperate table for
speed.
Thanks for the post, very good advice, but I'm afraid that it won't
help much for me.
--
Sandman[.net]

Nov 24 '06 #13

Erwin Moller

Sandman wrote:

In article <45*********************@news.xs4all.nl>,
Erwin Moller
<si******************************************@spam yourself.com>
wrote:

>First: Prefetching all members into memory for each page where you
need one or two members is absolutely a no-go. You must change that of
course. Just fetch the information you need WHEN you need it.

Yes, but if I need to fetch information for 100 members on one page
making 100 individual SQL queries is also a no-go.

Hi Sandman,

As Jerry and Micha indicated: Do not fetch all members into PHP memory
each page, just fetch the ones you need.

Right, but how?

Well, it is your platform, how can I tell? ;-)
Can you not find out beforehand what you need?
eg: Who wrote in this topic?
+ to who do they refer?
+ anything else that show up on the page.

I am quite sure that you can tell that beforehand, simply because you do
produce the output eventually.
But it is possible that doing this takes up more time. :-/
Hard to tell from this distance.

<snip>

>
That's impossible to know beforehand. Articles listed may or may not
contain references to members. Any given page might contain references
to member ids from 30 different sources in 30 different tables. I
can't know, at the beginning of any page, exactly what member
information will be displayed on that page.

See my former comment.

I stress this point because you claim that the query itself takes only 0.16
secs.
That means that most time is 'lost' by PHP importing this resultset.
If you can drill down the results to 1% by first finding out WHAT you
need......

>
>3) Do NOT store the array (or resultset, or whatever approach you have to
represent data from the database) in a session.

As in $_SESSION? No, no way. :) I don't even use sessions. :)

Ok good, I thought I better mention it. :-)

>
>4) Design your database smart. I mean by this: Design your database in
such a way that the queries you typically need run fast.

They do, they run very fast. The query runs in 0.16 seconds, the
action takes 0.8 seconds (and I'm assuming the extra time is moving
the resultset to the array).

Is the 0.8 secs the time needed to produce all the HTML also, or is the 0.8
secs the time needed to read the resultset into PHP's memory?

>
>Did you normalize your database? (In case you don't know what that is:
One important thing about normalization is the fact that you represent
each piece of data only in one place(=table/column) and make the
relations by means of primary and foreign keys.)

Yes.

>The more joins, the slower the queries.

No joins in this select. It's a very simple select and looks pretty
much like this:

select field, field, field from member.member where status = 'ok' and
origin = 'LVL';

"LVL" being the code for the particular site. "origin" is indexed. The
above query, when fetching 9000 posts, takes 0.16 seconds. That I can
live with it. It's moving that information to a PHP array that seems
to take too long.

>SO it might help to duplicate some data (denormalization) in some tables.
This can help a lot, but use it cautiously, since a good designed
database stay as close to 100% nomalized as possible.

Yes, the data I want is all in seperate "field, field, field".

Clear.
Just to be sure, you should of course NOT running seperate queries for each
result you need.
eg BAD:
SELECT username FROM tbluser WHERE (userid=23);
SELECT username FROM tbluser WHERE (userid=36);
SELECT username FROM tbluser WHERE (userid=123);

eg GOOD:
SELECT userid, username FROM tbluser WHERE
(userid IN (23,36,123));

Right now you are fetching them all, so in that case this is irrelevant, but
if you can switch your code in such a way you do know beforehand what you
need, it is important.

Yeah, I've never actually used them, and wouldn't you agree that that
wouldn't help me?

Correct, it would only reduce the 0.16 secs you talked about. Not the (0.8 -
0.16) secs.

>
>7) Prepared data in the database (Last resort)
Do not use this method unless the others do not help enough.
I don't like this method, but was forced to use it myself in a few
occasions.
I'll describe a case I had and how I solved it.
It is a huge discussionplatform, and all participants had the possibility
to vote on articles of others. The weight their vote carried varried on a
lot of factors and the logic was also totally recursive, meaning that if
one visitor did one thing (eg read an article, or make a vote), all
others had to be updated.
So realtime calculation of the whole system was no option for me.
I decided to create a table that contained the calculated data, use that
data whenever I need it, and run an update every hour or so.

Yes, I do aggregation a lot in five minute intervals (such as labels,
scores and such). But that's not a part of fetching this information.

>This method is not considered 'good programming' (at least by me), but
gets the job done, and in some situation you have no choice.

Indeed. I have a "What's new?" function that lists new content on the
site, from several tables (articles, calendar, project rooms, forum,
and so on). That data hgas to be aggregated into a seperate table for
speed.
Thanks for the post, very good advice, but I'm afraid that it won't
help much for me.

Well, I try. :-)

Regards,
Erwin Moller

Nov 24 '06 #14

Erwin Moller

Hi Sandman,

One last idea: stored procedures.
I don't know if mySQL has support for them, but they can speed things up
hugely. (I am a Postgres-guy)

You can move the logic from PHP to the stored procedure.
The SP will fetch the needed info from the tables based on what it found in
earlier queries (in the same SP).

I don't like SP because they hide the logic away from the language to the
database, are hard to debug, are hard to update, but they ARE fast.

The advantage is also that you do not have to place/parse/check the
recordset from MySQL memoryspace to PHP memoryspace.

The SP will simply deliver what PHP needs to know.

I once made a SP excactly for this reason, it delivered ready-to-go HTML and
removed all the logic from the language (Java in this case) to Postgres,
but then 3 years later I had to update it.... Really hard.
I avoid SP as hell since then. Maybe people with more experience than I have
with SP can do better jobs.

But I am quite sure SP CAN solve your speedproblem.

Regards,
Erwin Moller

Nov 24 '06 #15

Sandman

In article <45*********************@news.xs4all.nl>,
Erwin Moller
<si******************************************@spam yourself.com>
wrote:

As Jerry and Micha indicated: Do not fetch all members into PHP memory
each page, just fetch the ones you need.
Right, but how?

Well, it is your platform, how can I tell? ;-)
Can you not find out beforehand what you need?
eg: Who wrote in this topic?
+ to who do they refer?
+ anything else that show up on the page.

I am quite sure that you can tell that beforehand, simply because you do
produce the output eventually.
But it is possible that doing this takes up more time. :-/
Hard to tell from this distance.

Yeah, that's the problem... Any given URL may consist of 20-30 PHP
scripts running, producing different form of output.

Just listing the last forum entries may be a simple enough task, but I
have different function from different files to prit images, member
info and such.

What I'm trying to say is that it's hard to say, when a new page
"starts" to know what member info will be presented when it has
finnished.

That's impossible to know beforehand. Articles listed may or may not
contain references to members. Any given page might contain references
to member ids from 30 different sources in 30 different tables. I
can't know, at the beginning of any page, exactly what member
information will be displayed on that page.

See my former comment.

I stress this point because you claim that the query itself takes only 0.16
secs.
That means that most time is 'lost' by PHP importing this resultset.
If you can drill down the results to 1% by first finding out WHAT you
need......

I realize the benefit of that, but it can't be done *beforehand*.

What *can* be done is doing it *after*. I.e. when presented with a
member id, the member_name() function only outputs a placeholder for
the information, like so:

Post from [[MEMBERINFO::1234]] on 2006-11-23:

And then, when processing the output buffer, I replace all these with
the proper information, and only fetch what member_name() has saved in
$GLOBALS["displayed_members"] or something like that.

Problem with THAT is that the page may contain one hundred, or maybe
five hundred such placeholders, and selecting something with "and id
in(<list of 500 id numbers>)" takes a lot of time (a lot more than
0.16 seconds). And then preg_replacing() them after selecting them
also takes some time.

I also have some pages that reference the prefetched array directly,
but those have to be edited either way, since the prefectched array
shouldn't exist.

4) Design your database smart. I mean by this: Design your database in
such a way that the queries you typically need run fast.
They do, they run very fast. The query runs in 0.16 seconds, the
action takes 0.8 seconds (and I'm assuming the extra time is moving
the resultset to the array).

Is the 0.8 secs the time needed to produce all the HTML also, or is the 0.8
secs the time needed to read the resultset into PHP's memory?

The 0.8 seconds are for the mysql_query() and mysql_fetch_array().
I.e. the query below

The more joins, the slower the queries.
No joins in this select. It's a very simple select and looks pretty
much like this:

select field, field, field from member.member where status = 'ok' and
origin = 'LVL';

"LVL" being the code for the particular site. "origin" is indexed. The
above query, when fetching 9000 posts, takes 0.16 seconds. That I can
live with it. It's moving that information to a PHP array that seems
to take too long.

SO it might help to duplicate some data (denormalization) in some tables.
This can help a lot, but use it cautiously, since a good designed
database stay as close to 100% nomalized as possible.
Yes, the data I want is all in seperate "field, field, field".

Clear.
Just to be sure, you should of course NOT running seperate queries for each
result you need.
eg BAD:
SELECT username FROM tbluser WHERE (userid=23);
SELECT username FROM tbluser WHERE (userid=36);
SELECT username FROM tbluser WHERE (userid=123);

No, that's stupid.

eg GOOD:
SELECT userid, username FROM tbluser WHERE
(userid IN (23,36,123));

But BAD:
SELECT userid, username FROM tbluser WHERE
(userid IN (<500 id numbers>));

Yeah, I've never actually used them, and wouldn't you agree that that
wouldn't help me?

Correct, it would only reduce the 0.16 secs you talked about. Not the (0.8 -
0.16) secs.

Exactly. And 0.16 is ok, 0.8 is not.
--
Sandman[.net]

Nov 24 '06 #16

Sandman

In article <45*********************@news.xs4all.nl>,
Erwin Moller
<si******************************************@spam yourself.com>
wrote:

Hi Sandman,

One last idea: stored procedures.
I don't know if mySQL has support for them, but they can speed things up
hugely. (I am a Postgres-guy)

You can move the logic from PHP to the stored procedure.
The SP will fetch the needed info from the tables based on what it found in
earlier queries (in the same SP).

I don't like SP because they hide the logic away from the language to the
database, are hard to debug, are hard to update, but they ARE fast.

The advantage is also that you do not have to place/parse/check the
recordset from MySQL memoryspace to PHP memoryspace.

The SP will simply deliver what PHP needs to know.

Ok, you lost me. How does it do that? How can a SP store information
about arbitrary member id and give it to PHP at will?

I once made a SP excactly for this reason, it delivered ready-to-go HTML and
removed all the logic from the language (Java in this case) to Postgres,
but then 3 years later I had to update it.... Really hard.
I avoid SP as hell since then. Maybe people with more experience than I have
with SP can do better jobs.

But I am quite sure SP CAN solve your speedproblem.

Ok, I'd be happy to hear just how it works before nosediving into some
docs about it, to see if we understand each other correctly.
--
Sandman[.net]

Nov 24 '06 #17

Erwin Moller

Sandman wrote:

In article <45*********************@news.xs4all.nl>,
Erwin Moller
<si******************************************@spam yourself.com>
wrote:

>Hi Sandman,

One last idea: stored procedures.
I don't know if mySQL has support for them, but they can speed things up
hugely. (I am a Postgres-guy)

You can move the logic from PHP to the stored procedure.
The SP will fetch the needed info from the tables based on what it found
in earlier queries (in the same SP).

I don't like SP because they hide the logic away from the language to the
database, are hard to debug, are hard to update, but they ARE fast.

The advantage is also that you do not have to place/parse/check the
recordset from MySQL memoryspace to PHP memoryspace.

The SP will simply deliver what PHP needs to know.

Ok, you lost me. How does it do that? How can a SP store information
about arbitrary member id and give it to PHP at will?

>I once made a SP excactly for this reason, it delivered ready-to-go HTML
and removed all the logic from the language (Java in this case) to
Postgres, but then 3 years later I had to update it.... Really hard.
I avoid SP as hell since then. Maybe people with more experience than I
have with SP can do better jobs.

But I am quite sure SP CAN solve your speedproblem.

Ok, I'd be happy to hear just how it works before nosediving into some
docs about it, to see if we understand each other correctly.

Hey Sandman,

A Stored procedure is a piece of code that is run by the database itself (in
the same memoryspace as the database too).
You can use decisions (if/then/else), loop over resultsets from former
queries, etc.
So it is a programming language on its own INSIDE the database.
And fast as hell since no overhead is needed (eg pumping resultsets through
a pipe from the database process to php process).

I am not sure if that will help in your current situation, but this is the
idea:
- You write a stored procedure that will actually fetch all needed tuples
based on the same information your PHP receives (eg, userid, topicid, etc).
- This SP will do the same things your script would do (except of course
produce HTML) and you remember the needed info inside the SP.
- When the SP is finished it returns the resultset, which you use to load
above your script the same way as you do now, but now you only receive the
needed info.

For this setup to work you must of course be able to know what your script
does, so you can program that into the SP.
It will be double work (once in the SP, and once again in your scripts) but
it will remove the massive databaseresults import you have now.

I cannot judge if this will help you.
But it is a possibility to put some time into I think, maybe when studying
SP you see a few other shortcuts to reduce your resultset.

If you need advise on SP with MySQL, I cannot help.(never looked into it)

Good luck.

Regards,
Erwin Moller

Nov 24 '06 #18

Michael Fesser

..oO(Sandman)

>Clear.
Just to be sure, you should of course NOT running seperate queries for each
result you need.
eg BAD:
SELECT username FROM tbluser WHERE (userid=23);
SELECT username FROM tbluser WHERE (userid=36);
SELECT username FROM tbluser WHERE (userid=123);

No, that's stupid.

>eg GOOD:
SELECT userid, username FROM tbluser WHERE
(userid IN (23,36,123));

But BAD:
SELECT userid, username FROM tbluser WHERE
(userid IN (<500 id numbers>));

I did a little test with a single table containing 100000(!) entries,
all with an ID and a random name. The task was to select 1000 randomly
chosen records. I did two tests:

1) a foreach loop, running 1000 queries
time: 0.90s

2) a single query with an IN operator and 1000 given IDs
time: 0.04s

Tested on a P3 550MHz/192MB RAM, MySQL 4.1.11

Micha

Nov 24 '06 #19

Michael Fesser

..oO(Sandman)

>What I'm trying to say is that it's hard to say, when a new page
"starts" to know what member info will be presented when it has
finnished.

OK, maybe it's not possible for the entire page, but at least for some
main parts like an article listing, where a whole bunch of member
informations is required.

>I realize the benefit of that, but it can't be done *beforehand*.

What *can* be done is doing it *after*. I.e. when presented with a
member id, the member_name() function only outputs a placeholder for
the information, like so:

Post from [[MEMBERINFO::1234]] on 2006-11-23:

When you fetch articles from the database, you know who wrote them. It
doesn't really matter if it's a member or a visitor.

Actually I can't imagine that it should not be possible to fetch the
articles, get the member IDs from them (if available) and then fetch the
member informations. Would take just 2 or 3 queries.

>And then, when processing the output buffer, I replace all these with
the proper information, and only fetch what member_name() has saved in
$GLOBALS["displayed_members"] or something like that.

Problem with THAT is that the page may contain one hundred, or maybe
five hundred such placeholders, and selecting something with "and id
in(<list of 500 id numbers>)" takes a lot of time (a lot more than
0.16 seconds).

Did you really test that?

>And then preg_replacing() them after selecting them
also takes some time.

str_replace() would be enough in this case, but of course it would still
require some time.

Micha

Nov 24 '06 #20

Sandman

In article <q9********************************@4ax.com>,
Michael Fesser <ne*****@gmx.dewrote:

What I'm trying to say is that it's hard to say, when a new page
"starts" to know what member info will be presented when it has
finnished.

OK, maybe it's not possible for the entire page, but at least for some
main parts like an article listing, where a whole bunch of member
informations is required.

Yes, I could - potentially, prefetch that list and then fetch member
info and then print the list. Sure. BUt there needs to be onlyh one
instance of member information being printed outside of this scope to
throw this off as a *solution* to my problem.

I realize the benefit of that, but it can't be done *beforehand*.

What *can* be done is doing it *after*. I.e. when presented with a
member id, the member_name() function only outputs a placeholder for
the information, like so:

Post from [[MEMBERINFO::1234]] on 2006-11-23:

When you fetch articles from the database, you know who wrote them. It
doesn't really matter if it's a member or a visitor.

Right, which means I can't do a "select from articles,members
wherearticles.member = member.id..." select since that would leave out
all article where member is null, i.e. written by a visitor.

Actually I can't imagine that it should not be possible to fetch the
articles, get the member IDs from them (if available) and then fetch the
member informations. Would take just 2 or 3 queries.

Right, I agree with you - that can be done - for that particular case.
I feel like we're going in circles. I have a problem - the problem is:

I have to be able to display information about an arbitrary member
at any given time without loading information about ALL members

Your solution is to make the member ids NOT be arbitrary, which is
only possible to a certain degree without rewriting my entire
1,200,000 line CMS project from scratch or something like that.

My basic solution has been thus far:

1. Coming up with a system where acquiring information about just ONE
member at any given time isn't time consuming should it be done 500
times for a page.

2. Printing placeholder information about member id's and post-process
the output buffer with the relevant information.

In 1, I was thinking about aggregating individual member info to text
files on the HD, and I will shortly test this to measure load times
for this, if file() is faster than mysql_query() and
mysql_fetch_assoc()

And then, when processing the output buffer, I replace all these with
the proper information, and only fetch what member_name() has saved in
$GLOBALS["displayed_members"] or something like that.

Problem with THAT is that the page may contain one hundred, or maybe
five hundred such placeholders, and selecting something with "and id
in(<list of 500 id numbers>)" takes a lot of time (a lot more than
0.16 seconds).

Did you really test that?

Yes.

And then preg_replacing() them after selecting them
also takes some time.

str_replace() would be enough in this case, but of course it would still
require some time.

Yes, my mistake. Sorry. :)

--
Sandman[.net]

Nov 24 '06 #21

Sandman

In article <79********************************@4ax.com>,
Michael Fesser <ne*****@gmx.dewrote:

.oO(Sandman)

Clear.
Just to be sure, you should of course NOT running seperate queries for
each
result you need.
eg BAD:
SELECT username FROM tbluser WHERE (userid=23);
SELECT username FROM tbluser WHERE (userid=36);
SELECT username FROM tbluser WHERE (userid=123);
No, that's stupid.

eg GOOD:
SELECT userid, username FROM tbluser WHERE
(userid IN (23,36,123));
But BAD:
SELECT userid, username FROM tbluser WHERE
(userid IN (<500 id numbers>));

I did a little test with a single table containing 100000(!) entries,
all with an ID and a random name. The task was to select 1000 randomly
chosen records. I did two tests:

1) a foreach loop, running 1000 queries
time: 0.90s

2) a single query with an IN operator and 1000 given IDs
time: 0.04s

Tested on a P3 550MHz/192MB RAM, MySQL 4.1.11

Yes, I apologize, I tested it the wrong way. My mistake. Sorry.

Doing it the same way as your way:

1. Fetching 1000 members, randomly chosen, one at a time:
0.254 seconds

2. Fetching all 1000 members in one really big IN():
0.007 seconds

So, where does this lead me...
A. Use placeholders, substitute them in post-processing. This may or
may not be a good idea due to lack of control, or a more demanind
replace process (i.e. should I replace the placeholder with the member
online status, shold the nick be in it, should the members label be in
it?)

or

B. Fetch each one as a single SQL query when requested. As you see,
making 1000 SQL queries took 0.254 seconds, which is less than the 0.8
seconds I'm spending om moving data today. And that's for 1000
members. I think I'm leaning towards this since it requires the least
amount of changes to my code. Hmmm
Anyway, thanks for helping me find my error in my benchmarks! :-D

--
Sandman[.net]

Nov 24 '06 #22

Sandman

In article <45*********************@news.xs4all.nl>,
Erwin Moller
<si******************************************@spam yourself.com>
wrote:

Sandman wrote:

In article <45*********************@news.xs4all.nl>,
Erwin Moller
<si******************************************@spam yourself.com>
wrote:

Hi Sandman,

One last idea: stored procedures.
I don't know if mySQL has support for them, but they can speed things up
hugely. (I am a Postgres-guy)

You can move the logic from PHP to the stored procedure.
The SP will fetch the needed info from the tables based on what it found
in earlier queries (in the same SP).

I don't like SP because they hide the logic away from the language to the
database, are hard to debug, are hard to update, but they ARE fast.

The advantage is also that you do not have to place/parse/check the
recordset from MySQL memoryspace to PHP memoryspace.

The SP will simply deliver what PHP needs to know.
Ok, you lost me. How does it do that? How can a SP store information
about arbitrary member id and give it to PHP at will?

I once made a SP excactly for this reason, it delivered ready-to-go HTML
and removed all the logic from the language (Java in this case) to
Postgres, but then 3 years later I had to update it.... Really hard.
I avoid SP as hell since then. Maybe people with more experience than I
have with SP can do better jobs.

But I am quite sure SP CAN solve your speedproblem.
Ok, I'd be happy to hear just how it works before nosediving into some
docs about it, to see if we understand each other correctly.

Hey Sandman,

A Stored procedure is a piece of code that is run by the database itself (in
the same memoryspace as the database too).
You can use decisions (if/then/else), loop over resultsets from former
queries, etc.
So it is a programming language on its own INSIDE the database.
And fast as hell since no overhead is needed (eg pumping resultsets through
a pipe from the database process to php process).

I am not sure if that will help in your current situation, but this is the
idea:
- You write a stored procedure that will actually fetch all needed tuples
based on the same information your PHP receives (eg, userid, topicid, etc).
- This SP will do the same things your script would do (except of course
produce HTML) and you remember the needed info inside the SP.
- When the SP is finished it returns the resultset, which you use to load
above your script the same way as you do now, but now you only receive the
needed info.

For this setup to work you must of course be able to know what your script
does, so you can program that into the SP.
It will be double work (once in the SP, and once again in your scripts) but
it will remove the massive databaseresults import you have now.

I cannot judge if this will help you.
But it is a possibility to put some time into I think, maybe when studying
SP you see a few other shortcuts to reduce your resultset.

If you need advise on SP with MySQL, I cannot help.(never looked into it)

Good luck.

Regards,
Erwin Moller

Ok, thanks for the info. I will think about it. :)

--
Sandman[.net]

Nov 24 '06 #23

Advice about fetching user information

Similar topics