473,325 Members | 2,860 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,325 software developers and data experts.

threadsort

I recently (loosely) translated a file-based perl cgi bulletin board
script to php.

Both scripts now produce the same output. But the
perl version is substantially faster than my php
version. Perhaps someone else can figure out how to
optimize the sorting of file-based 'threads' of posts,
which is where the bottleneck seems to be.
Can someone else do a better job?
Both these scripts use a readdir to get the contents of
the current directory. Every post is a file with
a numerical name like 0004
0004.0001 would be a reply to post 0004
0004.0002.0001 would be the first reply to the second reply to 0004

So you want to assemble individual subject threads as
sorted, numerically ascending arrays of string.

But the collection of all such arrays of strings (threads) want to be
sorted in reverse order, so newer threads appear (on the top of the page)
before older threads. So you want to sort those based on the
first string in each array of string (the root post of that thread).

the perl:
#generic sort function by Joseph Hall, jo****@5sigma.com
sub fieldsort {
my ($sep, $cols);
if (ref $_[0]) {
$sep = '\\s+'
}
else {
$sep = shift;
}
unless (ref($cols = shift) eq 'ARRAY') {
die "fieldsort columns must be in anon array";
}
my (@sortcode, @col);
my $col = 1;
for (@$cols) {
my ($a, $b) = /^-/ ? qw(b a) : qw(a b);
my $op = /n$/ ? '<=>' : 'cmp';
push @col, (/(\d+)/)[0] - 1;
push @sortcode, "\$${a}->[$col] $op \$${b}->[$col]";
$col++;
}
my $sortfunc = eval "sub { " . join (" or ", @sortcode) . " } ";
my $splitfunc = eval 'sub { (split /$sep/o, $_)[@col] } ';
return
map $_->[0],
sort { $sortfunc->() }
map [$_, $splitfunc->($_)],
@_;

}

==slower, uglier php that does produce the same output ============
function how_deep($msg)
{
$cnt=0;
$notdone=$msg;
while($notdone)
{
$notdone = strstr($msg,".");
$msg=substr($notdone,1);
$cnt++;
}
return($cnt);
}

function threadBase($msg)
{
$substr = ereg_replace(strstr($msg,"."),"",$msg);
return($substr);
}

function threadsort($files)
{
$fcnt = count($files);
sort($files, SORT_NUMERIC);
$list=array();
for($i=0; $i<$fcnt; $i++)
{
$key = $files[$i];
$depth = $this->how_deep($key);
if($depth == 1)
{
$thread_head = array();
$thread_head[] = $key;
$list[$key] = $thread_head;
$this->threadkeys[] = $key;
}
else
{
$thread_key = $this->threadBase($key);
$list[$thread_key][] = $key;
}
}
rsort($this->threadkeys, SORT_NUMERIC);
return($list);
}
Jul 17 '05 #1
8 1766
"Salmo Bytes" a écrit le 07/01/2004 :
[...]
==slower, uglier php that does produce the same output ============
function how_deep($msg)
{
$cnt=0;
$notdone=$msg;
while($notdone)
{
$notdone = strstr($msg,".");
$msg=substr($notdone,1);
$cnt++;
}
return($cnt);
}


I would do this like :
function how_deep($msg)
{
return sizeof(explode($msg,"."));
}

--
Have you read the manual?
http://www.php.net/manual/en/

Jul 17 '05 #2
"Salmo Bytes" <de*****@montana-riverboats.com> wrote in message
news:a8**************************@posting.google.c om...
I recently (loosely) translated a file-based perl cgi bulletin board
script to php.

Both scripts now produce the same output. But the
perl version is substantially faster than my php
version. Perhaps someone else can figure out how to
optimize the sorting of file-based 'threads' of posts,
which is where the bottleneck seems to be.
Can someone else do a better job?
Both these scripts use a readdir to get the contents of
the current directory. Every post is a file with
a numerical name like 0004
0004.0001 would be a reply to post 0004
0004.0002.0001 would be the first reply to the second reply to 0004

I created a bunch of random files, acording to your scheme (xx - xx.xx -
xx.xx.xx )
and read through the entire directory, and added each filename to an array

$aryFiles[] = $fileName;

then did
$sortFiles = arsort($aryFiles);

then they all came out in this order

0003.0001
0003.0001.0001
0003.0001.0002
0003.0002
0003.0002.0001

is this what you are looking for?

--
Mike Bradley
http://www.gzentools.com -- free online php tools
Jul 17 '05 #3
"Salmo Bytes" a écrit le 07/01/2004 :
[...]
function threadBase($msg)
{
$substr = ereg_replace(strstr($msg,"."),"",$msg);
return($substr);
}


I would have done :
function threadBase($msg)
{
$substr = explode(".",$msg);
return($substr[0]);
}

But maybe you should look at CountScubula post before...

--
Have you read the manual?
http://www.php.net/manual/en/

Jul 17 '05 #4
"Jedi121" <je*********@free.fr.Removethis> wrote in message
news:me********************************@free.fr.Re movethis...
"Salmo Bytes" a écrit le 07/01/2004 :
[...]
function threadBase($msg)
{
$substr = ereg_replace(strstr($msg,"."),"",$msg);
return($substr);
}


I would have done :
function threadBase($msg)
{
$substr = explode(".",$msg);
return($substr[0]);
}

But maybe you should look at CountScubula post before...

--
Have you read the manual?
http://www.php.net/manual/en/

Did I miss something? I did realy read his entire post (sorry, I dont want
to port code)
Did I answer a question that didn't exist? sorry, I will try and read
things a little more.

--
Mike Bradley
http://www.gzentools.com -- free online php tools

Jul 17 '05 #5
On 6 Jan 2004 15:04:03 -0800, de*****@montana-riverboats.com (Salmo Bytes)
wrote:
I recently (loosely) translated a file-based perl cgi bulletin board
script to php.

Both scripts now produce the same output. But the
perl version is substantially faster than my php
version. Perhaps someone else can figure out how to
optimize the sorting of file-based 'threads' of posts,
which is where the bottleneck seems to be.
Can someone else do a better job?
Both these scripts use a readdir to get the contents of
the current directory. Every post is a file with
a numerical name like 0004
0004.0001 would be a reply to post 0004
0004.0002.0001 would be the first reply to the second reply to 0004

So you want to assemble individual subject threads as
sorted, numerically ascending arrays of string.

But the collection of all such arrays of strings (threads) want to be
sorted in reverse order, so newer threads appear (on the top of the page)
before older threads. So you want to sort those based on the
first string in each array of string (the root post of that thread).


How about this; the array in the following code is sorted how I think you want
it sorted (descending by first component, ascending by the reply components?),
shuffled, then resorted to return to the original sort order:

<pre>
<?php
$a = array(
'0003',
'0003.0001.0001',
'0003.0001.0002',
'0003.0001.0002.0001',
'0003.0001.0002.0002',
'0003.0001.0002.0003',
'0003.0002.0001',
'0003.0002.0002',
'0003.0003.0001',
'0002',
'0002.0001.0001',
'0002.0001.0002',
'0002.0002.0001',
'0002.0002.0002',
'0001',
'0001.0001.0001',
'0001.0001.0002',
'0001.0002.0001',
'0001.0002.0002'
);

shuffle($a);

var_dump($a);

function threadsort($a, $b) {
$thread_a = substr($a, 0, 4);
$thread_b = substr($b, 0, 4);

$reply_a = substr($a, 5);
$reply_b = substr($b, 5);

if ($thread_cmp = strcmp($thread_a, $thread_b))
return -$thread_cmp;
else
return strcmp($reply_a, $reply_b);
}

usort($a, 'threadsort');
var_dump($a);

?>
</pre>

Outputs:

array(19) {
[0]=>
string(14) "0001.0001.0002"
[1]=>
string(14) "0002.0001.0002"
[2]=>
string(4) "0001"
[3]=>
string(14) "0002.0001.0001"
[4]=>
string(19) "0003.0001.0002.0002"
[5]=>
string(14) "0003.0002.0002"
[6]=>
string(14) "0001.0001.0001"
[7]=>
string(14) "0001.0002.0002"
[8]=>
string(19) "0003.0001.0002.0001"
[9]=>
string(14) "0002.0002.0001"
[10]=>
string(14) "0003.0002.0001"
[11]=>
string(14) "0003.0001.0001"
[12]=>
string(14) "0003.0001.0002"
[13]=>
string(4) "0002"
[14]=>
string(19) "0003.0001.0002.0003"
[15]=>
string(14) "0003.0003.0001"
[16]=>
string(4) "0003"
[17]=>
string(14) "0002.0002.0002"
[18]=>
string(14) "0001.0002.0001"
}
array(19) {
[0]=>
string(4) "0003"
[1]=>
string(14) "0003.0001.0001"
[2]=>
string(14) "0003.0001.0002"
[3]=>
string(19) "0003.0001.0002.0001"
[4]=>
string(19) "0003.0001.0002.0002"
[5]=>
string(19) "0003.0001.0002.0003"
[6]=>
string(14) "0003.0002.0001"
[7]=>
string(14) "0003.0002.0002"
[8]=>
string(14) "0003.0003.0001"
[9]=>
string(4) "0002"
[10]=>
string(14) "0002.0001.0001"
[11]=>
string(14) "0002.0001.0002"
[12]=>
string(14) "0002.0002.0001"
[13]=>
string(14) "0002.0002.0002"
[14]=>
string(4) "0001"
[15]=>
string(14) "0001.0001.0001"
[16]=>
string(14) "0001.0001.0002"
[17]=>
string(14) "0001.0002.0001"
[18]=>
string(14) "0001.0002.0002"
}

Anywhere close?

Assumes the fields are all zero-padded to the same length. If they aren't, use
the natural sort comparison functions (strncmp), which should produce the
expected result.

--
Andy Hassall (an**@andyh.co.uk) icq(5747695) (http://www.andyh.co.uk)
Space: disk usage analysis tool (http://www.andyhsoftware.co.uk/space)
Jul 17 '05 #6
Yes, that's clean, compact and the right
output. I'll plug that into my code and see
how it runs.

Jul 17 '05 #7
"CountScubula" a écrit le 07/01/2004 :
Did I miss something? I did realy read his entire post (sorry, I dont want
to port code)
Did I answer a question that didn't exist? sorry, I will try and read
things a little more.


No, don't misunderstand me, I realised I was replying small bit at a
time instead of looking at the overall problem. So my meaning was for
Salmo Bytes to read your Post to answer his question and still look at
my posts to get ideas how someone else would write his functions.

Have a nice night,
Jedi

--
Have you read the manual?
http://www.php.net/manual/en/

Jul 17 '05 #8
"Jedi121" <je*********@free.fr.Removethis> wrote in message
news:me********************************@free.fr.Re movethis...
"CountScubula" a écrit le 07/01/2004 :
Did I miss something? I did realy read his entire post (sorry, I dont want to port code)
Did I answer a question that didn't exist? sorry, I will try and read
things a little more.


No, don't misunderstand me, I realised I was replying small bit at a
time instead of looking at the overall problem. So my meaning was for
Salmo Bytes to read your Post to answer his question and still look at
my posts to get ideas how someone else would write his functions.

Have a nice night,
Jedi

--
Have you read the manual?
http://www.php.net/manual/en/


IC,

Sometimes I end up answering a question that didn't exists, I sometimes lose
sight of a question sometimes.

And yes, there are 1,001 ways to skin a cat (2,101 in the thailand)

--
Mike Bradley
http://www.gzentools.com -- free online php tools
Jul 17 '05 #9

This thread has been closed and replies have been disabled. Please start a new discussion.

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.