
How to access random lines in textfile

I have a textfile "textfile.txt" containing a list of words. There is
one word on each line. I want to pick two random lines from this
textfile, and I have tried to do something like:

//Loading the file into an array:
$textarray = file("textfile.txt");

//Using array_rand to pick two random words
$rand_numbers = array_rand($textarray, 2);

//Reading out the two words:
$rand_word_one = $textarray[$rand_numbers[0]];
$rand_word_two = $textarray[$rand_numbers[1]];

This seems to work OK if the textfile is small, but when I try a larger
textfile, I get an error indicating a memory overload. I am not very
surprised; loading the whole file using file() seems unnecessary.

I guess a better solution would be to pick two random numbers between
1 and the total number of lines in the textfile, and then try to read
out these line numbers using readline etc., but how can I do this? Any
suggestions are welcome!

/H.A.

Jul 17 '05 #1
7 Replies


On 23 May 2005 06:38:08 -0700, Hans A wrote:
> pick two random numbers between 1 and the total number of lines
> in the textfile, and then try to read out these line numbers


$fname = 'textfile.txt';
$lines = 1000; //number of lines in the text file
$words = 2; //number of words (lines) to pick

//For each word, choose how many lines to skip before reading it
$skip = array();
for ( $i = 0; $i < $words; ++$i )
{
    $r = mt_rand( 0, $lines - $words );
    $lines -= $r;
    $skip[] = $r;
}

//Walk the file once, skipping $skip[$i] lines before each pick
$word = array();
$fh = fopen( $fname, 'r' );
for ( $i = 0; $i < $words; ++$i )
{
    for ( $j = 0; $j <= $skip[$i]; ++$j )
        $w = fgets( $fh );
    $word[] = trim( $w );
}
fclose( $fh );

echo 'Random words: '.implode( ', ', $word );
--
Firefox Web Browser - Rediscover the web - http://getffox.com/
Thunderbird E-mail and Newsgroups - http://gettbird.com/
Jul 17 '05 #2

$lines = file("file.txt");

echo $lines[0]; //First line;
echo $lines[3]; //fourth line;

Jul 17 '05 #3

Just fseek() to a random location in the file, then do 2 fgets()--the
first to remove the potentially truncated line, the second to get the
next line.
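
A minimal sketch of that idea, for illustration only. The function name randomLineBySeek() is hypothetical, and wrapping around to the first line when the seek lands in the last line is an assumption added here, not something the post specifies. Note that this is fast but not uniform: a line is picked in proportion to the byte length of the line before it.

```php
<?php
// Hypothetical helper (not from the thread): return one line chosen by
// seeking to a random byte offset in the file.
function randomLineBySeek(string $path): ?string
{
    clearstatcache(true, $path);      // make sure filesize() is current
    $size = filesize($path);
    if ($size === false || $size === 0) {
        return null;
    }
    $fh = fopen($path, 'r');
    if ($fh === false) {
        return null;
    }
    fseek($fh, mt_rand(0, $size - 1));
    fgets($fh);                       // discard the (probably partial) line we landed in
    $line = fgets($fh);               // the next complete line
    if ($line === false) {            // we landed in the last line: wrap to the first
        rewind($fh);
        $line = fgets($fh);
    }
    fclose($fh);
    return $line === false ? null : rtrim($line, "\r\n");
}
```

Calling it twice gives two words, though nothing here stops both calls from returning the same line.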

Jul 17 '05 #4

On Mon, 23 May 2005, Ewoud Dronkert wrote:
> On 23 May 2005 06:38:08 -0700, Hans A wrote:
>> pick two random numbers between 1 and the total number of lines
>> in the textfile, and then try to read out these line numbers


You don't need to know the number of lines in the text file before you
start to select one line at random.

Reading a line at a time from the file, just update the selected line if
floor(mt_rand(0, $currentLineNumber - 1)) is 0. At the end, your selected
line is a fair and random choice, but you don't need to store more than 2
lines in memory at any one time and there's no need to count the lines
first.

Something like:

<?php

$file = fopen($filename, 'r');
$counter = 0;
$selectedLine = null;
while (($line = fgets($file)) !== false) {
    // mt_rand(0, $counter) is 0 with probability 1 / ($counter + 1)
    if (floor(mt_rand(0, $counter++)) == 0) {
        $selectedLine = $line;
    }
}
fclose($file);
// ... do something with $selectedLine ...

?>

If there's one line, it always matches, since mt_rand(0, 0) always returns
0.

If there are two lines, there is a 1 in 2 chance the second line will
overwrite our $selectedLine.

By the third iteration, the 1 in 3 chance the third line will match is
split evenly (on average) between the matches and non-matches for the
second line, if you see what I mean, leaving a 1 in 3 chance for each of
the three lines. And so on, ad nauseam.

Fairly applying the requirement to select *two* lines at random, you might
have to run through the file twice, ignoring the line you selected last
time. In this case you'd need to store the selected line's index as well
as its content. Or you could just add a second if statement to the loop,
but you'd have to work around the chance of selecting the same line twice.
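
One way to get two distinct lines in a single pass is to keep a small "reservoir" of picks instead of a single selected line. The sketch below is an illustration under assumptions, not code from this thread: the function name randomLines() and the parameter $k are made up here, and the technique is the standard reservoir sampling scheme rather than the second-if-statement idea described above.

```php
<?php
// Hypothetical helper: choose $k distinct lines (by position) in one pass
// over the file, holding at most $k lines in memory.
function randomLines(string $path, int $k): array
{
    $fh = fopen($path, 'r');
    if ($fh === false) {
        return [];
    }
    $picked = [];
    $seen = 0;                            // number of lines read so far
    while (($line = fgets($fh)) !== false) {
        $line = rtrim($line, "\r\n");
        if ($seen < $k) {
            $picked[] = $line;            // the first $k lines fill the reservoir
        } else {
            $j = mt_rand(0, $seen);       // 0..$seen inclusive
            if ($j < $k) {
                $picked[$j] = $line;      // replace a slot with probability $k / ($seen + 1)
            }
        }
        ++$seen;
    }
    fclose($fh);
    return $picked;                       // fewer than $k entries if the file is short
}
```

With $k = 1 this reduces to the single-line loop shown earlier. Because each slot is overwritten rather than appended to, the same line position can never appear twice.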

--
Matt

Jul 17 '05 #5

On Tue, 24 May 2005 11:58:44 +0100, Matt Raines wrote:
> Reading a line at a time from the file, just update the selected line if
> floor(mt_rand(0, $currentLineNumber - 1)) is 0.

Will never get beyond first line! And if implemented properly, I don't
believe it's fair (but can't be bothered to do the math, sorry).

Neither was my suggestion by the way; every next number is totally
dependent on the previous one:
$skip = array();
for ( $i = 0; $i < $words; ++$i )
{
    $r = mt_rand( 0, $lines - $words );
    $lines -= $r;
    $skip[] = $r;
}


But it was an easy way to avoid picking the same number twice or more.
--
Firefox Web Browser - Rediscover the web - http://getffox.com/
Thunderbird E-mail and Newsgroups - http://gettbird.com/
Jul 17 '05 #6

On Tue, 24 May 2005, Ewoud Dronkert wrote:
> On Tue, 24 May 2005 11:58:44 +0100, Matt Raines wrote:
>> Reading a line at a time from the file, just update the selected line
>> if floor(mt_rand(0, $currentLineNumber - 1)) is 0.
> Will never get beyond first line!


Perhaps I didn't make it clear that you need to iterate across every line
of the file even if you find a match. The point is that at each iteration
you change your selected line if the call to mt_rand returns 0; on the
first line it should always match. A certain number of runs
(1/numberOfLines) will never match again. The others will be distributed
evenly across the lines in the file.

> And if implemented properly, I don't believe it's fair (but can't be
> bothered to do the math, sorry).


Think of it like this: 100% of runs will match on the first line.
<----------1----------->

On the second line, 50% of runs will overwrite the selected line with the
current line.
<----1-----><----2----->

On the third line, a third of runs will overwrite with the current line.
But half of those (a sixth) will not have matched on line 2, and the other
half will.
<--1---><3-><--2---><3->

This works out, when rearranged below, at exactly one third chance of
matching each of the three lines.
<--1---><--2---><--3--->

You can continue to apply this logic for as many lines as you like, but
I'm lazy so I choose to stop here. (note: also the reason I didn't bother
to rejig it to return two lines instead of one :) ).
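
The same picture can be written out for any number of lines. This is a sketch of the argument, where n (the total number of lines) and i (a 1-based line number) are names introduced here for illustration: line i is selected when it is read with probability 1/i, and it then survives each later line j with probability 1 - 1/j.

```latex
\[
P(\text{line } i \text{ is the final pick})
  = \frac{1}{i} \prod_{j=i+1}^{n} \left(1 - \frac{1}{j}\right)
  = \frac{1}{i} \cdot \frac{i}{i+1} \cdot \frac{i+1}{i+2} \cdots \frac{n-1}{n}
  = \frac{1}{n}
\]
```

The product telescopes, so every line ends up with the same 1/n chance regardless of its position in the file.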

Disclaimer: I'm pretty sure I didn't come up with this logic. I probably
read it in a book once.

Cheers,
--
Matt
Jul 17 '05 #7

On Tue, 24 May 2005 14:28:44 +0100, Matt Raines wrote:
> Perhaps I didn't make it clear that [...]


No, sorry, my fault. I half expected one thing then didn't read on very
well.

Your solution is nifty, but rather expensive because it calls mt_rand()
or rand() once for every line of the file, especially if every word
chosen requires another complete walk of the file (or concurrent but
different rand() calls). My algorithm requires only one walk of the file
for any number of random words picked, or two if the number of lines is
not known.

What is the best way to pick k numbers from range n while optimizing
speed, storage and/or randomness? Maybe:

$n = 1000;
$k = 2;
$a = range( 0, $n - 1 );
for ( $i = 0; $i < $k; ++$i )
{
    $j = mt_rand( $i, $n - 1 ); //pick from the not-yet-chosen tail
    $t = $a[$i];                //swap $a[$i] and $a[$j]
    $a[$i] = $a[$j];
    $a[$j] = $t;
}
$b = array_slice( $a, 0, $k );

(Still just as many calls to mt_rand() as no. of words picked).
Btw, is this for-loop (but then with $i<$n) the way the shuffle function
is implemented?
To prep $b for acting as $skip from my first post:

sort( $b );
//work back to front, so $b[$i - 1] is still a line index when it is used
for ( $i = $k - 1; $i > 0; --$i )
    $b[$i] -= $b[$i - 1] + 1;
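
For the case where the number of lines is not known, the two-walk variant mentioned earlier in this post might look roughly like the sketch below. The function name randomLinesTwoPass() is hypothetical, and the duplicate-retry loop is an assumption chosen here to avoid building a range() array as large as the file; it is not the selection method shown above.

```php
<?php
// Hypothetical helper: pass 1 counts the lines, pass 2 reads only the chosen ones.
function randomLinesTwoPass(string $path, int $k): array
{
    $fh = fopen($path, 'r');
    if ($fh === false) {
        return [];
    }
    // Pass 1: count lines without keeping them in memory.
    $n = 0;
    while (fgets($fh) !== false) {
        ++$n;
    }
    $k = min($k, $n);
    // Draw $k distinct 0-based line numbers. Retrying on duplicates is cheap
    // as long as $k is small compared with $n.
    $wanted = [];
    while (count($wanted) < $k) {
        $wanted[mt_rand(0, $n - 1)] = true;
    }
    // Pass 2: walk the file again and keep only the wanted lines.
    rewind($fh);
    $picked = [];
    for ($i = 0; count($picked) < $k && ($line = fgets($fh)) !== false; ++$i) {
        if (isset($wanted[$i])) {
            $picked[] = rtrim($line, "\r\n");
        }
    }
    fclose($fh);
    return $picked;                       // in file order
}
```

This costs $k calls to mt_rand() (plus retries) and holds only the chosen lines in memory. If $k gets close to the number of lines, the retry loop slows down and the shuffle-style selection above becomes the better fit.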
--
Firefox Web Browser - Rediscover the web - http://getffox.com/
Thunderbird E-mail and Newsgroups - http://gettbird.com/
Jul 17 '05 #8
