How to access random lines in textfile

I have a textfile "textfile.txt" containing a list of words. There is
one word on each line. I want to pick two random lines from this
textfile, and I have tried to do something like:

//Loading the file into an array:
$textarray = file("textfile.txt");

//Using array_rand to pick two random words
$rand_numbers = array_rand($textarray, 2);

//Reading out the two words:
$rand_word_one = $textarray[$rand_numbers[0]];
$rand_word_two = $textarray[$rand_numbers[1]];

This seems to work OK if the text file is small, but when I try a larger
text file, I get an error indicating that memory has run out. I am not very
surprised, since loading the whole file with file() seems unnecessary.

I guess a better solution would be to pick two random numbers between
1 and the total number of lines in the text file, and then try to read
out these line numbers using readline etc., but how can I do this? Any
suggestions are welcome!

/H.A.

Jul 17 '05 #1
On 23 May 2005 06:38:08 -0700, Hans A wrote:
> pick two random numbers between 1 and the total number of lines
> in the textfile, and then try to read out these line numbers

$fname = 'textfile.txt';
$lines = 1000; // number of lines in the text file
$words = 2;    // number of words (lines) to pick

// For each word, choose how many lines to skip before reading it,
// leaving enough lines for the words still to come.
$skip = array();
for ( $i = 0; $i < $words; ++$i )
{
    $r = mt_rand( 0, $lines - $words );
    $lines -= $r;
    $skip[] = $r;
}

// Walk the file once: skip $skip[$i] lines, then keep the next one.
$word = array();
$fh = fopen( $fname, 'r' );
for ( $i = 0; $i < $words; ++$i )
{
    for ( $j = 0; $j <= $skip[$i]; ++$j )
        $w = fgets( $fh );
    $word[] = trim( $w );
}
fclose( $fh );

echo 'Random words: '.implode( ', ', $word );
--
Firefox Web Browser - Rediscover the web - http://getffox.com/
Thunderbird E-mail and Newsgroups - http://gettbird.com/
Jul 17 '05 #2
$lines = file("file.txt");

echo $lines[0]; //First line;
echo $lines[3]; //fourth line;

Jul 17 '05 #3
Just fseek() to a random location in the file, then do 2 fgets()--the
first to remove the potentially truncated line, the second to get the
next line.
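
A rough sketch of that fseek() idea, in case it helps. The function name randomLineBySeek() is made up for illustration, and wrapping around to the first line when the seek lands in the last line is my own assumption about how to handle that case. Note that a line's chance of being picked is roughly proportional to the length of the line before it, so this is only approximately fair when the lines have similar lengths.

```php
<?php
// Hypothetical helper (not from the thread): returns one line found
// after a random byte offset, or null if the file is empty or unreadable.
function randomLineBySeek(string $path): ?string
{
    clearstatcache(true, $path);
    $size = filesize($path);
    if ($size === false || $size === 0) {
        return null;
    }
    $fh = fopen($path, 'r');
    if ($fh === false) {
        return null;
    }
    // Jump to a random byte offset inside the file.
    $offset = mt_rand(0, $size - 1);
    fseek($fh, $offset);
    // Discard the (probably truncated) line we landed in,
    // unless we are at the very start of the file.
    if ($offset > 0) {
        fgets($fh);
    }
    $line = fgets($fh);
    if ($line === false) {
        // We landed in the last line: wrap around to the first one (assumption).
        rewind($fh);
        $line = fgets($fh);
    }
    fclose($fh);
    return $line === false ? null : rtrim($line, "\r\n");
}
```

Calling randomLineBySeek('textfile.txt') twice gives two words, but nothing stops it from returning the same word both times, so you would have to retry on a duplicate.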

Jul 17 '05 #4
On Mon, 23 May 2005, Ewoud Dronkert wrote:
> On 23 May 2005 06:38:08 -0700, Hans A wrote:
>> pick two random numbers between 1 and the total number of lines
>> in the textfile, and then try to read out these line numbers

You don't need to know the number of lines in the text file before you
start to select one line at random.

Reading a line at a time from the file, just update the selected line if
floor(mt_rand(0, $currentLineNumber - 1)) is 0. At the end, your selected
line is a fair and random choice, but you don't need to store more than 2
lines in memory at any one time and there's no need to count the lines
first.

Something like:

<?php

$file = fopen($filename, 'r');
$counter = 0;
while (($line = fgets($file)) !== false) {
    // Line number $counter + 1 replaces the selection with probability 1 / ($counter + 1).
    if (floor(mt_rand(0, $counter++)) == 0) {
        $selectedLine = $line;
    }
}
fclose($file);
// [... do something with $selectedLine ...]

?>

If there's one line, it always matches, since mt_rand(0, 0) always returns
0.

If there are two lines, there is a 1 in 2 chance the second line will
overwrite our $selectedLine.

By the third iteration, the 1 in 3 chance the third line will match is
split evenly (on average) between the matches and non-matches for the
second line, if you see what I mean, leaving a 1 in 3 chance for each of
the three lines. And so on, ad nauseam.

Fairly applying the requirement to select *two* lines at random, you might
have to run through the file twice, ignoring the line you selected last
time. In this case you'd need to store the selected line's index as well
as its content. Or you could just add a second if statement to the loop,
but you'd have to work around the chance of selecting the same line twice.
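
One way to get two distinct lines in a single pass is to keep two slots instead of one, which is usually called reservoir sampling. The sketch below is not from the posts above; pickRandomLines() and its parameters are hypothetical names, and it assumes blank lines count as lines.

```php
<?php
// Hypothetical helper: picks up to $k distinct lines uniformly at random
// in one pass, without knowing the number of lines in advance.
function pickRandomLines(string $path, int $k): array
{
    $fh = fopen($path, 'r');
    if ($fh === false) {
        return [];
    }
    $picked = [];
    $seen = 0; // number of lines read before the current one
    while (($line = fgets($fh)) !== false) {
        $line = rtrim($line, "\r\n");
        if ($seen < $k) {
            // Fill the reservoir with the first $k lines.
            $picked[] = $line;
        } else {
            // Uniform over the $seen + 1 lines read so far.
            $j = mt_rand(0, $seen);
            if ($j < $k) {
                // Replace a slot with probability $k / ($seen + 1).
                $picked[$j] = $line;
            }
        }
        ++$seen;
    }
    fclose($fh);
    return $picked;
}
```

pickRandomLines('textfile.txt', 2) holds at most two lines plus the current one in memory. Because each slot holds a different line of the file, the same line cannot be selected twice. If the file has fewer than $k lines, you simply get all of them.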

--
Matt

Jul 17 '05 #5
On Tue, 24 May 2005 11:58:44 +0100, Matt Raines wrote:
> Reading a line at a time from the file, just update the selected line if
> floor(mt_rand(0, $currentLineNumber - 1)) is 0.

Will never get beyond first line! And if implemented properly, I don't
believe it's fair (but can't be bothered to do the math, sorry).

Neither was my suggestion by the way; every next number is totally
dependent on the previous one:
$skip = array();
for ( $i = 0; $i < $words; ++$i )
{
    $r = mt_rand( 0, $lines - $words );
    $lines -= $r;
    $skip[] = $r;
}

But it was an easy way to avoid picking the same number twice or more.
Jul 17 '05 #6
On Tue, 24 May 2005, Ewoud Dronkert wrote:
> On Tue, 24 May 2005 11:58:44 +0100, Matt Raines wrote:
>> Reading a line at a time from the file, just update the selected line
>> if floor(mt_rand(0, $currentLineNumber - 1)) is 0.
> Will never get beyond first line!

Perhaps I didn't make it clear that you need to iterate across every line
of the file even if you find a match. The point is that at each iteration
you change your selected line if the call to mt_rand returns 0; on the
first line it should always match. A certain number of runs
(1/numberOfLines) will never match again. The others will be distributed
evenly across the lines in the file.

> And if implemented properly, I don't believe it's fair (but can't be
> bothered to do the math, sorry).

Think of it like this: 100% of runs will match on the first line.
<----------1----------->

On the second line, 50% of runs will overwrite the selected line with the
current line.
<----1-----><----2----->

On the third line, a third of runs will overwrite with the current line.
But half of those (a sixth) will not have matched on line 2, and the other
half will.
<--1---><3-><--2---><3->

This works out, when rearranged below, at exactly one third chance of
matching each of the three lines.
<--1---><--2---><--3--->

You can continue to apply this logic for as many lines as you like, but
I'm lazy so I choose to stop here. (note: also the reason I didn't bother
to rejig it to return two lines instead of one :) ).
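
For anyone who wants to check the arithmetic for more than three lines, here is a small sketch that multiplies the probabilities out. selectionProbability() is a hypothetical name; it just computes the chance that line $i is picked when it is read (1/$i) times the chance that no later line overwrites it.

```php
<?php
// Hypothetical helper: probability that line $i (1-based) is the final
// selection when the file has $n lines.
function selectionProbability(int $i, int $n): float
{
    // Line $i replaces the current selection with probability 1 / $i ...
    $p = 1.0 / $i;
    for ($j = $i + 1; $j <= $n; ++$j) {
        // ... and survives line $j with probability 1 - 1 / $j.
        $p *= 1.0 - 1.0 / $j;
    }
    return $p;
}
```

The product telescopes: (1/i) * (i/(i+1)) * ... * ((n-1)/n) = 1/n, so every line ends up with the same chance, up to floating-point rounding.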

Disclaimer: I'm pretty sure I didn't come up with this logic. I probably
read it in a book once.

Cheers,
--
Matt
Jul 17 '05 #7
On Tue, 24 May 2005 14:28:44 +0100, Matt Raines wrote:
> Perhaps I didn't make it clear that [...]

No, sorry, my fault. I half expected one thing then didn't read on very
well.

Your solution is nifty, but rather expensive because it calls
mt_rand() or rand() once for every line of the file, especially if every word
chosen requires another complete walk of the file (or concurrent but
different rand() calls). My algorithm requires only one walk of the file
for any number of random words picked, or two if the number of lines is
not known.

What is the best way to pick k numbers from range n while optimizing
speed, storage and/or randomness? Maybe:

$n = 1000;
$k = 2;
$a = range( 0, $n - 1 );
for ( $i = 0; $i < $k; ++$i )
{
    // Swap $a[$i] with a randomly chosen element at or after position $i.
    $j = mt_rand( $i, $n - 1 );
    $t = $a[$i];
    $a[$i] = $a[$j];
    $a[$j] = $t;
}
$b = array_slice( $a, 0, $k );

(Still just as many calls to mt_rand() as no. of words picked).
Btw, is this for-loop (but then with $i<$n) the way the shuffle function
is implemented?
To prep $b for acting as $skip from my first post:

sort( $b );
// Work backwards so each subtraction still sees the previous absolute line number.
for ( $i = $k - 1; $i >= 1; --$i )
    $b[$i] -= $b[$i - 1] + 1;
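
Putting the pieces together, a possible end-to-end sketch: pick $k distinct line numbers, then read them in one walk of the file. Both function names are hypothetical, and the line count $n is assumed to be known already. Instead of building the whole range( 0, $n - 1 ) array, this version records only the positions that were actually swapped, which is my own variation and keeps memory use proportional to $k.

```php
<?php
// Hypothetical helper: returns $k distinct line numbers in 0..$n-1, sorted ascending.
function pickLineNumbers(int $n, int $k): array
{
    $k = min($k, $n);  // cannot pick more lines than exist
    $swapped = [];     // sparse stand-in for the shuffled range: position => value
    $picked = [];
    for ($i = 0; $i < $k; ++$i) {
        $j = mt_rand($i, $n - 1);
        // Take the value currently at position $j ...
        $picked[] = $swapped[$j] ?? $j;
        // ... and move the value at position $i into its place.
        $swapped[$j] = $swapped[$i] ?? $i;
    }
    sort($picked);
    return $picked;
}

// Hypothetical helper: reads the given 0-based line numbers in a single pass.
function readLines(string $path, array $lineNumbers): array
{
    $fh = fopen($path, 'r');
    if ($fh === false) {
        return [];
    }
    $wanted = array_flip($lineNumbers);
    $remaining = count($wanted);
    $words = [];
    // Stop as soon as the last wanted line has been read.
    for ($n = 0; $remaining > 0 && ($line = fgets($fh)) !== false; ++$n) {
        if (isset($wanted[$n])) {
            $words[] = rtrim($line, "\r\n");
            --$remaining;
        }
    }
    fclose($fh);
    return $words;
}
```

Something like readLines( 'textfile.txt', pickLineNumbers( 1000, 2 ) ) would then return the two words in file order.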
Jul 17 '05 #8