How to access random lines in textfile

I have a textfile "textfile.txt" containing a list of words. There is
one word on each line. I want to pick two random lines from this
textfile, and I have tried to do something like:

//Loading the file into an array:
$textarray = file("textfile.txt");

//Using array_rand to pick two random words
$rand_numbers = array_rand($textarray, 2);

//Reading out the two words:
$rand_word_one = $textarray[$rand_numbers[0]];
$rand_word_two = $textarray[$rand_numbers[1]];

This seems to work OK when the textfile is small, but when I try a larger
textfile, I get an out-of-memory error. I am not very surprised: loading
the whole file with file() seems unnecessary.

I guess a better solution would be to pick two random numbers between
1 and the total number of lines in the textfile, and then try to read
out these line numbers by reading the file line by line, but how can I
do this? Any suggestions are welcome!

/H.A.

Jul 17 '05 #1
On 23 May 2005 06:38:08 -0700, Hans A wrote:
> pick two random numbers between 1 and the total number of lines
> in the textfile, and then try to read out these line numbers


$fname = 'textfile.txt';
$lines = 1000; //number of lines in the text file
$words = 2; //number of words (lines) to pick

$skip = array();
for ( $i = 0; $i < $words; ++$i )
{
    $r = mt_rand( 0, $lines - $words );
    $lines -= $r;
    $skip[] = $r;
}

$word = array();
$fh = fopen( $fname, 'r' ); //fopen() needs a mode; 'r' opens read-only
for ( $i = 0; $i < $words; ++$i )
{
    for ( $j = 0; $j <= $skip[$i]; ++$j )
        $w = fgets( $fh );
    $word[] = trim( $w );
}
fclose( $fh );

echo 'Random words: '.implode( ', ', $word );
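
The snippet above hard-codes the line count. As a rough sketch of the two-pass idea from the original question, the count can be taken with one cheap pass and the chosen lines read in a second pass. The function names `countLines` and `pickRandomLines` are made up for this illustration, and the lines come back in file order rather than in random order.

```php
<?php
// Hypothetical helper (not from the thread): count lines without loading the file.
function countLines(string $path): int
{
    $fh = fopen($path, 'r');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $count = 0;
    while (fgets($fh) !== false) {
        ++$count;
    }
    fclose($fh);
    return $count;
}

// Hypothetical helper: pick $k distinct line numbers, then read them in one pass.
function pickRandomLines(string $path, int $k): array
{
    $total = countLines($path);
    if ($k > $total) {
        throw new InvalidArgumentException('File has fewer lines than requested');
    }
    // Draw distinct zero-based line numbers; cheap while $k is much smaller than $total.
    $wanted = [];
    while (count($wanted) < $k) {
        $wanted[mt_rand(0, $total - 1)] = true;
    }
    $picked = [];
    $fh = fopen($path, 'r');
    for ($n = 0; ($line = fgets($fh)) !== false; ++$n) {
        if (isset($wanted[$n])) {
            $picked[] = rtrim($line, "\r\n");
            if (count($picked) === $k) {
                break; // nothing left to look for
            }
        }
    }
    fclose($fh);
    return $picked;
}
```

Only one line is held in memory at a time, at the cost of reading the file twice.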
--
Firefox Web Browser - Rediscover the web - http://getffox.com/
Thunderbird E-mail and Newsgroups - http://gettbird.com/
Jul 17 '05 #2
$lines = file("file.txt");

echo $lines[0]; //First line;
echo $lines[3]; //fourth line;

Jul 17 '05 #3
Just fseek() to a random location in the file, then do 2 fgets()--the
first to remove the potentially truncated line, the second to get the
next line.
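
A minimal sketch of this fseek() idea, assuming a non-empty file; `randomLineBySeek` is a name invented here. Two caveats the one-liner glosses over: a line is picked with probability proportional to the length of the line before it, so the choice is only roughly uniform when lines have similar lengths, and landing inside the last line leaves nothing to read, which this sketch handles by wrapping around to the first line.

```php
<?php
// Hypothetical helper sketching the fseek() idea; not code from the thread.
function randomLineBySeek(string $path): string
{
    $size = filesize($path);
    if ($size === false || $size === 0) {
        throw new RuntimeException("Cannot read $path or file is empty");
    }
    $fh = fopen($path, 'r');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    fseek($fh, mt_rand(0, $size - 1)); // jump to a random byte offset
    fgets($fh);                        // discard the (probably partial) line we landed in
    $line = fgets($fh);                // the next complete line
    if ($line === false) {             // we landed in the last line: wrap to the first
        rewind($fh);
        $line = fgets($fh);
    }
    fclose($fh);
    return rtrim($line, "\r\n");
}
```

Calling it twice gives two words, though nothing stops both calls from returning the same line.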

Jul 17 '05 #4
On Mon, 23 May 2005, Ewoud Dronkert wrote:
> On 23 May 2005 06:38:08 -0700, Hans A wrote:
>> pick two random numbers between 1 and the total number of lines
>> in the textfile, and then try to read out these line numbers


You don't need to know the number of lines in the text file before you
start to select one line at random.

Reading a line at a time from the file, just update the selected line if
floor(mt_rand(0, $currentLineNumber - 1)) is 0. At the end, your selected
line is a fair and random choice, but you don't need to store more than 2
lines in memory at any one time and there's no need to count the lines
first.

Something like:

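<?php

$file = fopen($filename, 'r'); // fopen() requires a mode argument
$counter = 0;
$selectedLine = null;
while (($line = fgets($file)) !== false) {
    // On line N (counting from 1) this is true with probability 1/N.
    if (floor(mt_rand(0, $counter++)) == 0) {
        $selectedLine = $line;
    }
}
fclose($file);
// ... do something with $selectedLine ...

?>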

If there's one line, it always matches, since mt_rand(0, 0) always returns
0.

If there are two lines, there is a 1 in 2 chance the second line will
overwrite our $selectedLine.

By the third iteration, the 1 in 3 chance the third line will match is
split evenly (on average) between the matches and non-matches for the
second line, if you see what I mean, leaving a 1 in 3 chance for each of
the three lines. And so on, ad nauseam.

Fairly applying the requirement to select *two* lines at random, you might
have to run through the file twice, ignoring the line you selected last
time. In this case you'd need to store the selected line's index as well
as its content. Or you could just add a second if statement to the loop,
but you'd have to work around the chance of selecting the same line twice.
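
One way to get two distinct lines in a single pass is to extend the same idea to a small "reservoir" of k lines. This is the technique usually called reservoir sampling (Algorithm R), named here as a substitute for the second-if-statement idea rather than as what the post had in mind. `reservoirSample` is a made-up name, and the sketch assumes the file has at least k lines; with fewer, it simply returns them all.

```php
<?php
// Hypothetical helper: keep a reservoir of $k lines while streaming the file once.
function reservoirSample(string $path, int $k): array
{
    $fh = fopen($path, 'r');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $reservoir = [];
    $seen = 0; // number of lines read so far
    while (($line = fgets($fh)) !== false) {
        ++$seen;
        if (count($reservoir) < $k) {
            // The first $k lines fill the reservoir unconditionally.
            $reservoir[] = rtrim($line, "\r\n");
        } else {
            // Later lines replace a random slot with probability $k / $seen.
            $slot = mt_rand(0, $seen - 1);
            if ($slot < $k) {
                $reservoir[$slot] = rtrim($line, "\r\n");
            }
        }
    }
    fclose($fh);
    return $reservoir;
}
```

With `$k = 1` this reduces to the single-line loop above. Because each slot holds a different line of the file, the same line cannot be returned twice.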

--
Matt

Jul 17 '05 #5
On Tue, 24 May 2005 11:58:44 +0100, Matt Raines wrote:
> Reading a line at a time from the file, just update the selected line if
> floor(mt_rand(0, $currentLineNumber - 1)) is 0.
Will never get beyond first line! And if implemented properly, I don't
believe it's fair (but can't be bothered to do the math, sorry).

Neither was my suggestion by the way; every next number is totally
dependent on the previous one:
$skip = array();
for ( $i = 0; $i < $words; ++$i )
{
    $r = mt_rand( 0, $lines - $words );
    $lines -= $r;
    $skip[] = $r;
}


But it was an easy way to avoid picking the same number twice or more.
Jul 17 '05 #6
On Tue, 24 May 2005, Ewoud Dronkert wrote:
> On Tue, 24 May 2005 11:58:44 +0100, Matt Raines wrote:
>> Reading a line at a time from the file, just update the selected line
>> if floor(mt_rand(0, $currentLineNumber - 1)) is 0.
> Will never get beyond first line!


Perhaps I didn't make it clear that you need to iterate across every line
of the file even if you find a match. The point is that at each iteration
you change your selected line if the call to mt_rand returns 0; on the
first line it should always match. A certain number of runs
(1/numberOfLines) will never match again. The others will be distributed
evenly across the lines in the file.
> And if implemented properly, I don't believe it's fair (but can't be
> bothered to do the math, sorry).


Think of it like this: 100% of runs will match on the first line.
<----------1----------->

On the second line, 50% of runs will overwrite the selected line with the
current line.
<----1-----><----2----->

On the third line, a third of runs will overwrite with the current line.
But half of those (a sixth) will not have matched on line 2, and the other
half will.
<--1---><3-><--2---><3->

This works out, when rearranged below, at exactly one third chance of
matching each of the three lines.
<--1---><--2---><--3--->

You can continue to apply this logic for as many lines as you like, but
I'm lazy so I choose to stop here. (note: also the reason I didn't bother
to rejig it to return two lines instead of one :) ).
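
For what it's worth, the same argument can be written out for a file of n lines. Line i ends up as the final pick if it is selected when read (probability 1/i) and no later line j replaces it (probability 1 - 1/j each), assuming the mt_rand() calls are independent:

```latex
P(\text{line } i \text{ is the final pick})
  = \frac{1}{i} \prod_{j=i+1}^{n} \left(1 - \frac{1}{j}\right)
  = \frac{1}{i} \cdot \frac{i}{i+1} \cdot \frac{i+1}{i+2} \cdots \frac{n-1}{n}
  = \frac{1}{n}
```

The product telescopes, so every line has the same 1/n chance regardless of its position.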

Disclaimer: I'm pretty sure I didn't come up with this logic. I probably
read it in a book once.

Cheers,
--
Matt
Jul 17 '05 #7
On Tue, 24 May 2005 14:28:44 +0100, Matt Raines wrote:
> Perhaps I didn't make it clear that [...]


No, sorry, my fault. I half expected one thing then didn't read on very
well.

Your solution is nifty, but rather expensive because it calls mt_rand()
or rand() once for every line of the file, especially if every word
chosen requires another complete walk of the file (or concurrent but
different rand() calls). My algorithm requires only one walk of the file
for any number of random words picked, or two if the number of lines is
not known.

What is the best way to pick k distinct numbers from a range of n while
optimizing speed, storage and/or randomness? Maybe:

$n = 1000;
$k = 2;
$a = range( 0, $n - 1 );
for ( $i = 0; $i < $k; ++$i )
{
    $j = mt_rand( $i, $n - 1 ); //mt_rand(), there is no mt_random()
    $tmp = $a[$i];              //swap $a[$i] and $a[$j]
    $a[$i] = $a[$j];
    $a[$j] = $tmp;
}
$b = array_slice( $a, 0, $k );

(Still just as many calls to mt_rand() as no. of words picked).
Btw, is this for-loop (but then with $i<$n) the way the shuffle function
is implemented?
To prep $b for acting as $skip from my first post:

sort( $b );
for ( $i = $k - 1; $i >= 1; --$i ) //walk backwards so $b[$i - 1] is still an absolute index
    $b[$i] -= $b[$i - 1] + 1;
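
As one possible answer on the storage side, range() builds an array of n elements, which is the kind of memory cost the original question was trying to avoid. A sketch using a different technique, Robert Floyd's sampling algorithm, picks k distinct indices with k calls to mt_rand() and only k stored values. The names `sampleIndices` and `toSkips` are invented for this example, and it assumes 0 <= k <= n.

```php
<?php
// Hypothetical helper: Floyd's sampling algorithm, $k distinct integers from 0..$n-1.
function sampleIndices(int $n, int $k): array
{
    if ($k < 0 || $k > $n) {
        throw new InvalidArgumentException('Need 0 <= k <= n');
    }
    $chosen = []; // keys are the chosen indices
    for ($j = $n - $k; $j < $n; ++$j) {
        $t = mt_rand(0, $j);
        if (isset($chosen[$t])) {
            $chosen[$j] = true; // $t already taken: take $j, which cannot be taken yet
        } else {
            $chosen[$t] = true;
        }
    }
    $indices = array_keys($chosen);
    sort($indices);
    return $indices;
}

// Hypothetical helper: turn sorted absolute indices into per-pick skip counts,
// i.e. how many lines to throw away before each line that is kept.
function toSkips(array $sorted): array
{
    $skips = [];
    $prev = -1;
    foreach ($sorted as $idx) {
        $skips[] = $idx - $prev - 1;
        $prev = $idx;
    }
    return $skips;
}
```

For example, `toSkips(sampleIndices(1000, 2))` yields an array that can stand in for `$skip` in the first post's read loop.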
Jul 17 '05 #8
