How to access random lines in textfile

I have a text file, "textfile.txt", containing a list of words. There is
one word on each line. I want to pick two random lines from this
text file, and I have tried to do something like:

//Loading the file into an array:
$textarray = file("textfile.txt");

//Using array_rand to pick two random keys
$rand_numbers = array_rand($textarray, 2);

//Reading out the two words:
$rand_word_one = $textarray[$rand_numbers[0]];
$rand_word_two = $textarray[$rand_numbers[1]];

This seems to work OK if the text file is small, but when I try a larger
text file, I get an error indicating that memory is exhausted. I am not very
surprised: loading the whole file with file() seems unnecessary.

I guess a better solution would be to pick two random numbers between
1 and the total number of lines in the text file, and then try to read
out these line numbers using readline etc., but how can I do this? Any
suggestions are welcome!

/H.A.

Jul 17 '05 #1
On 23 May 2005 06:38:08 -0700, Hans A wrote:
> pick two random numbers between 1 and the total number of lines
> in the textfile, and then try to read out these line numbers


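$fname = 'textfile.txt';
$lines = 1000; //number of lines in the text file (must be known in advance)
$words = 2; //number of words (lines) to pick

//Pick how many lines to skip before each word. Each skip is capped so
//that enough lines remain for the words still to be picked.
$skip = array();
for ( $i = 0; $i < $words; ++$i )
{
    $r = mt_rand( 0, $lines - $words );
    $lines -= $r;
    $skip[] = $r;
}

//Walk the file once: skip $skip[$i] lines, then keep the next one.
$word = array();
$fh = fopen( $fname, 'r' ); //fopen() requires a mode argument
for ( $i = 0; $i < $words; ++$i )
{
    for ( $j = 0; $j <= $skip[$i]; ++$j )
        $w = fgets( $fh );
    $word[] = trim( $w );
}
fclose( $fh );

echo 'Random words: '.implode( ', ', $word );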
--
Firefox Web Browser - Rediscover the web - http://getffox.com/
Thunderbird E-mail and Newsgroups - http://gettbird.com/
Jul 17 '05 #2
$lines = file("file.txt");

echo $lines[0]; //First line;
echo $lines[3]; //fourth line;

Jul 17 '05 #3
Just fseek() to a random location in the file, then do 2 fgets()--the
first to remove the potentially truncated line, the second to get the
next line.
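
A minimal sketch of the fseek() idea above, under some assumptions: randomLineBySeek() is a hypothetical helper name, and wrapping around to the first line when the seek lands inside the last line is my own addition, not something the post specifies. Note that this is fast but not uniform: a line is returned whenever the seek lands anywhere in the line before it, so lines that follow long lines are picked more often.

```php
<?php
// Sketch only: randomLineBySeek() is a hypothetical helper, not part of PHP.
function randomLineBySeek(string $path): ?string
{
    $size = filesize($path);
    if ($size === false || $size === 0) {
        return null; // missing or empty file
    }
    $fh = fopen($path, 'r');
    if ($fh === false) {
        return null;
    }
    // Jump to a random byte offset inside the file.
    fseek($fh, mt_rand(0, $size - 1));
    fgets($fh);         // discard the (probably truncated) line we landed in
    $line = fgets($fh); // the next complete line
    if ($line === false) {
        // We landed in the last line: wrap around to the first line.
        rewind($fh);
        $line = fgets($fh);
    }
    fclose($fh);
    return $line === false ? null : rtrim($line, "\r\n");
}
```

Calling it twice gives two words, although nothing stops both calls from returning the same line, so a caller that needs two different lines would have to retry on a duplicate.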

Jul 17 '05 #4
On Mon, 23 May 2005, Ewoud Dronkert wrote:
> On 23 May 2005 06:38:08 -0700, Hans A wrote:
>> pick two random numbers between 1 and the total number of lines
>> in the textfile, and then try to read out these line numbers


You don't need to know the number of lines in the text file before you
start to select one line at random.

Reading a line at a time from the file, just update the selected line if
floor(mt_rand(0, $currentLineNumber - 1)) is 0. At the end, your selected
line is a fair and random choice, but you don't need to store more than 2
lines in memory at any one time and there's no need to count the lines
first.

Something like:

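<?php

$file = fopen($filename, 'r'); // fopen() requires a mode argument
$counter = 0;
$selectedLine = null;
while (($line = fgets($file)) !== false) {
    // mt_rand() already returns an integer, so floor() is not needed.
    // Line N (1-based) replaces the selection with probability 1/N.
    if (mt_rand(0, $counter++) == 0) {
        $selectedLine = $line;
    }
}
fclose($file);
// ... do something with $selectedLine ...

?>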

If there's one line, it always matches, since mt_rand(0, 0) always returns
0.

If there are two lines, there is a 1 in 2 chance the second line will
overwrite our $selectedLine.

By the third iteration, the 1 in 3 chance that the third line will match is
taken evenly (on average) from the runs that matched the second line and
those that did not, if you see what I mean, leaving a 1 in 3 chance for each of
the three lines. And so on ad nauseam.
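
The same argument can be written out for a file of any length. This is a short derivation added for illustration, assuming the file has n lines and that line j replaces the current selection with probability 1/j, independently of earlier lines. Line i ends up selected if it is chosen when read and no later line replaces it:

```latex
P(\text{line } i \text{ is selected})
  = \frac{1}{i} \prod_{j=i+1}^{n} \left(1 - \frac{1}{j}\right)
  = \frac{1}{i} \prod_{j=i+1}^{n} \frac{j-1}{j}
  = \frac{1}{i} \cdot \frac{i}{n}
  = \frac{1}{n}
```

The product telescopes, so every line has the same 1/n chance regardless of its position in the file.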

Fairly applying the requirement to select *two* lines at random, you might
have to run through the file twice, ignoring the line you selected last
time. In this case you'd need to store the selected line's index as well
as its content. Or you could just add a second if statement to the loop,
but you'd have to work around the chance of selecting the same line twice.
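
One way to get two lines in a single pass without the duplicate problem is to keep a small "reservoir" of lines instead of a single selected line. The sketch below is the standard reservoir sampling technique, which is not the code from this post; pickRandomLines() is a hypothetical name, and it assumes the file has at least $k lines (otherwise it simply returns all of them).

```php
<?php
// Sketch only: pickRandomLines() is a hypothetical helper name.
// Picks $k lines at distinct positions in one pass over the file.
function pickRandomLines(string $path, int $k = 2): array
{
    $fh = fopen($path, 'r');
    if ($fh === false) {
        return [];
    }
    $picked = [];
    $seen = 0; // number of lines read before the current one
    while (($line = fgets($fh)) !== false) {
        $line = rtrim($line, "\r\n");
        if ($seen < $k) {
            // The first $k lines fill the reservoir.
            $picked[] = $line;
        } else {
            // Keep the current line with probability $k / ($seen + 1),
            // replacing a uniformly chosen reservoir slot.
            $j = mt_rand(0, $seen);
            if ($j < $k) {
                $picked[$j] = $line;
            }
        }
        $seen++;
    }
    fclose($fh);
    return $picked;
}
```

Only $k lines plus the current one are held in memory, and the line count does not need to be known up front. The two results come from different line positions, though they can still be equal if the file itself contains the same word twice.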

--
Matt

Jul 17 '05 #5
On Tue, 24 May 2005 11:58:44 +0100, Matt Raines wrote:
> Reading a line at a time from the file, just update the selected line if
> floor(mt_rand(0, $currentLineNumber - 1)) is 0.

Will never get beyond first line! And if implemented properly, I don't
believe it's fair (but can't be bothered to do the math, sorry).

Neither was my suggestion by the way; every next number is totally
dependent on the previous one:

$skip = array();
for ( $i = 0; $i < $words; ++$i )
{
    $r = mt_rand( 0, $lines - $words );
    $lines -= $r;
    $skip[] = $r;
}

But it was an easy way to avoid picking the same number twice or more.
Jul 17 '05 #6
On Tue, 24 May 2005, Ewoud Dronkert wrote:
> On Tue, 24 May 2005 11:58:44 +0100, Matt Raines wrote:
>> Reading a line at a time from the file, just update the selected line
>> if floor(mt_rand(0, $currentLineNumber - 1)) is 0.
> Will never get beyond first line!

Perhaps I didn't make it clear that you need to iterate across every line
of the file even if you find a match. The point is that at each iteration
you change your selected line if the call to mt_rand returns 0; on the
first line it should always match. A certain fraction of runs
(1/numberOfLines) will never match again. The others will be distributed
evenly across the remaining lines in the file.

> And if implemented properly, I don't believe it's fair (but can't be
> bothered to do the math, sorry).


Think of it like this: 100% of runs will match on the first line.
<----------1----------->

On the second line, 50% of runs will overwrite the selected line with the
current line.
<----1-----><----2----->

On the third line, a third of runs will overwrite with the current line.
But half of those (a sixth) will not have matched on line 2, and the other
half will.
<--1---><3-><--2---><3->

This works out, when rearranged below, at exactly one third chance of
matching each of the three lines.
<--1---><--2---><--3--->

You can continue to apply this logic for as many lines as you like, but
I'm lazy so I choose to stop here. (note: also the reason I didn't bother
to rejig it to return two lines instead of one :) ).

Disclaimer: I'm pretty sure I didn't come up with this logic. I probably
read it in a book once.

Cheers,
--
Matt
Jul 17 '05 #7
On Tue, 24 May 2005 14:28:44 +0100, Matt Raines wrote:
> Perhaps I didn't make it clear that [...]


No, sorry, my fault. I half expected one thing then didn't read on very
well.

Your solution is nifty, but rather expensive because it calls
mt_rand() or rand() once for every line of the file, especially if every word
chosen requires another complete walk of the file (or concurrent but
different rand() calls). My algorithm requires only one walk of the file
for any number of random words picked, or two if the number of lines is
not known.

What is the best way to pick k numbers from a range of n while optimizing
speed, storage and/or randomness? Maybe:

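$n = 1000;
$k = 2;
$a = range( 0, $n - 1 );
for ( $i = 0; $i < $k; ++$i )
{
    $j = mt_rand( $i, $n - 1 );
    //Swap $a[$i] and $a[$j]. A temporary is needed because an earlier
    //swap may have moved a different value into $a[$i].
    $tmp = $a[$i];
    $a[$i] = $a[$j];
    $a[$j] = $tmp;
}
$b = array_slice( $a, 0, $k );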

(Still just as many calls to mt_rand() as no. of words picked).
Btw, is this for-loop (but then with $i<$n) the way the shuffle function
is implemented?
To prep $b for acting as $skip from my first post:

sort( $b );
//Work backwards so each subtraction still sees the original sorted value
for ( $i = $k - 1; $i > 0; --$i )
    $b[$i] -= $b[$i - 1] + 1;
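
Putting the pieces of this approach together might look like the sketch below. It is an illustration under assumptions, not tested code from the thread: countLines(), pickIndices() and readLinesAt() are hypothetical helper names, and instead of converting the indices to skip counts it compares each line number directly against the next sorted index, which amounts to the same single walk of the file.

```php
<?php
// Sketch only: all three function names are hypothetical helpers.

// First walk: count the lines, holding one line in memory at a time.
function countLines(string $path): int
{
    $n = 0;
    $fh = fopen($path, 'r');
    if ($fh === false) {
        return 0;
    }
    while (fgets($fh) !== false) {
        ++$n;
    }
    fclose($fh);
    return $n;
}

// Pick $k distinct indices from 0..$n-1 with a partial shuffle, sorted ascending.
function pickIndices(int $n, int $k): array
{
    if ($k < 1 || $k > $n) {
        throw new InvalidArgumentException('need 1 <= k <= n');
    }
    $a = range(0, $n - 1);
    for ($i = 0; $i < $k; ++$i) {
        $j = mt_rand($i, $n - 1);
        $tmp = $a[$i];
        $a[$i] = $a[$j];
        $a[$j] = $tmp;
    }
    $b = array_slice($a, 0, $k);
    sort($b);
    return $b;
}

// Second walk: collect the lines at the given sorted, 0-based indices.
function readLinesAt(string $path, array $sortedIndices): array
{
    $out = [];
    $fh = fopen($path, 'r');
    if ($fh === false) {
        return $out;
    }
    $want = 0; // position in $sortedIndices of the next index to collect
    $total = count($sortedIndices);
    for ($lineNo = 0; $want < $total && ($line = fgets($fh)) !== false; ++$lineNo) {
        if ($lineNo === $sortedIndices[$want]) {
            $out[] = rtrim($line, "\r\n");
            ++$want;
        }
    }
    fclose($fh);
    return $out;
}
```

A call such as readLinesAt($fname, pickIndices(countLines($fname), 2)) would then return two words. One trade-off to keep in mind: range() builds an array of $n integers, so for a very large file the index array itself uses a fair amount of memory, even though the file contents are never loaded in full.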
Jul 17 '05 #8
