473,320 Members | 1,722 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,320 software developers and data experts.

need help ...

I need a function that reads from a txt file and randomly chooses a line
from which retrieves a string (with spaces) and returns it to main function.

thx
Nov 14 '05 #1
14 1436
Nikola writes:
I need a function that reads from a txt file and randomly chooses a line
from which retrieves a string (with spaces) and returns it to main

function.

I assume this is a student problem. Try breaking the problem down. You can
not choose a random line until you know how many lines there are. So you
must start by counting the lines. To count the lines you will have to read
the lines. To read the lines you will have to open the file. To open the
file you must know the file name. And so on. Writing things down in some
form can be a great help. The preferred method is called pseudocode but any
form at all is better than just fussing and flailing.
Nov 14 '05 #2
in comp.lang.c i read:
I need a function that reads from a txt file and randomly chooses a line
from which retrieves a string (with spaces) and returns it to main function.


i bet you do. when is your homework due?

--
a signature
Nov 14 '05 #3
osmium wrote:
Nikola writes:

I need a function that reads from a txt file and randomly chooses a line
from which retrieves a string (with spaces) and returns it to main


function.

I assume this is a student problem. Try breaking the problem down. You can
not choose a random line until you know how many lines there are.


There are perfectly good ways to choose a line at random
without knowing in advance how many there are. In fact, you can
choose a random sample of N lines without advance knowledge; the
case N==1 is particularly easy, but N>1 isn't difficult. See
Knuth "The Art of Computer Programming, Volume II: Seminumerical
Algorithms," section 3.4.2, Algorithm R.

(Well, I guess you do in fact need *some* advance knowledge.
Specifically, you need to know that the source of random numbers
can deliver a different value for every line in the file, without
repetitions. A PRNG that will deliver two billion distinct
"good" numbers is easy to write in portable C; four billion is
also easy at some small sacrifice of "goodness." So the advance
knowledge amounts to knowing that the file holds fewer than four
billion lines; I think this may be considered axiomatic in the
context of a simple homework assignment.)

--
Er*********@sun.com

Nov 14 '05 #4
No, it's not a home work and yes, I am a begginer (weren't we all??) and
trying to do a little code of my own - was told that ambition is a virtue
:-)

Nov 14 '05 #5

On Fri, 14 May 2004, Eric Sosman wrote:

osmium wrote:
Nikola writes:
I need a function that reads from a txt file and randomly chooses a line
from which retrieves a string (with spaces) and returns it to main
function.
<snip> You cannot choose a random line until you know how many lines there are.
There are perfectly good ways to choose a line at random
without knowing in advance how many there are.

<snip> (Well, I guess you do in fact need *some* advance knowledge.
Specifically, you need to know that the source of random numbers
can deliver a different value for every line in the file, without
repetitions.


...Or, I think more generally, you can be satisfied with a RNG
that will return an unbiased random number in the range 1..n, for
every n in the range 1..N where N is the number of lines in the
file. That may be what you were getting at; after all, if the
RNG can deliver a number in the range 1..N, then in some sense it
must be able to deliver N different numbers "without repetition."
Of course, it doesn't need to generate its random numbers without
repeating itself --- that would be a pretty bad RNG! ;)

Okay, enough hints from me now. :)

-Arthur

Nov 14 '05 #6
Nikola wrote:

No, it's not a home work and yes, I am a begginer (weren't we all??) and
trying to do a little code of my own - was told that ambition is a virtue
:-)


Then you need to show us this "little code". Just giving us the problem
spec and requesting an answer doesn't really display much in the way of
ambition.

What have you figured out so far? What is your analysis of the problem?
Do you know how to open a file? Read out a line? How would you choose a
random line?

Brian Rodenborn
Nov 14 '05 #7
> What have you figured out so far? What is your analysis of the problem?
Do you know how to open a file? Read out a line? How would you choose a
random line?


Well, nothing anyone said didn't help so I kept trying ...
If there are some noncence-mistakes in my code that no self-respectfull
programmer would make
keep in mind that I'm just 19, meaning, a begginer
this code should read the random line of an file
commented parts were just a unsuccessfull toughts....
it's not in english this is the translation of some words:

znak-character
br_redaka-number of lines
brojac-counter
pojam-randomly chosen line (string)

this is ment to be a function that returns a string ...

FILE *dat;
char pojam[50],c,znak;
int br_redaka=1,brojac=1,rand_br;
dat=fopen("film.txt","r");
while(c!=char(EOF))
{
c=getc(dat);
if(c==char(10)) //10 - ascii for new line
br_redaka=br_redaka+1;
}
//generiranje nasumicnog broja od 0 do broja redaka //moguci BUG zbog
nule!
srand(time(NULL));
rand_br=rand() % br_redaka;
c='b';
while(c!=EOF)
{
/*c=getc(dat);
if(c==char(10)&&brojac==rand_br) //10-ascii kod za novi redak
{
brojac=brojac+1; //rand_br od 0 do br_redaka
fgets(pojam,50,dat);
}*/
c=getc(dat);
if(c==char(10)&&brojac==rand_br)
{
brojac=brojac+1;
while(znak!='\\')
//strcat(pojam,znak);
pojam[brojac-2]=c;

}
}
printf("%s",pojam);


Nov 14 '05 #8
Arthur J. O'Dwyer wrote:
On Fri, 14 May 2004, Eric Sosman wrote:
osmium wrote:
Nikola writes:
I need a function that reads from a txt file and randomly chooses a line
(Well, I guess you do in fact need *some* advance knowledge.
Specifically, you need to know that the source of random numbers
can deliver a different value for every line in the file, without
repetitions.


...Or, I think more generally, you can be satisfied with a RNG
that will return an unbiased random number in the range 1..n, for
every n in the range 1..N where N is the number of lines in the
file. That may be what you were getting at; after all, if the
RNG can deliver a number in the range 1..N, then in some sense it
must be able to deliver N different numbers "without repetition."


The algorithm I'm thinking of actually needs N distinct
values. Simply knowing that the range of generated numbers
contains more than N values is not good enough, nor is knowing
that the period of the PRNG exceeds N. You actually need a
"no duplicates among the first N values" guarantee.

At the risk of becoming almost topical: rand() itself lacks
such a guarantee, but it would be possible to write a wrapper
around rand() to get this behavior.
Of course, it doesn't need to generate its random numbers without
repeating itself --- that would be a pretty bad RNG! ;)


I don't understand what this means.

--
Er*********@sun.com

Nov 14 '05 #9
On Fri, 14 May 2004 19:38:17 +0200, "Nikola" <az*****@inet.hr> wrote:
Well, nothing anyone said didn't help so I kept trying ...
If there are some noncence-mistakes in my code that no self-respectfull
programmer would make
keep in mind that I'm just 19, meaning, a begginer
this code should read the random line of an file
commented parts were just a unsuccessfull toughts....
it's not in english this is the translation of some words:

znak-character
br_redaka-number of lines
brojac-counter
pojam-randomly chosen line (string)

this is ment to be a function that returns a string ...

FILE *dat;
char pojam[50],c,znak;
int br_redaka=1,brojac=1,rand_br;
dat=fopen("film.txt","r");
while(c!=char(EOF))
I see three problems.

Firstly char(EOF) is a syntax error.
Secondly getc() returns an int, not char.
Thirdly object c is used before it has been initialised.

Also testing for EOF before calling getc() is bad style. It
results in the loop executing one too many times, although
in your case the result of this is harmless. It is better
to write this loop as:

while (c=getc(dat), c!=EOF)
{
/* ... */
}

This also ensures that object c is initialised before the
comparision.
{
c=getc(dat);
if(c==char(10)) //10 - ascii for new line
This is also bad style. It needlessly restricts your program to
only working on ASCII implementations. Since the file is open
in text mode you can write:

if (c=='\n')
br_redaka=br_redaka+1;
}
//generiranje nasumicnog broja od 0 do broja redaka //moguci BUG zbog
nule!
srand(time(NULL));
rand_br=rand() % br_redaka;
See Q13.16 in the FAQ:

http://www.eskimo.com/~scs/C-faq/q13.16.html

You also need to reset the file pointer here:

fseek( dat, 0, SEEK_SET );
c='b';
while(c!=EOF)
{
/*c=getc(dat);
if(c==char(10)&&brojac==rand_br) //10-ascii kod za novi redak
{
brojac=brojac+1; //rand_br od 0 do br_redaka
fgets(pojam,50,dat);
}*/
c=getc(dat);
if(c==char(10)&&brojac==rand_br)
{
brojac=brojac+1;
while(znak!='\\')
//strcat(pojam,znak);
pojam[brojac-2]=c;
This while loop would appear to be infinite. I don't think it's
even needed.
}
}
You need to null terminate the char array to turn it into a valid
string for display:

projam[brojac-1] = '\0';
printf("%s",pojam);


Nick.
Nov 14 '05 #10

In article <40**********@sun.com>, Eric Sosman <Er*********@sun.com> writes:

The algorithm I'm thinking of actually needs N distinct
values. Simply knowing that the range of generated numbers
contains more than N values is not good enough, nor is knowing
that the period of the PRNG exceeds N. You actually need a
"no duplicates among the first N values" guarantee.


The "reservoir sampling" algorithm, on the other hand, doesn't need
this guarantee. It just needs an unbiased PRNG (and since one can
be constructed from a biased PRNG, that's pretty easy to do). There
was a nice writeup on RS in _Dr Dobb's Journal_ a while back, but
it's not available free at the DDJ site. No doubt it could be found
on Google with little trouble.

I don't have Knuth v2 here to compare the algorithm you mention with
RS.

--
Michael Wojcik mi************@microfocus.com

Vinegar keeps more flies away than honey does.
Nov 14 '05 #11
Michael Wojcik wrote:
In article <40**********@sun.com>, Eric Sosman <Er*********@sun.com> writes:
The algorithm I'm thinking of actually needs N distinct
values. Simply knowing that the range of generated numbers
contains more than N values is not good enough, nor is knowing
that the period of the PRNG exceeds N. You actually need a
"no duplicates among the first N values" guarantee.

The "reservoir sampling" algorithm, on the other hand, doesn't need
this guarantee. It just needs an unbiased PRNG (and since one can
be constructed from a biased PRNG, that's pretty easy to do). There
was a nice writeup on RS in _Dr Dobb's Journal_ a while back, but
it's not available free at the DDJ site. No doubt it could be found
on Google with little trouble.

I don't have Knuth v2 here to compare the algorithm you mention with
RS.


Knuth's description of RS requires distinct values.
Each item enters the reservoir iff its associated random
value is greater than the smallest already present (and
bumps the smallest out). If equality occurs an ambiguity
arises, and it's not clear how to resolve the ambiguity
without introducing a bias.

I don't have DDJ here to compare the algorithm you
mention with RS ;-)

--
Er*********@sun.com

Nov 14 '05 #12

On Fri, 14 May 2004, Eric Sosman wrote:

Arthur J. O'Dwyer wrote:
On Fri, 14 May 2004, Eric Sosman wrote:
osmium wrote:
Nikola writes:

>I need a function that reads from a txt file and randomly chooses a line
(Well, I guess you do in fact need *some* advance knowledge.
Specifically, you need to know that the source of random numbers
can deliver a different value for every line in the file, without
repetitions.


...Or, I think more generally, you can be satisfied with a RNG
that will return an unbiased random number in the range 1..n, for
every n in the range 1..N where N is the number of lines in the
file. The algorithm I'm thinking of actually needs N distinct
values. [...]


What's wrong with:

N := 0
Line := ""
for each line L in file
if (Rand in [0..N] == 0) then
Line := L
print Line

Simple, easy to code, requires an unbiased RNG that can do random
numbers in the range [0..N] for all N up to the number of lines
in the file.

Of course, it doesn't need to generate its random numbers without
repeating itself --- that would be a pretty bad RNG! ;)


I don't understand what this means.


Given an RNG with that property, generating numbers between 0 and
65535, and being able to see the first 60000 outputs or so, I'd have
a pretty good edge on predicting the next 5000 outputs. :) It's good
for whatever application you're thinking of, but it would not be a good
idea to make a 'rand()' implementation with this property. ;-)

-Arthur
Nov 14 '05 #13
Arthur J. O'Dwyer wrote:
On Fri, 14 May 2004, Eric Sosman wrote:
The algorithm I'm thinking of actually needs N distinct
values. [...]

What's wrong with:

N := 0
Line := ""
for each line L in file
if (Rand in [0..N] == 0) then
Line := L
print Line

Simple, easy to code, requires an unbiased RNG that can do random
numbers in the range [0..N] for all N up to the number of lines
in the file.


I'm either failing to understand your pseudocode, or
something's missing. The way I read the above, it just
prints the file's final line, because `Rand in [0..0]'
is (I'm guessing) always equal to zero. Is there supposed
to be some adjustment of `N' along the way?

It's also possible you and I are attacking slightly
different problems. I'm trying to choose an unbiased
sample of K lines (K == 1 in the O.P.'s question) from a
file containing N lines (assuming N >= K and N <= some
very large number depending on the PRNG implementation),
in one pass over the file and without foreknowledge of N.

--
Er*********@sun.com

Nov 14 '05 #14

In article <40**************@sun.com>, Eric Sosman <Er*********@sun.com> writes:
Michael Wojcik wrote:

The "reservoir sampling" algorithm, on the other hand, doesn't need
this guarantee. It just needs an unbiased PRNG (and since one can
be constructed from a biased PRNG, that's pretty easy to do). There
was a nice writeup on RS in _Dr Dobb's Journal_ a while back, but
it's not available free at the DDJ site. No doubt it could be found
on Google with little trouble.

I don't have Knuth v2 here to compare the algorithm you mention with
RS.
Knuth's description of RS requires distinct values.
Each item enters the reservoir iff its associated random
value is greater than the smallest already present (and
bumps the smallest out). If equality occurs an ambiguity
arises, and it's not clear how to resolve the ambiguity
without introducing a bias.


I figured it must be something like that, based on the requirement
you stated.
I don't have DDJ here to compare the algorithm you
mention with RS ;-)


I really should dig out the issue in question to provide a complete
description, but basically what it does is:

- Assume the domain being sampled contains at least as many
items (N) as the sample size (M). N is not known ahead of time.

- Populate the reservoir with the first M items from the domain

- Each remaining item is tested as a candidate for inclusion by
generating a random integer value between 0 and I-1, where I is the
total number of items read thus far. If the random value is < M, the
item in that slot in the reservoir is replaced by the current item.
(I hope that's right...)

Items read from the domain early on have a high probability of
entering the reservoir (it's 1 for the first M items), but they
undergo more tests and so are more likely to be bumped than items
that are added later.

The final item has exactly M in N (ie M/N) probability of being
added to the reservoir, since it undergoes only its initial test
for candidacy, and when it does I == N.

For all other items, the cumulative probability that they'll get
into the reservoir, and stay there, also works out to M/N.

For example:

Domain series {a,b,c,d}, reservoir size M=1.

1. a gets into the reservoir with probability 1.
2. a survives b's test with probability 1/2.
3. a survives c's test with probability 2/3 (c has a 1/3 chance).
4. a survives d's test with probability 3/4.

a's cumulative chance: (1)(1/2)(2/3)(3/4) = 6/24 = 1/4.

This algorithm just requires that you have an unbiased generator
that can provide intergers in the range [0,N-1].

--
Michael Wojcik mi************@microfocus.com

The penance was not building the field and bringing back Shoeless Joe
Jackson, but rather tossing on the field with his father. -- Kevin Aug
Nov 14 '05 #15

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

0
by: Sofia | last post by:
My name is Sofia and I have for many years been running a personals site, together with my partner, on a non-profit basis. The site is currently not running due to us emigrating, but during its...
6
by: Robert Maas, see http://tinyurl.com/uh3t | last post by:
System login message says PHP is available, so I tried this: http://www.rawbw.com/~rem/HelloPlus/h.php It doesn't work at all. Browser just shows the source. What am I doing wrong?
0
by: Gregory Nans | last post by:
hello, i need some help to 'tree-ify' a string... for example i have strings such as : s = """A(here 's , B(A ) silly test) C(to show D(what kind) of stuff i need))""" and i need to...
7
by: Mike Kamermans | last post by:
I hope someone can help me, because what I'm going through at the moment trying to edit XML documents is enough to make me want to never edit XML again. I'm looking for an XML editor that has a...
8
by: JustSomeGuy | last post by:
I need to write an new class derived from the list class. This class stores data in the list to the disk if an object that is added to the list is over 1K in size. What methods of the std stl...
3
by: Bob.Henkel | last post by:
I write this to tell you why we won't use postgresql even though we wish we could at a large company. Don't get me wrong I love postgresql in many ways and for many reasons , but fact is fact. If...
2
by: Michael R. Pierotti | last post by:
Dim reg As New Regex("^\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}$") Dim m As Match = reg.Match(txtIPAddress.Text) If m.Success Then 'No need to do anything here Else MessageBox.Show("You need to enter a...
8
by: skumar434 | last post by:
i need to store the data from a data base in to structure .............the problem is like this ....suppose there is a data base which stores the sequence no and item type etc ...but i need only...
11
by: Alan Mailer | last post by:
A project I'm working on is going to use VB6 as a front end. The back end is going to be pre-existing MS Access 2002 database tables which already have records in them *but do not have any...
0
by: U S Contractors Offering Service A Non-profit | last post by:
Brilliant technology helping those most in need Inbox Reply U S Contractors Offering Service A Non-profit show details 10:37 pm (1 hour ago) Brilliant technology helping those most in need ...
0
by: ryjfgjl | last post by:
ExcelToDatabase: batch import excel into database automatically...
0
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
0
by: ArrayDB | last post by:
The error message I've encountered is; ERROR:root:Error generating model response: exception: access violation writing 0x0000000000005140, which seems to be indicative of an access violation...
1
by: PapaRatzi | last post by:
Hello, I am teaching myself MS Access forms design and Visual Basic. I've created a table to capture a list of Top 30 singles and forms to capture new entries. The final step is a form (unbound)...
1
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
1
by: Defcon1945 | last post by:
I'm trying to learn Python using Pycharm but import shutil doesn't work
0
by: af34tf | last post by:
Hi Guys, I have a domain whose name is BytesLimited.com, and I want to sell it. Does anyone know about platforms that allow me to list my domain in auction for free. Thank you
0
by: Faith0G | last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome former...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.