Reading Words from File

I want to read in lines from a file and then separate the words so I
can do a process on each of the words. Say the text file "readme.txt"
contains the following:

In the face of criticism from the left and right, President Bush
insisted Tuesday that Harriet Miers is the nation's best-qualified
candidate for the Supreme Court and assured skeptical conservatives
that his lawyer...

I could get an input to a char *s such that s = "In" and then I do
something with s, then s = "the" and then I do something with that,
etc., with no idea of the length of any string, line, or run of whitespace.

Here's what I have so far.

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void process(char *s) /* what's here is not really important */
{
    printf("%s", s);
}

int main() {

    char buffer[80];
    FILE *f = fopen("readme.txt", "r");
    char *s;

    while( fgets(buffer, sizeof(buffer), f) != NULL ) /* reads a line */
    {
        while( sscanf(buffer, "%s", s) ) /* scans for words in line */
        {
            process(s); /* do stuff to the words */
        }
    }

    fclose(f);
    return 0;

}

Also, is there any way to adjust the size of the buffer or reallocate
the memory so it doesn't overflow and get a seg error?

Nov 15 '05 #1
"dough" <vi****@gmail.com> wrote in message
news:11*********************@f14g2000cwb.googlegroups.com...
> I want to read in lines from a file and then separate the words so I
> can do a process on each of the words. Say the text file "readme.txt"
> contains the following:
>
> In the face of criticism from the left and right, President Bush
> insisted Tuesday that Harriet Miers is the nation's best-qualified
> candidate for the Supreme Court and assured skeptical conservatives
> that his lawyer...
>
> I could get an input to a char *s such that s = "In" and then I do
> something with s, then s = "the" and then I do something with that,
> etc., with no idea of the length of any string, line, or run of whitespace.


I don't want to be harsh, but it seems to me the 2nd paragraph is off topic
and unwise for a poster looking for help...

Alex
Nov 15 '05 #2
In article <11*********************@f14g2000cwb.googlegroups.com>,
dough <vi****@gmail.com> wrote:
:I want to read in lines from a file and then separate the words so I
:can do a process on each of the words.

There is often a non-trivial semantic problem in deciding what
a "word" is in such matters. For example, in

"Oh!," he yelled (into his Hello-Kitty phone.)

then if you go by whitespace you get "words" such as

"Oh!," and (into and phone.) and Hello-Kitty

which is usually not the breakdown you want.
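
A minimal sketch of the whitespace-only split described above, for illustration. next_token is a hypothetical helper, not something from the original post; it treats any run of non-whitespace characters as a "word", so punctuation stays attached.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns a pointer to the next whitespace-delimited token at or after p
   and stores its length in *len. Returns NULL if only whitespace remains.
   next_token is a hypothetical helper, not part of the thread's code. */
const char *next_token(const char *p, size_t *len)
{
    const char *start;

    while (*p != '\0' && isspace((unsigned char)*p))
        p++;                        /* skip leading whitespace */
    if (*p == '\0')
        return NULL;

    start = p;
    while (*p != '\0' && !isspace((unsigned char)*p))
        p++;                        /* consume the token itself */
    *len = (size_t)(p - start);
    return start;
}
```

Calling it repeatedly with p = next_token(p + len, &len) walks the example sentence in seven tokens, among them `(into` and `phone.)`, which is the breakdown the post warns about.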
--
These .signatures are sold by volume, and not by weight.
Nov 15 '05 #3


dough wrote On 10/04/05 14:39,:
> I want to read in lines from a file and then separate the words so I
> can do a process on each of the words. Say the text file "readme.txt"
> contains the following:
>
> In the face of criticism from the left and right, President Bush
> insisted Tuesday that Harriet Miers is the nation's best-qualified
> candidate for the Supreme Court and assured skeptical conservatives
> that his lawyer...
>
> I could get an input to a char *s such that s = "In" and then I do
> something with s, then s = "the" and then I do something with that,
> etc., with no idea of the length of any string, line, or run of whitespace.
>
> Here's what I have so far.
>
> #include <ctype.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
>
> void process(char *s) /* what's here is not really important */
> {
>     printf("%s", s);
> }
>
> int main() {
>
>     char buffer[80];
>     FILE *f = fopen("readme.txt", "r");
>     char *s;

It would be a good idea to test `f == NULL' before
proceeding ...

>     while( fgets(buffer, sizeof(buffer), f) != NULL ) /* reads a line */
>     {
>         while( sscanf(buffer, "%s", s) ) /* scans for words in line */

Here's a problem: `s' doesn't point to anything, so
when scanf() locates a word and tries to copy it to the
memory `s' points at, all manner of mischief can ensue.

>         {
>             process(s); /* do stuff to the words */
>         }
>     }
>
>     fclose(f);
>     return 0;
>
> }
> Also, is there any way to adjust the size of the buffer or reallocate
> the memory so it doesn't overflow and get a seg error?


If you used malloc() to create the space for `buffer', you
could use realloc() to enlarge it. But the immediate problem
is not the size of `buffer', but the uninitialized `s'.

Your overall task sounds like a job for the much-maligned
strtok() function. However, see Walter Roberson's post for
some of the pitfalls of using simple string-bashing to separate
"words" from their surroundings.

--
Er*********@sun.com
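
As an illustration of the strtok() route mentioned in this reply, here is a small sketch. split_on_whitespace and its parameters are assumptions made for the example. Note that strtok() writes '\0' bytes into the buffer it scans and keeps hidden state between calls, which is part of why it is often criticised.

```c
#include <stddef.h>
#include <string.h>

/* Splits line in place on whitespace and stores up to max token pointers
   in out. Returns the number of tokens stored.
   split_on_whitespace is a hypothetical helper, not from the thread. */
size_t split_on_whitespace(char *line, char **out, size_t max)
{
    static const char delims[] = " \t\r\n";
    size_t n = 0;
    char *tok = strtok(line, delims);   /* first call passes the buffer */

    while (tok != NULL && n < max) {
        out[n++] = tok;
        tok = strtok(NULL, delims);     /* later calls pass NULL */
    }
    return n;
}
```

The pointers in out refer into line itself, so the line buffer has to stay alive (and unmodified) for as long as the words are in use.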

Nov 15 '05 #4
Walter Roberson <ro******@ibd.nrc-cnrc.gc.ca> wrote:
> There is often a non-trivial semantic problem in deciding what
> a "word" is in such matters. For example, in
>
> "Oh!," he yelled (into his Hello-Kitty phone.)


I must say that that is a truly bizarre example sentence :-) That
aside, it seems to me that assuming a "word" is a sequence of
consecutive alpha characters would yield better results, at least
depending on what OP wants to do with the "words" once he has them.
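
A rough sketch of the "consecutive alpha characters" rule, assuming the C locale's idea of a letter. next_alpha_word is a hypothetical name introduced only for this example.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns a pointer to the next run of alphabetic characters at or after p
   and stores its length in *len. Returns NULL if no letters remain.
   next_alpha_word is a hypothetical helper, not part of the thread's code. */
const char *next_alpha_word(const char *p, size_t *len)
{
    const char *start;

    while (*p != '\0' && !isalpha((unsigned char)*p))
        p++;                        /* skip punctuation, digits, whitespace */
    if (*p == '\0')
        return NULL;

    start = p;
    while (isalpha((unsigned char)*p))
        p++;                        /* consume the letters */
    *len = (size_t)(p - start);
    return start;
}
```

On the example sentence this gives eight words (Oh, he, yelled, into, his, Hello, Kitty, phone): the punctuation is gone, but Hello-Kitty is now split in two.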

--
Christopher Benson-Manica | I *should* know what I'm talking about - if I
ataru(at)cyberspace.org | don't, I need to know. Flames welcome.
Nov 15 '05 #5
dough wrote:
> I want to read in lines from a file and then separate the words so I
> can do a process on each of the words.

.......use strtok() function to split a string into words (use
whitespace or any other separator you want)

> char buffer[80];
> FILE *f = fopen("readme.txt", "r");
> while( fgets(buffer, sizeof(buffer), f) != NULL ) /* reads a line */
>
> Also, is there any way to adjust the size of the buffer or reallocate
> the memory so it doesn't overflow and get a seg error?

........the fgets statement reads until num-1 characters are read (in
this case 79) or a newline or EOF is reached (whichever happens first).
So I don't think you need a realloc in this case.
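
To make the fgets() behaviour concrete, here is a sketch (read_chunk is a hypothetical name). fgets() never writes past the buffer, but a line longer than the buffer arrives in several pieces, so a word can still be cut at a chunk boundary.

```c
#include <stdio.h>
#include <string.h>

/* Reads one chunk of at most size-1 characters with fgets().
   Returns -1 at end of file or on a read error,
            1 if the chunk ends with '\n' (the line is complete),
            0 if no newline was read (the line is longer than the buffer,
              or the file's last line lacks a newline).
   read_chunk is a hypothetical helper, not part of the thread's code. */
int read_chunk(FILE *f, char *buf, size_t size)
{
    size_t len;

    if (fgets(buf, (int)size, f) == NULL)
        return -1;
    len = strlen(buf);
    return len > 0 && buf[len - 1] == '\n';
}
```

A return value of 0 is the signal that the caller either has to keep reading and join the pieces, or accept that a word may have been split.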
HTH,
Hemanth

Nov 15 '05 #6
dough wrote:
> I want to read in lines from a file and then separate the words so I
> can do a process on each of the words. Say the text file "readme.txt"
> contains the following:
>
> In the face of criticism from the left and right, President Bush
> insisted Tuesday that Harriet Miers is the nation's best-qualified
> candidate for the Supreme Court and assured skeptical conservatives
> that his lawyer...
>
> I could get an input to a char *s such that s = "In" and then I do
> something with s, then s = "the" and then I do something with that,
> etc., with no idea of the length of any string, line, or run of whitespace.
I am not sure what your problem is.
When you have a problem, please help us help you:
State what you want to achieve (this part seems clear) and
what about your solution did not work.
Otherwise, everyone tells you about A because you seemed to
ask for B while meaning C...

> Here's what I have so far.
>
> #include <ctype.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
>
> void process(char *s) /* what's here is not really important */
> {
>     printf("%s", s);
> }
>
> int main() {
>
>     char buffer[80];
>     FILE *f = fopen("readme.txt", "r");
>     char *s;
Check whether f is != NULL. If you omitted the check for
brevity, then write a comment.
>     while( fgets(buffer, sizeof(buffer), f) != NULL ) /* reads a line */
>     {
>         while( sscanf(buffer, "%s", s) ) /* scans for words in line */
>         {
>             process(s); /* do stuff to the words */
>         }
>     }
Okay, so what is the problem here? About everything:
1) you may inadvertently separate a word if your buffer is not
long enough (uncritical)
2) You scan always from the same position (buffer is effectively &buffer[0])
3) You read your string into memory pointed to by an uninitialized pointer.

Consider
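    char s[sizeof buffer] = "", *tmp = NULL;
    int used = 0; /* characters consumed by one sscanf() call */
    while (....)
    {
        tmp = buffer;
        /* sscanf() returns EOF, not 0, at the end of the line, so test for 1 */
        while ( sscanf(tmp, "%s%n", s, &used) == 1 )
        {
            process(s);
            tmp += used; /* skips the leading whitespace as well as the word */
        }
        /* a */
    }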
This solves 2) and 3).
Another solution is the use of strtok() etc.

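If you check at point "a" whether buffer[strlen(buffer)-1]=='\n',
then you can also detect instances of 1): if the newline is missing,
the line did not fit into buffer (or it is the last line of a file
that lacks a final newline).
However, this may not be what you are looking for (see below)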

>     fclose(f);
>     return 0;
>
> }
>
> Also, is there any way to adjust the size of the buffer or reallocate
> the memory so it doesn't overflow and get a seg error?


realloc() helps you do that.
Have a look at the comp.lang.c archives to see how to use it.
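
For readers without the archives to hand, one common shape of the realloc() idiom is sketched here. read_line and its starting capacity of 80 are assumptions for illustration, and the sketch assumes the input contains no embedded '\0' bytes.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads one complete line of any length from f into a malloc'd string.
   Returns NULL at end of file or if an allocation fails; the caller
   must free() the result. read_line is a hypothetical helper. */
char *read_line(FILE *f)
{
    size_t cap = 80, len = 0;
    char *line = malloc(cap);

    if (line == NULL)
        return NULL;

    while (fgets(line + len, (int)(cap - len), f) != NULL) {
        char *tmp;

        len += strlen(line + len);
        if (len > 0 && line[len - 1] == '\n')
            return line;            /* the whole line has been read */

        /* No newline yet: double the buffer. Assigning realloc()'s result
           to tmp first keeps the old block reachable if it fails. */
        tmp = realloc(line, cap * 2);
        if (tmp == NULL) {
            free(line);
            return NULL;
        }
        line = tmp;
        cap *= 2;
    }

    if (len == 0) {                 /* nothing read: end of file or error */
        free(line);
        return NULL;
    }
    return line;                    /* last line, without a trailing '\n' */
}
```

The doubling strategy is a design choice, not a requirement; growing by a fixed amount works too, at the cost of more realloc() calls on very long lines.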

If you do not need the words in context, you can also use getc(), which
may be clearer:

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

#define START_BUFSIZE 20

void process(const char *s);
int resize_buffer (char **buf, size_t *len);

int main (void)
{
    FILE *f;
    char *s = NULL;
    size_t length = 0;
    int input;

    if (NULL == (f = fopen("readme.txt", "r")))
    {
        fprintf(stderr, "Cannot open file\n");
        exit(EXIT_FAILURE);
    }
    if (NULL == (s = malloc((START_BUFSIZE+1) * sizeof *s)))
    {
        fprintf(stderr, "Error on allocating memory for s\n");
        fclose(f);
        exit(EXIT_FAILURE);
    }
    length = START_BUFSIZE;

    do /* ... while (input != EOF) */
    {
        size_t curr = 0;

        /* Read up to the first whitespace */
        while (!isspace(input = getc(f)) && input != EOF)
        {
            s[curr++] = input;
            if (curr == length)
            {
                if (resize_buffer(&s, &length))
                {
                    /* perform error handling */
                    break;
                }
            }
        }
        /* Make s a string */
        s[curr] = '\0';

        if (curr)
            process(s);

        /* Read up to the first non-whitespace */
        while ((input = getc(f)) != EOF)
        {
            putchar('*');
            if (!isspace(input))
            {
                ungetc(input, f);
                break;
            }
        }
    } while (input != EOF);

    free(s);
    fclose(f);

    putchar('\n');

    return 0;
}

void process(const char *s) /* what's here is not really important */
{
    printf("%s", s); fflush (stdout);
}

int resize_buffer (char **buf, size_t *len)
{
    /* Using mybuf and mylen for readability */
    char *mybuf = *buf;
    size_t mylen = *len;

    char *tmp;
    size_t destlen = 2*mylen+1;

    /* A */
    if (NULL == (tmp = realloc(mybuf, destlen)))
    {
        return 1;
    }
    mybuf = tmp;
    mylen = destlen - 1;

    /* write back to parameters */
    *buf = mybuf;
    *len = mylen;

    return 0;
}
Cheers
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address.
Nov 15 '05 #7
In article <dh**********@chessie.cirr.com>,
Christopher Benson-Manica <at***@nospam.cyberspace.org> wrote:
:Walter Roberson <ro******@ibd.nrc-cnrc.gc.ca> wrote:
:> There is often a non-trivial semantic problem in deciding what
:> a "word" is in such matters.

:aside, it seems to me that assuming a "word" is a sequence of
:consecutive alpha characters would yield better results, at least
:depending on what OP wants to do with the "words" once he has them.


Using "alpha" as the boundary definition runs into difficulties
with possessives, contractions, joined-words, and words such as
re-enter in which the dash indicates separation of vowels that
would otherwise form a diphthong. It would likely also run
into problems with Mr. Salutation, and abbreviations such as etc.
in which the period is really part of the word.
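
One possible refinement, sketched under the assumption that an apostrophe or hyphen counts as part of a word only when it has a letter on both sides. word_length is a hypothetical helper, and it still does nothing for periods in "Mr." or "etc.", which need knowledge of the language rather than character classes.

```c
#include <ctype.h>
#include <stddef.h>

/* Length of the word starting at p, where a word is a run of letters that
   may contain single apostrophes or hyphens flanked by letters.
   Returns 0 if p does not start with a letter.
   word_length is a hypothetical helper, not part of the thread's code. */
size_t word_length(const char *p)
{
    size_t n = 0;

    while (p[n] != '\0') {
        unsigned char c = (unsigned char)p[n];

        if (isalpha(c)) {
            n++;
        } else if ((c == '\'' || c == '-') && n > 0 &&
                   isalpha((unsigned char)p[n + 1])) {
            n += 2;     /* keep the joiner and the letter that follows it */
        } else {
            break;      /* anything else ends the word */
        }
    }
    return n;
}
```

With this rule "nation's", "best-qualified" and "re-enter" each stay in one piece, while the trailing punctuation in "phone.)" is still dropped.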
--
Okay, buzzwords only. Two syllables, tops. -- Laurie Anderson
Nov 15 '05 #8


Christopher Benson-Manica wrote On 10/04/05 15:50,:
> Walter Roberson <ro******@ibd.nrc-cnrc.gc.ca> wrote:
>
> > There is often a non-trivial semantic problem in deciding what
> > a "word" is in such matters. For example, in
> >
> > "Oh!," he yelled (into his Hello-Kitty phone.)
>
> I must say that that is a truly bizarre example sentence :-) That
> aside, it seems to me that assuming a "word" is a sequence of
> consecutive alpha characters would yield better results, at least
> depending on what OP wants to do with the "words" once he has them.


This is a reasonable 1st approximation, but its tendency
to generate non-words (e.g., "st") isn't desirable.

--
Er*********@sun.com
Nov 15 '05 #9

"dough" <vi****@gmail.com> wrote in message
news:11*********************@f14g2000cwb.googlegroups.com...
> I want to read in lines from a file and then separate the words so I
> can do a process on each of the words. Say the text file "readme.txt"
> contains the following:
>
> In the face of criticism from the left and right, President Bush
> insisted Tuesday that Harriet Miers is the nation's best-qualified
> candidate for the Supreme Court and assured skeptical conservatives
> that his lawyer...
>
> I could get an input to a char *s such that s = "In" and then I do
> something with s, then s = "the" and then I do something with that,
> etc., with no idea of the length of any string, line, or run of whitespace.
>
> Here's what I have so far.
>
> #include <ctype.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
>
> void process(char *s) /* what's here is not really important */
> {
>     printf("%s", s);
> }
>
> int main() {
>
>     char buffer[80];
>     FILE *f = fopen("readme.txt", "r");
>     char *s;
>
>     while( fgets(buffer, sizeof(buffer), f) != NULL ) /* reads a line */
>     {
>         while( sscanf(buffer, "%s", s) ) /* scans for words in line */
>         {
>             process(s); /* do stuff to the words */
>         }
>     }
>
>     fclose(f);
>     return 0;
>
> }
>
> Also, is there any way to adjust the size of the buffer or reallocate
> the memory so it doesn't overflow and get a seg error?


"process" is a terrible name for a function in any context.

Barry
Nov 15 '05 #10
"Michael Mair" <Mi**********@invalid.invalid> wrote in message
news:3q************@individual.net...
> dough wrote:
> > I want to read in lines from a file and then separate the words so I
> > can do a process on each of the words. Say the text file "readme.txt"
> > contains the following:

Interesting. No one has ever thought of doing that before. Where did you
come up with such a great idea for a program? It's unlike anything I've
ever heard of...

> > Also, is there any way to adjust the size of the buffer or reallocate
> > the memory so it doesn't overflow and get a seg error?
> realloc() helps you do that.
> Have a look at the comp.lang.c archives to see how to use it.

That would be like studying. If he wanted to study he would go to
school.

> If you do not need the words in context, you can also use getc(), which
> may be clearer:


<Homework answers snipped>

Nice job you get him an A-.

--
Mabden
Nov 15 '05 #11
On 4 Oct 2005 11:39:39 -0700, "dough" <vi****@gmail.com> wrote:
> I want to read in lines from a file and then separate the words so I
> can do a process on each of the words. Say the text file "readme.txt"
> contains the following:

It would be nice if you mentioned what your problem was.

snip

> Here's what I have so far.
>
> #include <ctype.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
>
> void process(char *s) /* what's here is not really important */
> {
>     printf("%s", s);
> }
>
> int main() {
>
>     char buffer[80];
>     FILE *f = fopen("readme.txt", "r");
>     char *s;
>
>     while( fgets(buffer, sizeof(buffer), f) != NULL ) /* reads a line */
>     {
>         while( sscanf(buffer, "%s", s) ) /* scans for words in line */

s doesn't point anywhere sscanf can write to. This invokes undefined
behavior.

>         {
>             process(s); /* do stuff to the words */
>         }
>     }
>
>     fclose(f);
>     return 0;
>
> }
>
> Also, is there any way to adjust the size of the buffer or reallocate
> the memory so it doesn't overflow and get a seg error?


The seg error you experience has nothing to do with buffer, since you
never overflow it. It has everything to do with failing to have s
point somewhere.
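
A small sketch of what "having s point somewhere" can look like. next_word is a hypothetical helper, and the 80-byte word buffer with the matching %79s width is an assumption chosen to mirror the original buffer size.

```c
#include <stdio.h>

/* Copies the next whitespace-delimited word from *cursor into word, which
   must have room for at least 80 bytes, and advances *cursor past it.
   Returns 1 if a word was copied, 0 when no words remain.
   next_word is a hypothetical helper, not part of the thread's code. */
int next_word(const char **cursor, char *word)
{
    int used = 0;

    /* %79s limits the copy to 79 characters plus the terminating '\0';
       %n stores how many characters of the input were consumed. */
    if (sscanf(*cursor, "%79s%n", word, &used) != 1)
        return 0;       /* covers both 0 and EOF */

    *cursor += used;
    return 1;
}
```

In the original program this would be used with real storage, for example `char word[80]; const char *p = buffer; while (next_word(&p, word)) process(word);`, so that sscanf() always writes into memory the program owns.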
<<Remove the del for email>>
Nov 15 '05 #12
Mabden wrote:
> "Michael Mair" <Mi**********@invalid.invalid> wrote in message
> news:3q************@individual.net...

[snip]

> > If you do not need the words in context, you can also use getc(), which
> > may be clearer:
>
> <Homework answers snipped>
>
> Nice job you get him an A-.


The original message was not too obviously a homework question
to me and contained a first shot at the problem, so I decided
to give the OP the benefit of doubt. If "dough" posts something
like that again or does not respond to the answer he or she got
in this thread, I won't.
Cheers
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address.
Nov 15 '05 #13
