Count total no. of characters,words & sentences in a text file

Please try to do it while I try myself!

Apr 23 '07 #1
Umesh said:
Please try to do it while I try myself!
Look what happened the last time you cross-posted to clc and clc++.

Pick a language - any language - but pick ONE language, and post to the
newsgroup appropriate to that language.

The problem to which you refer is one of the easier K&R examples, and
should cause you no difficulty.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at the above domain, - www.
Apr 23 '07 #2
"Umesh" wrote:
Please try to do it while I try myself!
Men don't talk much about doing "it".

Doing the counts you mention would probably be easier if you wrote three
separate functions, that way you only have one train of thought at a time.

Apr 23 '07 #3
Umesh wrote:
Please try to do it while I try myself!
Quit cross-posting between groups. I don't think you will get much help
if you continue this anti-social behavior.

Figure out which language you are working in, then post to that group.

Brian
Apr 23 '07 #4
On Apr 23, 8:37 pm, Umesh <fraternitydispo...@gmail.com> wrote:
Please try to do it while I try myself!
done
system("wc");

Apr 23 '07 #5
On Apr 23, 2:37 pm, Umesh <fraternitydispo...@gmail.com> wrote:
Please try to do it while I try myself!
Already done, and (IIRC) published.

We'll be happy to take a look at your solution, any time you wish to
post it

Apr 23 '07 #6
#include<stdio.h>
int main(void)
{
long int ch,c,num=0,num1=0,num2=0,num3=0;
FILE *f;
f=fopen("c:/1.txt","r");
while((ch=getc(f))!=EOF && (ch=getc(f))!=EOF)
{
if(ch==' ') ++num;
if(ch=='.') ++num1;
if(ch<=256) ++num2;
if(ch=='.' && c!=' ') ++num3; /* '.' followed by '.' denotes end of a
sentence.*/
}
printf("\nNo. of spaces = %ld",num);
printf("\nNo. of full stops = %ld",num1);
printf("\nNo. of characters = %ld",num2);
printf("\nNo. of characters without spaces = %ld",num2-num);
printf("\nNo. of sentences = %ld",num3);
return 0;
}

Apr 23 '07 #7
"Umesh" wrote:

Just a couple of random observations.
#include<stdio.h>
int main(void)
{
long int ch,c,num=0,num1=0,num2=0,num3=0;
Your choice of names is not very helpful.
FILE *f;
f=fopen("c:/1.txt","r");
while((ch=getc(f))!=EOF && (ch=getc(f))!=EOF)
{
if(ch==' ') ++num;
if(ch=='.') ++num1;
if(ch<=256) ++num2;
if(ch=='.' && c!=' ') ++num3; /* '.' followed by '.' denotes end of a
I doubt you wanted '=' in there.
In the US, the end of a sentence is usually indicated by ".  " (a full stop followed by two spaces).
sentence.*/
}
printf("\nNo. of spaces = %ld",num);
printf("\nNo. of full stops = %ld",num1);
printf("\nNo. of characters = %ld",num2);
printf("\nNo. of characters without spaces = %ld",num2-num);
printf("\nNo. of sentences = %ld",num3);
return 0;
}
This looks like a draft. I would not think you were happy with it because
of the = above. When you get happy with it, try to improve it. For
example, handle hyphenated words, at least the simple cases. I don't think
they can all be resolved without a dictionary. Following Google rules for
words seems fine to me. AFAIK, they have no rules for sentences.
Apr 23 '07 #8
Umesh wrote:
#include<stdio.h>
int main(void)
{
long int ch,c,num=0,num1=0,num2=0,num3=0;
FILE *f;
f=fopen("c:/1.txt","r");
while((ch=getc(f))!=EOF && (ch=getc(f))!=EOF)
You never learn, do you?
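
For reference, the quoted loop condition calls getc() twice per pass, so only every second character is ever examined, and the variable c is tested without being assigned. A minimal sketch of the usual read loop follows. The name count_chars and its out-parameters are made up for illustration, and the caller is assumed to have opened the stream and checked the result of fopen().

```c
#include <stdio.h>

/* Counts all characters and all spaces on an already opened stream.
   getc() is called exactly once per iteration, and its result is kept
   in an int so that EOF can be told apart from every valid character. */
void count_chars(FILE *in, long *chars, long *spaces)
{
    int ch;

    *chars = 0;
    *spaces = 0;
    while ((ch = getc(in)) != EOF) {
        ++*chars;
        if (ch == ' ')
            ++*spaces;
    }
}
```

Sentence and word counts are left out on purpose, since they depend on definitions the thread has not settled yet.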

--
Ian Collins.
Apr 23 '07 #9
On Apr 23, 2:37 pm, Umesh <fraternitydispo...@gmail.com> wrote:
Please try to do it while I try myself!
It's been done.

When you have something, post it here, and we'll take a look at it.
Apr 23 '07 #10
On Apr 23, 2:37 pm, Umesh <fraternitydispo...@gmail.com> wrote:
Please try to do it while I try myself!
I tried and it worked. It's fast too.
Counts whitespaces and detects a sentence not ending in '.' as well
(ie: '?' or '!').
perhaps what follows might help:
http://www.parashift.com/c++-faq-lit...t.html#faq-5.2

Apr 23 '07 #11
On Apr 23, 11:37 pm, Umesh <fraternitydispo...@gmail.com> wrote:
Please try to do it while I try myself!
#include<stdio.h>
int main(void)
{
long int ch,c,num=0,num1=0,num2=0,num3=0;
FILE *f;
f=fopen("c:/1.txt","r");
while((ch=getc(f))!=EOF && (ch=getc(f))!=EOF)
{
if(ch==' ') ++num;
if(ch=='.') ++num1;
if(ch<=256) ++num2;
if(ch=='.' && c!=' ') ++num3; // '.' followed by '.' denotes end of a sentence.
}
printf("\nNo. of spaces = %ld",num);
printf("\nNo. of full stops = %ld",num1);
printf("\nNo. of characters = %ld",num2);
printf("\nNo. of characters without spaces = %ld",num2-num);
printf("\nNo. of sentences = %ld",num3);
return 0;
}

Apr 23 '07 #12
Umesh wrote:
Please try to do it while I try myself!
Try to do what? What are you going to try yourself for?

F'ups set.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

Apr 23 '07 #13
Umesh <fr****************@gmail.com> writes:
#include<stdio.h>
int main(void)
{
long int ch,c,num=0,num1=0,num2=0,num3=0;
FILE *f;
f=fopen("c:/1.txt","r");
while((ch=getc(f))!=EOF && (ch=getc(f))!=EOF)
{
if(ch==' ') ++num;
if(ch=='.') ++num1;
if(ch<=256) ++num2;
if(ch=='.' && c!=' ') ++num3; /* '.' followed by '.' denotes end of a
sentence.*/
}
printf("\nNo. of spaces = %ld",num);
printf("\nNo. of full stops = %ld",num1);
printf("\nNo. of characters = %ld",num2);
printf("\nNo. of characters without spaces = %ld",num2-num);
printf("\nNo. of sentences = %ld",num3);
return 0;
}
This program has numerous problems, which I'll be happy to discuss
with you if you pick *one* newsgroup to post to (if that newsgroup
happens to be comp.lang.c; I don't regularly read comp.lang.c++).
C and C++ are two different languages, and cross-posting between
comp.lang.c and comp.lang.c++ is almost never a good idea.

But the first thing you should do is to run the program and take a
look at its output (hint: the results it reports are incorrect).

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Apr 23 '07 #14
On Apr 23, 8:51 pm, Richard Heathfield <r...@see.sig.invalid> wrote:
Umesh said:
[...]
The problem to which you refer is one of the easier K&R examples, and
should cause you no difficulty.
Sort of. As I recall it, it was characters, words and lines,
not sentences. And K&R made a special point of defining what
they meant by "words" and "lines" (and pointing out that they
were simplistic definitions, which didn't necessarily correspond
to the "intuitive" definition).

So, independently of the language he chooses (C, C++ or
whatever), the first step should be to define exactly what the
code is supposed to do. Until he's done that, he shouldn't
write a single line of code. (Defining a "sentence" in a way
that can be programmed is not obvious, and defining "word" in a
way compatible with everyday use is perhaps not trivial either:
is "don't" one word or two?)
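
A rough sketch of a characters, words and lines counter under deliberately simplistic definitions is given here. It is an illustration, not the book's code: the names struct counts and count_stream are made up, a "word" is assumed to be any maximal run of non-whitespace characters as judged by isspace(), and a "line" is assumed to be a newline character.

```c
#include <ctype.h>
#include <stdio.h>

struct counts {
    long chars;
    long words;
    long lines;
};

/* "Word" means a maximal run of non-whitespace characters and "line"
   means a newline character. Both are simplifying assumptions that do
   not always match everyday usage. */
struct counts count_stream(FILE *in)
{
    struct counts n = {0, 0, 0};
    int in_word = 0;
    int ch;

    while ((ch = getc(in)) != EOF) {
        ++n.chars;
        if (ch == '\n')
            ++n.lines;
        if (isspace(ch)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;   /* first character of a new word */
            ++n.words;
        }
    }
    return n;
}
```

Under these definitions "don't" counts as one word, which is a decision the definition makes rather than the code.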

--
James Kanze (GABI Software) mailto:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Apr 24 '07 #15
On Apr 23, 9:16 pm, Umesh <fraternitydispo...@gmail.com> wrote:
if(ch=='.' && c!=' ') ++num3; /* '.' followed by '.' denotes end of a
sentence.*/}
So "Mr. and Mrs. Brown went out." is two sentences, and "I went
out." isn't a sentence.

I suggest that you start by defining exactly what is and what
isn't a sentence (and a word---and you might even ask the
question about characters; I work a lot with UTF-8, where a
character can require several char's).
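
To illustrate the UTF-8 point, here is a small sketch that counts code points rather than char's. It assumes the input is valid UTF-8 and that "character" means code point (combining sequences are not merged); the name utf8_length is made up for this example.

```c
#include <stddef.h>

/* Counts code points in a NUL-terminated string assumed to hold valid
   UTF-8: every byte except a continuation byte (binary 10xxxxxx)
   starts a new code point. */
size_t utf8_length(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    }
    return n;
}
```

For the word "École" encoded as UTF-8, strlen() reports 6 while this function reports 5.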

Don't write a single line of code until you've defined the
problem space precisely.

FWIW: in my own line breaking algorithms, I defined a sentence
as anything ending with [.?!], optionally followed by ["'], and
then any amount of white space (not just ' '). In the context
I'm working in abbreviations aren't a problem; the only one I
encounter in practice is "etc.", and it's easy to special case
that. For learning code, handling abbreviations may be a bit
too complex (but you should at least document the restriction),
but the rest can easily be implemented by means of a simple
state machine.
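
The rule described above can be sketched as a two-state machine. This is only an illustration under stated assumptions: the name count_sentences is made up; "any amount of white space" is read here as at least one whitespace character or the end of the input; any number of closing quotes is accepted; and abbreviations are deliberately not handled.

```c
#include <ctype.h>

enum state { IN_TEXT, AFTER_TERMINATOR };

/* Counts sentence ends in a NUL-terminated string.
   Documented restriction: abbreviations such as "Mr." followed by a
   space are counted as sentence ends. */
int count_sentences(const char *s)
{
    enum state st = IN_TEXT;
    int n = 0;

    for (; *s != '\0'; ++s) {
        unsigned char ch = (unsigned char)*s;

        if (ch == '.' || ch == '?' || ch == '!') {
            st = AFTER_TERMINATOR;
        } else if (st == AFTER_TERMINATOR) {
            if (ch == '"' || ch == '\'') {
                /* closing quote: keep waiting for white space */
            } else {
                if (isspace(ch))
                    ++n;           /* terminator, quotes, white space */
                st = IN_TEXT;
            }
        }
    }
    if (st == AFTER_TERMINATOR)
        ++n;                       /* terminator at the very end */
    return n;
}
```

With this sketch "comp.lang.c is a group." counts as one sentence, while "Mr. and Mrs. Brown went out." counts as three because of the abbreviation restriction.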

--
James Kanze (GABI Software) mailto:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Apr 24 '07 #16
On Apr 23, 2:21 pm, Keith Thompson <k...@mib.org> wrote:
Umesh <fraternitydispo...@gmail.com> writes:
#include<stdio.h>
int main(void)
{
long int ch,c,num=0,num1=0,num2=0,num3=0;
FILE *f;
f=fopen("c:/1.txt","r");
while((ch=getc(f))!=EOF && (ch=getc(f))!=EOF)
{
if(ch==' ') ++num;
if(ch=='.') ++num1;
if(ch<=256) ++num2;
if(ch=='.' && c!=' ') ++num3; /* '.' followed by '.' denotes end of a
sentence.*/
}
printf("\nNo. of spaces = %ld",num);
printf("\nNo. of full stops = %ld",num1);
printf("\nNo. of characters = %ld",num2);
printf("\nNo. of characters without spaces = %ld",num2-num);
printf("\nNo. of sentences = %ld",num3);
return 0;
}

This program has numerous problems, which I'll be happy to discuss
with you if you pick *one* newsgroup to post to (if that newsgroup
happens to be comp.lang.c; I don't regularly read comp.lang.c++).
C and C++ are two different languages, and cross-posting between
comp.lang.c and comp.lang.c++ is almost never a good idea.

But the first thing you should do is to run the program and take a
look at its output (hint: the results it reports are incorrect).
I have done an exercise on this. But it doesn't deal with this
condition: comp.lang.c - this group name will be treated as two
sentences. How can I improve it :)

$
$ type a.c
#include <stdio.h>
#include <ctype.h>

int wc2(const char *filename)
{
FILE *fp;
int ch;
int nc; /*num of chars*/
int nw; /*num of words*/
int ns; /*num of sentences*/
int inw; /*inside a word*/

nc = nw = ns = 0;
inw = 0;
if ((fp = fopen(filename, "r")) == NULL)
return -1;
while ((ch = fgetc(fp)) != EOF){
if (isalnum(ch)){
nc++;
inw = 1;
} else if ((ispunct(ch) || ch == ' ') && (inw == 1)){
nw++;
if (ch == '!' || ch == '?' || ch == '.' || ch == ';')
ns++;
inw = 0;
}
}
fprintf(stdout, "num of chars: %d\nnum of words: %d\nnum of sentences: %d\n",
nc, nw, ns);
fclose(fp);
return 0;
}

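int main(int argc, char **argv)
{
if (argc != 2){
fprintf(stderr, "Usage: %s <filename>\n", argv[0]);
return 1; /* do not pass a missing argument to wc2() */
}
if (wc2(argv[1]) != 0){
fprintf(stderr, "cannot open %s\n", argv[1]);
return 1;
}
return 0;
}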

$ type test.txt
This program has numerous problems, which I'll be happy to discuss
with you if you pick *one* newsgroup to post to (if that newsgroup
happens to be comp.lang.c; I don't regularly read comp.lang.c++).
C and C++ are two different languages, and cross-posting between
comp.lang.c and comp.lang.c++ is almost never a good idea.

But the first thing you should do is to run the program and take a
look at its output (hint: the results it reports are incorrect).
--
Keith Thompson (The_Other_Keith) k...@mib.org <http://www.ghoti.net/
~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/
~kst>
"We must do something. This is something. Therefore, we must do
this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"


$ a.out test.txt
num of chars: 533
num of words: 130
num of sentences: 19

$

Apr 24 '07 #17
On Apr 24, 5:45 pm, "lovecreatesbea...@gmail.com"
<lovecreatesbea...@gmail.com> wrote:
I have done an exercise on this. But it doesn't deal with this
condition: comp.lang.c - this group name will be treated as two
sentences. How can I improve it :)
Define what you mean by sentence more exactly.

IIRC, the definition TeX uses, adapted to the C/C++ character
set, would be:

-- one of [.?!],
-- followed by zero or more of ['"],
-- followed by
. either zero or more whitespace, followed by the end of
file, or
. one or more whitespace, followed by a capital letter.

This works because TeX also expects things like "Mr. Brown" to
contain a non-breaking whitespace ('~' in TeX, 0xA0 in ISO
8859-1), which doesn't count as a whitespace. If you don't
require that, I can't think of anything but special casing to
handle Mr. and Mrs. (and Dr. and... any other abbreviation that
is often followed by a noun).

This is probably most easily handled by some sort of state
machine.
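
A sketch of such a state machine, following the rule exactly as listed above and nothing more, might look like this. The names scan_state and count_sentences_capital are made up; the input is assumed to be a NUL-terminated string in a single-byte character set; and no special casing of abbreviations is attempted.

```c
#include <ctype.h>

enum scan_state { TEXT, TERMINATED, SPACED };

static int is_terminator(int ch)
{
    return ch == '.' || ch == '?' || ch == '!';
}

/* TEXT:       ordinary text.
   TERMINATED: seen one of [.?!], possibly followed by quotes.
   SPACED:     as TERMINATED, then one or more whitespace characters. */
int count_sentences_capital(const char *s)
{
    enum scan_state st = TEXT;
    int n = 0;

    for (; *s != '\0'; ++s) {
        unsigned char ch = (unsigned char)*s;

        switch (st) {
        case TEXT:
            if (is_terminator(ch))
                st = TERMINATED;
            break;
        case TERMINATED:
            if (isspace(ch))
                st = SPACED;
            else if (ch != '\'' && ch != '"' && !is_terminator(ch))
                st = TEXT;         /* e.g. the 'l' in "comp.lang" */
            break;
        case SPACED:
            if (isspace(ch))
                break;
            if (isupper(ch))
                ++n;               /* white space, then a capital */
            st = is_terminator(ch) ? TERMINATED : TEXT;
            break;
        }
    }
    if (st != TEXT)
        ++n;                       /* optional white space, then end */
    return n;
}
```

Here "comp.lang.c is a group. It is busy." counts as two sentences. "Mr. and Mrs. Brown went out." also counts as two, because without a non-breaking space "Mrs. Brown" still matches the rule.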

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Apr 26 '07 #18
