Stroustrup section 1.5.4, word counting

This is an example programme that counts lines, words and characters.
I have noticed that this programme counts a space, a newline
and a tab each as a character.

I know:

1. a newline is represented as '\n'
2. a tab as '\t'
3. a space as ' '

What I want to know is whether a newline, a space and a tab are
represented internally as characters?

I know everything is represented in the machine's character set, most
probably ASCII where 'A' is 65, but I am actually confused about this
'\t', '\n', ' ' and character issue.

Any help?

Here is the code that counts characters, words and newlines:

// word counting
#include <stdio.h>

#define IN  0   /* inside a word */
#define OUT 1   /* outside a word */

int main(void)
{
    int c, nl, nw, nc, state;

    state = OUT;
    nl = nc = nw = 0;

    while ((c = getchar()) != EOF)
    {
        ++nc;   /* every character counts, whitespace included */

        if (c == '\n')
            ++nl;

        if (c == ' ' || c == '\n' || c == '\t')
            state = OUT;
        else if (state == OUT)
        {
            state = IN;   /* first character of a new word */
            ++nw;
        }
    }

    printf("%d NEWLINES \t %d WORDS \t %d CHARs \n", nl, nw, nc);

    return 0;
}

Mar 9 '07 #1
3 Replies


arnuld wrote:
> This is an example programme that counts lines, words and characters.
> I have noticed that this programme counts a space, a newline
> and a tab each as a character.
>
> I know:
>
> 1. a newline is represented as '\n'
> 2. a tab as '\t'
> 3. a space as ' '
>
> What I want to know is whether a newline, a space and a tab are
> represented internally as characters?

It depends on the machine and its character set.

> I know everything is represented in the machine's character set, most
> probably ASCII where 'A' is 65, but I am actually confused about this
> '\t', '\n', ' ' and character issue.
>
> Any help?

Generally, an end-of-line sequence is represented by one or two
characters. Under UNIX it's a single linefeed character, while under
DOS-like systems it's a carriage return followed by a linefeed. MacOS
used to use a single carriage return. Doubtless other systems use
other variations.

Spaces and tabs are usually represented by one character.

> Here is the code that counts characters, words and newlines:
>
> // word counting

It's better to use /* ... */ style comments, especially when you're
posting code onto Usenet.

Mar 9 '07 #2

"arnuld" <ge*********@gmail.com> writes:
[snip]

You mean K&R, not Stroustrup.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Mar 10 '07 #3

"santosh" <sa*********@gmail.com> writes:
> arnuld wrote:
>> This is an example programme that counts lines, words and characters.
>> I have noticed that this programme counts a space, a newline
>> and a tab each as a character.
>>
>> I know:
>>
>> 1. a newline is represented as '\n'
>> 2. a tab as '\t'
>> 3. a space as ' '
>>
>> What I want to know is whether a newline, a space and a tab are
>> represented internally as characters?
>
> It depends on the machine and its character set.
>
>> I know everything is represented in the machine's character set, most
>> probably ASCII where 'A' is 65, but I am actually confused about this
>> '\t', '\n', ' ' and character issue.
>>
>> Any help?
>
> Generally, an end-of-line sequence is represented by one or two
> characters. Under UNIX it's a single linefeed character, while under
> DOS-like systems it's a carriage return followed by a linefeed. MacOS
> used to use a single carriage return. Doubtless other systems use
> other variations.
[...]

But C's I/O routines, when operating on files opened in text mode,
hide those details from you. Regardless of how an end-of-line is
represented in an external file (and there are a *lot* of ways to do
this, including fixed-length records with no specific marker), it's
mapped to a single '\n' character.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Mar 10 '07 #4
