Any way to take a word as input from stdin ?

arnuld

I searched the c.l.c archives provided by Google as Google Groups with
"word input" as the key words and did not come up with anything good.
C++ has std::string for taking a word as input from stdin. C takes input
in 2 ways:

1) as a character, getchar()
2) as a whole line, fgets()

As C programmers, are we supposed to create a get_word function every time
we need a word as input from stdin (e.g. a terminal)?
--
www.lispmachine.wordpress.com
my email is @ the above blog.
Google Groups is Blocked. Reason: Excessive Spamming

Sep 10 '08
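
For reference, standard C does have a whitespace-delimited reader: the %s conversion of scanf/fscanf skips leading whitespace and stops at the next whitespace character. A minimal sketch, assuming a 31-character limit; the names WORD_MAX and read_word_scanf are hypothetical and do not come from the thread:

```c
#include <stdio.h>

enum { WORD_MAX = 31 };   /* assumed limit, not from the thread */

/* Reads one whitespace-delimited word from fp into buf, which must
   hold at least WORD_MAX + 1 chars. Returns 1 on success, 0 at end
   of input. Longer words are split at WORD_MAX characters. */
int read_word_scanf(FILE *fp, char *buf)
{
    /* The field width must be a literal; keep it in sync with WORD_MAX. */
    return fscanf(fp, "%31s", buf) == 1;
}
```

Calling read_word_scanf(stdin, buf) in a loop yields one word per call. The explicit field width matters: a bare %s has no bound and can overflow buf.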


Pilcrow

On Fri, 12 Sep 2008 06:17:25 +0000, Richard Heathfield
<rj*@see.sig.invalid> wrote:

>Pilcrow said:

>On Thu, 11 Sep 2008 16:42:51 +0000, Richard Heathfield
<rj*@see.sig.invalid> wrote:

>>>Pilcrow said:

<snip>

>>>I have a lot of trouble reading yours.

You may well be the first person ever to say that. People have made all
kinds of complaints about my code, but readability is not usually high on
the hit-list.

I apologize.

I wish you wouldn't. You have every right to say what you said. I wasn't
being "precious" about it - merely surprised! In fact, I'd be quite
curious to know more about *why* you have a lot of trouble reading my
code. Maybe there's something I can change to make it easier for you to
read without making it more difficult for myself and others.

It's just a matter of 'accent'. Just as it's easiest for me to
understand someone who speaks with a regional accent similar to mine
(I'm a native of Brooklyn), I understand easiest a coding style similar
to mine. I really shouldn't have brought it up.

Sep 12 '08 #51

CBFalconer

Richard Bos wrote:

Pilcrow <Pi******@gmail.com> wrote:
>Richard Heathfield wrote:
>>Pilcrow said:

Try using fgets(), and strtok(). strtok() will allow you to
define word separators to your taste.

This is poor advice for a beginner. Whilst strtok does have its
uses, it also has issues - traps for the unwary programmer.
These derive from its maintenance of significant state between
calls, which makes it unsuitable

I understood that, and I am a 'beginner'. It is very adequately
covered in textbooks (see 'C in a Nutshell', ISBN 0-596-00697-7,
page 440), somewhat less so in K&R2. And I gave the questioner
an example to help him. My dissatisfaction with strtok() is
that repeated separation characters are treated as one, making
it difficult to present the user with an intuitively
understandable interface. It is not usually a good idea to
equate ignorance and stupidity.

There is also the catch that strtok() scribbles over its
parameter, meaning that you cannot use it to tokenise either a
string literal, or data you want to keep. This is something that
catches out a lot of less well-informed newbies.
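
Both complaints about strtok() (it overwrites its argument, and it treats runs of separators as one) can be seen in a few lines. This is a minimal sketch; the helper name count_tokens is hypothetical:

```c
#include <string.h>

/* Splits buf on commas with strtok and returns the number of tokens.
   buf must be a writable array: strtok overwrites each delimiter that
   ends a token with '\0'. */
size_t count_tokens(char *buf)
{
    size_t n = 0;
    char *tok;

    for (tok = strtok(buf, ","); tok != NULL; tok = strtok(NULL, ","))
        n++;
    return n;
}
```

Given char text[] = "a,,b", the function reports two tokens, not three, and afterwards text holds just "a". Passing a string literal instead of an array would be undefined behaviour.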

Try this:

/* ------- file tknsplit.c ----------*/
#include "tknsplit.h"

/* copy over the next tkn from an input string, after
skipping leading blanks (or other whitespace?). The
tkn is terminated by the first appearance of tknchar,
or by the end of the source string.

The caller must supply sufficient space in tkn to
receive any tkn, Otherwise tkns will be truncated.

Returns: a pointer past the terminating tknchar.

This will happily return an infinity of empty tkns if
called with src pointing to the end of a string. Tokens
will never include a copy of tknchar.

A better name would be "strtkn", except that is reserved
for the system namespace. Change to that at your risk.

released to Public Domain, by C.B. Falconer.
Published 2006-02-20. Attribution appreciated.
Revised 2006-06-13 2007-05-26 (name)
*/

const char *tknsplit(const char *src, /* Source of tkns */
                     char tknchar,    /* tkn delimiting char */
                     char *tkn,       /* receiver of parsed tkn */
                     size_t lgh)      /* length tkn can receive */
                                      /* not including final '\0' */
{
   if (src) {
      while (' ' == *src) src++;

      while (*src && (tknchar != *src)) {
         if (lgh) {
            *tkn++ = *src;
            --lgh;
         }
         src++;
      }
      if (*src && (tknchar == *src)) src++;
   }
   *tkn = '\0';
   return src;
} /* tknsplit */

#ifdef TESTING
#include <stdio.h>

#define ABRsize 6 /* length of acceptable tkn abbreviations */

/* ---------------- */

static void showtkn(int i, char *tok)
{
   putchar(i + '1'); putchar(':');
   puts(tok);
} /* showtkn */

/* ---------------- */

int main(void)
{
   char teststring[] = "This is a test, ,, abbrev, more";

   const char *t, *s = teststring;
   int i;
   char tkn[ABRsize + 1];

   puts(teststring);
   t = s;
   for (i = 0; i < 4; i++) {
      t = tknsplit(t, ',', tkn, ABRsize);
      showtkn(i, tkn);
   }

   puts("\nHow to detect 'no more tkns' while truncating");
   t = s; i = 0;
   while (*t) {
      t = tknsplit(t, ',', tkn, 3);
      showtkn(i, tkn);
      i++;
   }

   puts("\nUsing blanks as tkn delimiters");
   t = s; i = 0;
   while (*t) {
      t = tknsplit(t, ' ', tkn, ABRsize);
      showtkn(i, tkn);
      i++;
   }
   return 0;
} /* main */

#endif
/* ------- end file tknsplit.c ----------*/
/* ------- file tknsplit.h ----------*/
#ifndef H_tknsplit_h
# define H_tknsplit_h

# ifdef __cplusplus
extern "C" {
# endif

#include <stddef.h>

/* copy over the next tkn from an input string, after
skipping leading blanks (or other whitespace?). The
tkn is terminated by the first appearance of tknchar,
or by the end of the source string.

The caller must supply sufficient space in tkn to
receive any tkn, Otherwise tkns will be truncated.

Returns: a pointer past the terminating tknchar.

This will happily return an infinity of empty tkns if
called with src pointing to the end of a string. Tokens
will never include a copy of tknchar.

released to Public Domain, by C.B. Falconer.
Published 2006-02-20. Attribution appreciated.
revised 2007-05-26 (name)
*/

const char *tknsplit(const char *src, /* Source of tkns */
                     char tknchar,    /* tkn delimiting char */
                     char *tkn,       /* receiver of parsed tkn */
                     size_t lgh);     /* length tkn can receive */
                                      /* not including final '\0' */

# ifdef __cplusplus
}
# endif
#endif
/* ------- end file tknsplit.h ----------*/

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.

Sep 12 '08 #52

arnuld

On Fri, 12 Sep 2008 11:05:07 +0000, James Kuyper wrote:

Could you identify the std::string feature that implements this? I
couldn't find any use of the word "word" anywhere in section 21 of the
C++ standard, which describes std::string.

I think we have to look at the source code of std::string library and see
how it is implemented. I am sure it is done using C way, arrays and
pointers ;)


Sep 15 '08 #53

Ian Collins

arnuld wrote:

>
I think std::string in C++ defines what exactly *definition* of word is.
Look at my code and see how std::string works and perhaps we can settle on
some common and standard meaning word.

Is "token" what you are looking for?

<snip code>

>
you see even if you put a line as input, std::string will automatically
dissect it into separate words.

No, it will not.

<OT>The input stream tokenises the input. The C++ standard defines how
formatted input is tokenised.</OT>

--
Ian Collins.

Sep 15 '08 #54

Richard Heathfield

arnuld said:

>On Thu, 11 Sep 2008 13:36:30 -0700, Keith Thompson wrote:

<snip>

>
>It's pretty much clear what your definition of a word is. It's still
not at all clear what a word is in general (and it can't be, since the
term is used inconsistently).

I think std::string in C++ defines what exactly *definition* of word is.

It defines what *its* definition is.

Look at my code and see how std::string works and perhaps we can settle
on some common and standard meaning word.

Even if we could, it would only be *our* definition, not a universal
definition.

Let me give you an example from ordinary English, where whitespace
delimiters are not sufficient:

"What did he say?", said Albert.
"He just said, 'I'll be there', I think", replied the captain.

Now, consider the whitespace-separated tokens:

"What
did
he
say?",
said
Albert.
"He
just
said,
'I'll
be
there',
I
think",
replied
the
captain.

Seventeen tokens there, but fewer than half of them are English words. The
rest are encumbered with some kind of punctuation. But do we really wish
to treat "said" and "said," differently? No, of course not. They are the
same word. So we need to strip punctuation, right?

Problem: design an algorithm for removing punctuation from arbitrary
English sentences, *without* removing punctuation that actually belongs to
the word (example: "will-o'-the-wisp" must retain its three hyphens and
its apostrophe).

As Knuth would say: [50]

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999

Sep 15 '08 #55
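
To make the difficulty concrete, here is the obvious heuristic: strip punctuation from both ends of a token and leave the inside alone. It is only a sketch under that assumption (the name trim_punct is hypothetical), and it does not solve the problem as posed, because it cannot tell a closing quote from an apostrophe that belongs to the word, as in "dogs'":

```c
#include <ctype.h>
#include <string.h>

/* Removes leading and trailing punctuation from word, in place.
   Punctuation inside the word (hyphens, apostrophes) is kept.
   Heuristic only: a trailing apostrophe that belongs to the word
   is stripped along with genuine quotation marks. */
void trim_punct(char *word)
{
    size_t start = 0;
    size_t end = strlen(word);

    while (start < end && ispunct((unsigned char)word[start]))
        start++;
    while (end > start && ispunct((unsigned char)word[end - 1]))
        end--;

    memmove(word, word + start, end - start);
    word[end - start] = '\0';
}
```

With this, "said," and "said" compare equal, 'I'll becomes I'll, and will-o'-the-wisp keeps its three hyphens and its apostrophe.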

arnuld

On Mon, 15 Sep 2008 04:50:29 +0000, Richard Heathfield wrote:

Even if we could, it would only be *our* definition, not a universal
definition.

Let me give you an example from ordinary English, where whitespace
delimiters are not sufficient:

....SNIP....

Problem: design an algorithm for removing punctuation from arbitrary
English sentences, *without* removing punctuation that actually belongs to
the word (example: "will-o'-the-wisp" must retain its three hyphens and
its apostrophe).

That means, we will also have a function containing all of the words with
intended hyphens and apostrophes to which we will compare the input words.
Hence that function will be used at run time and will have millions of
words, hence will be very expensive to run. If the user wants to enter
comp.lang.3c as words then it's his choice or stupidity. Let him do it that
way; why do we need to think about it?

As Knuth would say: [50]

I don't know what that means .


Sep 15 '08 #56

Richard Heathfield

arnuld said:

>On Mon, 15 Sep 2008 04:50:29 +0000, Richard Heathfield wrote:

>Even if we could, it would only be *our* definition, not a universal
definition.

Let me give you an example from ordinary English, where whitespace
delimiters are not sufficient:

>....SNIP....

>Problem: design an algorithm for removing punctuation from arbitrary
English sentences, *without* removing punctuation that actually belongs
to the word (example: "will-o'-the-wisp" must retain its three hyphens
and its apostrophe).

That means, we will also have a function containing all of the words with
intended hyphens and apostrophes to which we will compare the input
words.

Not necessarily a function, but yes, we would need some kind of dictionary
- and even then, we wouldn't be done, because some French or German or
Spanish or Czech or Polish or Slovakian or Turkish geezer would come along
and say "you call those words? Those aren't words - THESE are words...",
and give you a whole new set of problems.

The lesson here is that there is no single answer that will satisfy
everyone.

>As Knuth would say: [50]

I don't know what that means .

<sigh> I know.


Sep 15 '08 #57

Keith Thompson

arnuld <su*****@invalid.address> writes:

>On Fri, 12 Sep 2008 11:05:07 +0000, James Kuyper wrote:
Could you identify the std::string feature that implements this? I
couldn't find any use of the word "word" anywhere in section 21 of the
C++ standard, which describes std::string.

I think we have to look at the source code of std::string library and see
how it is implemented. I am sure it is done using C way, arrays and
pointers ;)

<OT>
No, std::string doesn't define what a "word" is. The overloaded ">>"
operator, declared in the <iostream> header, does that.
</OT>

But, to be blunt, so what? That's one possible definition of a
"word". If it works for your purposes, that's great. But there is
nothing unique or definitive about the way this particular C++
operator does things; there are many other possible definitions, most
of them equally valid.

If you want to discuss how to read a "word" as input from stdin, given
a particular definition of "word", that's fine. For any definition
you can state with sufficient clarity, it's possible to implement it
in C. (I expect RH to offer a counterexample shortly.) If you want
to discuss which definition of "word" (or "token", or whatever) is
correct, that's not a C question, nor is it really an answerable
question.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Sep 15 '08 #58

Richard Heathfield

Keith Thompson said:

<snip>

If you want to discuss how to read a "word" as input from stdin, given
a particular definition of "word", that's fine. For any definition
you can state with sufficient clarity, it's possible to implement it
in C. (I expect RH to offer a counterexample shortly.)

I don't know what kind of counterexample you're expecting. I would guess,
however, that any sufficiently clear explanation would probably be
insufficiently accurate for universal or even general use.

If you want
to discuss which definition of "word" (or "token", or whatever) is
correct, that's not a C question, nor is it really an answerable
question.

Isaac Asimov was once in a Q&A session and said something like: "I can
answer any question you ask me, provided you are prepared accept 'I don't
know' as an answer."


Sep 15 '08 #59

Richard

Richard Heathfield <rj*@see.sig.invalid> writes:

arnuld said:

>>On Mon, 15 Sep 2008 04:50:29 +0000, Richard Heathfield wrote:

>>Even if we could, it would only be *our* definition, not a universal
definition.

Let me give you an example from ordinary English, where whitespace
delimiters are not sufficient:

>>....SNIP....

>>Problem: design an algorithm for removing punctuation from arbitrary
English sentences, *without* removing punctuation that actually belongs
to the word (example: "will-o'-the-wisp" must retain its three hyphens
and its apostrophe).

That means, we will also have a function containing all of the words with
intended hyphens and apostrophes to which we will compare the input
words.

Not necessarily a function, but yes, we would need some kind of dictionary
- and even then, we wouldn't be done, because some French or German or
Spanish or Czech or Polish or Slovakian or Turkish geezer would come along
and say "you call those words? Those aren't words - THESE are words...",
and give you a whole new set of problems.

The lesson here is that there is no single answer that will satisfy
everyone.

>>As Knuth would say: [50]

I don't know what that means .

<sigh> I know.

I wonder how many people do?

But then 90% of SW Engineers never read Knuth or possibly they tried it
and found it impenetrable. Only in c.l.c is it recommended as a "great
way to learn programming". I still smile when I remember that thread.

So, basically, I don't know what that means either.

Sep 15 '08 #60

arnuld

On Sun, 14 Sep 2008 22:48:39 -0700, Keith Thompson wrote:

..SNIP...

If you want
to discuss which definition of "word" (or "token", or whatever) is
correct, that's not a C question, nor is it really an answerable
question.

Okay, that seems a good reply. I mean, let's make it topical to C again, as I
got a little lost in the confusion. So *my* definition of word will be the
same one you gave earlier:

A "word" is a non-empty contiguous sequence of characters other
than space, tab, or newline, preceded or followed either by a
space, tab, or newline or by the start or end of the input.

Now, your earlier question down here:

It would also be good to specify whether the input is a string, a line
of text, or an entire text file.

In the current case, it is a "word" from the terminal, the word we just
defined. So let's code it :)


Sep 15 '08 #61
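
The definition quoted above translates into C fairly directly. A minimal sketch, assuming a fixed-size caller-supplied buffer; the names is_delim and read_word are hypothetical, and the function takes a FILE * so that it works on stdin or any other stream:

```c
#include <stdio.h>

/* A delimiter under the definition quoted above: space, tab or newline. */
static int is_delim(int c)
{
    return c == ' ' || c == '\t' || c == '\n';
}

/* Reads the next word from fp into buf (capacity cap, including the
   terminating '\0'; cap must be at least 2). Returns the number of
   characters stored, or 0 if the input ended before any word was found.
   Characters that do not fit are read and discarded. */
size_t read_word(FILE *fp, char *buf, size_t cap)
{
    size_t len = 0;
    int c;

    /* Skip delimiters in front of the word. */
    while ((c = getc(fp)) != EOF && is_delim(c))
        continue;

    /* Collect characters up to the next delimiter or end of input. */
    while (c != EOF && !is_delim(c)) {
        if (len + 1 < cap)
            buf[len++] = (char)c;
        c = getc(fp);
    }
    buf[len] = '\0';
    return len;
}
```

A call such as read_word(stdin, word, sizeof word) returns one word per call. Silently truncating long words is a design choice of this sketch; reporting them as an error would be equally defensible.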

arnuld

On Mon, 15 Sep 2008 09:25:22 +0200, Richard wrote:

Please get lost. This was on topic and in a discussion with how best to
approach certain issues. Only a complete idiot like you would try to do
that without considering already researched (partial) solutions.

Though I really appreciate that you supported me, I think you are
disrespecting him by calling him an idiot. It's not what I thought of him
when he replied. Chuck said so because he respects clc like all of us.
Though he could have added something like "it is ok for this time but no
C++ next time. you know better", to his reply. If he did not add that it
does not mean anyone should disrespect him. He is looking for the
well-being of clc , like me and I understood his reasoning.

Only trolls should be disrespected by not replying to their posts ;)


Sep 15 '08 #62

Richard Heathfield

arnuld said:

>
Now since *my* definition of word is done.

Yes. Whitespace-delimited.

Here is the outline of the program:

[...]

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void get_words( char** );
void sort_words( char** );
void printf_words( char** );

I don't think this is going to cut it.

You want to sort many words (i.e. more than one), so you will need to store
them, assuming you want an in-memory sort. The natural way to store them
(or at least, the natural way for me to store them) is by allocating a
number of pointers to char (which can be reallocated if it proves to be
insufficiently long), each of which points to a word. The pointers to char
will be stored in a dynamic array, the base element of which will be
pointed to by a char **. For get_words to be able to do the allocation and
any necessary reallocations, you must be able to modify this char **
within the get_words function. For this change to "stick" in the caller,
you will need either to pass a pointer to the char **, or return the char
** from the function. Thus, you will need, at a minimum (at this point),
either this:

char **get_words(void);

or this:

void get_words(char ***);

But you need to know how many words you've captured! So you must either
return the count or pass a pointer to an integer object in which to store
the count. So you'll need either this:

char **get_words(size_t *);

or this:

size_t get_words(char ***);

But what if something goes wrong? You'll need to be able to report an
error. The natural way to do this is via a return value, which means we
can't use that value for either the list or the count, and that leads us
to:

int get_words(char ***, size_t *);

Since they don't need to modify the caller's status, sort_words and
print_words can be of type int(char **, size_t).
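
One way the int get_words(char ***, size_t *) interface could be filled in is sketched below. It is not Richard's code. The extra FILE * parameter, the helper free_words, and the 63-character word limit (used only to keep the sketch short) are all assumptions:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads whitespace-delimited words from fp until end of input.
   On success returns 0 and stores the array and its length through
   words and count. On allocation failure returns -1, frees what it
   had collected, and leaves *words and *count untouched.
   Simplification: words longer than 63 characters are split. */
int get_words(FILE *fp, char ***words, size_t *count)
{
    char **list = NULL;
    size_t used = 0, cap = 0;
    char buf[64];

    while (fscanf(fp, "%63s", buf) == 1) {
        char *copy;

        if (used == cap) {               /* grow the pointer array */
            size_t newcap = cap ? cap * 2 : 8;
            char **tmp = realloc(list, newcap * sizeof *tmp);
            if (tmp == NULL)
                goto fail;
            list = tmp;
            cap = newcap;
        }
        copy = malloc(strlen(buf) + 1);  /* one allocation per word */
        if (copy == NULL)
            goto fail;
        strcpy(copy, buf);
        list[used++] = copy;
    }
    *words = list;   /* this assignment is why the parameter is char *** */
    *count = used;
    return 0;

fail:
    while (used > 0)
        free(list[--used]);
    free(list);
    return -1;
}

/* Releases everything get_words allocated. */
void free_words(char **words, size_t count)
{
    while (count > 0)
        free(words[--count]);
    free(words);
}
```

The caller declares char **words and size_t n, calls get_words(stdin, &words, &n), checks the return value, and finishes with free_words(words, n).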

<snip>

SOLUTION: pda is an array of pointers, where pointers are pointing to
different words input by the user ( which are in fact arrays of
characters terminated by null, which means they are string literals of C,
which means it is still inherently confusing to me )

Not string literals - just strings.

<snip>

3rd, we do have an idea on
the maximum length of the word.

The longest one you find. That's why you use dynamic allocation - it means
you can fit the longest word you find without having to worry about
wasting space catering for longer words.

<snip>

I will limit the longest
words to what we call "Longest word in Shakespeare's work", which is 27
characters long, hence limiting the array size to be used to store the
words to 28.

good idea ?

Up to you, but I wouldn't bother setting a limit (or, if I did, I'd set it
at a million or so, and treat any string longer than that as a reportable
error). With dynamic allocation, you don't /need/ to set a limit; you
simply allocate as you go, and reallocate if necessary.
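
"Allocate as you go" for a single word might look like the sketch below. The name read_word_alloc, the starting capacity of 16 and the doubling policy are assumptions made for illustration:

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Reads one whitespace-delimited word of any length from fp.
   Returns a malloc'd string the caller must free, or NULL at end
   of input or on allocation failure. */
char *read_word_alloc(FILE *fp)
{
    char *buf = NULL;
    size_t len = 0, cap = 0;
    int c;

    while ((c = getc(fp)) != EOF && isspace(c))
        continue;                        /* skip leading whitespace */

    while (c != EOF && !isspace(c)) {
        if (len + 1 >= cap) {            /* keep room for the '\0' */
            size_t newcap = cap ? cap * 2 : 16;
            char *tmp = realloc(buf, newcap);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap = newcap;
        }
        buf[len++] = (char)c;
        c = getc(fp);
    }
    if (buf != NULL)
        buf[len] = '\0';
    return buf;
}
```

A 27-character word and a 27,000-character word go through the same code path; an upper bound, if wanted, becomes a single comparison against len.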


Sep 15 '08 #63

arnuld

On Mon, 15 Sep 2008 09:28:36 +0000, Richard Heathfield wrote:

I don't think this is going to cut it.

..SNIP...

But what if something goes wrong? You'll need to be able to report an
error. The natural way to do this is via a return value, which means we
can't use that value for either the list or the count, and that leads us
to:

int get_words(char ***, size_t *);

why *** , 3 levels of indirection ? when we pass an array of characters
as an argument to a function, it becomes a pointer, single * . Hence when
we will pass an array of pointers, it will become **.

Not string literals - just strings.

string literal, string and string constant aren't 3 names for a single
thing ?

Up to you, but I wouldn't bother setting a limit (or, if I did, I'd set
it at a million or so, and treat any string longer than that as a
reportable error). With dynamic allocation, you don't /need/ to set a
limit; you simply allocate as you go, and reallocate if necessary.

so you want to dynamically allocate both the single word and the array of
words.


Sep 15 '08 #64

Richard Heathfield

arnuld said:

>On Mon, 15 Sep 2008 09:28:36 +0000, Richard Heathfield wrote:

>I don't think this is going to cut it.

>..SNIP...

>But what if something goes wrong? You'll need to be able to report an
error. The natural way to do this is via a return value, which means we
can't use that value for either the list or the count, and that leads us
to:

int get_words(char ***, size_t *);

why *** , 3 levels of indirection ? when we pass an array of characters
as an argument to a function, it becomes a pointer, single * . Hence
when we will pass an array of pointers, it will become **.

Yes, but you're not passing an array of pointers. You're trying to pass a
pointer to a pointer to char - which is fine, but it means that any
changes made to the pointer value within the function (and there *will* be
changes) will be local to that function. That isn't what you want.

>Not string literals - just strings.

string literal, string and string constant aren't 3 names for a single
thing ?

Two of them are two names for a single thing. Although "string literal" is
the formal term for a string literal, people will know what you mean if
you say "string constant". But consider this:

char foo[3];
foo[0] = 'H';
foo[1] = 'i';
foo[2] = '\0';

foo now contains a string, but no string literals are involved.
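
The distinction can be put into a compilable form. In this sketch (the function names build_hi and same_as_literal are hypothetical) the only string literal is the "Hi" inside the comparison; the array is filled without one:

```c
#include <string.h>

/* Builds the string "Hi" in a caller-supplied array, one character
   at a time, without using a string literal. */
void build_hi(char foo[3])
{
    foo[0] = 'H';
    foo[1] = 'i';
    foo[2] = '\0';
}

/* Returns nonzero if the array contents equal the string literal "Hi". */
int same_as_literal(const char *s)
{
    return strcmp(s, "Hi") == 0;   /* "Hi" here is a string literal */
}
```

The array built by build_hi may be modified freely afterwards, whereas attempting to modify the characters of a string literal is undefined behaviour.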

>Up to you, but I wouldn't bother setting a limit (or, if I did, I'd set
it at a million or so, and treat any string longer than that as a
reportable error). With dynamic allocation, you don't /need/ to set a
limit; you simply allocate as you go, and reallocate if necessary.

so you want to dynamically allocate both the single word and the array of
words.

Yes. I think that's the best approach.


Sep 15 '08 #65

James Kuyper

arnuld wrote:
....

I think std::string in C++ defines what exactly *definition* of word is.

Where?

Look at my code and see how std::string works and perhaps we can settle on
some common and standard meaning word.

The C++ standard does not provide a name to describe what it is that
operator>> extracts into a string; but the most generally used term for
that kind of thing is "token", not "word".

The std::string operator>> overload reads in delimited tokens. By
default, the set of delimiters is the set of characters that are
considered to be spacing characters under the currently imbued locale.
This default can be overridden.

Sep 15 '08 #66

arnuld

On Mon, 15 Sep 2008 09:58:29 +0000, Richard Heathfield wrote:

>arnuld said:
why *** , 3 levels of indirection ? when we pass an array of characters
as an argument to a function, it becomes a pointer, single * . Hence
when we will pass an array of pointers, it will become **.

Yes, but you're not passing an array of pointers. You're trying to pass a
pointer to a pointer to char - which is fine, but it means that any
changes made to the pointer value within the function (and there *will* be
changes) will be local to that function. That isn't what you want.

I don't see how that is true. You can never pass an array as value,
arrays are *always* passed by reference. It means when I
pass the name of an array of characters to a function as an argument, then
any changes made to the array will be made to the original array because
when you pass an array to a function as an argument, it will be changed to
a pointer to its first element:
char arrc[3] = { 'a', 'z', '\0'};
char* pc;

pc = arrc;

some_function( arrc );
some_function( pc );

both calls are same, right ?
Now when we will pass an array of pointers to some function, then it will
be converted as pointer to its first element ( which in fact is already a
pointer) hence it will be passed as pointer to pointer to char and with
that we can modify the original elements:
#include <stdio.h>
#include <ctype.h>

enum { ARRSIZE = 2 };

void edit_first_element_arrp( char** ppc );

int main( void )
{
  char* p1;
  char* p2;
  char** p_arrp;
  char* arrp[ARRSIZE] = { 0 };

  p1 = p2 = NULL;

  arrp[0] = p1;
  arrp[1] = p2;

  p_arrp = arrp;

  edit_first_element_arrp( p_arrp );

  /* pointer has moved, so take it to the original position */
  p_arrp = arrp;

  printf("arrp[0] = %c\n", **p_arrp++);
  printf("arrp[1] = %c\n", **p_arrp);

  return 0;
}

void edit_first_element_arrp( char** ppc )
{
  int idx;

  for( idx = 0; idx != ARRSIZE; ++idx )
    {
      if( ! (idx) )
        {
          **ppc++ = 'Z';
        }
    }
}
Hence we should be able to change the values p1 and p2 point to, but this
program segfaults :(

Two of them are two names for a single thing. Although "string literal"
is the formal term for a string literal, people will know what you mean
if you say "string constant". But consider this:

char foo[3];
foo[0] = 'H';
foo[1] = 'i';
foo[2] = '\0';

foo now contains a string, but no string literals are involved.

So what is a string literal ?

Yes. I think that's the best approach.

Okay, first I will try the dynamic version of a get_single_word
function, which will just make a single word out of some input characters.


Sep 15 '08 #67

James Kuyper

arnuld wrote:

>On Fri, 12 Sep 2008 11:05:07 +0000, James Kuyper wrote:

>Could you identify the std::string feature that implements this? I
couldn't find any use of the word "word" anywhere in section 21 of the
C++ standard, which describes std::string.

I think we have to look at the source code of std::string library and see
how it is implemented. I am sure it is done using C way, arrays and
pointers ;)

No, that will only tell you what std::string actually does. It will not
tell you what the meaning of the word "word" is. For that, you have to
search the relevant documentation, the C++ standard - and that
documentation never uses the word "word" to describe what std::string does.

Sep 15 '08 #68

Ben Bacarisse

arnuld <su*****@invalid.address> writes:

>On Mon, 15 Sep 2008 09:58:29 +0000, Richard Heathfield wrote:

>>arnuld said:
why *** , 3 levels of indirection ? when we pass an array of characters
as an argument to a function, it becomes a pointer, single * . Hence
when we will pass an array of pointers, it will become **.

>Yes, but you're not passing an array of pointers. You're trying to pass a
pointer to a pointer to char - which is fine, but it means that any
changes made to the pointer value within the function (and there *will* be
changes) will be local to that function. That isn't what you want.

I don't see how that is true. You can never pass an array as value,
arrays are *always* passed by reference. It means when I
pass the name of an array of characters to a function as an argument, then
any changes made to the array will be made to the original array because
when you pass an array to a function as an argument, it will be changed to
a pointer to its first element:

Richard is talking about the pointer to the whole array. A function
that takes: void get_word(char **words); can change words[32] to point
to some new string just found. It can change words[32][0] to be 'x',
but it can't change words itself. Well, it can, but the effect will
be lost when the function returns.

The most important change you need to make is that you will have to
realloc the space for the char * array. This is, of course, a char
**, but if the function is to change a char ** that is outside and
passed in, that parameter must be a char ***.
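
The difference between the two parameter types can be sketched as follows. The names reset_local and grow_words are hypothetical; the second function shows the realloc case Ben describes:

```c
#include <stdlib.h>

/* Changes only the local copy of the pointer; the caller's char **
   is unaffected. */
void reset_local(char **words)
{
    words = NULL;
    (void)words;   /* silence "set but not used" warnings */
}

/* Grows the caller's array to hold newcount pointers (newcount > 0).
   The caller passes the address of its char **, so the new value
   sticks. Returns 0 on success, -1 if realloc fails, in which case
   the array is left unchanged. */
int grow_words(char ***words, size_t newcount)
{
    char **tmp = realloc(*words, newcount * sizeof **words);
    if (tmp == NULL)
        return -1;
    *words = tmp;
    return 0;
}
```

After reset_local(list) the caller's list still points where it did; after grow_words(&list, n) it points at the (possibly moved) larger block.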

char arrc[3] = { 'a', 'z', '\0'};
char* pc;

pc = arrc;

some_function( arrc );
some_function( pc );

both calls are same, right ?

Yes, but some_function can't make pc point to a bigger array if
needed. pc will point to the same place after the call.

Now when we will pass an array of pointers to some function, then it will
be converted as pointer to its first element ( which in fact is already a
pointer) hence it will be passed as pointer to pointer to char and with
that we can modify the original elements:
#include <stdio.h>
#include <ctype.h>

enum { ARRSIZE = 2 };

void edit_first_element_arrp( char** ppc );
int main( void )
{
char* p1;
char* p2;
char** p_arrp;
char* arrp[ARRSIZE] = { 0 };

p1 = p2 = NULL;

arrp[0] = p1;
arrp[1] = p2;

All these last three lines make no changes. Both elements of arrp are
already NULL.

p_arrp = arrp;

edit_first_element_arrp( p_arrp );

/* pointer has moved, so take it to the original position */
p_arrp = arrp;

No. p_arrp can't be changed by the call. This is a key thing about C
and applies to all types:

void f(int x);
...
int x = 42;
f(x);

x is guaranteed to be unchanged here. The same applies if x is a
pointer or a pointer to a pointer or a pointer to a pointer to a
pointer or...
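
The same rule, applied to a pointer argument, in a minimal sketch (the names retarget and store are hypothetical): assigning to the parameter affects only the callee's copy, while assigning through it changes the object it points to.

```c
/* Reassigns its parameter. Only the local copy changes. */
void retarget(int *p, int *other)
{
    p = other;
    (void)p;   /* silence "set but not used" warnings */
}

/* Writes through its parameter. The object pointed to changes. */
void store(int *p, int value)
{
    *p = value;
}
```

With int a = 1, b = 2 and int *p = &a, calling retarget(p, &b) leaves p pointing at a, whereas store(p, 42) sets a to 42.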

printf("arrp[0] = %c\n", **p_arrp++);
printf("arrp[1] = %c\n", **p_arrp);

return 0;
}

void edit_first_element_arrp( char** ppc )
{
int idx;

for( idx = 0; idx != ARRSIZE; ++idx )
{
if( ! (idx) )
{
**ppc++ = 'Z';

*ppc is NULL -- you set it to be NULL before the call. You can't write any
value into **ppc.

}
}
}
Hence we should be able to change the values p1 and p2 point to, but this
program segfaults :(

See above.
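
A variant that avoids the crash needs every pointer in the array to refer to writable storage before anything is written through it. A minimal sketch, assuming the intent was to overwrite the first character of each string; the name set_first_chars is hypothetical:

```c
#include <stddef.h>

/* Overwrites the first character of each of the n strings whose
   addresses are stored in strings[0..n-1]. Every pointer must refer
   to writable storage; a null pointer or a string literal will not do. */
void set_first_chars(char **strings, size_t n, char c)
{
    size_t i;

    for (i = 0; i < n; i++)
        strings[i][0] = c;
}
```

The caller would set things up with real arrays, for example char first[] = "apple", second[] = "berry" and char *arrp[2] = { first, second }, and then call set_first_chars(arrp, 2, 'Z').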

>Two of them are two names for a single thing. Although "string literal"
is the formal term for a string literal, people will know what you mean
if you say "string constant". But consider this:

char foo[3];
foo[0] = 'H';
foo[1] = 'i';
foo[2] = '\0';

foo now contains a string, but no string literals are involved.

So what is a string literal ?

It is a sequence of characters (and escaped characters) between ""s.
I.e. it is there, literally, in your program's text.

--
Ben.

Sep 15 '08 #69

arnuld

On Mon, 15 Sep 2008 13:15:23 +0100, Ben Bacarisse wrote:

>arnuld <su*****@invalid.address> writes:

Richard is talking about the pointer to the whole array.

pointer to the whole array ? char* is a pointer to char, int** is a
pointer to pointer to int. How do you get a pointer to an array? I mean,
what type is it?
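
On the type question: C does have a distinct pointer-to-array type, written char (*)[3] for an array of three char, although in this thread "pointer to the whole array" more plausibly just means the char ** through which the array of pointers is reached. A small sketch of the difference, with hypothetical function names:

```c
#include <stddef.h>

/* Size of the object a pointer-to-array points at: the whole array. */
size_t pointee_size_array(char (*pa)[3])
{
    return sizeof *pa;      /* 3 */
}

/* Size of the object a pointer-to-first-element points at: one char. */
size_t pointee_size_elem(char *pc)
{
    return sizeof *pc;      /* 1 */
}
```

For char arrc[3], the expression &arrc has type char (*)[3] and arrc decays to char *; both hold the same address, but they differ in what they point at and in how pointer arithmetic steps.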

A function
that takes: void get_word(char **words); can change words[32] to point
to some new string just found.

Right. And words++ will take us to the 2nd element of the array.

It can change words[32][0] to be 'x',
but it can't change words itself. Well, it can, but the effect will
be lost when the function returns.

Now here is the problem where my understanding about pointers and arrays
blows away:

get_word( char* words[3] )

so we can change where words[0], [1] and [2] point because array will be
converted to pointer to first element and pointer *always* changes the
original element.

>both calls are same, right ?

Yes, but some_function can't make pc point to a bigger array if needed.
pc will point to the same place after the call.

yes, it means I can understand arrays and pointers :)

> arrp[0] = p1;
arrp[1] = p2;

All these last three lines make no changes. Both elements of arrp are
already NULL.

There is a difference. First the array had NULL elements. Now the array has
pointers which point to NULL. There is a difference.

No. p_arrp can't be changed by the call. This is a key thing about C and
applies to all types:

void f(int x);
...
int x = 42;
f(x);

x is guaranteed to be unchanged here. The same applies if x is a
pointer or a pointer to a pointer or a pointer to a pointer to a pointer
or...

That I know, x is a variable in the example and variables are passed as
value. Pointers and arrays are passed as references, hence we can change
the original elements.

> if( ! (idx) )
{
**ppc++ = 'Z';

*ppc is NULL -- you it to be NULL before the call. You can write any
value into **ppc.

then why does that value not appear?

It is a sequence of characters (and escaped characters) between ""s. I.e.
it is there, literally, in your program's text.

I got it. What we pass to printf() is a string literal.

--
www.lispmachine.wordpress.com
my email is @ the above blog.
Google Groups is Blocked. Reason: Excessive Spamming

Sep 15 '08 #70

Andrew Poelstra

On 2008-09-15, arnuld <su*****@invalid.address> wrote:

>On Mon, 15 Sep 2008 04:50:29 +0000, Richard Heathfield wrote:

>As Knuth would say: [50]

I don't know what that means .

In the series /The Art of Computer Programming/ by Donald
Knuth, which is probably the greatest book on mathematical
computing ever written, problems are given at the end of
each section with a numerical code indicating their difficulty.

A code of [01], for example, you should be able to answer
in your head without pausing. A code of [50] means that,
if you solve the problem, you will have been the first
in the history of mathematics to do so.

The point is that I highly recommend you pick up a copy of
at least the first three volumes of this work, and when
you are able, read through them all.

--
Andrew Poelstra ap*******@wpsoftware.com
To email me, use the above email address with .com set to .net

Sep 15 '08 #71

Ben Bacarisse

arnuld <su*****@invalid.address> writes:

>On Mon, 15 Sep 2008 13:15:23 +0100, Ben Bacarisse wrote:

>>arnuld <su*****@invalid.address> writes:

>Richard is talking about the pointer to the whole array.

pointer to the whole array? char* is a pointer to char, int** is a
pointer to pointer to int. How do you get a pointer to an array? I mean,
what type is it?

I was being a bit vague. Let's leave actual array pointers out of
this. I mean that Richard was talking about changing the char ** as
seen from the calling function. The thing you are intending to pass,
a char **, is in some sense a pointer to the whole array: from it all
of the array's data is accessible. The trouble is you can't
change this char ** inside the function -- not in a way that has any
effect outside. All you can do is change the various things it points
to.

If a function needs to change an int, you pass an int *. If it needs
to change an int *, you pass an int **. If it needs to change an int **,
you must pass an int ***.
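
A minimal sketch of that rule, for illustration only. The function names set_copy, set_int and retarget are made up here and do not appear in the thread.

```c
/* Changes only the local copy; the caller's int is untouched. */
void set_copy(int x)
{
    x = 99;
    (void)x;
}

/* Changes the caller's int through a pointer to it. */
void set_int(int *px)
{
    *px = 99;
}

/* Changes the caller's int * through a pointer to that pointer. */
void retarget(int **ppx, int *new_target)
{
    *ppx = new_target;
}
```

Called as set_copy(a), set_int(&a) and retarget(&p, &b), only the last two leave a visible change in the caller.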

> A function
that takes: void get_word(char **words); can change words[32] to point
to some new string just found.

Right. And words++ will take us to the 2nd element of the array.

Right. With no visible effect outside. Just as:

void f(int x)
{
    x++; /* changes x but has no effect on anything passed */
}

>It can change words[32][0] to be 'x',
but it can't change words itself. Well, it can, but the effect will
be lost when the function returns.

Now here is the problem where my understanding about pointers and arrays
blows away:

get_word( char* words[3] )

(First, the declaration is confusing because the 3 has no effect.
Pretend you wrote get_word(char **words);).

so we can change where words[0], [1] and [2] point because array will be
converted to pointer to first element and pointer *always* changes the
original element.

Absolutely. Now, having set words[0], words[1] and words[2], what
happens when you need to set words[3]? You can't. You need to
realloc some more space (always assuming that this is how the function
is supposed to work). That means changing words:

char **new_space = realloc(words, new_size * sizeof *new_space);
if (new_space) {
    /* set up new space with all the right pointers in it... */
    words = new_space;
}

Now what? words has more space and you can set words[3], but the
calling function will never see it. The calling function will still
have the old value that was passed (we can't even say what it is
called since it is just a pointer value) and, worse, that pointer now
points to storage invalidated by the realloc call.
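
One way around this, sketched under assumptions: pass the address of the caller's char ** (that is, a char ***), so the function can store the new pointer where the caller will see it. The name grow_words and its parameters are hypothetical, and the sketch assumes new_count is larger than old_count.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: grows the caller's array of char * to new_count slots.
 * Takes char *** so that the caller's own char ** can be updated.
 * Returns 0 on success, -1 on failure (the old array is then still valid).
 */
int grow_words(char ***pwords, size_t old_count, size_t new_count)
{
    char **new_space = realloc(*pwords, new_count * sizeof *new_space);
    size_t i;

    if (new_space == NULL)
        return -1;

    /* Make the new slots null so the caller can tell they are unused. */
    for (i = old_count; i < new_count; ++i)
        new_space[i] = NULL;

    *pwords = new_space;   /* this assignment is what the caller sees */
    return 0;
}
```

A caller holding char **words would write grow_words(&words, 3, 6) and carry on using words afterwards.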

>>both calls are same, right ?

>Yes, but some_function can't make pc point to a bigger array if needed.
pc will point to the same place after the call.

yes, it means I can understand arrays and pointers :)

>> arrp[0] = p1;
arrp[1] = p2;

>All these last three lines make no changes. Both elements of arrp are
already NULL.

There is a difference. First the array had NULL elements. Now the array has
pointers which point to NULL. There is a difference.

No. I don't know how to explain this because I can't see the source
of your confusion. Writing:

char* arrp[ARRSIZE] = { 0 };

p1 = p2 = NULL;

arrp[0] = p1;
arrp[1] = p2;

as you did, is just like writing:

int arr[ARRSIZE] = { 42, 42 };

i1 = i2 = 42;
arr[0] = i1;
arr[1] = i2;

All the elements were 42 to start with and they are 42 after the
assignments. All I did was change the type. Everything is an int
rather than a char *.

>No. p_arrp can't be changed by the call. This is a key thing about C and
applies to all types:

void f(int x);
...
int x = 42;
f(x);

x is guaranteed to be unchanged here. The same applies if x is a
pointer or a pointer to a pointer or a pointer to a pointer to a pointer
or...

That I know, x is a variable in the example and variables are passed as
value. Pointers and arrays are passed as references, hence we can change
the original elements.

Excellent! It works the same with a pointer -- provided you think
about the value of the pointer itself.

>> if( ! (idx) )
{
**ppc++ = 'Z';

>*ppc is NULL -- you it to be NULL before the call. You can write any
value into **ppc.

then why does that value not appear?

Typo! I meant you *can't* write any value into **ppc! Sorry. There
are two typos, I now see. It should have read: "*ppc is NULL -- you
set it to be NULL before the call. You can't write any value into
**ppc."
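
A small illustrative sketch of the corrected statement: a write through ppc[0] is only valid once ppc[0] points at real storage. The helper set_first_char is hypothetical and is not part of arnuld's program.

```c
#include <stddef.h>

/*
 * Writes c through ppc[0] only when ppc[0] actually points at storage.
 * Returns 1 if the write happened, 0 if ppc or ppc[0] was a null pointer.
 */
int set_first_char(char **ppc, char c)
{
    if (ppc == NULL || ppc[0] == NULL)
        return 0;          /* writing through a null pointer is undefined */
    ppc[0][0] = c;
    return 1;
}
```

With both elements of arrp still NULL the call reports 0; after arrp[0] is pointed at a writable char array, the same call succeeds.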

--
Ben.

Sep 15 '08 #72

James Kuyper

arnuld wrote:
....

I won't address most of your questions, because I'm short of time and
the answers are complicated; I'll let Richard or Ben take care of that.
I'll just address one thing where the answer is simple:

>It is a sequence of characters (and escaped characters) between ""s. I.e.
it is there, literally, in your program's text.

I got it. What we pass to printf() is a string literal.

The format string passed to printf is often a string literal; the other
arguments can be string literals, but often aren't. However, it's quite
feasible to call printf() without using any string literals.

The following code is simplified for purpose of exposition by failing to
check for the validity, or even the presence, of command line
arguments in any way. This is NOT recommended.

#include <stdio.h>
int main(int argc, char *argv[])
{
    printf(argv[1], argv[2]);
    return 0;
}

What is passed to printf in that case is two pointers to char. No string
literals are involved in any way.

Sep 15 '08 #73

Keith Thompson

arnuld <su*****@invalid.address> writes:

>On Mon, 15 Sep 2008 09:25:22 +0200, Richard wrote:
Please get lost. This was on topic and in a discussion with how best to
approach certain issues. Only a complete idiot like you would try to do
that without considering already researched (partial) solutions.

Though I really appreciate that you supported me, I think you are
disrespecting him by calling him an idiot. It's not what I think of him
when he replied. Chuck said so because he respects clc like all of us.
Though he could have added something like "it is ok for this time but no
C++ next time. you know better", to his reply. If he did not add that it
does not mean anyone should disrespect him. He is looking for the
well-being of clc , like me and I understood his reasoning.

Only trolls should be disrespected by not replying to their posts ;)

Richard no-last-name has made a hobby out of insulting Chuck Falconer
at every opportunity, even dragging his name into discussions in which
Chuck has not participated.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Sep 15 '08 #74

CBFalconer

Keith Thompson wrote:

arnuld <su*****@invalid.address> writes:

.... snip ...

>
>Only trolls should be disrespected by not replying to their posts ;)

Richard no-last-name has made a hobby out of insulting Chuck Falconer
at every opportunity, even dragging his name into discussions in which
Chuck has not participated.

It is totally pointless, since I have Richard the un-named PLONKed,
and I never see his silly diatribes.

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.

Sep 15 '08 #75

CBFalconer

James Kuyper wrote:

>

.... snip ...

>
The only way he knows how to clearly describe what he wants his
code to do is by providing a C++ example; this has been made
abundantly clear by his failed attempts to clearly describe it in
English. However, the code he wants to write should be in C.

If he were to post this same question to comp.lang.c++, and there
were a C++BFalconer on comp.lang.c++, C++BFalconer would certainly
respond by saying that this C question is off-topic in
comp.lang.c++. Should arnuld then simply remain silent about his
question?

I disagree. If he wants to use a C++ algorithm as illustration he
should translate that algorithm to C. In fact, a good example
would be a lexer for a C compiler.

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.

Sep 15 '08 #76

CBFalconer

arnuld wrote:

>

.... snip ...

>
okay, that seems a good reply. I mean, we make it topical to C
again as I lost in the confusion a little. so *my* definition of
word will be the same one you told earlier:

A "word" is a non-empty contiguous sequence of characters other
than space, tab, or newline, preceded or followed either by a
space, tab, or newline or by the start or end of the input.

So any sequence of control chars, such as '\16', '\17' can go in a
word? Just illustrating the difficulties. For example,
identifiers in C start with 'a'..'z', 'A'..'Z', '_', and continue
with the same plus '0'..'9'. This assumes (not valid for C) that
'a'..'z' are contiguous, as are 'A'..'Z'. When the word has been
parsed it has to be checked against a (limited) list of reserved
words.

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.

Sep 15 '08 #77

jameskuyper

CBFalconer wrote:

James Kuyper wrote:

... snip ...

The only way he knows how to clearly describe what he wants his
code to do is by providing a C++ example; this has been made
abundantly clear by his failed attempts to clearly describe it in
English. However, the code he wants to write should be in C.

If he were to post this same question to comp.lang.c++, and there
were a C++BFalconer on comp.lang.c++, C++BFalconer would certainly
respond by saying that this C question is off-topic in
comp.lang.c++. Should arnuld then simply remain silent about his
question?

I disagree. If he wants to use a C++ algorithm as illustration he
should translate that algorithm to C. In fact, a good example
would be a lexer for a C compiler.

His question was basically about how to translate the C++ algorithm to
C. So what you're saying is that he must answer his own question
before he can ask it here? I'm curious, where do you think he should
go to get help with the translation, since you've ruled out coming
here for help with it; and C++BFalconer would presumably rule out
going to clc++ for such a question? And when he finally does ask it,
according to you, his question is required to take the form "How do I
translate this algorithm {algorithm already translated into C}, into
C?". That's patently ridiculous.

Sep 15 '08 #78

Keith Thompson

CBFalconer <cb********@yahoo.com> writes:

arnuld wrote:
... snip ...
>>
okay, that seems a good reply. I mean, we make it topical to C
again as I lost in the confusion a little. so *my* definition of
word will be the same one you told earlier:

A "word" is a non-empty contiguous sequence of characters other
than space, tab, or newline, preceded or followed either by a
space, tab, or newline or by the start or end of the input.

So any sequence of control chars, such as '\16', '\17' can go in a
word? Just illustrating the difficulties. For example,
identifiers in C start with 'a'..'z', 'A'..'Z', '_', and continue
with the same plus '0'..'9'. This assumes (not valid for C) that
'a'..'z' are contiguous, as are 'A'..'Z'. When the word has been
parsed it has to be checked against a (limited) list of reserved
words.

Yes, given the definition above, this string:

"\16 \17"

contains two "words". Are you suggesting that that's a problem?

Obviously a program that's intended to recognize C identifiers would
have to use a different rule. But the OP didn't say anything about C
identifiers, so I'm not sure why you're bringing them up.

Incidentally, on my initial reading of your followup, I thought your
use of the word "contiguous" was meant to be related to the use in
arnuld's definition of "word" (the one I had suggested earlier). In
fact, they're quite different; in the definition of "word" it refers
to the characters being adjacent in the input, not to their numeric
representations. A more careful reading of what you wrote indicates
that you just meant that the notation 'a'..'z' doesn't make sense
unless the representations of those characters are numerically
contiguous. I thought I should point this out in case anyone else is
confused.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Sep 15 '08 #79

jameskuyper

CBFalconer wrote:

arnuld wrote:

... snip ...

okay, that seems a good reply. I mean, we make it topical to C
again as I lost in the confusion a little. so *my* definition of
word will be the same one you told earlier:

A "word" is a non-empty contiguous sequence of characters other
than space, tab, or newline, preceded or followed either by a
space, tab, or newline or by the start or end of the input.

So any sequence of control chars, such as '\16', '\17' can go in a
word? Just illustrating the difficulties. For example,
identifiers in C start with 'a'..'z', 'A'..'Z', '_', and continue
with the same plus '0'..'9'. ...

If you're going to bother pointing out that

... This assumes (not valid for C) that
'a'..'z' are contiguous, as are 'A'..'Z'.

then you shouldn't be assuming that '\16' and '\17' are control
characters; if we're not assuming ASCII, then they just might
represent ' ' and 'a', respectively.

Identifying what arnuld calls "words" is much simpler than identifying
C identifiers; in fact, I can't quite figure out why you bothered
bringing up the definition of C identifiers. All that arnuld's code
needs to do is check for the delimiting " \t\n" characters. In fact,
since he has said that he wants to mimic the behavior of the C++ code
which he provided as an example, he probably left out the form-feed,
carriage return, and vertical tab characters only by accident.
If he adds "\f\r\v" to the delimiter list, then the simplest way to
handle the delimiter check is to just call isspace().
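
A rough sketch of such an isspace()-based reader, not a definitive implementation. The name read_word, the FILE * parameter and the fixed-size buffer are assumptions made for the example; size is assumed to be at least 2.

```c
#include <ctype.h>
#include <stdio.h>

/*
 * Hypothetical helper: reads one whitespace-delimited word from fp into buf,
 * storing at most size - 1 characters plus the terminating '\0'.
 * Returns the number of characters stored; 0 means no word was found.
 * A word longer than size - 1 characters is split across calls.
 */
size_t read_word(FILE *fp, char *buf, size_t size)
{
    size_t n = 0;
    int ch;

    /* Skip leading whitespace (space, \t, \n, \f, \r, \v in the "C" locale). */
    while ((ch = getc(fp)) != EOF && isspace(ch))
        ;

    /* Collect characters until whitespace, EOF, or a full buffer. */
    while (ch != EOF && !isspace(ch)) {
        if (n + 1 == size) {     /* only room for '\0' is left */
            ungetc(ch, fp);      /* leave the rest for the next call */
            break;
        }
        buf[n++] = (char)ch;
        ch = getc(fp);
    }

    buf[n] = '\0';
    return n;
}
```

For terminal input the call would be read_word(stdin, buf, sizeof buf), repeated until it returns 0.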

Sep 15 '08 #80

Richard Heathfield

CBFalconer said:

James Kuyper wrote:
>>
... snip ...
>>
The only way he knows how to clearly describe what he wants his
code to do is by providing a C++ example; this has been made
abundantly clear by his failed attempts to clearly describe it in
English. However, the code he wants to write should be in C.

If he were to post this same question to comp.lang.c++, and there
were a C++BFalconer on comp.lang.c++, C++BFalconer would certainly
respond by saying that this C question is off-topic in
comp.lang.c++. Should arnuld then simply remain silent about his
question?

I disagree. If he wants to use a C++ algorithm as illustration he
should translate that algorithm to C.

He agrees. How about helping him do it, by answering his C question?

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999

Sep 15 '08 #81

Antoninus Twink

On 15 Sep 2008 at 19:30, CBFalconer wrote:

Keith Thompson wrote:
>Richard no-last-name has made a hobby out of insulting Chuck Falconer
at every opportunity, even dragging his name into discussions in
which Chuck has not participated.

It is totally pointless, since I have Richard the un-named PLONKed,
and I never see his silly diatribes.

Fortunate, then, that your posts have become so embarrassingly
error-ridden in recent months that another Richard, with a surname we
all know only too well, has started posting similar diatribes that you
surely *do* read.

Sep 15 '08 #82

Kenny McCormack

In article <sl*******************@nospam.invalid>,
Antoninus Twink <no****@nospam.invalid> wrote:

>On 15 Sep 2008 at 19:30, CBFalconer wrote:
>Keith Thompson wrote:
>>Richard no-last-name has made a hobby out of insulting Chuck Falconer
at every opportunity, even dragging his name into discussions in
which Chuck has not participated.

It is totally pointless, since I have Richard the un-named PLONKed,
and I never see his silly diatribes.

Fortunate, then, that your posts have become so embarrassingly
error-ridden in recent months that another Richard, with a surname we
all know only too well, has started posting similar diatribes that you
surely *do* read.

Of course, now, KT himself has gotten on the bashing CBF bandwagon.

Good on him!

Sep 15 '08 #83

Old Wolf

On Sep 15, 4:50 pm, Richard Heathfield <r...@see.sig.invalid> wrote:

arnuld said:
Look at my code and see how std::string works and perhaps we can settle
on some common and standard meaning word.

Let me give you an example from ordinary English, where whitespace
delimiters are not sufficient:

"What did he say?", said Albert.
"He just said, 'I'll be there', I think", replied the captain.

Now, consider the whitespace-separated tokens:

A bit sidetracked from the original thread, but is
there actually any problem here besides identifying
whether a ' symbol is a quote mark or an apostrophe?

Sep 16 '08 #84

CBFalconer

Old Wolf wrote:

Richard Heathfield <r...@see.sig.invalid> wrote:

.... snip ...

>
>Let me give you an example from ordinary English, where
whitespace delimiters are not sufficient:

"What did he say?", said Albert.
"He just said, 'I'll be there', I think", replied the captain.

Now, consider the whitespace-separated tokens:

A bit sidetracked from the original thread, but is
there actually any problem here besides identifying
whether a ' symbol is a quote mark or an apostrophe?

And I gather you consider that a trivial problem? Please describe
your algorithm.

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.

Sep 16 '08 #85

CBFalconer

ja*********@verizon.net wrote:

CBFalconer wrote:
>James Kuyper wrote:
>>>
... snip ...
>>>
The only way he knows how to clearly describe what he wants his
code to do is by providing a C++ example; this has been made
abundantly clear by his failed attempts to clearly describe it in
English. However, the code he wants to write should be in C.

If he were to post this same question to comp.lang.c++, and there
were a C++BFalconer on comp.lang.c++, C++BFalconer would certainly
respond by saying that this C question is off-topic in
comp.lang.c++. Should arnuld then simply remain silent about his
question?

I disagree. If he wants to use a C++ algorithm as illustration he
should translate that algorithm to C. In fact, a good example
would be a lexer for a C compiler.

His question was basically about how to translate the C++ algorithm to
C. So what you're saying is that he must answer his own question
before he can ask it here? I'm curious, where do you think he should
go to get help with the translation, since you've ruled out coming
here for help with it; and C++BFalconer would presumably rule out
going to clc++ for such a question? And when he finally does ask it,
according to you, his question is required to take the form "How do I
translate this algorithm {algorithm already translated into C}, into
C?". That's patently ridiculous.

You certainly make a good point.

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.

Sep 16 '08 #86

CBFalconer

Keith Thompson wrote:

CBFalconer <cb********@yahoo.com> writes:
>arnuld wrote:

... snip ...

>>okay, that seems a good reply. I mean, we make it topical to C
again as I lost in the confusion a little. so *my* definition of
word will be the same one you told earlier:

A "word" is a non-empty contiguous sequence of characters other
than space, tab, or newline, preceded or followed either by a
space, tab, or newline or by the start or end of the input.

So any sequence of control chars, such as '\16', '\17' can go in a
word? Just illustrating the difficulties. For example,
identifiers in C start with 'a'..'z', 'A'..'Z', '_', and continue
with the same plus '0'..'9'. This assumes (not valid for C) that
'a'..'z' are contiguous, as are 'A'..'Z'. When the word has been
parsed it has to be checked against a (limited) list of reserved
words.

Yes, given the definition above, this string:

"\16 \17"

contains two "words". Are you suggesting that that's a problem?

I didn't specify a string. I meant those characters contiguous
(i.e. one strictly following the other) in the input stream. The
detection I specified above can be done with one char look ahead.
The presence (and necessity) of such a look ahead scheme may not be
obvious to the casual reader. In C it revolves around the ungetc()
function.
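
To illustrate the one-character look-ahead: the sketch below reads an identifier-like token and pushes the first non-matching character back with ungetc(). The function read_identifier is hypothetical, and size is assumed to be at least 1. It uses isalpha() and isalnum() rather than range checks so that it does not depend on 'a'..'z' being contiguous, though in locales other than "C" those functions may accept additional characters.

```c
#include <ctype.h>
#include <stdio.h>

/*
 * Hypothetical helper: reads one identifier-like token (a letter or '_',
 * followed by letters, digits or '_') from fp into buf, keeping at most
 * size - 1 characters. Returns the length stored, or 0 if the next
 * character cannot start such a token.
 */
size_t read_identifier(FILE *fp, char *buf, size_t size)
{
    size_t n = 0;
    int ch = getc(fp);

    if (ch != EOF && (isalpha(ch) || ch == '_')) {
        do {
            if (n + 1 < size)
                buf[n++] = (char)ch;   /* silently truncate overlong tokens */
            ch = getc(fp);
        } while (ch != EOF && (isalnum(ch) || ch == '_'));
    }

    /* One character of look-ahead: give back what we did not consume. */
    if (ch != EOF)
        ungetc(ch, fp);

    buf[n] = '\0';
    return n;
}
```

The pushed-back character is then the first thing the next getc() on the same stream returns, which is what lets a lexer decide where one token ends without losing the start of the next.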

>
Obviously a program that's intended to recognize C identifiers would
have to use a different rule. But the OP didn't say anything about C
identifiers, so I'm not sure why you're bringing them up.

Incidentally, on my initial reading of your followup, I thought your
use of the word "contiguous" was meant to be related to the use in
arnuld's definition of "word" (the one I had suggested earlier). In
fact, they're quite different; in the definition of "word" it refers
to the characters being adjacent in the input, not to their numeric
representations. A more careful reading of what you wrote indicates
that you just meant that the notation 'a'..'z' doesn't make sense
unless the representations of those characters are numerically
contiguous. I thought I should point this out in case anyone else is
confused.

Right. I should have specified 'the values of the chars are
contiguous'. The point being that ASCII works fine, but EBCDIC
doesn't. The C lexer will be a good example, because what it has
to detect is well defined.

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.

Sep 16 '08 #87

arnuld

On Mon, 15 Sep 2008 14:43:36 +0100, Ben Bacarisse wrote:

I was being a bit vague. Let's leave actual array pointers out of
this. I mean that Richard was talking about changing the char ** as
seen from the calling function. The thing you are intending to pass,
a char **, is in some sense a pointer to the whole array: from it all
of the array's data is accessible. The trouble is you can't
change this char ** inside the function -- not in a way that has any
effect outside. All you can do is change the various things it points
to.

...SNIP....

If a function needs to change an int, you pass an int *. If it needs
to change an int *, you pass an int **. If it needs to change an int **,
you must pass an int ***.

... SNIP....

Typo! I meant you *can't* write any value into **ppc! Sorry. There
are two typos, I now see. It should have read: "*ppc is NULL -- you
set it to be NULL before the call. You can't write any value into
**ppc."

see my new post titled "pointers passed by copying ?"

--
www.lispmachine.wordpress.com
my email is @ the above blog.
Google Groups is Blocked. Reason: Excessive Spamming

Sep 16 '08 #88

Richard Heathfield

Old Wolf said:

On Sep 15, 4:50 pm, Richard Heathfield <r...@see.sig.invalid> wrote:
>arnuld said:
Look at my code and see how std::string works and perhaps we can
settle on some common and standard meaning word.

Let me give you an example from ordinary English, where whitespace
delimiters are not sufficient:

"What did he say?", said Albert.
"He just said, 'I'll be there', I think", replied the captain.

Now, consider the whitespace-separated tokens:

A bit sidetracked from the original thread, but is
there actually any problem here besides identifying
whether a ' symbol is a quote mark or an apostrophe?

I think it's about here that I like to pretend I'm from Missouri.

Show me.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999

Sep 16 '08 #89

Old Wolf

On Sep 16, 3:43 pm, CBFalconer <cbfalco...@yahoo.com> wrote:

Old Wolf wrote:
Richard Heathfield <r...@see.sig.invalid> wrote:

"He just said, 'I'll be there', I think", replied the captain.

A bit sidetracked from the original thread, but is
there actually any problem here besides identifying
whether a ' symbol is a quote mark or an apostrophe?

And I gather you consider that a trivial problem? Please describe
your algorithm.

Not at all, I was just checking that there
wasn't some other problem besides this one,
that I hadn't seen.

Sep 16 '08 #90

Richard Heathfield

Old Wolf said:

On Sep 16, 3:43 pm, CBFalconer <cbfalco...@yahoo.com> wrote:
>Old Wolf wrote:
Richard Heathfield <r...@see.sig.invalid> wrote:

>"He just said, 'I'll be there', I think", replied the captain.

A bit sidetracked from the original thread, but is
there actually any problem here besides identifying
whether a ' symbol is a quote mark or an apostrophe?

And I gather you consider that a trivial problem? Please describe
your algorithm.

Not at all, I was just checking that there
wasn't some other problem besides this one,
that I hadn't seen.

Hyphens are another issue: "will-o'-the-wisp" illustrates where both the
hyphen and the apostrophe are part of the word, but there are situ-
ations where the hyphen (and newline) are not part of the word, just as
there are situations where 'apostrophes' are not part of the word.

Then there's the whole issue of "what is an alphabetic character"? If we
simply say A-Za-z, we exclude a vast range of words from languages such as
French, German, Spanish, Polish, and Russian. I'm not saying we shouldn't
do that, but we should be aware that the decision is costly in terms of
internationalisation.

Is 'C++' a word? How about 'G#m'? You might or might not consider that to
be a word, but a musician might. And yet they may have a very different
opinion about 'H#m'.

What about numbers? Is 42 a word? How about 3Com?

Is the copyright symbol a word? What about the trademark and registered
trademark symbols? Can they be part of a word? Consider, for example,
Microsoft<sup>(R)</sup>.

How about full stops (or 'periods' as some people call them)? Consider:
"U.S.A.", "B.B.C.", "etc.", etc.

What about &? Is that a word?

To any one of these questions, you may say, "yes, that's allowable as part
of a word", or you may say, "no, it's not allowable". But your decision
may well differ from someone else's decision.

And having decided, how do you design your algorithm so that it accepts
"fo'c'sle" as one word rather than three? A dictionary? If you're going to
do /that/, the algorithm is indeed trivial (modulo bugs):

1. start with s = "" and an empty word list
2. c = getch
3. if EOF continue from 8.
4. s += c
5. if s in dictionary
continue from 2.
6. else
s -= c.
if s != ""
add s to word list
s = c
7. continue from 2.
8. if s != ""
add s to word list
9. stop

but now you have to list in your dictionary every single character
combination that you consider to be a word. Big dictionary. (For a start,
every word will need at least three entries: "word", "Word", "WORD".)
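
A sketch that follows the numbered steps literally, reading from a string rather than from a stream. The three-entry dictionary, the MAX_WORD limit and the names in_dict, add_word and split_by_dict are all assumptions made for illustration. Flushing s when it fills up is an addition that the steps do not mention.

```c
#include <stddef.h>
#include <string.h>

#define MAX_WORD 32

/* Toy dictionary, purely illustrative. */
static const char *const toy_dict[] = { "a", "ab", "b" };

static int in_dict(const char *s)
{
    size_t i;
    for (i = 0; i < sizeof toy_dict / sizeof toy_dict[0]; ++i)
        if (strcmp(s, toy_dict[i]) == 0)
            return 1;
    return 0;
}

/* Appends s to the word list if it is non-empty and there is room. */
static void add_word(const char *s, char out[][MAX_WORD],
                     size_t max_out, size_t *count)
{
    if (s[0] != '\0' && *count < max_out)
        strcpy(out[(*count)++], s);
}

/* Returns the number of words stored in out. */
size_t split_by_dict(const char *input, char out[][MAX_WORD], size_t max_out)
{
    char s[MAX_WORD] = "";
    size_t len = 0, count = 0;

    for (; *input != '\0'; ++input) {
        if (len + 1 >= MAX_WORD) {        /* s is full: flush and restart */
            add_word(s, out, max_out, &count);
            len = 0;
            s[0] = '\0';
        }
        s[len++] = *input;                /* step 4: s += c */
        s[len] = '\0';
        if (in_dict(s))                   /* step 5: keep growing s */
            continue;
        s[--len] = '\0';                  /* step 6: s -= c */
        add_word(s, out, max_out, &count);
        s[0] = *input;                    /* s = c */
        s[1] = '\0';
        len = 1;
    }
    add_word(s, out, max_out, &count);    /* step 8 */
    return count;
}
```

Written out like this, one consequence becomes visible: s only keeps growing while every prefix of it is itself in the dictionary, so a word such as "fo'c'sle" would need all of its prefixes listed as well.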

The dictionary approach is clumsy in the extreme, and the algorithmic
approach gets more and more difficult as you get pickier and pickier about
what does and what does not constitute a word.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999

Sep 16 '08 #91

Old Wolf

On Sep 17, 10:02 am, Richard Heathfield <r...@see.sig.invalid> wrote:

but now you have to list in your dictionary every single character
combination that you consider to be a word. Big dictionary. (For a start,
every word will need at least three entries: "word", "Word", "WORD".)

The dictionary approach is clumsy in the extreme, and the algorithmic
approach gets more and more difficult as you get pickier and pickier about
what does and what does not constitute a word.

Surely there is no approach other than using
a sophisticated dictionary. For example:

'Tis the season to be playin'

there is no rule to deduce whether we have
quote marks or apostrophes, besides knowing
that 'Tis is a word.

The dictionary can include rules such as
the fact that if "abcd" is a word, then
so is "Abcd"; it can know that acronyms
can be written with periods, and so on.

Now where it gets harder is if you have to
accept text from people who make spelling
mistakes and typos :)

Sep 17 '08 #92

Richard Heathfield

Old Wolf said:

On Sep 17, 10:02 am, Richard Heathfield <r...@see.sig.invalid> wrote:
>but now you have to list in your dictionary every single character
combination that you consider to be a word. Big dictionary. (For a
start, every word will need at least three entries: "word", "Word",
"WORD".)

The dictionary approach is clumsy in the extreme, and the algorithmic
approach gets more and more difficult as you get pickier and pickier
about what does and what does not constitute a word.

Surely there is no approach other than using
a sophisticated dictionary.

Yes, there is. There is the "good enough for Professor Jenkins[1]"
approach, in which we define "word" as a non-empty contiguous sequence of
non-whitespace characters delimited on the left by SOF or whitespace and
on the right by EOF or whitespace.

This is not only good enough for Professor Jenkins[1] but frequently good
enough in the Real World, too.

Not that the Real World has any bearing, but I just thought I'd mention it.
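
That whitespace-delimited definition happens to be what the %s conversion of the scanf family implements, so a minimal sketch needs very little code. The name scan_word and the 28-character buffer are assumptions for the example; the width in the format string has to be one less than the buffer size.

```c
#include <stdio.h>

/*
 * Hypothetical helper: reads one whitespace-delimited word of at most
 * 27 characters from fp into buf, which must hold at least 28 chars.
 * Returns 1 if a word was read, 0 at end of input or on a read error.
 */
int scan_word(FILE *fp, char buf[28])
{
    /* %27s skips leading whitespace, then stops at whitespace or 27 chars. */
    return fscanf(fp, "%27s", buf) == 1;
}
```

The trade-off is the fixed upper bound: a longer word is silently split into several pieces.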

[1] cf Gary Larson (the one with the duck)

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999

Sep 17 '08 #93

arnuld

On Mon, 15 Sep 2008 09:28:36 +0000, Richard Heathfield wrote:

...SNIP...

But what if something goes wrong? You'll need to be able to report an
error. The natural way to do this is via a return value, which means we
can't use that value for either the list or the count, and that leads us
to:

What will we do with that return value? If something goes wrong I can
simply exit the program, telling the user that he did something stupid and
is responsible for it.

int get_words(char ***, size_t *);

Since they don't need to modify the caller's status, sort_words and
print_words can be of type int(char **, size_t).

I think there is a qsort in the standard library, so we can use that, but I
don't know whether it modifies the original array or not.

Up to you, but I wouldn't bother setting a limit (or, if I did, I'd set it
at a million or so, and treat any string longer than that as a reportable
error). With dynamic allocation, you don't /need/ to set a limit; you
simply allocate as you go, and reallocate if necessary.

Okay, I will write the program in parts. First we will write a simple
program that asks the user for input and stores that word dynamically,
using calloc, in some array. It will be called get_single_word and it will
form the basis of the get_words function, which will store all words in an
array. get_single_word returns an int because I want to use
get_single_word in get_words like this:

while( get_single_word )
{
  /* code for get_words */
}

Here is my code for get_single_word. PROBLEM: it does not print anything
I entered:

/* a program to get a single word from stdin */
#include <stdio.h>
#include <stdlib.h>

enum { AVERAGE_SIZE = 28 };
int get_single_word( char* );

int main( void )
{
  char* pw; /* pw means pointer to word */
  get_single_word( pw );

  printf("word you entered is: %s\n", pw);

  return 0;
}

int get_single_word( char* pc )
{
  int idx;
  int ch;
  char *pc_begin;

  pc = calloc(AVERAGE_SIZE-1, sizeof(char));
  pc_begin = pc;

  if( (! pc) )
  {
    perror("can not allocate memory, sorry babe!");
    return 1;
  }

  for( idx = 0; ( (ch = getchar()) != EOF ); ++idx, ++pc )
  {
    if( AVERAGE_SIZE == idx )
    {
      /* use realloc here which I have no idea how to write */
    }

    *pc = ch;
  }

  *++pc = '\0';
  free(pc_begin);

  return 0;
}

=================== OUTPUT ==================
[arnuld@dune ztest]$ gcc -ansi -pedantic -Wall -Wextra test.c
[arnuld@dune ztest]$ ./a.out
like
word you entered is:
[arnuld@dune ztest]$

Sep 17 '08 #94

arnuld

On Wed, 17 Sep 2008 10:07:59 +0500, arnuld wrote:

.... SNIP...

Here is my code for get_single_word. PROBLEM: it does not print anything
I entered:

.... SNIP...

I have even tried using pointer to pointer but that still leaves me with
the same problem:

int main( void )
{
  char* pw; /* pw means pointer to word */
  get_single_word( &pw );

  printf("word you entered is: %s\n", pw);

  return 0;
}

int get_single_word( char** pc )
{
  int idx;
  int ch;
  char *pc_begin;

  *pc = calloc(AVERAGE_SIZE-1, sizeof(char));
  pc_begin = *pc;

  if( (! *pc) )
  {
    perror("can not allocate memory, sorry babe!");
    return 1;
  }

  for( idx = 0; ( (ch = getchar()) != EOF ); ++idx, ++*pc )
  {
    if( AVERAGE_SIZE == idx )
    {
      /* use realloc here which I have no idea how to write */
    }

    **pc = ch;
  }

  *++pc = '\0';
  free(pc_begin);

  return 0;
}

Sep 17 '08 #95

Ron Ford

On Tue, 16 Sep 2008 06:28:30 +0000, Richard Heathfield posted:

Old Wolf said:

>On Sep 15, 4:50 pm, Richard Heathfield <r...@see.sig.invalid> wrote:
>>arnuld said:
Look at my code and see how std::string works and perhaps we can
settle on some common and standard meaning word.

Let me give you an example from ordinary English, where whitespace
delimiters are not sufficient:

"What did he say?", said Albert.
"He just said, 'I'll be there', I think", replied the captain.

Now, consider the whitespace-separated tokens:

A bit sidetracked from the original thread, but is
there actually any problem here besides identifying
whether a ' symbol is a quote mark or an apostrophe?

I think it's about here that I like to pretend I'm from Missouri.

Show me.

As it polls redder with the Palin nomination, Huck sighed, 'Ashcroft
sucks."
--
I believe that all government is evil, and that trying to improve it is
largely a waste of time.
H. L. Mencken

Sep 17 '08 #96

Richard Heathfield

arnuld said:

>On Mon, 15 Sep 2008 09:28:36 +0000, Richard Heathfield wrote:

>...SNIP...

>But what if something goes wrong? You'll need to be able to report an
error. The natural way to do this is via a return value, which means we
can't use that value for either the list or the count, and that leads us
to:

What will we do with that return value? If something goes wrong I can
simply exit the program, telling the user that he did something stupid
and is responsible for it.

Yes, you could do that, except that (a) it might not be the user's stupid
fault (it may simply be that your machine is low on memory), and (b) there
may be a way to recover. If this is a mere learning exercise and the
learning task is not error recovery, then yes, by all means bomb out.
That's the "student solution" and, like cryptosporidium, is very common.

>int get_words(char ***, size_t *);

Since they don't need to modify the caller's status, sort_words and
print_words can be of type int(char **, size_t).

I think there is a qsort in the standard library, so we can use that, but I
don't know whether it modifies the original array or not.

It does modify the original array (by sorting it, would you believe?), but
it won't modify the *pointer*, the one that indicates the location of the
first element of the array.
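
To make that concrete, here is a minimal sketch of a sort_words built on qsort, using the int(char **, size_t) type mentioned above. The comparison function name compare_words and the choice of strcmp ordering are assumptions for the example.

```c
#include <stdlib.h>
#include <string.h>

/* Comparison callback for qsort. Each argument points at an element of the
   array, i.e. at a char *, so it is dereferenced once before strcmp. */
int compare_words(const void *a, const void *b)
{
    const char *const *wa = a;
    const char *const *wb = b;
    return strcmp(*wa, *wb);
}

/* Sketch of sort_words: reorders the pointers stored in words[0..count-1].
   words itself is passed by value, so the caller's pointer to the first
   element is unchanged; only the elements are rearranged.
   Always returns 0 here, since qsort reports no errors. */
int sort_words(char **words, size_t count)
{
    qsort(words, count, sizeof *words, compare_words);
    return 0;
}
```

The strings themselves are not copied or moved; only the char * values in the array change places.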

>Up to you, but I wouldn't bother setting a limit (or, if I did, I'd set
it at a million or so, and treat any string longer than that as a
reportable error). With dynamic allocation, you don't /need/ to set a
limit; you simply allocate as you go, and reallocate if necessary.

Okay, I will write the program in parts. First we will write a simple
program that asks the user for input and stores that word dynamically,
using calloc, in some array. It will be called get_single_word and it will
form the basis of the get_words function, which will store all words in an
array.

Good. This sounds like functional decomposition - always a good way to
start off.

get_single_word returns an int because I want to use
get_single_word in get_words like this:

while( get_single_word )
{
  /* code for get_words */
}

Presumably that's pseudocode, and you intend get_single_word to be a
function call, and the "code for get_words" consists of inserting into an
array the word retrieved by get_single_word(). Yes, that's reasonable.
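
The "inserting into an array" step could look roughly like the sketch below. The helper name append_word, the separate capacity counter and the starting capacity of 8 are all assumptions for illustration; the thread only fixes the outer interface, int get_words(char ***, size_t *).

```c
#include <stdlib.h>

/* Hypothetical helper: appends word to the array *list, which currently
   holds *count entries and has room for *capacity. The array is grown by
   doubling when full. Returns 0 on success, or 1 if the array could not
   be enlarged, in which case *list, *count and *capacity are untouched. */
int append_word(char ***list, size_t *count, size_t *capacity, char *word)
{
    if (*count == *capacity) {
        size_t newcap = (*capacity == 0) ? 8 : 2 * *capacity;
        /* assign to a temporary so the old block is not lost on failure */
        char **tmp = realloc(*list, newcap * sizeof *tmp);
        if (tmp == NULL)
            return 1;
        *list = tmp;
        *capacity = newcap;
    }
    (*list)[(*count)++] = word;
    return 0;
}
```

get_words would start with a NULL list and zero count and capacity, then call this once for every word that get_single_word() delivers, stopping on the first non-zero return from either function.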

Here is my code for get_single_word. PROBLEM: it does not print anything
I entered:

/* a program to get a single word from stdin */
#include <stdio.h>
#include <stdlib.h>

enum { AVERAGE_SIZE = 28 };
int get_single_word( char* );

int main( void )
{
  char* pw; /* pw means pointer to word */
  get_single_word( pw );

As the program prepares to call get_single_word, it evaluates pw - but the
value of pw is indeterminate, so evaluating it results in undefined
behaviour. In get_single_word, you intend to modify the pointer (by calloc
and possibly realloc), and that change needs to 'stick' in the caller, so
it's no good just passing the value. You must pass the /address/ of pw,
and make other necessary modifications to the function interface.

This is why, on this occasion, your program didn't output what you expected
it to output.
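
A small sketch of the "pass the address" pattern, for comparison. The function make_copy is invented for this example and is not part of the program under discussion; it only shows how an allocation made inside a function reaches the caller through a char ** parameter.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: stores a freshly allocated copy of src in *out.
   Because out is the ADDRESS of the caller's pointer, the assignment to
   *out is visible after the call. Returns 0 on success, or 1 if the
   allocation fails (*out is then NULL). The caller must free(*out). */
int make_copy(char **out, const char *src)
{
    *out = malloc(strlen(src) + 1);
    if (*out == NULL)
        return 1;
    strcpy(*out, src);
    return 0;
}
```

The caller declares char *pw = NULL; and passes &pw. Had the parameter been a plain char *, the function would have assigned to its own private copy and pw would have stayed as it was.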

Sep 17 '08 #97

Richard Heathfield

arnuld said:

I have even tried using pointer to pointer but that still leaves me with
the same problem:

No, it leaves you with a different problem. The symptoms may or may not be
the same, but the problem is different.

int main( void )
{
  char* pw; /* pw means pointer to word */
  get_single_word( &pw );

  printf("word you entered is: %s\n", pw);

You need <stdio.h> if you wish to call printf.

  return 0;
}

int get_single_word( char** pc )
{
  int idx;
  int ch;
  char *pc_begin;

  *pc = calloc(AVERAGE_SIZE-1, sizeof(char));

You need <stdlib.h> if you wish to call calloc. Also, why nail the call to
the type? This is better:

  *pc = calloc(AVERAGE_SIZE - 1, sizeof **pc);

  pc_begin = *pc;

  if( (! *pc) )
  {
    perror("can not allocate memory, sorry babe!");
    return 1;
  }

Okay - although it's better not to embed messages like this in library
functions if you can avoid it.

Don't forget to check that return value in the caller.

for( idx = 0; ( (ch = getchar()) != EOF ); ++idx, ++*pc )

I thought you wanted to stop at whitespace?

Also, it's better to move pc_begin than *pc, if you must move either of
them. Given that you have idx keeping track of things, I see no reason to
modify *pc (and plenty of reasons not to), and no reason for pc_begin to
exist at all. You can simply do (*pc)[idx] = ch;

If you don't like the (), you could keep pc_begin, point it to *pc as you
have done, and just do: pc_begin[idx] = ch; instead. No need to increment
any pointers.

  {
    if( AVERAGE_SIZE == idx )
    {
      /* use realloc here which I have no idea how to write */

I'll show you how shortly. In the meantime, let's continue to look at what
you've got.

*++pc = '\0';

pc is char **, so ++pc is char ** (and utterly invalid), and *++pc is
char *, so you're setting a wild pointer to 0. Not good. Could be worse,
but not good.

free(pc_begin);

Why allocate it at all, if you're going to throw it away before you've even
used it?

Here's a better way to do this - still not a great way, but a better way. I
haven't tested it, by the way, but I'd be mildly surprised if it doesn't
work perfectly first time.

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

#define AVERAGE_SIZE 16

#define GSW_OK 0 /* success */
#define GSW_ENOMEM 1 /* can't allocate buffer - no word fetched */
#define GSW_ENORESIZE 2 /* can't resize buffer - partial word fetched */

int get_single_word( char** pc )
{
  int rc = GSW_ENOMEM; /* if we succeed, we'll update the status */
  size_t idx = 0;
  int ch;
  char *pc_begin = NULL;
  size_t cursize = AVERAGE_SIZE;
  char *new = NULL;

  *pc = calloc(cursize, sizeof **pc);
  if(*pc != NULL)
  {
    rc = GSW_OK; /* so far so good */
    pc_begin = *pc;

    while((ch = getchar()) != EOF && isspace((unsigned char)ch))
    {
      continue; /* skipping leading whitespace */
    }
    /* ch now holds the first character of the word, or EOF */
    while(GSW_OK == rc &&
          ch != EOF &&
          !isspace((unsigned char)ch))
    {
      if(cursize == idx + 1) /* only room left for the terminator */
      {
        new = realloc(*pc, 2 * cursize * sizeof *new);
        if(new == NULL)
        {
          rc = GSW_ENORESIZE; /* error - couldn't enlarge */
          pc_begin[idx] = '\0';
        }
        else
        {
          *pc = new;
          pc_begin = new; /* the block may have moved */
          cursize *= 2;
        }
      }
      if(GSW_OK == rc)
      {
        pc_begin[idx++] = ch;
        ch = getchar(); /* fetch the next character */
      }
    }
  }

  if(*pc != NULL)
  {
    pc_begin[idx] = '\0';
  }

  return rc;
}

Sep 17 '08 #98

arnuld

On Wed, 17 Sep 2008 06:29:37 +0000, Richard Heathfield wrote:

Yes, you could do that, except that (a) it might not be the user's stupid
fault (it may simply be that your machine is low on memory), and (b) there
may be a way to recover. If this is a mere learning exercise and the
learning task is not error recovery, then yes, by all means bomb out.
That's the "student solution" and, like cryptosporidium, is very common.

http://en.wikipedia.org/wiki/Cryptosporidium

...aye..., so let's learn the practical aspects like error recovery too. I
don't like academic solutions, BTW.

Sep 17 '08 #99

Richard Heathfield

arnuld said:

>On Wed, 17 Sep 2008 06:29:37 +0000, Richard Heathfield wrote:

>Yes, you could do that, except that (a) it might not be the user's
stupid fault (it may simply be that your machine is low on memory), and
(b) there may be a way to recover. If this is a mere learning exercise
and the learning task is not error recovery, then yes, by all means bomb
out. That's the "student solution" and, like cryptosporidium, is very
common.

http://en.wikipedia.org/wiki/Cryptosporidium

...aye..., so let's learn the practical aspects like error recovery too.
I don't like academic solutions, BTW.

I have discussed recovering from an allocation failure several times in
this group. In message <3D***************@eton.powernet.co.uk>, for
example, I listed five alternatives to giving up and dying.
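
The five alternatives are not reproduced in this thread, so the following is only one commonly used fallback and may or may not be among them: ask for less memory when the full request fails. The function name alloc_with_fallback and the halving policy are assumptions made for this sketch.

```c
#include <stdlib.h>

/* Hypothetical helper: tries to allocate *size bytes. If that fails, the
   request is halved repeatedly until it succeeds or would drop below
   min_size. On success the size actually obtained is stored in *size and
   the block is returned; on failure NULL is returned and *size is left
   unchanged. */
void *alloc_with_fallback(size_t *size, size_t min_size)
{
    size_t want = *size;
    void *p;

    if (min_size == 0)
        min_size = 1;   /* guarantees the loop terminates */

    while (want >= min_size) {
        p = malloc(want);
        if (p != NULL) {
            *size = want;
            return p;
        }
        want /= 2;      /* settle for a smaller buffer */
    }
    return NULL;
}
```

Whether a smaller buffer is acceptable depends on the program; a word reader could, for instance, process its input in smaller pieces instead of giving up.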

Sep 17 '08 #100
