Bytes IT Community

the de-facto way to "parse" input

Hi all,

I am trying my hand at writing a shell for unix. A very rubbish
shell, but nonetheless, I have come to a point where I am confused.

I would like to have something like

shell> stop xyz

whereupon the command "stop" will take the argument "xyz" and perform
foo action on it. What in your opinion is the best (easiest?) way to
validate the input (perhaps iterate over a "valid commands" table),
and what calls would you use? (getc(), scanf(), a big while loop and
pointer arithmetic, ...)

I hope this question is not too ambiguous.

many thanks
kb
Jun 27 '08 #1
9 Replies


Krumble Bunk wrote:
> I am trying my hands at writing a shell for unix. A very rubbish
> shell, but nonetheless, I come to a point where I am confused.
>
> I would like to have something like
>
> shell> stop xyz
>
> whereupon the command "stop" will take the argument "xyz" and perform
> foo action on it. What in your opinion is the best (easiest?) way to
> validate the input (perhaps iterate over a "valid commands" table),
> and what calls would you use? (getc(), scanf(), a big while loop and
> pointer arithmetic, ...)

I'd use `fgets` to read the line, expanding the buffer as necessary,
carve the line up into space-separated chunks (if we're doing a rubbish
shell, we won't worry about quoting ...) and then I can look the first
chunk up in a table.
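
A minimal sketch of that simple approach, under some assumptions: the line has already been read with fgets into a writable buffer (buffer growth is left out), quoting is ignored, and the names split_line, find_command, run_line, do_stop and MAX_ARGS are made up for illustration. The only table entry is the "stop" command from the question.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_ARGS 16   /* illustrative limit on chunks per line */

/* Hypothetical handler type: gets the chunk count and the chunks. */
typedef int (*handler_fn)(int argc, char **argv);

struct command {
    const char *name;
    handler_fn  run;
};

/* Illustrative handler for the "stop" command from the question. */
int do_stop(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: stop NAME\n");
        return 1;
    }
    printf("stopping %s\n", argv[1]);
    return 0;
}

/* The "valid commands" table; add one row per command. */
static const struct command commands[] = {
    { "stop", do_stop },
};

/* Split line in place on spaces, tabs and newlines.
 * Stores at most max - 1 chunks plus a terminating NULL; returns the count. */
int split_line(char *line, char **argv, int max)
{
    int argc = 0;

    assert(max > 0);
    for (char *tok = strtok(line, " \t\n");
         tok != NULL && argc < max - 1;
         tok = strtok(NULL, " \t\n"))
        argv[argc++] = tok;
    argv[argc] = NULL;
    return argc;
}

/* Look a name up in the table; NULL means it is not a valid command. */
const struct command *find_command(const char *name)
{
    for (size_t i = 0; i < sizeof commands / sizeof commands[0]; i++)
        if (strcmp(commands[i].name, name) == 0)
            return &commands[i];
    return NULL;
}

/* Tokenise one line and dispatch it; returns -1 for an unknown command. */
int run_line(char *line)
{
    char *argv[MAX_ARGS];
    int argc = split_line(line, argv, MAX_ARGS);

    if (argc == 0)
        return 0;                 /* blank line: nothing to do */

    const struct command *cmd = find_command(argv[0]);
    if (cmd == NULL) {
        fprintf(stderr, "%s: unknown command\n", argv[0]);
        return -1;
    }
    return cmd->run(argc, argv);
}
```

A caller would typically loop with something like `while (fgets(line, sizeof line, stdin)) run_line(line);`. Because strtok writes into the buffer, the line must not be a string literal.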

If we want something less rubbish, I'd write a recursive-descent
parser for commands. That would force me to be explicit about the
grammar I'm using and what my tokens are supposed to be. I'd build
an abstract-syntax tree for the commands; on no account would I
try and execute them while parsing them.
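
As a rough illustration of that idea, here is a small recursive-descent parser that builds a tree and executes nothing. The grammar (commands joined by '|' into pipelines, pipelines joined by ';' into a list) is an assumption made for this sketch, as are all the names; brackets, quoting and redirection are left out, and error paths do not free what they have built.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed grammar (illustrative, not taken from the thread):
 *   list     := pipeline { ';' pipeline }
 *   pipeline := command  { '|' command }
 *   command  := WORD { WORD }
 * A WORD is a run of characters other than blanks, '|' and ';'. */
enum node_kind { NODE_COMMAND, NODE_PIPE, NODE_SEQ };

struct node {
    enum node_kind kind;
    int argc;                   /* NODE_COMMAND: number of words */
    char **argv;                /* NODE_COMMAND: NULL-terminated words */
    struct node *left, *right;  /* NODE_PIPE and NODE_SEQ: operands */
};

static struct node *new_node(enum node_kind kind, struct node *l, struct node *r)
{
    struct node *n = malloc(sizeof *n);
    assert(n != NULL);          /* sketch: abort on out-of-memory */
    n->kind = kind;
    n->argc = 0;
    n->argv = NULL;
    n->left = l;
    n->right = r;
    return n;
}

/* command := WORD { WORD }; returns NULL when no word is present. */
static struct node *parse_command(const char **pp)
{
    struct node *n = new_node(NODE_COMMAND, NULL, NULL);
    const char *p = *pp + strspn(*pp, " \t\n");

    while (*p != '\0' && *p != '|' && *p != ';') {
        size_t len = strcspn(p, " \t\n|;");
        n->argv = realloc(n->argv, (n->argc + 2) * sizeof *n->argv);
        assert(n->argv != NULL);
        n->argv[n->argc] = malloc(len + 1);
        assert(n->argv[n->argc] != NULL);
        memcpy(n->argv[n->argc], p, len);
        n->argv[n->argc][len] = '\0';
        n->argv[++n->argc] = NULL;
        p += len;
        p += strspn(p, " \t\n");
    }
    *pp = p;
    if (n->argc == 0) {
        free(n);
        return NULL;
    }
    return n;
}

/* pipeline := command { '|' command } */
static struct node *parse_pipeline(const char **pp)
{
    struct node *left = parse_command(pp);

    while (left != NULL && **pp == '|') {
        (*pp)++;
        struct node *right = parse_command(pp);
        if (right == NULL)
            return NULL;        /* '|' with nothing after it */
        left = new_node(NODE_PIPE, left, right);
    }
    return left;
}

/* list := pipeline { ';' pipeline } */
static struct node *parse_list(const char **pp)
{
    struct node *left = parse_pipeline(pp);

    while (left != NULL && **pp == ';') {
        (*pp)++;
        struct node *right = parse_pipeline(pp);
        if (right == NULL)
            return NULL;        /* ';' with nothing after it */
        left = new_node(NODE_SEQ, left, right);
    }
    return left;
}

/* Parse a whole line; NULL means empty input or a syntax error. */
struct node *parse(const char *line)
{
    const char *p = line;
    struct node *tree = parse_list(&p);
    return (tree != NULL && *p == '\0') ? tree : NULL;
}
```

With this grammar, parse("ls -l | grep foo ; stop xyz") yields a NODE_SEQ whose left child is a NODE_PIPE and whose right child is the "stop" command. A separate function can then walk the tree to print or execute it, which keeps parsing and execution apart.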

And I'd write unit tests. Lots of unit tests. And get something
working end-to-end as soon as possible. (Because it's very
disheartening spending a day or more writing a Super Duper
Program That Does It All, and then spending a week or more
debugging it until it does /something/, as opposed to writing
the smallest program one can manage that recognisably does
something right. Like, read a command line in, and print out
the tokens, /and do nothing else/.)

And likely throw away the first attempt, as a learning exercise.

--
"I don't make decisions. I'm a bird." /A Fine and Private Place/

Hewlett-Packard Limited   Cain Road, Bracknell,   registered no:
registered office:        Berks RG12 1HN          690597 England

Jun 27 '08 #2

On Jun 11, 2:46 pm, Chris Dollin <chris.dol...@hp.com> wrote:
> Krumble Bunk wrote:
>
> [.....]
>
> the tokens, /and do nothing else/.)
>
> And likely throw away the first attempt, as a learning exercise.

Very good advice - I will investigate using lex/yacc.

thanks

kb
Jun 27 '08 #3

"Krumble Bunk" <kr*********@gmail.com> wrote in message news
> On Jun 11, 2:46 pm, Chris Dollin <chris.dol...@hp.com> wrote:
>> Krumble Bunk wrote:
>>
>> [.....]
>>
>> the tokens, /and do nothing else/.)
>>
>> And likely throw away the first attempt, as a learning exercise.

> Very good advice - I will investigate using lex/yacc.

You could also check out MiniBasic, on my website. Essentially writing a
mini-language for a shell is the same as writing a Basic interpreter, except
expressions consist of pipes and globs and redirections more often than
arithmetical operators. Most shells even have their own looping constructs.

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Jun 27 '08 #4

On Jun 11, 7:12 pm, Krumble Bunk <krumbleb...@gmail.com> wrote:
> On Jun 11, 2:46 pm, Chris Dollin <chris.dol...@hp.com> wrote:
>> Krumble Bunk wrote:
>>
>> [.....]
>> the tokens, /and do nothing else/.)
>> And likely throw away the first attempt, as a learning exercise.

> Very good advice - I will investigate using lex/yacc.
>
> thanks
>
> kb

lex is by and large the de facto way to tokenize. I believe gcc makes
extensive use of lex/yacc (or maybe flex/bison, but that does not
make a hell of a difference).
Jun 27 '08 #5

"Chris Dollin" <ch**********@hp.com> wrote in message
news:g2**********@news-pa1.hpl.hp.com...
> Krumble Bunk wrote:
>> I am trying my hands at writing a shell for unix. A very rubbish
>> shell, but nonetheless, I come to a point where I am confused.
>>
>> I would like to have something like
>>
>> shell> stop xyz
>
> If we want something less rubbish, I'd write a recursive-descent
> parser for commands. That would force me to be explicit about the
> grammar I'm using and what my tokens are supposed to be. I'd build
> an abstract-syntax tree for the commands; on no account would I
> try and execute them while parsing them.

An AST for a simple command-line interpreter?

How complex would this shell have to be to make this worthwhile?

> (Because it's very disheartening spending a day or more writing

Might be more disheartening to start a huge project that doesn't finish
because it's overspecified.

I would have suggested starting with something like the following,
replacing the system() call (and perhaps adjusting the parameter) with the
local equivalent. This would require the handler for each 'command' to be a
separate C program, but has the advantage that the parameters are already
separated.

A bit more work and the commands and parameters can be identified and
executed in the same program.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define llength 1000

void main()
{
    char line[llength];
    size_t n;

    puts("Type exit to exit.");
    puts("");

    while (1) {
        printf("Prompt>");
        fflush(stdout);

        /* stop at end of input or on a read error */
        if (fgets(line, llength, stdin) == NULL) break;

        n = strlen(line); /* get rid of troublesome trailing \n */
        if (n > 0 && line[n-1] == '\n') line[n-1] = 0;

        if (strcmp(line, "exit") == 0) break;

        /* hand any non-empty line to the system's command interpreter */
        if (line[0])
            system(line);
    }
}
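
On a unix system, one possible "local equivalent" of system() is the POSIX fork/execvp/waitpid sequence. The sketch below assumes the line has already been split into a NULL-terminated argv array; the function name run_command is hypothetical, and retrying waitpid after an interrupting signal is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] with its arguments and wait for it to finish.
 * Returns the child's exit status, 127 if the program could not be
 * executed, or -1 if fork/waitpid failed or a signal killed the child. */
int run_command(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* child: replace this process image with the requested program */
        execvp(argv[0], argv);
        perror(argv[0]);          /* reached only if execvp failed */
        _exit(127);
    }

    /* parent: wait for the child and decode its status */
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Unlike system(), this does not pass the line through /bin/sh, so pipes, globs and redirections are not interpreted; the shell being written has to handle those itself.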

--
Bartc
Jun 27 '08 #6

On Jun 13, 7:07 pm, "Bartc" <b...@freeuk.com> wrote:
> "Chris Dollin" <chris.dol...@hp.com> wrote in message
> news:g2**********@news-pa1.hpl.hp.com...
>> Krumble Bunk wrote:
>>> I am trying my hands at writing a shell for unix. A very rubbish
>>> shell, but nonetheless, I come to a point where I am confused.
>>> I would like to have something like
>>> shell> stop xyz
>> If we want something less rubbish, I'd write a recursive-descent
>> parser for commands. That would force me to be explicit about the
>> grammar I'm using and what my tokens are supposed to be. I'd build
>> an abstract-syntax tree for the commands; on no account would I
>> try and execute them while parsing them.
>
> An AST for a simple command-line interpreter?
>
> How complex would this shell have to be to make this worthwhile?
>> (Because it's very disheartening spending a day or more writing
>
> Might be more disheartening to start a huge project that doesn't finish
> because it's overspecified.
>
> I would have suggesting starting with something like the following,
> replacing the system() call (and perhaps adjusting the parameter) with the
> local equivalent. This would require the handler for each 'command' to be a
> separate C program, but has the advantage of the parameters being already
> separated.
>
> A bit more work and the commands and parameters can be identified and
> executed in the same program.
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
>
> void main()

<snip>
Bartc, haven't you been here long enough to remember main returns int?
Jun 27 '08 #7

On Jun 14, 3:41 pm, vipps...@gmail.com wrote:
> On Jun 13, 7:07 pm, "Bartc" <b...@freeuk.com> wrote:
>> "Chris Dollin" <chris.dol...@hp.com> wrote in message
>> news:g2**********@news-pa1.hpl.hp.com...
>>> Krumble Bunk wrote:
>>>> I am trying my hands at writing a shell for unix. A very rubbish
>>>> shell, but nonetheless, I come to a point where I am confused.
>>>> I would like to have something like
>>>> shell> stop xyz
>>> If we want something less rubbish, I'd write a recursive-descent
>>> parser for commands. That would force me to be explicit about the
>>> grammar I'm using and what my tokens are supposed to be. I'd build
>>> an abstract-syntax tree for the commands; on no account would I
>>> try and execute them while parsing them.
>> An AST for a simple command-line interpreter?
>> How complex would this shell have to be to make this worthwhile?
>>> (Because it's very disheartening spending a day or more writing
>> Might be more disheartening to start a huge project that doesn't finish
>> because it's overspecified.
>> I would have suggesting starting with something like the following,
>> replacing the system() call (and perhaps adjusting the parameter) with the
>> local equivalent. This would require the handler for each 'command' to be a
>> separate C program, but has the advantage of the parameters being already
>> separated.
>> A bit more work and the commands and parameters can be identified and
>> executed in the same program.
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <string.h>
>> void main()
>
> <snip>
> Bartc, haven't you been here long enough to remember main returns int?

Yes, but I suspect I didn't write that bit. It is probably the remnant of a
copy-and-paste of someone else's code. Not my fault at all.

--
Bartc
Jun 27 '08 #8

Bartc wrote:
> "Chris Dollin" <ch**********@hp.com> wrote in message
> news:g2**********@news-pa1.hpl.hp.com...
>> Krumble Bunk wrote:
>>> I am trying my hands at writing a shell for unix. A very rubbish
>>> shell, but nonetheless, I come to a point where I am confused.
>>>
>>> I would like to have something like
>>>
>>> shell> stop xyz
>>
>> If we want something less rubbish, I'd write a recursive-descent
>> parser for commands. That would force me to be explicit about the
>> grammar I'm using and what my tokens are supposed to be. I'd build
>> an abstract-syntax tree for the commands; on no account would I
>> try and execute them while parsing them.
>
> An AST for a simple command-line interpreter?

"something less rubbish" allows for something that isn't simple.

ASTs aren't complicated, even in C.

> How complex would this shell have to be to make this worthwhile?

Pipes, sequencing, commands. Brackets and built-in commands,
definitely.

>> (Because it's very disheartening spending a day or more writing
>
> Might be more disheartening to start a huge project that doesn't finish
> because it's overspecified.

Where did "huge" come from? And "overspecified"?

--
"Tells of trouble and warns of change to come." /Lothlorien/

Hewlett-Packard Limited     registered office: Cain Road, Bracknell,
registered no: 690597 England                  Berks RG12 1HN

Jun 27 '08 #9

On Jun 16, 8:04 am, Chris Dollin <chris.dol...@hp.com> wrote:
> Bartc wrote:
>> "Chris Dollin" <chris.dol...@hp.com> wrote in message
>> news:g2**********@news-pa1.hpl.hp.com...
>>> Krumble Bunk wrote:
>>>> I am trying my hands at writing a shell for unix. A very rubbish
>>>> shell, but nonetheless, I come to a point where I am confused.
>>>> I would like to have something like
>>>> shell> stop xyz
>>> If we want something less rubbish, I'd write a recursive-descent
>>> parser for commands.
>> An AST for a simple command-line interpreter?
>
> "something less rubbish" allows for something that isn't simple.
>
> ASTs aren't complicated, even in C.
>> How complex would this shell have to be to make this worthwhile?
>
> Pipes, sequencing, commands. Brackets and built-in commands,
> definitely.

I'm not familiar with unix shells. But I don't remember seeing
anything more complicated than a linear series of commands, filenames,
numbers and switches in Windows' shell. But then, maybe Windows' shell
is rubbish.

>>> (Because it's very disheartening spending a day or more writing
>> Might be more disheartening to start a huge project that doesn't finish
>> because it's overspecified.
>
> Where did "huge" come from? And "overspecified"?

OK, not huge. But I associate ASTs with compilers, and that would seem
like overkill for this task.

Perhaps the OP should start by writing the specifications of his/her
syntax, then it might become clearer which approach is best.

--
Bart
Jun 27 '08 #10
