getc and ungetc

Bill Cunningham

Would getc and ungetc be the best and simplest way to parse
expressions for a parser?

Bill

Nov 14 '05 #1

Trent Buck

Quoth Bill Cunningham on or about 2004-11-18:

Would getc and ungetc be the best and simplest way to parse
expressions for a parser?

ungetc is only guaranteed to work once `in a row': the standard promises a
single character of pushback, and further calls without an intervening read
may fail. I suspect that makes for a rather unsuitable function for a parser.

-t

Nov 14 '05 #2

Chris Barts

Bill Cunningham wrote:
| Would getc and ungetc be the best and simplest way to parse
| expressions for a parser?
|

It's usually easier to do as K&R did in "The C Programming Language"
(in Section 4.3, where they build the RPN calculator): read in a block of
text, and then define your own getch()/ungetch()-style functions to push
and pop characters on and off that buffer. You can read in more text (a
block at a time, for efficiency and convenience) when your getch()
function tries to read beyond the end of your buffer.

You need to write a bit more code yourself, but it's fairly trivial code
and you can reuse it in any other parser you write.
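
A minimal sketch of the block-buffered idea described above, not K&R's actual code. The names (struct reader, reader_getc, reader_ungetc) and the sizes are made up for illustration, and the pushed-back characters are kept on a small separate stack rather than in the block itself, which is just one possible design.

```c
#include <stdio.h>

#define BLOCK_SIZE 4096    /* bytes read per refill (arbitrary choice) */
#define PUSHBACK_MAX 16    /* pushback depth (arbitrary choice) */

/* Hypothetical reader type; these names are not from K&R or the C library. */
struct reader {
    FILE *fp;
    unsigned char block[BLOCK_SIZE];
    size_t pos, len;                     /* next unread byte, bytes in block */
    unsigned char pushed[PUSHBACK_MAX];  /* stack of pushed-back characters */
    size_t npushed;
};

void reader_init(struct reader *r, FILE *fp)
{
    r->fp = fp;
    r->pos = r->len = 0;
    r->npushed = 0;
}

/* Return the next character, or EOF at end of input. */
int reader_getc(struct reader *r)
{
    if (r->npushed > 0)                  /* pushed-back characters come first */
        return r->pushed[--r->npushed];
    if (r->pos == r->len) {              /* block exhausted: read another one */
        r->len = fread(r->block, 1, sizeof r->block, r->fp);
        r->pos = 0;
        if (r->len == 0)
            return EOF;
    }
    return r->block[r->pos++];
}

/* Push c back; returns c, or EOF if c is EOF or the stack is full. */
int reader_ungetc(struct reader *r, int c)
{
    if (c == EOF || r->npushed == PUSHBACK_MAX)
        return EOF;
    r->pushed[r->npushed++] = (unsigned char)c;
    return c;
}
```

Because the pushback stack belongs to the reader and not to the FILE, several characters can be returned in a row, which plain ungetc does not guarantee.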

Nov 14 '05 #3

Malcolm

"Trent Buck" <NO************@bigpond.com> wrote

Would getc and ungetc be the best and simplest way to parse
expressions for a parser?

ungetc can't be applied more than once `in a row' (i.e. sequentially).
I suspect that makes for a rather unsuitable function for a parser.

It depends on your parser design.
Most simple parsers used for things like computer languages divide the input
stream into tokens, and then parse from left to right with one token of
lookahead. So if your tokens are single characters then getc() and ungetc() may
be adequate.
Of course if you have ambitions to build a natural language parser, or even
some simple grammars with unusual characteristics, then this scheme won't
work, and you will need some method of scanning up and down many tokens on
input.
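
To make the single-character case concrete, here is a rough sketch of a left-to-right parser with one character of lookahead built only on getc() and ungetc(). The grammar (single-digit operands, + - * / and parentheses, no whitespace) and the function names are assumptions chosen for illustration, and error handling is deliberately left out.

```c
#include <ctype.h>
#include <stdio.h>

int parse_expr(FILE *in);

/* Look at the next character without consuming it: one getc, one ungetc. */
int peek(FILE *in)
{
    int c = getc(in);
    if (c != EOF)
        ungetc(c, in);
    return c;
}

/* factor := digit | '(' expr ')' */
int parse_factor(FILE *in)
{
    int c = getc(in);
    if (c == '(') {
        int v = parse_expr(in);
        getc(in);   /* consume the ')'; this sketch does no error checking */
        return v;
    }
    return isdigit(c) ? c - '0' : 0;   /* anything unexpected counts as 0 */
}

/* term := factor { ('*' | '/') factor } */
int parse_term(FILE *in)
{
    int v = parse_factor(in);
    int c;
    while ((c = peek(in)) == '*' || c == '/') {
        getc(in);                      /* consume the operator */
        int rhs = parse_factor(in);
        if (c == '*')
            v *= rhs;
        else if (rhs != 0)             /* silently skip division by zero */
            v /= rhs;
    }
    return v;
}

/* expr := term { ('+' | '-') term } */
int parse_expr(FILE *in)
{
    int v = parse_term(in);
    int c;
    while ((c = peek(in)) == '+' || c == '-') {
        getc(in);                      /* consume the operator */
        int rhs = parse_term(in);
        v = (c == '+') ? v + rhs : v - rhs;
    }
    return v;
}
```

Every ungetc() here directly follows a getc(), so the parser never needs more than the one character of pushback that the standard guarantees.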

Nov 14 '05 #4

SM Ryan

# Of course if you have ambitions to build a natural language parser, or even
# some simple grammars with unusual characteristics, then this scheme won't
# work, and you will need some method of scanning up and down many tokens on
# input.

Such a parser runs the risk of exponential running time, while a tabular
parser doesn't need to back up and has cubic time worst case. ungetc has
at most marginal usability for some types of lexical scanners.

--
SM Ryan http://www.rawbw.com/~wyrmwif/
There are subtler ways of badgering a witness.

Nov 14 '05 #5

Bill Cunningham

"SM Ryan" <wy*****@tango-sierra-oscar-foxtrot-tango.fake.org> wrote in
message news:10*************@corp.supernews.com...

# Of course if you have ambitions to build a natural language parser, or even
# some simple grammars with unusual characteristics, then this scheme won't
# work, and you will need some method of scanning up and down many tokens on
# input.

Such a parser runs the risk of exponential running time, while a tabular
parser doesn't need to back up and has cubic time worst case. ungetc has
at most marginal usability for some types of lexical scanners.

I have K&R2. What about pointing fgetc at stdin and using it instead of
getc?

Bill

Nov 14 '05 #6

Bill Cunningham

"Bill Cunningham" <no****@nspam.net> wrote in message
news:cb***************@newsfe08.lga.highwinds-media.com...

"SM Ryan" <wy*****@tango-sierra-oscar-foxtrot-tango.fake.org> wrote in
message news:10*************@corp.supernews.com...
# Of course if you have ambitions to build a natural language parser, or even
# some simple grammars with unusual characteristics, then this scheme won't
# work, and you will need some method of scanning up and down many tokens on
# input.

Such a parser runs the risk of exponential running time, while a tabular
parser doesn't need to back up and has cubic time worst case. ungetc has
at most marginal usability for some types of lexical scanners.

I have K&R2. What about pointing fgetc at stdin and using it instead of
getc?

Bill

I've heard of top down recursive parsers.
Bill

Nov 14 '05 #7

Herbert Rosenau

On Thu, 18 Nov 2004 20:21:41 UTC, "Bill Cunningham" <no****@nspam.net>
wrote:

Would getc and ungetc be the best and simplest way to parse
expressions for a parser?

Yes. But be aware that ungetc is only guaranteed to unget ONE char at a time.

On the other hand, you can easily extend getc and ungetc with a macro or
function so that more than one char can be ungotten. Your UNGETC() would
accept any number of chars, and your GETC() would give them back in
reverse order: GETC() returns the chars UNGETC() has received before it
reads new chars from the stream itself.
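
One way such a pair could look, as a sketch only. The names GETC and UNGETC follow the wording above; the stack depth is arbitrary, and a single global stack shared by all streams is a simplifying assumption (a real version would probably keep one stack per stream).

```c
#include <stdio.h>

#define UNGET_DEPTH 64   /* how many chars may be pending at once (arbitrary) */

/* One global pushback stack; simplification: it is shared by every stream. */
static int unget_stack[UNGET_DEPTH];
static int unget_count = 0;

/* Return the most recently ungotten char, else read from the stream. */
int GETC(FILE *in)
{
    return unget_count > 0 ? unget_stack[--unget_count] : getc(in);
}

/* Remember c for a later GETC(); returns c, or EOF on failure. */
int UNGETC(int c, FILE *in)
{
    (void)in;            /* unused because of the single shared stack */
    if (c == EOF || unget_count == UNGET_DEPTH)
        return EOF;
    unget_stack[unget_count++] = c;
    return c;
}
```

After UNGETC('x', in) followed by UNGETC('q', in), the next two GETC(in) calls return 'q' and then 'x', and only then does reading continue from the stream.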

A slightly trickier technique is to unget a char that you never read.
That is legal: nothing forbids it, since ungetc takes the char to push
back as an argument. Just be sure that you do NOT try to unget EOF; that
won't work. Ungetting a char of your own choosing can be very useful when
a long list of similar keywords shares a long list of delimiters that all
mean the same thing.

Your parser may convert keywords or key characters into tokens and save
the tokens until all or part of the input stream has been read, and then
work on the generated tokens; or it may work in whatever other way you
think matches your requirements.

getc() (in conjunction with ungetc()) gives you character-by-character
control over the stream, which is as fine-grained as you will ever need.
When needed you can simply count the number of chars, words, lines...
read in as a side effect, and reset these counters as needed. You also
avoid managing buffers yourself: you don't need one.

Build your parser as a state machine and you can reuse the same code
again and again, apart from the small number of statements you need to
handle a specific (sub)state. You get high flexibility, since you can
easily extend the parser's functionality by creating a new (sub)state.
That makes maintenance easy.
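
As a small illustration of both points (a state machine driven by getc(), with counters updated as a side effect), here is a sketch of a two-state word counter. The state names, the struct and the function name are invented for this example.

```c
#include <ctype.h>
#include <stdio.h>

enum state { BETWEEN_WORDS, IN_WORD };

struct counts { long chars, words, lines; };

/* Read the whole stream one char at a time and count as we go. */
struct counts count_stream(FILE *in)
{
    struct counts n = {0, 0, 0};
    enum state st = BETWEEN_WORDS;
    int c;

    while ((c = getc(in)) != EOF) {
        n.chars++;                       /* counters are pure side effects */
        if (c == '\n')
            n.lines++;

        switch (st) {
        case BETWEEN_WORDS:
            if (!isspace(c)) {           /* first char of a new word */
                st = IN_WORD;
                n.words++;
            }
            break;
        case IN_WORD:
            if (isspace(c))              /* the word has ended */
                st = BETWEEN_WORDS;
            break;
        }
    }
    return n;
}
```

Adding behaviour, for example treating quoted strings as one word, would mean adding one more state and one more case, while the reading loop stays the same.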

--
Tschau/Bye
Herbert

Visit http://www.ecomstation.de the home of german eComStation

Nov 14 '05 #8

Malcolm

"Bill Cunningham" <no****@nspam.net> wrote

I've heard of top down recursive parsers.

What matters for the tokenizer is that such parsers work left to right with
only one token of lookahead. Most practical parsers are in this class.

Unfortunately, using getc / ungetc on its own means that the tokens are
constrained to be single characters. This may be OK for a very simple
application, but if the tokens are naturally several characters long it will
be a nuisance. You can do it: for instance, if you have a mathematical
function tan() then you could build it up from the letters 't', 'a' and 'n'
rather than reading it in as a single token TAN. But it is generally a lot
easier to do the lexical analysis before the parse proper.
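
A sketch of what a separate lexical pass might look like: the lexer still uses getc() and a single ungetc() internally, but hands the parser whole tokens such as the name tan. The token kinds, the struct layout and the 31-character limit are assumptions made for this example.

```c
#include <ctype.h>
#include <stdio.h>

enum token_kind { TOK_END, TOK_NUMBER, TOK_NAME, TOK_PUNCT };

struct token {
    enum token_kind kind;
    char text[32];          /* longer tokens are silently truncated */
};

/* Return the next token from the stream, or a TOK_END token at EOF. */
struct token next_token(FILE *in)
{
    struct token t = { TOK_END, "" };
    size_t n = 0;
    int c;

    do {                                  /* skip leading whitespace */
        c = getc(in);
    } while (c != EOF && isspace(c));
    if (c == EOF)
        return t;

    if (isalpha(c)) {                     /* a name such as "tan" */
        t.kind = TOK_NAME;
        while (isalpha(c)) {
            if (n < sizeof t.text - 1)
                t.text[n++] = (char)c;
            c = getc(in);
        }
        if (c != EOF)
            ungetc(c, in);                /* one char too far: give it back */
    } else if (isdigit(c)) {              /* an unsigned integer */
        t.kind = TOK_NUMBER;
        while (isdigit(c)) {
            if (n < sizeof t.text - 1)
                t.text[n++] = (char)c;
            c = getc(in);
        }
        if (c != EOF)
            ungetc(c, in);
    } else {                              /* any other single character */
        t.kind = TOK_PUNCT;
        t.text[n++] = (char)c;
    }
    t.text[n] = '\0';
    return t;
}
```

On the input tan (45) this yields the name tan, then (, then the number 45, then ), then the end token, so the parser proper can work with one token of lookahead instead of one character.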

Nov 14 '05 #9
