Bytes | Software Development & Data Engineering Community

Writing a parser the right way?

I'm writing a parser for the English language. This is a simple function to
identify what kind of sentence we have. Do you think this class
wrapping is the right way to represent the result of the function? Further
parsing then checks isinstance(text, Declarative).

-------------------
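# A str subclass, so each result carries its text along with its type
class Sentence(str): pass
class Declarative(Sentence): pass
class Question(Sentence): pass
class Command(Sentence): pass

def identify_sentence(text):
    text = text.strip()
    # endswith() is safe on an empty string, unlike text[-1]
    if text.endswith('.'):
        return Declarative(text)
    elif text.endswith('!'):
        return Command(text)
    elif text.endswith('?'):
        return Question(text)
    # No recognized end mark: still return a Sentence, not a bare str
    return Sentence(text)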
-------------------

At first I just returned the class; then I decided to derive Sentence
from str, so I can include the text as well.

Sep 21 '05 #1
As far as the parser is concerned, making these separate classes is
unnecessary when you could just store the sentence type as a normal
data member of Sentence. So the answer to your question is no, in my
opinion.
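
Ben's alternative, a single class that stores the sentence type as an ordinary data member, might look like this. It is only a sketch: the `kind` attribute, the `KINDS` table and its labels are illustrative names chosen here, and a `dataclass` is just one assumed way to hold the data.

```python
from dataclasses import dataclass

# Final punctuation -> sentence type (illustrative labels, not from the post)
KINDS = {'.': 'declarative', '!': 'command', '?': 'question'}

@dataclass
class Sentence:
    text: str
    kind: str  # the sentence type stored as plain data, not as a subclass

def identify_sentence(text):
    """Return a Sentence whose kind is guessed from its final punctuation."""
    text = text.strip()
    # text[-1:] is '' for an empty string, so the lookup falls back safely
    kind = KINDS.get(text[-1:], 'unknown')
    return Sentence(text, kind)

s = identify_sentence("Open the door! ")
print(s.kind)  # command
```

Later stages would then test `s.kind == 'question'` instead of calling `isinstance`.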

However, when you come to actually use the resulting Sentence objects,
perhaps the behaviour is different? If you're looking to use a standard
interface to Sentences but are going to be doing substantially
different processing depending on which sentence type you have, then
yes, this class hierarchy may be useful to you.
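
Where the hierarchy does pay off is when each type gets its own behaviour behind a shared interface, so callers never need `isinstance` checks. A minimal sketch; the `respond()` method and its return strings are hypothetical placeholders for whatever processing a game would actually do.

```python
class Sentence(str):
    """Common interface: every sentence type answers respond()."""
    def respond(self):
        return "I don't understand."

class Declarative(Sentence):
    def respond(self):
        # hypothetical: statements would update the world model
        return "Noted: " + self

class Question(Sentence):
    def respond(self):
        # hypothetical: questions would query the world model
        return "Looking up: " + self

class Command(Sentence):
    def respond(self):
        # hypothetical: commands would trigger an action
        return "Doing: " + self

for s in [Declarative("The door is red."), Question("Is the door red?"),
          Command("Open the door!")]:
    # Each object dispatches to its own method; no type tests needed
    print(s.respond())
```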

--
Ben Sizer

Sep 21 '05 #2
Well, a declarative sentence is essentially subject-predicate-object,
while a question is predicate-subject-object. This is important in
further processing. So perhaps I should code this order into the
classes? I need to think a little bit more about this.

Thanks for the food for thought! :)
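
Coding the order into the classes could be sketched with a class attribute that records the expected role order, plus one shared method that labels chunks with it. The `ORDER` attribute and `roles()` method are hypothetical names, and the sketch assumes the sentence has already been cut into exactly one chunk per role, which is the hard part.

```python
class Sentence(str):
    ORDER = ()  # expected order of roles; subclasses override it

    def roles(self, chunks):
        """Label pre-split chunks with roles according to ORDER."""
        if len(chunks) != len(self.ORDER):
            raise ValueError("expected %d chunks" % len(self.ORDER))
        return dict(zip(self.ORDER, chunks))

class Declarative(Sentence):
    ORDER = ('subject', 'predicate', 'object')

class Question(Sentence):
    # order as described in the post above
    ORDER = ('predicate', 'subject', 'object')

d = Declarative("Willy opens the door.")
print(d.roles(['Willy', 'opens', 'the door']))
# {'subject': 'Willy', 'predicate': 'opens', 'object': 'the door'}
```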

Sep 21 '05 #3
A question is subject-predicate-object?

That was unknown by me.

Honestly, if you're trying a general English parser, good luck.
Sep 21 '05 #4
"beza1e1" <an************ *@googlemail.co m> wrote in message
news:11******** **************@ g47g2000cwa.goo glegroups.com.. .
I'm writing a parser for english language. This is a simple function to
identify, what kind of sentence we have. Do you think, this class
wrapping is right to represent the result of the function? Further
parsing then checks isinstance(text , Declarative).

-------------------
class Sentence(str): pass
class Declarative(Sen tence): pass
class Question(Senten ce): pass
class Command(Sentenc e): pass

def identify_senten ce(text):
text = text.strip()
if text[-1] == '.':
return Declarative(tex t)
elif text[-1] == '!':
return Command(text)
elif text[-1] == '?':
return Question(text)
return text
-------------------

At first i just returned the class, then i decided to derive Sentence
from str, so i can insert the text as well.

Andreas -

Are you trying to parse any English sentence, or just a limited form of
them? Parsing *any* English sentence (or question or interjection or
command) is a ***huge*** undertaking - Google for "natural language" and you
will find many efforts (with substantial time and money and manpower
resources) working on this problem. Applications range from automated
language translation to automated helpdesk analysis. I really suggest you
do a bit of research on this topic, just to get an idea of how big this job
is. Here's a Wikipedia link:
http://en.wikipedia.org/wiki/Natural...age_processing

Here are some simple examples that quickly go beyond
subject-predicate-object:

I drive a truck.
I drive a red truck.
I drive a red truck to work.
I drive a red truck to the shop to work on it.
I drive a red truck to the shop to have some work done on it.
I drive a red truck very fast.
I drive a red truck through a red light.

Then factor in other sentences (past and future tenses, past and future
perfect tenses, figurative metaphors) and parsing general English is a major
job. The favorite test case of the natural language folks is "Time flies
like an arrow," which early auto-translation software converted to "Temporal
insects enjoy a pointed projectile."
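
One way to see why "Time flies like an arrow" is hard is to count its possible taggings. The sketch uses a tiny hand-made lexicon (an assumption, far smaller than any real one) listing the parts of speech each word could take, and enumerates every combination; a real parser would still have to pick one using grammar and context.

```python
from itertools import product

# Toy lexicon: each word maps to the parts of speech it *could* have
LEXICON = {
    'time':  ['noun', 'verb'],         # "time" the noun, or "time (them)!"
    'flies': ['noun', 'verb'],         # insects, or "moves through the air"
    'like':  ['verb', 'preposition'],  # "enjoy", or "in the manner of"
    'an':    ['determiner'],
    'arrow': ['noun'],
}

def tag_sequences(sentence):
    """Yield every combination of possible tags for the words."""
    words = sentence.lower().split()
    for tags in product(*(LEXICON[w] for w in words)):
        yield list(zip(words, tags))

readings = list(tag_sequences("Time flies like an arrow"))
print(len(readings))  # 8 candidate taggings before any grammar is applied
```

The "temporal insects" reading is the one where both "time" and "flies" are nouns and "like" is the verb.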

On the other hand, if you plan to limit the type and/or content of the
sentences being parsed (such as computer system commands or adventure game
inputs, or descriptions of physical objects), then you can scope out a
reasonable capability by choosing a vocabulary of known verbs and objects,
and avoiding ambiguities (such as "set", as in "I set the set of glasses
next to the TV set," or "lead" as in "Lead me to the store that sells lead
pencils.").
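
For the limited case, a fixed vocabulary keeps things tractable. A minimal sketch, assuming a made-up verb and object list and a grammar of one known verb plus at most one known object; every other word is ignored, which is only one of many possible policies.

```python
# Hypothetical adventure-game vocabulary; a real game would define its own
VERBS = {'take', 'drop', 'open', 'go'}
OBJECTS = {'key', 'door', 'lamp', 'north', 'south'}

def parse_command(line):
    """Return (verb, object_or_None) if the line fits the tiny grammar, else None."""
    # Lowercase and strip punctuation from the ends of each word
    words = [w.strip('.!?,').lower() for w in line.split()]
    verbs = [w for w in words if w in VERBS]
    objects = [w for w in words if w in OBJECTS]
    # Exactly one known verb and at most one known object;
    # other words ("the", "to", ...) are simply ignored
    if len(verbs) != 1 or len(objects) > 1:
        return None
    obj = objects[0] if objects else None
    return verbs[0], obj

print(parse_command("Open the door!"))        # ('open', 'door')
print(parse_command("Go to the north."))      # ('go', 'north')
print(parse_command("Lead me to the store"))  # None: "lead" is not a known verb
```

Keeping words like "set" or "lead" out of the vocabulary, or allowing each word only one role, is what sidesteps the ambiguities mentioned above.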

Hope this sheds some light on your task,
-- Paul
Sep 21 '05 #5
Christopher Subich wrote:
Honestly, if you're trying a general English parser, good luck.


I second that. Have you read any of the natural language processing
research in this area? There are a variety of English parsers already
available. Googling for "charniak parser" or "collins parser" should
get you something. I believe Dan Bikel has one too. Those are trained
on Wall Street Journal text. You might also look into Minipar, which is
rule-based and not as WSJ specific.

STeVe
Sep 21 '05 #6
Thanks for the hints. I just found NLTK and MontyLingua.

And yes, it is just an adventure game language. This means every tense
except the present tense is discarded as "not changing the world".
Furthermore, the parser will make a lot of assumptions that are perhaps
90% right, not perfect:

if word.endswith("ly"):
    return Adverb(word)

Note that capitalized words are identified first, so "Willy" is parsed
correctly as a noun. On the other hand, "silly boy" will not return a
correct result.
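
The two rules just described (capitalized words first, then the "-ly" suffix) can be sketched as a small heuristic tagger. The class names, the `Unknown` fallback and `guess_tag` are assumptions made for illustration; the output shows both the intended hits and the "silly" miss.

```python
class Word(str): pass
class Noun(Word): pass
class Adverb(Word): pass
class Unknown(Word): pass

def guess_tag(word):
    """Cheap heuristics that are right most of the time, not always."""
    # Rule 1, checked first: a capitalized word is taken as a proper noun
    if word[:1].isupper():
        return Noun(word)
    # Rule 2: the "-ly" suffix means adverb (misfires on "silly")
    if word.endswith("ly"):
        return Adverb(word)
    return Unknown(word)

for w in ["quickly", "Willy", "silly"]:
    print(w, type(guess_tag(w)).__name__)
# quickly Adverb
# Willy Noun
# silly Adverb   <- wrong: "silly" is an adjective
```

Because rule 1 runs first, "Willy" never reaches the suffix test, which is why the ordering matters.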

Currently it is just a proof of concept. Maybe I can integrate a better
parser engine later. The idea is a kind of MUD where you speak in correct
sentences instead of typing "go north". I envision a difference like that
between Diablo and pen-and-paper role-playing. I'd call it more a
collaborative storytelling game than an actual RPG.

I fed it your sentences, Paul. Result:
<['I', 'drive', 'a']> <['red']> <['truck']>
should be:
<['I']> <['drive']> <['a', 'red', 'truck']>

Verbs are the tricky part, I think. There is no way to recognize them,
so I will have to get a database ... work to do. ;)
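
With a verb list in hand, the "should be" grouping above falls out of a simple split: words before the first known verb form the subject, the verb is the predicate, and the rest is the object. A sketch assuming a tiny hand-written `VERBS` set standing in for the database; it only handles simple subject-verb-object sentences.

```python
# Stand-in for a real verb database (assumption: a tiny hand-made set)
VERBS = {'drive', 'drives', 'open', 'opens', 'take', 'takes'}

def chunk_svo(sentence):
    """Split a simple sentence into [subject], [verb], [object] chunks."""
    words = sentence.rstrip('.!?').split()
    for i, word in enumerate(words):
        if word.lower() in VERBS:
            return [words[:i], [word], words[i + 1:]]
    return None  # no known verb: cannot chunk

print(chunk_svo("I drive a red truck."))
# [['I'], ['drive'], ['a', 'red', 'truck']]
```

Sentences like "I drive a red truck to work" would put "to work" into the object chunk, so prepositional phrases would need their own rule.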

Sep 22 '05 #7
beza1e1 wrote:
Verbs are the tricky part, I think. There is no way to recognize them,
so I will have to get a database ... work to do. ;)


Try the Brill tagger[1] or MXPOST[2].

STeVe

[1] http://www.cs.jhu.edu/~brill/code.html
[2] ftp://ftp.cis.upenn.edu/pub/adwait/jmx/jmx.tar.gz
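
For a rough idea of how a transformation-based (Brill-style) tagger works: each word first gets its most likely tag from a lexicon, then an ordered list of contextual rules rewrites tags. The lexicon, the single rule and the tag names here are made up for illustration; the real Brill tagger learns its rules from a tagged corpus, and this sketch only shows the rule-application step.

```python
# Initial-state guesses: an assumed "most likely" tag per known word (toy data)
LEXICON = {'i': 'PRP', 'drive': 'NN', 'a': 'DT', 'red': 'JJ', 'truck': 'NN'}

# Ordered rules: (from_tag, to_tag, required tag of the previous word)
RULES = [('NN', 'VB', 'PRP')]  # a noun right after a pronoun is likely a verb

def brill_tag(words):
    # Unknown words default to NN in this sketch
    tags = [LEXICON.get(w.lower(), 'NN') for w in words]
    for old, new, prev in RULES:
        for i in range(1, len(tags)):
            if tags[i] == old and tags[i - 1] == prev:
                tags[i] = new
    return list(zip(words, tags))

print(brill_tag("I drive a red truck".split()))
# [('I', 'PRP'), ('drive', 'VB'), ('a', 'DT'), ('red', 'JJ'), ('truck', 'NN')]
```
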
Sep 22 '05 #8
