Pyparsing troubles

Hello,
I have written a small pyparsing parser to recognize dates in the style
"november 1st". I wrote something to the effect of:

expression = task + date

and tried to parse "Doctor's appointment on november 1st", hoping that
task would be "Doctor's appointment" and date would be "on november
1st" (the parser does match "on november 1st" to "date"). I have set
task as Regex(".*?"), ZeroOrMore(Word(alphas)), etc., but I can't get it
to match: it matches everything to task and ignores date until it gets
to the end of the string.

Can anyone help?

Dec 10 '06 #1
As described, this is a Natural Language Processing (NLP) problem,
which means you will have a lot more trouble with understanding what
you want to do than in coding it. Also, dates are notoriously tough
to parse, because of so many variants, so there are libraries to do
just that.
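
To illustrate the variant problem, here is a minimal sketch that uses only
the standard library's datetime.strptime. The FORMATS list and the
parse_date helper are illustrative assumptions, not part of any library;
each extra variant needs its own format string, and free-form phrases such
as "next week" are not covered at all.

```python
from datetime import datetime

# Hypothetical list of accepted formats; a real application would derive
# these from the inputs it actually sees. Month names (%b, %B) follow the
# current locale, assumed here to be English.
FORMATS = ["%Y-%m-%d", "%d-%b-%y", "%B %d, %Y"]


def parse_date(text):
    """Return a datetime for the first format that fits, or None."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format did not fit, try the next one
    return None


iso = parse_date("2006-11-01")
short = parse_date("1-Nov-06")
relative = parse_date("next week")  # no format fits, so this is None
```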

If you want to tackle it systematically:

1. Get a "corpus" of texts which illustrate the ways the users might
state the date. E.g., "2006-11-01", "1-Nov-06", "November 1",
"Nov. first", "first of November", "10 days prior to Veterans Day",
"next week", .....

2. If you can control the input, much better. Either by a form which
forces specific values for day, month, year, hour, minute, or by
requiring the IETF's RFC 3339 (ISO 8601) format (yyyy-mm-ddThh:mm:ss).

3. Determine the syntax rules for each example. If possible, abstract
these to general rules which work on more than one example.

4. At this point, you should know enough to decide if it is a:

a) Regular expression, parseable with a regexp engine

b) Context Free Grammar (CFG), parseable with an LL(1) or LALR(1) parser.

c) Context-Sensitive Grammar, parseable with an ad hoc parser with special rules.

d) Free text, not parseable in the normal sense, but perhaps
understandable with statistical analysis NLP techniques.

e) Hodgepodge not amenable to machine analysis.

5. Then we could look at using pyparsing. But we'd have to see
the pyparsing code you tried.
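
Without the original code, the following is only a sketch of one way the
"task + date" grammar from the question could be written. It assumes a
much simplified date grammar (an optional "on", a month name, a day number
with an optional ordinal suffix); the names MONTHS, month, day and date are
illustrative and not the poster's. pyparsing does not go back into an
element that has already matched to retry it with a different length, so a
catch-all task expression has no way to know where the date begins. SkipTo
is the pyparsing class for "everything up to the next match of X".

```python
from pyparsing import CaselessKeyword, Group, MatchFirst, Optional, Regex, SkipTo

MONTHS = ("january february march april may june july "
          "august september october november december").split()

# Assumed, simplified date grammar: [on] <month name> <day>[st|nd|rd|th]
month = MatchFirst([CaselessKeyword(m) for m in MONTHS])
day = Regex(r"\d{1,2}(st|nd|rd|th)?")
date = Optional(CaselessKeyword("on")) + month + day

# SkipTo collects the text up to (not including) the first position
# where `date` matches, so the task no longer swallows the date.
task = SkipTo(date)
expression = task("task") + Group(date)("date")

result = expression.parseString("Doctor's appointment on november 1st")
task_text = result["task"].strip()   # skipped text keeps trailing whitespace
date_tokens = list(result["date"])

print(task_text)
print(date_tokens)
```

This only handles the one phrasing in the question; every other variant in
the corpus from step 1 would need its own alternative in the date grammar.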

--
Harry George
PLM Engineering Architecture
Dec 11 '06 #2
