
Proposed API change for pyparsing CaselessLiteral - could break existing code

***This is of especial interest for those who are using the pyparsing
module, and have defined grammars that make use of CaselessLiteral.***

One of the bugfix requests I recently got for pyparsing was to fix the
tokens returned by CaselessLiteral. CaselessLiteral is an interesting
special case of Literal, since it matches a large number of possible input
tokens. That is:

CaselessLiteral("abcd")

could match abcd, abcD, abCD, abCd, etc. When parsing something like SQL, a
CaselessLiteral('select') has even more options.
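To get a feel for how many inputs a single caseless pattern covers, here is a small standard-library sketch. It is not pyparsing code; case_variants is a hypothetical helper, and it assumes the word contains only ASCII letters.

```python
from itertools import product


def case_variants(text):
    """Return the set of every upper/lower-case spelling of text.

    Hypothetical helper for illustration; assumes ASCII letters only.
    """
    # For each character, offer its lowercase and uppercase form.
    choices = [(ch.lower(), ch.upper()) for ch in text]
    # The Cartesian product picks one form per position.
    return {"".join(combo) for combo in product(*choices)}


print(len(case_variants("abcd")))    # 16
print(len(case_variants("select")))  # 64
```

Each letter doubles the count, so a word of n letters has 2**n spellings.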

Ordinarily, the other parsing classes in pyparsing return the original
matched text from the input stream, but CaselessLiteral could make this
unwieldy - if you had something more complex like this possible grammar for
a Zork-type game:

verb = ( CaselessLiteral("pick") | CaselessLiteral("turn") |
         CaselessLiteral("drop") | <...and so on...> )

then the code to process the tokens would have to normalize the case of the
input text, since it could be 'picK', 'Turn', 'DROP'. I expect that the
first thing the calling code would do with the results would be to convert
them to uppercase or lowercase. Since this was so predictable, I decided to
build it into the interface: in the case of CaselessLiteral, the input text
would *not* be returned as the matching text, but the original specifying
text string would be returned instead - after all, the caller had already
decided that case was not significant. This is how the documentation reads:
CaselessLiteral will return as its match text the original specifying
text string.
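As a rough illustration of the documented behavior, the following standard-library sketch models it without using pyparsing at all. The names caseless_match, VERBS and parse_verb are hypothetical, and the matching is deliberately simplistic (prefix comparison only).

```python
def caseless_match(defining, text):
    """Model of the documented behavior (not pyparsing itself).

    Compare the start of text with the defining string, ignoring case,
    and return the defining string on success or None on failure.
    """
    candidate = text[:len(defining)]
    if candidate.lower() == defining.lower():
        return defining
    return None


# Hypothetical verb list for the Zork-type example.
VERBS = ("pick", "turn", "drop")


def parse_verb(text):
    """Return the first verb that matches the start of text, else None."""
    for verb in VERBS:
        token = caseless_match(verb, text)
        if token is not None:
            return token
    return None


for typed in ("picK", "Turn", "DROP"):
    print(typed, "->", parse_verb(typed))
```

Whatever the player types, the calling code only ever sees 'pick', 'turn' or 'drop', so it needs no case normalization of its own.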

Unfortunately, I botched it, waaaaay back pre-1.0.0, when I first put
pyparsing up on SourceForge. CaselessLiteral always returns the input text
converted to uppercase. It is still a predictably-cased string, but not as
documented. In fact, if someone were to specify CaselessLiteral("pick") and
get "PICK" back, this could cause other problems; for example, a comparison
against the string "pick" would fail.
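The kind of problem this can cause is sketched below, again with a standard-library model and not real pyparsing code. caseless_match_uppercased and the handlers table are hypothetical names used only to imitate the uppercase behavior described above.

```python
def caseless_match_uppercased(defining, text):
    """Model of the behavior described above (not pyparsing itself).

    On a case-insensitive match, return the matched input in uppercase.
    """
    candidate = text[:len(defining)]
    if candidate.lower() == defining.lower():
        return candidate.upper()
    return None


# Caller's lookup table, keyed by the string used to define the literal.
handlers = {"pick": "pick something up"}

token = caseless_match_uppercased("pick", "picK")
print(token)                      # PICK
print(token in handlers)          # False: "PICK" is not a key
print(token.lower() in handlers)  # True: caller must normalize again
```

Code written against the documentation would look the token up as 'pick' and miss.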

So for those of you who are still reading, and who use pyparsing, and have
CaselessLiterals in your code, and test on the returned text, what choice
would you prefer:

1. Keep the current behavior, and just change the docs.
2. Fix the current behavior to match the docs, and fix up any code that uses
it.

My personal preference is #2. We are still early in pyparsing's code life -
it has only been generally available for about 4 months - and I think it
really is the preferred way to go.

On the other hand, pyparsing has been downloaded almost 900 times from SF,
and so I want to find out if this will end up making me lots of enemies. :)

Overall, this has been a gratifying experience - I have gotten many e-mails
from people who feel this tool is easy to pick up, and fills a common need.
I want to continue in those 900 people's good graces, or at least as many of
them as I can.

-- Paul
Jul 18 '05 #1


"Paul McGuire" <pt***@austin.rr._bogus_.com> wrote in message news:<ge*******************@fe2.texas.rr.com>...
> So for those of you who are still reading, and who use pyparsing, and have
> CaselessLiterals in your code, and test on the returned text, what choice
> would you prefer:
>
> 1. Keep the current behavior, and just change the docs.
> 2. Fix the current behavior to match the docs, and fix up any code that uses
> it.
>
> My personal preference is #2. We are still early in pyparsing's code life -
> it has only been generally available for about 4 months - and I think it
> really is the preferred way to go.


I'm +1 on #2, i.e., change the code to match the docs.
I have a use case where the parser is acting as a "cleanup"
to the input, making it conform to a coding standard,
and CaselessLiteral working as described in the docs
would be perfect. I suppose I could make a setParseAction
that would convert it to the appropriate case, but that
would slow it down, plus I'd have to keep track of the
literal in two places (well, I suppose I could define
a name for the value and import it from my grammar and
from the code that has the parse actions, but still...).
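The workaround can be sketched roughly as follows. This is a standard-library model and not pyparsing: match_then_act and normalize_to are hypothetical stand-ins for a matcher with an attached parse action, and the callback here does not use pyparsing's parse-action signature. The shared constant PICK plays the role of the "name for the value" imported by both the grammar and the action code.

```python
# Single shared definition, so the literal is not spelled out in two places.
PICK = "pick"


def match_then_act(defining, text, action=None):
    """Model a caseless match followed by an optional post-processing step.

    The raw token is uppercased, imitating the behavior discussed in this
    thread; action, if given, may replace it.
    """
    candidate = text[:len(defining)]
    if candidate.lower() != defining.lower():
        return None
    token = candidate.upper()
    return action(token) if action is not None else token


def normalize_to(defining):
    """Build an action that replaces any matched token with defining."""
    return lambda token: defining


print(match_then_act(PICK, "PiCk"))                      # PICK
print(match_then_act(PICK, "PiCk", normalize_to(PICK)))  # pick
```

The extra call on every match is the overhead in question; option #2 would make the action unnecessary.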

Thanks for pyparsing.
--dang
Jul 18 '05 #2
