
YARD : Generic regular expression parser

christopher diggins
#1: Jul 22 '05
There seem to be a gazillion regular expression libraries. Most of them
only work on text, but I wanted something that also works on arbitrary
sequences of data (this is useful, for instance, when building parse trees
from token lists). I think this is possible with the Spirit library from
Boost, but its syntax and complexity are too much for me. I have almost
finished YARD (Yet Another Recursive Descent parser), a really lightweight,
truly generic regex parser (and it runs like a bat out of hell).

You define rules as follows:

typedef CharRange_parser<'a', 'z'> LowerCaseLetter_parser;
typedef CharRange_parser<'A', 'Z'> UpperCaseLetter_parser;
typedef CharRange_parser<'0', '9'> Number_parser;
typedef re_or<LowerCaseLetter_parser, UpperCaseLetter_parser> Letter_parser;
typedef re_or<Letter_parser, Char_parser<'\''> > WordChar_parser;
typedef re_plus<WordChar_parser> Word_parser;
typedef re_or<Letter_parser, Char_parser<'_'> > IdentFirstChar_parser;
typedef re_or<IdentFirstChar_parser, Number_parser> IdentOtherChar_parser;
typedef re_and<IdentFirstChar_parser, re_star<IdentOtherChar_parser> >
Ident_parser;
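
For readers wondering how rules like these might work underneath, here is a minimal sketch. It is not YARD's actual source. It assumes each rule is a type with a static Accept function that consumes input on success and leaves the position untouched on failure. The Stream struct and the MatchLength and SelfCheck helpers are hypothetical names invented for this illustration; only the rule names come from the post.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical input cursor; the real ParseInputStream may look different.
struct Stream {
    const char* p;   // start of the text
    std::size_t n;   // number of characters
    std::size_t i;   // current position
    bool AtEnd() const { return i >= n; }
    char Get() const { return p[i]; }
};

// Matches exactly one given character.
template<char C>
struct Char_parser {
    static bool Accept(Stream& s) {
        if (s.AtEnd() || s.Get() != C) return false;
        ++s.i;
        return true;
    }
};

// Matches one character in the inclusive range [Lo, Hi].
template<char Lo, char Hi>
struct CharRange_parser {
    static bool Accept(Stream& s) {
        if (s.AtEnd() || s.Get() < Lo || s.Get() > Hi) return false;
        ++s.i;
        return true;
    }
};

// Alternation: try A, and if it fails, rewind and try B.
template<typename A, typename B>
struct re_or {
    static bool Accept(Stream& s) {
        std::size_t start = s.i;
        if (A::Accept(s)) return true;
        s.i = start;
        return B::Accept(s);
    }
};

// Sequence: A followed by B; rewind if either part fails.
template<typename A, typename B>
struct re_and {
    static bool Accept(Stream& s) {
        std::size_t start = s.i;
        if (A::Accept(s) && B::Accept(s)) return true;
        s.i = start;
        return false;
    }
};

// Zero or more repetitions; always succeeds.
template<typename A>
struct re_star {
    static bool Accept(Stream& s) {
        for (;;) {
            std::size_t before = s.i;
            // Stop on failure, or if A matched without consuming anything.
            if (!A::Accept(s) || s.i == before) return true;
        }
    }
};

// One or more repetitions.
template<typename A>
struct re_plus {
    static bool Accept(Stream& s) {
        return A::Accept(s) && re_star<A>::Accept(s);
    }
};

// The rules from the post, built on the sketch above.
typedef CharRange_parser<'a', 'z'> LowerCaseLetter_parser;
typedef CharRange_parser<'A', 'Z'> UpperCaseLetter_parser;
typedef CharRange_parser<'0', '9'> Number_parser;
typedef re_or<LowerCaseLetter_parser, UpperCaseLetter_parser> Letter_parser;
typedef re_plus<re_or<Letter_parser, Char_parser<'\''> > > Word_parser;
typedef re_or<Letter_parser, Char_parser<'_'> > IdentFirstChar_parser;
typedef re_or<IdentFirstChar_parser, Number_parser> IdentOtherChar_parser;
typedef re_and<IdentFirstChar_parser, re_star<IdentOtherChar_parser> > Ident_parser;

// Hypothetical helper: length of the match at the start of text, 0 if none.
template<typename P>
std::size_t MatchLength(const std::string& text) {
    Stream s = { text.data(), text.size(), 0 };
    return P::Accept(s) ? s.i : 0;
}

// Hypothetical usage check.
inline void SelfCheck() {
    assert(MatchLength<Ident_parser>("foo_1 bar") == 5);
    assert(MatchLength<Ident_parser>("9abc") == 0);
}
```

Because every rule is a type, the whole grammar is fixed at compile time and the compiler is free to inline the nested Accept calls, which is one plausible reason such a parser could be fast.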

Then you hand them to a tokenizer as follows:

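int main()
{
    // Placeholder file name; the original post does not show where it comes from.
    const char* sFileName = "input.txt";
    // GetFileSize and OutputTokens are helpers that are not shown in this post.
    int nBufSize = GetFileSize(sFileName);
    char* pBuf = static_cast<char*>(calloc(nBufSize, 1));
    ifstream f;
    f.open(sFileName);
    f.read(pBuf, nBufSize);
    f.close();
    Tokenizer<Word_parser> tknzr;
    tknzr.Parse(pBuf, nBufSize);
    OutputTokens(tknzr.Begin(), tknzr.End());
    free(pBuf);
    getchar(); // wait for a key press before exiting
    return 0;
}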

A tokenizer in this case is really simple:

template<typename Parser_T>
struct Tokenizer {
    void Parse(char* pText, int nSize)
    {
        ParseInputStream stream(pText, nSize);
        while (!stream.AtEnd()) {
            // Remember where this attempt starts so the token range can be recorded.
            int index = stream.GetIndex();
            if (Parser_T::Accept(stream)) {
                mTkns.push_back(Token(index, stream.GetIndex()));
            }
            stream.GotoNext();
        }
    }
    TokenIter Begin() { return mTkns.begin(); }
    TokenIter End() { return mTkns.end(); }
private:
    TokenList mTkns;
};
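
The opening post mentions token lists as the motivating case for going beyond text. As a rough sketch of that idea, and not YARD's actual design, the same kind of rule can be written against any element type by making Accept a template on the stream. Every name here (SeqStream, Elem_parser, Seq, Star, TokenKind, MatchTokens) is hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical token kinds produced by an earlier lexing pass.
enum TokenKind { TK_Number, TK_Plus, TK_Ident };

// Hypothetical cursor over a contiguous sequence of any element type.
template<typename Elem>
struct SeqStream {
    const Elem* data;
    std::size_t size;
    std::size_t pos;
    bool AtEnd() const { return pos >= size; }
};

// Matches one element equal to Value (Elem must be an integral or enum type).
template<typename Elem, Elem Value>
struct Elem_parser {
    template<typename S>
    static bool Accept(S& s) {
        if (s.AtEnd() || s.data[s.pos] != Value) return false;
        ++s.pos;
        return true;
    }
};

// Sequence: A followed by B; rewind if either part fails.
template<typename A, typename B>
struct Seq {
    template<typename S>
    static bool Accept(S& s) {
        std::size_t start = s.pos;
        if (A::Accept(s) && B::Accept(s)) return true;
        s.pos = start;
        return false;
    }
};

// Zero or more repetitions; always succeeds.
template<typename A>
struct Star {
    template<typename S>
    static bool Accept(S& s) {
        for (;;) {
            std::size_t before = s.pos;
            if (!A::Accept(s) || s.pos == before) return true;
        }
    }
};

// Grammar over tokens instead of characters: Number (Plus Number)*
typedef Elem_parser<TokenKind, TK_Number> Number;
typedef Elem_parser<TokenKind, TK_Plus> Plus;
typedef Seq<Number, Star<Seq<Plus, Number> > > Sum;

// Hypothetical helper: number of leading elements matched by P, 0 if none.
template<typename P, typename Elem>
std::size_t MatchTokens(const std::vector<Elem>& v) {
    SeqStream<Elem> s = { v.empty() ? 0 : &v[0], v.size(), 0 };
    return P::Accept(s) ? s.pos : 0;
}
```

Given the token list Number, Plus, Number, Plus, Ident, MatchTokens<Sum> consumes the first three tokens: the trailing Plus is not followed by a Number, so the last repetition is rewound.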

What I want to know is this: is it obvious to programmers how it works and
how to use it? Is the verbosity acceptable? Also, would people be more
interested if I showed some benchmarks comparing it to other libraries?

TIA

--
Christopher Diggins
http://www.cdiggins.com
http://www.heron-language.com


Markus Elfring
#2: Jul 23 '05

Can the definitions that are described in the section "7 Regular expressions
[tr.re]" of the document "(Draft) Technical Report on Standard Library
Extensions" be changed with other template parameters to match your
suggested use cases?
http://www.open-std.org/jtc1/sc22/wg...2004/n1687.pdf


christopher diggins
#3: Jul 23 '05

"Markus Elfring" <Markus.Elfring@web.de> wrote in message
news:342u4pF470ueqU1@individual.net...
> Can the definitions that are described in the section "7 Regular
> expressions [tr.re]" of the document "(Draft) Technical Report on Standard
> Library Extensions" be changed with other template parameters to match your
> suggested use cases?
> http://www.open-std.org/jtc1/sc22/wg...2004/n1687.pdf


Sorry but I don't quite understand the question ( nor the document ), could
you explain more?

--
Christopher Diggins
http://www.cdiggins.com
http://www.heron-language.com


Markus Elfring
#4: Jul 23 '05

> Sorry but I don't quite understand the question ( nor the document ), could
> you explain more?

What don't you understand in the referenced document?
Would you like to reuse anything from this regular expression template
library, which is still in development?

When do you want a regexp to be evaluated:
at compile time (Boost::Spirit / Phoenix) or at run time?
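
To make the run-time side of that question concrete, here is a small sketch using std::regex, the facility that the TR1 regular expression proposal later became in standard C++. The pattern is an ordinary string that is parsed while the program runs, in contrast to rules spelled out as template types. The function name IsIdentRuntime is hypothetical, and the pattern is only an approximation of the identifier rule from the first post.

```cpp
#include <regex>
#include <string>

// Returns true when the whole string looks like an identifier.
// The pattern is compiled at run time, on the first call.
bool IsIdentRuntime(const std::string& text) {
    static const std::regex ident("[A-Za-z_][A-Za-z_0-9]*");
    return std::regex_match(text, ident);
}
```

A run-time pattern can come from a configuration file or user input, while a compile-time grammar cannot change without rebuilding the program.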

Regards,
Markus

