Writing a parser the right way in C#

Hi All,
I need to parse certain text from a paragraph (around 20 lines).

I know the exact tags that I am looking for.

My approach is to define an XML (config) file that specifies which tag
I am looking for and the corresponding regular expression to search
for the pattern.

The XML file will also have a way to say what the previous tag and the
next tag should be. Again, some of it through regular expressions and
some of it through logic.

At run time, I just read the XML, find each tag and its corresponding
regular expression, and execute it.
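
Roughly, something like this is what I have in mind (just a sketch; the
config format and class names below are made up, not existing code):

// patterns.xml (illustrative only):
// <patterns>
//   <pattern tag="InvoiceNumber" regex="Invoice\s*#\s*(\d+)" />
//   <pattern tag="Date"          regex="Date:\s*(\d{2}/\d{2}/\d{4})" />
// </patterns>

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Xml;

class ConfigDrivenExtractor
{
    // Reads tag/regex pairs from the config file and applies each regex
    // to the input text, collecting the first match per tag.
    public static Dictionary<string, string> Extract(string configPath, string text)
    {
        var results = new Dictionary<string, string>();
        var doc = new XmlDocument();
        doc.Load(configPath);

        foreach (XmlNode node in doc.SelectNodes("/patterns/pattern"))
        {
            string tag = node.Attributes["tag"].Value;
            string pattern = node.Attributes["regex"].Value;

            Match m = Regex.Match(text, pattern);
            if (m.Success)
                results[tag] = m.Groups.Count > 1 ? m.Groups[1].Value : m.Value;
        }
        return results;
    }
}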

Assuming more patterns may be added and more rules may come up, is this
the best approach?

Are there other ways to make it more flexible and generic?

I don't want to end up with stringent rules; rather, I'd like to
develop some sort of extensible grammar.

Any ideas?
-KS

Mar 11 '06 #1
si************@hotmail.com wrote:
<snip original post>


You'll always end up with code that's tied to the grammar of your
'language', unless you're using an LR(n) parser core with action/goto
tables.

Normally, you'd use a lexical analyzer to convert the text to tokens,
then interpret the tokens with a parser and 'handle' them by converting
streams of terminals (tokens) into non-terminals, executing actions
based on the non-terminals that were determined. Terminals and
non-terminals are terms used in (E)BNF, the notation for grammars.

What you should focus on is writing something that works, rather than
something that can parse every language in the world, because that
won't work: there's always a part of the code that's tied to the
grammar. For example, even if you're using an LR(n) parser generator,
which in theory produces an action/goto table and uses a generic parser
core, it still has to have rule handlers which handle the action to be
executed when a non-terminal is found. For example, say you have the
following syntax:
<url>http://www.microsoft.com</url>
This can then be written in EBNF as:
URL -> UrlStartToken urltext UrlEndToken
UrlStartToken -> '<url>'
UrlEndToken -> '</url>'

urltext -> ...

Now, if the non-terminal 'URL' is found, it has to be handled, so the
rule handler for that non-terminal has to be written in code and is
therefore tied to the grammar, and therefore not generic. But that's
OK, as you simply want to parse something to get something done, not to
have something completely generic which doesn't do anything.
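
To make that concrete, here is a simplified, made-up sketch (not how
any particular parser generator lays it out): the table-driven part can
stay generic, but the body of each rule handler is written by hand and
is therefore tied to the grammar.

using System;
using System.Collections.Generic;

class RuleHandlers
{
    // One hand-written handler per non-terminal; 'children' holds the
    // matched right-hand-side symbols, e.g. { "<url>", "http://...", "</url>" }.
    static readonly Dictionary<string, Action<string[]>> Handlers =
        new Dictionary<string, Action<string[]>>
        {
            // URL -> UrlStartToken urltext UrlEndToken
            { "URL", children => Console.WriteLine("Found URL: " + children[1]) }
        };

    // Called by the (generic) parser core whenever a non-terminal is reduced.
    public static void OnReduce(string nonTerminal, string[] children)
    {
        if (Handlers.TryGetValue(nonTerminal, out var handler))
            handler(children);
    }
}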

Frans

--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------
Mar 12 '06 #2
Thanks for the reply.
Are there any lexical analyzers available that I can use from .NET?

Also, another question.
1) My understanding is that taking a lexer approach makes more sense if
you are writing a compiler for a language like C#, because you have to
write a handler/action for each non-terminal. You have to know each
terminal/non-terminal when you are writing your parser (at design
time).

If you anticipate more patterns being added after your parser is
deployed, it should just be a configuration file change: you add the
search string and the regular expression to the config file and your
parser can handle it.

Is a lexer the right approach in that case too, or am I better off with
regular expressions?
2) Secondly, when you write a lexer you care about every word in the
line that you are parsing.

For example:

object o = new object ();

You would go with the lexer approach if you want to parse through each
token to make sure it is syntactically correct. But if you just want to
search for, let's say, the second occurrence of the string "object",
which is, say, 15 characters away from the first occurrence, then you
are better off just using a regular expression.
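
For instance, in the regex case I mean something like this (just an
illustration I put together, not production code):

using System;
using System.Text.RegularExpressions;

class SecondOccurrence
{
    static void Main()
    {
        string line = "object o = new object ();";

        // All occurrences of the word "object"; the second match, if
        // there is one, is the one I am after.
        MatchCollection matches = Regex.Matches(line, @"\bobject\b");
        if (matches.Count >= 2)
            Console.WriteLine("Second 'object' at index " + matches[1].Index);
    }
}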

Is my assumption correct?
Thanks
KS

Mar 12 '06 #3
si************@hotmail.com wrote:
> Thanks for the reply.
> Are there any lexical analyzers available that I can use from .NET?
Not that I'm aware of, but they're not hard to write.
> Also, another question.
> 1) My understanding is that taking a lexer approach makes more sense
> if you are writing a compiler for a language like C#, because you
> have to write a handler/action for each non-terminal. You have to
> know each terminal/non-terminal when you are writing your parser (at
> design time).
>
> If you anticipate more patterns being added after your parser is
> deployed, it should just be a configuration file change: you add the
> search string and the regular expression to the config file and your
> parser can handle it.
>
> Is a lexer the right approach in that case too, or am I better off
> with regular expressions?
A lexical analyzer is a routine which uses regular expressions :). The
best way is to define your tokens as regular expressions and use those
expressions to 'tokenize' your input stream, especially if you have
start/end tokens for statements.
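
As a rough sketch (the token names and patterns here are invented for
the URL example; in practice they would come from your config):

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class Token
{
    public string Kind;
    public string Value;
    public Token(string kind, string value) { Kind = kind; Value = value; }
}

class Lexer
{
    // Token definitions: a name plus the regular expression that recognises it.
    static readonly KeyValuePair<string, string>[] TokenDefs =
    {
        new KeyValuePair<string, string>("UrlStart", @"<url>"),
        new KeyValuePair<string, string>("UrlEnd",   @"</url>"),
        new KeyValuePair<string, string>("Text",     @"[^<]+")
    };

    // Walks the input, trying each token regex anchored at the current position.
    public static IEnumerable<Token> Tokenize(string input)
    {
        int pos = 0;
        while (pos < input.Length)
        {
            bool matched = false;
            foreach (var def in TokenDefs)
            {
                Match m = Regex.Match(input.Substring(pos), "^(?:" + def.Value + ")");
                if (m.Success && m.Length > 0)
                {
                    yield return new Token(def.Key, m.Value);
                    pos += m.Length;
                    matched = true;
                    break;
                }
            }
            if (!matched)
                throw new Exception("Unexpected input at position " + pos);
        }
    }
}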
> 2) Secondly, when you write a lexer you care about every word in the
> line that you are parsing.
>
> For example:
>
> object o = new object ();
>
> You would go with the lexer approach if you want to parse through
> each token to make sure it is syntactically correct. But if you just
> want to search for, let's say, the second occurrence of the string
> "object", which is, say, 15 characters away from the first
> occurrence, then you are better off just using a regular expression.
>
> Is my assumption correct?


You need two parts: a lexical analyzer, which converts the input stream
into a stream of tokens, and a parser, which converts the stream of
tokens into a stream of actions.

The parser is the place where the token stream is checked for
correctness.
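
A bare-bones illustration of that second part, again using the made-up
URL grammar; for simplicity the tokens are just kind/value pairs, like
the ones a lexer such as the sketch above would produce:

using System;
using System.Collections.Generic;

class UrlParser
{
    // Walks the token stream, checks that the tokens appear in an order
    // the grammar allows, and fires an action for each recognised rule:
    // URL -> UrlStart Text UrlEnd.
    public static void Parse(IList<KeyValuePair<string, string>> tokens)
    {
        int i = 0;
        while (i < tokens.Count)
        {
            if (i + 2 < tokens.Count &&
                tokens[i].Key == "UrlStart" &&
                tokens[i + 1].Key == "Text" &&
                tokens[i + 2].Key == "UrlEnd")
            {
                // The 'action' executed for the URL non-terminal.
                Console.WriteLine("Found URL: " + tokens[i + 1].Value);
                i += 3;
            }
            else
            {
                throw new Exception("Syntax error at token " + i + " (" + tokens[i].Key + ")");
            }
        }
    }
}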

Frans
--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------
Mar 13 '06 #5

