Writing a parser the right way in C#

Hi All,
I need to parse certain text out of a paragraph (about 20 lines).

I know the exact tags that I am looking for.

My approach is to define an XML (config) file that lists each tag I am
looking for and the corresponding regular expression that matches the
pattern.

The XML file will also have a way to say what the previous tag and the
next tag should be. Again, some of that is done through regular
expressions and some of it through logic.

At run time I just read the XML, find each tag, and execute the
corresponding regular expression.

Assuming more patterns will be added and more rules will come up, is
this the best approach?

Are there other ways to make it more flexible and generic?

I don't want to end up with rigid rules, but rather to develop some sort
of extensible grammar.

Any ideas?
-KS

Mar 11 '06 #1

You'll always end up with code that's tied to the grammar of your
'language', unless you're using an LR(n) parser core with action/goto
tables.

Normally, you'd use a lexical analyzer to convert text to tokens, then
have a parser interpret the tokens and 'handle' them by converting
streams of terminals (tokens) into non-terminals and executing actions
based on the non-terminals it finds. Terminals and non-terminals are
terms used in (E)BNF, the notation for grammars.

What you should focus on is writing something that works, rather than
something that can parse every language in the world. That won't work:
there's always a part of the code that's tied to the grammar. For
example, if you're using an LR(n) parser generator, which in theory
produces an action/goto table and uses a generic parser core, it still
has to have rule handlers which handle the action to be executed when a
non-terminal is found. Say you have the following syntax:
http://www.microsoft.com
This can then be written in EBNF as:
URL -> UrlStartToken urltext UrlEndToken
UrlStartToken ->
UrlEndToken ->

urltext -> ...

Now, if the nonterminal 'URL' is found, it has to be handled, so the
rule handler for that nonterminal has to be written in code and is
therefore tied to the grammar and therefore not generic. But that's ok,
as you simply want to parse something, to get something done, not to
have something completely generic which doesn't do anything.
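
To make the split between a generic core and grammar-specific handlers concrete, here is a minimal C# sketch. It is not an LR(n) core: it only recognizes the single rule URL -> UrlStartToken urltext UrlEndToken in input that has already been tokenized. The names Token, UrlRecognizer and Reduce are hypothetical, and since the actual start/end tokens are not shown above, the token kinds are simply passed around as strings.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical token type: a kind (the terminal's name) plus the matched text.
public sealed class Token
{
    public string Kind { get; }
    public string Text { get; }
    public Token(string kind, string text) { Kind = kind; Text = text; }
}

public static class UrlRecognizer
{
    // Generic part: scans the token stream for the three-terminal pattern
    // and looks up the handler registered for the non-terminal "URL".
    // Grammar-specific part: whatever that handler does with the match.
    // Returns the number of URL non-terminals found.
    public static int Reduce(
        IReadOnlyList<Token> tokens,
        IDictionary<string, Action<string>> handlers)
    {
        int found = 0;
        for (int i = 0; i + 2 < tokens.Count; i++)
        {
            if (tokens[i].Kind == "UrlStartToken" &&
                tokens[i + 1].Kind == "urltext" &&
                tokens[i + 2].Kind == "UrlEndToken")
            {
                Action<string> handler;
                if (handlers.TryGetValue("URL", out handler))
                {
                    handler(tokens[i + 1].Text);
                }
                found++;
                i += 2; // skip the tokens consumed by this rule
            }
        }
        return found;
    }
}
```

A caller would register something like handlers["URL"] = url => Console.WriteLine(url); before calling Reduce. The scanning loop could be driven by tables, but the handler body is the part that stays tied to the grammar.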

Frans

--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------
Mar 12 '06 #2
Thanks for the reply.
Are there any lexical analyzers available that I can use from .NET?

Also, another question.
1) My understanding is that a lexer approach makes more sense if you
are writing a compiler for a language like C#, because you have to
write a handler/action for each non-terminal. You have to know each
terminal/non-terminal when you are writing your parser (design time).

If you anticipate more patterns being added after your parser is
deployed, it should just be a configuration file change: you add the
search string and the regular expression to the config file and your
parser can handle it.

Is a lexer the right approach in that case too, or am I better off with
regular expressions?

2) Secondly, when you write a lexer you care about every word in the
line that you are parsing.

For example:

object o = new object ();

You would go with the lexer approach if you want to parse through each
token to make sure the line is syntactically correct. But if you just
want to search for, let's say, the second occurrence of the string
"object", which is, let's say, 15 characters away from the first
occurrence, then you are better off just using a regular expression.

Is my assumption correct?
Thanks
KS

Mar 12 '06 #3
si************@ hotmail.com wrote:
> Thanks for the reply.
> Are there any lexical analyzers available that I can use from .NET?

Not that I'm aware of, but they're not hard to write.

> Also, another question.
> 1) My understanding is that a lexer approach makes more sense if you
> are writing a compiler for a language like C#, because you have to
> write a handler/action for each non-terminal. You have to know each
> terminal/non-terminal when you are writing your parser (design time).
>
> If you anticipate more patterns being added after your parser is
> deployed, it should just be a configuration file change: you add the
> search string and the regular expression to the config file and your
> parser can handle it.
>
> Is a lexer the right approach in that case too, or am I better off
> with regular expressions?

A lexical analyzer is a routine which uses regular expressions :).
The best way is to define your tokens as regular expressions and use
these expressions to 'tokenize' your input stream, especially if you
have start/end tokens for statements.
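
A rough C# sketch of that idea, assuming the token definitions are kept in a small XML fragment so that adding a pattern stays a config change. The XML layout (token elements with name and pattern attributes), the class ConfigLexer and its method names are all made up for illustration. Only Regex and XElement are actual .NET types.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class ConfigLexer
{
    // Reads <token name="..." pattern="..."/> elements (a hypothetical
    // layout) and compiles each pattern. \G anchors the match at the
    // position where the search starts, so tokens cannot skip input.
    public static List<KeyValuePair<string, Regex>> LoadDefinitions(string xml)
    {
        var definitions = new List<KeyValuePair<string, Regex>>();
        foreach (XElement e in XElement.Parse(xml).Elements("token"))
        {
            string name = (string)e.Attribute("name");
            string pattern = (string)e.Attribute("pattern");
            definitions.Add(new KeyValuePair<string, Regex>(
                name, new Regex(@"\G(?:" + pattern + ")")));
        }
        return definitions;
    }

    // Walks the input left to right. At each position the first definition
    // that matches wins. Whitespace between tokens is skipped.
    public static List<KeyValuePair<string, string>> Tokenize(
        string input, List<KeyValuePair<string, Regex>> definitions)
    {
        var tokens = new List<KeyValuePair<string, string>>();
        int pos = 0;
        while (pos < input.Length)
        {
            if (char.IsWhiteSpace(input[pos])) { pos++; continue; }

            bool matched = false;
            foreach (var def in definitions)
            {
                Match m = def.Value.Match(input, pos);
                if (m.Success && m.Length > 0)
                {
                    tokens.Add(new KeyValuePair<string, string>(def.Key, m.Value));
                    pos += m.Length;
                    matched = true;
                    break;
                }
            }
            if (!matched)
                throw new FormatException("Unexpected character at position " + pos);
        }
        return tokens;
    }
}
```

With two definitions, say an Identifier pattern of [A-Za-z_][A-Za-z0-9_]* and a Symbol pattern of [=();], the line object o = new object (); comes out as a flat list of name/text pairs. Because the first matching definition wins, the order of the entries in the config file matters.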

> 2) Secondly, when you write a lexer you care about every word in the
> line that you are parsing.
>
> For example:
>
> object o = new object ();
>
> You would go with the lexer approach if you want to parse through each
> token to make sure the line is syntactically correct. But if you just
> want to search for, let's say, the second occurrence of the string
> "object", which is, let's say, 15 characters away from the first
> occurrence, then you are better off just using a regular expression.
>
> Is my assumption correct?


You need two parts: a lexical analyzer, which converts the input stream
into a stream of tokens, and a parser, which converts the stream of
tokens into a stream of actions.

The parser is the place where the token stream is checked for
correctness.
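
As a small illustration of that second part, here is a C# sketch that checks the token kinds a lexer might produce for a line like object o = new object (); against one fixed rule. The rule, the token kind names and the class DeclarationChecker are assumptions made for this example. A real parser would handle many rules and would trigger actions instead of only reporting an index.

```csharp
using System;
using System.Collections.Generic;

public static class DeclarationChecker
{
    // Hypothetical rule, in the notation used earlier in the thread:
    // Declaration -> Identifier Identifier Equals New Identifier LParen RParen Semicolon
    private static readonly string[] Rule =
    {
        "Identifier", "Identifier", "Equals", "New",
        "Identifier", "LParen", "RParen", "Semicolon"
    };

    // Returns -1 when the token kinds match the rule exactly. Otherwise
    // returns the index of the first token that is wrong or missing, or
    // the rule length when extra tokens follow a complete match.
    public static int FirstError(IReadOnlyList<string> tokenKinds)
    {
        for (int i = 0; i < Rule.Length; i++)
        {
            if (i >= tokenKinds.Count || tokenKinds[i] != Rule[i])
                return i;
        }
        return tokenKinds.Count > Rule.Length ? Rule.Length : -1;
    }
}
```

The lexer never needs to know this rule. It only labels pieces of text, and the check on their order lives entirely in the parser.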

Frans
Mar 13 '06 #5
