Writing a parser

Hello all,

I guess this is a question for people who have written a parser.

Does an XML parser ever need to be recursive? I mean like:

&fo&bar;o;

I know this particular example is in the XML spec, which says that
it will not happen. But are there some really wild constructions that
are allowed, that would require recursive parsing?

Like.. <tag <!-- Comment <tag2 attr="<fo&ou<!-- comment!
-->ml;o/>"></tag2> -->></tag>

Please, don't start taking that apart, I know all the errors in it.
However, what I want to demonstrate is the level of complexity I'm
wondering about. Any case where recursion is needed?

--
Kind Regards,
Jan Danielsson
Te audire no possum. Musa sapientum fixa est in aure.
Aug 17 '05 #1
Hello,

Jan Danielsson wrote:
> I guess this is a question for people who have written a parser.
>
> Does an XML parser ever need to be recursive? I mean like:
>
> &fo&bar;o;
>
> I know this particular example is in the XML spec, which says that
> it will not happen. But are there some really wild constructions that
> are allowed, that would require recursive parsing?
>
> Like.. <tag <!-- Comment <tag2 attr="<fo&ou<!-- comment!
> -->ml;o/>"></tag2> -->></tag>
>
> Please, don't start taking that apart, I know all the errors in it.
> However, what I want to demonstrate is the level of complexity I'm
> wondering about. Any case where recursion is needed?


I'm no expert, but AFAIK an XML parser will have to stop if the XML
file is not well-formed. The above example contains errors (you
said it), so it is not well-formed. There's no need for a parser
to accept the above construct. I even think that a parser is not
allowed to accept it.

Gerald
Aug 17 '05 #2
Jan Danielsson wrote:
> However, what I want to demonstrate is the level of complexity I'm
> wondering about. Any case where recursion is needed?


Why do you worry about recursion?
Recursive functions usually make parsers easier to implement.
If you *really* can't recurse in your implementation, use stacks
for holding the context.
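
For illustration, here is a rough Python sketch of the "recursive function" approach for a toy subset of XML: plain start tags, end tags, and text only. The names `tokenize`, `parse_element`, and `parse` are made up for this example, and attributes, comments, and entities are deliberately ignored, so this is not a real XML parser. Each nested element is handled by a nested call, so the call stack holds the context:

```python
import re

# Toy grammar (assumption): <name>, </name>, and text. No attributes,
# comments, CDATA, or entity references, unlike real XML.
TOKEN = re.compile(r"<(/?)([A-Za-z_][\w.-]*)>|([^<]+)")


def tokenize(text):
    """Split text into ("start", name), ("end", name), and ("text", s) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        if m.group(3) is not None:
            tokens.append(("text", m.group(3)))
        elif m.group(1):
            tokens.append(("end", m.group(2)))
        else:
            tokens.append(("start", m.group(2)))
        pos = m.end()
    return tokens


def parse_element(tokens, i):
    """Parse one element starting at tokens[i]; return (node, next_index)."""
    kind, name = tokens[i]
    if kind != "start":
        raise ValueError(f"expected a start tag, got {kind} {name!r}")
    children = []
    i += 1
    while i < len(tokens):
        kind, value = tokens[i]
        if kind == "text":
            children.append(value)
            i += 1
        elif kind == "start":
            # The recursive call mirrors the nesting of the elements.
            child, i = parse_element(tokens, i)
            children.append(child)
        else:  # end tag
            if value != name:
                raise ValueError(f"</{value}> does not close <{name}>")
            return (name, children), i + 1
    raise ValueError(f"<{name}> is never closed")


def parse(text):
    """Parse a document with a single root element into (name, children)."""
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("empty document")
    node, i = parse_element(tokens, 0)
    if i != len(tokens):
        raise ValueError("content after the root element")
    return node
```

With this sketch, `parse("<foo>a<bar>b</bar>c</foo>")` returns `("foo", ["a", ("bar", ["b"]), "c"])`. Swapping the recursive call for an explicit list of open elements gives the stack-based variant.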
Aug 17 '05 #3
Jürgen Kahrs wrote:
>> However, what I want to demonstrate is the level of complexity I'm
>> wondering about. Any case where recursion is needed?
>
> Why do you worry about recursion?
> Recursive functions usually make parsers easier to implement.
> If you *really* can't recurse in your implementation, use stacks
> for holding the context.


I'm sorry, but I was talking about recursive *expressions* in *XML*,
not about "a function calling itself". I already have a stack-based
parser, but I'm beginning to wonder whether it is worth the trouble. I
haven't actually seen any examples where I would need the stack-based
design, and there's a much neater way to solve it, imho, but it would
make certain recursions *in* *XML* impossible.

Sorry for the confusion.

--
Kind Regards,
Jan Danielsson
Te audire no possum. Musa sapientum fixa est in aure.
Aug 17 '05 #4
In article <43********@griseus.its.uu.se>,
Jan Danielsson <ja************@gmail.com> wrote:
> Does an XML parser ever need to be recursive? I mean like:


Yes, but not in the way your examples are.

Elements may contain other elements:

<foo>...<bar>...</bar>...</foo>

Even if you don't return this as a nested structure (for example,
a SAX parser just returns start and end tags), you need to maintain
a stack of open elements so you can detect errors like this:

<foo>...<bar>...</bar>...</wrong>
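
To make the stack of open elements concrete, here is a minimal Python sketch. The function name `check_nesting` and the simplified tag pattern (no attributes, comments, or empty-element tags) are assumptions for this example only. It builds no tree, much like a SAX-style parser, yet it still needs the stack to catch the mismatch above:

```python
import re

# Simplified tag pattern (assumption): <name> or </name>, nothing else.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)>")


def check_nesting(text):
    """Return None if the tags nest properly, otherwise an error message."""
    open_elements = []  # stack of element names that are currently open
    for m in TAG.finditer(text):
        closing, name = m.group(1), m.group(2)
        if not closing:
            open_elements.append(name)
        elif not open_elements:
            return f"</{name}> has no matching start tag"
        elif open_elements[-1] != name:
            return f"</{name}> found, but <{open_elements[-1]}> is still open"
        else:
            open_elements.pop()
    if open_elements:
        return f"<{open_elements[-1]}> is never closed"
    return None
```

Here `check_nesting("<foo>...<bar>...</bar>...</foo>")` returns `None`, while the `</wrong>` example is reported as `"</wrong> found, but <foo> is still open"`.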

The replacement text of entities may contain references to other
entities:

<!ENTITY foo "some text">
<!ENTITY bar "contains this [ &bar; ] text">

So that a reference in the document to "&bar;" must be expanded
to "contains this [ some text ] text".

And similarly for external entities.

-- Richard
Aug 17 '05 #5
Richard Tobin wrote:
> <!ENTITY foo "some text">
> <!ENTITY bar "contains this [ &bar; ] text">
>
> So that a reference in the document to "&bar;" must be expanded
> to "contains this [ some text ] text".

Surely you mean

<!ENTITY bar "contains this [ &foo; ] text">

?

Soren
Aug 17 '05 #6
In article <bm********************@news000.worldonline.dk>,
Soren Kuula <do******@dongfang.dk> wrote:
>> <!ENTITY bar "contains this [ &bar; ] text">
> Surely you mean
> <!ENTITY bar "contains this [ &foo; ] text">


Yes, of course.

The one I typed is illegal (and must be reported as such by an XML
parser if it is used).
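
As a rough sketch of both points (nested expansion, and rejecting an entity that refers to itself), something like the following Python would do. The `expand` function and the dictionary of entity declarations are hypothetical stand-ins for what a real parser builds from the DTD. Predefined entities, character references, and external entities are left out:

```python
import re

# Matches a general entity reference such as &foo; (simplified name syntax).
REF = re.compile(r"&([A-Za-z_][\w.-]*);")


def expand(text, entities, active=()):
    """Replace every &name; in text, expanding nested references recursively.

    `active` holds the names currently being expanded, so a reference back
    to one of them is detected as illegal recursion.
    """
    def replace(match):
        name = match.group(1)
        if name in active:
            chain = " -> ".join(active + (name,))
            raise ValueError(f"entity references itself: {chain}")
        if name not in entities:
            raise ValueError(f"undeclared entity: {name}")
        return expand(entities[name], entities, active + (name,))

    return REF.sub(replace, text)


# The corrected declarations from this thread.
entities = {
    "foo": "some text",
    "bar": "contains this [ &foo; ] text",
}
```

With these declarations, `expand("&bar;", entities)` gives `"contains this [ some text ] text"`. If `bar` is declared with `&bar;` in its own replacement text, as in the mistyped version, the same call raises `ValueError`.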

-- Richard
Aug 18 '05 #7
