XML Parser Components?

Hi Folks,

This is my first post to this group, and I am not sure whether
this is the right group for my question. If it's not an appropriate
question for this group, please correct me and guide me to the right
place.

The thing is, I have been asked to design an XML parser in C. I have
done some study on XML so far, and I know that I should have a design
before I start coding.

Since I am new to parsers, I am confused about
what the components of my parser should be. All I know now is that I need a
validating component that validates the XML file, which should then
pass the XML file on to the parsing component for parsing.

My confusion lies in the parsing component. It's like I can't decide
what the sub-components of the parsing component should be.

Would some of you be kind enough to enlighten me on this issue?

Thanks in Advance.

Mahesh.

Sep 14 '06 #1
ma**************@gmail.com wrote:
> validating component that validates the XML file, which should then
> pass the XML file on to the parsing component for parsing.

It's usually done the other way around -- write a nonvalidating parser
to deal with the syntactic issues, then attach the validator to that.
(That isn't the only solution, or always the best solution, just the
easiest way to think about the problem.)

> My confusion lies in the parsing component. It's like I can't decide
> what the sub-components of the parsing component should be.

For a basic implementation, read any good book on parser design and/or
feed the XML grammar into any standard parser generator tool (e.g. the
YACC/LEX set).
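
To make the layering concrete, here is a minimal sketch in C. It is a toy,
not a real XML parser: the scanner only understands <name> and </name>
(no attributes, comments, entities or self-closing tags), and the
"validator" is just a list of permitted element names standing in for a
DTD or schema. All names (xml_handler, toy_parse, toy_validate) are
made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_DEPTH 32

/* Events reported by the non-validating layer (hypothetical interface). */
typedef struct {
    void *user;
    int (*start)(void *user, const char *name, size_t len);
    int (*end)(void *user, const char *name, size_t len);
} xml_handler;

/* Syntax layer: checks tag nesting and reports events.
   Returns 0 if well-formed (in this toy sense), -1 otherwise. */
int toy_parse(const char *s, const xml_handler *h)
{
    const char *open_name[TOY_MAX_DEPTH];
    size_t open_len[TOY_MAX_DEPTH];
    int depth = 0;

    assert(s != NULL && h != NULL);
    while (*s != '\0') {
        if (*s != '<') { s++; continue; }        /* character data: skipped */
        s++;
        int closing = (*s == '/');
        if (closing) s++;
        const char *name = s;
        while (*s != '\0' && *s != '>' && *s != '<') s++;
        if (*s != '>' || s == name) return -1;   /* unterminated or empty tag */
        size_t len = (size_t)(s - name);
        s++;
        if (closing) {
            if (depth == 0) return -1;           /* end tag without start tag */
            depth--;
            if (open_len[depth] != len || memcmp(open_name[depth], name, len) != 0)
                return -1;                       /* mismatched end tag */
            if (h->end(h->user, name, len) != 0) return -1;
        } else {
            if (depth == TOY_MAX_DEPTH) return -1;
            open_name[depth] = name;
            open_len[depth] = len;
            depth++;
            if (h->start(h->user, name, len) != 0) return -1;
        }
    }
    return depth == 0 ? 0 : -1;                  /* unclosed elements left over */
}

/* Validation layer: attached to the parser as an ordinary event handler. */
typedef struct {
    const char *const *allowed;   /* NULL-terminated list of element names */
    int violations;
} toy_validator;

static int validate_start(void *user, const char *name, size_t len)
{
    toy_validator *v = user;
    for (size_t i = 0; v->allowed[i] != NULL; i++)
        if (strlen(v->allowed[i]) == len && memcmp(v->allowed[i], name, len) == 0)
            return 0;
    v->violations++;              /* unknown element: count it, keep parsing */
    return 0;
}

static int validate_end(void *user, const char *name, size_t len)
{
    (void)user; (void)name; (void)len;
    return 0;
}

/* Returns -1 if the document is not well-formed,
   otherwise the number of elements not in the allowed list. */
int toy_validate(const char *doc, const char *const *allowed)
{
    toy_validator v = { allowed, 0 };
    xml_handler h = { &v, validate_start, validate_end };
    if (toy_parse(doc, &h) != 0) return -1;
    return v.violations;
}
```

The point of the design is that toy_parse knows nothing about validity.
An application that does not need validation passes its own handler, and
the validator is just one more consumer of the same events.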

Strong suggestion that -- unless this is a class assignment or you
believe you have a new approach that has significant advantages -- you
consider instead using one of the many parsers already available. (And I
assume that if the latter applied, you wouldn't have posted this vague a
question.) Reinventing wheels is sometimes useful; reimplementing
existing wheels is generally a waste of resources.

--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry
Sep 14 '06 #2
Joe Kesselman wrote:
> Strong suggestion that -- unless this is a class assignment or you
> believe you have a new approach that has significant advantages -- you
> consider instead using one of the many parsers already available.

Joe is right. If you really think that you should
write your own parser, be prepared to deal with all
the details of Unicode. For example, have you ever
heard of the BOM at the beginning of an XML file?
Will your parser be able to deal with UTF-7 as well
as UTF-32?

Use Expat or libxml:

http://expat.sourceforge.net/
http://xmlsoft.org/
Sep 14 '06 #3
Jürgen Kahrs wrote:
> Joe is right. If you really think that you should
> write your own parser, be prepared to deal with all
> the details of Unicode.

Well, one can start with an I/O library that handles Unicode; those
exist too. And sometimes it does make sense to have an implementation
that only supports a limited set of encodings, if you are certain that
those are all your application is ever going to see.

But there are lots of details in XML itself, especially if you want a
modern XML environment that supports namespaces, validation against
schemas, the standard XML APIs (DOM and/or SAX)...

A basic XML parser is a reasonable term project. A practical, efficient,
robust, validating XML parser is rather more. So unless this is a class
assignment (or equivalent), I'd definitely go back to whoever said "write
one" and ask them why they want you to do that.

--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry
Sep 14 '06 #4

Jürgen Kahrs wrote:
> Joe Kesselman wrote:
>> Strong suggestion that -- unless this is a class assignment or you
>> believe you have a new approach that has significant advantages -- you
>> consider instead using one of the many parsers already available.
>
> Joe is right. If you really think that you should
> write your own parser, be prepared to deal with all
> the details of Unicode. For example, have you ever
> heard of the BOM at the beginning of an XML file?
> Will your parser be able to deal with UTF-7 as well
> as UTF-32?

My parser needs to worry only about UTF-8, which, I think, is not as
difficult to deal with as the encodings you were asking about.

> Use Expat or libxml:
>
> http://expat.sourceforge.net/
> http://xmlsoft.org/
Sep 15 '06 #5
ma**************@gmail.com wrote:
>> Will your parser be able to deal with UTF-7 as well
>> as UTF-32?
>
> My parser needs to worry only about UTF-8, which, I think, is not as
> difficult to deal with as the encodings you were asking about.

Even UTF-8 data may contain a Byte-Order-Mark (BOM).
Be prepared to read up to 4 bytes per "character".
Byte order only matters for UTF-16 and UTF-32, though.
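
For the UTF-8-only case, a rough sketch of both steps in C: skipping the
BOM (the three bytes EF BB BF) and decoding one code point of 1 to 4
bytes. The function names utf8_bom_length and utf8_decode are invented
for this example, and error handling is reduced to "return 0".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 3 if buf starts with the UTF-8 BOM (EF BB BF), else 0. */
size_t utf8_bom_length(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return 3;
    return 0;
}

/* Decodes one code point from buf into *cp.
   Returns the number of bytes consumed (1..4),
   or 0 if the input is empty, truncated or malformed. */
size_t utf8_decode(const unsigned char *buf, size_t len, uint32_t *cp)
{
    size_t need;
    uint32_t value, min;

    assert(cp != NULL);
    assert(buf != NULL || len == 0);
    if (len == 0) return 0;

    unsigned char b0 = buf[0];
    if (b0 < 0x80) {                       /* plain ASCII */
        *cp = b0;
        return 1;
    } else if ((b0 & 0xE0) == 0xC0) {      /* 110xxxxx: 2-byte sequence */
        need = 2; value = b0 & 0x1F; min = 0x80;
    } else if ((b0 & 0xF0) == 0xE0) {      /* 1110xxxx: 3-byte sequence */
        need = 3; value = b0 & 0x0F; min = 0x800;
    } else if ((b0 & 0xF8) == 0xF0) {      /* 11110xxx: 4-byte sequence */
        need = 4; value = b0 & 0x07; min = 0x10000;
    } else {
        return 0;                          /* stray continuation or invalid lead byte */
    }

    if (len < need) return 0;              /* truncated sequence */
    for (size_t i = 1; i < need; i++) {
        if ((buf[i] & 0xC0) != 0x80) return 0;   /* not a 10xxxxxx byte */
        value = (value << 6) | (uint32_t)(buf[i] & 0x3F);
    }

    if (value < min) return 0;                          /* overlong encoding */
    if (value > 0x10FFFF) return 0;                     /* beyond Unicode range */
    if (value >= 0xD800 && value <= 0xDFFF) return 0;   /* UTF-16 surrogate */

    *cp = value;
    return need;
}
```

A reader loop would call utf8_bom_length once at the start of the file,
advance by that many bytes, and then call utf8_decode repeatedly. Note
that there is no byte swapping anywhere: the bytes of a UTF-8 sequence
always arrive in the same order.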

But (as Joe suggested), there are libraries that
do the conversion for you. Use iconv, which is
specified by POSIX (see "man iconv"); GNU libiconv
is one implementation of it.
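
A small sketch of what that looks like in C, using only the three POSIX
calls iconv_open, iconv and iconv_close. The wrapper name to_utf8 is
invented here, the whole input is converted in one call, and which
encoding names are accepted depends on the iconv implementation. On
some systems you also have to link with -liconv.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Converts in[0..in_len) from the encoding named `from` to UTF-8.
   Writes at most out_cap bytes to out.
   Returns the number of bytes written, or (size_t)-1 on any failure
   (unknown encoding, invalid input, or output buffer too small). */
size_t to_utf8(const char *from, const char *in, size_t in_len,
               char *out, size_t out_cap)
{
    assert(from != NULL && in != NULL && out != NULL);

    iconv_t cd = iconv_open("UTF-8", from);   /* arguments: target, source */
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *src = (char *)in;    /* iconv() takes char **, input is only read */
    char *dst = out;
    size_t src_left = in_len;
    size_t dst_left = out_cap;

    /* iconv() advances src/dst and decrements the two counters. */
    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    iconv_close(cd);

    if (rc == (size_t)-1 || src_left != 0)
        return (size_t)-1;
    return out_cap - dst_left;
}
```

With something like this in front of the parser, the parser itself only
ever has to understand UTF-8.
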
Sep 15 '06 #6

Jürgen Kahrs wrote:
> ma**************@gmail.com wrote:
>>> Will your parser be able to deal with UTF-7 as well
>>> as UTF-32?
>> My parser needs to worry only about UTF-8, which, I think, is not as
>> difficult to deal with as the encodings you were asking about.
>
> Even UTF-8 data may contain a Byte-Order-Mark (BOM).
> Be prepared to read up to 4 bytes per "character".
> Byte order only matters for UTF-16 and UTF-32, though.

I shall make sure to handle the BOM.

> But (as Joe suggested), there are libraries that
> do the conversion for you. Use iconv, which is
> specified by POSIX (see "man iconv"); GNU libiconv
> is one implementation of it.

I will surely look into libiconv. And I thank all of you who
have given suggestions.

Sep 18 '06 #7
