minimal xml parser?

P: n/a
Does anyone know of a minimal/mini/tiny/small XML parser
in C? I'm looking for something small that accepts a stream
or string, builds a C structure, and then returns an opaque
pointer to that structure. There should then be a function
to search that structure given the pointer, a tag, and an
optional attribute. I'm looking initially at only text data,
no numbers, though eventually there will be some binary
data (CDATA?).

Thanks.

Mike
Jul 20 '05 #1
16 Replies


P: n/a
"Mike" <mi***@mikee.ath.cx> wrote in message
news:10*************@corp.supernews.com...
I'm looking initially at only text data,
no numbers, though eventually there will be some binary
data (CDATA?).


XML does not support "binary data" as the term is commonly used. All data
within an XML instance must be valid per the specified character encoding.

You should read the relevant sections of the XML specification before
determining if XML is an appropriate representation for your requirements.

/kmc
Jul 20 '05 #2
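
A common workaround, not one proposed in this thread, is to carry binary data as Base64 text inside an ordinary element, since CDATA sections only suppress markup recognition and do not allow arbitrary bytes. A minimal sketch of the encoding side; the name `b64_encode` is hypothetical and not part of any library mentioned here:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: encode n bytes as Base64 text.
   Returns a malloc'd NUL-terminated string, or NULL if malloc fails. */
char *b64_encode(const unsigned char *src, size_t n)
{
    static const char tab[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t i, o = 0;
    char *out;

    assert(src != NULL || n == 0);
    out = malloc(4 * ((n + 2) / 3) + 1);
    if (!out) return NULL;

    /* full 3-byte groups become 4 output characters */
    for (i = 0; i + 2 < n; i += 3) {
        unsigned long v = ((unsigned long)src[i] << 16) |
                          ((unsigned long)src[i + 1] << 8) | src[i + 2];
        out[o++] = tab[(v >> 18) & 63];
        out[o++] = tab[(v >> 12) & 63];
        out[o++] = tab[(v >> 6) & 63];
        out[o++] = tab[v & 63];
    }
    /* 1 or 2 leftover bytes are padded with '=' */
    if (i < n) {
        unsigned long v = (unsigned long)src[i] << 16;
        if (i + 1 < n) v |= (unsigned long)src[i + 1] << 8;
        out[o++] = tab[(v >> 18) & 63];
        out[o++] = tab[(v >> 12) & 63];
        out[o++] = (i + 1 < n) ? tab[(v >> 6) & 63] : '=';
        out[o++] = '=';
    }
    out[o] = '\0';
    return out;
}
```

The result contains only letters, digits, '+', '/' and '=', so it can be stored as element text without any escaping. The caller frees the returned string.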

P: n/a
In article <k4********************@comcast.com>, Keith M. Corbett wrote:
"Mike" <mi***@mikee.ath.cx> wrote in message
news:10*************@corp.supernews.com...
I'm looking initially at only text data,
no numbers, though eventually there will be some binary
data (CDATA?).


XML does not support "binary data" as the term is commonly used. All data
within an XML instance must be valid per the specified character encoding.

You should read the relevant sections of the XML specification before
determining if XML is an appropriate representation for your requirements.

/kmc


XML has been chosen, I need to write the parser. Oh, and I do not have
to validate the XML, just parse it.

Mike
Jul 20 '05 #3

P: n/a
Mike <mi***@mikee.ath.cx> wrote:
XML has been chosen, I need to write the parser. Oh, and I do not have
to validate the XML, just parse it.


Expat (www.libexpat.org). Practically every language has some sort of
support for it, even Bash shell.

--
William Park <op**********@yahoo.ca>
Open Geometry Consulting, Toronto, Canada
Jul 20 '05 #4

P: n/a
In article <2u*************@uni-berlin.de>, William Park wrote:
Mike <mi***@mikee.ath.cx> wrote:
XML has been chosen, I need to write the parser. Oh, and I do not have
to validate the XML, just parse it.


Expat (www.libexpat.org). Practically every language has some sort of
support for it, even Bash shell.


Thanks for the expat suggestion. I have also read about libxml. I'd like to
find a few hundred lines of C code to do this.

Mike
Jul 20 '05 #5

P: n/a
In article <10*************@corp.supernews.com>,
Mike <mi***@mikee.ath.cx> wrote:
% In article <2u*************@uni-berlin.de>, William Park wrote:
% > Mike <mi***@mikee.ath.cx> wrote:
% >> XML has been chosen, I need to write the parser. Oh, and I do not have
% >> to validate the XML, just parse it.
% >
% > Expat (www.libexpat.org). Practically every language has some sort of
% > support for it, even Bash shell.
% >
%
% Thanks for the expat suggestion. I have also read about libxml. I'd like to
% find a few hundred lines of C code to do this.

I expect that you won't find a conforming XML parser which is only a few
hundred lines long. The smallest conforming parsers I know of are expat
and rxp, and they're in the thousands of lines. There's also tinyxml, which
is not a conforming parser, and which is still in the thousands of lines.

Although the temptation to write a minimal ``parser'' yourself may be
overwhelming, I think you're better off using an existing, conforming
parser. Otherwise, you will almost certainly end up with a system that
rejects valid XML files, and what's the good of that?

I think you're looking for something like rxp's API.
--

Patrick TJ McPhee
East York Canada
pt**@interlog.com
Jul 20 '05 #6

P: n/a
Mike wrote:
Does anyone know of a minimal/mini/tiny/small xml parser
in c? I'm looking for something small that accepts a stream
or string, builds a c structure, and then returns an opaque
pointer to that structure. There should then be a function
to search that structure given the pointer, tag, and an
optional attribute. I'm looking initially at only text data,
no numbers, though eventually there will be some binary
data (CDATA?).


You could try Mini-XML. See

http://www.easysw.com/~mike/mxml/

--
To reply by e-mail, please remove the extra dot
in the given address: m.collado -> mcollado

Jul 20 '05 #7

P: n/a
On Mon, 25 Oct 2004 23:50:10 -0000, Mike <mi***@mikee.ath.cx> wrote:
In article <2u*************@uni-berlin.de>, William Park wrote:
Mike <mi***@mikee.ath.cx> wrote:
XML has been chosen, I need to write the parser. Oh, and I do not have
to validate the XML, just parse it.


Expat (www.libexpat.org). Practically every language has some sort of
support for it, even Bash shell.


Thanks for the expat suggestion. I have also read about libxml. I'd like to
find a few hundred lines of C code to do this.

Mike

Do you really need the source code? I'm sure you could find a parser in library form, ready for you to
use.
Jul 20 '05 #8

P: n/a
Mike wrote:
Does anyone know of a minimal/mini/tiny/small xml parser
in c? I'm looking for something small that accepts a stream
or string, builds a c structure, and then returns an opaque
pointer to that structure. There should then be a function
to search that structure given the pointer, tag, and an
optional attribute. I'm looking initially at only text data,
no numbers, though eventually there will be some binary
data (CDATA?).


My Mini-XML library might be what you are looking for:

http://www.easysw.com/~mike/mxml/

--
__________________________________________________ ____________________
Michael Sweet, Easy Software Products mike at easysw dot com
Printing Software for UNIX http://www.easysw.com
Jul 20 '05 #9

P: n/a
Patrick TJ McPhee wrote:
...
I expect that you won't find a conforming XML parser which is only
a few hundred lines long. The smallest conforming parsers I know of are
expat and rxp, and they're in the thousands of lines. There's also
tinyxml, which is not a conforming parser, and which is still in the
thousands of lines.
...


It is a myth that conforming XML parsers have to be big; *validating*
parsers, perhaps, but not a simple non-validating parser which
accepts XML syntax and encoding.

Mini-XML started as 696 lines of C code (it has since grown to a
little over 2700 lines of code) and is a fully conformant XML
parser that provides everything except validation (and I'm thinking
how I could do that without bloating it...)

--
__________________________________________________ ____________________
Michael Sweet, Easy Software Products mike at easysw dot com
Printing Software for UNIX http://www.easysw.com
Jul 20 '05 #10

P: n/a
Mike <mi***@mikee.ath.cx> wrote:
In article <2u*************@uni-berlin.de>, William Park wrote:
Mike <mi***@mikee.ath.cx> wrote:
XML has been chosen, I need to write the parser. Oh, and I do not have
to validate the XML, just parse it.


Expat (www.libexpat.org). Practically every language has some sort of
support for it, even Bash shell.


Thanks for the expat suggestion. I have also read about libxml. I'd like to
find a few hundred lines of C code to do this.


Are you talking about actually doing the parsing (duplicating what Expat
does), or just calling API functions?

If the former, then I doubt there is one. If the latter, then Gawk, Python,
and Bash all have bindings to Expat.

--
William Park <op**********@yahoo.ca>
Open Geometry Consulting, Toronto, Canada
Jul 20 '05 #11

P: n/a
In article <41**************@easysw.com>,
Michael Sweet <mi**@easysw.com> wrote:

% Patrick TJ McPhee wrote:
% > ...
% > I expect that you won't find a conforming XML parser which is only
% > a few hundred lines long.

[...]

% It is a myth that conforming XML parsers have to be big; *validating*
% parsers, perhaps, but not a simple non-validating parser which
% accepts XML syntax and encoding.
%
% Mini-XML started as 696 lines of C code (it has since grown to a

Which is to say, more than a few hundred lines, and it seems like
it wasn't conforming at that.

% little over 2700 lines of code) and is a fully conformant XML

Which is to say, thousands of lines.
--

Patrick TJ McPhee
East York Canada
pt**@interlog.com
Jul 20 '05 #12

P: n/a
In article <2u*************@uni-berlin.de>, William Park wrote:
Mike <mi***@mikee.ath.cx> wrote:
In article <2u*************@uni-berlin.de>, William Park wrote:
> Mike <mi***@mikee.ath.cx> wrote:
>> XML has been chosen, I need to write the parser. Oh, and I do not have
>> to validate the XML, just parse it.
>
> Expat (www.libexpat.org). Practically every language has some sort of
> support for it, even Bash shell.
>


Thanks for the expat suggestion. I have also read about libxml. I'd like to
find a few hundred lines of C code to do this.


Are you talking about actually doing the parsing (duplicating what Expat
does), or just calling API functions?

If the former, then I doubt there is one. If the latter, then Gawk, Python,
and Bash all have bindings to Expat.


I'm talking about the actual parsing.
Jul 20 '05 #13

P: n/a
Patrick TJ McPhee wrote:
In article <41**************@easysw.com>,
Michael Sweet <mi**@easysw.com> wrote:

% Patrick TJ McPhee wrote:
% > ...
% > I expect that you won't find a conforming XML parser which is only
% > a few hundred lines long.

[...]

% It is a myth that conforming XML parsers have to be big; *validating*
% parsers, perhaps, but not a simple non-validating parser which
% accepts XML syntax and encoding.
%
% Mini-XML started as 696 lines of C code (it has since grown to a

Which is to say, more than a few hundred lines, and it seems like
it wasn't conforming at that.

Actually, it was; however, features were added to make it perform
better and support more use cases.

% little over 2700 lines of code) and is a fully conformant XML

Which is to say, thousands of lines.


But still a tiny fraction of the size of other XML parsers out
there...

--
__________________________________________________ ____________________
Michael Sweet, Easy Software Products mike at easysw dot com
Printing Software for UNIX http://www.easysw.com
Jul 20 '05 #14

P: n/a
In article <41**************@easysw.com>,
Michael Sweet <mi**@easysw.com> wrote:

[I wrote]

% > Which is to say, thousands of lines.
%
% But still a tiny fraction of the size of other XML parsers out
% there...

Except for the ones I cited in the post you seemed to contradict,
which are roughly the same size.

--

Patrick TJ McPhee
East York Canada
pt**@interlog.com
Jul 20 '05 #15

P: n/a

"Mike" <mi***@mikee.ath.cx> wrote in message
news:10*************@corp.supernews.com...
Does anyone know of a minimal/mini/tiny/small xml parser
in c? I'm looking for something small that accepts a stream
or string, builds a c structure, and then returns an opaque
pointer to that structure. There should then be a function
to search that structure given the pointer, tag, and an
optional attribute. I'm looking initially at only text data,
no numbers, though eventually there will be some binary
data (CDATA?).

oh well, this thread is new enough that I think I will add my comment.

if motivated, maybe my parser could be made to work in your case.
kalloc/kfree are for my allocator.
kralloc is a rotating allocator (allocates from a large circular buffer),
and thus does not need freeing.
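
For readers without the poster's library, a rotating allocator of this kind could look roughly like the sketch below. It is not the poster's kralloc: the name `ring_alloc`, the 64 KB capacity, and the choice to zero each block are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RING_SIZE 65536   /* assumed capacity; the real size is not given in the post */

/* The union only exists to give the byte array a sensible base alignment. */
static union {
    unsigned char bytes[RING_SIZE];
    long double align_ld;
    void *align_p;
} ring;
static size_t ring_pos = 0;

/* Hypothetical stand-in for kralloc: hand out zeroed scratch memory from a
   circular buffer. Nothing is ever freed; old blocks are silently reused
   once the buffer wraps, so anything that must survive has to be copied. */
void *ring_alloc(size_t n)
{
    void *p;

    assert(n > 0 && n <= RING_SIZE);
    n = (n + 15) & ~(size_t)15;      /* round up so offsets stay multiples of 16 */
    if (ring_pos + n > RING_SIZE)
        ring_pos = 0;                /* not enough room left: wrap to the start */
    p = ring.bytes + ring_pos;
    ring_pos += n;
    memset(p, 0, n);
    return p;
}
```

The trade-off is the usual one for scratch allocators: allocation is trivial and nothing needs freeing, but a pointer is only valid until enough later allocations have wrapped around and overwritten it.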

ObjType_New can be replaced by kalloc (or malloc if needed).

be warned that if you replace kalloc or such with malloc, it will be
necessary to zero the memory returned (malloc does not do this for you;
calloc does).
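
A minimal sketch of such a replacement layer on top of the standard library. The names kalloc, kfree and kstrdup come from the post, but the zero-fill behaviour and the signatures are inferred from how the code below uses them, not from the poster's actual library:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed semantics: kalloc returns zero-filled memory (calloc does this). */
void *kalloc(size_t n)
{
    return calloc(1, n);
}

/* Assumed semantics: kfree releases memory obtained from kalloc/kstrdup. */
void kfree(void *p)
{
    free(p);
}

/* Assumed semantics: kstrdup returns a heap copy of a NUL-terminated string. */
char *kstrdup(const char *s)
{
    size_t n;
    char *d;

    assert(s != NULL);
    n = strlen(s) + 1;
    d = malloc(n);
    if (d) memcpy(d, s, n);
    return d;
}
```

With shims like these, struct fields that the parser never assigns (for example `ns` on a tag without a prefix) start out as NULL, which the lookup and free routines further down rely on.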

I omitted, e.g., the printer here though...

part of the header:
----
#define TOKEN_NULL 0
#define TOKEN_SPECIAL 1
#define TOKEN_STRING 2
#define TOKEN_SYMBOL 3

typedef struct NetParse_Attr_s NetParse_Attr;
typedef struct NetParse_Node_s NetParse_Node;

struct NetParse_Attr_s {
NetParse_Attr *next;
char *ns;
char *key;
char *value;
};

struct NetParse_Node_s {
NetParse_Node *next;
char *ns;
char *key;
char *text;
NetParse_Attr *attr;
NetParse_Node *first;
};
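
Since the printer is left out of the post, here is a rough sketch of what one could look like for these two structs. It is an assumption about the tree layout (element nodes have `key` set, text nodes have `key` NULL and `text` set), and `print_nodes` is a hypothetical name, not the poster's function:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Struct layout copied from the header above. */
typedef struct NetParse_Attr_s NetParse_Attr;
typedef struct NetParse_Node_s NetParse_Node;
struct NetParse_Attr_s { NetParse_Attr *next; char *ns; char *key; char *value; };
struct NetParse_Node_s { NetParse_Node *next; char *ns; char *key; char *text;
                         NetParse_Attr *attr; NetParse_Node *first; };

/* Write s with the characters that are special in XML escaped. */
static void put_escaped(FILE *out, const char *s)
{
    for (; *s; s++) {
        switch (*s) {
        case '&': fputs("&amp;", out); break;
        case '<': fputs("&lt;", out); break;
        case '>': fputs("&gt;", out); break;
        case '"': fputs("&quot;", out); break;
        default:  fputc(*s, out); break;
        }
    }
}

/* Write "ns:key", or just "key" when there is no prefix. */
static void put_name(FILE *out, const char *ns, const char *key)
{
    if (ns) fprintf(out, "%s:", ns);
    fputs(key, out);
}

/* Hypothetical printer: writes node n and all of its following siblings. */
void print_nodes(FILE *out, const NetParse_Node *n)
{
    const NetParse_Attr *a;

    assert(out != NULL);
    for (; n; n = n->next) {
        if (!n->key) {                       /* text node */
            if (n->text) put_escaped(out, n->text);
            continue;
        }
        fputc('<', out);
        put_name(out, n->ns, n->key);
        for (a = n->attr; a; a = a->next) {
            fputc(' ', out);
            put_name(out, a->ns, a->key);
            fputs("=\"", out);
            put_escaped(out, a->value ? a->value : "");
            fputc('"', out);
        }
        if (n->first) {                      /* element with children */
            fputc('>', out);
            print_nodes(out, n->first);
            fputs("</", out);
            put_name(out, n->ns, n->key);
            fputc('>', out);
        } else {
            fputs("/>", out);                /* empty element */
        }
    }
}
```

Calling `print_nodes(stdout, root)` dumps a whole tree. Processing-instruction nodes, which the parser below stores with a key starting in '?', are not treated specially in this sketch.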
dump of part of my parser:
----
/*--
Cat pdlib;Parse;XML
Form
char *NetParse_XML_EatWhite(char *s);
Description
Skips over whitespace.
Status Internal
--*/
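char *NetParse_XML_EatWhite(char *s)
{
/* 'line' is a file-level line counter declared elsewhere in the parser */

/* the cast keeps bytes >= 0x80 (eg: UTF-8) from comparing as negative,
and thus as whitespace, where char is signed */
while(*s && ((unsigned char)*s<=' '))
{
if(*s=='\n')
{
line++;
*s=' '; /* blank the newline so a re-scan does not count it twice */
}
s++;
}

return(s);
}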

/*--
Cat pdlib;Parse;XML
Form
int NetParse_XML_SpecialP(char *s);
Description
Returns a nonzero value if *s is special.
Status Internal
--*/
int NetParse_XML_SpecialP(char *s)
{
switch(*s)
{
case '<':
return(1);
break;
case '>':
return(1);
break;
case '/':
return(1);
break;
case '=':
return(1);
break;
case '?':
return(1);
break;
case ':':
return(1);
break;
default:
return(0);
break;
}
return(0);
}
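
The same test can be written more compactly with strchr. This is only an equivalent sketch under a hypothetical name (`is_special`), not a change to the poster's function:

```c
#include <assert.h>
#include <string.h>

/* Returns nonzero if *s is one of the six special characters.
   The explicit '\0' check matters: strchr would otherwise match the
   terminator of the set string and report end-of-input as special. */
int is_special(const char *s)
{
    assert(s != NULL);
    return *s != '\0' && strchr("<>/=?:", *s) != NULL;
}
```

Adding or removing a special character then means editing one string literal rather than a case label.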

/*--
Cat pdlib;Parse;XML
Form
int NetParse_XML_ContSpecialP(char *s);
Description
Returns nonzero if this will get the parsers attention when reading as
text.
This includes '<' and '&'.
Status Internal
--*/
int NetParse_XML_ContSpecialP(char *s)
{
switch(*s)
{
case '<':
return(1);
break;
case '&':
return(1);
break;
default:
return(0);
break;
}
return(0);
}

/*--
Cat pdlib;Parse;XML
Form
char *NetParse_XML_Token(char *s, char *b, int *t);
Description
Reads a token from the XML stream.
This includes:
Individual symbols;
Globs of text/tags;
Strings.
b is the buffer.
t is an integer to hold the token type
TOKEN_NULL, a null terminator was reached;
TOKEN_SPECIAL, a special character.
TOKEN_STRING, a quoted string literal (escapes processed).
TOKEN_SYMBOL, an unquoted bit of text (eg: a tag).
Returns the next character after the token.
Status Internal
--*/
char *NetParse_XML_Token(char *s, char *b, int *t)
{
char *ob, *is, *t2;
char *buf;
int i;

is=s;
if(!b)b=kralloc(256);
ob=b;
*b=0;

if(t)*t=0;

buf=kralloc(16);

s=NetParse_XML_EatWhite(s);
if(!*s)
{
if(t)*t=TOKEN_NULL; /* t is optional everywhere else, so check it here too */
return(s);
}

if(NetParse_XML_SpecialP(s))
{
if(t)*t=TOKEN_SPECIAL;

*b++=*s++;
*b=0;
}else if((*s=='"') || (*s=='\'')) /* quoted string */
{
char q=*s++; /* opening quote; only the same character closes the string */

if(t)*t=TOKEN_STRING;
while(*s && (*s!=q))
{
if(*s=='&')
{
/* copy the entity name, staying inside the 16 byte buf */
s++;
t2=buf;
while(*s && (*s!=';') && ((t2-buf)<15))*t2++=*s++;
if(*s!=';')return(NULL);
*t2++=0;
s++;

if(buf[0]=='#')
{
if(buf[1]=='x')
{
/* hex character reference; this used to assign to t,
which is the int * token type argument */
t2=buf+2;
i=0;
while(*t2)
{
i<<=4;
if((*t2>='0') && (*t2<='9'))
i+=*t2-'0';
if((*t2>='A') && (*t2<='F'))
i+=*t2-'A'+10;
if((*t2>='a') && (*t2<='f'))
i+=*t2-'a'+10;
t2++;
}
*b++=i;
}else *b++=atoi(buf+1);
}
if(!strcmp(buf, "amp"))*b++='&';
if(!strcmp(buf, "lt"))*b++='<';
if(!strcmp(buf, "gt"))*b++='>';
if(!strcmp(buf, "quot"))*b++='"';
if(!strcmp(buf, "apos"))*b++='\'';
}else *b++=*s++;
}
if(!*s)
{
if(t)*t=TOKEN_NULL;
return(is);
}
*b++=0;
s++;
}else
{
if(t)*t=TOKEN_SYMBOL;

while(*s && (*s>' ') && !NetParse_XML_SpecialP(s) &&
((b-ob)<254))
*b++=*s++;
*b++=0;

if(!*s)
{
if(t)*t=TOKEN_NULL;
return(is);
}
}
return(s);
}

/*--
Cat pdlib;Parse;XML
Form
char *NetParse_XML_ParseText(char *s, char *b);
Description
Parse a glob of text from the stream.
Handles escapes and such.
Status Internal
--*/
char *NetParse_XML_ParseText(char *s, char *b)
{
char *ob, *t;
char buf[16];
int i, gws, rws;

if(!b)b=kralloc(4096);
ob=b;
*b=0;

s=NetParse_XML_EatWhite(s);
if(!*s)return(NULL);

gws=0;
rws=0;
while(1)
{
while(*s && !NetParse_XML_ContSpecialP(s))
{
if((*s=='\r') || (*s=='\n'))
{
s=NetParse_XML_EatWhite(s);
if(!rws)
{
*b++=' ';
gws++;
}
continue;
}
gws=0;
if(*s<=' ')rws++;
else rws=0;
*b++=*s++;
}
if(!*s)return(NULL);

if(*s=='&')
{
s++;
t=buf;
/* copy the entity name, staying inside the 16 byte buf */
while(*s && (*s!=';') && ((t-buf)<15))*t++=*s++;
if(*s!=';')return(NULL);
*t++=0;
s++;

if(buf[0]=='#')
{
if(buf[1]=='x')
{
t=buf+2;
i=0;
while(*t)
{
i<<=4;
if((*t>='0') && (*t<='9'))
i+=*t-'0';
if((*t>='A') && (*t<='F'))
i+=*t-'A'+10;
if((*t>='a') && (*t<='f'))
i+=*t-'a'+10;
t++;
}
gws=0;
if(i<=' ')rws++;
else rws=0;
*b++=i;
}else
{
i=atoi(buf+1);
gws=0;
if(i<=' ')rws++;
else rws=0;
*b++=i;
}
continue;
}
rws=0;
gws=0;

if(!strcmp(buf, "amp"))*b++='&';
if(!strcmp(buf, "lt"))*b++='<';
if(!strcmp(buf, "gt"))*b++='>';
if(!strcmp(buf, "apos"))*b++='\'';
if(!strcmp(buf, "quot"))*b++='"';
}else break;
}
b-=gws;
*b++=0;

return(s);
}
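
The entity handling appears twice, once in NetParse_XML_Token and once in NetParse_XML_ParseText. One way to share it is a small helper that maps the text between '&' and ';' to a character code. This is a sketch under a hypothetical name (`decode_entity`), using strtol rather than the hand-rolled hex loop:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* name is the text between '&' and ';', for example "amp", "#65" or "#x41".
   Returns the character code, or -1 if the entity is not recognised. */
long decode_entity(const char *name)
{
    const char *digits;
    char *end;
    long v;

    assert(name != NULL);
    if (name[0] == '#') {
        /* numeric character reference, decimal or hexadecimal */
        if (name[1] == 'x') {
            digits = name + 2;
            v = strtol(digits, &end, 16);
        } else {
            digits = name + 1;
            v = strtol(digits, &end, 10);
        }
        if (end == digits || *end != '\0' || v < 0)
            return -1;               /* no digits, trailing junk, or negative */
        return v;
    }
    /* the five entities predefined by XML */
    if (!strcmp(name, "amp"))  return '&';
    if (!strcmp(name, "lt"))   return '<';
    if (!strcmp(name, "gt"))   return '>';
    if (!strcmp(name, "quot")) return '"';
    if (!strcmp(name, "apos")) return '\'';
    return -1;
}
```

Note that the parser above stores the result in a single char, so codes above 255 would be truncated there; handling them properly would need an encoding step (for example to UTF-8) that this sketch does not attempt.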

/*--
Cat pdlib;Parse;XML
Form
NetParse_Attr *NetParse_XML_ParseOpts(char **s);
Description
Parse the list of attributes within a tag.
Status Internal
--*/
NetParse_Attr *NetParse_XML_ParseOpts(char **s)
{
// char ns[32];
// char var[32];
// char eq[16];
// char val[256];
char *is, *ns, *var, *eq, *val;
int ty;
NetParse_Attr *lst, *end, *tmp;

ns=kralloc(256);
var=kralloc(256);
eq=kralloc(256);
val=kralloc(4096);

lst=NULL;
end=NULL;

is=*s;
while(1)
{
NetParse_XML_Token(*s, var, &ty);
if(ty==TOKEN_NULL)
{
kprint("m1\n");
*s=NULL;
return(NULL);
}

if((var[0]=='>') && (ty==TOKEN_SPECIAL))
break;
if((var[0]=='/') && (ty==TOKEN_SPECIAL))
break;
if((var[0]=='?') && (ty==TOKEN_SPECIAL))
break;
if(ty==TOKEN_NULL)
{
kprint("m2\n");
*s=NULL;
return(NULL);
}
if(ty!=TOKEN_SYMBOL)
{
kprint("parse error (inv attribute).\n");
return(NULL);
}

*s=NetParse_XML_Token(*s, var, &ty);
if(ty==TOKEN_NULL)
{
kprint("m3\n");
*s=NULL;
return(NULL);
}
*s=NetParse_XML_Token(*s, eq, &ty);
if(ty==TOKEN_NULL)
{
kprint("m4\n");
*s=NULL;
return(NULL);
}

if((ty==TOKEN_SPECIAL) && (eq[0]==':'))
{
strcpy(ns, var);

*s=NetParse_XML_Token(*s, var, &ty);
if(ty==TOKEN_NULL)
{
kprint("m41\n");
*s=NULL;
return(NULL);
}
*s=NetParse_XML_Token(*s, eq, &ty);
if(ty==TOKEN_NULL)
{
kprint("m42\n");
*s=NULL;
return(NULL);
}
}else ns[0]=0;

if((ty!=TOKEN_SPECIAL) || (eq[0]!='='))
{
kprint("parse error (attr equal).\n");
return(NULL);
}

*s=NetParse_XML_Token(*s, val, &ty);
if(ty==TOKEN_NULL)
{
kprint("m5\n");
*s=NULL;
return(NULL);
}

if(ty!=TOKEN_STRING)
{
kprint("parse error (inv attribute arg).\n");
return(NULL);
}

// t=CONS(SYM(var), CONS(STRING(val), MISC_EOL));
// x=CONS(t, x);
// tmp=kalloc(sizeof(NetParse_Attr));
tmp=NetParse_NewAttr();
tmp->next=NULL;
if(ns[0])tmp->ns=kstrdup(ns);
tmp->key=kstrdup(var);
tmp->value=kstrdup(val);

if(end)
{
end->next=tmp;
end=tmp;
}else
{
lst=tmp;
end=tmp;
}
}

return(lst);
}

/*--
Cat pdlib;Parse;XML
Form
NetParse_Node *NetParse_XML_ParseExpr(char **s);
Description
Parses an XML expression.
s is updated to reflect the change.

NULL is returned on parse error or end-of-stream.
s is not updated for end-of-stream conditions, which can be used to
separate them from a parse error.
--*/
NetParse_Node *NetParse_XML_ParseExpr(char **s)
{
// char buf[256], buf2[16];
// char key[32], ns[32];
char *buf, *buf2, *key, *ns;

int ty, i;
char *s2, *s3, *s4, *is;

// elem kv, opts, t, x;
NetParse_Node *tmp, *t, *end;

is=*s;
*s=NetParse_XML_EatWhite(*s);
if(!*(*s))return(NULL);

buf=kalloc(256);
buf2=kalloc(256);
key=kalloc(256);
ns=kalloc(256);

// strncpy(buf, *s, 5);
// buf[5]=0;
// kprint("parse: %s\n", buf);

NetParse_XML_Token(*s, buf, &ty);
if(ty==TOKEN_NULL)
{
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
*s=is;
return(NULL);
}
if((buf[0]=='<') && (ty==TOKEN_SPECIAL))
{
*s=NetParse_XML_Token(*s, buf, &ty);
if(ty==TOKEN_NULL)
{
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
*s=is;
return(NULL);
}
if(*s[0]=='?')
*s=*s+1;
if(*s[0]=='!')
{
/* *s still points at the '!', so it has to be part of the match */
if(!strncmp(*s, "![CDATA[", 8))
{
*s=*s+8;
s2=kalloc(65536);
s3=s2;
s4=*s;
while(*s4)
{
if(!strncmp(s4, "]]>", 3))
{
s4+=3;
break;
}
if(!strncmp(s4, "]]&gt;", 6))
{
s4+=6;

*s3++=']';
*s3++=']';
*s3++='>';
continue;
}
*s3++=*s4++;
}
if(!*s4)
{
kfree(s2);

kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
*s=is;
return(NULL);
}

*s3++=0;
*s=s4;

tmp=NetParse_NewNode();
tmp->next=NULL;
tmp->key=NULL;
tmp->text=kstrdup(s2);
tmp->attr=NULL;
tmp->first=NULL;

kfree(s2);

kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
return(tmp);
}

s2=*s;
i=1;
while(*s2 && i)
{
if(*s2=='<')i++;
if(*s2=='>')i--;
if(*s2=='[')i++;
if(*s2==']')i--;
s2++;
}

kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
*s=s2;
return(NetParse_XML_ParseExpr(s));
}

*s=NetParse_XML_Token(*s, key, &ty);
if(ty==TOKEN_NULL)
{
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
*s=is;
return(NULL);
}
if(ty!=TOKEN_SYMBOL)
{
kprint("parse error (inv tag).\n");
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
return(NULL);
}

if(**s==':')
{
*s=*s+1;
strcpy(ns, key);
*s=NetParse_XML_Token(*s, key, &ty);
}else ns[0]=0;

if((**s>' ') && (**s!='>') && (**s!='/'))
{
kprint("parse error (inv char after tag).\n");
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
return(NULL);
}
// kv=SYM(key);
// opts=NetParse_XML_ParseOpts(s);
// if(opts==MISC_UNDEFINED)return(t);

// tmp=kalloc(sizeof(NetParse_Node));
tmp=NetParse_NewNode();
tmp->next=NULL;
if(ns[0])tmp->ns=kstrdup(ns);
tmp->key=kstrdup(key);
s3=*s;
tmp->attr=NetParse_XML_ParseOpts(s);
if(!*s)
{
kprint("attr truncated\n");
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
*s=is;
return(NULL);
}
tmp->first=NULL;

*s=NetParse_XML_Token(*s, buf, &ty);
if(ty==TOKEN_NULL)
{
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
*s=is;
return(NULL);
}
if((buf[0]=='/') && (ty==TOKEN_SPECIAL))
{
*s=NetParse_XML_Token(*s, buf, &ty);
if(ty==TOKEN_NULL)
{
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
*s=is;
return(NULL);
}
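/* release the scratch buffers, as every other exit path does */
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
return(tmp);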
}
if((buf[0]=='?') && (ty==TOKEN_SPECIAL))
{
*s=NetParse_XML_Token(*s, buf, &ty);
if(ty==TOKEN_NULL)
{
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
*s=is;
return(NULL);
}
// x=CONS(kv, CONS(opts, MISC_EOL));
// x=CONS(SYM("?"), x);
strcpy(buf, "?");
strcat(buf, tmp->key);
kfree(tmp->key);
tmp->key=kstrdup(buf);

kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
return(tmp);
}
if(buf[0]!='>')
{
kprint("parse error (expected close '>').\n");
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
return(NULL);
}

end=NULL;
// x=MISC_EOL;
while(1)
{
s2=*s;
s2=NetParse_XML_Token(s2, buf, &ty);
if(ty==TOKEN_NULL)
{
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
*s=is;
return(NULL);
}
s2=NetParse_XML_Token(s2, buf2, &ty);
if(ty==TOKEN_NULL)
{
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
*s=is;
return(NULL);
}

if(buf[0]=='<' && buf2[0]=='/')
{
s2=NetParse_XML_Token(s2, buf, &ty);
if(ty==TOKEN_NULL)
{
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
*s=is;
return(NULL);
}
s2=NetParse_XML_Token(s2, buf, &ty);
if(ty==TOKEN_NULL)
{
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
*s=is;
return(NULL);
}
*s=s2;
break;
}
s3=*s;
t=NetParse_XML_ParseExpr(s);
if(*s==s3)
{
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
*s=is;
return(NULL);
}

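if(!t)
{
/* release the scratch buffers before giving up */
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
return(NULL);
}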
// x=CONS(t, x);
if(end)
{
end->next=t;
end=t;
}else
{
tmp->first=t;
end=t;
}
}
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
return(tmp);
}

s2=kalloc(65536);
*s=NetParse_XML_ParseText(*s, s2);
if(!*s)
{
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
*s=is;
return(NULL);
}

// tmp=kalloc(sizeof(NetParse_Node));
tmp=NetParse_NewNode();
tmp->next=NULL;
tmp->key=NULL;
tmp->text=kstrdup(s2);
tmp->attr=NULL;
tmp->first=NULL;

kfree(s2);

kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
return(tmp);
}

/*--
Cat pdlib;Parse;XML
Form
NetParse_Node *NetParse_XML_LoadFile(char *name);
Description
loads XML from a file.
returns NULL on failure.
--*/
NetParse_Node *NetParse_XML_LoadFile(char *name)
{
VFILE *fd;
char *buf, *s;
NetParse_Node *n;

fd=vffopen(name, "rb");
if(!fd)return(NULL);

buf=vf_bufferin(fd);
if(!buf)return(NULL);

s=buf;
while(*s)
{
n=NetParse_XML_ParseExpr(&s);
if(!n)break;
/* text nodes have a NULL key, so check before dereferencing */
if(n->key && (n->key[0]=='?'))continue;
return(n);
}
return(NULL);
}
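
NetParse_XML_LoadFile depends on the poster's own VFILE layer (vffopen, vf_bufferin), which is not shown. Assuming vf_bufferin simply returns the whole file as a NUL-terminated buffer, a plain stdio substitute could look like this; `read_all` is a hypothetical name:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Read everything remaining on fp into a malloc'd, NUL-terminated buffer.
   Returns NULL on allocation failure or read error. The caller frees it. */
char *read_all(FILE *fp)
{
    size_t cap = 4096, len = 0, n;
    char *buf, *tmp;

    assert(fp != NULL);
    buf = malloc(cap);
    if (!buf) return NULL;

    /* always leave one byte spare for the terminator */
    while ((n = fread(buf + len, 1, cap - len - 1, fp)) > 0) {
        len += n;
        if (len + 1 >= cap) {            /* buffer full: double it */
            tmp = realloc(buf, cap * 2);
            if (!tmp) { free(buf); return NULL; }
            buf = tmp;
            cap *= 2;
        }
    }
    if (ferror(fp)) { free(buf); return NULL; }
    buf[len] = '\0';
    return buf;
}
```

Because the parser treats a zero byte as end of input, a file containing embedded NUL bytes would be cut short at the first one.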

and part of the crap for dealing with parse trees:
----
/*--
Cat pdlib;Parse
Form
int NetParse_Init();
Description
Init function for NetParse, called implicitly by node/attr creation.
--*/
int NetParse_Init()
{
static int init=0;

if(init)return(1);
init=1;

ObjType_NewType("netparse_attr_t", "*struct;string;string;");
ObjType_NewType("netparse_node_t",
"*struct;string;string;*struct;*struct;");
return(0);
}

/*--
Cat pdlib;Parse
Form
NetParse_Attr *NetParse_NewAttr();
Description
Creates a new attribute.
--*/
NetParse_Attr *NetParse_NewAttr()
{
NetParse_Attr *tmp;

NetParse_Init();

// tmp=kalloc(sizeof(NetParse_Attr));
tmp=ObjType_New("netparse_attr_t", sizeof(NetParse_Attr));
tmp->next=NULL;
tmp->key=NULL;
tmp->value=NULL;

return(tmp);
}

/*--
Cat pdlib;Parse
Form
NetParse_Attr *NetParse_AddAttr(NetParse_Node *node, char *key, char
*value);
Description
Adds an attribute to a node (or sets the attribute if present).
--*/
NetParse_Attr *NetParse_AddAttr(NetParse_Node *node, char *key, char *value)
{
NetParse_Attr *tmp, *cur;

cur=node->attr;
while(cur)
{
if(!strcmp(cur->key, key))
{
if(cur->value)kfree(cur->value);
cur->value=kstrdup(value);
return(cur);
}
cur=cur->next;
}

// tmp=kalloc(sizeof(NetParse_Attr));
tmp=NetParse_NewAttr();
tmp->next=NULL;
tmp->key=kstrdup(key);
tmp->value=kstrdup(value);

if(!node->attr)
{
node->attr=tmp;
return(tmp);
}
cur=node->attr;
while(cur->next)cur=cur->next;
cur->next=tmp;

return(tmp);
}

/*--
Cat pdlib;Parse
Form
NetParse_Attr *NetParse_AddAttrList(NetParse_Attr *lst, char *key, char
*value);
Description
Adds an attribute to a list of attributes (or assigns the attribute \
if already present).
Returns the start of the list, or the new attribute if lst is NULL.
--*/
NetParse_Attr *NetParse_AddAttrList(NetParse_Attr *lst, char *key, char
*value)
{
NetParse_Attr *tmp, *cur;

cur=lst;
while(cur)
{
if(!strcmp(cur->key, key))
{
if(cur->value)kfree(cur->value);
cur->value=kstrdup(value);
return(lst);
}
cur=cur->next;
}

// tmp=kalloc(sizeof(NetParse_Attr));
tmp=NetParse_NewAttr();
tmp->next=NULL;
tmp->key=kstrdup(key);
tmp->value=kstrdup(value);

if(!lst)return(tmp);

cur=lst;
while(cur->next)cur=cur->next;
cur->next=tmp;

return(lst);
}

/*--
Cat pdlib;Parse
Form
char *NetParse_GetNodeAttr(NetParse_Node *node, char *key);
Description
Gets an attribute associated with a node.
Returns NULL if not found.
--*/
char *NetParse_GetNodeAttr(NetParse_Node *node, char *key)
{
NetParse_Attr *cur;

cur=node->attr;
while(cur)
{
if(!strcmp(cur->key, key))return(cur->value);
cur=cur->next;
}
return(NULL);
}

/*--
Cat pdlib;Parse
Form
int NetParse_GetNodeAttrIsP(NetParse_Node *node, char *key, char *value);
Description
Check if a given node has a certain attribute as a certain value.
--*/
int NetParse_GetNodeAttrIsP(NetParse_Node *node, char *key, char *value)
{
NetParse_Attr *cur;

cur=node->attr;
while(cur)
{
if(!strcmp(cur->key, key))
{
if(!strcmp(cur->value, value))
return(1);
else return(0);
}
cur=cur->next;
}
return(0);
}

/*--
Cat pdlib;Parse
Form
char *NetParse_GetAttrList(NetParse_Attr *lst, char *key);
Description
Gets an attribute in a list.
Returns NULL if not found.
--*/
char *NetParse_GetAttrList(NetParse_Attr *lst, char *key)
{
NetParse_Attr *cur;

cur=lst;
while(cur)
{
if(!strcmp(cur->key, key))return(cur->value);
cur=cur->next;
}
return(NULL);
}

/*--
Cat pdlib;Parse
Form
NetParse_Node *NetParse_NewNode();
Description
Creates a new node.
--*/
NetParse_Node *NetParse_NewNode()
{
NetParse_Node *tmp;

NetParse_Init();

// tmp=kalloc(sizeof(NetParse_Node));
tmp=ObjType_New("netparse_node_t", sizeof(NetParse_Node));
tmp->next=NULL;
tmp->key=NULL;
tmp->text=NULL;
tmp->attr=NULL;
tmp->first=NULL;

return(tmp);
}

/*--
Cat pdlib;Parse
Form
NetParse_Node *NetParse_AddNodeEnd(NetParse_Node *first, NetParse_Node
*node);
Description
Adds a new node at the end of a list of nodes.
--*/
NetParse_Node *NetParse_AddNodeEnd(NetParse_Node *first, NetParse_Node
*node)
{
NetParse_Node *cur;

if(!first)return(node);

cur=first;
while(cur->next)cur=cur->next;
cur->next=node;

return(first);
}

/*--
Cat pdlib;Parse
Form
int NetParse_AddChildNode(NetParse_Node *parent, NetParse_Node *node);
Description
Add a new child node to a parent.
--*/
int NetParse_AddChildNode(NetParse_Node *parent,
NetParse_Node *node)
{
NetParse_Node *cur;

if(!parent->first)
{
parent->first=node;
return(0);
}

cur=parent->first;
while(cur->next)cur=cur->next;
cur->next=node;

return(0);
}

/*--
Cat pdlib;Parse
Form
int NetParse_FreeAttr(NetParse_Attr *attr);
Description
Frees an attribute.
Also frees any following attributes.
--*/
int NetParse_FreeAttr(NetParse_Attr *attr)
{
if(attr->next)NetParse_FreeAttr(attr->next);
if(attr->ns)kfree(attr->ns); /* the parser kstrdup's ns as well */
if(attr->key)kfree(attr->key);
if(attr->value)kfree(attr->value);
kfree(attr);

return(0);
}

/*--
Cat pdlib;Parse
Form
int NetParse_FreeNode(NetParse_Node *node);
Description
Frees a node and any associated attributes.
Also frees any child nodes.
--*/
int NetParse_FreeNode(NetParse_Node *node)
{
NetParse_Node *cur, *next;

if(node->ns)kfree(node->ns);
if(node->key)kfree(node->key);
if(node->text)kfree(node->text);
if(node->attr)NetParse_FreeAttr(node->attr);

/* free each child in full; the recursive call also releases that
child's own children, so nothing further down the tree is leaked */
cur=node->first;
while(cur)
{
next=cur->next;
NetParse_FreeNode(cur);
cur=next;
}
kfree(node);

return(0);
}

/*--
Cat pdlib;Parse
Form
NetParse_Attr *NetParse_CopyAttr(NetParse_Attr *attr);
Description
Copies an attribute along with any following attributes.
--*/
NetParse_Attr *NetParse_CopyAttr(NetParse_Attr *attr)
{
NetParse_Attr *tmp;

tmp=NetParse_NewAttr();

if(attr->next)
tmp->next=NetParse_CopyAttr(attr->next);
if(attr->ns)
tmp->ns=kstrdup(attr->ns);
if(attr->key)
tmp->key=kstrdup(attr->key);
if(attr->value)
tmp->value=kstrdup(attr->value);

return(tmp);
}

/*--
Cat pdlib;Parse
Form
NetParse_Node *NetParse_CopyNode(NetParse_Node *node);
Description
Makes a copy of a node tree, copies any attributes or children.
--*/
NetParse_Node *NetParse_CopyNode(NetParse_Node *node)
{
NetParse_Node *cur;
NetParse_Node *tmp, *lst, *end, *t2;

tmp=NetParse_NewNode();
if(node->ns)
tmp->ns=kstrdup(node->ns);
if(node->key)
tmp->key=kstrdup(node->key);
if(node->text)
tmp->text=kstrdup(node->text);
if(node->attr)
tmp->attr=NetParse_CopyAttr(node->attr);

lst=NULL;
end=NULL;
cur=node->first;
while(cur)
{
t2=NetParse_CopyNode(cur);
if(end)end->next=t2;
end=t2;

if(!lst)lst=end;
cur=cur->next;
}
tmp->first=lst;

return(tmp);
}

/*--
Cat pdlib;Parse
Form
NetParse_Node *NetParse_FindKey(NetParse_Node *first, char *key);
Description
Finds a node in a list with a given key.
Returns NULL if not found.
--*/
NetParse_Node *NetParse_FindKey(NetParse_Node *first, char *key)
{
NetParse_Node *cur;

cur=first;
while(cur)
{
if(cur->key)
if(!strcmp(cur->key, key))
return(cur);
cur=cur->next;
}
return(NULL);
}
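
NetParse_FindKey only looks along one sibling list. The original question asked for a search by tag and optional attribute, which could be layered on top roughly as follows. `find_deep` and `attr_value` are hypothetical names, and the struct layout is copied from the header earlier in this post:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct NetParse_Attr_s NetParse_Attr;
typedef struct NetParse_Node_s NetParse_Node;
struct NetParse_Attr_s { NetParse_Attr *next; char *ns; char *key; char *value; };
struct NetParse_Node_s { NetParse_Node *next; char *ns; char *key; char *text;
                         NetParse_Attr *attr; NetParse_Node *first; };

/* Value of attribute 'key' on node n, or NULL if it is absent. */
static const char *attr_value(const NetParse_Node *n, const char *key)
{
    const NetParse_Attr *a;

    for (a = n->attr; a; a = a->next)
        if (a->key && !strcmp(a->key, key))
            return a->value;
    return NULL;
}

/* Depth-first search of 'first', its siblings and all descendants for the
   first element named 'key'. If 'attr' is non-NULL the element must also
   carry that attribute; if 'value' is non-NULL as well, the attribute
   must have exactly that value. Returns NULL if nothing matches. */
NetParse_Node *find_deep(NetParse_Node *first, const char *key,
                         const char *attr, const char *value)
{
    NetParse_Node *cur, *hit;
    const char *v;

    assert(key != NULL);
    for (cur = first; cur; cur = cur->next) {
        if (cur->key && !strcmp(cur->key, key)) {
            if (!attr) return cur;
            v = attr_value(cur, attr);
            if (v && (!value || !strcmp(v, value))) return cur;
        }
        hit = find_deep(cur->first, key, attr, value);   /* descend */
        if (hit) return hit;
    }
    return NULL;
}
```

For example, `find_deep(root, "item", "id", "2")` returns the first `item` element anywhere under `root` whose `id` attribute is "2", and `find_deep(root, "item", NULL, NULL)` ignores attributes entirely. Namespace prefixes are not compared in this sketch.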

//abstract interface funcs

/*--
Cat pdlib;Parse
Form
NetParse_Node *NetParse_NewNodeKey(char *ns, char *key);
Description
Creates a new node with a given namespace prefix and key.
ns may be NULL in most cases (the tag does not have a namespace \
prefix).
--*/
NetParse_Node *NetParse_NewNodeKey(char *ns, char *key)
{
NetParse_Node *tmp;

tmp=NetParse_NewNode();
if(ns)tmp->ns=kstrdup(ns); /* ns was accepted but never stored */
tmp->key=kstrdup(key);

return(tmp);
}

/*--
Cat pdlib;Parse
Form
NetParse_Node *NetParse_NewNodeText(char *text);
Description
Creates a new text node with the contents given.
--*/
NetParse_Node *NetParse_NewNodeText(char *text)
{
NetParse_Node *tmp;

tmp=NetParse_NewNode();
tmp->text=kstrdup(text);

return(tmp);
}

/*--
Cat pdlib;Parse
Form
char *NetParse_GetNodeNS(NetParse_Node *node);
char *NetParse_GetNodeKey(NetParse_Node *node);
char *NetParse_GetNodeText(NetParse_Node *node);
NetParse_Node *NetParse_GetNodeFirst(NetParse_Node *node);
NetParse_Node *NetParse_GetNodeNext(NetParse_Node *node);
Description
Get a property of a node, each will return NULL in the case that \
the given property is not present.
--*/
char *NetParse_GetNodeNS(NetParse_Node *node)
{
return(node->ns);
}

char *NetParse_GetNodeKey(NetParse_Node *node)
{
return(node->key);
}

char *NetParse_GetNodeText(NetParse_Node *node)
{
return(node->text);
}

NetParse_Node *NetParse_GetNodeFirst(NetParse_Node *node)
{
return(node->first);
}

NetParse_Node *NetParse_GetNodeNext(NetParse_Node *node)
{
return(node->next);
}

/*--
Cat pdlib;Parse
Form
int NetParse_SetNodeNS(NetParse_Node *node, char *value);
int NetParse_SetNodeKey(NetParse_Node *node, char *value);
int NetParse_SetNodeText(NetParse_Node *node, char *value);
int NetParse_SetNodeFirst(NetParse_Node *node, NetParse_Node *node2);
int NetParse_SetNodeNext(NetParse_Node *node, NetParse_Node *node2);
Description
Set a property of a node, the return value will be 0 if no errors \
occur.
--*/
int NetParse_SetNodeNS(NetParse_Node *node, char *value)
{
node->ns=kstrdup(value);
return(0);
}

int NetParse_SetNodeKey(NetParse_Node *node, char *value)
{
node->key=kstrdup(value);
return(0);
}

int NetParse_SetNodeText(NetParse_Node *node, char *value)
{
node->text=kstrdup(value);
return(0);
}

int NetParse_SetNodeFirst(NetParse_Node *node, NetParse_Node *node2)
{
node->first=node2;
return(0);
}

int NetParse_SetNodeNext(NetParse_Node *node, NetParse_Node *node2)
{
node->next=node2;
return(0);
}

Jul 20 '05 #16

P: n/a
In article <10*************@corp.supernews.com>, Mike wrote:
In article <2u*************@uni-berlin.de>, William Park wrote:
Mike <mi***@mikee.ath.cx> wrote:
XML has been chosen, I need to write the parser. Oh, and I do not have
to validate the XML, just parse it.


Expat (www.libexpat.org). Practically every language has some sort of
support for it, even Bash shell.


Thanks for the expat suggestion. I have also read about libxml. I'd like to
find a few hundred lines of C code to do this.

Mike


Thanks for all the replies. I have chosen and am using mxml.

Mike
Jul 20 '05 #17
