minimal xml parser?

Does anyone know of a minimal/mini/tiny/small XML parser
in C? I'm looking for something small that accepts a stream
or string, builds a C structure, and then returns an opaque
pointer to that structure. There should then be a function
to search that structure given the pointer, tag, and an
optional attribute. I'm looking initially at only text data,
no numbers, though eventually there will be some binary
data (CDATA?).

Thanks.

Mike
Jul 20 '05 #1
"Mike" <mi***@mikee.ath.cx> wrote in message
news:10*************@corp.supernews.com...
> I'm looking initially at only text data,
> no numbers, though eventually there will be some binary
> data (CDATA?).


XML does not support "binary data" as the term is commonly used. All data
within an XML instance must be valid per the specified character encoding.
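To illustrate the point: the XML 1.0 `Char` production allows only tab, line feed, carriage return and code points from #x20 upward (minus the surrogates and #xFFFE/#xFFFF), and a CDATA section does not lift that restriction. The sketch below checks a byte buffer under the simplifying assumption that it is plain ASCII, so any byte at or above 0x80 is rejected because its meaning depends on the declared encoding. `ascii_is_xml_char_data` is a hypothetical name, not part of any library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if every byte in buf[0..len) is an
   ASCII character permitted by the XML 1.0 Char production, else 0.
   Bytes >= 0x80 are rejected because whether they are legal depends
   on the document's declared encoding. */
int ascii_is_xml_char_data(const unsigned char *buf, size_t len)
{
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++) {
        unsigned char c = buf[i];

        if (c == 0x09 || c == 0x0A || c == 0x0D)
            continue;               /* tab, line feed, carriage return */
        if (c >= 0x20 && c < 0x80)
            continue;               /* printable ASCII (and DEL) */
        return 0;                   /* NUL, other C0 controls, non-ASCII */
    }
    return 1;
}
```

Raw binary data, which typically contains NUL and other control bytes, therefore has to be encoded (for example as base64 or hex) before it can appear in element content or an attribute value.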

You should read the relevant sections of the XML specification before
determining if XML is an appropriate representation for your requirements.

/kmc
Jul 20 '05 #2
In article <k4********************@comcast.com>, Keith M. Corbett wrote:
> "Mike" <mi***@mikee.ath.cx> wrote in message
> news:10*************@corp.supernews.com...
>> I'm looking initially at only text data,
>> no numbers, though eventually there will be some binary
>> data (CDATA?).
>
> XML does not support "binary data" as the term is commonly used. All data
> within an XML instance must be valid per the specified character encoding.
>
> You should read the relevant sections of the XML specification before
> determining if XML is an appropriate representation for your requirements.
>
> /kmc


XML has been chosen; I need to write the parser. Oh, and I do not have
to validate the XML, just parse it.

Mike
Jul 20 '05 #3
Mike <mi***@mikee.ath.cx> wrote:
> XML has been chosen; I need to write the parser. Oh, and I do not have
> to validate the XML, just parse it.


Expat (www.libexpat.org). Practically every language has some sort of
support for it, even Bash shell.

--
William Park <op**********@yahoo.ca>
Open Geometry Consulting, Toronto, Canada
Jul 20 '05 #4
In article <2u*************@uni-berlin.de>, William Park wrote:
> Mike <mi***@mikee.ath.cx> wrote:
>> XML has been chosen; I need to write the parser. Oh, and I do not have
>> to validate the XML, just parse it.
>
> Expat (www.libexpat.org). Practically every language has some sort of
> support for it, even Bash shell.


Thanks for the expat suggestion. I have also read about libxml. I'd like to
find a few hundred lines of C code to do this.

Mike
Jul 20 '05 #5
In article <10*************@corp.supernews.com>,
Mike <mi***@mikee.ath.cx> wrote:
% In article <2u*************@uni-berlin.de>, William Park wrote:
% > Mike <mi***@mikee.ath.cx> wrote:
% >> XML has been chosen, I need to write the parser. Oh, and I do not have
% >> to validate the XML, just parse it.
% >
% > Expat (www.libexpat.org). Practically every language has some sort of
% > support for it, even Bash shell.
% >
%
% Thanks for the expat suggestion. I have also read about libxml. I'd like to
% find a few hundred lines of C code to do this.

I expect that you won't find a conforming XML parser which is only a few
hundred lines long. The smallest conforming parsers I know of are expat
and rxp, and they're in the thousands of lines. There's also tinyxml, which
is not a conforming parser, and which is still in the thousands of lines.

Although the temptation to write a minimal ``parser'' yourself may be
overwhelming, I think you're better off using an existing, conforming
parser. Otherwise, you will almost certainly end up with a system that
rejects valid XML files, and what's the good of that?

I think you're looking for something like rxp's API.
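One concrete way a hand-rolled parser ends up rejecting or misreading well-formed XML: '>' may legally appear inside a quoted attribute value, so `<a title="x > y">` is fine, yet a scanner that simply stops at the first '>' cuts the tag short. A minimal quote-aware scan, as a sketch (`find_tag_end` is a hypothetical helper name):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: given p pointing just past the '<' of a start tag,
   return a pointer to the '>' that closes the tag, skipping any '>' that
   appears inside a single- or double-quoted attribute value.
   Returns NULL if the input ends before the tag is closed. */
const char *find_tag_end(const char *p)
{
    char quote = 0;     /* current quote character, or 0 if not in quotes */

    assert(p != NULL);
    for (; *p; p++) {
        if (quote) {
            if (*p == quote)
                quote = 0;          /* closing quote must match the opening one */
        } else if (*p == '"' || *p == '\'') {
            quote = *p;
        } else if (*p == '>') {
            return p;
        }
    }
    return NULL;
}
```

A full parser still has to handle comments, CDATA sections and DOCTYPE declarations, each of which has its own terminator.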
--

Patrick TJ McPhee
East York Canada
pt**@interlog.com
Jul 20 '05 #6
Mike wrote:
> Does anyone know of a minimal/mini/tiny/small XML parser
> in C? I'm looking for something small that accepts a stream
> or string, builds a C structure, and then returns an opaque
> pointer to that structure. There should then be a function
> to search that structure given the pointer, tag, and an
> optional attribute. I'm looking initially at only text data,
> no numbers, though eventually there will be some binary
> data (CDATA?).


You could try Mini-XML. See

http://www.easysw.com/~mike/mxml/

--
To reply by e-mail, please remove the extra dot
in the given address: m.collado -> mcollado

Jul 20 '05 #7
On Mon, 25 Oct 2004 23:50:10 -0000, Mike <mi***@mikee.ath.cx> wrote:
> In article <2u*************@uni-berlin.de>, William Park wrote:
>> Mike <mi***@mikee.ath.cx> wrote:
>>> XML has been chosen; I need to write the parser. Oh, and I do not have
>>> to validate the XML, just parse it.
>>
>> Expat (www.libexpat.org). Practically every language has some sort of
>> support for it, even Bash shell.
>
> Thanks for the expat suggestion. I have also read about libxml. I'd like to
> find a few hundred lines of C code to do this.
>
> Mike

Do you really need the source code? I'm sure you could find a parser in
library form, ready for you to use.
Jul 20 '05 #8
Mike wrote:
> Does anyone know of a minimal/mini/tiny/small XML parser
> in C? I'm looking for something small that accepts a stream
> or string, builds a C structure, and then returns an opaque
> pointer to that structure. There should then be a function
> to search that structure given the pointer, tag, and an
> optional attribute. I'm looking initially at only text data,
> no numbers, though eventually there will be some binary
> data (CDATA?).


My Mini-XML library might be what you are looking for:

http://www.easysw.com/~mike/mxml/

--
______________________________________________________________________
Michael Sweet, Easy Software Products mike at easysw dot com
Printing Software for UNIX http://www.easysw.com
Jul 20 '05 #9
Patrick TJ McPhee wrote:
> ...
> I expect that you won't find a conforming XML parser which is only a
> few hundred lines long. The smallest conforming parsers I know of are
> expat and rxp, and they're in the thousands of lines. There's also
> tinyxml, which is not a conforming parser, and which is still in the
> thousands of lines.
> ...


It is a myth that conforming XML parsers have to be big; *validating*
parsers, perhaps, but not a simple non-validating parser which
accepts XML syntax and encoding.

Mini-XML started as 696 lines of C code (it has since grown to a
little over 2700 lines of code) and is a fully conformant XML
parser that provides everything except validation (and I'm thinking
about how I could do that without bloating it...)

--
______________________________________________________________________
Michael Sweet, Easy Software Products mike at easysw dot com
Printing Software for UNIX http://www.easysw.com
Jul 20 '05 #10
Mike <mi***@mikee.ath.cx> wrote:
> In article <2u*************@uni-berlin.de>, William Park wrote:
>> Mike <mi***@mikee.ath.cx> wrote:
>>> XML has been chosen; I need to write the parser. Oh, and I do not have
>>> to validate the XML, just parse it.
>>
>> Expat (www.libexpat.org). Practically every language has some sort of
>> support for it, even Bash shell.
>
> Thanks for the expat suggestion. I have also read about libxml. I'd like to
> find a few hundred lines of C code to do this.


Are you talking about actually doing the parsing (duplicating what Expat
does), or just calling API functions?

If the former, I doubt there is one. If the latter, Gawk, Python, and
Bash all have bindings to Expat.

--
William Park <op**********@yahoo.ca>
Open Geometry Consulting, Toronto, Canada
Jul 20 '05 #11
In article <41**************@easysw.com>,
Michael Sweet <mi**@easysw.com> wrote:

% Patrick TJ McPhee wrote:
% > ...
% > I expect that you won't find a conforming XML parser which is only a
% > few hundred lines long.

[...]

% It is a myth that conforming XML parsers have to be big; *validating*
% parsers, perhaps, but not a simple non-validating parser which
% accepts XML syntax and encoding.
%
% Mini-XML started as 696 lines of C code (it has since grown to a

Which is to say, more than a few hundred lines, and it seems like
it wasn't conforming at that.

% little over 2700 lines of code) and is a fully conformant XML

Which is to say, thousands of lines.
--

Patrick TJ McPhee
East York Canada
pt**@interlog.com
Jul 20 '05 #12
In article <2u*************@uni-berlin.de>, William Park wrote:
> Mike <mi***@mikee.ath.cx> wrote:
>> In article <2u*************@uni-berlin.de>, William Park wrote:
>>> Mike <mi***@mikee.ath.cx> wrote:
>>>> XML has been chosen; I need to write the parser. Oh, and I do not have
>>>> to validate the XML, just parse it.
>>>
>>> Expat (www.libexpat.org). Practically every language has some sort of
>>> support for it, even Bash shell.
>>
>> Thanks for the expat suggestion. I have also read about libxml. I'd like to
>> find a few hundred lines of C code to do this.
>
> Are you talking about actually doing the parsing (duplicating what Expat
> does), or just calling API functions?
>
> If the former, I doubt there is one. If the latter, Gawk, Python, and
> Bash all have bindings to Expat.


I'm talking about the actual parsing.
Jul 20 '05 #13
Patrick TJ McPhee wrote:
> In article <41**************@easysw.com>,
> Michael Sweet <mi**@easysw.com> wrote:
>
> % Patrick TJ McPhee wrote:
> % > ...
> % > I expect that you won't find a conforming XML parser which is only a
> % > few hundred lines long.
>
> [...]
>
> % It is a myth that conforming XML parsers have to be big; *validating*
> % parsers, perhaps, but not a simple non-validating parser which
> % accepts XML syntax and encoding.
> %
> % Mini-XML started as 696 lines of C code (it has since grown to a
>
> Which is to say, more than a few hundred lines, and it seems like
> it wasn't conforming at that.

Actually, it was; however, features were added to make it perform
better and support more use cases.

> % little over 2700 lines of code) and is a fully conformant XML
>
> Which is to say, thousands of lines.

But still a tiny fraction of the size of other XML parsers out
there...

--
______________________________________________________________________
Michael Sweet, Easy Software Products mike at easysw dot com
Printing Software for UNIX http://www.easysw.com
Jul 20 '05 #14
In article <41**************@easysw.com>,
Michael Sweet <mi**@easysw.com> wrote:

[I wrote]

% > Which is to say, thousands of lines.
%
% But still a tiny fraction of the size of other XML parsers out
% there...

Except for the ones I cited in the post you seemed to contradict,
which are roughly the same size.

--

Patrick TJ McPhee
East York Canada
pt**@interlog.com
Jul 20 '05 #15

"Mike" <mi***@mikee.ath.cx> wrote in message
news:10*************@corp.supernews.com...
> Does anyone know of a minimal/mini/tiny/small XML parser
> in C? I'm looking for something small that accepts a stream
> or string, builds a C structure, and then returns an opaque
> pointer to that structure. There should then be a function
> to search that structure given the pointer, tag, and an
> optional attribute. I'm looking initially at only text data,
> no numbers, though eventually there will be some binary
> data (CDATA?).

Oh well, this thread is new enough that I think I will add my comment.

If you're motivated, maybe my parser could be made to work in your case.
kalloc/kfree are for my allocator.
kralloc is a rotating allocator (it allocates from a large circular buffer),
and thus does not need freeing.

ObjType_New can be replaced by kalloc (or malloc if needed).

Be warned: if you replace kalloc and friends with malloc, you will need to
zero the memory malloc returns, since malloc does not do that for you.

I omitted, e.g., the printer here, though...
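For anyone trying this without pdlib: the allocator functions are not included in the post, so the following are hypothetical stand-ins written only from the description above (a zero-filling `kalloc`, plain `kfree`, a `kstrdup`, and a `kralloc` that hands out pieces of a fixed circular buffer). The 64 KiB buffer size is an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the pdlib allocator; not the author's code. */

/* kalloc: heap allocation returning zeroed memory, as the parser expects
   (calloc zeroes, plain malloc does not). */
void *kalloc(size_t size)
{
    return calloc(1, size);
}

void kfree(void *p)
{
    free(p);
}

/* kstrdup: heap copy of a string, released with kfree. */
char *kstrdup(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);

    if (p)
        memcpy(p, s, n);
    return p;
}

/* kralloc: scratch memory carved from a fixed circular buffer. Nothing is
   ever freed; once the buffer wraps, old blocks are silently reused, so a
   result is only safe to use for a short while. The parser only uses it
   for character buffers, so no alignment handling is done here. */
#define KRALLOC_RING_SIZE (1 << 16)

void *kralloc(size_t size)
{
    static unsigned char ring[KRALLOC_RING_SIZE];
    static size_t pos = 0;
    unsigned char *p;

    assert(size <= KRALLOC_RING_SIZE);
    if (pos + size > KRALLOC_RING_SIZE)
        pos = 0;                    /* wrap around to the start */
    p = ring + pos;
    pos += size;
    memset(p, 0, size);             /* hand out zeroed memory */
    return p;
}
```

With these, `ObjType_New(type, size)` could be approximated by `kalloc(size)`, as suggested above.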

part of the header:
----
#define TOKEN_NULL 0
#define TOKEN_SPECIAL 1
#define TOKEN_STRING 2
#define TOKEN_SYMBOL 3

typedef struct NetParse_Attr_s NetParse_Attr;
typedef struct NetParse_Node_s NetParse_Node;

struct NetParse_Attr_s {
NetParse_Attr *next;
char *ns;
char *key;
char *value;
};

struct NetParse_Node_s {
NetParse_Node *next;
char *ns;
char *key;
char *text;
NetParse_Attr *attr;
NetParse_Node *first;
};
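Since the printer was left out, here is a minimal sketch of one, assuming the node layout in the header above (repeated so the block compiles on its own), with text nodes marked by `key == NULL` as the parser below creates them. `NetParse_PrintNode` and the `OutBuf` writer are hypothetical names, not part of the posted code; output goes to a fixed-size buffer and is truncated if it does not fit.

```c
#include <stddef.h>

/* Same layout as the header above. */
typedef struct NetParse_Attr_s NetParse_Attr;
typedef struct NetParse_Node_s NetParse_Node;

struct NetParse_Attr_s {
    NetParse_Attr *next;
    char *ns, *key, *value;
};

struct NetParse_Node_s {
    NetParse_Node *next;
    char *ns, *key, *text;
    NetParse_Attr *attr;
    NetParse_Node *first;
};

/* Fixed-size output buffer; output is silently truncated when full and
   buf is kept NUL-terminated (the caller starts it as ""). */
typedef struct {
    char *buf;
    size_t len, cap;
} OutBuf;

static void out_putc(OutBuf *o, char c)
{
    if (o->len + 1 < o->cap) {
        o->buf[o->len++] = c;
        o->buf[o->len] = '\0';
    }
}

static void out_puts(OutBuf *o, const char *s)
{
    while (*s)
        out_putc(o, *s++);
}

/* Write s, replacing the characters that are special in XML markup. */
static void out_escaped(OutBuf *o, const char *s)
{
    for (; *s; s++) {
        switch (*s) {
        case '&': out_puts(o, "&amp;");  break;
        case '<': out_puts(o, "&lt;");   break;
        case '>': out_puts(o, "&gt;");   break;
        case '"': out_puts(o, "&quot;"); break;
        default:  out_putc(o, *s);       break;
        }
    }
}

/* Write "ns:key", or just "key" when there is no namespace prefix. */
static void out_name(OutBuf *o, const char *ns, const char *key)
{
    if (ns) {
        out_puts(o, ns);
        out_putc(o, ':');
    }
    out_puts(o, key);
}

/* Hypothetical printer: serializes node and its children as XML text. */
void NetParse_PrintNode(OutBuf *o, const NetParse_Node *node)
{
    const NetParse_Attr *a;
    const NetParse_Node *c;

    if (!node->key) {               /* text node */
        if (node->text)
            out_escaped(o, node->text);
        return;
    }
    out_putc(o, '<');
    out_name(o, node->ns, node->key);
    for (a = node->attr; a; a = a->next) {
        out_putc(o, ' ');
        out_name(o, a->ns, a->key);
        out_puts(o, "=\"");
        out_escaped(o, a->value ? a->value : "");
        out_putc(o, '"');
    }
    if (!node->first) {
        out_puts(o, "/>");          /* empty element */
        return;
    }
    out_putc(o, '>');
    for (c = node->first; c; c = c->next)
        NetParse_PrintNode(o, c);
    out_puts(o, "</");
    out_name(o, node->ns, node->key);
    out_putc(o, '>');
}
```

It does not special-case the processing-instruction nodes (keys beginning with '?') that the parser produces, and it writes no whitespace between elements.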

dump of part of my parser:
----
/*--
Cat pdlib;Parse;XML
Form
char *NetParse_XML_EatWhite(char *s);
Description
Skips over whitespace.
Status Internal
--*/
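char *NetParse_XML_EatWhite(char *s)
{
	/* 'line' is a line counter defined elsewhere (not shown here);
	   the cast keeps bytes >= 0x80 (e.g. UTF-8) from counting as
	   whitespace when char is signed */
	while(*s && ((unsigned char)*s<=' '))
	{
		if(*s=='\n')
		{
			line++;
			*s=' ';	/* overwrites the input buffer in place */
		}
		s++;
	}

	return(s);
}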

/*--
Cat pdlib;Parse;XML
Form
int NetParse_XML_SpecialP(char *s);
Description
Returns a nonzero value if *s is special.
Status Internal
--*/
int NetParse_XML_SpecialP(char *s)
{
	switch(*s)
	{
	case '<': case '>': case '/':
	case '=': case '?': case ':':
		return(1);
	default:
		return(0);
	}
}

/*--
Cat pdlib;Parse;XML
Form
int NetParse_XML_ContSpecialP(char *s);
Description
Returns nonzero if this will get the parser's attention when reading as
text.
This includes '<' and '&'.
Status Internal
--*/
int NetParse_XML_ContSpecialP(char *s)
{
	return((*s=='<') || (*s=='&'));
}

/*--
Cat pdlib;Parse;XML
Form
char *NetParse_XML_Token(char *s, char *b, int *t);
Description
Reads a token from the XML stream.
This includes:
Individual symbols;
Globs of text/tags;
Strings.
b is the buffer.
t is an integer to hold the token type
TOKEN_NULL, a null terminator was reached;
TOKEN_SPECIAL, a special character.
TOKEN_STRING, a quoted string literal (escapes processed).
TOKEN_SYMBOL, an unquoted bit of text (eg: a tag).
Returns the next character after the token.
Status Internal
--*/
char *NetParse_XML_Token(char *s, char *b, int *t)
{
	char *ob, *is, *t2;
	char *buf;
	char q;
	int i, ty;

	is=s;
	if(!b)b=kralloc(256);
	ob=b;
	*b=0;

	if(!t)t=&ty;	/* callers may pass NULL for the type */
	*t=TOKEN_NULL;

	buf=kralloc(16);

	s=NetParse_XML_EatWhite(s);
	if(!*s)
		return(s);

	if(NetParse_XML_SpecialP(s))
	{
		*t=TOKEN_SPECIAL;
		*b++=*s++;
		*b=0;
	}else if((*s=='"') || (*s=='\''))	/* quoted string */
	{
		*t=TOKEN_STRING;
		q=*s++;	/* the closing quote must match the opening one */
		while(*s && (*s!=q))
		{
			if(*s=='&')
			{
				s++;
				t2=buf;
				/* bounded copy of the entity name into the 16-byte buf */
				while(*s && (*s!=';') && ((t2-buf)<15))*t2++=*s++;
				if(*s!=';')
				{
					*t=TOKEN_NULL;
					return(is);
				}
				*t2=0;
				s++;

				if(buf[0]=='#')
				{
					if(buf[1]=='x')
					{
						/* use t2 here, not t, which is the token-type argument */
						t2=buf+2;
						i=0;
						while(*t2)
						{
							i<<=4;
							if((*t2>='0') && (*t2<='9'))
								i+=*t2-'0';
							if((*t2>='A') && (*t2<='F'))
								i+=*t2-'A'+10;
							if((*t2>='a') && (*t2<='f'))
								i+=*t2-'a'+10;
							t2++;
						}
						*b++=i;
					}else *b++=atoi(buf+1);
				}
				else if(!strcmp(buf, "amp"))*b++='&';
				else if(!strcmp(buf, "lt"))*b++='<';
				else if(!strcmp(buf, "gt"))*b++='>';
				else if(!strcmp(buf, "quot"))*b++='"';
				else if(!strcmp(buf, "apos"))*b++='\'';
			}else *b++=*s++;
		}
		*b=0;
		if(*s!=q)	/* unterminated string */
		{
			*t=TOKEN_NULL;
			return(is);
		}
		s++;
	}else
	{
		*t=TOKEN_SYMBOL;

		while(*s && ((unsigned char)*s>' ') && !NetParse_XML_SpecialP(s) &&
			((b-ob)<254))
			*b++=*s++;
		*b=0;

		if(!*s)
		{
			*t=TOKEN_NULL;
			return(is);
		}
	}
	return(s);
}

/*--
Cat pdlib;Parse;XML
Form
char *NetParse_XML_ParseText(char *s, char *b);
Description
Parse a glob of text from the stream.
Handles escapes and such.
Status Internal
--*/
char *NetParse_XML_ParseText(char *s, char *b)
{
char *ob, *t;
char buf[16];
int i, gws, rws;

if(!b)b=kralloc(4096);
ob=b;
*b=0;

s=NetParse_XML_EatWhite(s);
if(!*s)return(NULL);

gws=0;
rws=0;
while(1)
{
while(*s && !NetParse_XML_ContSpecialP(s))
{
if((*s=='\r') || (*s=='\n'))
{
s=NetParse_XML_EatWhite(s);
if(!rws)
{
*b++=' ';
gws++;
}
continue;
}
gws=0;
if((unsigned char)*s<=' ')rws++;
else rws=0;
*b++=*s++;
}
if(!*s)return(NULL);

if(*s=='&')
{
s++;
t=buf;
while(*s && (*s!=';') && ((t-buf)<15))*t++=*s++;
if(*s!=';')return(NULL);
*t++=0;
s++;

if(buf[0]=='#')
{
if(buf[1]=='x')
{
t=buf+2;
i=0;
while(*t)
{
i<<=4;
if((*t>='0') && (*t<='9'))
i+=*t-'0';
if((*t>='A') && (*t<='F'))
i+=*t-'A'+10;
if((*t>='a') && (*t<='f'))
i+=*t-'a'+10;
t++;
}
gws=0;
if(i<=' ')rws++;
else rws=0;
*b++=i;
}else
{
i=atoi(buf+1);
gws=0;
if(i<=' ')rws++;
else rws=0;
*b++=i;
}
continue;
}
rws=0;
gws=0;

if(!strcmp(buf, "amp"))*b++='&';
if(!strcmp(buf, "lt"))*b++='<';
if(!strcmp(buf, "gt"))*b++='>';
if(!strcmp(buf, "apos"))*b++='\'';
if(!strcmp(buf, "quot"))*b++='"';
}else break;
}
b-=gws;
*b++=0;

return(s);
}
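Both `NetParse_XML_Token` and `NetParse_XML_ParseText` decode entity and character references inline, and both store a numeric reference with `*b++=i`, i.e. as a single char, so something like `&#x20AC;` (the euro sign) cannot come out correctly. A sketch of a shared helper that emits UTF-8 instead, assuming UTF-8 is the wanted internal encoding; `decode_reference` and `utf8_encode` are hypothetical names:

```c
#include <stddef.h>
#include <string.h>

/* Encode code point cp as UTF-8 into out (room for 4 bytes needed).
   Returns the number of bytes written, or 0 if cp is not a Unicode
   scalar value. */
static int utf8_encode(unsigned long cp, char *out)
{
    if (cp < 0x80) {
        out[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                       /* surrogates are not characters */
    if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (char)(0xF0 | (cp >> 18)); /* caller guarantees cp <= 0x10FFFF */
    out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical helper: decode the text between '&' and ';' ("amp",
   "#60", "#x20AC", ...) into UTF-8 in out. Returns the byte count,
   or 0 if the reference is malformed or not a predefined entity. */
int decode_reference(const char *name, char *out)
{
    static const struct { const char *name; char ch; } predefined[] = {
        { "amp", '&' }, { "lt", '<' }, { "gt", '>' },
        { "quot", '"' }, { "apos", '\'' }
    };
    size_t i;

    if (name[0] == '#') {
        const char *p = name + 1;
        unsigned long base = 10, cp = 0;
        int d;

        if (*p == 'x') {
            base = 16;
            p++;
        }
        if (*p == '\0')
            return 0;                   /* no digits at all */
        for (; *p; p++) {
            if (*p >= '0' && *p <= '9')
                d = *p - '0';
            else if (base == 16 && *p >= 'a' && *p <= 'f')
                d = *p - 'a' + 10;
            else if (base == 16 && *p >= 'A' && *p <= 'F')
                d = *p - 'A' + 10;
            else
                return 0;               /* not a valid digit */
            cp = cp * base + (unsigned long)d;
            if (cp > 0x10FFFF)
                return 0;               /* beyond the Unicode range */
        }
        if (cp == 0)
            return 0;                   /* &#0; is never allowed */
        return utf8_encode(cp, out);
    }
    for (i = 0; i < sizeof predefined / sizeof predefined[0]; i++) {
        if (strcmp(name, predefined[i].name) == 0) {
            out[0] = predefined[i].ch;
            return 1;
        }
    }
    return 0;
}
```

This does not check the full XML `Char` production (for example `&#1;` is still accepted), and the output buffer needs room for up to four bytes per reference.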

/*--
Cat pdlib;Parse;XML
Form
NetParse_Attr *NetParse_XML_ParseOpts(char **s);
Description
Parse the list of attributes within a tag.
Status Internal
--*/
NetParse_Attr *NetParse_XML_ParseOpts(char **s)
{
	char *ns, *var, *eq, *val;
	int ty;
	NetParse_Attr *lst, *end, *tmp;

	/* scratch buffers from the rotating allocator; no need to free */
	ns=kralloc(256);
	var=kralloc(256);
	eq=kralloc(256);
	val=kralloc(4096);

	lst=NULL;
	end=NULL;

	while(1)
	{
		/* peek at the next token without consuming it */
		NetParse_XML_Token(*s, var, &ty);
		if(ty==TOKEN_NULL)
		{
			kprint("m1\n");
			*s=NULL;
			return(NULL);
		}

		/* '>', '/' or '?' ends the attribute list */
		if((ty==TOKEN_SPECIAL) &&
			((var[0]=='>') || (var[0]=='/') || (var[0]=='?')))
			break;
		if(ty!=TOKEN_SYMBOL)
		{
			kprint("parse error (inv attribute).\n");
			*s=NULL;	/* signal the error to the caller */
			return(NULL);
		}

		*s=NetParse_XML_Token(*s, var, &ty);
		if(ty==TOKEN_NULL)
		{
			kprint("m3\n");
			*s=NULL;
			return(NULL);
		}
		*s=NetParse_XML_Token(*s, eq, &ty);
		if(ty==TOKEN_NULL)
		{
			kprint("m4\n");
			*s=NULL;
			return(NULL);
		}

		/* "prefix:name": the token before ':' is the namespace prefix */
		if((ty==TOKEN_SPECIAL) && (eq[0]==':'))
		{
			strcpy(ns, var);

			*s=NetParse_XML_Token(*s, var, &ty);
			if(ty==TOKEN_NULL)
			{
				kprint("m41\n");
				*s=NULL;
				return(NULL);
			}
			*s=NetParse_XML_Token(*s, eq, &ty);
			if(ty==TOKEN_NULL)
			{
				kprint("m42\n");
				*s=NULL;
				return(NULL);
			}
		}else ns[0]=0;

		if((ty!=TOKEN_SPECIAL) || (eq[0]!='='))
		{
			kprint("parse error (attr equal).\n");
			*s=NULL;
			return(NULL);
		}

		*s=NetParse_XML_Token(*s, val, &ty);
		if(ty==TOKEN_NULL)
		{
			kprint("m5\n");
			*s=NULL;
			return(NULL);
		}
		if(ty!=TOKEN_STRING)
		{
			kprint("parse error (inv attribute arg).\n");
			*s=NULL;
			return(NULL);
		}

		tmp=NetParse_NewAttr();
		tmp->next=NULL;
		if(ns[0])tmp->ns=kstrdup(ns);
		tmp->key=kstrdup(var);
		tmp->value=kstrdup(val);

		/* append at the end to keep document order */
		if(end)end->next=tmp;
		else lst=tmp;
		end=tmp;
	}

	return(lst);
}

/*--
Cat pdlib;Parse;XML
Form
NetParse_Node *NetParse_XML_ParseExpr(char **s);
Description
Parses an XML expression.
s is updated to reflect the change.

NULL is returned on parse error or end-of-stream.
s is not updated for end-of-stream conditions, which can be used to
separate it from a parse error.
--*/
NetParse_Node *NetParse_XML_ParseExpr(char **s)
{
// char buf[256], buf2[16];
// char key[32], ns[32];
char *buf, *buf2, *key, *ns;

int ty, i;
char *s2, *s3, *s4, *is;

// elem kv, opts, t, x;
NetParse_Node *tmp, *t, *end;

is=*s;
*s=NetParse_XML_EatWhite(*s);
if(!*(*s))return(NULL);

buf=kalloc(256);
buf2=kalloc(256);
key=kalloc(256);
ns=kalloc(256);

// strncpy(buf, *s, 5);
// buf[5]=0;
// kprint("parse: %s\n", buf);

NetParse_XML_Token(*s, buf, &ty);
if(ty==TOKEN_NULL)
{
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
*s=is;
return(NULL);
}
if((buf[0]=='<') && (ty==TOKEN_SPECIAL))
{
*s=NetParse_XML_Token(*s, buf, &ty);
if(ty==TOKEN_NULL)
{
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
*s=is;
return(NULL);
}
if((*s)[0]=='?')
*s=*s+1;
if((*s)[0]=='!')
{
/* *s points at the '!', so match the whole "![CDATA[" marker */
if(!strncmp(*s, "![CDATA[", 8))
{
*s=*s+8;
s2=kalloc(65536);
s3=s2;
s4=*s;
while(*s4)
{
if(!strncmp(s4, "]]>", 3))
{
s4+=3;
break;
}
if(!strncmp(s4, "]]&gt;", 6))
{
s4+=6;

*s3++=']';
*s3++=']';
*s3++='>';
continue;
}
*s3++=*s4++;
}
if(!*s4)
{
kfree(s2);

kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
*s=is;
return(NULL);
}

*s3++=0;
*s=s4;

tmp=NetParse_NewNode();
tmp->next=NULL;
tmp->key=NULL;
tmp->text=kstrdup(s2);
tmp->attr=NULL;
tmp->first=NULL;

kfree(s2);

kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
return(tmp);
}

s2=*s;
i=1;
while(*s2 && i)
{
if(*s2=='<')i++;
if(*s2=='>')i--;
if(*s2=='[')i++;
if(*s2==']')i--;
s2++;
}

kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
*s=s2;
return(NetParse_XML_ParseExpr(s));
}

*s=NetParse_XML_Token(*s, key, &ty);
if(ty==TOKEN_NULL)
{
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
*s=is;
return(NULL);
}
if(ty!=TOKEN_SYMBOL)
{
kprint("parse error (inv tag).\n");
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
return(NULL);
}

if(**s==':')
{
*s=*s+1;
strcpy(ns, key);
*s=NetParse_XML_Token(*s, key, &ty);
}else ns[0]=0;

if((**s>' ') && (**s!='>') && (**s!='/'))
{
kprint("parse error (inv char after tag).\n");
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
return(NULL);
}
// kv=SYM(key);
// opts=NetParse_XML_ParseOpts(s);
// if(opts==MISC_UNDEFINED)return(t);

// tmp=kalloc(sizeof(NetParse_Node));
tmp=NetParse_NewNode();
tmp->next=NULL;
if(ns[0])tmp->ns=kstrdup(ns);
tmp->key=kstrdup(key);
s3=*s;
tmp->attr=NetParse_XML_ParseOpts(s);
if(!*s)
{
kprint("attr truncated\n");
NetParse_FreeNode(tmp);
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
*s=is;
return(NULL);
}
tmp->first=NULL;

*s=NetParse_XML_Token(*s, buf, &ty);
if(ty==TOKEN_NULL)
{
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
*s=is;
return(NULL);
}
if((buf[0]=='/') && (ty==TOKEN_SPECIAL))
{
/* self-closing tag: consume the trailing '>' and release the buffers */
*s=NetParse_XML_Token(*s, buf, &ty);
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
if(ty==TOKEN_NULL)
{
NetParse_FreeNode(tmp);
*s=is;
return(NULL);
}
return(tmp);
}
if((buf[0]=='?') && (ty==TOKEN_SPECIAL))
{
*s=NetParse_XML_Token(*s, buf, &ty);
if(ty==TOKEN_NULL)
{
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
*s=is;
return(NULL);
}
// x=CONS(kv, CONS(opts, MISC_EOL));
// x=CONS(SYM("?"), x);
strcpy(buf, "?");
strcat(buf, tmp->key);
kfree(tmp->key);
tmp->key=kstrdup(buf);

kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
return(tmp);
}
if((ty!=TOKEN_SPECIAL) || (buf[0]!='>'))
{
kprint("parse error (expected close '>').\n");
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
return(NULL);
}

end=NULL;
// x=MISC_EOL;
while(1)
{
s2=*s;
s2=NetParse_XML_Token(s2, buf, &ty);
if(ty==TOKEN_NULL)
{
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
*s=is;
return(NULL);
}
s2=NetParse_XML_Token(s2, buf2, &ty);
if(ty==TOKEN_NULL)
{
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
*s=is;
return(NULL);
}

if(buf[0]=='<' && buf2[0]=='/')
{
/* closing tag: skip the (possibly ns-prefixed) name up to the '>' */
do
{
s2=NetParse_XML_Token(s2, buf, &ty);
}while((ty!=TOKEN_NULL) && !((ty==TOKEN_SPECIAL) && (buf[0]=='>')));
if(ty==TOKEN_NULL)
{
NetParse_FreeNode(tmp);
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
*s=is;
return(NULL);
}
*s=s2;
break;
}
s3=*s;
t=NetParse_XML_ParseExpr(s);
if(*s==s3)
{
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
*s=is;
return(NULL);
}

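if(!t)
{
/* child failed to parse: release the partial tree and buffers */
NetParse_FreeNode(tmp);
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
return(NULL);
}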
// x=CONS(t, x);
if(end)
{
end->next=t;
end=t;
}else
{
tmp->first=t;
end=t;
}
}
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
return(tmp);
}

s2=kalloc(65536);
*s=NetParse_XML_ParseText(*s, s2);
if(!*s)
{
kfree(s2);
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
*s=is;
return(NULL);
}

// tmp=kalloc(sizeof(NetParse_Node));
tmp=NetParse_NewNode();
tmp->next=NULL;
tmp->key=NULL;
tmp->text=kstrdup(s2);
tmp->attr=NULL;
tmp->first=NULL;

kfree(s2);

kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
return(tmp);
}
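A note on the CDATA branch in `NetParse_XML_ParseExpr`: per the XML specification a CDATA section runs from `<![CDATA[` to the first `]]>` and its content is taken literally, with no entity processing, so rewriting `]]&gt;` to `]]>` inside it is not standard behaviour. It also means CDATA cannot carry arbitrary binary data, since the content may not contain `]]>` and must still consist of legal characters. A standalone sketch of spec-style extraction (`parse_cdata` is a hypothetical name; it returns a malloc'd copy rather than using kalloc):

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: if s starts with "<![CDATA[", copy the section's
   content (everything up to the first "]]>") into a new malloc'd string,
   store it in *content and return a pointer just past the "]]>".
   Returns NULL if s does not start a CDATA section, if the section is
   unterminated, or if allocation fails. The content is copied verbatim:
   no entity or markup processing happens inside the section. */
const char *parse_cdata(const char *s, char **content)
{
    static const char start[] = "<![CDATA[";
    const char *body, *end;
    size_t len;

    if (strncmp(s, start, sizeof start - 1) != 0)
        return NULL;
    body = s + sizeof start - 1;
    end = strstr(body, "]]>");
    if (!end)
        return NULL;                    /* unterminated section */
    len = (size_t)(end - body);
    *content = malloc(len + 1);
    if (!*content)
        return NULL;
    memcpy(*content, body, len);
    (*content)[len] = '\0';
    return end + 3;                     /* skip the "]]>" terminator */
}
```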

/*--
Cat pdlib;Parse;XML
Form
NetParse_Node *NetParse_XML_LoadFile(char *name);
Description
loads XML from a file.
returns NULL on failure.
--*/
NetParse_Node *NetParse_XML_LoadFile(char *name)
{
VFILE *fd;
char *buf, *s;
NetParse_Node *n;

fd=vffopen(name, "rb");
if(!fd)return(NULL);

buf=vf_bufferin(fd);
if(!buf)return(NULL);

s=buf;
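while(*s)
{
n=NetParse_XML_ParseExpr(&s);
if(!n)break;
if(!n->key || (n->key[0]=='?'))
{
/* skip <?xml ...?> declarations and stray top-level text nodes */
NetParse_FreeNode(n);
continue;
}
return(n);
}
return(NULL);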
}
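`NetParse_XML_LoadFile` depends on pdlib's `vffopen`/`vf_bufferin`, which are not shown. Assuming plain stdio is acceptable, a hypothetical replacement that reads a whole stream (or file) into a NUL-terminated heap buffer could look like this; it also covers the "stream or string" requirement from the original question, since a stream can be turned into a string first. The buffer has to be writable, because `NetParse_XML_EatWhite` overwrites newlines in place.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for vf_bufferin(): read an entire stream into a
   NUL-terminated, malloc'd buffer. Returns NULL on allocation or read
   error; the caller frees the result. */
char *read_stream(FILE *fp)
{
    size_t cap = 4096, len = 0, n;
    char *buf = malloc(cap), *tmp;

    if (!buf)
        return NULL;
    /* always leave one byte free for the terminating NUL */
    while ((n = fread(buf + len, 1, cap - len - 1, fp)) > 0) {
        len += n;
        if (len == cap - 1) {           /* full: double the buffer */
            tmp = realloc(buf, cap * 2);
            if (!tmp) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}

/* Hypothetical stand-in for vffopen() + vf_bufferin() on a file name. */
char *read_file(const char *name)
{
    FILE *fp = fopen(name, "rb");
    char *buf;

    if (!fp)
        return NULL;
    buf = read_stream(fp);
    fclose(fp);
    return buf;
}
```

The result can be walked with a `char *` cursor passed to `NetParse_XML_ParseExpr`, the same way `NetParse_XML_LoadFile` uses `buf`, and released with `free()` afterwards, since the tree keeps its own `kstrdup` copies of every string.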

and part of the crap for dealing with parse trees:
----
/*--
Cat pdlib;Parse
Form
int NetParse_Init();
Description
Init function for NetParse, called implicitly by node/attr creation.
--*/
int NetParse_Init()
{
static int init=0;

if(init)return(1);
init=1;

ObjType_NewType("netparse_attr_t", "*struct;string;string;");
ObjType_NewType("netparse_node_t",
"*struct;string;string;*struct;*struct;");
return(0);
}

/*--
Cat pdlib;Parse
Form
NetParse_Attr *NetParse_NewAttr();
Description
Creates a new attribute.
--*/
NetParse_Attr *NetParse_NewAttr()
{
NetParse_Attr *tmp;

NetParse_Init();

tmp=ObjType_New("netparse_attr_t", sizeof(NetParse_Attr));
tmp->next=NULL;
tmp->ns=NULL;
tmp->key=NULL;
tmp->value=NULL;

return(tmp);
}

/*--
Cat pdlib;Parse
Form
NetParse_Attr *NetParse_AddAttr(NetParse_Node *node, char *key, char *value);
Description
Adds an attribute to a node (or sets the attribute if present).
--*/
NetParse_Attr *NetParse_AddAttr(NetParse_Node *node, char *key, char *value)
{
NetParse_Attr *tmp, *cur;

cur=node->attr;
while(cur)
{
if(!strcmp(cur->key, key))
{
if(cur->value)kfree(cur->value);
cur->value=kstrdup(value);
return(cur);
}
cur=cur->next;
}

// tmp=kalloc(sizeof(NetParse_Attr));
tmp=NetParse_NewAttr();
tmp->next=NULL;
tmp->key=kstrdup(key);
tmp->value=kstrdup(value);

if(!node->attr)
{
node->attr=tmp;
return(tmp);
}
cur=node->attr;
while(cur->next)cur=cur->next;
cur->next=tmp;

return(tmp);
}

/*--
Cat pdlib;Parse
Form
NetParse_Attr *NetParse_AddAttrList(NetParse_Attr *lst, char *key, char *value);
Description
Adds an attribute to a list of attributes (or assigns the attribute \
if already present).
Returns the start of the list, or the new attribute if lst is NULL.
--*/
NetParse_Attr *NetParse_AddAttrList(NetParse_Attr *lst, char *key, char
*value)
{
NetParse_Attr *tmp, *cur;

cur=lst;
while(cur)
{
if(!strcmp(cur->key, key))
{
if(cur->value)kfree(cur->value);
cur->value=kstrdup(value);
return(lst);
}
cur=cur->next;
}

// tmp=kalloc(sizeof(NetParse_Attr));
tmp=NetParse_NewAttr();
tmp->next=NULL;
tmp->key=kstrdup(key);
tmp->value=kstrdup(value);

if(!lst)return(tmp);

cur=lst;
while(cur->next)cur=cur->next;
cur->next=tmp;

return(lst);
}

/*--
Cat pdlib;Parse
Form
char *NetParse_GetNodeAttr(NetParse_Node *node, char *key);
Description
Gets an attribute associated with a node.
Returns NULL if not found.
--*/
char *NetParse_GetNodeAttr(NetParse_Node *node, char *key)
{
NetParse_Attr *cur;

cur=node->attr;
while(cur)
{
if(!strcmp(cur->key, key))return(cur->value);
cur=cur->next;
}
return(NULL);
}

/*--
Cat pdlib;Parse
Form
int NetParse_GetNodeAttrIsP(NetParse_Node *node, char *key, char *value);
Description
Check if a given node has a certain attribute with a certain value.
--*/
int NetParse_GetNodeAttrIsP(NetParse_Node *node, char *key, char *value)
{
NetParse_Attr *cur;

cur=node->attr;
while(cur)
{
if(!strcmp(cur->key, key))
{
if(!strcmp(cur->value, value))
return(1);
else return(0);
}
cur=cur->next;
}
return(0);
}

/*--
Cat pdlib;Parse
Form
char *NetParse_GetAttrList(NetParse_Attr *lst, char *key);
Description
Gets an attribute in a list.
Returns NULL if not found.
--*/
char *NetParse_GetAttrList(NetParse_Attr *lst, char *key)
{
NetParse_Attr *cur;

cur=lst;
while(cur)
{
if(!strcmp(cur->key, key))return(cur->value);
cur=cur->next;
}
return(NULL);
}

/*--
Cat pdlib;Parse
Form
NetParse_Node *NetParse_NewNode();
Description
Creates a new node.
--*/
NetParse_Node *NetParse_NewNode()
{
NetParse_Node *tmp;

NetParse_Init();

tmp=ObjType_New("netparse_node_t", sizeof(NetParse_Node));
tmp->next=NULL;
tmp->ns=NULL;
tmp->key=NULL;
tmp->text=NULL;
tmp->attr=NULL;
tmp->first=NULL;

return(tmp);
}

/*--
Cat pdlib;Parse
Form
NetParse_Node *NetParse_AddNodeEnd(NetParse_Node *first, NetParse_Node *node);
Description
Adds a new node at the end of a list of nodes.
--*/
NetParse_Node *NetParse_AddNodeEnd(NetParse_Node *first, NetParse_Node
*node)
{
NetParse_Node *cur;

if(!first)return(node);

cur=first;
while(cur->next)cur=cur->next;
cur->next=node;

return(first);
}

/*--
Cat pdlib;Parse
Form
int NetParse_AddChildNode(NetParse_Node *parent, NetParse_Node *node);
Description
Add a new child node to a parent.
--*/
int NetParse_AddChildNode(NetParse_Node *parent,
NetParse_Node *node)
{
NetParse_Node *cur;

if(!parent->first)
{
parent->first=node;
return(0);
}

cur=parent->first;
while(cur->next)cur=cur->next;
cur->next=node;

return(0);
}

/*--
Cat pdlib;Parse
Form
int NetParse_FreeAttr(NetParse_Attr *attr);
Description
Frees an attribute.
Also frees any following attributes.
--*/
int NetParse_FreeAttr(NetParse_Attr *attr)
{
if(attr->next)NetParse_FreeAttr(attr->next);
if(attr->ns)kfree(attr->ns);
if(attr->key)kfree(attr->key);
if(attr->value)kfree(attr->value);
kfree(attr);

return(0);
}

/*--
Cat pdlib;Parse
Form
int NetParse_FreeNode(NetParse_Node *node);
Description
Frees a node and any associated attributes.
Also frees any child nodes.
--*/
int NetParse_FreeNode(NetParse_Node *node)
{
	NetParse_Node *cur, *next;

	if(node->ns)kfree(node->ns);
	if(node->key)kfree(node->key);
	if(node->text)kfree(node->text);
	if(node->attr)NetParse_FreeAttr(node->attr);

	/* free every child, each along with its whole subtree */
	cur=node->first;
	while(cur)
	{
		next=cur->next;
		NetParse_FreeNode(cur);
		cur=next;
	}
	kfree(node);

	return(0);
}

/*--
Cat pdlib;Parse
Form
NetParse_Attr *NetParse_CopyAttr(NetParse_Attr *attr);
Description
Copies an attribute along with any following attributes.
--*/
NetParse_Attr *NetParse_CopyAttr(NetParse_Attr *attr)
{
NetParse_Attr *tmp;

tmp=NetParse_NewAttr();

if(attr->next)
tmp->next=NetParse_CopyAttr(attr->next);
if(attr->ns)
tmp->ns=kstrdup(attr->ns);
if(attr->key)
tmp->key=kstrdup(attr->key);
if(attr->value)
tmp->value=kstrdup(attr->value);

return(tmp);
}

/*--
Cat pdlib;Parse
Form
NetParse_Node *NetParse_CopyNode(NetParse_Node *node);
Description
Makes a copy of a node tree, copies any attributes or children.
--*/
NetParse_Node *NetParse_CopyNode(NetParse_Node *node)
{
NetParse_Node *cur;
NetParse_Node *tmp, *lst, *end, *t2;

tmp=NetParse_NewNode();
if(node->ns)
tmp->ns=kstrdup(node->ns);
if(node->key)
tmp->key=kstrdup(node->key);
if(node->text)
tmp->text=kstrdup(node->text);
if(node->attr)
tmp->attr=NetParse_CopyAttr(node->attr);

lst=NULL;
end=NULL;
cur=node->first;
while(cur)
{
t2=NetParse_CopyNode(cur);
if(end)end->next=t2;
end=t2;

if(!lst)lst=end;
cur=cur->next;
}
tmp->first=lst;

return(tmp);
}

/*--
Cat pdlib;Parse
Form
NetParse_Node *NetParse_FindKey(NetParse_Node *first, char *key);
Description
Finds a node in a list with a given key.
Returns NULL if not found.
--*/
NetParse_Node *NetParse_FindKey(NetParse_Node *first, char *key)
{
NetParse_Node *cur;

cur=first;
while(cur)
{
if(cur->key)
if(!strcmp(cur->key, key))
return(cur);
cur=cur->next;
}
return(NULL);
}
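`NetParse_FindKey` only scans a single sibling list. For the search the original question asked for (a tag plus an optional attribute, anywhere in the tree), a recursive depth-first variant might look like this; `NetParse_FindDeep` is a hypothetical name, and the struct layout is repeated from the header so the block compiles by itself.

```c
#include <stddef.h>
#include <string.h>

/* Same node/attribute layout as the NetParse header earlier in the post. */
typedef struct NetParse_Attr_s NetParse_Attr;
typedef struct NetParse_Node_s NetParse_Node;
struct NetParse_Attr_s { NetParse_Attr *next; char *ns, *key, *value; };
struct NetParse_Node_s {
    NetParse_Node *next;
    char *ns, *key, *text;
    NetParse_Attr *attr;
    NetParse_Node *first;
};

/* Return 1 if the first attribute of node named attr_key exists and,
   unless attr_value is NULL, its value equals attr_value. */
static int has_attr(const NetParse_Node *node, const char *attr_key,
                    const char *attr_value)
{
    const NetParse_Attr *a;

    for (a = node->attr; a; a = a->next)
        if (a->key && strcmp(a->key, attr_key) == 0)
            return !attr_value ||
                   (a->value && strcmp(a->value, attr_value) == 0);
    return 0;
}

/* Hypothetical depth-first search, in document order, over a sibling list
   and all of its descendants: returns the first element whose key equals
   key and, if attr_key is not NULL, that also carries the requested
   attribute. Returns NULL if nothing matches. */
NetParse_Node *NetParse_FindDeep(NetParse_Node *first, const char *key,
                                 const char *attr_key, const char *attr_value)
{
    NetParse_Node *cur, *hit;

    for (cur = first; cur; cur = cur->next) {
        if (cur->key && strcmp(cur->key, key) == 0 &&
            (!attr_key || has_attr(cur, attr_key, attr_value)))
            return cur;
        hit = NetParse_FindDeep(cur->first, key, attr_key, attr_value);
        if (hit)
            return hit;
    }
    return NULL;
}
```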

//abstract interface funcs

/*--
Cat pdlib;Parse
Form
NetParse_Node *NetParse_NewNodeKey(char *ns, char *key);
Description
Creates a new node with a given namespace prefix and key.
ns may be NULL in most cases (the tag does not have a namespace \
prefix).
--*/
NetParse_Node *NetParse_NewNodeKey(char *ns, char *key)
{
NetParse_Node *tmp;

tmp=NetParse_NewNode();
if(ns)tmp->ns=kstrdup(ns);
tmp->key=kstrdup(key);

return(tmp);
}

/*--
Cat pdlib;Parse
Form
NetParse_Node *NetParse_NewNodeText(char *text);
Description
Creates a new text node with the contents given.
--*/
NetParse_Node *NetParse_NewNodeText(char *text)
{
NetParse_Node *tmp;

tmp=NetParse_NewNode();
tmp->text=kstrdup(text);

return(tmp);
}

/*--
Cat pdlib;Parse
Form
char *NetParse_GetNodeNS(NetParse_Node *node);
char *NetParse_GetNodeKey(NetParse_Node *node);
char *NetParse_GetNodeText(NetParse_Node *node);
NetParse_Node *NetParse_GetNodeFirst(NetParse_Node *node);
NetParse_Node *NetParse_GetNodeNext(NetParse_Node *node);
Description
Get a property of a node, each will return NULL in the case that \
the given property is not present.
--*/
char *NetParse_GetNodeNS(NetParse_Node *node)
{
return(node->ns);
}

char *NetParse_GetNodeKey(NetParse_Node *node)
{
return(node->key);
}

char *NetParse_GetNodeText(NetParse_Node *node)
{
return(node->text);
}

NetParse_Node *NetParse_GetNodeFirst(NetParse_Node *node)
{
return(node->first);
}

NetParse_Node *NetParse_GetNodeNext(NetParse_Node *node)
{
return(node->next);
}

/*--
Cat pdlib;Parse
Form
int NetParse_SetNodeNS(NetParse_Node *node, char *value);
int NetParse_SetNodeKey(NetParse_Node *node, char *value);
int NetParse_SetNodeText(NetParse_Node *node, char *value);
int NetParse_SetNodeFirst(NetParse_Node *node, NetParse_Node *node2);
int NetParse_SetNodeNext(NetParse_Node *node, NetParse_Node *node2);
Description
Set a property of a node; the return value will be 0 if no errors \
occur.
--*/
int NetParse_SetNodeNS(NetParse_Node *node, char *value)
{
node->ns=kstrdup(value);
return(0);
}

int NetParse_SetNodeKey(NetParse_Node *node, char *value)
{
node->key=kstrdup(value);
return(0);
}

int NetParse_SetNodeText(NetParse_Node *node, char *value)
{
node->text=kstrdup(value);
return(0);
}

int NetParse_SetNodeFirst(NetParse_Node *node, NetParse_Node *node2)
{
node->first=node2;
return(0);
}

int NetParse_SetNodeNext(NetParse_Node *node, NetParse_Node *node2)
{
node->next=node2;
return(0);
}
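NetParse_SetNodeFirst and NetParse_SetNodeNext simply overwrite a link, so adding a child to an element that already has children means walking to the end of its sibling list first. The sketch below shows that, plus a matching free routine, assuming the same five-field layout and that every string was malloc'd (true for the dup_str stand-in here; kstrdup's ownership rules are not shown in the post). node_t, dup_str, node_new_key, node_append_child and node_free are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for NetParse_Node (fields as used in the post). */
typedef struct node {
    char *ns, *key, *text;
    struct node *first, *next;
} node_t;

/* Portable stand-in for kstrdup. */
static char *dup_str(const char *s)
{
    char *d = malloc(strlen(s) + 1);

    if (d != NULL)
        strcpy(d, s);
    return d;
}

/* Element node with every other field cleared. */
static node_t *node_new_key(const char *key)
{
    node_t *n = malloc(sizeof *n);

    if (n != NULL) {
        n->ns = n->text = NULL;
        n->key = dup_str(key);
        n->first = n->next = NULL;
    }
    return n;
}

/* Append child as the last child of parent, touching only the first and
   next links (the same links SetNodeFirst/SetNodeNext assign). */
static void node_append_child(node_t *parent, node_t *child)
{
    node_t *last;

    child->next = NULL;
    if (parent->first == NULL) {
        parent->first = child;          /* first child */
        return;
    }
    for (last = parent->first; last->next != NULL; last = last->next)
        ;                               /* walk to the current last child */
    last->next = child;
}

/* Free a node, its descendants and its following siblings. */
static void node_free(node_t *n)
{
    while (n != NULL) {
        node_t *next = n->next;

        node_free(n->first);
        free(n->ns);
        free(n->key);
        free(n->text);
        free(n);
        n = next;
    }
}
```

Appending is O(number of existing children); a parser that builds long child lists could keep a pointer to the last child instead.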

Jul 20 '05 #16
In article <10*************@corp.supernews.com>, Mike wrote:
In article <2u*************@uni-berlin.de>, William Park wrote:
Mike <mi***@mikee.ath.cx> wrote:
XML has been chosen, I need to write the parser. Oh, and I do not have
to validate the XML, just parse it.


Expat (www.libexpat.org). Practically every language has some sort of
support for it, even Bash shell.


Thanks for the expat suggestion. I have also read about libxml. I'd like to
find a few hundred lines of C code to do this.

Mike


Thanks for all the replies. I have chosen and am using mxml.

Mike
Jul 20 '05 #17
