minimal xml parser?
Does anyone know of a minimal/mini/tiny/small xml parser
in c? I'm looking for something small that accepts a stream
or string, builds a c structure, and then returns an opaque
pointer to that structure. There should then be a function
to search that structure given the pointer, tag, and an
optional attribute. I'm looking initially at only text data,
no numbers, though eventually there will be some binary
data (CDATA?).
Thanks.
Mike | | | | re: minimal xml parser?
"Mike" <mikee@mikee.ath.cx> wrote in message
news:10nqqmnq3n1ln8d@corp.supernews.com...
> I'm looking initially at only text data,
> no numbers, though eventually there will be some binary
> data (CDATA?).
XML does not support "binary data" as the term is commonly used. All data
within an XML instance must be valid per the specified character encoding.
You should read the relevant sections of the XML specification before
determining if XML is an appropriate representation for your requirements.
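A common workaround, not something proposed in this thread, is to encode binary payloads as Base64 text before placing them in an element. The result uses only ASCII letters, digits, '+', '/' and '=', so it is legal character data in any XML encoding. A minimal sketch follows; the function name `b64_encode` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: Base64-encode n bytes from src into dst.
   dst must have room for 4*((n+2)/3)+1 chars.
   Returns the encoded length, not counting the terminating NUL. */
size_t b64_encode(const unsigned char *src, size_t n, char *dst)
{
    static const char tab[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t i, o = 0;

    assert(src != NULL && dst != NULL);
    for (i = 0; i + 2 < n; i += 3) {          /* full 3-byte groups */
        unsigned long v = ((unsigned long)src[i] << 16) |
                          ((unsigned long)src[i + 1] << 8) | src[i + 2];
        dst[o++] = tab[(v >> 18) & 63];
        dst[o++] = tab[(v >> 12) & 63];
        dst[o++] = tab[(v >> 6) & 63];
        dst[o++] = tab[v & 63];
    }
    if (i < n) {                              /* 1 or 2 bytes left over */
        unsigned long v = (unsigned long)src[i] << 16;
        if (i + 1 < n) v |= (unsigned long)src[i + 1] << 8;
        dst[o++] = tab[(v >> 18) & 63];
        dst[o++] = tab[(v >> 12) & 63];
        dst[o++] = (i + 1 < n) ? tab[(v >> 6) & 63] : '=';
        dst[o++] = '=';
    }
    dst[o] = '\0';
    return o;
}
```

The encoded string can then be stored as ordinary element text and decoded by the receiver.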
/kmc | | | | re: minimal xml parser?
In article <k4qdnSQeG5SZHODcRVn-1A@comcast.com>, Keith M. Corbett wrote:
> "Mike" <mikee@mikee.ath.cx> wrote in message
> news:10nqqmnq3n1ln8d@corp.supernews.com...
>> I'm looking initially at only text data,
>> no numbers, though eventually there will be some binary
>> data (CDATA?).
>
> XML does not support "binary data" as the term is commonly used. All data
> within an XML instance must be valid per the specified character encoding.
>
> You should read the relevant sections of the XML specification before
> determining if XML is an appropriate representation for your requirements.
>
> /kmc
XML has been chosen, I need to write the parser. Oh, and I do not have
to validate the XML, just parse it.
Mike | | | | re: minimal xml parser?
Mike <mikee@mikee.ath.cx> wrote:
> XML has been chosen, I need to write the parser. Oh, and I do not have
> to validate the XML, just parse it.
Expat ( www.libexpat.org). Practically every language has some sort of
support for it, even Bash shell.
--
William Park <opengeometry@yahoo.ca>
Open Geometry Consulting, Toronto, Canada | | | | re: minimal xml parser?
In article <2u5h4gF26m9qbU1@uni-berlin.de>, William Park wrote:
> Mike <mikee@mikee.ath.cx> wrote:
>> XML has been chosen, I need to write the parser. Oh, and I do not have
>> to validate the XML, just parse it.
>
> Expat ( www.libexpat.org). Practically every language has some sort of
> support for it, even Bash shell.
Thanks for the expat suggestion. I have also read about libxml. I'd like to
find a few hundred lines of C code to do this.
Mike | | | | re: minimal xml parser?
In article <10nr49ihiaog893@corp.supernews.com>,
Mike <mikee@mikee.ath.cx> wrote:
% In article <2u5h4gF26m9qbU1@uni-berlin.de>, William Park wrote:
% > Mike <mikee@mikee.ath.cx> wrote:
% >> XML has been chosen, I need to write the parser. Oh, and I do not have
% >> to validate the XML, just parse it.
% >
% > Expat ( www.libexpat.org). Practically every language has some sort of
% > support for it, even Bash shell.
% >
%
% Thanks for the expat suggestion. I have also read about libxml. I'd like to
% find a few hundred lines of c code to do this.
I expect that you won't find a conforming XML parser which is only a few
hundred lines long. The smallest conforming parsers I know of are expat
and rxp, and they're in the thousands of lines. There's also tinyxml, which
is not a conforming parser, and which is still in the thousands of lines.
Although the temptation to write a minimal ``parser'' yourself may be
overwhelming, I think you're better off using an existing, conforming
parser. Otherwise, you will almost certainly end up with a system that
rejects valid XML files, and what's the good of that?
I think you're looking for something like rxp's API.
--
Patrick TJ McPhee
East York Canada ptjm@interlog.com | | | | re: minimal xml parser?
Mike wrote:
> Does anyone know of a minimal/mini/tiny/small xml parser
> in c? I'm looking for something small that accepts a stream
> or string, builds a c structure, and then returns an opaque
> pointer to that structure. There should then be a function
> to search that structure given the pointer, tag, and an
> optional attribute. I'm looking initially at only text data,
> no numbers, though eventually there will be some binary
> data (CDATA?).
You could try Mini-XML. See http://www.easysw.com/~mike/mxml/
--
To reply by e-mail, please remove the extra dot
in the given address: m.collado -> mcollado | | | | re: minimal xml parser?
On Mon, 25 Oct 2004 23:50:10 -0000, Mike <mikee@mikee.ath.cx> wrote:
>In article <2u5h4gF26m9qbU1@uni-berlin.de>, William Park wrote:
>> Mike <mikee@mikee.ath.cx> wrote:
>>> XML has been chosen, I need to write the parser. Oh, and I do not have
>>> to validate the XML, just parse it.
>>
>> Expat ( www.libexpat.org). Practically every language has some sort of
>> support for it, even Bash shell.
>
>Thanks for the expat suggestion. I have also read about libxml. I'd like to
>find a few hundred lines of c code to do this.
>
>Mike
Do you really need the source code? I'm sure you could find a parser in library form, ready for you to
use. | | | | re: minimal xml parser?
Mike wrote:
> Does anyone know of a minimal/mini/tiny/small xml parser
> in c? I'm looking for something small that accepts a stream
> or string, builds a c structure, and then returns an opaque
> pointer to that structure. There should then be a function
> to search that structure given the pointer, tag, and an
> optional attribute. I'm looking initially at only text data,
> no numbers, though eventually there will be some binary
> data (CDATA?).
My Mini-XML library might be what you are looking for: http://www.easysw.com/~mike/mxml/
--
__________________________________________________ ____________________
Michael Sweet, Easy Software Products mike at easysw dot com
Printing Software for UNIX http://www.easysw.com | | | | re: minimal xml parser?
Patrick TJ McPhee wrote:
> ...
> I expect that you won't find a conforming XML parser which is only a
> few hundred lines long. The smallest conforming parsers I know of are
> expat and rxp, and they're in the thousands of lines. There's also
> tinyxml, which is not a conforming parser, and which is still in the
> thousands of lines.
> ...
It is a myth that conforming XML parsers have to be big; *validating*
parsers, perhaps, but not a simple non-validating parser which
accepts XML syntax and encoding.
Mini-XML started as 696 lines of C code (it has since grown to a
little over 2700 lines of code) and is a fully conformant XML
parser that provides everything except validation (and I'm thinking
how I could do that without bloating it...)
--
__________________________________________________ ____________________
Michael Sweet, Easy Software Products mike at easysw dot com
Printing Software for UNIX http://www.easysw.com | | | | re: minimal xml parser?
Mike <mikee@mikee.ath.cx> wrote:
> In article <2u5h4gF26m9qbU1@uni-berlin.de>, William Park wrote:
> > Mike <mikee@mikee.ath.cx> wrote:
> >> XML has been chosen, I need to write the parser. Oh, and I do not have
> >> to validate the XML, just parse it.
> >
> > Expat ( www.libexpat.org). Practically every language has some sort of
> > support for it, even Bash shell.
> >
>
> Thanks for the expat suggestion. I have also read about libxml. I'd like to
> find a few hundred lines of c code to do this.
Are you talking about actually doing the parsing (duplicating what Expat
does), or just calling API functions?
If the former, then I doubt there is one. If the latter, then Gawk, Python,
and Bash all have a binding to Expat.
--
William Park <opengeometry@yahoo.ca>
Open Geometry Consulting, Toronto, Canada | | | | re: minimal xml parser?
In article <417E45D5.6010101@easysw.com>,
Michael Sweet <mike@easysw.com> wrote:
% Patrick TJ McPhee wrote:
% > ...
% > I expect that you won't find a conforming XML parser which is only a
% > few hundred lines long.
[...]
% It is a myth that conforming XML parsers have to be big; *validating*
% parsers, perhaps, but not a simple non-validating parser which
% accepts XML syntax and encoding.
%
% Mini-XML started as 696 lines of C code (it has since grown to a
Which is to say, more than a few hundred lines, and it seems like
it wasn't conforming at that.
% little over 2700 lines of code) and is a fully conformant XML
Which is to say, thousands of lines.
--
Patrick TJ McPhee
East York Canada ptjm@interlog.com | | | | re: minimal xml parser?
In article <2u8h10F281e0mU1@uni-berlin.de>, William Park wrote:
> Mike <mikee@mikee.ath.cx> wrote:
>> In article <2u5h4gF26m9qbU1@uni-berlin.de>, William Park wrote:
>> > Mike <mikee@mikee.ath.cx> wrote:
>> >> XML has been chosen, I need to write the parser. Oh, and I do not have
>> >> to validate the XML, just parse it.
>> >
>> > Expat ( www.libexpat.org). Practically every language has some sort of
>> > support for it, even Bash shell.
>> >
>>
>> Thanks for the expat suggestion. I have also read about libxml. I'd like to
>> find a few hundred lines of c code to do this.
>
> Are you talking about actually doing the parsing (duplicating what Expat
> does), or just calling API functions?
>
> If former, then I doubt there is one. If latter, then Gawk, Python,
> Bash, all have a binding to Expat.
I'm talking about the actual parsing. | | | | re: minimal xml parser?
Patrick TJ McPhee wrote:
> In article <417E45D5.6010101@easysw.com>,
> Michael Sweet <mike@easysw.com> wrote:
>
> % Patrick TJ McPhee wrote:
> % > ...
> % > I expect that you won't find a conforming XML parser which is only a
> % > few hundred lines long.
>
> [...]
>
> % It is a myth that conforming XML parsers have to be big; *validating*
> % parsers, perhaps, but not a simple non-validating parser which
> % accepts XML syntax and encoding.
> %
> % Mini-XML started as 696 lines of C code (it has since grown to a
>
> Which is to say, more than a few hundred lines, and it seems like
> it wasn't conforming at that.
Actually, it was; however, features were added to make it perform
better and support more use cases.
> % little over 2700 lines of code) and is a fully conformant XML
>
> Which is to say, thousands of lines.
But still a tiny fraction of the size of other XML parsers out
there...
--
__________________________________________________ ____________________
Michael Sweet, Easy Software Products mike at easysw dot com
Printing Software for UNIX http://www.easysw.com | | | | re: minimal xml parser?
In article <417F9021.7080906@easysw.com>,
Michael Sweet <mike@easysw.com> wrote:
[I wrote]
% > Which is to say, thousands of lines.
%
% But still a tiny fraction of the size of other XML parsers out
% there...
Except for the ones I cited in the post you seemed to contradict,
which are roughly the same size.
--
Patrick TJ McPhee
East York Canada ptjm@interlog.com | | | | re: minimal xml parser?
"Mike" <mikee@mikee.ath.cx> wrote in message
news:10nqqmnq3n1ln8d@corp.supernews.com...
> Does anyone know of a minimal/mini/tiny/small xml parser
> in c? I'm looking for something small that accepts a stream
> or string, builds a c structure, and then returns an opaque
> pointer to that structure. There should then be a function
> to search that structure given the pointer, tag, and an
> optional attribute. I'm looking initially at only text data,
> no numbers, though eventually there will be some binary
> data (CDATA?).
Oh well, this thread is new enough that I think I will add my comment.
If you are motivated, maybe my parser could be made to work in your case.
kalloc/kfree are for my allocator.
kralloc is a rotating allocator (it allocates from a large circular buffer),
and thus does not need freeing.
ObjType_New can be replaced by kalloc (or malloc if needed).
Be warned that if you replace kalloc or the like with malloc, it will be
necessary to zero the memory returned (malloc does not necessarily do this
by default).
I omitted, e.g., the printer here, though...
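For readers who want to try the code without the poster's library, here is a rough sketch of what stand-ins for the two allocators might look like. The real kralloc and kalloc were not posted, so the names `ring_alloc` and `zalloc`, the buffer size, and the rounding policy are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define RING_SIZE 65536          /* assumed size; the original is not shown */

static unsigned char ring_buf[RING_SIZE];
static size_t ring_pos = 0;

/* Hypothetical stand-in for kralloc: hands out zeroed scratch memory from a
   circular buffer. Nothing is ever freed; old blocks are silently reused
   once the buffer wraps, so results must not be kept long-term. */
void *ring_alloc(size_t n)
{
    void *p;

    n = (n + 15) & ~(size_t)15;  /* round the size up to a multiple of 16 */
    if (n == 0 || n > RING_SIZE) return NULL;
    if (ring_pos + n > RING_SIZE) ring_pos = 0;   /* wrap around */
    assert(ring_pos % 16 == 0);
    p = ring_buf + ring_pos;
    ring_pos += n;
    memset(p, 0, n);
    return p;
}

/* Hypothetical stand-in for kalloc built on the C library: calloc returns
   zero-filled memory, which plain malloc does not guarantee. */
void *zalloc(size_t n)
{
    return calloc(1, n);
}
```

Memory from `zalloc` is released with `free`, which would then take the place of kfree.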
part of the header:
----
#define TOKEN_NULL 0
#define TOKEN_SPECIAL 1
#define TOKEN_STRING 2
#define TOKEN_SYMBOL 3
typedef struct NetParse_Attr_s NetParse_Attr;
typedef struct NetParse_Node_s NetParse_Node;
struct NetParse_Attr_s {
NetParse_Attr *next;
char *ns;
char *key;
char *value;
};
struct NetParse_Node_s {
NetParse_Node *next;
char *ns;
char *key;
char *text;
NetParse_Attr *attr;
NetParse_Node *first;
};
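Since the printer was left out of the post, here is a minimal sketch of one that serializes such a tree into a caller-supplied buffer. It is not the author's printer: the names `put`, `put_escaped` and `sketch_print_node` are hypothetical, the struct definitions are repeated only to keep the example self-contained, and the "?"-prefixed processing-instruction nodes get no special treatment.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct NetParse_Attr_s NetParse_Attr;
typedef struct NetParse_Node_s NetParse_Node;
struct NetParse_Attr_s { NetParse_Attr *next; char *ns, *key, *value; };
struct NetParse_Node_s { NetParse_Node *next; char *ns, *key, *text;
                         NetParse_Attr *attr; NetParse_Node *first; };

/* Append src to the NUL-terminated string in dst, never exceeding cap. */
static void put(char *dst, size_t cap, const char *src)
{
    size_t used = strlen(dst);
    if (used + 1 < cap)
        strncat(dst, src, cap - used - 1);
}

/* Append s, replacing the characters that are special in XML. */
static void put_escaped(char *dst, size_t cap, const char *s)
{
    char one[2] = { 0, 0 };
    for (; *s; s++) {
        switch (*s) {
        case '&': put(dst, cap, "&amp;");  break;
        case '<': put(dst, cap, "&lt;");   break;
        case '>': put(dst, cap, "&gt;");   break;
        case '"': put(dst, cap, "&quot;"); break;
        default:  one[0] = *s; put(dst, cap, one); break;
        }
    }
}

/* Serialize one node and its children. dst must already hold a
   NUL-terminated string (possibly empty); output is appended to it. */
void sketch_print_node(const NetParse_Node *n, char *dst, size_t cap)
{
    const NetParse_Attr *a;
    const NetParse_Node *c;

    assert(n != NULL && dst != NULL);
    if (!n->key) {                            /* text node */
        if (n->text) put_escaped(dst, cap, n->text);
        return;
    }
    put(dst, cap, "<");
    if (n->ns) { put(dst, cap, n->ns); put(dst, cap, ":"); }
    put(dst, cap, n->key);
    for (a = n->attr; a; a = a->next) {
        put(dst, cap, " ");
        if (a->ns) { put(dst, cap, a->ns); put(dst, cap, ":"); }
        put(dst, cap, a->key);
        put(dst, cap, "=\"");
        put_escaped(dst, cap, a->value ? a->value : "");
        put(dst, cap, "\"");
    }
    if (!n->first) { put(dst, cap, "/>"); return; }
    put(dst, cap, ">");
    for (c = n->first; c; c = c->next)
        sketch_print_node(c, dst, cap);
    put(dst, cap, "</");
    if (n->ns) { put(dst, cap, n->ns); put(dst, cap, ":"); }
    put(dst, cap, n->key);
    put(dst, cap, ">");
}
```

Output that does not fit is truncated rather than reported, which is acceptable for debugging but not for writing files.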
dump of part of my parser:
----
/*--
Cat pdlib;Parse;XML
Form
char *NetParse_XML_EatWhite(char *s);
Description
Skips over whitespace.
Status Internal
--*/
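char *NetParse_XML_EatWhite(char *s)
{
    /* line is the parser's file-scope line counter; its declaration was
       not part of the posted excerpt. */
    /* Compare as unsigned char so that bytes >= 0x80 (e.g. UTF-8) are not
       mistaken for whitespace where plain char is signed. */
    while(*s && ((unsigned char)*s<=' '))
    {
        if(*s=='\n')
        {
            line++;
            *s=' ';
        }
        s++;
    }
    return(s);
}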
/*--
Cat pdlib;Parse;XML
Form
int NetParse_XML_SpecialP(char *s);
Description
Returns a nonzero value if *s is special.
Status Internal
--*/
int NetParse_XML_SpecialP(char *s)
{
    switch(*s)
    {
    case '<': case '>': case '/':
    case '=': case '?': case ':':
        return(1);
    default:
        return(0);
    }
}
/*--
Cat pdlib;Parse;XML
Form
int NetParse_XML_ContSpecialP(char *s);
Description
Returns nonzero if this will get the parsers attention when reading as
text.
This includes '<' and '&'.
Status Internal
--*/
int NetParse_XML_ContSpecialP(char *s)
{
    return((*s=='<') || (*s=='&'));
}
/*--
Cat pdlib;Parse;XML
Form
char *NetParse_XML_Token(char *s, char *b, int *t);
Description
Reads a token from the XML stream.
This includes:
Individual symbols;
Globs of text/tags;
Strings.
b is the buffer.
t is an integer to hold the token type
TOKEN_NULL, a null terminator was reached;
TOKEN_SPECIAL, a special character.
TOKEN_STRING, a quoted string literal (escapes processed).
TOKEN_SYMBOL, an unquoted bit of text (eg: a tag).
Returns the next character after the token.
Status Internal
--*/
char *NetParse_XML_Token(char *s, char *b, int *t)
{
char *ob, *is, *t2;
char *buf;
int i;
is=s;
if(!b)b=kralloc(256);
ob=b;
*b=0;
if(t)*t=0;
buf=kralloc(16);
s=NetParse_XML_EatWhite(s);
if(!*s)
{
if(t)*t=TOKEN_NULL;
return(s);
}
if(NetParse_XML_SpecialP(s))
{
if(t)*t=TOKEN_SPECIAL;
*b++=*s++;
*b=0;
}else if((*s=='"') || (*s=='\'')) /* quoted string */
{
char q=*s; /* the string ends at the same quote character that opened it */
if(t)*t=TOKEN_STRING;
s++;
while(*s && (*s!=q))
{
if(*s=='&')
{
s++;
t2=buf;
/* buf holds 16 bytes; stop before overrunning it */
while(*s && (*s!=';') && ((t2-buf)<15))*t2++=*s++;
if(*s!=';')return(NULL);
*t2++=0;
s++;
if(buf[0]=='#')
{
if(buf[1]=='x')
{
t2=buf+2; /* was t=buf+2, which clobbered the token-type pointer */
i=0;
while(*t2)
{
i<<=4;
if((*t2>='0') && (*t2<='9'))
i+=*t2-'0';
if((*t2>='A') && (*t2<='F'))
i+=*t2-'A'+10;
if((*t2>='a') && (*t2<='f'))
i+=*t2-'a'+10;
t2++;
}
*b++=i;
}else *b++=atoi(buf+1);
}
if(!strcmp(buf, "amp"))*b++='&';
if(!strcmp(buf, "lt"))*b++='<';
if(!strcmp(buf, "gt"))*b++='>';
if(!strcmp(buf, "quot"))*b++='"';
if(!strcmp(buf, "apos"))*b++='\'';
}else *b++=*s++;
}
if(!*s)
{
if(t)*t=TOKEN_NULL;
return(is);
}
*b++=0;
s++;
}else
{
if(t)*t=TOKEN_SYMBOL;
while(*s && (*s>' ') && !NetParse_XML_SpecialP(s) &&
((b-ob)<254))
*b++=*s++;
*b++=0;
if(!*s)
{
if(t)*t=TOKEN_NULL;
return(is);
}
}
return(s);
}
/*--
Cat pdlib;Parse;XML
Form
char *NetParse_XML_ParseText(char *s, char *b);
Description
Parse a glob of text from the stream.
Handles escapes and such.
Status Internal
--*/
char *NetParse_XML_ParseText(char *s, char *b)
{
char *ob, *t;
char buf[16];
int i, gws, rws;
if(!b)b=kralloc(4096);
ob=b;
*b=0;
s=NetParse_XML_EatWhite(s);
if(!*s)return(NULL);
gws=0;
rws=0;
while(1)
{
while(*s && !NetParse_XML_ContSpecialP(s))
{
if((*s=='\r') || (*s=='\n'))
{
s=NetParse_XML_EatWhite(s);
if(!rws)
{
*b++=' ';
gws++;
}
continue;
}
gws=0;
if(*s<=' ')rws++;
else rws=0;
*b++=*s++;
}
if(!*s)return(NULL);
if(*s=='&')
{
s++;
t=buf;
/* buf holds 16 bytes; stop before overrunning it */
while(*s && (*s!=';') && ((t-buf)<15))*t++=*s++;
if(*s!=';')return(NULL);
*t++=0;
s++;
if(buf[0]=='#')
{
if(buf[1]=='x')
{
t=buf+2;
i=0;
while(*t)
{
i<<=4;
if((*t>='0') && (*t<='9'))
i+=*t-'0';
if((*t>='A') && (*t<='F'))
i+=*t-'A'+10;
if((*t>='a') && (*t<='f'))
i+=*t-'a'+10;
t++;
}
gws=0;
if(i<=' ')rws++;
else rws=0;
*b++=i;
}else
{
i=atoi(buf+1);
gws=0;
if(i<=' ')rws++;
else rws=0;
*b++=i;
}
continue;
}
rws=0;
gws=0;
if(!strcmp(buf, "amp"))*b++='&';
if(!strcmp(buf, "lt"))*b++='<';
if(!strcmp(buf, "gt"))*b++='>';
if(!strcmp(buf, "apos"))*b++='\'';
if(!strcmp(buf, "quot"))*b++='"';
}else break;
}
b-=gws;
*b++=0;
return(s);
}
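The token reader and the text reader above decode `&...;` references with two copies of the same logic. One way this could be factored into a single bounds-checked helper is sketched below. The name `decode_ref` is hypothetical and is not part of the posted library; like the posted code, the sketch produces a single byte and does not emit UTF-8 for larger code points.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: s points just past '&'. On success stores the decoded
   byte in *out and returns a pointer just past ';'. Returns NULL for a
   missing ';', an over-long or unknown name, or a value outside 1..255.
   Note: strtol is more lenient than the XML grammar (it accepts a sign or
   leading spaces), so this is a sketch rather than a strict validator. */
const char *decode_ref(const char *s, char *out)
{
    static const struct { const char *name; char ch; } named[] = {
        { "amp", '&' }, { "lt", '<' }, { "gt", '>' },
        { "quot", '"' }, { "apos", '\'' }
    };
    char name[16];
    size_t n = 0, k;

    assert(s != NULL && out != NULL);
    while (s[n] && s[n] != ';' && n < sizeof name - 1) {
        name[n] = s[n];
        n++;
    }
    if (s[n] != ';' || n == 0) return NULL;
    name[n] = '\0';

    if (name[0] == '#') {                     /* numeric reference */
        char *end;
        long v = (name[1] == 'x') ? strtol(name + 2, &end, 16)
                                  : strtol(name + 1, &end, 10);
        if (*end != '\0' || v <= 0 || v > 255) return NULL;
        *out = (char)v;
        return s + n + 1;
    }
    for (k = 0; k < sizeof named / sizeof named[0]; k++) {
        if (strcmp(name, named[k].name) == 0) {   /* predefined entity */
            *out = named[k].ch;
            return s + n + 1;
        }
    }
    return NULL;
}
```

A caller that sees '&' would advance one byte, call the helper, and treat a NULL result as a parse error.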
/*--
Cat pdlib;Parse;XML
Form
NetParse_Attr *NetParse_XML_ParseOpts(char **s);
Description
Parse the list of attributes within a tag.
Status Internal
--*/
NetParse_Attr *NetParse_XML_ParseOpts(char **s)
{
// char ns[32];
// char var[32];
// char eq[16];
// char val[256];
char *is, *ns, *var, *eq, *val;
int ty;
NetParse_Attr *lst, *end, *tmp;
ns=kralloc(256);
var=kralloc(256);
eq=kralloc(256);
val=kralloc(4096);
lst=NULL;
end=NULL;
is=*s;
while(1)
{
NetParse_XML_Token(*s, var, &ty);
if(ty==TOKEN_NULL)
{
kprint("m1\n");
*s=NULL;
return(NULL);
}
if((ty==TOKEN_SPECIAL) &&
((var[0]=='>') || (var[0]=='/') || (var[0]=='?')))
break;
if(ty!=TOKEN_SYMBOL)
{
kprint("parse error (inv attribute).\n");
return(NULL);
}
*s=NetParse_XML_Token(*s, var, &ty);
if(ty==TOKEN_NULL)
{
kprint("m3\n");
*s=NULL;
return(NULL);
}
*s=NetParse_XML_Token(*s, eq, &ty);
if(ty==TOKEN_NULL)
{
kprint("m4\n");
*s=NULL;
return(NULL);
}
if((ty==TOKEN_SPECIAL) && (eq[0]==':'))
{
strcpy(ns, var);
*s=NetParse_XML_Token(*s, var, &ty);
if(ty==TOKEN_NULL)
{
kprint("m41\n");
*s=NULL;
return(NULL);
}
*s=NetParse_XML_Token(*s, eq, &ty);
if(ty==TOKEN_NULL)
{
kprint("m42\n");
*s=NULL;
return(NULL);
}
}else ns[0]=0;
if((ty!=TOKEN_SPECIAL) || (eq[0]!='='))
{
kprint("parse error (attr equal).\n");
return(NULL);
}
*s=NetParse_XML_Token(*s, val, &ty);
if(ty==TOKEN_NULL)
{
kprint("m5\n");
*s=NULL;
return(NULL);
}
if(ty!=TOKEN_STRING)
{
kprint("parse error (inv attribute arg).\n");
return(NULL);
}
// t=CONS(SYM(var), CONS(STRING(val), MISC_EOL));
// x=CONS(t, x);
// tmp=kalloc(sizeof(NetParse_Attr));
tmp=NetParse_NewAttr();
tmp->next=NULL;
if(ns[0])tmp->ns=kstrdup(ns);
tmp->key=kstrdup(var);
tmp->value=kstrdup(val);
if(end)
{
end->next=tmp;
end=tmp;
}else
{
lst=tmp;
end=tmp;
}
}
return(lst);
}
/*--
Cat pdlib;Parse;XML
Form
NetParse_Node *NetParse_XML_ParseExpr(char **s);
Description
Parses an XML expression.
s is updated to reflect the change.
NULL is returned on parse error or end-of-stream.
s is not updated for end-of-stream conditions, which can be used to
separate them from a parse error.
--*/
NetParse_Node *NetParse_XML_ParseExpr(char **s)
{
// char buf[256], buf2[16];
// char key[32], ns[32];
char *buf, *buf2, *key, *ns;
int ty, i;
char *s2, *s3, *s4, *is;
// elem kv, opts, t, x;
NetParse_Node *tmp, *t, *end;
is=*s;
*s=NetParse_XML_EatWhite(*s);
if(!*(*s))return(NULL);
buf=kalloc(256);
buf2=kalloc(256);
key=kalloc(256);
ns=kalloc(256);
// strncpy(buf, *s, 5);
// buf[5]=0;
// kprint("parse: %s\n", buf);
NetParse_XML_Token(*s, buf, &ty);
if(ty==TOKEN_NULL)
{
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
*s=is;
return(NULL);
}
if((buf[0]=='<') && (ty==TOKEN_SPECIAL))
{
*s=NetParse_XML_Token(*s, buf, &ty);
if(ty==TOKEN_NULL)
{
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
*s=is;
return(NULL);
}
if(**s=='?')
*s=*s+1;
if(**s=='!')
{
/* *s still points at the '!', so the CDATA marker starts one byte later */
if(!strncmp(*s+1, "[CDATA[", 7))
{
*s=*s+8;
s2=kalloc(65536);
s3=s2;
s4=*s;
while(*s4)
{
if(!strncmp(s4, "]]>", 3))
{
s4+=3;
break;
}
if(!strncmp(s4, "]]&gt;", 6)) /* 6-byte escaped form, emitted as "]]>" */
{
s4+=6;
*s3++=']';
*s3++=']';
*s3++='>';
continue;
}
*s3++=*s4++;
}
if(!*s4)
{
kfree(s2);
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
*s=is;
return(NULL);
}
*s3++=0;
*s=s4;
tmp=NetParse_NewNode();
tmp->next=NULL;
tmp->key=NULL;
tmp->text=kstrdup(s2);
tmp->attr=NULL;
tmp->first=NULL;
kfree(s2);
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
return(tmp);
}
s2=*s;
i=1;
while(*s2 && i)
{
if(*s2=='<')i++;
if(*s2=='>')i--;
if(*s2=='[')i++;
if(*s2==']')i--;
s2++;
}
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
*s=s2;
return(NetParse_XML_ParseExpr(s));
}
*s=NetParse_XML_Token(*s, key, &ty);
if(ty==TOKEN_NULL)
{
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
*s=is;
return(NULL);
}
if(ty!=TOKEN_SYMBOL)
{
kprint("parse error (inv tag).\n");
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
return(NULL);
}
if(**s==':')
{
*s=*s+1;
strcpy(ns, key);
*s=NetParse_XML_Token(*s, key, &ty);
}else ns[0]=0;
if((**s>' ') && (**s!='>') && (**s!='/'))
{
kprint("parse error (inv char after tag).\n");
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
return(NULL);
}
// kv=SYM(key);
// opts=NetParse_XML_ParseOpts(s);
// if(opts==MISC_UNDEFINED)return(t);
// tmp=kalloc(sizeof(NetParse_Node));
tmp=NetParse_NewNode();
tmp->next=NULL;
if(ns[0])tmp->ns=kstrdup(ns);
tmp->key=kstrdup(key);
s3=*s;
tmp->attr=NetParse_XML_ParseOpts(s);
if(!*s)
{
kprint("attr traunc\n");
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
*s=is;
return(NULL);
}
tmp->first=NULL;
*s=NetParse_XML_Token(*s, buf, &ty);
if(ty==TOKEN_NULL)
{
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
*s=is;
return(NULL);
}
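if((buf[0]=='/') && (ty==TOKEN_SPECIAL))
{
*s=NetParse_XML_Token(*s, buf, &ty);
if(ty==TOKEN_NULL)
{
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
*s=is;
return(NULL);
}
/* self-closing tag: release the scratch buffers before returning */
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
return(tmp);
}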
if((buf[0]=='?') && (ty==TOKEN_SPECIAL))
{
*s=NetParse_XML_Token(*s, buf, &ty);
if(ty==TOKEN_NULL)
{
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
*s=is;
return(NULL);
}
// x=CONS(kv, CONS(opts, MISC_EOL));
// x=CONS(SYM("?"), x);
strcpy(buf, "?");
strcat(buf, tmp->key);
kfree(tmp->key);
tmp->key=kstrdup(buf);
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
return(tmp);
}
if(buf[0]!='>')
{
kprint("parse error (expected close '>').\n");
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
return(NULL);
}
end=NULL;
// x=MISC_EOL;
while(1)
{
s2=*s;
s2=NetParse_XML_Token(s2, buf, &ty);
if(ty==TOKEN_NULL)
{
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
*s=is;
return(NULL);
}
s2=NetParse_XML_Token(s2, buf2, &ty);
if(ty==TOKEN_NULL)
{
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
*s=is;
return(NULL);
}
if(buf[0]=='<' && buf2[0]=='/')
{
s2=NetParse_XML_Token(s2, buf, &ty);
if(ty==TOKEN_NULL)
{
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
*s=is;
return(NULL);
}
s2=NetParse_XML_Token(s2, buf, &ty);
if(ty==TOKEN_NULL)
{
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
*s=is;
return(NULL);
}
*s=s2;
break;
}
s3=*s;
t=NetParse_XML_ParseExpr(s);
if(*s==s3)
{
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
*s=is;
return(NULL);
}
if(!t)return(t);
// x=CONS(t, x);
if(end)
{
end->next=t;
end=t;
}else
{
tmp->first=t;
end=t;
}
}
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
return(tmp);
}
s2=kalloc(65536);
*s=NetParse_XML_ParseText(*s, s2);
if(!*s)
{
kfree(s2);
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
*s=is;
return(NULL);
}
// tmp=kalloc(sizeof(NetParse_Node));
tmp=NetParse_NewNode();
tmp->next=NULL;
tmp->key=NULL;
tmp->text=kstrdup(s2);
tmp->attr=NULL;
tmp->first=NULL;
kfree(s2);
kfree(buf);
kfree(buf2);
kfree(key);
kfree(ns);
return(tmp);
}
/*--
Cat pdlib;Parse;XML
Form
NetParse_Node *NetParse_XML_LoadFile(char *name);
Description
loads XML from a file.
returns NULL on failure.
--*/
NetParse_Node *NetParse_XML_LoadFile(char *name)
{
VFILE *fd;
char *buf, *s;
NetParse_Node *n;
fd=vffopen(name, "rb");
if(!fd)return(NULL);
buf=vf_bufferin(fd);
if(!buf)return(NULL);
s=buf;
while(*s)
{
n=NetParse_XML_ParseExpr(&s);
if(!n)break;
/* text nodes have a NULL key, so check it before looking for '?' */
if(n->key && (n->key[0]=='?'))continue;
return(n);
}
return(NULL);
}
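The loader above relies on vffopen and vf_bufferin from the poster's own library, which were not posted. Assuming vf_bufferin reads the whole file into a NUL-terminated buffer (an assumption based on how the result is used), a stand-in built on plain stdio could look like this; the name `slurp_file` is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for vffopen + vf_bufferin: read a whole file into a
   NUL-terminated heap buffer. Returns NULL on failure; the caller frees. */
char *slurp_file(const char *name)
{
    FILE *fd;
    char *buf = NULL, *tmp;
    size_t len = 0, cap = 0, got;

    assert(name != NULL);
    fd = fopen(name, "rb");
    if (!fd) return NULL;
    do {
        if (len + 4096 + 1 > cap) {           /* room for a chunk plus NUL */
            cap = cap ? cap * 2 : 8192;
            tmp = realloc(buf, cap);
            if (!tmp) { free(buf); fclose(fd); return NULL; }
            buf = tmp;
        }
        got = fread(buf + len, 1, 4096, fd);
        len += got;
    } while (got == 4096);
    if (ferror(fd)) { free(buf); fclose(fd); return NULL; }
    fclose(fd);
    buf[len] = '\0';
    return buf;
}
```

Because the parser treats a zero byte as end of input, a file containing embedded NUL bytes would be cut short.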
and part of the crap for dealing with parse trees:
----
/*--
Cat pdlib;Parse
Form
int NetParse_Init();
Description
Init function for NetParse, called implicitly by node/attr creation.
--*/
int NetParse_Init()
{
static int init=0;
if(init)return(1);
init=1;
ObjType_NewType("netparse_attr_t", "*struct;string;string;");
ObjType_NewType("netparse_node_t",
"*struct;string;string;*struct;*struct;");
return(0);
}
/*--
Cat pdlib;Parse
Form
NetParse_Attr *NetParse_NewAttr();
Description
Creates a new attribute.
--*/
NetParse_Attr *NetParse_NewAttr()
{
    NetParse_Attr *tmp;
    NetParse_Init();
    // tmp=kalloc(sizeof(NetParse_Attr));
    tmp=ObjType_New("netparse_attr_t", sizeof(NetParse_Attr));
    tmp->next=NULL;
    tmp->ns=NULL;   /* ns is only assigned later when a prefix is present */
    tmp->key=NULL;
    tmp->value=NULL;
    return(tmp);
}
/*--
Cat pdlib;Parse
Form
NetParse_Attr *NetParse_AddAttr(NetParse_Node *node, char *key, char
*value);
Description
Adds an attribute to a node (or sets the attribute if present).
--*/
NetParse_Attr *NetParse_AddAttr(NetParse_Node *node, char *key, char *value)
{
NetParse_Attr *tmp, *cur;
cur=node->attr;
while(cur)
{
if(!strcmp(cur->key, key))
{
if(cur->value)kfree(cur->value);
cur->value=kstrdup(value);
return(cur);
}
cur=cur->next;
}
// tmp=kalloc(sizeof(NetParse_Attr));
tmp=NetParse_NewAttr();
tmp->next=NULL;
tmp->key=kstrdup(key);
tmp->value=kstrdup(value);
if(!node->attr)
{
node->attr=tmp;
return(tmp);
}
cur=node->attr;
while(cur->next)cur=cur->next;
cur->next=tmp;
return(tmp);
}
/*--
Cat pdlib;Parse
Form
NetParse_Attr *NetParse_AddAttrList(NetParse_Attr *lst, char *key, char
*value);
Description
Adds an attribute to a list of attributes (or assigns the attribute \
if already present).
Returns the start of the list, or the new attribute if lst is NULL.
--*/
NetParse_Attr *NetParse_AddAttrList(NetParse_Attr *lst, char *key, char
*value)
{
NetParse_Attr *tmp, *cur;
cur=lst;
while(cur)
{
if(!strcmp(cur->key, key))
{
if(cur->value)kfree(cur->value);
cur->value=kstrdup(value);
return(lst);
}
cur=cur->next;
}
// tmp=kalloc(sizeof(NetParse_Attr));
tmp=NetParse_NewAttr();
tmp->next=NULL;
tmp->key=kstrdup(key);
tmp->value=kstrdup(value);
if(!lst)return(tmp);
cur=lst;
while(cur->next)cur=cur->next;
cur->next=tmp;
return(lst);
}
/*--
Cat pdlib;Parse
Form
char *NetParse_GetNodeAttr(NetParse_Node *node, char *key);
Description
Gets an attribute associated with a node.
Returns NULL if not found.
--*/
char *NetParse_GetNodeAttr(NetParse_Node *node, char *key)
{
NetParse_Attr *cur;
cur=node->attr;
while(cur)
{
if(!strcmp(cur->key, key))return(cur->value);
cur=cur->next;
}
return(NULL);
}
/*--
Cat pdlib;Parse
Form
int NetParse_GetNodeAttrIsP(NetParse_Node *node, char *key, char *value);
Description
Check if a given node has a certain attribute as a certain value.
--*/
int NetParse_GetNodeAttrIsP(NetParse_Node *node, char *key, char *value)
{
NetParse_Attr *cur;
cur=node->attr;
while(cur)
{
if(!strcmp(cur->key, key))
{
if(!strcmp(cur->value, value))
return(1);
else return(0);
}
cur=cur->next;
}
return(0);
}
/*--
Cat pdlib;Parse
Form
char *NetParse_GetAttrList(NetParse_Attr *lst, char *key);
Description
Gets an attribute in a list.
Returns NULL if not found.
--*/
char *NetParse_GetAttrList(NetParse_Attr *lst, char *key)
{
NetParse_Attr *cur;
cur=lst;
while(cur)
{
if(!strcmp(cur->key, key))return(cur->value);
cur=cur->next;
}
return(NULL);
}
/*--
Cat pdlib;Parse
Form
NetParse_Node *NetParse_NewNode();
Description
Creates a new node.
--*/
NetParse_Node *NetParse_NewNode()
{
    NetParse_Node *tmp;
    NetParse_Init();
    // tmp=kalloc(sizeof(NetParse_Node));
    tmp=ObjType_New("netparse_node_t", sizeof(NetParse_Node));
    tmp->next=NULL;
    tmp->ns=NULL;   /* ns is only assigned later when a prefix is present */
    tmp->key=NULL;
    tmp->text=NULL;
    tmp->attr=NULL;
    tmp->first=NULL;
    return(tmp);
}
/*--
Cat pdlib;Parse
Form
NetParse_Node *NetParse_AddNodeEnd(NetParse_Node *first, NetParse_Node
*node);
Description
Adds a new node at the end of a list of nodes.
--*/
NetParse_Node *NetParse_AddNodeEnd(NetParse_Node *first, NetParse_Node
*node)
{
NetParse_Node *cur;
if(!first)return(node);
cur=first;
while(cur->next)cur=cur->next;
cur->next=node;
return(first);
}
/*--
Cat pdlib;Parse
Form
int NetParse_AddChildNode(NetParse_Node *parent, NetParse_Node *node);
Description
Add a new child node to a parent.
--*/
int NetParse_AddChildNode(NetParse_Node *parent,
NetParse_Node *node)
{
NetParse_Node *cur;
if(!parent->first)
{
parent->first=node;
return(0);
}
cur=parent->first;
while(cur->next)cur=cur->next;
cur->next=node;
return(0);
}
/*--
Cat pdlib;Parse
Form
int NetParse_FreeAttr(NetParse_Attr *attr);
Description
Frees an attribute.
Also frees any following attributes.
--*/
int NetParse_FreeAttr(NetParse_Attr *attr)
{
    if(attr->next)NetParse_FreeAttr(attr->next);
    if(attr->ns)kfree(attr->ns);
    if(attr->key)kfree(attr->key);
    if(attr->value)kfree(attr->value);
    kfree(attr);
    return(0);
}
/*--
Cat pdlib;Parse
Form
int NetParse_FreeNode(NetParse_Node *node);
Description
Frees a node and any associated attributes.
Also frees any child nodes.
--*/
int NetParse_FreeNode(NetParse_Node *node)
{
    NetParse_Node *cur, *next;
    if(node->ns)kfree(node->ns);
    if(node->key)kfree(node->key);
    if(node->text)kfree(node->text);
    if(node->attr)NetParse_FreeAttr(node->attr);
    /* free every child through the same routine so that each child's own
       children, and their siblings, are released as well */
    cur=node->first;
    while(cur)
    {
        next=cur->next;
        NetParse_FreeNode(cur);
        cur=next;
    }
    kfree(node);
    return(0);
}
/*--
Cat pdlib;Parse
Form
NetParse_Attr *NetParse_CopyAttr(NetParse_Attr *attr);
Description
Copies an attribute along with any following attributes.
--*/
NetParse_Attr *NetParse_CopyAttr(NetParse_Attr *attr)
{
    NetParse_Attr *tmp;
    tmp=NetParse_NewAttr();
    if(attr->next)
        tmp->next=NetParse_CopyAttr(attr->next);
    if(attr->ns)
        tmp->ns=kstrdup(attr->ns);
    if(attr->key)
        tmp->key=kstrdup(attr->key);
    if(attr->value)
        tmp->value=kstrdup(attr->value);
    return(tmp);
}
/*--
Cat pdlib;Parse
Form
NetParse_Node *NetParse_CopyNode(NetParse_Node *node);
Description
Makes a copy of a node tree, copies any attributes or children.
--*/
NetParse_Node *NetParse_CopyNode(NetParse_Node *node)
{
    NetParse_Node *cur;
    NetParse_Node *tmp, *lst, *end, *t2;
    tmp=NetParse_NewNode();
    if(node->ns)
        tmp->ns=kstrdup(node->ns);
    if(node->key)
        tmp->key=kstrdup(node->key);
    if(node->text)
        tmp->text=kstrdup(node->text);
    if(node->attr)
        tmp->attr=NetParse_CopyAttr(node->attr);
    lst=NULL;
    end=NULL;
    cur=node->first;
    while(cur)
    {
        t2=NetParse_CopyNode(cur);
        if(end)end->next=t2;
        end=t2;
        if(!lst)lst=end;
        cur=cur->next;
    }
    tmp->first=lst;
    return(tmp);
}
/*--
Cat pdlib;Parse
Form
NetParse_Node *NetParse_FindKey(NetParse_Node *first, char *key);
Description
Finds a node in a list with a given key.
Returns NULL if not found.
--*/
NetParse_Node *NetParse_FindKey(NetParse_Node *first, char *key)
{
NetParse_Node *cur;
cur=first;
while(cur)
{
if(cur->key)
if(!strcmp(cur->key, key))
return(cur);
cur=cur->next;
}
return(NULL);
}
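NetParse_FindKey only scans one sibling list by key, whereas the original question asked for a search by tag plus an optional attribute. A sketch of how the two checks might be combined into a depth-first search is given below. The names `attr_value` and `sketch_find` are hypothetical, and the struct definitions are repeated only so the example compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct NetParse_Attr_s NetParse_Attr;
typedef struct NetParse_Node_s NetParse_Node;
struct NetParse_Attr_s { NetParse_Attr *next; char *ns, *key, *value; };
struct NetParse_Node_s { NetParse_Node *next; char *ns, *key, *text;
                         NetParse_Attr *attr; NetParse_Node *first; };

/* Return the value of attribute key on node, or NULL if it is absent. */
static const char *attr_value(const NetParse_Node *node, const char *key)
{
    const NetParse_Attr *a;
    for (a = node->attr; a; a = a->next)
        if (a->key && strcmp(a->key, key) == 0)
            return a->value;
    return NULL;
}

/* Hypothetical search: depth-first over first and its following siblings,
   returning the first element whose key equals tag. If attr is non-NULL the
   element must also carry that attribute; if value is non-NULL as well, the
   attribute must equal value. Returns NULL when nothing matches. */
NetParse_Node *sketch_find(NetParse_Node *first, const char *tag,
                           const char *attr, const char *value)
{
    NetParse_Node *cur, *hit;

    assert(tag != NULL);
    for (cur = first; cur; cur = cur->next) {
        if (cur->key && strcmp(cur->key, tag) == 0) {
            const char *v = attr ? attr_value(cur, attr) : NULL;
            if (!attr || (v && (!value || strcmp(v, value) == 0)))
                return cur;
        }
        hit = sketch_find(cur->first, tag, attr, value);
        if (hit) return hit;
    }
    return NULL;
}
```

Passing NULL for both attr and value reduces this to a recursive version of the key-only lookup.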
//abstract interface funcs
/*--
Cat pdlib;Parse
Form
NetParse_Node *NetParse_NewNodeKey(char *ns, char *key);
Description
Creates a new node with a given namespace prefix and key.
ns may be NULL in most cases (the tag does not have a namespace \
prefix).
--*/
NetParse_Node *NetParse_NewNodeKey(char *ns, char *key)
{
    NetParse_Node *tmp;
    tmp=NetParse_NewNode();
    if(ns)tmp->ns=kstrdup(ns);  /* the posted version ignored ns */
    tmp->key=kstrdup(key);
    return(tmp);
}
/*--
Cat pdlib;Parse
Form
NetParse_Node *NetParse_NewNodeText(char *text);
Description
Creates a new text node with the contents given.
--*/
NetParse_Node *NetParse_NewNodeText(char *text)
{
NetParse_Node *tmp;
tmp=NetParse_NewNode();
tmp->text=kstrdup(text);
return(tmp);
}
/*--
Cat pdlib;Parse
Form
char *NetParse_GetNodeNS(NetParse_Node *node);
char *NetParse_GetNodeKey(NetParse_Node *node);
char *NetParse_GetNodeText(NetParse_Node *node);
NetParse_Node *NetParse_GetNodeFirst(NetParse_Node *node);
NetParse_Node *NetParse_GetNodeNext(NetParse_Node *node);
Description
Get a property of a node, each will return NULL in the case that \
the given property is not present.
--*/
char *NetParse_GetNodeNS(NetParse_Node *node)
{
return(node->ns);
}
char *NetParse_GetNodeKey(NetParse_Node *node)
{
return(node->key);
}
char *NetParse_GetNodeText(NetParse_Node *node)
{
return(node->text);
}
NetParse_Node *NetParse_GetNodeFirst(NetParse_Node *node)
{
return(node->first);
}
NetParse_Node *NetParse_GetNodeNext(NetParse_Node *node)
{
return(node->next);
}
/*--
Cat pdlib;Parse
Form
int NetParse_SetNodeNS(NetParse_Node *node, char *value);
int NetParse_SetNodeKey(NetParse_Node *node, char *value);
int NetParse_SetNodeText(NetParse_Node *node, char *value);
int NetParse_SetNodeFirst(NetParse_Node *node, NetParse_Node *node2);
int NetParse_SetNodeNext(NetParse_Node *node, NetParse_Node *node2);
Description
Set a property of a node, the return value will be 0 if no errors \
occur.
--*/
int NetParse_SetNodeNS(NetParse_Node *node, char *value)
{
node->ns=kstrdup(value);
return(0);
}
int NetParse_SetNodeKey(NetParse_Node *node, char *value)
{
node->key=kstrdup(value);
return(0);
}
int NetParse_SetNodeText(NetParse_Node *node, char *value)
{
node->text=kstrdup(value);
return(0);
}
int NetParse_SetNodeFirst(NetParse_Node *node, NetParse_Node *node2)
{
node->first=node2;
return(0);
}
int NetParse_SetNodeNext(NetParse_Node *node, NetParse_Node *node2)
{
node->next=node2;
return(0);
} | | | | re: minimal xml parser?
In article <10nr49ihiaog893@corp.supernews.com>, Mike wrote:
> In article <2u5h4gF26m9qbU1@uni-berlin.de>, William Park wrote:
>> Mike <mikee@mikee.ath.cx> wrote:
>>> XML has been chosen, I need to write the parser. Oh, and I do not have
>>> to validate the XML, just parse it.
>>
>> Expat ( www.libexpat.org). Practically every language has some sort of
>> support for it, even Bash shell.
>
> Thanks for the expat suggestion. I have also read about libxml. I'd like to
> find a few hundred lines of c code to do this.
>
> Mike
Thanks for all the replies. I have chosen and am using mxml.
Mike