
Expat problems

Expat keeps telling me that there is "junk after document element". I've tried different encodings, and I'm quite sure that the
buffer is nul-terminated. I really have no idea what the problem might be. Any ideas?

X-POST: comp.lang.c, comp.text.xml (I don't know which group is the right one)

-----Source code-----
#include <stdio.h>
#include <expat.h>

void startElement(void *userData, const char *name, const char **atts)
{
printf("Got element: %S\nwith userData:\n%s\n", name, (char *)userData);
}

void endElement(void *userData, const char *name)
{
}

int main(int argc, char *argv[])
{
FILE *fp;
char *buffer;
char *prog = argv[0];
long fsize;
XML_Parser parser;
int userData = 0;
int done;

if(argc == 1) return 0;

if ((fp = fopen(*++argv, "r")) == NULL) {
fprintf(stderr, "%s: Can't open %s", prog, *argv);
exit(1);
} else {
fseek(fp, 0, SEEK_END);
fsize = ftell(fp);
rewind(fp);

buffer = (char *)malloc(fsize+1);

if (buffer == NULL)
exit(2);

fread(buffer, 1, fsize, fp);

buffer[fsize] = '\0';

printf("%s\n", buffer);

fclose(fp);

parser = XML_ParserCreate((XML_Char *)"ISO-8859-1");
XML_SetUserData(parser, &userData);
XML_SetElementHandler(parser, startElement, endElement);
do {
done = fsize < sizeof(buffer);
if (!XML_Parse(parser, buffer, fsize, 0)) {
fprintf(stderr,
"%s at line %d\n",
XML_ErrorString(XML_GetErrorCode(parser)),
XML_GetCurrentLineNumber(parser));
return 1;
}
} while (!done);

XML_ParserFree(parser);

}

return 0;
}
-------------------

-----XML input-----
<?xml version="1.0" ?>
<a>
</a>

-------------------

/Jakob
Nov 13 '05 #1


Jakob Møbjerg Nielsen wrote:
Expat keeps telling me that there is "junk after document element". I've tried different encoding, and I'm quite sure that the
buffer is nul-terminated. I really have no idea to what the problem might be. Any ideas?

X-POST: comp.lang.c, comp.text.xml (I don't know which group is the right one)

-----Source code-----
#include <stdio.h>
#include <expat.h>

Not a standard header. What is in here?


void startElement(void *userData, const char *name, const char **atts)
{
printf("Got element: %S\nwith userData:\n%s\n", name, (char *)userData);

My understanding is that the printf() format specifiers are case
sensitive, so "%S" is not the same as "%s", although I'm sure
somebody here will correct me if I'm wrong.

}

void endElement(void *userData, const char *name)
{
}

int main(int argc, char *argv[])
{
FILE *fp;
char *buffer;
char *prog = argv[0];
long fsize;
XML_Parser parser;
int userData = 0;
int done;

if(argc == 1) return 0;

if ((fp = fopen(*++argv, "r")) == NULL) {
fprintf(stderr, "%s: Can't open %s", prog, *argv);
exit(1);
} else {
fseek(fp, 0, SEEK_END);
fsize = ftell(fp);
rewind(fp);
There is no guarantee that the ending position of a file is the
same as the size of the file. Character translations and other
stuff may obscure the size. The only method to know the actual
size of the file is to open the file in binary mode and count
all the characters.


buffer = (char *)malloc(fsize+1);

In the times when memory was small and precious, input data
was read in by "chunks" instead of the whole file into memory.
Granted, reading it all into memory is the most efficient method,
but there is no guarantee that your platform, or the platform that
this program will run on, will have enough memory for the largest
sized file. Hard disks are becoming larger these days.

I say read in the data in chunks.


if (buffer == NULL)
exit(2);

You might want to be nice to the user and print a reason why
the program is aborting.

fread(buffer, 1, fsize, fp);

See above about reading in chunks.

buffer[fsize] = '\0';

printf("%s\n", buffer);

You are printing the entire file here. Could take a while.
Is this necessary?


fclose(fp);

parser = XML_ParserCreate((XML_Char *)"ISO-8859-1");
XML_SetUserData(parser, &userData);
XML_SetElementHandler(parser, startElement, endElement);
do {
done = fsize < sizeof(buffer);

The expression "sizeof(buffer)" returns the size of the pointer,
not the buffer. By the way, if you look up a few lines, you
will note that the buffer was allocated with a size of
"fsize + 1". So, what is this statement supposed to do?

if (!XML_Parse(parser, buffer, fsize, 0)) {
fprintf(stderr,
"%s at line %d\n",
XML_ErrorString(XML_GetErrorCode(parser)),
XML_GetCurrentLineNumber(parser));
return 1;
}
} while (!done);

See my question about the assignment to "done" above.
Why do you bother processing the data in chunks when
you have read the entire file into memory?

XML_ParserFree(parser);

}

return 0;
}


I cannot comment on the correctness of the XML_*()
function calls since I don't have that header file
and you haven't supplied those declarations.
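
A minimal sketch of one way to act on the points above, assuming nothing beyond standard C: open the file in binary mode, read until fread() reports no more data, and keep the byte count in a variable rather than relying on ftell() or sizeof. The name read_whole_file and the 4096-byte starting capacity are made up for illustration; they are not from the original program.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read all of path into a malloc'd, NUL-terminated
   buffer. The byte count comes from fread, not from ftell or sizeof.
   Returns NULL on failure; the caller frees the result. */
char *read_whole_file(const char *path, size_t *out_len)
{
    FILE *fp = fopen(path, "rb");   /* binary mode: no CRLF translation */
    char *buf = NULL;
    size_t len = 0, cap = 0;

    if (fp == NULL)
        return NULL;

    for (;;) {
        if (cap - len < 2) {
            /* Grow the buffer, always leaving room for the final NUL. */
            size_t new_cap = cap ? cap * 2 : 4096;
            char *tmp = realloc(buf, new_cap);
            if (tmp == NULL) {
                free(buf);
                fclose(fp);
                return NULL;
            }
            buf = tmp;
            cap = new_cap;
        }
        size_t n = fread(buf + len, 1, cap - len - 1, fp);
        len += n;
        if (n == 0)
            break;                  /* end of file or read error */
    }

    if (ferror(fp)) {
        free(buf);
        fclose(fp);
        return NULL;
    }
    fclose(fp);
    buf[len] = '\0';
    *out_len = len;
    return buf;
}
```

The length stored through out_len is what would be passed as the length argument to XML_Parse(), with a nonzero final flag if the whole document is handed over in one call.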
--
Thomas Matthews

Nov 13 '05 #2

Thomas Matthews wrote:
#include <expat.h>

Not a standard header. What is in here?

Expat - the XML parser.

buffer = (char *)malloc(fsize+1);

I say read in the data in chunks.

Well, this is just for testing with small XML files (probably not above
1M).

printf("%s\n", buffer);

You are printing the entire file here. Could take a while.
Is this necessary?

Debugging :-)
I didn't want to start gdb just for looking at the contents of buffer.

} while (!done);

See my question about the assignment to "done" above.
Why do you bother processing the data in chunks when
you have read the entire file into memory?

Because, later on, the data will be streamed in from a socket.

I cannot comment on the correctness of the XML_*()
function calls since I don't have that header file
and you haven't supplied those declarations.

There are quite a few:
http://guinness.cs.stevens-tech.edu/...reference.html

Anyway, I've tried cleaning up a bit and played around with
feeding the parser in a "stream-like" manner, but I still
get that pesky "junk after document element" message. If I
use UTF-8 I get a "not well-formed (invalid token)".

#include <stdio.h>
#include <expat.h>

void startElement(void *userData, const char *name, const char **atts)
{
printf("Got start-element: %s\n", name);
}

void endElement(void *userData, const char *name)
{
printf("Got end-element: %s\n", name);
}

int main(int argc, char *argv[])
{
FILE *fp;
char buffer[1];
char *prog = argv[0];
long fsize;
XML_Parser parser;
int userData = 0;
int done;

if(argc == 1) return 0;

if ((fp = fopen(*++argv, "r")) == NULL) {
fprintf(stderr, "%s: Can't open %s", prog, *argv);
exit(1);
} else {
parser = XML_ParserCreate((XML_Char *)"ISO-8859-1");
XML_SetUserData(parser, &userData);
XML_SetElementHandler(parser, startElement, endElement);
do {
if (!feof(fp)) {
buffer[0] = fgetc(fp);
if (!XML_Parse(parser, buffer, strlen(buffer), feof(fp))) {
fprintf(stderr,
"%s at line %d\n",
XML_ErrorString(XML_GetErrorCode(parser)),
XML_GetCurrentLineNumber(parser));
return 1;
}
}
} while (!feof(fp));
XML_ParserFree(parser);
}
return 0;
}
--
Jakob Møbjerg Nielsen | "Nine-tenths of the universe is the
ja***@dataloger.dk | knowledge of the position and direction
http://www.jakobnielsen.dk/ | of everything in the other tenth."
| -- Terry Pratchett, Thief of Time
Nov 13 '05 #3

Look more carefully at Expat's example file elements.c, for example,
and copy the parsing loop (the do loop) from there.

Replace only stdin with your FILE*. You might also want to open the file in "rb"
(binary mode) to avoid CRLF translations.

It seems you're trying something funny with strlen() in your code.
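
For reference, here is a sketch of that kind of loop, written against a callback instead of Expat itself so it compiles with only the standard library. The chunk_sink type, feed_in_chunks, and tally_sink are hypothetical names; with Expat, the sink would presumably be a thin wrapper that calls XML_Parse(parser, data, (int)len, is_final). The length comes from fread()'s return value, not from strlen() (which needs a NUL terminator that a raw read buffer does not have), and the final flag is set only on the last chunk.

```c
#include <stdio.h>

/* Hypothetical callback type standing in for the parser. With Expat it
   would wrap XML_Parse(parser, data, (int)len, is_final) and return
   nonzero on success. */
typedef int (*chunk_sink)(void *ctx, const char *data, size_t len, int is_final);

/* Read fp in fixed-size chunks and hand each chunk to sink.
   Returns 0 on success, -1 on a read error or if sink reports failure. */
int feed_in_chunks(FILE *fp, chunk_sink sink, void *ctx)
{
    char buf[4096];
    int done;

    do {
        /* fread returns the number of bytes actually read; that count,
           not strlen(buf), is the chunk length. */
        size_t n = fread(buf, 1, sizeof buf, fp);
        if (ferror(fp))
            return -1;
        /* buf is an array here, so sizeof buf is its real size.
           A short read without an error means end of input. */
        done = n < sizeof buf;
        if (!sink(ctx, buf, n, done))
            return -1;
    } while (!done);

    return 0;
}

/* Example sink for experimenting without a parser: tallies what it receives. */
struct tally { size_t bytes; int calls; int finals; };

int tally_sink(void *ctx, const char *data, size_t len, int is_final)
{
    struct tally *t = ctx;
    (void)data;                 /* contents are ignored in this sketch */
    t->bytes += len;
    t->calls += 1;
    t->finals += is_final ? 1 : 0;
    return 1;                   /* nonzero means "keep going" */
}
```

Each chunk is handed over exactly once, and exactly one call carries the final flag, which is the property the two programs earlier in the thread do not have.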

with respect,
Toni Uusitalo
Nov 13 '05 #4

In article <bp**********@sunsite.dk>,
Jakob Møbjerg Nielsen <ja***@dataloger.dk> wrote:

% Expat keeps telling me that there is "junk after document element".

% if ((fp = fopen(*++argv, "r")) == NULL) {
% fprintf(stderr, "%s: Can't open %s", prog, *argv);
% exit(1);
% } else {
% fseek(fp, 0, SEEK_END);
% fsize = ftell(fp);
% rewind(fp);
%
% buffer = (char *)malloc(fsize+1);
%
% if (buffer == NULL)
% exit(2);
%
% fread(buffer, 1, fsize, fp);

If you're not on a Unix system, ftell() might give you a larger value than
fread() returns. You might want to check the return value of fread().

% printf("%s\n", buffer);

You might want to do a hex dump rather than just printing up to the first
NUL. If there are trailing NULs after the last >, expat will give you
an error message.
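
A small sketch of both checks, assuming the buffer and the count returned by fread() are at hand. count_trailing_nuls and hex_dump are illustrative names, not part of Expat or the original program.

```c
#include <stdio.h>

/* Count NUL bytes at the end of the first len bytes of buf. */
size_t count_trailing_nuls(const char *buf, size_t len)
{
    size_t n = 0;
    while (n < len && buf[len - 1 - n] == '\0')
        n++;
    return n;
}

/* Print len bytes of buf as hex, 16 per line, with the offset first. */
void hex_dump(FILE *out, const char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (i % 16 == 0)
            fprintf(out, "%s%08lx:", i ? "\n" : "", (unsigned long)i);
        fprintf(out, " %02x", (unsigned char)buf[i]);
    }
    if (len > 0)
        fputc('\n', out);
}
```

Calling hex_dump(stderr, buffer, n) with n taken from fread() shows exactly which bytes would reach the parser; a nonzero count_trailing_nuls(buffer, n) points at padding after the closing tag.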

--

Patrick TJ McPhee
East York Canada
pt**@interlog.com
Nov 13 '05 #5
