Expat problems

Expat keeps telling me that there is "junk after document element". I've tried different encodings, and I'm quite sure that the
buffer is nul-terminated. I really have no idea what the problem might be. Any ideas?

X-POST: comp.lang.c, comp.text.xml (I don't know which group is the right one)

-----Source code-----
#include <stdio.h>
#include <expat.h>

void startElement(void *userData, const char *name, const char **atts)
{
    printf("Got element: %S\nwith userData:\n%s\n", name, (char *)userData);
}

void endElement(void *userData, const char *name)
{
}

int main(int argc, char *argv[])
{
    FILE *fp;
    char *buffer;
    char *prog = argv[0];
    long fsize;
    XML_Parser parser;
    int userData = 0;
    int done;

    if(argc == 1) return 0;

    if ((fp = fopen(*++argv, "r")) == NULL) {
        fprintf(stderr, "%s: Can't open %s", prog, *argv);
        exit(1);
    } else {
        fseek(fp, 0, SEEK_END);
        fsize = ftell(fp);
        rewind(fp);

        buffer = (char *)malloc(fsize+1);

        if (buffer == NULL)
            exit(2);

        fread(buffer, 1, fsize, fp);

        buffer[fsize] = '\0';

        printf("%s\n", buffer);

        fclose(fp);

        parser = XML_ParserCreate((XML_Char *)"ISO-8859-1");
        XML_SetUserData(parser, &userData);
        XML_SetElementHandler(parser, startElement, endElement);
        do {
            done = fsize < sizeof(buffer);
            if (!XML_Parse(parser, buffer, fsize, 0)) {
                fprintf(stderr,
                        "%s at line %d\n",
                        XML_ErrorString(XML_GetErrorCode(parser)),
                        XML_GetCurrentLineNumber(parser));
                return 1;
            }
        } while (!done);

        XML_ParserFree(parser);

    }

    return 0;
}
-------------------

-----XML input-----
<?xml version="1.0" ?>
<a>
</a>

-------------------

/Jakob
Jul 20 '05 #1
Jakob Møbjerg Nielsen wrote:
Expat keeps telling me that there is "junk after document element". I've tried different encodings, and I'm quite sure that the
buffer is nul-terminated. I really have no idea what the problem might be. Any ideas?

X-POST: comp.lang.c, comp.text.xml (I don't know which group is the right one)

-----Source code-----
#include <stdio.h>
#include <expat.h>
Not a standard header. What is in here?


void startElement(void *userData, const char *name, const char **atts)
{
printf("Got element: %S\nwith userData:\n%s\n", name, (char *)userData);
My understanding is that the printf() format specifiers are case
sensitive, although I'm sure somebody here will correct me if I'm
wrong.

}

void endElement(void *userData, const char *name)
{
}

int main(int argc, char *argv[])
{
FILE *fp;
char *buffer;
char *prog = argv[0];
long fsize;
XML_Parser parser;
int userData = 0;
int done;

if(argc == 1) return 0;

if ((fp = fopen(*++argv, "r")) == NULL) {
fprintf(stderr, "%s: Can't open %s", prog, *argv);
exit(1);
} else {
fseek(fp, 0, SEEK_END);
fsize = ftell(fp);
rewind(fp);
There is no guarantee that the ending position of a file is the
same as the size of the file. Character translations and other
stuff may obscure the size. The only method to know the actual
size of the file is to open the file in binary mode and count
all the characters.
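Thomas's suggestion can be sketched in C: rather than trusting ftell(), count the bytes by actually reading the stream. This is a minimal sketch assuming the file was opened with fopen(path, "rb"); count_stream_bytes() is a hypothetical helper, not part of Expat or the C library.

```c
#include <stdio.h>

/* Hypothetical helper: count the bytes in a stream by reading it.
   Assumes fp was opened in binary mode ("rb"), so no newline
   translation takes place. Rewinds the stream afterwards so it can
   be read again. Returns -1 on a read error. */
long count_stream_bytes(FILE *fp)
{
    char chunk[4096];
    long total = 0;
    size_t n;

    while ((n = fread(chunk, 1, sizeof chunk, fp)) > 0)
        total += (long)n;

    if (ferror(fp))
        return -1;

    rewind(fp);   /* also clears the end-of-file indicator */
    return total;
}
```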


buffer = (char *)malloc(fsize+1);
In the times when memory was small and precious, input data
was read in "chunks" instead of reading the whole file into memory.
Granted, reading it into memory is the most efficient method, but
there is no guarantee that your platform, or the platform that
this program will run on, will have enough memory for the largest
file. Hard disks are becoming larger these days.

I say read in the data in chunks.
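Reading in chunks suits Expat, because XML_Parse() takes a buffer, a length and an isFinal flag, so a document can be handed over piece by piece. The sketch below does not call Expat itself: feed_in_chunks(), chunk_fn and the example tally_chunk() callback are hypothetical names, and in a real program the callback would call XML_Parse(parser, data, (int)len, is_final).

```c
#include <stdio.h>

/* Hypothetical callback type: receives each chunk plus a flag that is
   nonzero on the last call, mirroring XML_Parse()'s
   (buffer, length, isFinal) arguments. Return 0 to abort. */
typedef int (*chunk_fn)(void *ctx, const char *data, size_t len, int is_final);

/* Read fp in fixed-size chunks and pass each one to fn. Memory use
   stays constant however large the input is. Returns 1 on success. */
int feed_in_chunks(FILE *fp, chunk_fn fn, void *ctx)
{
    char chunk[8192];
    int done;

    do {
        size_t len = fread(chunk, 1, sizeof chunk, fp);
        if (ferror(fp))
            return 0;
        done = len < sizeof chunk;   /* short read: end of file */
        if (!fn(ctx, chunk, len, done))
            return 0;
    } while (!done);

    return 1;
}

/* Example callback: totals the bytes and counts final calls. */
struct byte_tally { size_t total; int final_calls; };

int tally_chunk(void *ctx, const char *data, size_t len, int is_final)
{
    struct byte_tally *t = ctx;
    (void)data;
    t->total += len;
    t->final_calls += is_final != 0;
    return 1;
}
```

If the file size is an exact multiple of the chunk size, the last call simply carries zero bytes with the final flag set, which XML_Parse() accepts.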


if (buffer == NULL)
exit(2);
You might want to be nice to the user and print a reason why
the program is aborting.
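One way to follow that advice is a small allocation wrapper that reports the failure before exiting. xmalloc() here is a hypothetical helper written for illustration; it keeps the original program's exit status of 2.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper: allocate, or exit with a message saying why. */
void *xmalloc(size_t size, const char *prog)
{
    void *p = malloc(size);
    if (p == NULL) {
        fprintf(stderr, "%s: out of memory (wanted %lu bytes)\n",
                prog, (unsigned long)size);
        exit(2);
    }
    return p;
}
```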

fread(buffer, 1, fsize, fp);
See above about reading in chunks.

buffer[fsize] = '\0';

printf("%s\n", buffer);
You are printing the entire file here. Could take a while.
Is this necessary?


fclose(fp);

parser = XML_ParserCreate((XML_Char *)"ISO-8859-1");
XML_SetUserData(parser, &userData);
XML_SetElementHandler(parser, startElement, endElement);
do {
done = fsize < sizeof(buffer);
The expression "sizeof(buffer)" returns the size of the pointer,
not the buffer. By the way, if you look up a few lines, you
will note that the buffer was allocated with a size of
"fsize + 1". So, what is this statement supposed to do?

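To make the sizeof point concrete: applied to a pointer, sizeof yields the pointer's own size (commonly 4 or 8 bytes), so done = fsize < sizeof(buffer) is only true for files shorter than that, and for any real document the loop never ends. A read loop that uses a fixed-size char array, as Expat's bundled elements.c sample appears to, can legitimately test len < sizeof(buf). The function names below are hypothetical.

```c
#include <stddef.h>

/* sizeof on a pointer gives the size of the pointer object itself,
   not the size of the block it points to. */
size_t size_via_pointer(const char *p)
{
    return sizeof p;   /* same as sizeof(const char *) */
}

/* sizeof on a real array gives the array's size in bytes, which is
   why a loop over a fixed-size char array can test len < sizeof buf. */
enum { CHUNK_SIZE = 64 };

size_t size_of_chunk_array(void)
{
    char buf[CHUNK_SIZE];
    return sizeof buf;
}

/* The original test: true only when the file is shorter than a pointer. */
int original_done_test(long fsize)
{
    char *buffer = NULL;
    return fsize < (long)sizeof buffer;
}
```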
if (!XML_Parse(parser, buffer, fsize, 0)) {
fprintf(stderr,
"%s at line %d\n",
XML_ErrorString(XML_GetErrorCode(parser)),
XML_GetCurrentLineNumber(parser));
return 1;
}
} while (!done);
See my question about the assignment to "done" above.
Why do you bother processing the data in chunks when
you have read the entire file into memory?

XML_ParserFree(parser);

}

return 0;
}


I cannot comment on the correctness of the XML_*()
function calls since I don't have that header file
and you haven't supplied those declarations.
--
Thomas Matthews

C++ newsgroup welcome message:
http://www.slack.net/~shiva/welcome.txt
C++ Faq: http://www.parashift.com/c++-faq-lite
C Faq: http://www.eskimo.com/~scs/c-faq/top.html
alt.comp.lang.learn.c-c++ faq:
http://www.raos.demon.uk/acllc-c++/faq.html
Other sites:
http://www.josuttis.com -- C++ STL Library book

Jul 20 '05 #2
Thomas Matthews wrote:
#include <expat.h>
Not a standard header. What is in here?


Expat - the XML parser.
buffer = (char *)malloc(fsize+1);

I say read in the data in chunks.


Well, this is just for testing with small XML files (probably not above
1M).
printf("%s\n", buffer);

You are printing the entire file here. Could take a while.
Is this necessary?


Debugging :-)
I didn't want to start gdb just for looking at the contents of buffer.
} while (!done);

See my question about the assignment to "done" above.
Why do you bother processing the data in chunks when
you have read the entire file into memory?


Because, later on, the data will be streamed in from a socket.
I cannot comment on the correctness of the XML_*()
function calls since I don't have that header file
and you haven't supplied those declarations.


There are quite a few:
http://guinness.cs.stevens-tech.edu/...reference.html

Anyway, I've tried cleaning up a bit and played around with
feeding the parser in a "stream-like" manner, but I still
get that pesky "junk after document element" message. If I
use UTF-8 I get a "not well-formed (invalid token)".

#include <stdio.h>
#include <expat.h>

void startElement(void *userData, const char *name, const char **atts)
{
    printf("Got start-element: %s\n", name);
}

void endElement(void *userData, const char *name)
{
    printf("Got end-element: %s\n", name);
}

int main(int argc, char *argv[])
{
    FILE *fp;
    char buffer[1];
    char *prog = argv[0];
    long fsize;
    XML_Parser parser;
    int userData = 0;
    int done;

    if(argc == 1) return 0;

    if ((fp = fopen(*++argv, "r")) == NULL) {
        fprintf(stderr, "%s: Can't open %s", prog, *argv);
        exit(1);
    } else {
        parser = XML_ParserCreate((XML_Char *)"ISO-8859-1");
        XML_SetUserData(parser, &userData);
        XML_SetElementHandler(parser, startElement, endElement);
        do {
            if (!feof(fp)) {
                buffer[0] = fgetc(fp);
                if (!XML_Parse(parser, buffer, strlen(buffer), feof(fp))) {
                    fprintf(stderr,
                            "%s at line %d\n",
                            XML_ErrorString(XML_GetErrorCode(parser)),
                            XML_GetCurrentLineNumber(parser));
                    return 1;
                }
            }
        } while (!feof(fp));
        XML_ParserFree(parser);
    }
    return 0;
}
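Two details in this loop may explain both error messages, although the thread does not confirm it. buffer holds one char and no terminating '\0', so strlen(buffer) reads past the end of the array, which is undefined behaviour. Also, once fgetc() returns EOF the loop still stores it in buffer[0] (typically as the byte 0xFF) and passes it to the parser with isFinal set. In ISO-8859-1 that byte is the letter ÿ, which is not allowed after the root element; in UTF-8 it is never valid, which would fit "not well-formed (invalid token)". A minimal byte-at-a-time sketch that avoids both problems follows; byte_sink, feed_bytewise() and collect_bytes() are hypothetical names, and with Expat the sink would call XML_Parse(parser, data, (int)len, is_final).

```c
#include <stdio.h>

/* Hypothetical consumer type; return 0 to abort. */
typedef int (*byte_sink)(void *ctx, const char *data, size_t len, int is_final);

/* Feed a stream to sink one byte at a time. fgetc() returns an int so
   EOF can be told apart from every real byte; the value is converted
   to char only after the EOF check. The length (1) is passed
   explicitly, so no '\0' terminator is needed. Returns 1 on success. */
int feed_bytewise(FILE *fp, byte_sink sink, void *ctx)
{
    int c;

    while ((c = fgetc(fp)) != EOF) {
        char byte = (char)c;
        if (!sink(ctx, &byte, 1, 0))
            return 0;
    }
    if (ferror(fp))
        return 0;
    /* Signal the end with an empty, final chunk. */
    return sink(ctx, "", 0, 1);
}

/* Example sink: copies the bytes into a small buffer and notes the end. */
struct collected { char text[256]; size_t len; int saw_final; };

int collect_bytes(void *ctx, const char *data, size_t len, int is_final)
{
    struct collected *acc = ctx;
    if (acc->len + len >= sizeof acc->text)
        return 0;   /* too much data for this demo buffer */
    for (size_t i = 0; i < len; i++)
        acc->text[acc->len++] = data[i];
    acc->text[acc->len] = '\0';
    if (is_final)
        acc->saw_final = 1;
    return 1;
}
```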
--
Jakob Møbjerg Nielsen | "Nine-tenths of the universe is the
ja***@dataloger.dk | knowledge of the position and direction
http://www.jakobnielsen.dk/ | of everything in the other tenth."
| -- Terry Pratchett, Thief of Time
Jul 20 '05 #3
Examine Expat's elements.c example file more carefully, and copy
the parsing loop (the do loop) from there.

Replace only stdin with your FILE *. You might also want to open the file in "rb"
(binary mode) to avoid CRLF translations.

It seems you're trying something funny with strlen() in your code.

with respect,
Toni Uusitalo
Jul 20 '05 #4
In article <bp**********@sunsite.dk>,
Jakob Møbjerg Nielsen <ja***@dataloger.dk> wrote:

% Expat keeps telling me that there is "junk after document element".

% if ((fp = fopen(*++argv, "r")) == NULL) {
% fprintf(stderr, "%s: Can't open %s", prog, *argv);
% exit(1);
% } else {
% fseek(fp, 0, SEEK_END);
% fsize = ftell(fp);
% rewind(fp);
%
% buffer = (char *)malloc(fsize+1);
%
% if (buffer == NULL)
% exit(2);
%
% fread(buffer, 1, fsize, fp);

If you're not on a Unix system, ftell() might give you a larger value than
fread() returns. You might want to check the return value of fread().
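Patrick's advice amounts to sizing the data by what fread() actually delivers. The sketch below goes one step further and never calls ftell() at all: a hypothetical read_all() helper grows a buffer until fread() comes up short and reports the real length, which could then be passed to XML_Parse() with isFinal = 1.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read all of fp into a malloc'd, '\0'-terminated
   buffer, trusting only the byte counts fread() reports. *out_len gets
   the number of bytes actually read. Returns NULL on error. */
char *read_all(FILE *fp, size_t *out_len)
{
    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);

    if (buf == NULL)
        return NULL;
    for (;;) {
        size_t room = cap - len - 1;   /* leave space for '\0' */
        size_t n = fread(buf + len, 1, room, fp);
        len += n;
        if (n < room)                  /* end of file or error */
            break;
        /* Buffer filled up: double it and keep reading. */
        char *bigger = realloc(buf, cap * 2);
        if (bigger == NULL) {
            free(buf);
            return NULL;
        }
        buf = bigger;
        cap *= 2;
    }
    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    *out_len = len;
    return buf;
}
```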

% printf("%s\n", buffer);

You might want to do a hex dump rather than just printing up to the first
NUL. If there are trailing NULs after the last >, Expat will give you
an error message.
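A hex dump makes such bytes visible. hex_dump() is a hypothetical helper that writes 16 bytes per line; calling it as hex_dump(stdout, (const unsigned char *)buffer, fsize) in place of the printf() call would show any NULs or stray bytes after the closing tag.

```c
#include <stdio.h>

/* Write a simple hex dump of len bytes to out, 16 bytes per line.
   Unlike printf("%s"), this shows every byte, including NULs. */
void hex_dump(FILE *out, const unsigned char *data, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        fprintf(out, "%02x", data[i]);
        fputc((i % 16 == 15 || i + 1 == len) ? '\n' : ' ', out);
    }
}
```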

--

Patrick TJ McPhee
East York Canada
pt**@interlog.com
Jul 20 '05 #5
