Expat problems

Expat keeps telling me that there is "junk after document element". I've tried different encodings, and I'm quite sure that the
buffer is nul-terminated. I really have no idea what the problem might be. Any ideas?

X-POST: comp.lang.c, comp.text.xml (I don't know which group is the right one)

-----Source code-----
#include <stdio.h>
#include <expat.h>

void startElement(void *userData, const char *name, const char **atts)
{
    printf("Got element: %S\nwith userData:\n%s\n", name, (char *)userData);
}

void endElement(void *userData, const char *name)
{
}

int main(int argc, char *argv[])
{
    FILE *fp;
    char *buffer;
    char *prog = argv[0];
    long fsize;
    XML_Parser parser;
    int userData = 0;
    int done;

    if(argc == 1) return 0;

    if ((fp = fopen(*++argv, "r")) == NULL) {
        fprintf(stderr, "%s: Can't open %s", prog, *argv);
        exit(1);
    } else {
        fseek(fp, 0, SEEK_END);
        fsize = ftell(fp);
        rewind(fp);

        buffer = (char *)malloc(fsize+1);

        if (buffer == NULL)
            exit(2);

        fread(buffer, 1, fsize, fp);

        buffer[fsize] = '\0';

        printf("%s\n", buffer);

        fclose(fp);

        parser = XML_ParserCreate((XML_Char *)"ISO-8859-1");
        XML_SetUserData(parser, &userData);
        XML_SetElementHandler(parser, startElement, endElement);
        do {
            done = fsize < sizeof(buffer);
            if (!XML_Parse(parser, buffer, fsize, 0)) {
                fprintf(stderr,
                        "%s at line %d\n",
                        XML_ErrorString(XML_GetErrorCode(parser)),
                        XML_GetCurrentLineNumber(parser));
                return 1;
            }
        } while (!done);

        XML_ParserFree(parser);

    }

    return 0;
}
-------------------

-----XML input-----
<?xml version="1.0" ?>
<a>
</a>

-------------------

/Jakob
Jul 20 '05 #1
Jakob Møbjerg Nielsen wrote:
> Expat keeps telling me that there is "junk after document element". I've tried different encodings, and I'm quite sure that the
> buffer is nul-terminated. I really have no idea what the problem might be. Any ideas?
>
> X-POST: comp.lang.c, comp.text.xml (I don't know which group is the right one)
>
> -----Source code-----
> #include <stdio.h>
> #include <expat.h>

Not a standard header. What is in here?

> void startElement(void *userData, const char *name, const char **atts)
> {
>     printf("Got element: %S\nwith userData:\n%s\n", name, (char *)userData);

My understanding is that the printf() format specifiers are case
sensitive, although I'm sure somebody here will correct me if I'm
wrong.

> }
>
> void endElement(void *userData, const char *name)
> {
> }
>
> int main(int argc, char *argv[])
> {
>     FILE *fp;
>     char *buffer;
>     char *prog = argv[0];
>     long fsize;
>     XML_Parser parser;
>     int userData = 0;
>     int done;
>
>     if(argc == 1) return 0;
>
>     if ((fp = fopen(*++argv, "r")) == NULL) {
>         fprintf(stderr, "%s: Can't open %s", prog, *argv);
>         exit(1);
>     } else {
>         fseek(fp, 0, SEEK_END);
>         fsize = ftell(fp);
>         rewind(fp);

There is no guarantee that the ending position of a file is the
same as the size of the file. Character translations and other
stuff may obscure the size. The only method to know the actual
size of the file is to open the file in binary mode and count
all the characters.

>         buffer = (char *)malloc(fsize+1);

In the times when memory was small and precious, input data
was read in by "chunks" instead of the whole file into memory.
Granted, reading it into memory is the most efficient method, but
there is no guarantee that your platform, or the platform that
this program will run on, will have enough memory for the largest
file. Hard disks are becoming larger these days.

I say read in the data in chunks.

>         if (buffer == NULL)
>             exit(2);

You might want to be nice to the user and print a reason why
the program is aborting.

>         fread(buffer, 1, fsize, fp);

See above about reading in chunks.

>         buffer[fsize] = '\0';
>
>         printf("%s\n", buffer);

You are printing the entire file here. Could take a while.
Is this necessary?

>         fclose(fp);
>
>         parser = XML_ParserCreate((XML_Char *)"ISO-8859-1");
>         XML_SetUserData(parser, &userData);
>         XML_SetElementHandler(parser, startElement, endElement);
>         do {
>             done = fsize < sizeof(buffer);

The expression "sizeof(buffer)" returns the size of the pointer,
not the buffer. By the way, if you look up a few lines, you
will note that the buffer was allocated with a size of
"fsize + 1". So, what is this statement supposed to do?

>             if (!XML_Parse(parser, buffer, fsize, 0)) {
>                 fprintf(stderr,
>                         "%s at line %d\n",
>                         XML_ErrorString(XML_GetErrorCode(parser)),
>                         XML_GetCurrentLineNumber(parser));
>                 return 1;
>             }
>         } while (!done);

See my question about the assignment to "done" above.
Why do you bother processing the data in chunks when
you have read the entire file into memory?

>         XML_ParserFree(parser);
>
>     }
>
>     return 0;
> }

I cannot comment on the correctness of the XML_*()
function calls since I don't have that header file
and you haven't supplied those declarations.
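
A small sketch of the sizeof point above, since it matters for the loop condition. The function names are made up for illustration. In the posted program, fsize for the sample input is a few dozen bytes, which is larger than any pointer, so "done" stays 0 and the loop hands the same complete buffer to XML_Parse() a second time. That would be consistent with the "junk after document element" report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* What sizeof reports for a pointer to a heap block of n bytes:
   the size of the pointer itself, independent of n. */
size_t size_seen_through_pointer(size_t n)
{
    char *buffer;
    size_t reported;

    assert(n > 0);
    buffer = malloc(n);
    reported = sizeof(buffer);      /* same as sizeof(char *) */
    free(buffer);
    return reported;
}

/* What sizeof reports for a true array: its full size in bytes. */
size_t size_of_array(void)
{
    char buffer[1024];

    return sizeof(buffer);          /* 1024 */
}
```

If the data really is in one heap buffer, the length has to be carried in a separate variable (here that would be fsize), because sizeof cannot recover it from the pointer.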
--
Thomas Matthews

C++ newsgroup welcome message:
http://www.slack.net/~shiva/welcome.txt
C++ Faq: http://www.parashift.com/c++-faq-lite
C Faq: http://www.eskimo.com/~scs/c-faq/top.html
alt.comp.lang.learn.c-c++ faq:
http://www.raos.demon.uk/acllc-c++/faq.html
Other sites:
http://www.josuttis.com -- C++ STL Library book

Jul 20 '05 #2
Thomas Matthews wrote:
>> #include <expat.h>
> Not a standard header. What is in here?

Expat - the XML parser.

>> buffer = (char *)malloc(fsize+1);
> I say read in the data in chunks.

Well, this is just for testing with small XML files (probably not above
1M).

>> printf("%s\n", buffer);
> You are printing the entire file here. Could take a while.
> Is this necessary?

Debugging :-)
I didn't want to start gdb just for looking at the contents of buffer.

>> } while (!done);
> See my question about the assignment to "done" above.
> Why do you bother processing the data in chunks when
> you have read the entire file into memory?

Because, later on, the data will be streamed in from a socket.

> I cannot comment on the correctness of the XML_*()
> function calls since I don't have that header file
> and you haven't supplied those declarations.

There are quite a few:
http://guinness.cs.stevens-tech.edu/...reference.html

Anyway, I've tried cleaning up a bit and played around with
feeding the parser in a "stream-like" manner, but I still
get that pesky "junk after document element" message. If I
use UTF-8 I get a "not well-formed (invalid token)".

#include <stdio.h>
#include <expat.h>

void startElement(void *userData, const char *name, const char **atts)
{
    printf("Got start-element: %s\n", name);
}

void endElement(void *userData, const char *name)
{
    printf("Got end-element: %s\n", name);
}

int main(int argc, char *argv[])
{
    FILE *fp;
    char buffer[1];
    char *prog = argv[0];
    long fsize;
    XML_Parser parser;
    int userData = 0;
    int done;

    if(argc == 1) return 0;

    if ((fp = fopen(*++argv, "r")) == NULL) {
        fprintf(stderr, "%s: Can't open %s", prog, *argv);
        exit(1);
    } else {
        parser = XML_ParserCreate((XML_Char *)"ISO-8859-1");
        XML_SetUserData(parser, &userData);
        XML_SetElementHandler(parser, startElement, endElement);
        do {
            if (!feof(fp)) {
                buffer[0] = fgetc(fp);
                if (!XML_Parse(parser, buffer, strlen(buffer), feof(fp))) {
                    fprintf(stderr,
                            "%s at line %d\n",
                            XML_ErrorString(XML_GetErrorCode(parser)),
                            XML_GetCurrentLineNumber(parser));
                    return 1;
                }
            }
        } while (!feof(fp));
        XML_ParserFree(parser);
    }
    return 0;
}
--
Jakob Møbjerg Nielsen | "Nine-tenths of the universe is the
ja***@dataloger.dk | knowledge of the position and direction
http://www.jakobnielsen.dk/ | of everything in the other tenth."
| -- Terry Pratchett, Thief of Time
Jul 20 '05 #3
Examine, for example, the elements.c example file that comes with Expat more carefully,
and copy the parsing loop (the do loop) from there.

Replace only stdin with your FILE*. You might also want to open the file in "rb"
(binary mode) to avoid CRLF translations.

It seems you're trying something funny with strlen() in your code.
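
A loop of that kind has roughly the following shape. This is a sketch, not the elements.c source: feed_fn, feed_stream, feed_stats and count_feed are hypothetical names, and the callback stands in for the XML_Parse() call so that the example compiles without Expat. The points to notice are that the length passed on is the count returned by fread() (not strlen() of a buffer that is not NUL-terminated), and that the final flag is set only for the last chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical callback type standing in for the parser call.
   It returns nonzero on success and 0 on failure. */
typedef int (*feed_fn)(void *ctx, const char *data, size_t len, int is_final);

/* Reads fp in fixed-size chunks and hands each chunk to feed().
   Returns 1 on success, 0 on a read error or if feed() fails. */
int feed_stream(FILE *fp, feed_fn feed, void *ctx)
{
    char buf[4096];
    int done;

    assert(fp != NULL && feed != NULL);
    do {
        size_t len = fread(buf, 1, sizeof buf, fp);  /* bytes actually read */

        if (ferror(fp))
            return 0;
        done = len < sizeof buf;    /* short read without error: end of input */
        if (!feed(ctx, buf, len, done))
            return 0;
    } while (!done);
    return 1;
}

/* Illustrative callback that only records what it was given. */
struct feed_stats { size_t bytes; int calls; int finals; };

int count_feed(void *ctx, const char *data, size_t len, int is_final)
{
    struct feed_stats *s = ctx;

    (void)data;
    s->bytes += len;
    s->calls++;
    if (is_final)
        s->finals++;
    return 1;
}
```

With Expat, the callback would presumably forward to XML_Parse(parser, data, (int)len, is_final) and report failure when that call returns XML_STATUS_ERROR. Note that sizeof buf is meaningful here only because buf is a real array, not a pointer.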

with respect,
Toni Uusitalo
Jul 20 '05 #4
In article <bp**********@sunsite.dk>,
Jakob Møbjerg Nielsen <ja***@dataloger.dk> wrote:

% Expat keeps telling me that there is "junk after document element".

% if ((fp = fopen(*++argv, "r")) == NULL) {
% fprintf(stderr, "%s: Can't open %s", prog, *argv);
% exit(1);
% } else {
% fseek(fp, 0, SEEK_END);
% fsize = ftell(fp);
% rewind(fp);
%
% buffer = (char *)malloc(fsize+1);
%
% if (buffer == NULL)
% exit(2);
%
% fread(buffer, 1, fsize, fp);

If you're not on a Unix system, ftell() might give you a larger value than
fread() returns. You might want to check the return value of fread().
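
One way to act on this, sketched under the assumption that the whole input fits in memory: open the file in binary mode and size the data by what fread() actually returns rather than by ftell(). The helper name read_all is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Reads everything from fp into a freshly allocated buffer.
   On success returns the buffer (caller frees it) and stores the number
   of bytes actually read in *out_len; returns NULL on error. */
char *read_all(FILE *fp, size_t *out_len)
{
    size_t cap = 4096, len = 0;
    char *buf;

    assert(fp != NULL && out_len != NULL);
    buf = malloc(cap);
    if (buf == NULL)
        return NULL;
    for (;;) {
        size_t n = fread(buf + len, 1, cap - len, fp);

        len += n;                   /* trust fread's count, not ftell() */
        if (n == 0)
            break;                  /* end of file or read error */
        if (len == cap) {           /* buffer full: double it */
            char *tmp = realloc(buf, cap * 2);

            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    *out_len = len;
    return buf;
}
```

The length stored in *out_len is then the value to pass to the parser, so no terminating NUL and no extra padding bytes are involved.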

% printf("%s\n", buffer);

You might want to do a hex dump rather than just printing up to the first
NUL. If there are trailing NULs after the last >, Expat will give you
an error message.
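
A minimal sketch of such a check, with hypothetical helper names (hex_dump, trailing_nuls). It formats the bytes into a caller-supplied string so the result can be printed or compared, and counts NUL bytes at the end of the data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Writes up to len bytes of data into out as space-separated two-digit
   hex values. Returns the number of bytes formatted, which is less than
   len if out is too small. */
size_t hex_dump(const unsigned char *data, size_t len, char *out, size_t out_size)
{
    size_t i, used = 0;

    assert(data != NULL && out != NULL && out_size > 0);
    out[0] = '\0';
    for (i = 0; i < len; i++) {
        /* a byte needs at most 3 characters plus the terminating NUL */
        if (out_size - used < 4)
            break;
        used += (size_t)sprintf(out + used, i ? " %02x" : "%02x",
                                (unsigned)data[i]);
    }
    return i;
}

/* Counts the NUL bytes at the end of the data. */
size_t trailing_nuls(const unsigned char *data, size_t len)
{
    size_t n = 0;

    while (n < len && data[len - 1 - n] == '\0')
        n++;
    return n;
}
```

Run over the buffer with the length that is actually handed to the parser, a nonzero result from trailing_nuls() would point at padding bytes being parsed as part of the document.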

--

Patrick TJ McPhee
East York Canada
pt**@interlog.com
Jul 20 '05 #5
