Expat keeps telling me that there is "junk after document element". I've tried different encodings, and I'm quite sure that the
buffer is nul-terminated. I really have no idea what the problem might be. Any ideas?
X-POST: comp.lang.c, comp.text.xml (I don't know which group is the right one)
-----Source code-----
#include <stdio.h>
#include <expat.h>
void startElement(void *userData, const char *name, const char **atts)
{
    printf("Got element: %S\nwith userData:\n%s\n", name, (char *)userData);
}
void endElement(void *userData, const char *name)
{
}
int main(int argc, char *argv[])
{
    FILE *fp;
    char *buffer;
    char *prog = argv[0];
    long fsize;
    XML_Parser parser;
    int userData = 0;
    int done;
    if(argc == 1) return 0;
    if ((fp = fopen(*++argv, "r")) == NULL) {
        fprintf(stderr, "%s: Can't open %s", prog, *argv);
        exit(1);
    } else {
        fseek(fp, 0, SEEK_END);
        fsize = ftell(fp);
        rewind(fp);
        buffer = (char *)malloc(fsize + 1);
        if (buffer == NULL)
            exit(2);
        fread(buffer, 1, fsize, fp);
        buffer[fsize] = '\0';
        printf("%s\n", buffer);
        fclose(fp);
        parser = XML_ParserCreate((XML_Char *)"ISO-8859-1");
        XML_SetUserData(parser, &userData);
        XML_SetElementHandler(parser, startElement, endElement);
        do {
            done = fsize < sizeof(buffer);
            if (!XML_Parse(parser, buffer, fsize, 0)) {
                fprintf(stderr,
                        "%s at line %d\n",
                        XML_ErrorString(XML_GetErrorCode(parser)),
                        XML_GetCurrentLineNumber(parser));
                return 1;
            }
        } while (!done);
        XML_ParserFree(parser);
    }
    return 0;
}
-------------------
-----XML input-----
<?xml version="1.0" ?>
<a>
</a>
-------------------
/Jakob
Jakob Møbjerg Nielsen wrote:
> Expat keeps telling me that there is "junk after document element". I've tried different encodings, and I'm quite sure that the buffer is nul-terminated. I really have no idea what the problem might be. Any ideas?
> X-POST: comp.lang.c, comp.text.xml (I don't know which group is the right one)
> -----Source code-----
> #include <stdio.h>
> #include <expat.h>
Not a standard header. What is in here?
> void startElement(void *userData, const char *name, const char **atts)
> {
> printf("Got element: %S\nwith userData:\n%s\n", name, (char *)userData);
My understanding is that the printf() format specifiers are case
sensitive, although I'm sure somebody here will correct me if I'm
wrong.
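To illustrate the point about case: for a `char *` argument, ISO C defines the lowercase `%s` conversion. Uppercase `%S` is not part of ISO C (on POSIX systems with the XSI extension it means `%ls`, a `wchar_t *`), so passing `name` to it is a mismatch. Below is a minimal sketch of the call as it was presumably intended. The helper name `format_element` is made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the original post): formats an element
 * name with lowercase %s, the conversion that matches a char * argument.
 * Returns what snprintf returns: the length the full text needs. */
int format_element(char *out, size_t outsize, const char *name)
{
    assert(out != NULL && name != NULL);
    /* snprintf writes at most outsize bytes, terminator included */
    return snprintf(out, outsize, "Got element: %s\n", name);
}
```

Most compilers can flag this kind of mismatch when format warnings are enabled.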
> }
> void endElement(void *userData, const char *name)
> {
> }
> int main(int argc, char *argv[])
> {
> FILE *fp;
> char *buffer;
> char *prog = argv[0];
> long fsize;
> XML_Parser parser;
> int userData = 0;
> int done;
> if(argc == 1) return 0;
> if ((fp = fopen(*++argv, "r")) == NULL) {
> fprintf(stderr, "%s: Can't open %s", prog, *argv);
> exit(1);
> } else {
> fseek(fp, 0, SEEK_END);
> fsize = ftell(fp);
> rewind(fp);
There is no guarantee that the ending position of a file opened in
text mode is the same as the number of characters you can read from it.
Character translations (such as CR/LF to newline) and other
stuff may obscure the size. The only method to know the actual
size of the file is to open the file in binary mode and count
all the characters.
> buffer = (char *)malloc(fsize + 1);
In the times when memory was small and precious, input data
was read in "chunks" instead of reading the whole file into memory.
Granted, reading it all into memory is the most efficient method, but
there is no guarantee that your platform, or the platform that
this program will run on, will have enough memory for the largest
file. Hard disks are becoming larger these days.
I say read in the data in chunks.
> if (buffer == NULL)
> exit(2);
You might want to be nice to the user and print a reason why
the program is aborting.
> fread(buffer, 1, fsize, fp);
See above about reading in chunks.
> buffer[fsize] = '\0';
> printf("%s\n", buffer);
You are printing the entire file here. Could take a while.
Is this necessary?
> fclose(fp);
> parser = XML_ParserCreate((XML_Char *)"ISO-8859-1");
> XML_SetUserData(parser, &userData);
> XML_SetElementHandler(parser, startElement, endElement);
> do {
> done = fsize < sizeof(buffer);
The expression "sizeof(buffer)" returns the size of the pointer,
not the buffer. By the way, if you look up a few lines, you
will note that the buffer was allocated with a size of
"fsize + 1". So, what is this statement supposed to do?
> if (!XML_Parse(parser, buffer, fsize, 0)) {
> fprintf(stderr,
> "%s at line %d\n",
> XML_ErrorString(XML_GetErrorCode(parser)),
> XML_GetCurrentLineNumber(parser));
> return 1;
> }
> } while (!done);
See my question about the assignment to "done" above.
Why do you bother processing the data in chunks when
you have read the entire file into memory?
> XML_ParserFree(parser);
> }
> return 0;
> }
I cannot comment on the correctness of the XML_*()
function calls since I don't have that header file
and you haven't supplied those declarations.
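The remark about sizeof(buffer) can be made concrete. Applied to a pointer, sizeof yields the size of the pointer itself (typically 4 or 8 bytes). Only when applied to an array does it yield the size of the storage. In the posted program the sample file is longer than a pointer, so "done" presumably stays 0 and the loop hands the same complete document to the parser a second time, which would plausibly explain "junk after document element". The sketch below uses made-up helper names (`alloc_text_buffer`, `size_of_pointer`, `size_of_array`) and assumes the allocated size has to be tracked separately.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocates room for fsize bytes plus a terminating
 * NUL and reports the allocated size through *capacity, because that size
 * cannot be recovered from the pointer afterwards. */
char *alloc_text_buffer(size_t fsize, size_t *capacity)
{
    char *buffer;

    assert(capacity != NULL);
    buffer = malloc(fsize + 1);
    if (buffer != NULL) {
        buffer[fsize] = '\0';
        *capacity = fsize + 1;
    } else {
        *capacity = 0;
    }
    return buffer;
}

/* sizeof applied to a pointer: the size of the pointer, not the buffer. */
size_t size_of_pointer(void)
{
    char *buffer = NULL;
    return sizeof(buffer);
}

/* sizeof applied to an array: the size of the whole array. */
size_t size_of_array(void)
{
    char chunk[4096];
    return sizeof(chunk);
}
```

The caller keeps the value stored in `*capacity` (or the count returned by fread()) and uses that, rather than sizeof, wherever a length is needed.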
--
Thomas Matthews
C++ newsgroup welcome message: http://www.slack.net/~shiva/welcome.txt
C++ Faq: http://www.parashift.com/c++-faq-lite
C Faq: http://www.eskimo.com/~scs/c-faq/top.html
alt.comp.lang.learn.c-c++ faq: http://www.raos.demon.uk/acllc-c++/faq.html
Other sites: http://www.josuttis.com -- C++ STL Library book
Thomas Matthews wrote:
>> #include <expat.h>
> Not a standard header. What is in here?
Expat - the XML parser.
>> buffer = (char *)malloc(fsize + 1);
> I say read in the data in chunks.
Well, this is just for testing with small XML files (probably not above
1M).
>> printf("%s\n", buffer);
> You are printing the entire file here. Could take a while. Is this necessary?
Debugging :-)
I didn't want to start gdb just for looking at the contents of buffer.
>> } while (!done);
> See my question about the assignment to "done" above. Why do you bother processing the data in chunks when you have read the entire file into memory?
Because, later on, the data will be streamed in from a socket.
> I cannot comment on the correctness of the XML_*() function calls since I don't have that header file and you haven't supplied those declarations.
There are quite a few: http://guinness.cs.stevens-tech.edu/...reference.html
Anyway, I've tried cleaning up a bit and played around with
feeding the parser in a "stream-like" manner, but I still
get that pesky "junk after document element" message. If I
use UTF-8 I get a "not well-formed (invalid token)".
#include <stdio.h>
#include <expat.h>
void startElement(void *userData, const char *name, const char **atts)
{
    printf("Got start-element: %s\n", name);
}
void endElement(void *userData, const char *name)
{
    printf("Got end-element: %s\n", name);
}
int main(int argc, char *argv[])
{
    FILE *fp;
    char buffer[1];
    char *prog = argv[0];
    long fsize;
    XML_Parser parser;
    int userData = 0;
    int done;
    if(argc == 1) return 0;
    if ((fp = fopen(*++argv, "r")) == NULL) {
        fprintf(stderr, "%s: Can't open %s", prog, *argv);
        exit(1);
    } else {
        parser = XML_ParserCreate((XML_Char *)"ISO-8859-1");
        XML_SetUserData(parser, &userData);
        XML_SetElementHandler(parser, startElement, endElement);
        do {
            if (!feof(fp)) {
                buffer[0] = fgetc(fp);
                if (!XML_Parse(parser, buffer, strlen(buffer), feof(fp))) {
                    fprintf(stderr,
                            "%s at line %d\n",
                            XML_ErrorString(XML_GetErrorCode(parser)),
                            XML_GetCurrentLineNumber(parser));
                    return 1;
                }
            }
        } while (!feof(fp));
        XML_ParserFree(parser);
    }
    return 0;
}
--
Jakob Møbjerg Nielsen | "Nine-tenths of the universe is the
ja***@dataloger.dk | knowledge of the position and direction
http://www.jakobnielsen.dk/ | of everything in the other tenth."
| -- Terry Pratchett, Thief of Time
Examine the elements.c example file that comes with Expat more carefully,
and copy the parsing loop (the do loop) from there.
Replace only stdin with your FILE*. You might also want to open the file in "rb"
(binary mode) to avoid CRLF translations.
It seems you're trying something funny with strlen() in your code: a one-byte buffer filled by fgetc() is not a nul-terminated string.
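For reference, the loop in that example (at least in older Expat releases) reads a block with fread(), treats a short read as the last block, and passes both the byte count actually read and that "final" flag to the parser. The sketch below shows the shape of it. So that it compiles without Expat installed, the parser call sits behind a hypothetical callback type `feed_fn`. With Expat the callback would presumably wrap `XML_Parse(parser, data, len, is_final)`. The names `feed_stream`, `feed_stats`, `count_feed` and `CHUNK_SIZE` are invented for this illustration and are not part of Expat.

```c
#include <assert.h>
#include <stdio.h>

#define CHUNK_SIZE 4096  /* arbitrary block size chosen for this sketch */

/* Hypothetical callback standing in for the parser. It receives one block,
 * its length in bytes, and a flag that is nonzero for the last block.
 * It returns nonzero on success and zero on error. */
typedef int (*feed_fn)(void *ctx, const char *data, int len, int is_final);

/* Reads fp block by block and hands every block to feed.
 * Returns 1 if all blocks were accepted, 0 on a read or parse error. */
int feed_stream(FILE *fp, feed_fn feed, void *ctx)
{
    char buf[CHUNK_SIZE];
    int done;

    assert(fp != NULL && feed != NULL);
    do {
        /* Use the count fread() returns, never strlen(): the block is
         * raw bytes, not a nul-terminated string. */
        size_t len = fread(buf, 1, sizeof buf, fp);

        if (ferror(fp))
            return 0;
        /* A short read means end of input, so this block is the last. */
        done = len < sizeof buf;
        if (!feed(ctx, buf, (int)len, done))
            return 0;
    } while (!done);
    return 1;
}

/* Stand-in "parser" for demonstration: counts bytes and final blocks. */
struct feed_stats { long bytes; int finals; };

int count_feed(void *ctx, const char *data, int len, int is_final)
{
    struct feed_stats *stats = ctx;

    (void)data;  /* a real parser would consume the bytes here */
    stats->bytes += len;
    stats->finals += is_final ? 1 : 0;
    return 1;
}
```

Each byte reaches the parser exactly once, and the final flag is set exactly once, on the last block. The same shape should carry over to a socket: replace fread() with the receive call and decide "final" from the connection state.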
with respect,
Toni Uusitalo
In article <bp**********@sunsite.dk>,
Jakob Møbjerg Nielsen <ja***@dataloger.dk> wrote:
% Expat keeps telling me that there is "junk after document element".
% if ((fp = fopen(*++argv, "r")) == NULL) {
% fprintf(stderr, "%s: Can't open %s", prog, *argv);
% exit(1);
% } else {
% fseek(fp, 0, SEEK_END);
% fsize = ftell(fp);
% rewind(fp);
%
% buffer = (char *)malloc(fsize+ 1);
%
% if (buffer == NULL)
% exit(2);
%
% fread(buffer, 1, fsize, fp);
If you're not on a Unix system, ftell() might give you a larger value than
fread() returns. You might want to check the return value of fread().
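A minimal sketch of that check: open the file in binary mode and add up what fread() actually returns, instead of trusting ftell(). The helper name `count_file_bytes` is made up for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns the number of bytes in the file at path,
 * found by reading it in binary mode, or -1 if it cannot be opened or
 * a read error occurs. */
long count_file_bytes(const char *path)
{
    char chunk[4096];
    size_t got;
    long total = 0;
    FILE *fp;

    assert(path != NULL);
    fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    /* fread() returns how many bytes it really delivered */
    while ((got = fread(chunk, 1, sizeof chunk, fp)) > 0)
        total += (long)got;
    if (ferror(fp))
        total = -1;
    fclose(fp);
    return total;
}
```

In the posted program the equivalent fix would be to keep the return value of the single fread() call and pass that, not fsize, as the length.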
% printf("%s\n", buffer);
You might want to do a hex dump rather than just printing up to the first
NUL. If there are trailing NULs after the last >, expat will give you
an error message.
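A hex dump along those lines could look like the sketch below. Both helper names, `hex_dump` and `trailing_nuls`, are invented for this example. The length passed in is assumed to be the byte count that is also handed to the parser.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes len bytes of data to out as two-digit hex
 * values, 16 per line, so NUL bytes and other invisible characters
 * become visible. */
void hex_dump(FILE *out, const void *data, size_t len)
{
    const unsigned char *bytes = data;
    size_t i;

    assert(out != NULL && (data != NULL || len == 0));
    for (i = 0; i < len; i++)
        fprintf(out, "%02x%c", bytes[i],
                (i % 16 == 15 || i + 1 == len) ? '\n' : ' ');
}

/* Hypothetical helper: counts the NUL bytes at the end of a block. */
size_t trailing_nuls(const char *data, size_t len)
{
    size_t n = 0;

    assert(data != NULL || len == 0);
    while (n < len && data[len - 1 - n] == '\0')
        n++;
    return n;
}
```

Calling `hex_dump(stderr, buffer, n)` with the same `n` that goes to XML_Parse() shows exactly what the parser is being given, including anything after the closing tag.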
--
Patrick TJ McPhee
East York Canada pt**@interlog.com