How to parse an incomplete XML file

I have to write a program that receives an XML file incrementally, i.e. the file is constantly updated in the form of an XML feed. Thus the first chunk will not be a well-formed XML document. However, I need to parse the feed as each chunk arrives and display the parsed result.

How can this be achieved?

E.g. the XML feed can look like the following.

The first chunk is as follows. We can see it is not well formed:
    <xml>
    <results>
    <result1>
    </result1>
    <result2>
    </result2>
The next chunk will be:

    <result3>
    </result3>
    <result4>
    </result4>
    </results>
    </xml>
Please help! I would be grateful if you could provide me with sample code or some links that I can follow.

TIA
Jan 13 '09 #1
Dormilich
8,658 Expert Mod 8TB
any reasonable XML parser will throw an error if the document is not well-formed. however, there are ways to extend an XML file with additional data (I'm speaking of DOM) without ever producing malformed XML, although that depends on the structure of the chunks.

I recommend changing the update process, since XML processing will be much easier on well-formed XML.

if there is absolutely no other way, you can try to parse the file tag by tag (as the PHP xml_parse() function does).

regards
Jan 13 '09 #2
@Dormilich

Hi Dormilich,

The problem is that the XML feed is retrieved from the server and is constantly updated on the user's screen. So I have to parse each part of the feed and update the screen as and when it is received.

Can you provide me with sample code to do this?
Jan 13 '09 #3
Dormilich
8,658 Expert Mod 8TB
if you get a feed from a server, it is usually well-formed. how is the feed processed to show up on screen?
Jan 13 '09 #4
@Dormilich
Hi,
A well-formed XML document is received in chunks, and only when we have all the chunks do we have a well-formed XML document. But it can happen that we get a chunk and there is a delay before we get the next chunk of data, and the chunk already received still has to be parsed. Currently, for parsing a well-formed XML body, I am using libxml2.
Jan 13 '09 #5
Dormilich
8,658 Expert Mod 8TB
@praveenss
do you have to parse the chunk immediately? otherwise I'd wait until the document is complete.

@praveenss
If you must parse the chunk, consider sending chunks that are each well formed, although that will mean more program code when it comes to putting the document together...
Jan 13 '09 #6
@Dormilich


Yes, exactly. I have to parse the chunk immediately. As I mentioned, the entire document is received as follows.

First chunk is:
    <result>
    <result1> .... </result1>
    <result2>...... </result2>
Next chunk is:
    <result3>....</result3>
    <result4> .... </result4>
Finally the feed is terminated:

    </result>

If we concatenate the chunks, we get well-formed XML. The problem is that we receive the XML in chunks, with varying delays between them.
Jan 13 '09 #7
Dormilich
8,658 Expert Mod 8TB
@praveenss
then there will be no other option than to parse it piece by piece. that is, treating the chunks as strings and applying string functions to get the data you need.

I still recommend sending the chunks as well-formed XML (unless that's impossible).
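
A minimal sketch of the string-based approach in C, under some loud assumptions: records are flat elements named "result" followed by digits (as in the examples in this thread), they carry no attributes, comments, or CDATA, and they do not nest. `FeedBuffer`, `Record`, `is_record_name`, and `feed_chunk` are hypothetical names made up for this illustration. The buffer keeps whatever has not been consumed yet, so a record that is cut off at the end of one chunk is completed by the next.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define BUF_CAP 4096

/* Unparsed feed text carried over between chunks (always NUL-terminated). */
typedef struct {
    char data[BUF_CAP];
    size_t len;
} FeedBuffer;

/* One complete record cut out of the feed. */
typedef struct {
    char name[32];
    char text[128];
} Record;

/* Assumed rule for this feed: a record tag is "result" followed by digits. */
static int is_record_name(const char *name, size_t n)
{
    if (n <= 6 || n >= 32 || strncmp(name, "result", 6) != 0)
        return 0;
    for (size_t i = 6; i < n; i++)
        if (!isdigit((unsigned char)name[i]))
            return 0;
    return 1;
}

/* Appends a chunk, then copies every record that is now complete into out[].
 * Returns the number of records written, or -1 if the buffer would overflow. */
static int feed_chunk(FeedBuffer *fb, const char *chunk, Record *out, int max_out)
{
    size_t n = strlen(chunk);
    if (fb->len + n >= BUF_CAP)
        return -1;
    memcpy(fb->data + fb->len, chunk, n + 1);
    fb->len += n;

    int count = 0;
    char *pos = fb->data;
    while (count < max_out) {
        char *open = strchr(pos, '<');
        if (!open) { pos = fb->data + fb->len; break; }   /* no more tags */
        char *gt = strchr(open, '>');
        if (!gt) { pos = open; break; }                   /* tag is cut off */
        size_t name_len = (size_t)(gt - open) - 1;
        if (!is_record_name(open + 1, name_len)) {        /* wrapper or closing tag */
            pos = gt + 1;
            continue;
        }
        Record *r = &out[count];
        char close_tag[40];
        snprintf(r->name, sizeof r->name, "%.*s", (int)name_len, open + 1);
        snprintf(close_tag, sizeof close_tag, "</%s>", r->name);
        char *close = strstr(gt + 1, close_tag);
        if (!close) { pos = open; break; }                /* record not complete yet */
        snprintf(r->text, sizeof r->text, "%.*s", (int)(close - (gt + 1)), gt + 1);
        count++;
        pos = close + strlen(close_tag);
    }

    /* Keep only the unconsumed tail for the next chunk. */
    size_t rest = fb->len - (size_t)(pos - fb->data);
    memmove(fb->data, pos, rest + 1);
    fb->len = rest;
    return count;
}
```

Start with a zero-initialized `FeedBuffer`, call `feed_chunk` each time data arrives, and refresh the screen from the returned records. This is not an XML parser; it only works while the feed stays as simple as the examples above.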
Jan 13 '09 #8
jkmyoung
2,057 Expert 2GB
You should be able to use an event-driven XML parser (e.g. SAX). If you treat the incoming data as a stream rather than as one file, it should be possible. The trick is dealing with the end of each chunk: the parser must not treat it as the end of the document and throw an error. Then you have to route the new input into the stream as it comes.
Jan 13 '09 #9
@jkmyoung

Can you please tell me how we can route the new input to the stream? The SAX parser API xmlSAXUserParseMemory can be called for every chunk. But the final output is the parsed content for the complete feed.

The requirement is that I need to refresh the UI with the parsed content of every chunk as and when it arrives.
It would be of great help if you could provide some sample code or a link to follow.
TIA
Jan 14 '09 #10
jkmyoung
2,057 Expert 2GB
What language are you using to do this? You would be better off asking the streaming question in the forum for that specific language (e.g. Java or C++).

You will need to specify exactly how you are getting these file fragments, specifically what format/class they are in, and the type of Stream or Reader class you want to end up with.
Jan 14 '09 #11
@jkmyoung
I am using the C language!
Jan 15 '09 #12