How to parse an incomplete XML file

I have to write a program that receives an XML file incrementally, i.e. the file is constantly updated in the form of an XML feed. Thus the first chunk will not be a well-formed XML document. However, I need to parse the feed regularly and display the parsed result.

How can this be achieved?

For example, the XML feed can look like the following.

First chunk (we can see it is not well formed):
  <xml>
  <results>
  <result1>
  </result1>
  <result2>
  </result2>
The next chunk will be:

  <result3>
  </result3>
  <result4>
  </result4>
  </results>
  </xml>
Please help! I would be grateful if you could provide sample code or some links that I can follow.

TIA
Jan 13 '09 #1
Dormilich
8,658 Expert Mod 8TB
any reasonable xml parser will throw an error if the document is not well-formed. however, there are ways to extend an xml file with additional data (I'm speaking of DOM) without ever using invalid xml, although that depends on the structure of the chunks.

I recommend changing the update process, since xml processing will be much easier on well formed xml.

if there is absolutely no other way, you can try to parse the file tag-wise (like the PHP xml_parse() function).

regards
Jan 13 '09 #2
@Dormilich

Hi Dormilich,

The problem is that the XML feed is retrieved from the server and is constantly updated on the user's screen. So I have to parse the feed and update the screen as each part of it arrives.

Can you provide me with the sample code to do the same?
Jan 13 '09 #3
Dormilich
8,658 Expert Mod 8TB
if you get a feed from a server, it is mostly well-formed. in which way is the feed processed to show up on screen?
Jan 13 '09 #4
@Dormilich
Hi,
A well-formed XML document arrives in chunks, and only once we have all the chunks do we have a well-formed XML document. But there can be a delay between one chunk and the next, and the chunk already received still has to be parsed. Currently I am using libxml2 to parse a well-formed XML body.
Jan 13 '09 #5
Dormilich
8,658 Expert Mod 8TB
@praveenss
do you have to parse the chunk immediately? otherwise I'd wait until the document is complete.

@praveenss
If you must parse the chunk, consider sending chunks that are well formed. although it will result in more program code when it comes to putting the document together...
Jan 13 '09 #6
@Dormilich


Yes, exactly. I have to parse the chunk immediately. As I mentioned, the entire document arrives as follows.

First chunk is:
  <result>
  <result1> .... </result1>
  <result2>...... </result2>
Next chunk is :
  <result3>....</result3>
  <result4> .... </result4>
Finally the feed is terminated:

  </result>

If we concatenate the chunks we get well-formed XML. The problem is that the XML arrives in chunks, with varying delays between them.
Jan 13 '09 #7
Dormilich
8,658 Expert Mod 8TB
@praveenss
then there will be no other option than to parse it piece-by-piece. that is, treating the chunks as strings and applying string functions to get the data you need.

I still recommend sending the chunks as well-formed xml (unless that's impossible).
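
A rough sketch of that piece-by-piece string approach in C, using only the standard library. It assumes the simple shape shown in this thread: tags without attributes, no comments or CDATA, and leaf elements that contain only text. The names FeedBuffer, feed_append and feed_next are made up for this illustration and do not come from any library.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper type, not part of any library: accumulates feed text. */
typedef struct {
    char  *data;  /* all text received so far, NUL-terminated */
    size_t len;   /* bytes in data, not counting the NUL */
    size_t pos;   /* offset of the first byte not yet consumed */
} FeedBuffer;

/* Append one chunk. Returns 0 on success, -1 if memory runs out. */
static int feed_append(FeedBuffer *fb, const char *chunk)
{
    size_t n = strlen(chunk);
    char *p = realloc(fb->data, fb->len + n + 1);
    if (p == NULL)
        return -1;
    memcpy(p + fb->len, chunk, n + 1);        /* copies the NUL as well */
    fb->data = p;
    fb->len += n;
    return 0;
}

/*
 * Copy the next complete leaf element (e.g. "<result1>...</result1>") to out.
 * Returns 1 if an element was copied, 0 if more data is needed,
 * -1 on a mismatched closing tag, an over-long name, or if out is too small.
 */
static int feed_next(FeedBuffer *fb, char *out, size_t outsz)
{
    if (fb->data == NULL)
        return 0;
    for (;;) {
        char *lt = strchr(fb->data + fb->pos, '<');
        if (lt == NULL)
            return 0;
        char *gt = strchr(lt, '>');
        if (gt == NULL)
            return 0;                         /* the tag itself is cut off */
        if (lt[1] == '/') {                   /* closing tag of a wrapper: skip */
            fb->pos = (size_t)(gt + 1 - fb->data);
            continue;
        }
        char *next = strchr(gt + 1, '<');
        if (next == NULL || next[1] == '\0')
            return 0;                         /* cannot classify this tag yet */
        if (next[1] != '/') {                 /* followed by a child: a wrapper */
            fb->pos = (size_t)(gt + 1 - fb->data);
            continue;
        }
        char close[64];
        size_t namelen = (size_t)(gt - lt - 1);
        if (namelen + 4 > sizeof close)
            return -1;
        snprintf(close, sizeof close, "</%.*s>", (int)namelen, lt + 1);
        size_t closelen = namelen + 3;
        if (strlen(next) < closelen)
            return 0;                         /* closing tag is cut off */
        if (strncmp(next, close, closelen) != 0)
            return -1;                        /* mismatched closing tag */
        size_t n = (size_t)(next - lt) + closelen;
        if (n + 1 > outsz)
            return -1;
        memcpy(out, lt, n);
        out[n] = '\0';
        fb->pos = (size_t)(lt - fb->data) + n;
        return 1;
    }
}
```

The intended use is to call feed_append() whenever a chunk arrives, then call feed_next() in a loop while it returns 1, handing each extracted element to the UI. A return of 0 means "wait for the next chunk". The buffer is never compacted here, so a long-running feed would also need to discard the consumed prefix.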
Jan 13 '09 #8
jkmyoung
2,057 Expert 2GB
You should be able to use an event-driven XML parser (e.g. SAX). If you treat the incoming data as a stream as opposed to one file it might be possible to do. The trick would be dealing with the EOF token, ignoring it so that the parser does not throw an error. Then you'd have to route the new input into the stream as it comes.
Jan 13 '09 #9
@jkmyoung

Can you please tell me how we can route the new input to the stream? The SAX parser API xmlSAXUserParseMemory can be called for every chunk. But the final output is the parsed content for the complete feed.

The requirement is that i need to refresh the UI with the parsed content of every chunk as and when it arrives.
It would be of great help if you provide some sample code or some link to follow.
TIA
Jan 14 '09 #10
jkmyoung
2,057 Expert 2GB
What language are you using to do this? You would be better asking the streaming question in the specific language forum, (eg Java? C++?).

You will need to specify exactly how you are getting these file fragments, specifically what format/class they are in, and the type of Stream or Reader class you want to end up with.
Jan 14 '09 #11
@jkmyoung
I am using the C language!
Jan 15 '09 #12
