XML parsing with streamed XML

I am sending a continuous stream of XML like this:
_____________________

<stream>
<balise>
test
</balise>
<balise>
test2
</balise>

[..........etc..]

</stream>
_____________________

While reading this stream (into a StringBuilder), I need to extract each
complete <balise> element, in order to get this:
<balise>
test
</balise>
and remove it from the buffer as we parse.
Example:

_____________________

<stream>
<balise>
test
</balise>
<balise> [waiting for the rest of the stream...]

_____________________
With this in the buffer, we should extract:
<balise>
test
</balise>
so that afterwards we are left with:
_____________________

<stream>
<balise> [waiting for the rest of the stream...]

_____________________
And so on as we read the stream!
The thing is, I can receive this stream one byte at a time, or in larger chunks.

Currently, I am using Regex.
But it's a bit tricky with CDATA.
For example, if we have something like this in our stream:
<balise ><![CDATA[<balise ></balise >
we should keep waiting for the real end of <balise >, but I can't do that
with a regex (or maybe you have tips?).

So I thought of maybe using some XmlReader, or xmlstreamreader, or
whatever...

I need the fastest processing solution.

Thanks

Mar 5 '06 #1
gi******@gmail.com <gi******@gmail.com> wrote:
I am sending a continuous stream of XML like this:
<snip>
And so on as we read the stream!
The thing is, I can receive this stream one byte at a time, or in larger chunks.

Currently, I am using Regex.
But it's a bit tricky with CDATA.
For example, if we have something like this in our stream:
<balise ><![CDATA[<balise ></balise >
we should keep waiting for the real end of <balise >, but I can't do that
with a regex (or maybe you have tips?).

So I thought of maybe using some XmlReader, or xmlstreamreader, or
whatever...

I need the fastest processing solution.


XmlReader would certainly be the way to go for simplicity. When you say
you need the fastest processing solution - presumably you only need it
to go *acceptably* fast - it's very rare that you need the absolutely
fastest solution. Such a solution would almost certainly involve
writing your own custom parsing code which would be *extremely*
complicated.

I would strongly recommend trying XmlReader and seeing whether that's
good enough for your needs. I suspect it will be.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Mar 5 '06 #2
OK for XmlReader, but how?
I don't want to be testing for XML validity each time I receive a new
byte in the stream...

And I need it reasonably fast, which means I don't want to be creating XML
objects every time, for example... Let's just say I need it to be efficient ;), as I
will be processing a lot of data, on multiple streams.

Thanks Jon

Mar 5 '06 #3
gi******@gmail.com <gi******@gmail.com> wrote:
OK for XmlReader, but how?

Well, we'd need more information about what you need to do in order to
give you sample code, but the normal thing is to create an XmlReader
(of some description, eg XmlTextReader) and then just let it read nodes
as you ask for them.

I don't want to be testing for XML validity each time I receive a new
byte in the stream...

Well, for one thing you can turn some of the validation off.
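
As a rough sketch of what switching off optional checks can look like, assuming .NET 2.0 or later where XmlReaderSettings is available. The class name and the in-memory sample data are made up for illustration; a real program would pass its own stream. Note that the reader still enforces well-formedness, since that is part of parsing rather than validation.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

class ReaderSettingsSketch
{
    static void Main()
    {
        // Hypothetical stand-in for the incoming data (assumed to be UTF-8).
        Stream input = new MemoryStream(Encoding.UTF8.GetBytes(
            "<stream><balise>test</balise></stream>"));

        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.None; // no DTD or schema validation
        settings.CheckCharacters = false;              // skip the legal-character check
        settings.IgnoreWhitespace = true;              // do not report insignificant whitespace
        settings.IgnoreComments = true;                // do not report comment nodes

        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            // Each Read() call advances to the next node in document order.
            while (reader.Read())
            {
                Console.WriteLine(reader.NodeType + " " + reader.Name);
            }
        }
    }
}
```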

And I need it reasonably fast, which means I don't want to be creating XML
objects every time, for example... Let's just say I need it to be efficient ;), as I
will be processing a lot of data, on multiple streams.


You can give the XmlTextReader a stream of data. You don't need to
manually feed it each individual byte, although it will have to process
each byte in turn.
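
A minimal sketch of handing a stream to XmlTextReader and letting it pull the data itself. The MemoryStream and the class name are placeholders for illustration only; in the scenario described in this thread the stream would come from wherever the XML is being received.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

class ReadFromStreamSketch
{
    static void Main()
    {
        // Placeholder for the real input stream (assumed to carry UTF-8 XML).
        byte[] data = Encoding.UTF8.GetBytes(
            "<stream><balise>test</balise><balise>test2</balise></stream>");

        using (Stream input = new MemoryStream(data))
        {
            XmlTextReader reader = new XmlTextReader(input);

            // The reader fetches bytes from the stream on demand;
            // the caller only asks for the next node.
            while (reader.Read())
            {
                Console.WriteLine("{0}: name='{1}' value='{2}'",
                    reader.NodeType, reader.Name, reader.Value);
            }
        }
    }
}
```

The loop never touches individual bytes; buffering and decoding are left to the reader.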

You *will* end up creating XML objects, but they're likely to be
short-lived unless you actually *need* to hold onto them for a long
time. You should definitely try the simple solution, see whether it
performs well enough for you. I'm sure you'll find it performs a lot
quicker than using regular expressions!

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Mar 5 '06 #4
Alright...

But how can I validate something like this:
// Beginning of the stream
<stream>
<balise>
test
</balise>
<balise>

All I want here, in this example, is to extract
<balise>
test
</balise>

Currently, I do this using regex, but if you tell me that using
XmlReader would be faster, then I have to find a way to do what I want
(extracting tags from an unfinished XML document) with XML objects.

More details: the stream grows as we receive new bytes.
Example:
________
Step 1
<stream>
________
Step 2
<stream>
<balis
________
Step 3
<stream>
<balise>
test
________
Step 4
<stream>
<balise>
test
</balise>
<balise>
________
etc...

We should test at every step of the stream, and in this example we
should be able to extract something (the first <balise>) only at Step 4.

I hope I am clear enough. Thank you for your help.

Mar 5 '06 #5
gi******@gmail.com <gi******@gmail.com> wrote:
Alright...

But how can I validate something like this:
// Beginning of the stream
<stream>
<balise>
test
</balise>
<balise>

All I want here, in this example, is to extract
<balise>
test
</balise>

So you ask the XmlReader for the next node (from the start) and it will
return the <stream> element. After that, you'll keep asking for nodes,
keeping track of any text nodes you're given (the "test" part here) and
when you see an element which is an end "balise" element, do whatever
processing you need.
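
The approach just described can be sketched roughly as follows. This assumes that <balise> elements are not nested inside each other; the class name, the Handle method and the truncated sample input (the "Step 4" state from the question) are made up for illustration. With a MemoryStream the reader runs out of data and throws XmlException at the unfinished element; the catch block only makes that visible.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

class ExtractBaliseSketch
{
    static void Main()
    {
        // Unfinished document: one complete <balise>, one that has only just started.
        byte[] data = Encoding.UTF8.GetBytes(
            "<stream><balise>test</balise><balise>");
        XmlTextReader reader = new XmlTextReader(new MemoryStream(data));

        StringBuilder text = new StringBuilder();
        bool insideBalise = false;

        try
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        if (reader.Name == "balise")
                        {
                            insideBalise = true;
                            text.Length = 0; // start collecting afresh
                            if (reader.IsEmptyElement) // <balise/> has no end node
                            {
                                Handle("");
                                insideBalise = false;
                            }
                        }
                        break;
                    case XmlNodeType.Text:
                    case XmlNodeType.CDATA:
                        if (insideBalise) text.Append(reader.Value);
                        break;
                    case XmlNodeType.EndElement:
                        if (reader.Name == "balise")
                        {
                            Handle(text.ToString()); // element is complete here
                            insideBalise = false;
                        }
                        break;
                }
            }
        }
        catch (XmlException ex)
        {
            // Reached when the input ends in the middle of the document.
            Console.WriteLine("Input ended early: " + ex.Message);
        }
    }

    // Hypothetical processing step for one complete <balise>.
    static void Handle(string content)
    {
        Console.WriteLine("balise: " + content);
    }
}
```

The complete first element is handed to Handle before the reader reaches the unfinished second one, which is the behaviour asked for at Step 4.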

Currently, I do this using regex, but if you tell me that using
XmlReader would be faster, then I have to find a way to do what I want
(extracting tags from an unfinished XML document) with XML objects.
<snip>
We should test at every step of the stream, and in this example we
should be able to extract something (the first <balise>) only at Step 4.

I hope I am clear enough. Thank you for your help.


I suggest you read up on XmlTextReader, including the examples in MSDN.
I'm sure you'll find it useful.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Mar 5 '06 #6
Good, I am looking at XmlTextReader; it seems to do the trick (once I
have completely figured out how it actually works :p )

But I am still wondering whether it is able to handle this:

<balise ><![CDATA[</balise > <---- here we are supposed to wait for
the REAL end "</balise>"

It's an extreme case, unlikely to happen, but you can never be too
careful!

I have to test it to check, but as I am still finding my way around, I will try
this later.

Have a good day and thanks for the help!

Mar 5 '06 #7
gi******@gmail.com <gi******@gmail.com> wrote:
Good, I am looking at XmlTextReader; it seems to do the trick (once I
have completely figured out how it actually works :p )

But I am still wondering whether it is able to handle this:

<balise ><![CDATA[</balise > <---- here we are supposed to wait for
the REAL end "</balise>"

It's an extreme case, unlikely to happen, but you can never be too
careful!


XmlTextReader is a proper XML parser - it should cope fine with it. As
you say, the best way is to test it though :)
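
One possible way to run that test, as a small sketch. The sample string and class name are made up for illustration; the CDATA section is closed with "]]>" here so that the document is complete.

```csharp
using System;
using System.IO;
using System.Xml;

class CDataCheckSketch
{
    static void Main()
    {
        // The text "</balise>" inside the CDATA section is character data, not markup.
        string xml =
            "<stream><balise><![CDATA[</balise>]]>after</balise></stream>";

        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        while (reader.Read())
        {
            // A CDATA node is reported with the enclosed text as its Value;
            // the real end tag shows up later as an EndElement node.
            Console.WriteLine("{0}: name='{1}' value='{2}'",
                reader.NodeType, reader.Name, reader.Value);
        }
    }
}
```

The EndElement node for balise is reported only once, after the CDATA node and the text node that follows it.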

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Mar 5 '06 #8
