Parsing XML streams

I have a program that listens on an IRC channel and logs everything to
XML on standard output. The format of the XML is pretty
straightforward, looking like this:

<channel name='#sandbox'>
<message user='PeterScott'>Hello, my bot</message>
<message user='PeterScott'>This is a message</message>
<nickchange>
<oldnick>PeterScott</oldnick>
<newnick>PeterSc</newnick>
</nickchange>
</channel>

I'm writing another program that should parse that sort of XML on its
stdin, printing out a more user-friendly representation. For this, I
need to parse the XML as it comes in, not all at once.

I wrote a parser using xml.sax, and it works well, provided that I
read in the whole document. However, I want to be able to just read
the XML piece by piece, calling event handlers whenever something
happens and waiting for more to happen.

Is there some way to do this with the standard python xml parsers?
Will I need to use PyXML? Or what?

Thanks,
-Peter
Jul 18 '05 #1
On Thu, 11 Sep 2003 16:30:18 -0700, Peter Scott wrote:
> Is there some way to do this with the standard python xml parsers?
> Will I need to use PyXML? Or what?


xml.parsers.expat can parse things in pieces. It shouldn't be *too* much
work to convert over.
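
For illustration, here is a minimal sketch of what piecewise parsing with xml.parsers.expat might look like, assuming Python 3. The handler functions and the `events` list are hypothetical names invented for this example; `ParserCreate()`, `Parse()`, and the three handler attributes are part of the standard pyexpat interface.

```python
import xml.parsers.expat

events = []  # hypothetical: collected here so the result is easy to inspect

def start_element(name, attrs):
    events.append(("start", name, dict(attrs)))

def end_element(name):
    events.append(("end", name))

def char_data(data):
    # Text may arrive in several pieces; skip whitespace-only pieces.
    if data.strip():
        events.append(("text", data))

parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = start_element
parser.EndElementHandler = end_element
parser.CharacterDataHandler = char_data

# Feed arbitrary fragments; a tag or text run may be split across calls.
chunks = ["<channel name='#sand", "box'><message user='PeterScott'>Hel", "lo</message>"]
for chunk in chunks:
    parser.Parse(chunk, False)

# The final call passes True to tell expat the document is complete.
parser.Parse("</channel>", True)

for event in events:
    print(event)
```

Exactly when each handler fires relative to each `Parse()` call can vary with the expat version, so the sketch only relies on all events having been delivered once the final call returns.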
Jul 18 '05 #2
Peter Scott wrote:
> I'm writing another program that should parse that sort of XML on its
> stdin, printing out a more user-friendly representation. For this, I
> need to parse the XML as it comes in, not all at once.


Peter,

Check out the IncrementalParser class in the library module

Lib/xml/sax/xmlreader.py

This extension of the standard XMLReader class acts just like a SAX
parser, in that it delivers SAX2 events to your ContentHandler as it
processes the tokens from the source XML document.

But rather than the parser itself controlling when and how it gets its
input, you control that through the use of the .feed() method. So you
can "drip feed" the parser with input if you wish.
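
As a rough sketch of drip feeding (assuming Python 3 and the default expat-based reader returned by `xml.sax.make_parser()`), the following records SAX events while the input arrives in two pieces. `EventRecorder` is a hypothetical handler written for this example, not something in the library.

```python
import xml.sax
from xml.sax.handler import ContentHandler

class EventRecorder(ContentHandler):
    """Hypothetical handler that records element events in arrival order."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def startElement(self, name, attrs):
        self.seen.append("start:" + name)

    def endElement(self, name):
        self.seen.append("end:" + name)

recorder = EventRecorder()
parser = xml.sax.make_parser()
parser.setContentHandler(recorder)

# The caller decides when the parser gets input, and how much.
parser.feed("<log><message>hi</mess")   # closing tag is cut in half here
parser.feed("age></log>")               # ...and completed here
parser.close()                          # no more input will follow

print(recorder.seen)
```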

Not all XML parsers support an IncrementalParser interface. In order
for an XML parser to support incremental parsing, it must have been
coded specifically to do so. Fortunately, the expat wrapper supplied
with the base distribution of python does support incremental parsing.
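
One way to check this at run time, sketched under the assumption that the parser comes from `xml.sax.make_parser()`, is to test whether it is an instance of `IncrementalParser`. The variable name `supports_feed` is made up for the example.

```python
import xml.sax
from xml.sax.xmlreader import IncrementalParser

parser = xml.sax.make_parser()

# Only drivers written as IncrementalParser subclasses implement feed().
supports_feed = isinstance(parser, IncrementalParser)
if not supports_feed:
    raise SystemExit("this SAX driver cannot be fed piece by piece")

print("incremental parsing available:", supports_feed)
```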

I think this should solve your problem quite nicely. When you start
up your process for the first time, feed() the IncrementalParser the
opening tag of a document element (every XML document must have
exactly one document element). Then simply feed the output of your
logging stream directly to the IncrementalParser, as and when you
receive it.

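You should not have any problems with XML tokens being split over two
different .feed() calls either. For example, this should work just
fine:

import xml.sax

# IncrementalParser itself is abstract; make_parser() returns the
# expat-based reader, which implements feed().
ip = xml.sax.make_parser()
ip.feed('<docu')
ip.feed('ment')
ip.feed('/>')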

When your logging stream is closing, simply feed a close tag for your
document element to your IncrementalParser and then call its .close()
method, and everything will clean up nicely.
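
Put together, the open / feed / close sequence might look like the sketch below, assuming Python 3. The function `parse_wrapped_stream`, the `NameCollector` handler, and the default root element name are all hypothetical, and an `io.StringIO` stands in for the real stdin.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler

class NameCollector(ContentHandler):
    """Hypothetical handler that collects element names as they start."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)

def parse_wrapped_stream(stream, handler, root="mylogstream"):
    """Wrap everything read from `stream` in one document element."""
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.feed("<%s>" % root)       # open the document element first
    for line in stream:              # feed input as it becomes available
        parser.feed(line)
    parser.feed("</%s>" % root)      # the stream ended: close the element
    parser.close()                   # tell the parser nothing more is coming

collector = NameCollector()
fake_stdin = io.StringIO(
    "<channel name='#sandbox'>\n"
    "<message user='PeterScott'>Hello, my bot</message>\n"
    "</channel>\n"
)
parse_wrapped_stream(fake_stdin, collector)
print(collector.names)
```

With a real pipe you would pass `sys.stdin` instead; how promptly lines arrive then depends on how the logging program buffers its output.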

Here is some sample code:

#-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
import xml.sax
from xml.sax.handler import ContentHandler

logentry = """
<channel name='#sandbox'>
<message user='PeterScott'>Hello, my bot</message>
<message user='PeterScott'>This is a message</message>
<nickchange>
<oldnick>PeterScott</oldnick>
<newnick>PeterSc</newnick>
</nickchange>
</channel>
"""

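# make_parser() expects a list of parser module names, not a bare string.
incr_parser = xml.sax.make_parser(['xml.sax.expatreader'])
incr_parser.setContentHandler(ContentHandler())
incr_parser.feed('<mylogstream>')
incr_parser.feed(logentry)
incr_parser.feed('</mylogstream>')
incr_parser.close()  # signal that the document is complete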
#-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
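
The base ContentHandler ignores every event, so the sample above prints nothing. As a sketch of the "user-friendly representation" from the original question, a handler along these lines could be plugged in instead. It assumes Python 3; `FriendlyLogHandler` and its output format are invented for illustration.

```python
import xml.sax
from xml.sax.handler import ContentHandler

class FriendlyLogHandler(ContentHandler):
    """Hypothetical handler that turns log events into readable lines."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self._user = None
        self._text = []
        self._nicks = {}

    def startElement(self, name, attrs):
        self._text = []  # start collecting text for this element
        if name == "channel":
            self.lines.append("Channel %s" % attrs.get("name"))
        elif name == "message":
            self._user = attrs.get("user")

    def characters(self, content):
        # Text can arrive in several pieces, so accumulate it.
        self._text.append(content)

    def endElement(self, name):
        text = "".join(self._text).strip()
        if name == "message":
            self.lines.append("<%s> %s" % (self._user, text))
        elif name in ("oldnick", "newnick"):
            self._nicks[name] = text
        elif name == "nickchange":
            self.lines.append("%s is now known as %s"
                              % (self._nicks["oldnick"], self._nicks["newnick"]))
        self._text = []

logentry = """
<channel name='#sandbox'>
<message user='PeterScott'>Hello, my bot</message>
<message user='PeterScott'>This is a message</message>
<nickchange>
<oldnick>PeterScott</oldnick>
<newnick>PeterSc</newnick>
</nickchange>
</channel>
"""

handler = FriendlyLogHandler()
parser = xml.sax.make_parser()
parser.setContentHandler(handler)
parser.feed("<mylogstream>")
parser.feed(logentry)
parser.feed("</mylogstream>")
parser.close()

for line in handler.lines:
    print(line)
```

Appending to `handler.lines` keeps the example easy to inspect; a live tool would more likely print each line from inside `endElement` so output appears as events arrive.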

regards,

--
alan kennedy
-----------------------------------------------------
check http headers here: http://xhaus.com/headers
email alan: http://xhaus.com/mailto/alan
Jul 18 '05 #3
