By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
446,285 Members | 1,659 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 446,285 IT Pros & Developers. It's quick & easy.

expat parser

P: n/a
I have this code:

import xml.parsers.expat
def start_element(name, attrs):
print 'Start element:', name, attrs
def end_element(name):
print 'End element:', name
def char_data(data):
print 'Character data:', repr(data)
p = xml.parsers.expat.ParserCreate()
p.StartElementHandler = start_element
p.EndElementHandler = end_element
p.CharacterDataHandler = char_data
fh=open("/home/sbassi/bioinfo/smallUniprot.xml","r")
p.ParseFile(fh)

And I get this on the output:

....
Start element: sequence {u'checksum': u'E0C0CC2E1F189B8A', u'length': u'393'}
Character data: u'\n'
Character data: u'MPKKKPTPIQLNPAPDGSAVNGTSSAETNLEALQKKLEELELDEQQRK RL'
Character data: u'\n'
Character data: u'EAFLTQKQKVGELKDDDFEKISELGAGNGGVVFKVSHKPSGLVMARKL IH'
....
End element: sequence
....

Is there a way to have the character data together in one string? I
guess it should not be difficult, but I can't do it. Each time the
parse reads a line, return a line, and I want to have it in one
variable.

(the file is here: http://sbassi.googlepages.com/smallUniprot.xml)
May 27 '07 #1
Share this Question
Share on Google+
1 Reply


P: n/a
Sebastian Bassi wrote:
I have this code:

import xml.parsers.expat
def start_element(name, attrs):
print 'Start element:', name, attrs
def end_element(name):
print 'End element:', name
def char_data(data):
print 'Character data:', repr(data)
p = xml.parsers.expat.ParserCreate()
p.StartElementHandler = start_element
p.EndElementHandler = end_element
p.CharacterDataHandler = char_data
fh=open("/home/sbassi/bioinfo/smallUniprot.xml","r")
p.ParseFile(fh)

And I get this on the output:

...
Start element: sequence {u'checksum': u'E0C0CC2E1F189B8A', u'length':
u'393'}
Character data: u'\n'
Character data: u'MPKKKPTPIQLNPAPDGSAVNGTSSAETNLEALQKKLEELELDEQQRK RL'
Character data: u'\n'
Character data: u'EAFLTQKQKVGELKDDDFEKISELGAGNGGVVFKVSHKPSGLVMARKL IH'
...
End element: sequence
...

Is there a way to have the character data together in one string? I
guess it should not be difficult, but I can't do it. Each time the
parse reads a line, return a line, and I want to have it in one
variable.
Any reason you are using expat and not cElementTree's iterparse?

Stefan
May 28 '07 #2

This discussion thread is closed

Replies have been disabled for this discussion.