expat parser

I have this code:

import xml.parsers.expat

# Handlers called by expat as it walks the document (Python 2 syntax)
def start_element(name, attrs):
    print 'Start element:', name, attrs

def end_element(name):
    print 'End element:', name

def char_data(data):
    # May be called several times for the text of a single element
    print 'Character data:', repr(data)

p = xml.parsers.expat.ParserCreate()
p.StartElementHandler = start_element
p.EndElementHandler = end_element
p.CharacterDataHandler = char_data
fh = open("/home/sbassi/bioinfo/smallUniprot.xml", "r")
p.ParseFile(fh)

And I get this on the output:

....
Start element: sequence {u'checksum': u'E0C0CC2E1F189B8A', u'length': u'393'}
Character data: u'\n'
Character data: u'MPKKKPTPIQLNPAPDGSAVNGTSSAETNLEALQKKLEELELDEQQRK RL'
Character data: u'\n'
Character data: u'EAFLTQKQKVGELKDDDFEKISELGAGNGGVVFKVSHKPSGLVMARKL IH'
....
End element: sequence
....

Is there a way to get the character data together in one string? I
guess it should not be difficult, but I can't work out how. Each time
the parser reads a line it returns just that line, and I want the whole
text of the element in one variable.

(the file is here: http://sbassi.googlepages.com/smallUniprot.xml)
May 27 '07 #1
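
One common way to get the text as a single string with expat is to collect the pieces in a list between the start and end handlers and join them when the element closes. The sketch below is written for Python 3 (the post above uses Python 2), and the function name `collect_text`, the inline `SAMPLE` document, and the choice to strip all whitespace from the sequence are assumptions made for illustration, not part of the original code.

```python
import xml.parsers.expat

def collect_text(xml_bytes, tag):
    """Return the text of every <tag> element, one string per element."""
    results = []   # finished strings, one per matching element
    chunks = None  # list of pieces while inside <tag>, otherwise None

    def start_element(name, attrs):
        nonlocal chunks
        if name == tag:
            chunks = []  # start collecting

    def char_data(data):
        # expat may deliver the text of one element in several calls
        if chunks is not None:
            chunks.append(data)

    def end_element(name):
        nonlocal chunks
        if name == tag and chunks is not None:
            # Join the pieces; split()/join drops the line breaks
            # (assumption: whitespace is not meaningful in a sequence)
            results.append("".join("".join(chunks).split()))
            chunks = None

    p = xml.parsers.expat.ParserCreate()
    p.StartElementHandler = start_element
    p.EndElementHandler = end_element
    p.CharacterDataHandler = char_data
    p.Parse(xml_bytes, True)  # True marks this as the final chunk
    return results

# Tiny stand-in for the real file (hypothetical data)
SAMPLE = b"""<entry><sequence length="10">
MPKKK
PTPIQ
</sequence></entry>"""

print(collect_text(SAMPLE, "sequence"))
```

For a real file, the same handlers can be attached to a parser that reads from a file opened in binary mode with `ParseFile`. expat also has a `buffer_text` attribute that merges adjacent chunks before calling the handler, but joining explicitly as above does not depend on the buffer size.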
Sebastian Bassi wrote:
> Is there a way to get the character data together in one string?

Any reason you are using expat and not cElementTree's iterparse?

Stefan
May 28 '07 #2
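
As a rough illustration of the iterparse approach Stefan mentions: with ElementTree, the complete text of an element is available as `elem.text` once its end event arrives, so no manual joining is needed. This sketch uses `xml.etree.ElementTree` from the Python 3 standard library (`cElementTree` was the C-accelerated module at the time of the thread). The function name `sequences`, the `SAMPLE` document, and its namespace URL are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

def sequences(source, tag="sequence"):
    """Yield the whitespace-free text of each <tag> element in source."""
    for event, elem in ET.iterparse(source, events=("end",)):
        # Tags look like '{namespace}local' when a namespace is declared
        local = elem.tag.rsplit("}", 1)[-1]
        if local == tag:
            # elem.text already holds the whole text of the element
            yield "".join((elem.text or "").split())
            elem.clear()  # release the element's contents on large files

# Tiny stand-in for the real file (hypothetical data and namespace)
SAMPLE = b"""<uniprot xmlns="http://example.org/ns"><entry><sequence length="10">
MPKKK
PTPIQ
</sequence></entry></uniprot>"""

print(list(sequences(io.BytesIO(SAMPLE))))
```

`ET.iterparse` also accepts a filename, so the path from the original post could be passed in place of the `BytesIO` object.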