473,385 Members | 1,542 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,385 software developers and data experts.

Splitting SAX results

Hi list,

I have a very simple SAX script from which I get results like
'Title1:Description','Title2:Description'. I want to split each result
on the colon, using the two resulting elements as key/value pairs in a
dictionary. I've tried a couple different approaches with lists etc,
but I keep getting an 'IndexError: list index out of range' when I go
to split the results. Probably an easy fix but it's my first hack at
SAX/XML. Thank you!

from xml.sax import make_parser
from xml.sax.handler import ContentHandler

class reportHandler(ContentHandler):
def __init__(self):
self.isReport = 0

def startElement(self, name, attrs):
if name == 'title':
self.isReport = 1
self.reportText = ''

def characters(self, ch):
if self.isReport:
self.reportText += ch

def endElement(self, name):
if name == 'title':
self.isReport = 0
print self.reportText

parser = make_parser()
parser.setContentHandler(reportHandler())
parser.parse('http://www.some.com/rss/')

Jun 7 '07 #1
6 1393
IamIan wrote:
I have a very simple SAX script from which I get results like
'Title1:Description','Title2:Description'. I want to split each result
on the colon, using the two resulting elements as key/value pairs in a
dictionary. I've tried a couple different approaches with lists etc,
but I keep getting an 'IndexError: list index out of range' when I go
to split the results. Probably an easy fix but it's my first hack at
SAX/XML. Thank you!
Sounds like a problem with the data to me rather than SAX.

However, SAX tends to make things much more complex than necessary, so you
loose the sight on the real problems. Try a library like ElementTree or lxml
to make your life easier. You might especially like lxml.objectify.

http://effbot.org/zone/element.htm
http://effbot.org/zone/element-iterparse.htm

http://codespeak.net/lxml/dev/
http://codespeak.net/lxml/dev/objectify.html

Stefan
Jun 7 '07 #2
Well SAX isn't the problem... maybe I should repost this with a
different title. The SAX part works just as I want, but the results I
get back need to be manipulated. No matter what I try I can't split a
result like 'Title 1:Description' on the colon without getting an
IndexError. Ideas anyone?

Jun 8 '07 #3
On 6/8/07, IamIan <ia****@gmail.comwrote:
Well SAX isn't the problem... maybe I should repost this with a
different title. The SAX part works just as I want, but the results I
get back need to be manipulated. No matter what I try I can't split a
result like 'Title 1:Description' on the colon without getting an
IndexError. Ideas anyone?
I don't think you've showed us any examples of the code you're having
trouble with. I don't see anything in your original post that tries
to split strings. If you just want to know how split works, here's an
example:
>>t = 'Title1:Description'
key, value = t.split(':')
print key
Title1
>>print value
Description
>>>
If that doesn't help, show us a sample of some of the data you're
working with, what you've
tried so far, and what the end result is supposed to look like.

--
Jerry
Jun 8 '07 #4
I do know how split works, but thank you for the response. The end
result that I want is a dictionary made up of the title results coming
through SAX, looking like {'Title1: Description',
'Title2:Description'}.

The XML data looks like:
<item>
<title>Title1:Description</title>
<link>Link</link>
<description>Desc</description>
<author>Author</author>
<pubDate>Date</pubDate>
</item>
<item>
<title>Title2:Description</title>
<link>Link</link>
<description>Desc</description>
<author>Author</author>
<pubDate>Date</pubDate>
</item>

I've tried different approaches, a couple of which I've added to the
code below (only running one option at a time):

from xml.sax import make_parser
from xml.sax.handler import ContentHandler

tracker = [] # Option 1
tracker = {} # Option 2

class reportHandler(ContentHandler):

def __init__(self):
self.isReport = 0

def startElement(self, name, attrs):
if name == 'title':
self.isReport = 1
self.reportText = ''

def characters(self, ch):
if self.isReport:
self.reportText += ch
tracker.append(ch) # Option 1
key, value = ch.split (':') # Option 2
tracker[key] = value

def endElement(self, name):
if name == 'title':
self.isReport = 0
print self.reportText

parser = make_parser()
parser.setContentHandler(reportHandler())
parser.parse('http://www.some.com/rss/')

print tracker
Option 1 returns a list with the markup included, looking like:
[u'Title1:", u'\n', u'Description ', u'\n', u'\t\t\t', u'Title2:',
u'\n', u'Description ', u'\n', u'\t\t\t', etc]

Option 2 fails with the traceback:
File "C:\test.py", line 21, in characters
key, value = ch.split(':')
ValueError: need more than 1 value to unpack

Thank you for the help!

Jun 12 '07 #5
En Tue, 12 Jun 2007 16:16:45 -0300, IamIan <ia****@gmail.comescribió:
I do know how split works, but thank you for the response. The end
result that I want is a dictionary made up of the title results coming
through SAX, looking like {'Title1: Description',
'Title2:Description'}.

The XML data looks like:
<item>
<title>Title1:Description</title>
<link>Link</link>
<description>Desc</description>
<author>Author</author>
<pubDate>Date</pubDate>
</item>
<item>
<title>Title2:Description</title>
<link>Link</link>
<description>Desc</description>
<author>Author</author>
<pubDate>Date</pubDate>
</item>

I've tried different approaches, a couple of which I've added to the
code below (only running one option at a time):
Forget about SAX. Use ElementTree instead

pyimport xml.etree.cElementTree as ET
pyf = open("x.xml","r")
pytree = ET.parse(f)
pyfor item in tree.getiterator('item'):
.... print item.findtext('title')
....
Title1:Description
Title2:Description

ElementTree is infinitely more flexible and easier to use.
See <http://effbot.org/zone/element-index.htm>

--
Gabriel Genellina

Jun 13 '07 #6
Gabriel Genellina wrote:
Forget about SAX. Use ElementTree instead
ElementTree is infinitely more flexible and easier to use.
See <http://effbot.org/zone/element-index.htm>
That's what I told him/her already :)

Rephrasing a famous word:

Being faced with an XML problem, you might think "Ok, I'll just use SAX". And
now you have two problems.

SAX is a great way to hide your real problems behind a wall of unreadable
code. If you want my opinion, lxml is currently the straightest way to get XML
work done in Python.

Stefan
Jun 20 '07 #7

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

7
by: qwweeeit | last post by:
Hi all, I am writing a script to visualize (and print) the web references hidden in the html files as: '<a href="web reference"> underlined reference</a>' Optimizing my code, I found that an...
6
by: Earl Anderson | last post by:
I have a A97/XP applet I've developed for my own use in my department. My boss "suggests" that since I built it, I share it with and instruct the other 6 members of my department on its use. I've...
5
by: fatted | last post by:
I'm trying to write a function which splits a string (possibly multiple times) on a particular character and returns the strings which has been split. What I have below is kind of (oh dear!)...
8
by: ronrsr | last post by:
I'm trying to break up the result tuple into keyword phrases. The keyword phrases are separated by a ; -- the split function is not working the way I believe it should be. Can anyone see what I"m...
1
by: =?Utf-8?B?R3VoYW5hdGg=?= | last post by:
Hi, Please suggest a solution for the following problem. We get a input xml that need to be processed for grouping and sorting. The example of one such xml <Requests> <Request> ......
0
by: ryjfgjl | last post by:
In our work, we often need to import Excel data into databases (such as MySQL, SQL Server, Oracle) for data analysis and processing. Usually, we use database tools like Navicat or the Excel import...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: aa123db | last post by:
Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.